<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2121"> <Title>HAL-based Cascaded Model for Variable-Length Semantic Pattern Induction from Psychiatry Web Resources</Title>
<Section position="5" start_page="946" end_page="949" type="metho">
<SectionTitle> 3 HAL Space Construction </SectionTitle>
<Paragraph position="0"> The HAL model represents each word in the vocabulary using a vector representation. Each dimension of the vector is a weight representing the strength of association between the target word and a context word. The weights are computed by applying an observation window of length l over the corpus. All words within the window are considered to co-occur with each other. Thus, for any two words at distance d within the window, the weight between them is computed as $l - d + 1$. Figure 2 shows an example. The HAL model views the corpus as a sequence of words; after moving the window over the whole corpus in one-word increments, the HAL space is constructed. The resultant HAL space is an $N \times N$ matrix, where N is the vocabulary size. In addition, each word in the HAL space is called a concept. Table 1 presents the HAL space for the example text &quot;Two years ago, I lost my parents.&quot;</Paragraph>
<Section position="1" start_page="946" end_page="947" type="sub_section">
<SectionTitle> 3.1 Representation of a Single Concept </SectionTitle>
<Paragraph position="0"> For each concept in Table 1, the corresponding row vector represents its left context information, i.e., the weights of the words preceding it. Similarly, the corresponding column vector represents its right context information. Accordingly, each concept can be represented by a pair of vectors. That is,

$c_i = \left( \vec{c}_i^{\,left}, \vec{c}_i^{\,right} \right), \quad \vec{c}_i^{\,left} = \left( ct_{i1}^{left}, ct_{i2}^{left}, \ldots, ct_{iN}^{left} \right), \quad \vec{c}_i^{\,right} = \left( ct_{i1}^{right}, ct_{i2}^{right}, \ldots, ct_{iN}^{right} \right), \quad (1)$

where $\vec{c}_i^{\,left}$ and $\vec{c}_i^{\,right}$ represent the vectors of the left context information and right context information of a concept $c_i$ in the HAL space, $ct_{ij}$ denotes the weight of the j-th dimension ($t_j$) of a vector, and N is the dimensionality of a vector, i.e., the vocabulary size. The conceptual representation is depicted in Figure 3.</Paragraph>
<Paragraph position="1"> The weighting scheme of the HAL model is frequency-based. Extremely infrequent words are treated as noise and removed from the vocabulary. Conversely, a highly frequent word tends to receive a higher weight, but this does not mean the word is informative, because it may also appear in many other vectors. Thus, to measure the informativeness of a word, the number of vectors the word appears in should be taken into account. In principle, the more vectors a word appears in, the less information it carries to discriminate the vectors. Here we use a weighting scheme analogous to TF-IDF (Baeza-Yates and Ribeiro-Neto, 1999) to reweight the dimensions of each vector, as described in Equation (2):

$ct'_{ij} = ct_{ij} \times \log \frac{N_{vec}}{n_j}, \quad (2)$

where $N_{vec}$ denotes the total number of vectors and $n_j$ denotes the number of vectors with $t_j$ as a dimension. After each dimension is reweighted, the HAL space is transformed into a probabilistic framework. Accordingly, each weight can be redefined as

$P(t_j \mid c_i) = \frac{ct'_{ij}}{\sum_{k=1}^{N} ct'_{ik}}, \quad (3)$

so that each context vector of a concept can be interpreted as a probability distribution over its context words.</Paragraph>
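<Paragraph position="2"> As a concrete illustration of the procedure above, the following minimal Python sketch builds left-context weights with the $l - d + 1$ window rule, applies the TF-IDF-style reweighting of Equation (2), and normalizes each vector as in Equation (3). The function names, the dictionary-based sparse vectors, and the min_count noise filter are illustrative choices, not the paper's implementation.

```python
from collections import defaultdict
import math

def build_hal(tokens, window=5, min_count=2):
    """Build left-context HAL vectors from a token sequence.

    For a target word and a preceding word at distance d (1 <= d <= window),
    the added weight is window - d + 1. Right contexts can be obtained
    symmetrically by scanning forward (the transpose of the matrix view).
    """
    counts = defaultdict(int)
    for t in tokens:
        counts[t] += 1
    # Extremely infrequent words are treated as noise and dropped.
    vocab = {w for w, c in counts.items() if c >= min_count}

    left = defaultdict(lambda: defaultdict(float))  # left[word][context] = weight
    for i, target in enumerate(tokens):
        if target not in vocab:
            continue
        for d in range(1, window + 1):
            j = i - d
            if j < 0:
                break
            ctx = tokens[j]
            if ctx in vocab:
                left[target][ctx] += window - d + 1
    return left

def reweight_and_normalize(left):
    """TF-IDF-style reweighting (Eq. 2) then probabilistic normalization (Eq. 3)."""
    n_vectors = len(left)
    df = defaultdict(int)  # number of vectors a dimension appears in
    for vec in left.values():
        for ctx in vec:
            df[ctx] += 1
    for vec in left.values():
        for ctx in vec:
            vec[ctx] *= math.log(n_vectors / df[ctx])
        total = sum(vec.values())
        if total > 0:
            for ctx in vec:
                vec[ctx] /= total  # each vector becomes a distribution
    return left

tokens = "two years ago i lost my parents".split()
hal = reweight_and_normalize(build_hal(tokens, window=5, min_count=1))
```
</Paragraph>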
</Section>
<Section position="2" start_page="947" end_page="947" type="sub_section">
<SectionTitle> 3.2 Concept Combination </SectionTitle>
<Paragraph position="0"> A semantic pattern is constituted by a set of concepts; thus it can be represented through concept combination over the HAL space, which forms a new concept in the HAL space. Let $sp = c_1 c_2 \cdots c_S$ be a semantic pattern with S constituent concepts, i.e., length S. The concept combination is defined as

$c_{sp} = c_1 \oplus c_2 \oplus \cdots \oplus c_S, \quad (4)$

where $\oplus$ denotes the combination operator over the HAL space and $c_{sp}$ denotes the new concept generated by the concept combination. The new concept is the representation of the semantic pattern, and is itself a vector representation. That is,

$c_{sp} = \left( \vec{c}_{sp}^{\,left}, \vec{c}_{sp}^{\,right} \right). \quad (5)$

The combination operator, $\oplus$, is implemented by the product of the weights of the constituent concepts, described as follows:

$ct_{sp,j}^{\,left} = \prod_{i=1}^{S} ct_{ij}^{\,left}, \qquad ct_{sp,j}^{\,right} = \prod_{i=1}^{S} ct_{ij}^{\,right}. \quad (6)$</Paragraph>
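<Paragraph position="1"> The sketch below illustrates the combination operator on the normalized context vectors from the earlier HAL sketch. Renormalizing the product back into a distribution is an added assumption (the paper leaves this step implicit), made so the KL-based distance of Section 4.2 remains well-defined over the combined vectors.

```python
def combine(vectors):
    """Combine constituent concept vectors into one pattern vector (Eq. 6).

    Each vector maps a context word to a probability. The combined weight
    of a dimension is the product of the constituents' weights, so only
    dimensions shared by all constituents survive.
    """
    shared = set(vectors[0])
    for vec in vectors[1:]:
        shared &= set(vec)  # dimensions present in every constituent
    combined = {}
    for ctx in shared:
        w = 1.0
        for vec in vectors:
            w *= vec[ctx]
        combined[ctx] = w
    total = sum(combined.values())
    if total > 0:  # assumption: renormalize back into a distribution
        combined = {ctx: w / total for ctx, w in combined.items()}
    return combined

# e.g., the left-context vector of the hypothetical pattern "lost parents":
# pattern_left = combine([hal["lost"], hal["parents"]])
```
</Paragraph>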
<Paragraph position="2"> Candidate patterns are evaluated against a seed pattern by $Dist(\cdot, \cdot)$, the distance between two semantic patterns in the HAL space. The main steps in the CIP include initial set generation, distance measure, and relevance feedback.</Paragraph>
</Section>
<Section position="3" start_page="947" end_page="948" type="sub_section">
<SectionTitle> 4.1 Initial Set Generation </SectionTitle>
<Paragraph position="0"> The initial set for a particular length contains the set of semantic patterns to be induced, i.e., the search space. Reducing the search space helps speed up the induction process, especially for inducing patterns of larger length. For this purpose, we consider the words and semantic patterns similar to a given seed pattern to be the better candidates for creating the initial sets. Therefore, we generate quality concepts, a set of words semantically related to a seed pattern, as the basis for creating the initial set for each length. Thus, each seed pattern is associated with a set of quality concepts. In addition, the better semantic patterns induced in the previous stage are also considered. The goodness of words and semantic patterns is measured by their distance to the seed pattern.</Paragraph>
<Paragraph position="1"> Here, a word is considered a quality concept if its distance is smaller than the average distance over the vocabulary. Similarly, only the semantic patterns with a distance smaller than the average distance of all semantic patterns in the previous stage are preserved for the next stage. In this way, semantically unrelated (possibly noisy) patterns are not propagated to the next stage, and the search space is reduced.</Paragraph>
<Paragraph position="2"> The principles of creating the initial sets of semantic patterns are summarized as follows.</Paragraph>
<Paragraph position="3"> * In the beginning stage, the aim is to create the initial set for the semantic patterns of length two. Thus, the initial set is all possible combinations of two quality concepts.</Paragraph>
<Paragraph position="4"> * In the later stages, each initial set is created by adding a quality concept to each of the better semantic patterns induced in the previous stage.</Paragraph>
</Section>
<Section position="4" start_page="948" end_page="948" type="sub_section">
<SectionTitle> 4.2 Distance Measure </SectionTitle>
<Paragraph position="0"> The distance measure quantifies the distance between a seed pattern and the semantic patterns to be induced. Let $sp = c_1 c_2 \cdots c_S$ be a semantic pattern and $sp_{seed} = c'_1 c'_2 \cdots c'_T$ be a given seed pattern; their distance is defined as

$Dist(sp, sp_{seed}) = Dist(c_{sp}, c_{sp_{seed}}), \quad (8)$

where $Dist(c_{sp}, c_{sp_{seed}})$ denotes the distance between the two semantic patterns in the HAL space.</Paragraph>
<Paragraph position="1"> As mentioned earlier, after concept combination a semantic pattern becomes a new concept in the HAL space, which means the semantic pattern can be represented by its left and right contexts. Thus, the distance between two semantic patterns can be computed through their context distance, and Equation (8) can thereby be written as

$Dist(c_{sp}, c_{sp_{seed}}) = Dist\!\left(\vec{c}_{sp}^{\,left}, \vec{c}_{sp_{seed}}^{\,left}\right) + Dist\!\left(\vec{c}_{sp}^{\,right}, \vec{c}_{sp_{seed}}^{\,right}\right). \quad (9)$</Paragraph>
<Paragraph position="2"> Because the weights of the vectors are represented within a probabilistic framework, each vector of a concept can be considered a probability distribution over the context words. Accordingly, we use the Kullback-Leibler (KL) distance (Manning and Schutze, 1999) to compute the distance between two probability distributions:

$D(p \,\|\, q) = \sum_{j} p_j \log \frac{p_j}{q_j}, \quad (10)$

where $D(\cdot \,\|\, \cdot)$ denotes the KL distance between two probability distributions. When Equation (10) is ill-conditioned, i.e., has a zero denominator, the denominator is set to a small smoothing value. For a symmetric distance, we use the divergence measure:

$Div(p, q) = D(p \,\|\, q) + D(q \,\|\, p). \quad (11)$

In this way, the distance between two probability distributions can be computed by their KL divergence, and Equation (9) becomes

$Dist(sp, sp_{seed}) = Div\!\left(\vec{c}_{sp}^{\,left}, \vec{c}_{sp_{seed}}^{\,left}\right) + Div\!\left(\vec{c}_{sp}^{\,right}, \vec{c}_{sp_{seed}}^{\,right}\right). \quad (12)$

After each semantic pattern is evaluated, a ranked list is produced for relevance judgment.</Paragraph>
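<Paragraph position="3"> A minimal Python sketch of the distance computation in Equations (10)-(12), over the sparse context vectors of the earlier sketches; the constant EPS is an illustrative stand-in for the small smoothing value, whose exact magnitude is not fixed here.

```python
import math

EPS = 1e-6  # illustrative stand-in for the small smoothing value

def kl(p, q):
    """KL distance D(p || q) over sparse context distributions (Eq. 10)."""
    total = 0.0
    for ctx, pj in p.items():
        qj = q.get(ctx, 0.0)
        if qj <= 0.0:
            qj = EPS  # smooth the ill-conditioned zero denominator
        if pj > 0.0:
            total += pj * math.log(pj / qj)
    return total

def divergence(p, q):
    """Symmetric KL divergence (Eq. 11)."""
    return kl(p, q) + kl(q, p)

def pattern_distance(sp, seed):
    """Distance between two patterns via their context divergences (Eq. 12).

    Each pattern is a pair (left_vector, right_vector) produced by
    concept combination.
    """
    return divergence(sp[0], seed[0]) + divergence(sp[1], seed[1])
```
</Paragraph>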
</Section>
<Section position="5" start_page="948" end_page="949" type="sub_section">
<SectionTitle> 4.3 Relevance Feedback </SectionTitle>
<Paragraph position="0"> In the induction process, some non-relevant semantic patterns may have a smaller distance to a seed pattern, which may decrease the precision of the final results. One possible solution is to incorporate expert knowledge to guide the induction process. For this purpose, we use the technique of relevance feedback. In the IR community, relevance feedback enhances a user's original query by indicating which retrieved documents are relevant. For our task, relevance feedback is applied after each semantic pattern is evaluated: health professionals judge which semantic patterns are relevant to the seed pattern. In practice, only the top n semantic patterns are presented for relevance judgment. Finally, the semantic patterns judged as relevant form the relevant set, and the others form the non-relevant set. According to the relevant and non-relevant information, the seed pattern can be refined to be more similar to the relevant set, so that the induction process induces more relevant patterns and moves away from noisy patterns in future iterations.</Paragraph>
<Paragraph position="1"> The refinement of the seed pattern adjusts its context distributions (left and right) by re-weighting the dimensions of the seed pattern's context vectors. The dimensions that appear more frequently in relevant patterns than in non-relevant patterns are more significant for identifying relevant patterns; hence, such dimensions of the seed pattern should be emphasized. The significance of a dimension $t_k$ is measured as the ratio

$Sig(t_k) = \frac{w_R(t_k)}{w_{\bar{R}}(t_k)},$

where $w_R(t_k)$ and $w_{\bar{R}}(t_k)$ denote the accumulated weights of dimension $t_k$ over the relevant set and the non-relevant set, respectively. The higher the ratio, the more significant the dimension is. In order to smooth $Sig(t_k)$ into the range from zero to one, the following formula is used:

$\widetilde{Sig}(t_k) = \frac{Sig(t_k)}{1 + Sig(t_k)}.$

Once the context vectors of the seed pattern are re-weighted, they are transformed back into probabilistic form using Equation (3). The refined seed pattern is taken as the reference basis in the next iteration. Relevance feedback is performed iteratively until no more semantic patterns are judged as relevant or a maximum number of iterations is reached; at that point, the induction process for the particular length also stops. The whole CIP terminates when the seed patterns are exhausted.</Paragraph>
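<Paragraph position="2"> A minimal sketch of this refinement for one context vector of the seed pattern, assuming the accumulated-weight ratio and the Sig/(1+Sig) smoothing given above; the per-dimension accumulation, the eps guard, and all names are illustrative assumptions rather than the paper's exact procedure.

```python
def refine_seed(seed_vec, relevant_vecs, nonrelevant_vecs, eps=1e-6):
    """Re-weight one context vector of the seed pattern using feedback.

    seed_vec and the feedback vectors map context words to probabilities.
    A dimension's significance is the ratio of its accumulated weight in
    the relevant set to that in the non-relevant set, smoothed into [0, 1).
    """
    def accumulate(vecs):
        acc = {}
        for vec in vecs:
            for ctx, w in vec.items():
                acc[ctx] = acc.get(ctx, 0.0) + w
        return acc

    rel = accumulate(relevant_vecs)
    nonrel = accumulate(nonrelevant_vecs)

    refined = {}
    for ctx, w in seed_vec.items():
        sig = rel.get(ctx, 0.0) / (nonrel.get(ctx, 0.0) + eps)  # significance ratio
        sig = sig / (1.0 + sig)                                  # smooth into [0, 1)
        refined[ctx] = w * sig                                   # emphasize significant dims

    total = sum(refined.values())
    if total > 0:
        refined = {ctx: w / total for ctx, w in refined.items()}  # back to Eq. (3) form
    return refined
```
</Paragraph>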
</Section>
</Section>
<Section position="6" start_page="949" end_page="949" type="metho">
<SectionTitle> 5 Experimental Results </SectionTitle>
<Paragraph position="0"> To evaluate the performance of the CIP, we built a prototype system and provided a set of seed patterns. The seed patterns were collected by referring to well-defined instruments for assessing negative life events (Brostedt and Pedersen, 2003; Pagano et al., 2004). A total of 20 seed patterns were selected by the health professionals. The CIP then randomly selects one seed pattern per run, without replacement, from the seed set, and iteratively induces relevant patterns from the psychiatry web corpora. The psychiatry web corpora used here include professional mental health web sites, such as PsychPark (http://www.psychpark.org) (Bai, 2001) and the John Tung Foundation (http://www.jtf.org.tw).</Paragraph>
<Paragraph position="1"> In the following sections, we describe experiments that examine, in turn, the effect of using relevance feedback or not, and the coverage on real data of the semantic patterns induced by different approaches. Because semantic patterns with a length larger than 4 are very rare in expressions of negative life events, we limit the length k to the range of 2 to 4.</Paragraph>
<Section position="1" start_page="949" end_page="949" type="sub_section">
<SectionTitle> 5.1 Evaluation on Relevance Feedback </SectionTitle>
<Paragraph position="0"> The relevance feedback employed in this study provides relevant and non-relevant information to the CIP so that it can refine the seed pattern and induce more relevant patterns. The relevance judgment is carried out by three experienced psychiatric physicians. For practical reasons, only the top 30 semantic patterns are presented to the physicians. During relevance judgment, a majority vote mechanism is used to handle disagreements among the physicians: a semantic pattern is considered relevant if two or more physicians judge it as relevant. Finally, the semantic patterns with majority votes form the relevant set.</Paragraph>
<Paragraph position="1"> To evaluate the effectiveness of the relevance feedback, we construct three variants of the CIP, RF(5), RF(10), and RF(20), implemented by applying the relevance feedback for 5, 10, and 20 iterations, respectively. These three CIP variants are then compared to the one without relevance feedback, denoted as RF(-). We use the evaluation metric precision at 30 (prec@30), averaged over all seed patterns, to examine whether the relevance feedback helps the CIP induce more relevant patterns. For a particular seed pattern, prec@n is computed as the number of relevant semantic patterns ranked in the top n of the ranked list, divided by n. Table 2 presents the results for k=2.</Paragraph>
<Paragraph position="2"> The results reveal that the relevance feedback helps the CIP induce more relevant semantic patterns, and that applying the relevance feedback for more iterations can further improve the precision. However, it is usually impractical for experts to be involved in the guiding process for too many iterations. Consequently, we further consider pseudo-relevance feedback to automate the guiding process. Pseudo-relevance feedback carries out the relevance judgment based on the assumption that the top-ranked semantic patterns are more likely to be relevant. Thus, this approach usually relies on setting a threshold or selecting only the top n semantic patterns to form the relevant set. However, determining the threshold is not trivial, and the threshold may differ across seed patterns. Therefore, we apply the pseudo-relevance feedback only after a certain number of expert-guided iterations, rather than applying it throughout the induction process. The notion is that we can obtain a more reliable threshold value by observing the behavior of the relevant semantic patterns in the ranked list for a few iterations. To examine the effectiveness of this combined approach, we additionally construct a CIP variant, RF(10)+pseudo, by applying the pseudo-relevance feedback after 10 expert-guided iterations. The threshold is determined by the physicians during their judgments in the 10th iteration.</Paragraph>
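<Paragraph position="3"> For reference, a minimal implementation of the prec@n metric defined above; the names are illustrative, and the relevance labels would come from the physicians' majority vote.

```python
def prec_at_n(ranked_patterns, relevant_set, n=30):
    """Precision at n: fraction of the top-n ranked patterns judged relevant."""
    top = ranked_patterns[:n]
    hits = sum(1 for pattern in top if pattern in relevant_set)
    return hits / n

# e.g., prec@30 for one seed pattern, with hypothetical judged patterns:
# score = prec_at_n(ranked, {"lost parents", "failed exam"}, n=30)
```
</Paragraph>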
<Paragraph position="4"> The results are presented in Figure 4. The precision of RF(10)+pseudo is inferior to that of RF(20) before the 25th iteration, but after the 30th iteration RF(10)+pseudo achieves higher precision than the other methods. This indicates that pseudo-relevance feedback can also contribute to semantic pattern induction in the stage without expert intervention.</Paragraph>
</Section>
<Section position="2" start_page="949" end_page="951" type="sub_section">
<SectionTitle> 5.2 Coverage on Real Data </SectionTitle>
<Paragraph position="0"> The final results of the semantic patterns are the relevant sets of the last iteration produced by RF(10)+pseudo, denoted as $SP_{CIP}$. We compare $SP_{CIP}$ to the patterns created by a corpus-based approach, which relies on an annotated domain corpus and a learning mechanism to induce the semantic patterns. Thus, we collected 300 consultation records from PsychPark as the domain corpus, and the three physicians annotated each sentence in the corpus as containing a negative life event or not. After the annotation process, the sentences with negative life events together form the training set. We then adopt mutual information (Manning and Schutze, 1999) to learn variable-length semantic patterns. The mutual information between k words is defined as

$MI(w_1, \ldots, w_k) = \log \frac{P(w_1 w_2 \cdots w_k)}{P(w_1) P(w_2) \cdots P(w_k)},$

where $P(w_i)$ is the probability of a single word occurring in the training set and $P(w_1 w_2 \cdots w_k)$ is the probability of the k words co-occurring. Higher mutual information indicates that the k words are more likely to form a semantic pattern of length k. Here the length k also ranges from 2 to 4. For each k, we compute the mutual information for all possible combinations of words in the training set, and those with mutual information above a threshold are selected as the final semantic patterns, denoted as $SP_{MI}$. In order to obtain reliable mutual information values, only words with at least the minimum number of occurrences (>5) are considered.</Paragraph>
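<Paragraph position="1"> A minimal Python sketch of this mutual information scoring, assuming sentence-level frequency estimates for the probabilities (the estimator is not specified above); the names and defaults are illustrative.

```python
import math
from collections import Counter
from itertools import combinations

def mi_patterns(sentences, k=2, min_count=6, threshold=0.0):
    """Score all k-word combinations by the mutual information defined above.

    sentences: list of tokenized training sentences containing life events.
    Word and co-occurrence probabilities are estimated from sentence counts;
    min_count=6 enforces the >5 minimum-occurrence filter.
    """
    n = len(sentences)
    word_count = Counter()
    combo_count = Counter()
    for tokens in sentences:
        words = set(tokens)
        word_count.update(words)
        for combo in combinations(sorted(words), k):
            combo_count[combo] += 1

    patterns = {}
    for combo, c in combo_count.items():
        if any(word_count[w] < min_count for w in combo):
            continue  # skip unreliable, infrequent words
        p_joint = c / n
        p_indep = 1.0
        for w in combo:
            p_indep *= word_count[w] / n
        mi = math.log(p_joint / p_indep)
        if mi > threshold:
            patterns[combo] = mi
    return patterns
```
</Paragraph>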
<Paragraph position="2"> To examine the coverage of $SP_{CIP}$ and $SP_{MI}$ on real data, 15 human subjects were involved in creating a test set. The subjects provided negative life events they had experienced, in the form of natural language sentences. A total of 69 sentences were collected for the test set, of which 39 sentences contain a semantic pattern of length two, 21 sentences contain a semantic pattern of length three, and 9 sentences contain a semantic pattern of length four. The evaluation metric used is the out-of-pattern (OOP) rate, the ratio of unseen patterns occurring in the test set: the number of test sentences whose semantic patterns are not covered by the induced pattern set, divided by the total number of sentences in the test set. Table 4 presents the results.</Paragraph>
<Paragraph position="3"> The results show that the OOP rate of $SP_{MI}$ suffers from the lack of a large enough domain corpus with annotated life events. In this circumstance, many semantic patterns, especially those of larger length, could not be learned, because their occurrences would be very rare in the training set.</Paragraph>
<Paragraph position="4"> Undoubtedly, one could collect a larger domain corpus to reduce the OOP rate. However, increasing the size of the domain corpus also increases the annotation effort and the computational complexity. Our approach instead exploits quality concepts to reduce the search space and applies relevance feedback to guide the induction process; thus it can achieve better results under time-limited constraints.</Paragraph>
</Section>
</Section>
</Paper>