<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2121">
  <Title>HAL-based Cascaded Model for Variable-Length Semantic Pattern Induction from Psychiatry Web Resources</Title>
  <Section position="4" start_page="945" end_page="946" type="intro">
    <SectionTitle>
2 Framework for Variable-Length Semantic Pattern Induction
</SectionTitle>
    <Paragraph position="0"> The overall framework, illustrated in Figure 1, is divided into two parts: the HAL model and the cascaded induction process (CIP). First, the HAL space is constructed from the psychiatry web corpora after word segmentation. Then, each word in the HAL space is evaluated by computing its distance to a given seed pattern.</Paragraph>
    <Paragraph position="1"> A smaller distance indicates that the word is more semantically related to the seed pattern.</Paragraph>
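As a rough illustration of the first part of the framework, the following minimal sketch builds a HAL-style space and ranks words by their distance to a seed pattern. It relies on assumptions not stated in this section: the window size, the linear weighting scheme, the averaging of word vectors to represent a pattern, and the use of Euclidean distance are all illustrative choices, and the function names (build_hal, word_vector, pattern_vector, distance) are hypothetical.

```python
import math
from collections import defaultdict


def build_hal(tokens, window=5):
    """Build a HAL-style weighted co-occurrence table.

    hal[target][context] accumulates a weight for every context word that
    precedes the target inside the window; nearer words get larger weights.
    """
    hal = defaultdict(lambda: defaultdict(float))
    for i, target in enumerate(tokens):
        for d in range(1, window + 1):
            j = i - d
            if j >= 0:
                # Linear weighting (assumption): window, window - 1, ..., 1
                hal[target][tokens[j]] += window - d + 1
    return hal


def word_vector(hal, word, vocab):
    """Concatenate a word's row (preceding context) and column (following context)."""
    row = [hal.get(word, {}).get(c, 0.0) for c in vocab]
    col = [hal.get(c, {}).get(word, 0.0) for c in vocab]
    vec = row + col
    total = sum(vec)
    # Normalise so that vectors behave like context distributions.
    return [v / total for v in vec] if total > 0 else vec


def pattern_vector(hal, words, vocab):
    """Represent a pattern as the mean of its word vectors (assumption)."""
    vectors = [word_vector(hal, w, vocab) for w in words]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def distance(u, v):
    """Euclidean distance, used here only as a stand-in distance measure."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy corpus, already segmented into words (hypothetical data).
tokens = "i feel sad and i feel hopeless every night".split()
hal = build_hal(tokens, window=3)
vocab = sorted(set(tokens))
seed = pattern_vector(hal, ["feel", "sad"], vocab)

# Words with the smallest distance to the seed pattern come first.
ranked = sorted(vocab, key=lambda w: distance(word_vector(hal, w, vocab), seed))
```

The words at the front of ranked would play the role of candidate quality concepts in the subsequent induction process.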
    <Paragraph position="2"> According to the distance measure, the CIP generates quality concepts, i.e., a set of words semantically related to the seed pattern. The quality concepts and the better semantic patterns induced in the previous stage are combined to generate the initial set for each length. For example, in the first stage, i.e., length two, the initial set consists of all possible combinations of two quality concepts. In later stages, each initial set is generated by adding a quality concept to each of the better semantic patterns. After the initial set for a particular length is created, each semantic pattern and the seed pattern are represented in the HAL space so that their distance can be computed. The more similar the context distributions of two patterns, the closer they are. Once all the semantic patterns have been evaluated, relevance feedback is applied to obtain a set of relevant patterns judged by health professionals.</Paragraph>
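The generation of the initial set for each length can be sketched as follows. This is only an illustration of the combination step described above: the function name initial_set is hypothetical, the example words are invented, and two details are assumptions, namely that patterns are treated as unordered (so duplicates are merged) and that a concept is not repeated within a pattern.

```python
from itertools import combinations


def initial_set(length, quality_concepts, better_patterns=()):
    """Generate the candidate patterns for one stage of the cascade.

    Length two: all combinations of two quality concepts.
    Longer lengths: each better pattern from the previous stage,
    extended by one quality concept.
    """
    if length == 2:
        return list(combinations(sorted(quality_concepts), 2))

    seen = {}
    for pattern in better_patterns:
        for concept in quality_concepts:
            # Assumption: a concept appears at most once in a pattern.
            if concept not in pattern:
                # Sorting makes (a, b) + c and (a, c) + b the same candidate.
                key = tuple(sorted(tuple(pattern) + (concept,)))
                seen.setdefault(key, None)
    return list(seen)


# Hypothetical quality concepts and better patterns.
concepts = ["boss", "conflict", "pressure", "work"]
stage_two = initial_set(2, concepts)
stage_three = initial_set(3, concepts, better_patterns=[("boss", "conflict")])
```

Each candidate in the returned list would then be represented in the HAL space and compared with the seed pattern.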
    <Paragraph position="3"> Based on this relevance information, the seed pattern is refined to be more similar to the relevant set. The refined seed pattern is then used as the reference in the next iteration. The induction process for each stage is performed iteratively until no more patterns are judged relevant or a maximum number of iterations is reached. The relevant set produced in the last iteration is taken as the final set of semantic patterns.</Paragraph>
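The iterative refinement for one stage might look like the sketch below. The section does not give the refinement formula, so this example assumes a simple interpolation of the seed vector toward the centroid of the relevant set; the parameters alpha and top_k, the judge callback standing in for the health professionals, and all function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def refine_seed(seed, relevant_vectors, alpha=0.5):
    """Move the seed vector toward the centroid of the relevant set (assumption)."""
    target = centroid(relevant_vectors)
    return [(1 - alpha) * s + alpha * t for s, t in zip(seed, target)]


def induce(seed, candidates, judge, top_k=3, max_iterations=10):
    """Run relevance feedback for one stage.

    candidates maps a pattern name to its vector in the HAL space.
    judge(name) returns True if the pattern is judged relevant.
    Stops when nothing is judged relevant or max_iterations is reached.
    """
    relevant = []
    for _ in range(max_iterations):
        # Rank candidates by distance to the current seed, closest first.
        ranked = sorted(candidates, key=lambda name: math.dist(candidates[name], seed))
        judged = [name for name in ranked[:top_k] if judge(name)]
        if not judged:
            break
        relevant = judged
        seed = refine_seed(seed, [candidates[name] for name in judged])
    return relevant, seed


# Toy example with hypothetical vectors and a fixed relevance judgment.
candidates = {"p1": [1.0, 0.0], "p2": [0.8, 0.2], "p3": [0.0, 1.0]}
relevant, refined = induce([0.5, 0.5], candidates, judge=lambda name: name != "p3")
```

In this sketch the relevant set returned is the one produced in the last iteration that yielded any relevant pattern, which is one possible reading of the stopping rule.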
  </Section>
</Paper>