<?xml version="1.0" standalone="yes"?>
<Paper uid="X96-1057">
  <Title>NTT DATA: DESCRIPTION OF THE ERIE SYSTEM USED FOR MUC-6</Title>
  <Section position="3" start_page="0" end_page="469" type="intro">
    <SectionTitle>
3 PATTERNS
</SectionTitle>
    <Paragraph position="0"> This section describes the three types of patterns introduced in the previous section.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Dictionary patterns
</SectionTitle>
      <Paragraph position="0"> Majesty tags each word with a part of speech, such as noun or noun-suffix, which serves as the major category of the word.</Paragraph>
      <Paragraph position="1"> Then the dictionary pattern is used to add a sub-category to the word. The sub-category, for example ORGANIZATION, is defined on the left side of the pattern. The words to which the sub-category is added are listed on the right side of the pattern. Figure 2 shows an example of a dictionary pattern. The words &quot;社&quot; (a corporation) and &quot;省&quot; (a government ministry) are tagged as noun-suffixes (SUFFIX) by Majesty, and the dictionary pattern augments these tags by adding ORGANIZATION as their sub-category.</Paragraph>
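      <Paragraph> The effect of a dictionary pattern can be sketched in Python as follows. This is only an illustration under stated assumptions, not Erie's implementation: the Token type, the dictionary encoding of the pattern, and the function name are hypothetical, and the Japanese words are illustrative choices that fit the glosses given above.
<![CDATA[
```python
from typing import NamedTuple, Optional

class Token(NamedTuple):
    surface: str                        # word as delimited by the tagger
    category: str                       # major category, e.g. "SUFFIX"
    sub_category: Optional[str] = None  # added later by a dictionary pattern

# Hypothetical encoding of one dictionary pattern:
# key = sub-category (left side), value = words that receive it (right side).
DICTIONARY_PATTERNS = {"ORGANIZATION": ["社", "省"]}

def apply_dictionary_patterns(tokens, patterns):
    """Return a copy of tokens with sub-categories added for listed words."""
    lookup = {word: sub for sub, words in patterns.items() for word in words}
    return [
        tok._replace(sub_category=lookup.get(tok.surface, tok.sub_category))
        for tok in tokens
    ]

# Illustrative tagger output: a noun followed by a noun-suffix.
tagged = [Token("大蔵", "NOUN"), Token("省", "SUFFIX")]
augmented = apply_dictionary_patterns(tagged, DICTIONARY_PATTERNS)
```
]]>
 In this sketch the major category assigned by the tagger is left untouched and only the sub-category field is filled in, which mirrors the augmentation described above.</Paragraph>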
      <Paragraph position="3"/>
    </Section>
    <Section position="2" start_page="0" end_page="469" type="sub_section">
      <SectionTitle>
3.2 Segmentation patterns
</SectionTitle>
      <Paragraph position="0"> The segmentation pattern is used to further segment a word whose boundaries were determined by Majesty. The word to be segmented is written on the left side of the pattern. The newly segmented words and their parts of speech are defined on the right side of the pattern. Matching conditions for the word can be given in parentheses. These conditions can involve the part of speech of the word, the word preceding or following it, or the word length. The character '_' is a wild card that matches any number of characters within the word.</Paragraph>
      <Paragraph position="1"> Figure 3 shows an example of the segmentation patterns. The first pattern divides the word &quot;日米&quot; (Japan and the U.S.) into &quot;日&quot; (Japan) and &quot;米&quot; (the U.S.), and gives each word a NOUN-PLACE tag as its part of speech. The second pattern divides a word whose last character is &quot;相&quot; (a government minister) into &quot;相&quot; and the rest of the word, if the word consists of more than three characters.</Paragraph>
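      <Paragraph> The two kinds of segmentation pattern described above can be sketched in Python as follows. This is a minimal illustration, not Erie's implementation: the rule encoding, the function names, and the NOUN and SUFFIX tags in the second rule are hypothetical, the Japanese words are illustrative choices that fit the glosses, the wild card is assumed to match zero or more characters, and only a word-length condition is modeled.
<![CDATA[
```python
import re

def compile_left_side(pattern):
    """Turn a left-side pattern into a regex; '_' captures any run of characters."""
    parts = [re.escape(p) for p in pattern.split("_")]
    return re.compile("(.*)".join(parts))

def segment(word, rules):
    """Apply the first rule whose left side and condition both match the word.

    Each rule is (left_side, condition, right_side). The right side is a list
    of (text, part_of_speech) pairs in which the text '_' stands for whatever
    the wild card matched (at most one wild card is supported in this sketch).
    A word that matches no rule is returned unsegmented with no new tag.
    """
    for left, condition, right in rules:
        m = compile_left_side(left).fullmatch(word)
        if m and condition(word):
            wild = m.group(1) if m.groups() else ""
            return [(wild if text == "_" else text, pos) for text, pos in right]
    return [(word, None)]

RULES = [
    # A fixed word split into two place names.
    ("日米", lambda w: True, [("日", "NOUN-PLACE"), ("米", "NOUN-PLACE")]),
    # Any word longer than three characters that ends in the minister suffix.
    ("_相", lambda w: len(w) > 3, [("_", "NOUN"), ("相", "SUFFIX")]),
]

place_names = segment("日米", RULES)
minister = segment("農林水産相", RULES)
too_short = segment("首相", RULES)
```
]]>
 Here the two-character word ending in the suffix matches the wild-card rule but fails the length condition, so it is left as a single word.</Paragraph>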
      <Paragraph position="3"/>
    </Section>
    <Section position="3" start_page="469" end_page="469" type="sub_section">
      <SectionTitle>
3.3 Name recognition patterns
</SectionTitle>
      <Paragraph position="0"> The name recognition patterns recognize proper names, times, and numeric expressions that appear in the text. A pattern name is written on the left side of the pattern, and the word sequence to be searched for is defined on the right side. A defined pattern can be referred to from other patterns by using the character '$' followed by the pattern name. A pattern can be any combination of words, their parts of speech, character types, and pattern names. Regular-expression operators such as '*' and '+' can also be used in the pattern. A pair of angle brackets on the right side of the pattern marks the first and last words of the identified name or expression. Figure 4 shows an example of a name recognition pattern that identifies a person's name.</Paragraph>
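      <Paragraph> The mechanism described above (named patterns, references with '$', repetition operators, and angle brackets that delimit the recognized span) can be sketched in Python as follows. The element syntax used here ("pos:TAG" for a part of speech, a bare string for a literal word), the pattern names, the tags, and the English example tokens are all hypothetical and do not reproduce the syntax of Figure 4; character types are not modeled, referenced patterns are assumed to contain no angle brackets, and words and tags are assumed to contain neither '/' nor spaces.
<![CDATA[
```python
import re

def element_to_regex(element, definitions):
    """Translate one pattern element into a regex over 'surface/POS ' tokens."""
    if element == "<":                      # start of the recognized span
        return "(?P<name>"
    if element == ">":                      # end of the recognized span
        return ")"
    repeat = ""
    if len(element) > 1 and element[-1] in "*+":
        element, repeat = element[:-1], element[-1]
    if element.startswith("$"):             # reference to another pattern
        body = to_regex(definitions[element[1:]], definitions)
    elif element.startswith("pos:"):        # any word with this part of speech
        body = r"[^/ ]+/" + re.escape(element[4:]) + " "
    else:                                   # a literal word, any part of speech
        body = re.escape(element) + r"/[^/ ]+ "
    return "(?:" + body + ")" + repeat

def to_regex(elements, definitions):
    return "".join(element_to_regex(e, definitions) for e in elements)

def find_names(tokens, pattern_name, definitions):
    """Return the words inside the angle brackets for every match of a pattern."""
    text = "".join(f"{surface}/{pos} " for surface, pos in tokens)
    body = to_regex(definitions[pattern_name], definitions)
    regex = re.compile(r"(?:(?<= )|^)" + body)   # start only at a token boundary
    return [
        [tok.split("/")[0] for tok in m.group("name").split()]
        for m in regex.finditer(text)
    ]

DEFINITIONS = {
    "HONORIFIC": ["pos:TITLE"],
    "PERSON": ["$HONORIFIC", "<", "pos:NAME+", ">"],
}

tokens = [("President", "TITLE"), ("John", "NAME"),
          ("Smith", "NAME"), ("said", "VERB")]
people = find_names(tokens, "PERSON", DEFINITIONS)
```
]]>
 In this sketch the title is required context for a match but lies outside the angle brackets, so only the name words are returned.</Paragraph>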
      <Paragraph position="1"> Erie's pattern matching engine processes the patterns in the order in which they are defined. The first pattern that matches is chosen for the string currently being processed. Thus, pattern developers must pay special attention to the order of the patterns.</Paragraph>
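      <Paragraph> Why the order of definition matters under a first-match policy can be illustrated with a small Python sketch. The pattern names and regular expressions are hypothetical and stand in for Erie's patterns; only the selection policy is taken from the description above.
<![CDATA[
```python
import re

def first_match(text, patterns):
    """Return the name of the first pattern, in definition order, that matches."""
    for name, regex in patterns:
        if re.fullmatch(regex, text):
            return name
    return None

# A specific pattern placed before a general one that also matches.
specific_first = [("YEAR", r"(19|20)\d\d"), ("NUMBER", r"\d+")]
# The same two patterns in the opposite order.
general_first = list(reversed(specific_first))

chosen_when_specific_first = first_match("1996", specific_first)
chosen_when_general_first = first_match("1996", general_first)
```
]]>
 With the specific pattern first the string is labeled YEAR; with the general pattern first the same string is labeled NUMBER, and the specific pattern is never reached for it.</Paragraph>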
    </Section>
  </Section>
</Paper>