<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1068">
<Title>Japanese Morphological Analyzer using Word Co-occurrence -- JTAG- Takeshi FUCHI NTT Information and Communication Systems Laboratories</Title>
<Section position="3" start_page="409" end_page="410" type="metho">
<SectionTitle> 2 Overview of our system </SectionTitle>
<Paragraph position="0"> We developed a Japanese morphological analyzer, JTAG, with an emphasis on a simple algorithm, straightforward adjustment, and a flexible grammar.</Paragraph>
<Paragraph position="1"> The features of JTAG are as follows. * An attribute value is an atom.</Paragraph>
<Paragraph position="2"> In our system, each word has several attribute values. An attribute value is restricted to have no internal structure. Assigning an attribute value to words is equivalent to naming those words as a group. * New attribute values can be introduced easily. An attribute value is a simple character string. When a new attribute value is required, the user writes a new string in the attribute field of a record in a dictionary.</Paragraph>
<Paragraph position="3"> * The number of attribute values is unlimited. * A part-of-speech is a kind of attribute value. * The grammar is a set of connection rules.</Paragraph>
<Paragraph position="4"> The grammar is implemented as connection rules between attribute values. List 1 is an example. Each connection rule is written on one line. The fields are separated by commas. Attribute values of the word on the left are written in the first field. Attribute values of the word on the right are written in the second field. The cost of the rule is written in the last field. Attribute values are separated by colons. A minus sign '-' means negation.</Paragraph>
<Paragraph position="5"> For example, the first rule shows that a word with 'Noun' can be followed by a word with [...] The second rule shows that a word with 'Noun' and 'Name' can be followed by a word with 'Postfix' and 'Noun'. The cost is 100.
The third rule shows that a word that has 'Noun' and does not have 'Name' can be followed by a word with 'Postfix' and 'Noun'. The cost is 90.</Paragraph>
<Paragraph position="6"> Only the word 'で' has the combination of 'Copula' and 'de', so the fourth rule is specific to that word. * The co-occurrence of words.</Paragraph>
<Paragraph position="7"> In our system, the word sequence that contains the largest number of co-occurring words is selected. Table 1 shows examples of records in a dictionary.</Paragraph>
<Paragraph position="8"> '額' means 'amount', 'frame', 'forehead' or a human name 'Gaku'. In the co-occurrence field, words are listed directly. If a sentence that includes '額' contains none of the co-occurrence words, 'amount' is selected because its cost is the smallest. If the word for 'picture' is in the sentence, 'frame' is selected.</Paragraph>
</Section>
<Section position="4" start_page="410" end_page="410" type="metho">
<SectionTitle> * Selection Algorithm </SectionTitle>
<Paragraph position="0"> JTAG selects the correct sequence of words using the connective cost, the number of co-occurrences, the priority of words, and the length of words. The precise description of the algorithm is given in the Appendix.</Paragraph>
<Paragraph position="1"> This algorithm is too simple to analyze Japanese sentences perfectly. However, it is sufficient in practice.</Paragraph>
<Paragraph position="2"> [...] sequence is done by the co-occurrence of words.</Paragraph>
</Section>
</Paper>
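<!--
Editorial sketch of the connection-rule format described in Section 2. It is not the JTAG implementation. The rule strings are reconstructed from the prose (comma-separated fields, colon-separated attribute values, '-' for negation, cost in the last field), since List 1 itself is not reproduced here. The names ConnectionRule, parse_rule, satisfies and connection_cost are hypothetical, and taking the cheapest matching rule is an assumption, not something the text states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionRule:
    left: tuple   # (attribute, negated) pairs required of the left word
    right: tuple  # (attribute, negated) pairs required of the right word
    cost: int


def parse_conditions(field):
    """Split a field such as 'Noun:-Name' into (attribute, negated) pairs."""
    conditions = []
    for token in field.split(":"):
        token = token.strip()
        negated = token.startswith("-")  # a leading minus sign means negation
        conditions.append((token.lstrip("-"), negated))
    return tuple(conditions)


def parse_rule(line):
    """Parse one rule line of the form 'left,right,cost'."""
    left, right, cost = (part.strip() for part in line.split(","))
    return ConnectionRule(parse_conditions(left), parse_conditions(right), int(cost))


def satisfies(attributes, conditions):
    """True if the word has every plain attribute and none of the negated ones."""
    return all((name in attributes) != negated for name, negated in conditions)


def connection_cost(rules, left_attrs, right_attrs):
    """Return the cheapest matching rule cost (an assumption), or None if no rule matches."""
    costs = [
        rule.cost
        for rule in rules
        if satisfies(left_attrs, rule.left) and satisfies(right_attrs, rule.right)
    ]
    return min(costs) if costs else None


# The second and third rules described in the text, in the assumed syntax.
rules = [
    parse_rule("Noun:Name,Postfix:Noun,100"),
    parse_rule("Noun:-Name,Postfix:Noun,90"),
]
```

With these two rules, a left word carrying both 'Noun' and 'Name' matches only the first rule (cost 100), while a left word carrying 'Noun' without 'Name' matches only the second (cost 90), which mirrors the explanation in Paragraph 5.
-->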