<?xml version="1.0" standalone="yes"?> <Paper uid="A92-1018"> <Title>A Practical Part-of-Speech Tagger</Title> <Section position="7" start_page="137" end_page="138" type="concl"> <SectionTitle> 6 Applications </SectionTitle> <Paragraph position="0"> We have used the tagger in a number of applications. We describe three of them here: phrase recognition, word sense disambiguation, and grammatical function assignment. These projects are part of a research effort to use shallow analysis techniques to extract content from unrestricted text.</Paragraph> <Section position="1" start_page="137" end_page="138" type="sub_section"> <SectionTitle> 6.1 Phrase Recognition </SectionTitle> <Paragraph position="0"> We have constructed a system that recognizes simple phrases when given as input the sequence of tags for a sentence. There are recognizers for noun phrases, verb groups, adverbial phrases, and prepositional phrases. Each of these phrases comprises a contiguous sequence of tags that satisfies</Paragraph> <Paragraph position="1"> a simple grammar. For example, a noun phrase can be a unary sequence containing a pronoun tag or an arbitrarily</Paragraph> <Paragraph position="2"> long sequence of noun and adjective tags, possibly preceded</Paragraph> <Paragraph position="3"> by a determiner tag and possibly with an embedded possessive marker. The longest possible sequence is found (e.g., &quot;the program committee&quot; but not &quot;the program&quot;). Conjunctions are not recognized as part of any phrase; for example, in the fragment &quot;the cats and dogs,&quot; &quot;the cats&quot; and &quot;dogs&quot; are recognized as two separate noun phrases. Prepositional phrase attachment is not performed at this stage of processing. 
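The noun phrase grammar and longest-match rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' recognizer: it uses a hypothetical coarse tag set (DET, ADJ, NOUN, PRON, and POSS for the possessive marker) instead of the tagger's real tags, and the function names longest_np and find_noun_phrases are invented for this example.

```python
from typing import List, Tuple

# Hypothetical coarse tag names; POSS marks the possessive ('s).
DET, ADJ, NOUN, PRON, POSS = "DET", "ADJ", "NOUN", "PRON", "POSS"
NOMINAL = {ADJ, NOUN}


def longest_np(tags: List[str], start: int) -> int:
    """Return the end (exclusive) of the longest noun phrase beginning at
    `start`, or `start` itself if no noun phrase begins there."""
    n = len(tags)
    if start >= n:
        return start
    if tags[start] == PRON:
        return start + 1  # a lone pronoun tag is a complete noun phrase
    i = start + 1 if tags[start] == DET else start  # optional determiner
    end = start  # end of the longest well-formed phrase seen so far
    while i != n:
        if tags[i] in NOMINAL:
            i += 1
            end = i  # a phrase may end after any noun or adjective
        elif tags[i] == POSS and end == i and end > start:
            i += 1  # possessive embedded directly after a noun or adjective
        else:
            break
    return end


def find_noun_phrases(tags: List[str]) -> List[Tuple[int, int]]:
    """Scan left to right, taking the longest noun phrase at each position;
    tags outside any phrase (conjunctions, verbs, ...) are skipped."""
    phrases = []
    i = 0
    while i != len(tags):
        end = longest_np(tags, i)
        if end == i:
            i += 1  # no noun phrase starts here
        else:
            phrases.append((i, end))
            i = end
    return phrases


print(find_noun_phrases(["DET", "NOUN", "CONJ", "NOUN"]))  # [(0, 2), (3, 4)]
```

On the tags for "the cats and dogs" (DET NOUN CONJ NOUN) the sketch skips the conjunction and returns two spans, matching the behavior described in the text. A trailing possessive with no noun after it is left out of the phrase, so only well-formed spans are reported.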
In some cases this approach captures only part of a phrase; however, it minimizes false positives, so the recognizers' results can be relied on.</Paragraph> </Section> <Section position="2" start_page="138" end_page="138" type="sub_section"> <SectionTitle> 6.2 Word Sense Disambiguation </SectionTitle> <Paragraph position="0"> Part-of-speech tagging is in itself a useful tool for lexical disambiguation; for example, knowing that &quot;dig&quot; is being used as a noun rather than as a verb indicates the word's appropriate meaning. But many words have multiple meanings even within a single part of speech.</Paragraph> <Paragraph position="1"> To handle such cases, the tagger has been used in the implementation of an experimental noun homograph disambiguation algorithm [Hearst, 1991]. The algorithm (known as CatchWord) performs supervised training over a large text corpus, gathering lexical, orthographic, and simple syntactic evidence for each sense of the ambiguous noun. After a period of training, CatchWord classifies new instances of the noun by checking their context against that of previously observed instances and choosing the sense for which the most evidence is found. 
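The train-then-classify scheme described for CatchWord can be illustrated with a much simplified sketch. It is not Hearst's algorithm: it assumes the only evidence is lexical co-occurrence within a fixed window of WINDOW tokens (a hypothetical parameter), ignores the orthographic and syntactic evidence, and uses the hypothetical tag name NOUN_TAG to stand in for the tagger's noun tags when filtering out non-noun usages.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# One training example: (tokens, tags, index of the target noun, sense label).
Example = Tuple[List[str], List[str], int, str]

WINDOW = 3  # hypothetical: number of context tokens considered on each side
NOUN_TAG = "NOUN"  # hypothetical stand-in for the tagger's noun tags


def context_words(tokens: List[str], idx: int) -> List[str]:
    """Lower-cased words within WINDOW tokens of the target, excluding the target."""
    lo = max(0, idx - WINDOW)
    hi = min(len(tokens), idx + WINDOW + 1)
    return [tokens[j].lower() for j in range(lo, hi) if j != idx]


def train(examples: List[Example]) -> Dict[str, Counter]:
    """Accumulate, for each sense, counts of the words seen around its noun uses."""
    evidence: Dict[str, Counter] = defaultdict(Counter)
    for tokens, tags, idx, sense in examples:
        if tags[idx] == NOUN_TAG:  # tagger output filters out non-noun usages
            evidence[sense].update(context_words(tokens, idx))
    return evidence


def classify(evidence: Dict[str, Counter], tokens: List[str],
             tags: List[str], idx: int) -> Optional[str]:
    """Return the sense with the most accumulated evidence, or None if the
    target is not tagged as a noun or no senses have been trained."""
    if tags[idx] != NOUN_TAG or not evidence:
        return None
    words = context_words(tokens, idx)
    scores = {sense: sum(counts[w] for w in words)
              for sense, counts in evidence.items()}
    return max(scores, key=scores.get)


model = train([
    ("the crane lifted the steel beam".split(),
     ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"], 1, "machine"),
    ("a crane waded in the marsh".split(),
     ["DET", "NOUN", "VERB", "PREP", "DET", "NOUN"], 1, "bird"),
])
print(classify(model, "the crane waded in water".split(),
               ["DET", "NOUN", "VERB", "PREP", "NOUN"], 1))  # bird
```

The sketch picks the sense whose stored context words overlap most with the new context. Raw counts like these favor frequent senses and common words such as "the", so a practical system would weight its evidence rather than simply summing it.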
Because CatchWord's sense distinctions are coarse, disambiguation can be accomplished without the expense of knowledge bases or inference mechanisms.</Paragraph> <Paragraph position="2"> Initial tests yielded accuracies of around 90% for nouns with strongly distinct senses.</Paragraph> <Paragraph position="3"> CatchWord uses the tagger in two ways: (i) to determine the part of speech of the target word (filtering out non-noun usages) and (ii) as a step in the phrase recognition analysis of the context surrounding the noun.</Paragraph> </Section> <Section position="3" start_page="138" end_page="138" type="sub_section"> <SectionTitle> 6.3 Grammatical Function Assignment </SectionTitle> <Paragraph position="0"> The phrase recognizers also provide input to a system, Sopa [Sibun, 1991], that recognizes nominal arguments of verbs: Subjects, Objects, and Predicative arguments. Sopa does not rely on information specific to the particular verbs involved (such as arity or voice). The first step in assigning grammatical functions is to partition the tag sequence of each sentence into phrases. The phrase types include those mentioned in Section 6.1, additional types for conjunctions, complementizers, and indicators of sentence boundaries, and an &quot;unknown&quot; type. After a sentence has been partitioned, each simple noun phrase is examined in the context of the phrase to its left and the phrase to its right. On the basis of this local context and a set of rules, the noun phrase is labeled a syntactic Subject, Object, or Predicative, or is left unlabeled. A label of Predicative is assigned only if it can be determined that the governing verb group is a form of a predicating verb (e.g., a form of &quot;be&quot;). Because this cannot always be determined, some Predicatives are labeled Objects. If a noun phrase is labeled, it is also annotated with whether the governing verb is the closest verb group to its right or to its left. 
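The local-context labeling step can be sketched as below. The rules here are hypothetical stand-ins, since the text does not list Sopa's actual rule set: a noun phrase directly after a verb group is an Object (or a Predicative if the verb group ends in a form of "be"), a noun phrase directly before a verb group is a Subject, and anything else is left unlabeled. The phrase type names (NP, VG, BOUNDARY) and the function names are likewise invented for illustration.

```python
from typing import List, Optional, Tuple

# A partitioned sentence is a list of (phrase_type, words) pairs.
# The type names NP, VG (verb group) and BOUNDARY are hypothetical.
Phrase = Tuple[str, str]
Label = Optional[Tuple[str, str]]  # (function, side of governing verb) or None

BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}
EDGE: Phrase = ("BOUNDARY", "")  # stands in for the sentence edge


def is_predicating(verb_group: str) -> bool:
    """Crude test: does the verb group end in a form of 'be'?"""
    words = verb_group.lower().split()
    return bool(words) and words[-1] in BE_FORMS


def assign_functions(phrases: List[Phrase]) -> List[Label]:
    """Label each noun phrase using only its left and right neighbors."""
    labels: List[Label] = []
    for k, (ptype, _words) in enumerate(phrases):
        left = phrases[k - 1] if k else EDGE
        right = phrases[k + 1] if k + 1 != len(phrases) else EDGE
        if ptype != "NP":
            labels.append(None)  # only noun phrases receive functions
        elif left[0] == "VG":
            # Verb group on the left: Predicative after 'be', else Object.
            func = "Predicative" if is_predicating(left[1]) else "Object"
            labels.append((func, "left"))
        elif right[0] == "VG":
            labels.append(("Subject", "right"))  # verb group on the right
        else:
            labels.append(None)  # no rule applies; leave unlabeled
    return labels


sentence = [("NP", "the committee"), ("VG", "approved"),
            ("NP", "the program"), ("BOUNDARY", ".")]
print(assign_functions(sentence))
# [('Subject', 'right'), None, ('Object', 'left'), None]
```

As in the text, the check for a predicating verb is limited, so in this sketch a Predicative governed by any verb other than a form of "be" would be labeled an Object.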
Sopa assigns grammatical functions with an accuracy of approximately 80%.</Paragraph> </Section> </Section> </Paper>