<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2182"> <Title>Noun-phrase co-occurrence statistics for semi-automatic semantic lexicon construction</Title> <Section position="3" start_page="0" end_page="1110" type="intro"> <SectionTitle> 2 Noun Co-Occurrence </SectionTitle> <Paragraph position="0"> The first question that must be answered in investigating this task is why one would expect it to work at all. Why would one expect that members of the same semantic category would co-occur in discourse? In the word sense disambiguation task, no such claim is made: words can serve their disambiguating purpose regardless of part-of-speech or semantic characteristics. In motivating their investigations, Riloff and Shepherd (henceforth R&amp;S) cited several very specific noun constructions in which co-occurrence between nouns of the same semantic class would be expected, including conjunctions (cars and trucks), lists (planes, trains, and automobiles), appositives (the plane, a twin-engined Cessna), and noun compounds (pickup truck).</Paragraph> <Paragraph position="1"> Our algorithm focuses exclusively on these constructions. Because the relationship between nouns in a compound is quite different from that between nouns in the other constructions, the algorithm consists of two separate components: one to deal with conjunctions, lists, and appositives; and the other to deal with noun compounds. All compound nouns in the former constructions are represented by the head of the compound. We made the simplifying assumptions that a compound noun is a string of consecutive nouns (or, in certain cases, adjectives; see discussion below), and that the head of the compound is the rightmost noun.</Paragraph> <Paragraph position="2"> To identify conjunctions, lists, and appositives, we first parsed the corpus using an efficient statistical parser (Charniak et al., 1998), trained on the Penn Wall Street Journal Treebank (Marcus et al., 1993). 
We defined co-occurrence in these constructions using the standard definitions of dominance and precedence. The relation is stipulated to be transitive, so that all head nouns in a list co-occur with each other (e.g., in the phrase planes, trains, and automobiles, all three nouns are counted as co-occurring with each other). Two head nouns co-occur in this algorithm if they meet the following four conditions: 1. they are both dominated by a common NP node; 2. no dominating S or VP nodes are dominated by that same NP node; 3. all head nouns that precede one precede the other; 4. there is a comma or conjunction that precedes one and not the other. In contrast, R&amp;S counted the closest noun to the left and the closest noun to the right of a head noun as co-occurring with it. Consider the following sentence from the MUC-4 (1992) corpus: &quot;A cargo aircraft may drop bombs and a truck may be equipped with artillery for war.&quot; In their algorithm, both cargo and bombs would be counted as co-occurring with aircraft. In our algorithm, co-occurrence is only counted within a noun phrase, between head nouns that are separated by a comma or conjunction. If the sentence had read: &quot;A cargo aircraft, fighter plane, or combat helicopter ...&quot;, then aircraft, plane, and helicopter would all have counted as co-occurring with each other in our algorithm.</Paragraph> </Section> </Paper>
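<!--
Editorial illustration, not part of the original paper. The sketch below shows
one way the counting described in this section could look on a single flat noun
phrase. It is a simplification under stated assumptions: the NP has already been
extracted by a parser, contains no embedded S, VP, or PP, and is given as
(word, tag) pairs with Penn Treebank style tags. The names NOUN_TAGS,
SEPARATOR_TAGS, head_nouns, and cooccurring_pairs are hypothetical. The head of
each compound is taken to be its rightmost noun, and all heads separated by
commas or conjunctions are paired with each other, mirroring the transitivity
stipulated in the text.

```python
from itertools import combinations

# Assumed Penn Treebank style tags (an assumption of this sketch).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
SEPARATOR_TAGS = {",", "CC"}  # commas and coordinating conjunctions


def head_nouns(np_tokens):
    """Return the head noun of each comma- or conjunction-separated segment.

    The head of a segment is its rightmost noun, following the simplifying
    assumption stated in the section.
    """
    heads = []
    current_head = None
    for word, tag in np_tokens:
        if tag in SEPARATOR_TAGS:
            # A separator closes the current segment, if it contained a noun.
            if current_head is not None:
                heads.append(current_head)
            current_head = None
        elif tag in NOUN_TAGS:
            # Later nouns overwrite earlier ones, so the rightmost noun wins.
            current_head = word.lower()
    if current_head is not None:
        heads.append(current_head)
    return heads


def cooccurring_pairs(np_tokens):
    """Return the set of unordered head-noun pairs that co-occur in the NP."""
    heads = head_nouns(np_tokens)
    return {
        tuple(sorted(pair))
        for pair in combinations(heads, 2)
        if pair[0] != pair[1]
    }


if __name__ == "__main__":
    # Hand-tagged version of the example phrase from the text.
    example = [
        ("A", "DT"), ("cargo", "NN"), ("aircraft", "NN"), (",", ","),
        ("fighter", "NN"), ("plane", "NN"), (",", ","), ("or", "CC"),
        ("combat", "NN"), ("helicopter", "NN"),
    ]
    print(head_nouns(example))
    print(sorted(cooccurring_pairs(example)))
```

Within a single compound such as "cargo aircraft" no pair is produced, because
both nouns fall in the same segment and only the rightmost one is kept as head.
-->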