<?xml version="1.0" standalone="yes"?> <Paper uid="P99-1018"> <Title>Ordering Among Premodifiers</Title> <Section position="2" start_page="0" end_page="135" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Sequential ordering among premodifiers affects the fluency of text. For example, &quot;large foreign financial firms&quot; and &quot;zero-coupon global bonds&quot; are preferred, while &quot;foreign large financial firms&quot; and &quot;global zero-coupon bonds&quot; sound odd. Linguists have already noted how difficult it is to specify a consistent ordering of adjectives [Whorf 1956; Vendler 1968]. When a generator builds complex sentences by combining multiple clauses, several adjectives or nouns can end up modifying the same head noun. To keep the text fluent, a text generation system must order these modifiers the way domain experts do. For example, in the medical domain a patient's age precedes the patient's ethnicity and gender, as in &quot;50-year-old white female patient&quot;.</Paragraph> <Paragraph position="1"> Yet general lexicons such as WordNet [Miller et al. 1990] and COMLEX [Grishman et al. 1994] do not store such information.</Paragraph> <Paragraph position="2"> In this paper, we present automated techniques for determining, given two premodifiers A and B, the preferred order between them. Our methods rely on, and generalize from, empirical evidence obtained from large corpora, and we evaluate them objectively on such corpora. The work is informed and motivated by our practical need to order multiple premodifiers in the MAGIC system [Dalal et al. 1996]. 
MAGIC uses coordinated text, speech, and graphics to convey information about a patient's status after coronary bypass surgery; it generates concise but complex descriptions that frequently involve four or more premodifiers in the same noun phrase.</Paragraph> <Paragraph position="3"> To demonstrate that a significant portion of noun phrases have multiple premodifiers, we extracted all noun phrases (NPs, excluding pronouns) from a two-million-word corpus of medical discharge summaries and a 1.5-million-word Wall Street Journal (WSJ) corpus (see Section 4 for a more detailed description of the corpora). In the medical corpus, out of 612,718 NPs, 12% have multiple premodifiers and 6% have multiple premodifiers that are all adjectives.</Paragraph> <Paragraph position="4"> In the WSJ corpus, the percentages are somewhat lower: 8% and 2%, respectively. Taken together, these figures imply that roughly one in ten NPs contains multiple premodifiers, while roughly one in 25 contains multiple premodifiers that are all adjectives.</Paragraph> <Paragraph position="5"> Traditionally, linguists have studied the premodifier ordering problem using a class-based approach.</Paragraph> <Paragraph position="6"> Working from a corpus, they propose semantic classes such as color, size, or nationality, and specify a sequential order among the classes. However, it is not always clear how to map premodifiers to these classes, especially in domain-specific applications. This motivates exploring empirical, corpus-based alternatives, in which the order of A and B is determined either directly, from prior evidence of the pair in the corpus, or indirectly, through other words whose order relative to A and B has already been established. 
The corpus-based approach lacks the ontological knowledge used by linguists, but it draws on a much larger amount of direct evidence, provides answers for many more premodifier pairs, and is portable to different domains.</Paragraph> <Paragraph position="7"> In the next section, we briefly describe prior linguistic research on this topic. Sections 3 and 4 describe the methodology and corpus used in our analysis, and Section 5 presents the results of our experiments. In Section 6, we show how we incorporated our ordering results into a general text generation system. Finally, Section 7 discusses possible improvements to our current approach.</Paragraph> </Section> </Paper>
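The distinction between direct and indirect evidence for ordering two premodifiers A and B can be illustrated with a minimal Python sketch. It is not necessarily the method described in Section 3: it assumes a plain majority vote over pairwise counts for direct evidence and reachability in a precedence graph for indirect evidence. The function names (`count_pairs`, `direct_order`, `precedence_graph`, `reachable`, `order`) and the toy noun phrases are hypothetical illustrations, not data from the medical or WSJ corpora.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set, Tuple

Pair = Tuple[str, str]


def count_pairs(sequences: Iterable[List[str]]) -> Dict[Pair, int]:
    """Count how often premodifier a occurs before premodifier b in one NP."""
    counts: Dict[Pair, int] = defaultdict(int)
    for seq in sequences:
        # combinations() keeps the input order, so (a, b) means "a before b".
        for a, b in combinations(seq, 2):
            if a != b:
                counts[(a, b)] += 1
    return counts


def direct_order(counts: Dict[Pair, int], a: str, b: str) -> Optional[Pair]:
    """Direct evidence. Assumption: a plain majority vote, no significance test."""
    ab, ba = counts.get((a, b), 0), counts.get((b, a), 0)
    if ab > ba:
        return (a, b)
    if ba > ab:
        return (b, a)
    return None  # no direct evidence, or a tie


def precedence_graph(counts: Dict[Pair, int]) -> Dict[str, Set[str]]:
    """Edge a -> b for every pair whose direct evidence puts a first."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in counts:
        if direct_order(counts, a, b) == (a, b):
            graph[a].add(b)
    return graph


def reachable(graph: Dict[str, Set[str]], src: str, dst: str) -> bool:
    """Depth-first search: is there a chain src -> ... -> dst?"""
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def order(counts: Dict[Pair, int], a: str, b: str) -> Optional[Pair]:
    """Use direct evidence first, then indirect evidence via other words."""
    found = direct_order(counts, a, b)
    if found is not None:
        return found
    # Fall back to indirect evidence when direct evidence is missing or tied.
    graph = precedence_graph(counts)
    a_first, b_first = reachable(graph, a, b), reachable(graph, b, a)
    if a_first and not b_first:
        return (a, b)
    if b_first and not a_first:
        return (b, a)
    return None  # no evidence, or contradictory chains


# Hypothetical toy NPs (premodifiers only), not corpus data.
nps = [
    ["large", "foreign", "financial"],
    ["large", "foreign"],
    ["foreign", "financial"],
    ["50-year-old", "white"],
    ["white", "female"],
]
counts = count_pairs(nps)
print(order(counts, "foreign", "large"))       # ('large', 'foreign'), direct evidence
print(order(counts, "female", "50-year-old"))  # ('50-year-old', 'female'), via "white"
print(order(counts, "large", "white"))         # None, no evidence either way
```

Rebuilding the graph on every call keeps the sketch short; a real system would more likely compute the transitive closure once. Contradictory chains (A before B and B before A through different intermediate words) are reported as unresolved rather than guessed.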