<?xml version="1.0" standalone="yes"?>
<Paper uid="N01-1007">
  <Title>Unsupervised Learning of Name Structure From Coreference Data</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> We present two methods for the unsupervised learning of the structure of personal names as found in Wall Street Journal text. More specifically, we consider a \name&amp;quot; to be a sequence of proper nouns from a single noun-phrase (as indicated by Penn treebank-style parse trees). For example, \Defense Secretary George W. Smith&amp;quot; would be a name and we would analyze it into the components \Defense Secretary&amp;quot; (a descriptor), \George&amp;quot; (a rst name), \W.&amp;quot; (a middle name, we do not distinguish between initials and \true&amp;quot; names), and \Smith&amp;quot; (a last name). We consider two unsupervised models for learning this information. The rst simply uses a few implicit constraints governing this structure to gain a toehold on the problem  |e.g., descriptors come before rst names, which come This research was supported in part by NSF grant LIS SBR 9720368. The author would like to thank Mark Johnson and the rest of the Brown Laboratory for Linguistic Information Processing (BLLIP) for general advice and encouragement.</Paragraph>
    <Paragraph position="1"> before middle names, etc. We henceforth call this the \name&amp;quot; model. The second model also uses possible coreference information. Typically the same individual is mentioned several times in the same article (e.g., we might later encounter \Mr. Smith&amp;quot;), and the pattern of such references, and the mutual constraints among them, could very well help our unsupervised methods determine the correct structure. We call this the \coreference&amp;quot; model. We were attracted to this second model as it might o er a small example of how semantic information like coreference could help in learning structural information. null To the best of our knowledge there has not been any previous work on learning personal structure. We are aware of one previous case of unsupervised learning of lexical information from possible coreference, namely that of Ge et.</Paragraph>
    <Paragraph position="2"> al. [5] where possible pronoun coreference was used to learn the gender of nouns. In this case a program with an approximately 65% accuracy in determining the correct antecedent was used to collect information on pronouns and their possible antecedents. The gender of the pronoun was then used to suggest the gender of the noun-phrase that was proposed as the antecedent. The current work is quite di erent in both goal and methods, but similar in spirit.</Paragraph>
    <Paragraph position="3"> More generally this work is part of a growing body of work on learning language-related information from unlabeled corpora [1,2,3,8,9,10, 11].</Paragraph>
  </Section>
</Paper>