<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1055">
  <Title>Learning Accurate, Compact, and Interpretable Tree Annotation</Title>
  <Section position="5" start_page="436" end_page="439" type="metho">
    <SectionTitle>
3 Analysis
</SectionTitle>
    <Paragraph position="0"> So far, we have presented a split-merge method for learning to iteratively subcategorize basic symbols like NP and VP into automatically induced subsymbols (subcategories in the original sense of Chomsky (1965)). This approach gives parsing accuracies of up to 90.7% on the development set, substantially higher than previous symbol-splitting approaches, while starting from an extremely simple base grammar. However, in general, any automatic induction system is in danger of being entirely uninterpretable. In this section, we examine the learned grammars, discussing what is learned. We focus particularly on connections with the linguistically motivated annotations of Klein and Manning (2003), which we do generally recover.</Paragraph>
    <Paragraph position="1"> Inspecting a large grammar by hand is difficult, but fortunately, our baseline grammar has less than 100 nonterminal symbols, and even our most complicated grammar has only 1043 total (sub)symbols. It is there- null fore relatively straightforward to review the broad behavior of a grammar. In this section, we review a randomly-selected grammar after 4 SM cycles that produced an F1 score on the development set of 89.11. We feel it is reasonable to present only a single grammar because all the grammars are very similar. For example, after 4 SM cycles, the F1 scores of the 4 trained grammars have a variance of only 0.024, which is tiny compared to the deviation of 0.43 obtained by Matsuzaki et al. (2005)). Furthermore, these grammars allocate splits to nonterminals with a variance of only 0.32, so they agree to within a single latent state.</Paragraph>
    <Section position="1" start_page="437" end_page="438" type="sub_section">
      <SectionTitle>
3.1 Lexical Splits
</SectionTitle>
      <Paragraph position="0"> One of the original motivations for lexicalization of parsers is the fact that part-of-speech (POS) tags are usually far too general to encapsulate a word's syntactic behavior. In the limit, each word may well have its own unique syntactic behavior, especially when, as in modern parsers, semantic selectional preferences are lumped in with traditional syntactic trends. However, in practice, and given limited data, the relationship between specific words and their syntactic contexts may be best modeled at a level more fine than POS tag but less fine than lexical identity.</Paragraph>
      <Paragraph position="1"> In our model, POS tags are split just like any other grammar symbol: the subsymbols for several tags are shown in Table 1, along with their most frequent members. In most cases, the categories are recognizable as either classic subcategories or an interpretable division of some other kind.</Paragraph>
      <Paragraph position="2"> Nominal categories are the most heavily split (see Table 2), and have the splits which are most semantic in nature (though not without syntactic correlations).</Paragraph>
      <Paragraph position="3"> For example, plural common nouns (NNS) divide into the maximum number of categories (16). One category consists primarily of dates, whose typical parent is an NP subsymbol whose typical parent is a root S, essentially modeling the temporal noun annotation discussed in Klein and Manning (2003). Another category specializes in capitalized words, preferring as a parent an NP with an S parent (i.e. subject position).</Paragraph>
      <Paragraph position="4"> A third category specializes in monetary units, and so on. These kinds of syntactico-semantic categories are typical, and, given distributional clustering results like those of Schuetze (1998), unsurprising. The singular nouns are broadly similar, if slightly more homogenous, being dominated by categories for stocks and trading. The proper noun category (NNP, shown) also splits into the maximum 16 categories, including months, countries, variants of Co. and Inc., first names, last names, initials, and so on.</Paragraph>
      <Paragraph position="5"> Verbal categories are also heavily split. Verbal sub-categories sometimes reflect syntactic selectional preferences, sometimes reflect semantic selectional preferences, and sometimes reflect other aspects of verbal syntax. For example, the present tense third person verb subsymbols (VBZ) are shown. The auxiliaries get three clear categories: do, have, and be (this pattern repeats in other tenses), as well a fourth category for the ambiguous 's. Verbs of communication (says) and  our split-merge procedure after 6 SM cycles propositional attitudes (beleives) that tend to take inflected sentential complements dominate two classes, while control verbs (wants) fill out another.</Paragraph>
      <Paragraph position="6"> As an example of a less-split category, the superlative adjectives (JJS) are split into three categories, corresponding principally to most, least, and largest, with most frequent parents NP, QP, and ADVP, respectively. The relative adjectives (JJR) are split in the same way. Relative adverbs (RBR) are split into a different three categories, corresponding to (usually metaphorical) distance (further), degree (more), and time (earlier). Personal pronouns (PRP) are well-divided into three categories, roughly: nominative case, accusative case, and sentence-initial nominative case, which each correlate very strongly with syntactic position. As another example of a specific trend which was mentioned by Klein and Manning (2003), adverbs (RB) do contain splits for adverbs under ADVPs (also), NPs (only), and VPs (not).</Paragraph>
      <Paragraph position="7"> Functional categories generally show fewer splits, but those splits that they do exhibit are known to be strongly correlated with syntactic behavior. For example, determiners (DT) divide along several axes: definite (the), indefinite (a), demonstrative (this), quantificational (some), negative polarity (no, any), and various upper- and lower-case distinctions inside these types. Here, it is interesting to note that these distinctions emerge in a predictable order (see Figure 2 for DT splits), beginning with the distinction between demonstratives and non-demonstratives, with the other distinctions emerging subsequently; this echoes the result of Klein and Manning (2003), where the authors chose to distinguish the demonstrative constrast, but not the additional ones learned here.</Paragraph>
      <Paragraph position="8"> Another very important distinction, as shown in Klein and Manning (2003), is the various subdivisions in the preposition class (IN). Learned first is the split between subordinating conjunctions like that and proper prepositions. Then, subdivisions of each emerge: wh-subordinators like if, noun-modifying prepositions like of, predominantly verb-modifying ones like from, and so on.</Paragraph>
      <Paragraph position="9">  latent annotations.</Paragraph>
      <Paragraph position="10"> many classical distinctions not specifically mentioned or modeled in previous work. For example, the whdeterminers (WDT) split into one class for that and another for which, while the wh-adverbs align by reference type: event-based how and why vs. entity-based when and where. The possesive particle (POS) has one class for the standard 's, but another for the plural-only apostrophe. As a final example, the cardinal number nonterminal (CD) induces various categories for dates, fractions, spelled-out numbers, large (usually financial) digit sequences, and others.</Paragraph>
    </Section>
    <Section position="2" start_page="438" end_page="439" type="sub_section">
      <SectionTitle>
3.2 Phrasal Splits
</SectionTitle>
      <Paragraph position="0"> Analyzing the splits of phrasal nonterminals is more difficult than for lexical categories, and we can merely give illustrations. We show some of the top productions of two categories in Table 3.</Paragraph>
      <Paragraph position="1"> A nonterminal split can be used to model an otherwise uncaptured correlation between that symbol's external context (e.g. its parent symbol) and its internal context (e.g. its child symbols). A particularly clean example of a split correlating external with internal contexts is the inverted sentence category (SINV), which has only two subsymbols, one which usually has the ROOT symbol as its parent (and which has sentence final puncutation as its last child), and a second subsymbol which occurs in embedded contexts (and does not end in punctuation). Such patterns are common, but often less easy to predict. For example, possesive NPs get two subsymbols, depending on whether their possessor is a person / country or an organization. The external correlation turns out to be that people and countries are more likely to possess a subject NP, while organizations are more likely to possess an object NP.</Paragraph>
      <Paragraph position="2"> Nonterminal splits can also be used to relay information between distant tree nodes, though untangling this kind of propagation and distilling it into clean examples is not trivial. As one example, the subsymbol S-12 (matrix clauses) occurs only under the ROOT symbol. S-12's children usually include NP-8, which in turn usually includes PRP-0, the capitalized nominative pronouns, DT-{1,2,6} (the capitalized determin- null ers), and so on. This same propagation occurs even more frequently in the intermediate symbols, with, for example, one subsymbol of NP symbol specializing in propagating proper noun sequences.</Paragraph>
      <Paragraph position="3"> Verb phrases, unsurprisingly, also receive a full set of subsymbols, including categories for infinitive VPs, passive VPs, several for intransitive VPs, several for transitive VPs with NP and PP objects, and one for sentential complements. As an example of how lexical splits can interact with phrasal splits, the two most frequent rewrites involving intransitive past tense verbs (VBD) involve two different VPs and VBDs: VP-14 VBD-13 and VP-15 - VBD-12. The difference is that VP-14s are main clause VPs, while VP-15s are subordinate clause VPs. Correspondingly, VBD-13s are verbs of communication (said, reported), while VBD12s are an assortment of verbs which often appear in subordinate contexts (did, began).</Paragraph>
      <Paragraph position="4"> Other interesting phenomena also emerge. For example, intermediate symbols, which in previous work were very heavily, manually split using a Markov process, end up encoding processes which are largely Markov, but more complex. For example, some classes of adverb phrases (those with RB-4 as their head) are 'forgotten' by the VP intermediate grammar. The relevant rule is the very probable VP-2 - VP-2 ADVP-6; adding this ADVP to a growing VP does not change the VP subsymbol. In essense, at least a partial distinction between verbal arguments and verbal adjucts has been learned (as exploited in Collins (1999), for example).</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>