<?xml version="1.0" standalone="yes"?>
<Paper uid="W94-0107">
  <Title>Complexity of Description of Primitives: Relevance to Local Statistical Computations</Title>
  <Section position="2" start_page="56" end_page="57" type="abstr">
    <SectionTitle>
Experiments and Results
</SectionTitle>
    <Paragraph position="0"> Table 1 shows the data required for the dependency model of supertag disambiguation. Ideally each entry would be indexed by a (word, supertag) pair but, due to sparseness of data, we have backed off to a (POS, supertag) pair. Each entry contains the following information. * POS and supertag pair.</Paragraph>
    <Paragraph position="1"> * List of + and -, representing the direction of the dependent supertags with respect to the indexed supertag. (Size of this list indicates the total number of dependent supertags required.) * Dependent supertag.</Paragraph>
    <Paragraph position="2"> * Signed number representing the direction and the ordinal position of the particular dependent supertag mentioned in the entry from the position of the indexed supertag.</Paragraph>
    <Paragraph position="3"> * A probability of occurrence of such a dependency. The sum of the probabilities over all the dependent supertags at all ordinal positions in the same direction is one. For example, the fourth entry in Table 1 reads that the tree a2, anchored by a verb (V), has a left and a right dependent (-, +), and that the first word to the left (-1) with the tree as serves as a dependent of the current word. The strength of this association is represented by the probability 0.300.</Paragraph>
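    <Paragraph> As an illustration only, one entry of the table described above could be represented as follows. This is a minimal Python sketch, not the authors' implementation: the class name DependencyEntry and all field and method names are hypothetical, and the supertag labels a2 and as are copied as they appear in the text. Only the values of the fourth entry quoted above (V, a2, (-, +), as, -1, 0.300) come from the source.</Paragraph>
    <Paragraph><![CDATA[
```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DependencyEntry:
    """One row of the dependency table (all field names are hypothetical)."""
    pos: str                     # part of speech used as the backed-off index
    supertag: str                # indexed supertag
    directions: Tuple[str, ...]  # one '+' or '-' per required dependent
    dependent: str               # dependent supertag named by this entry
    offset: int                  # signed ordinal position of the dependent
    prob: float                  # probability of this dependency

    def direction(self) -> str:
        """Direction of this entry's dependent, read off the sign of offset."""
        return "-" if self.offset < 0 else "+"

    def num_dependents(self) -> int:
        """Total number of dependents the indexed supertag requires."""
        return len(self.directions)


# The fourth entry of Table 1 as quoted in the text.
fourth = DependencyEntry("V", "a2", ("-", "+"), "as", -1, 0.300)
```
]]></Paragraph>
    <Paragraph> Keeping the direction list separate from the signed offset mirrors the description above: the list gives the total number of dependents the indexed supertag needs, while each entry names only one of them.</Paragraph>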
    <Paragraph position="4"> The dependency model of disambiguation works as follows. Suppose a2 is a member of the set of supertags associated with a word at position n in the sentence. The algorithm proceeds to satisfy the dependency requirement of a2 by picking up the dependency entries for each of the directions. It picks a dependency data entry (the fourth entry, say) from the database that is indexed by a2 and proceeds to set up a path with the first word to the left that has the dependent supertag (as) as a member of its set of supertags. If the first word that has as as a member of its set of supertags is at position m, then an arc is set up between a2 and as. The arc is also verified so that it does not kite-string-tangle with any other arcs in the path up to a2 (two arcs (a,c) and (b,d) kite-string-tangle if a &lt; b &lt; c &lt; d or b &lt; a &lt; d &lt; c). The path probability up to a2 is incremented by log 0.300 to reflect the success of the match. The path probability up to as incorporates the unigram probability of as. On the other hand, if no word is found that has as as a member of its set of supertags, then the entry is ignored. A successful supertag sequence is one which assigns a supertag to each position such that each supertag has all of its dependents and maximizes the accumulated path probability.</Paragraph>
    <Paragraph position="5"> The direction of the dependent supertag and the probability information are used to prune the search. A more detailed and formal description of this algorithm will appear elsewhere. The implementation and testing of this model of supertag disambiguation is underway. Preliminary experiments on short fragments show a success rate of 88%, i.e., a sequence of correct supertags is assigned.</Paragraph>
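    <Paragraph> The matching step described above can be sketched roughly as follows. This is a hedged illustration under stated assumptions, not the authors' algorithm: the function names (tangles, find_dependent, link), the representation of a sentence as a list of supertag sets, the reading of the signed ordinal position as "the k-th word in that direction that carries the dependent supertag", the choice to skip an arc that would tangle, and the use of the natural logarithm are all hypothetical. The unigram probability of the dependent and the search over alternative entries are left out. The labels b1 and a1 in the toy sentence are invented for the example.</Paragraph>
    <Paragraph><![CDATA[
```python
import math
from typing import List, Optional, Set, Tuple

Arc = Tuple[int, int]  # (left position, right position)


def tangles(arc1: Arc, arc2: Arc) -> bool:
    """True if arcs (a,c) and (b,d) kite-string-tangle: a<b<c<d or b<a<d<c."""
    a, c = sorted(arc1)
    b, d = sorted(arc2)
    return a < b < c < d or b < a < d < c


def find_dependent(tags: List[Set[str]], n: int, dependent: str,
                   offset: int) -> Optional[int]:
    """Position of the |offset|-th word to the left (offset < 0) or right
    (offset > 0) of n whose supertag set contains `dependent`, else None."""
    step = -1 if offset < 0 else 1
    remaining = abs(offset)
    m = n + step
    while 0 <= m < len(tags):
        if dependent in tags[m]:
            remaining -= 1
            if remaining == 0:
                return m
        m += step
    return None


def link(tags: List[Set[str]], n: int, dependent: str, offset: int,
         prob: float, arcs: List[Arc], score: float) -> Tuple[List[Arc], float]:
    """Try to satisfy one dependency of the word at position n.

    Returns the updated (arcs, score) on success; returns them unchanged
    if no word carries the dependent supertag or the new arc would tangle.
    """
    m = find_dependent(tags, n, dependent, offset)
    if m is None:
        return arcs, score  # the entry is ignored
    new_arc = (min(m, n), max(m, n))
    if any(tangles(new_arc, old) for old in arcs):
        return arcs, score  # assumption: a tangling arc is simply not added
    return arcs + [new_arc], score + math.log(prob)


# Toy sentence: word 0 may take supertag "as", word 1 may take "a2".
tags = [{"as", "b1"}, {"a2"}, {"a1"}]
arcs, score = link(tags, 1, "as", -1, 0.300, [], 0.0)
```
]]></Paragraph>
    <Paragraph> Working with log probabilities turns the product of dependency probabilities along a path into a sum, which is why a successful match adds log 0.300 to the path score.</Paragraph>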
    <Section position="1" start_page="57" end_page="57" type="sub_section">
      <SectionTitle>
Data Collection
</SectionTitle>
      <Paragraph position="0"> The data needed for disambiguating supertags (Section ) have been collected by parsing the Wall Street Journal (sentences of length ≤ 15 words), IBM-manual and ATIS corpora using the wide-coverage English grammar being developed as part of the XTAG system (XTAG Tech. Report, 1994).</Paragraph>
      <Paragraph position="1"> The parses generated for these sentences are not subjected to any kind of filtering or selection. All the derivation structures are used in the collection of the statistics.</Paragraph>
      <Paragraph position="2"> XTAG is a large ongoing project to develop a wide-coverage grammar for English, based on the LTAG formalism. It also serves as an LTAG grammar development system and includes a predictive left-to-right parser, a morphological analyzer and a POS tagger.</Paragraph>
      <Paragraph position="3"> The wide-coverage English grammar of the XTAG system contains 317,000 inflected items in the morphology (213,000 for nouns and 46,500 for verbs, among others) and 37,000 entries in the syntactic lexicon. The syntactic lexicon associates words with the trees that they anchor. There are 385 trees in all in the grammar, which is composed of 411 different subcategorization frames.</Paragraph>
      <Paragraph position="4"> Each word in the syntactic lexicon, on the average, depending on the standard POS of the word, is an anchor for about 8 to 40 elementary trees.</Paragraph>
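      <Paragraph> To give a rough sense of what this ambiguity implies, the sketch below counts the candidate supertag sequences for one sentence. It assumes, purely for illustration, that every word has the same number of supertags and that one supertag is chosen per word independently. The function name is hypothetical; the figures 8, 40 and 15 are taken from the text.</Paragraph>
      <Paragraph><![CDATA[
```python
def num_sequences(tags_per_word: int, sentence_length: int) -> int:
    """Number of ways to pick one supertag per word under uniform ambiguity."""
    return tags_per_word ** sentence_length


low = num_sequences(8, 15)    # lower end of the 8-to-40 range, 15-word sentence
high = num_sequences(40, 15)  # upper end of the same range
```
]]></Paragraph>
      <Paragraph> Even at the lower end, the count is far too large to enumerate, which is consistent with the use of direction and probability information to prune the search.</Paragraph>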
      <Paragraph position="5"> Conclusion. In this paper we have shown that increasing the complexity of descriptions of primitive objects (lexical items, in the linguistic context) enables more complex constraints to be applied locally. However, increasing the complexity of descriptions greatly increases the number of such descriptions for each primitive object. In a lexicalized grammar such as LTAG, each lexical item is associated with, on average, 10 complex descriptions (supertags). A parser for LTAG, given a sentence, disambiguates a large set of supertags to select one supertag for each lexical item before combining them to derive a parse of the sentence. We have presented a new technique that performs the disambiguation of supertags using local information such as lexical preference and local lexical dependencies, as an illustration of our main theme: the relationship of complexity of descriptions of primitives to local statistical computations. This technique, like POS disambiguation, reduces the disambiguation task that needs to be done by the parser. After the disambiguation, we have effectively completed the parse of the sentence and the parser needs 'only' to complete the adjunctions and substitutions.</Paragraph>
    </Section>
  </Section>
</Paper>