<?xml version="1.0" standalone="yes"?>
<Paper uid="J02-3005">
  <Title>© 2002 Association for Computational Linguistics. Squibs and Discussions. A Note on Typing Feature Structures</Title>
  <Section position="3" start_page="390" end_page="392" type="metho">
    <SectionTitle>
2. The Problem
</SectionTitle>
    <Paragraph position="0"> XTAG is organized such that feature structures are specified in three different components of the grammar: a Tree database defines feature structures attached to tree families; a Syn database defines feature structures attached to lexically anchored trees; and a Morph database defines feature structures attached to (possibly inflected) lexical entries.</Paragraph>
    <Paragraph position="1"> As an example, consider the verb seems. This verb can anchor several trees, among which are trees of auxiliary verbs, such as the tree βVvx, depicted in Figure 1. This tree, which is common to all auxiliary verbs, is associated with the feature structure descriptions listed in Figure 1 (independently of the word that happens to anchor it). When the tree βVvx is anchored by seems, the lexicon specifies additional constraints on the feature structures in this tree:</Paragraph>
    <Paragraph position="3"> Finally, since "seems" is an inflected form, the morphological database specifies more constraints on the node that this word instantiates, as shown in Figure 2.</Paragraph>
    <Paragraph position="4"> The actual feature structures associated with the lexicalized tree anchored by "seems" are the combination of the three sets of path equations. This organization leaves room for several kinds of errors, inconsistencies, and typos in feature structure manipulation. Nothing in the system prevents the following errors: Undefined features: Every grammar uses a finite set of features in its feature structure specifications. Because the features do not have to be declared, however, bogus features can be introduced unintentionally, either through typos or through poor maintenance. In a grammar that has an assign-case feature, the following statement is probably erroneous: V.b:&lt;asign-case&gt; = acc.</Paragraph>
    <Paragraph position="6"> Each node in a tree is associated with two feature structures, "top" (.t) and "bottom" (.b) (Vijay-Shanker and Joshi 1991; XTAG Research Group 2001). Angle brackets delimit feature paths, and slashes denote disjunctive (atomic) values.</Paragraph>
    <Paragraph position="8"> Figure 1: An example tree and its associated feature structure descriptions.</Paragraph>
    <Paragraph position="9"> seems seem V &lt;agr pers&gt; = 3,</Paragraph>
    <Paragraph position="11"> Figure 2: The morphological database entry for seems.</Paragraph>
    <Paragraph position="12"> Undefined values: The same problem can be manifested in values, rather than features. In a grammar where nom is a valid value for the assign-case feature, the following statement is probably erroneous: V.b:&lt;assign-case&gt; = non.</Paragraph>
    <Paragraph position="13"> Incompatible feature equations: The grammar designer has a notion of what paths can be equated, but this notion is not formally defined. Thus, it is possible to find erroneous path equations such as VP.b:&lt;assign-case&gt; = V.t:&lt;tense&gt;.</Paragraph>
    <Paragraph position="14"> Such cases go undetected by XTAG and result in parsing errors. For example, the statement V.b:&lt;asign-case&gt; = acc was presumably supposed to constrain the grammatical derivations to those in which the assign-case feature had the value acc. With the typo, this statement never causes unification to fail (assuming that the feature asign-case occurs nowhere else in the grammar); the result is overgeneration. On the other hand, if the statement V.b:&lt;assign-case&gt; = non is part of the lexical entry of some verb, and some derivations require that certain verbs have nom as their value of assign-case, then that verb would never be a grammatical candidate for those derivations. The result here is undergeneration.</Paragraph>
    <Paragraph position="15"> Computational Linguistics, Volume 28, Number 3. Note that nothing in the above description hinges on the particular linguistic formalism or its implementation. The same problems are likely to occur in every system that manipulates untyped feature structures.</Paragraph>
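    <Paragraph> The failure modes described in this section can be illustrated with a minimal sketch. It assumes a hypothetical declaration table for features and their atomic values; XTAG itself has no such declarations, which is the problem at hand. The names DECLARED_FEATURES, DECLARED_VALUES, and check_equation are illustrative only, and the parser handles only statements of the form NODE.t:&lt;path&gt; = value whose right-hand side is an atom or a slash-separated disjunction of atoms.
<![CDATA[
```python
import re

# Hypothetical declarations; the real XTAG feature inventory is much larger.
DECLARED_FEATURES = {"assign-case", "tense", "agr", "pers"}
DECLARED_VALUES = {"assign-case": {"nom", "acc", "none"}}

# Matches statements such as 'V.b:<assign-case> = acc'.
EQUATION = re.compile(r"^(\w+)\.([tb]):<([^>]+)>\s*=\s*(\S+)$")


def check_equation(equation):
    """Return a list of problems found in one 'NODE.t:<path> = value' statement."""
    match = EQUATION.match(equation.strip())
    if match is None:
        return ["cannot parse: " + equation]
    _node, _side, path, value = match.groups()
    problems = []
    features = path.split()
    for feature in features:
        if feature not in DECLARED_FEATURES:
            problems.append("undefined feature: " + feature)
    # Values are checked only for features whose value set is declared.
    allowed = DECLARED_VALUES.get(features[-1])
    if allowed is not None:
        for atom in value.split("/"):  # slashes denote disjunctive values
            if atom not in allowed:
                problems.append("undefined value: " + atom)
    return problems
```
]]>
 With these declarations, the misspelled feature asign-case and the misspelled value non are both reported, whereas an untyped system accepts them silently.</Paragraph>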
  </Section>
  <Section position="4" start_page="392" end_page="393" type="metho">
    <SectionTitle>
3. Introducing Typing
</SectionTitle>
    <Paragraph position="0"> The problems discussed above are reminiscent of similar problems in programming languages; in that domain, the solution lies in typing: a stricter type discipline provides means for more compile-time checks to be performed, thus tracking potential errors as soon as possible. Fortunately, such a solution is perfectly applicable to the case of feature structures, as typed feature structures (TFSs) are well understood (Carpenter 1992). We briefly survey this concept below.</Paragraph>
    <Paragraph position="1"> TFSs are defined over a signature consisting of a set of types (Types) and a set of features (Feats). Types are partially ordered by subsumption (denoted $\sqsubseteq$). The least upper bound with respect to subsumption of $t_1$ and $t_2$ is denoted $t_1 \sqcup t_2$. Each type is associated with a set of appropriate features through a function $\mathit{Approp}: \mathit{Types} \times \mathit{Feats} \to \mathit{Types}$. The appropriate values of a feature $F$ in a type $t$ have to be of specified (appropriate) types. Features are inherited by subtypes: whenever $F$ is appropriate for a type $t$, it is also appropriate for all the types $t'$ such that $t \sqsubseteq t'$.</Paragraph>
    <Paragraph position="3"> Each feature $F$ has to be introduced by some most general type $\mathit{Intro}(F)$ (and be appropriate for all its subtypes).</Paragraph>
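    <Paragraph> The definitions above can be sketched in Python. This is a toy signature under stated assumptions, not the signature of Figure 3: the type names bot, sign, verb, aux, and noun and the case values are borrowed from or invented around the examples in the text, and the class Signature with its methods is hypothetical. The sketch also assumes that each feature is declared exactly once, at the type that introduces it.
<![CDATA[
```python
class Signature:
    """A toy type signature: a type hierarchy plus appropriateness declarations."""

    def __init__(self, parents, approp):
        # parents: type -> set of immediate supertypes (more general types)
        # approp: (type, feature) -> value type, declared at the introducing type
        self.parents = parents
        self.approp = approp

    def supertypes(self, t):
        """Return every type that subsumes t, including t itself."""
        seen = {t}
        stack = [t]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def subsumes(self, s, t):
        """True iff s is at least as general as t (s subsumes t)."""
        return s in self.supertypes(t)

    def lub(self, s, t):
        """Least upper bound of s and t, or None if they have none."""
        uppers = [u for u in self.parents
                  if self.subsumes(s, u) and self.subsumes(t, u)]
        for u in uppers:
            if all(self.subsumes(u, v) for v in uppers):
                return u
        return None

    def appropriate(self, t, feature):
        """Approp(t, F), inherited from the supertypes of t; None if undefined."""
        for s in self.supertypes(t):
            if (s, feature) in self.approp:
                return self.approp[(s, feature)]
        return None


# Hypothetical hierarchy: bot is the most general type.
PARENTS = {
    "bot": set(), "sign": {"bot"}, "verb": {"sign"}, "aux": {"verb"},
    "noun": {"sign"}, "cases": {"bot"}, "nom": {"cases"},
    "acc": {"cases"}, "none": {"cases"},
}
APPROP = {("verb", "assign-case"): "cases", ("noun", "case"): "cases"}
SIG = Signature(PARENTS, APPROP)
```
]]>
 In this toy hierarchy the feature assign-case is introduced at verb and inherited by its subtype aux, while it remains undefined for the more general type sign.</Paragraph>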
    <Paragraph position="4"> Figure 3 graphically depicts a type signature in which greater (more specific) types are presented higher and the appropriateness specification is displayed above the types. For example, for every feature structure of type verb, the feature assign-case is appropriate, with values that are at least of type cases: Approp(verb, assign-case) = cases. A formal introduction to the theory of TFSs is given by Carpenter (1992). Informally, a TFS over a signature $\langle \mathit{Types}, \sqsubseteq, \mathit{Feats}, \mathit{Approp} \rangle$ differs from an untyped feature structure in two respects: a TFS has a type; and the value of each feature is a TFS (there is no need for atoms in a typed system). A TFS $A$ whose type is $t$ is well-typed iff every feature $F$ in $A$ is such that $\mathit{Approp}(t, F)$ is defined; every feature $F$ in $A$ has a value of type $t'$ such that $\mathit{Approp}(t, F) \sqsubseteq t'$; and all the substructures of $A$ are well-typed. It is totally well-typed if, in addition, every feature $F$ such that $\mathit{Approp}(t, F)$ is defined occurs in $A$. In other words, a TFS is totally well-typed if it has all and only the features that are appropriate for its type, with appropriate values, and the same holds for all its substructures.</Paragraph>
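    <Paragraph> The two conditions can be checked mechanically. The following is a minimal sketch, assuming a hypothetical toy signature (not the one in Figure 3) and a TFS encoded as a nested dictionary with the keys "type" and "feats"; the names SUPERTYPES, APPROP, well_typed, and totally_well_typed are illustrative.
<![CDATA[
```python
# Hypothetical toy signature: each type maps to all types that subsume it.
SUPERTYPES = {
    "bot": {"bot"},
    "sign": {"bot", "sign"},
    "verb": {"bot", "sign", "verb"},
    "cases": {"bot", "cases"},
    "nom": {"bot", "cases", "nom"},
    "acc": {"bot", "cases", "acc"},
}
# Appropriateness is declared once, at the type that introduces the feature.
APPROP = {("verb", "assign-case"): "cases"}


def approp(t, feature):
    """Approp(t, F), inherited from the types that subsume t; None if undefined."""
    for s in SUPERTYPES.get(t, ()):
        if (s, feature) in APPROP:
            return APPROP[(s, feature)]
    return None


def well_typed(fs):
    """Check well-typedness of a TFS given as {'type': ..., 'feats': {...}}."""
    t = fs["type"]
    for feature, value in fs.get("feats", {}).items():
        required = approp(t, feature)
        if required is None:  # F is not appropriate for t
            return False
        # Approp(t, F) must subsume the type of the value.
        if required not in SUPERTYPES.get(value["type"], ()):
            return False
        if not well_typed(value):  # substructures must be well-typed too
            return False
    return t in SUPERTYPES  # unknown types are rejected


def totally_well_typed(fs):
    """Well-typed, and every appropriate feature is present, recursively."""
    feats = fs.get("feats", {})
    expected = {f for (s, f) in APPROP if s in SUPERTYPES.get(fs["type"], ())}
    return (well_typed(fs) and set(feats) == expected
            and all(totally_well_typed(v) for v in feats.values()))
```
]]>
 Under this encoding a bare verb is well-typed but not totally well-typed, because its appropriate feature assign-case is missing.</Paragraph>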
    <Paragraph position="5"> Wintner and Sarkar, A Note on Typing Feature Structures. Figure 3: A simple type signature. Totally well-typed TFSs are informative and efficient to process. It might be practically difficult, however, for the writer of a grammar to specify the full information such a structure encodes. To overcome this problem, type inference algorithms have been devised that enable a system to infer a totally well-typed TFS automatically from a partial description. Partial descriptions can specify</Paragraph>
    <Paragraph position="6">
* a feature-value pair: NP.b:case:acc
* a conjunction of descriptions: V.t:(sign,assign-case:none)
The inferred feature structure is the most general TFS that is consistent with the partial description. The inference fails iff the description is inconsistent (i.e., describes no feature structure). See Figure 4 for some examples of partial descriptions and the TFSs they induce, based on the signature of Figure 3.</Paragraph>
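    <Paragraph> A much simplified version of this inference step can be sketched as follows. It is not Carpenter's algorithm: it assumes a hypothetical toy signature (not that of Figure 3), handles only flat descriptions consisting of a type and a dictionary of feature-value pairs, and treats values as types without features of their own. The names SUPERTYPES, APPROP, lub, and infer are illustrative.
<![CDATA[
```python
# Hypothetical toy signature: each type maps to all types that subsume it.
SUPERTYPES = {
    "bot": {"bot"},
    "sign": {"bot", "sign"},
    "verb": {"bot", "sign", "verb"},
    "noun": {"bot", "sign", "noun"},
    "cases": {"bot", "cases"},
    "nom": {"bot", "cases", "nom"},
    "acc": {"bot", "cases", "acc"},
    "none": {"bot", "cases", "none"},
}
# Appropriateness is declared once, at the type that introduces the feature.
APPROP = {("verb", "assign-case"): "cases", ("noun", "case"): "cases"}


def lub(s, t):
    """Most general type subsumed by both s and t, or None if inconsistent."""
    uppers = [u for u, sup in SUPERTYPES.items() if s in sup and t in sup]
    for u in uppers:
        if all(u in SUPERTYPES[v] for v in uppers):
            return u
    return None


def infer(type_name, features):
    """Expand a flat partial description into a totally well-typed TFS.

    Returns None when the description is inconsistent with the signature.
    """
    if type_name not in SUPERTYPES:
        return None
    t = type_name
    for feature in features:
        intro = [s for (s, f) in APPROP if f == feature]
        if not intro:
            return None  # undeclared feature
        t = lub(t, intro[0])  # the type must be at least Intro(F)
        if t is None:
            return None  # the feature is not appropriate for the given type
    feats = {}
    for (s, f), required in APPROP.items():
        if s in SUPERTYPES[t]:
            # Unspecified features receive their most general appropriate value.
            value = lub(required, features.get(f, "bot"))
            if value is None:
                return None  # undefined or inappropriate value
            feats[f] = {"type": value, "feats": {}}
    return {"type": t, "feats": feats}
```
]]>
 In this sketch, infer("sign", {"assign-case": "none"}) mirrors the conjunction V.t:(sign,assign-case:none): the type is promoted from sign to verb because that is where assign-case is introduced, whereas a misspelled feature or value makes the inference fail.</Paragraph>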
  </Section>
  <Section position="5" start_page="393" end_page="393" type="metho">
    <SectionTitle>
4. Implementation
</SectionTitle>
    <Paragraph position="0"> To validate feature structure specifications in XTAG we have implemented the type inference algorithm suggested by Carpenter (1992, chapter 6). We manually constructed a type signature suitable for the current use of feature structures in the XTAG grammar of English (XTAG Research Group 2001). Then, we applied the type inference algorithm to all the feature structure specifications of the grammar, such that each feature structure was expanded with respect to the signature.</Paragraph>
    <Paragraph position="1"> Type inference is applied off-line, before the grammar is used for parsing. As is the case with other off-line applications, efficiency is not a critical issue. It is worth noting, however, that for the grammar we checked (in which, admittedly, feature structures are flat and relatively small), the validation procedure is highly efficient. As a benchmark, we checked the consistency of 1,000 trees, each consisting of two to fourteen nodes.</Paragraph>
    <Paragraph position="2"> The input file, whose size approached 1 MB, contained over 33,000 path equations.</Paragraph>
    <Paragraph position="3"> Validating the consistency of the benchmark trees took less than 33 seconds (more than a thousand path equations per second).</Paragraph>
    <Section position="1" start_page="393" end_page="393" type="sub_section">
      <SectionTitle>
4.1 The Signature
</SectionTitle>
      <Paragraph position="0"> The signature for the XTAG grammar was constructed manually, by observing the use of feature equations in the grammar and consulting its documentation. As noted above, most feature structures used in the grammar are flat, but the number of features in the top level is relatively high. The signature consists of 58 types and 56 features, and its construction took a few hours. In principle, it should be possible to construct signatures for untyped feature structures automatically, but such signatures will of course be less readable than manually constructed ones.</Paragraph>
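      <Paragraph> As one illustration of the automatic route, a naive first pass could simply collect every feature path and atomic value that occurs in the grammar's path equations. The sketch below is an assumption about what such a pass might look like, not the procedure used for XTAG, whose signature was built by hand; the function collect_signature and its regular expression are hypothetical. Equations that equate two paths are skipped.
<![CDATA[
```python
import re
from collections import defaultdict

# Matches the final '<path> = atom' part of a statement; disjunctions use '/'.
EQUATION = re.compile(r"<([^>]+)>\s*=\s*([\w/+-]+)\s*$")


def collect_signature(equations):
    """Map each feature path to the set of atomic values observed for it."""
    observed = defaultdict(set)
    for equation in equations:
        match = EQUATION.search(equation)
        if match:
            path, value = match.groups()
            observed[path].update(value.split("/"))
    return dict(observed)
```
]]>
 A signature harvested this way would also absorb any typos already present in the grammar, so it would still need manual review.</Paragraph>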
    </Section>
  </Section>
</Paper>