<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-1238">
  <Title>2 Feature identification</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
* -deg ~.degdeg
</SectionTitle>
    <Paragraph position="0"> are applied in an analysis of the WST in an effort to garner a set of possible inflectional suffixes. First, any suffix which has a suffix itself cannot be inflectional, based on the assumption that inflectional suffixes always occur at the end of a word. Second, root categories must have at least two inflected forms; thus a prefix can be a possible root category only if it appears to have at least two inflectional suffixes.</Paragraph>
    <Paragraph position="1"> The corollary is that a suffix is not inflectional if it is the only inflectional suffix. Under these restrictions, the suffix set for &quot;aim&quot; in Figure 1 would be {$, ed, ing, s}.</Paragraph>
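    <Paragraph> The two restrictions above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the nested-dictionary encoding of the WST, the sample entries (including the invented branch for "less" followed by "ly") and the name candidate_inflections are all hypothetical, and "$" is taken to stand for the empty suffix.
<![CDATA[
```python
# Hypothetical encoding of a word suffix tree (WST) fragment:
# prefix -> {suffix: set of further suffixes found below that suffix}.
# "$" is assumed to mark the empty suffix, as in the suffix set for "aim".
wst = {
    "aim": {"$": set(), "ed": set(), "ing": set(), "s": set(), "less": {"ly"}},
    "the": {"$": set()},
}


def candidate_inflections(tree):
    """Return prefix -> suffixes that pass both restrictions."""
    result = {}
    for prefix, suffixes in tree.items():
        # Restriction 1: a suffix that itself takes a suffix is not inflectional.
        terminal = {s for s, further in suffixes.items() if not further}
        # Restriction 2: a root category needs at least two candidate suffixes.
        if len(terminal) >= 2:
            result[prefix] = terminal
    return result


candidates = candidate_inflections(wst)
```
]]>
    With this illustrative data, "less" is rejected because it takes a further suffix, and "the" is rejected because it is left with a single candidate suffix.</Paragraph>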
    <Paragraph position="2"> Inflection in context. To identify which suffixes are grammatically significant, a contextual analysis of how they are used must be carried out. This can be done by generalising over the trigrams of a large sample of text in which each word has been replaced by its corresponding suffix as given by the WST analysis. Almost all functional categories and irregularly inflected words have no inflectional suffixes associated with them and are therefore left unchanged.</Paragraph>
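    <Paragraph> The substitution and trigram extraction described above might look like the following Python sketch. The suffix_of table is a hypothetical stand-in for the output of the WST analysis, the sample sentence is invented, and the function names are illustrative only.
<![CDATA[
```python
from collections import Counter

# Hypothetical result of the WST analysis: word -> suffix term.
suffix_of = {
    "dogs": "-s", "dog": "-$",
    "walk": "-$", "walks": "-s", "walked": "-ed",
}


def to_suffix_terms(tokens, table):
    """Replace each analysed word by its suffix; leave other words unchanged."""
    return [table.get(tok, tok) for tok in tokens]


def trigram_counts(terms):
    """Count every window of three consecutive terms."""
    return Counter(zip(terms, terms[1:], terms[2:]))


tokens = "the dogs were here and the dog was there".split()
terms = to_suffix_terms(tokens, suffix_of)
counts = trigram_counts(terms)
```
]]>
    Words without an entry in the table, such as "the" and "were", pass through unchanged, mirroring the treatment of functional categories and irregular forms.</Paragraph>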
    <Paragraph position="3"> The trigrams are sorted and processed in decreasing order of frequency. Feature structures are hypothesised to reconcile trigrams that differ in only one term. For example, some of the most frequent trigrams might be as follows:  1) the -s were 2) the -s -$ 3) the -$ was 4) the -$ -s 5) the -$ -ed 6) the -s -ed  Assume in this instance that -s has replaced dogs in phrases 1, 2 and 6, and -$ has replaced dog in phrases 3, 4 and 5. In addition, assume that the prefix is walk for the suffixes given in the third position of phrases 2, 4, 5 and 6.</Paragraph>
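    <Paragraph> One way to find trigrams that differ in only one term is to index each trigram by the two terms that remain when one position is masked. The Python sketch below does this in decreasing order of frequency. The counts are invented for illustration, and the function name related_groups is hypothetical.
<![CDATA[
```python
from collections import Counter, defaultdict

# The six example trigrams, with made-up frequencies for illustration.
counts = Counter({
    ("the", "-s", "were"): 60,
    ("the", "-s", "-$"): 50,
    ("the", "-$", "was"): 40,
    ("the", "-$", "-s"): 30,
    ("the", "-$", "-ed"): 20,
    ("the", "-s", "-ed"): 10,
})


def related_groups(trigram_counts):
    """Map (masked position, remaining two terms) -> terms seen in that slot."""
    groups = defaultdict(list)
    # most_common() yields trigrams from most to least frequent.
    for trigram, _ in trigram_counts.most_common():
        for i in range(3):
            context = trigram[:i] + trigram[i + 1:]
            groups[(i, context)].append(trigram[i])
    # Keep only contexts shared by at least two trigrams.
    return {key: terms for key, terms in groups.items() if len(terms) > 1}


groups = related_groups(counts)
```
]]>
    Each surviving group lists the alternatives observed in one slot of an otherwise identical context, which is the raw material for hypothesising a feature structure.</Paragraph>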
    <Paragraph position="4"> These six related trigrams imply an agreement constraint that can be captured with a feature structure. For example, were and -$ appear after the context the -s, but not after the -$, and was and -s occur after the -$ but not after the -s, indicating a possible dependency between the last two terms. Phrases 5 and 6 imply a common syntactic role for -$ and -s; thus we might infer that the dependency is one of feature agreement. As the second term is uniformly a suffix, we might assume that it projects the agreement and is therefore inflectional. To characterise this, we associate a feature with the words appearing in the second position, and assign it the value of the suffix in each instance, giving the following lexical entries:</Paragraph>
    <Paragraph position="6"> To characterise the dependency in the first four phrases, we project the feature structure on to the words in the third position, assigning the corresponding feature value needed to preserve the dependency, as follows:</Paragraph>
    <Paragraph position="8"> Phrases 5 and 6 must be made to have the same feature structure, but this appears to entail assigning two different values to the feature structure for walked. However, given that walked is not constrained by this particular feature, its value can be left ungrounded, giving:</Paragraph>
    <Paragraph position="10"> From this limited set of phrases, it appears unnecessary to extend the inflectional constraint to the word the. However, given a trigram of the form &quot;a $ was&quot; without the complementary trigram &quot;a $ were&quot;, agreement would force projection of the feature structure on to the determiner.</Paragraph>
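    <Paragraph> The agreement analysis of the six example phrases can be illustrated with a small Python sketch. The lexical entries displayed in the original paper are not reproduced in this text, so the values below are a reconstruction from the prose and should be read as an assumption: the feature is named F1, its value is the suffix of the word in the second position, and None marks an ungrounded value.
<![CDATA[
```python
UNGROUNDED = None  # assumed marker for a feature value left unconstrained

# Reconstructed (assumed) lexicon: each word carries one feature, F1.
lexicon = {
    "dogs": {"F1": "s"}, "dog": {"F1": "$"},      # second position
    "were": {"F1": "s"}, "was": {"F1": "$"},      # projected to third position
    "walk": {"F1": "s"}, "walks": {"F1": "$"},
    "walked": {"F1": UNGROUNDED},                 # compatible with either value
}

phrases = [
    ("the", "dogs", "were"), ("the", "dogs", "walk"),
    ("the", "dog", "was"), ("the", "dog", "walks"),
    ("the", "dog", "walked"), ("the", "dogs", "walked"),
]


def agree(second, third, feature="F1"):
    """Two words agree if either value is ungrounded or both values match."""
    a = lexicon[second][feature]
    b = lexicon[third][feature]
    return a is UNGROUNDED or b is UNGROUNDED or a == b


all_consistent = all(agree(second, third) for _, second, third in phrases)
```
]]>
    Under this reconstruction all six phrases are accepted, while an unattested combination such as "dog were" is rejected. The determiner carries no feature here, matching the observation that the constraint need not extend to "the".</Paragraph>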
    <Paragraph position="11"> Once a word has been identified as an inflected form, this provides additional information for the generalisation of subsequent trigrams. If a term is known to project an agreement constraint in one instance, this curtails the number of hypotheses that must be tested to determine the source of any new constraints. That is, if were and walk come up in another set of related trigrams, the existing feature F1 can be trialed first as a possible explanation.</Paragraph>
    <Paragraph position="12"> Capturing the syntactic constraints. Deriving features in the manner described in the previous section provides an account of inflectional agreement. Translating this into syntactic constraints requires the addition of corresponding unification rules. Thus, as each trigram is processed, any changes to the feature structure must generate a rule that captures the linear precedence relation.</Paragraph>
    <Paragraph position="13"> This can be done efficiently with logic programs, such as Prolog DCGs. Initially, the grammar is formed by generating clauses to cover dependencies between pairs of terms, annotated with the appropriate feature structures and values. The grammar is built up by combining adjacent clauses and unifying their variables (i.e. features). The unification also allows rules for irregular inflections to be transformed into a more general form.</Paragraph>
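    <Paragraph> The idea of combining adjacent clauses by unifying their variables can be sketched in Python, used here only as a stand-in for a Prolog DCG. Everything in the sketch is an assumption made for illustration: a clause is a list of (category, feature value) pairs, variables are strings beginning with "?", variable names are assumed to be distinct across clauses, and the category names are hypothetical.
<![CDATA[
```python
def is_var(value):
    """Variables are written as strings starting with '?' (an assumption)."""
    return isinstance(value, str) and value.startswith("?")


def resolve(value, subst):
    """Follow variable bindings until a constant or unbound variable remains."""
    while is_var(value) and value in subst:
        value = subst[value]
    return value


def unify(a, b, subst):
    """Return an extended substitution, or None if the values clash."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    return None


def combine(left, right):
    """Join two clauses that overlap on one term, unifying that term's feature."""
    (name_l, val_l), (name_r, val_r) = left[-1], right[0]
    if name_l != name_r:
        return None
    subst = unify(val_l, val_r, {})
    if subst is None:
        return None
    merged = left + right[1:]
    return [(name, resolve(val, subst)) for name, val in merged]


det_noun = [("det", "?D"), ("noun", "?N")]
noun_verb = [("noun", "?M"), ("verb", "?M")]
trigram_rule = combine(det_noun, noun_verb)
```
]]>
    In the combined rule the noun and verb share one variable, so grounding either term fixes the other, while the determiner keeps its own unconstrained variable.</Paragraph>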
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Irregular inflection
</SectionTitle>
      <Paragraph position="0"> Irregular words may follow some of the same inflectional patterns as regular words, such as the present and gerundive forms in English verbs, and thus can be generalised with the same mechanism. In other instances, they may force the creation of a new feature structure which captures the same agreement constraint. To avoid this, every new rule is compared against existing rules to see if they have a common structure which can be generalised. Only rules which differ by a single term need to be examined, and only if the features of that term are grounded in the established case. If the new rule can be unified with an old rule by a consistent change in its corresponding feature values, then the lexicon is adjusted and the new rule is discarded. Since irregular forms do not differ in their usage, sufficiently large samples of text (enough to cause a match between rules) will allow the same agreement constraint to be captured by one rule. This solution also applies to words whose suffixing is irregular because of orthographic conventions, as when abated and abates are categorised by the suffix set {d, s} instead of the more common {ed, s}.</Paragraph>
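      <Paragraph> One possible reading of this rule-merging step is sketched below in Python. It is an interpretation rather than the author's procedure: rules are plain tuples of terms, the lexicon maps a term to a single feature value (None meaning ungrounded), the irregular form "men" is an invented example, and the name absorb is hypothetical.
<![CDATA[
```python
def absorb(new_rule, rules, lexicon):
    """Try to fold new_rule into an existing rule; return True if absorbed."""
    for old in rules:
        if len(old) != len(new_rule):
            continue
        diffs = [i for i in range(len(old)) if old[i] != new_rule[i]]
        # Only rules that differ by a single term are examined.
        if len(diffs) != 1:
            continue
        i = diffs[0]
        old_term, new_term = old[i], new_rule[i]
        value = lexicon.get(old_term)
        # The differing term must be grounded in the established rule.
        if value is None:
            continue
        # The change must be consistent with what is already known.
        if lexicon.get(new_term) not in (None, value):
            continue
        lexicon[new_term] = value  # adjust the lexicon, discard the new rule
        return True
    rules.append(new_rule)  # no match: keep the new rule
    return False


rules = [("the", "-s", "were")]
lexicon = {"-s": "s", "were": "s"}
absorbed = absorb(("the", "men", "were"), rules, lexicon)
kept = absorb(("a", "cat", "sat"), rules, lexicon)
```
]]>
      In this example the rule for the irregular plural is absorbed by giving "men" the feature value already held by -s, whereas an unrelated trigram is simply stored as a new rule.</Paragraph>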
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Remarks
</SectionTitle>
    <Paragraph position="0"> To the extent that inflectional agreement morphology and syntactic agreement structures are linked, generalisation over inflectional suffixes is likely the only means by which a unification grammar can be learned from plain text. This work represents an initial attempt at doing just that.</Paragraph>
    <Paragraph position="1"> The WST is a suitable data structure for uncovering suffixes, but is insufficient for identifying those which mark inflection. This requires a characterisation of how individual suffixes are used contextually, and identification of instances where they appear to impose agreement constraints.</Paragraph>
    <Paragraph position="2"> Limiting context analysis to trigrams has the obvious disadvantage that long-distance dependencies cannot be reliably inferred unless they happen to percolate up through a series of unification operations between smaller phrases. It is possible that some statistical techniques for finding lexical dependencies, such as those used in constructing link grammars, would be a more effective way to build feature structures and the grammar.</Paragraph>
    <Paragraph position="3"> Perhaps the most appealing aspect of this approach is that it attempts to combine morphological and syntactic constraints within a single model for grammar induction. In so doing it has uncovered a number of problems and ideas which should generate interesting discussions in a language learning workshop.</Paragraph>
  </Section>
</Paper>