<?xml version="1.0" standalone="yes"?>
<Paper uid="E91-1017">
  <Title>A UNIFIED MANAGEMENT AND PROCESSING OF WORD-FORMS, IDIOMS AND ANALYTICAL COMPOUNDS</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
THE MORPHO-LEXICAL
KNOWLEDGE BASE
</SectionTitle>
    <Paragraph position="0"> The main repository of morpho-lexical knowledge is the dictionary, discussed in the next section.</Paragraph>
    <Paragraph position="1"> Other morphological knowledge sources are the endings tree and the paradigms table. These data structures do not depend on a specific lexical stock because they encode general linguistic knowledge for the language in question (parts of speech, categories relevant to inflexional behaviour, endings, paradigms, etc.). Since their organization and acquisition are described elsewhere (Tufis 1989; Tufis 1990), we will not dwell on them.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
THE DICTIONARY
</SectionTitle>
    <Paragraph position="0"> In our system, the dictionary is a two-way accessible collection of hierarchically structured entries. During parsing, the access is provided by a root index. Each root in this index is associated with one or (in case of root-homonymy) more dictionary entries. During generation, the access is ensured by a meaning index. Each symbol in this index labelling a meaning description structure (see below) is associated with one or (in case of synonymy) more dictionary entries.</Paragraph>
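    <Paragraph><![CDATA[The two access paths described above can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the class name Dictionary, its methods, the entry layout and the sample words and meaning symbols are all hypothetical and are not taken from the system described here.

```python
from collections import defaultdict


class Dictionary:
    """Toy two-way dictionary: entries reachable by root and by meaning symbol."""

    def __init__(self):
        self.root_index = defaultdict(list)     # root -> entries (used in parsing)
        self.meaning_index = defaultdict(list)  # meaning symbol -> entries (used in generation)

    def add(self, lemma, part_of_speech, roots, meanings):
        # One entry object is shared by both indexes.
        entry = {"lemma": lemma, "part_of_speech": part_of_speech}
        for root in roots:
            self.root_index[root].append(entry)
        for symbol in meanings:
            self.meaning_index[symbol].append(entry)
        return entry

    def by_root(self, root):
        # More than one result means root-homonymy.
        return list(self.root_index.get(root, []))

    def by_meaning(self, symbol):
        # More than one result means synonymy.
        return list(self.meaning_index.get(symbol, []))


# Hypothetical sample data, for illustration only.
lexicon = Dictionary()
lexicon.add("bear", "noun", roots=["bear"], meanings=["BEAR-ANIMAL"])
lexicon.add("bear", "verb", roots=["bear", "bor"], meanings=["CARRY"])
lexicon.add("carry", "verb", roots=["carr"], meanings=["CARRY"])
```

In this toy data the root "bear" leads to two entries (root-homonymy), and the meaning symbol CARRY leads to two entries (synonymy).]]></Paragraph>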
    <Paragraph position="2"> The formal structure of a dictionary entry is described by the regular expression below:</Paragraph>
    <Paragraph position="4"> where: &lt;lemma&gt; and &lt;part-of-speech&gt; have the usual meaning.</Paragraph>
    <Paragraph position="5"> &lt;valency-model&gt; is a list of idiosyncratic features of interest mainly for syntactic processing (syntactic patterns, required prepositions, positions with respect to the dominant constituent for adjectives and adverbs, etc.).</Paragraph>
    <Paragraph position="6"> &lt;semantic-description&gt; is the name of a case-frame structure placed in a generic-specific hierarchy. The actual semantic descriptions reside in a different data space from the rest of the dictionary. This separation is motivated by several reasons, among them: (i) the intention to enable meaning-based transfer between monolingual dictionaries via the semantic descriptions area; (ii) the ability to interchange domain-oriented semantic descriptions; (iii) the independence of the lexical stock from the meaning representation formalism; (iv) a more precise treatment of synonymy, antonymy and generalization-specialization relations. Concerning the last reason, synonymy, antonymy or generalization-specialization relations cannot be established directly between dictionary entries, because such relations are, more often than not, defined over specific meanings of a pair of words, and a word is rarely monosemantic. Moreover, such relations are frequently domain-dependent. Therefore, we let them be expressed between semantic case-frames (descriptors of individual meanings). Because the meaning representation of the lexical stock is beyond the scope of this paper, we will not discuss it further.</Paragraph>
    <Paragraph position="7"> &lt;non-regular-root&gt; and the &lt;paradigmatic-description&gt;s describe, for non-regular inflecting words, the conditions under which the &lt;non-regular-root&gt; may be considered in forming a word-form. A formal definition of what we call non-regular inflecting, as opposed to regular inflecting, is given in Tufis (1989). Informally, a word is regular-inflecting iff every grammatical form of it can be written as &lt;constant-part&gt; + &lt;ending&gt;. The &lt;constant-part&gt; is called the regular root of the word. A word that is not regular-inflecting is called non-regular. Note that a non-regular inflecting word is characterized by more than one root; these roots are called non-regular roots. A &lt;paradigmatic-description&gt; is a bit-map encoding of the endings in a paradigm which may be combined, under a set of feature-value restrictions, with the &lt;non-regular-root&gt;. &lt;phono-hyphen&gt; is a place-holder for the pronunciation transcription of the lemma or of the non-regular roots. This field also contains information about the hyphenation of the corresponding item.</Paragraph>
    <Paragraph position="8"> &lt;syntagmatic-description&gt; is a parameterized pattern describing groups of words which are to be recognized or generated as stand-alone processing units. Given the importance of what we call syntagmatic processing (probably the most attractive feature of our system), we devote the next section to a more detailed presentation of this topic.</Paragraph>
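    <Paragraph><![CDATA[To make the bit-map idea concrete, the following Python sketch encodes which endings of a paradigm may combine with a given non-regular root. It is an assumption-laden illustration: the ending inventory, the bit ordering, the function names and the English toy verb are hypothetical, and the feature-value restrictions mentioned above are left out.

```python
# Hypothetical ordered ending inventory of one paradigm (toy English verb "take").
PARADIGM_ENDINGS = ["e", "es", "ing", "en", ""]


def make_bitmap(allowed, endings=PARADIGM_ENDINGS):
    """Set bit i when the i-th ending of the paradigm may follow the root."""
    bitmap = 0
    for i, ending in enumerate(endings):
        if ending in allowed:
            bitmap |= 1 << i
    return bitmap


def allowed_endings(bitmap, endings=PARADIGM_ENDINGS):
    """Decode a bitmap back into the list of admissible endings."""
    return [ending for i, ending in enumerate(endings) if (bitmap >> i) & 1]


def word_forms(root, bitmap, endings=PARADIGM_ENDINGS):
    """All word-forms obtainable from one non-regular root."""
    return [root + ending for ending in allowed_endings(bitmap, endings)]


# Two non-regular roots of the same lemma, each with its own bitmap.
TAK = make_bitmap(["e", "es", "ing", "en"])   # take, takes, taking, taken
TOOK = make_bitmap([""])                      # took
```

With this encoding, the root "tak" and the root "took" share one paradigm but each admits only its own subset of endings.]]></Paragraph>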
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
THE SYNTAGMS
</SectionTitle>
    <Paragraph position="0"> We mean by syntagm a sequence of at least two lexical items which are to be processed as a single unit. According to this definition, collocations, idioms and analytical compounds are syntagms. A syntagm is represented in the dictionary as a pair (&lt;result&gt; &lt;pattern&gt;) and is associated with the lemma of the entry in question. This lemma is called the pivot element of the syntagm, and it may appear in any position of the sequence.</Paragraph>
    <Paragraph position="1"> In order to clarify the syntagm processing let us examine its formal structure:</Paragraph>
    <Paragraph position="3"> The replacement element of a syntagm is either the empty string or a lemma, which will be associated with the appropriate morpho-lexical features resulting from its processing. This lemma may correspond to an element in the &lt;pattern&gt; specially marked as the syntagm substituter, in which case &lt;syntagm-value&gt; is NULL. (The empty replacement string corresponds to a NULL &lt;syntagm-value&gt; together with no substituter element in the &lt;pattern&gt;.)</Paragraph>
    <Paragraph position="4"> The &lt;element&gt; in the &lt;pattern&gt; of a &lt;syntagm&gt; may be a word-form, an (un)restricted lemma, an (un)restricted grammar category or any one of a choice list of specified &lt;element&gt;s. In the case of a choice, if &lt;obligativity&gt; is FALSE, the empty string is a valid candidate in addition to the specified &lt;element&gt;s.</Paragraph>
    <Paragraph position="5"> The &lt;displacement&gt; specified in a &lt;compound-element&gt; of the pattern of a &lt;syntagm&gt; determines the role subsequently played by the element concerned. The meaning of the value in this field depends on whether the syntagm is being recognized or generated: - the value '+' specifies that the current element is either the replacer of the syntagm (during analysis) or one of the elements of the syntagm expansion, in the specified position (during generation); - for analysis, the values '&lt;' and '&gt;' specify that the current element is an "alien" constituent which must be transferred in front of or behind the syntagm replacer, respectively; during generation, the same values specify that the first item to the left or to the right of the syntagmatic item being expanded will be moved, subject to any restrictions, to the output string at the current position; - the '--' value is the default and means that the element in question will either be deleted from the input string (during analysis) or inserted in the output string (during generation).</Paragraph>
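    <Paragraph><![CDATA[The analysis-time reading of the four displacement values can be sketched in Python as follows. This is a simplified illustration, not the system's procedure: the function name, the list-of-pairs input format and the English example are hypothetical, pattern matching and restrictions are assumed to have already succeeded, and the replacer is left as a plain token rather than a lemma with morpho-lexical features.

```python
def analyse_syntagm(matched):
    """Rearrange matched items according to their displacement markers.

    `matched` is a list of (token, displacement) pairs in input order, where
    displacement is '+', '<', '>' or '--' as described in the text.
    """
    before, after, replacer = [], [], None
    for token, displacement in matched:
        if displacement == "+":
            replacer = token          # this element replaces the whole syntagm
        elif displacement == "<":
            before.append(token)      # "alien" moved in front of the replacer
        elif displacement == ">":
            after.append(token)       # "alien" moved behind the replacer
        elif displacement == "--":
            continue                  # default: deleted from the input string
        else:
            raise ValueError("unknown displacement: %r" % (displacement,))
    # An absent replacer models the empty replacement string.
    middle = [] if replacer is None else [replacer]
    return before + middle + after


# Hypothetical English example: the auxiliary is deleted, the adverb is an alien.
example = analyse_syntagm([("has", "--"), ("often", "<"), ("eaten", "+")])
```

Marking the adverb with '>' instead of '<' would place it after the replacer.]]></Paragraph>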
    <Paragraph position="6"> The &lt;restriction&gt;s are the principal means by which a lexicon designer expresses the rules governing the correct use of a syntagm. The meaning of a &lt;restriction&gt; depends on its format:
a) (feature) In this case, the first (from left to right) matching value of the feature in question has to be the same for all subsequent occurrences of a-type restrictions over the same feature. This type of restriction is used to express feature congruency between different constituents appearing in the &lt;pattern&gt; of a syntagm, as well as the inheritance of a feature value from the &lt;pattern&gt; to the &lt;result&gt; or vice versa.
b) (feature value) A &lt;pattern&gt; element restricted in this way must match (during analysis) an input item having the specified value for the feature in question. During generation, it represents a word-forming parameter. If the restriction is associated with the &lt;result&gt;, it simply represents an assignment (in analysis) or an expanding parameter (in generation).
c) (feature value1 value2 ... valuen) Such a restriction may act on each feature only once in the &lt;pattern&gt; and once in the &lt;result&gt;. The paired multiple-valued features (one from the &lt;pattern&gt; and one from the &lt;result&gt;) positionally specify the relations between the values of a feature existing in both &lt;pattern&gt; and &lt;result&gt;. That is, if, during analysis, a &lt;pattern&gt; element matched an input item having, for a given feature fm, one of the values specified in its restriction, say the k-th, then the feature fm will be assigned in the &lt;result&gt; the k-th value in its associated restriction. Generation works similarly.
In Tufis and Popescu (1990b), the flow of control as well as the formal power of syntagmatic processing are outlined by means of annotated examples of syntagms encoding the rules governing the compound verbal forms (including interrogative forms and "aliens" such as adverbs and reflexive pronoun insertion) for English, French, Romanian, Russian and Spanish. As an example we give below a syntagm describing one of the possible ways of forming two negative analytical verbal forms (passé-composé and plus- A more elaborate example, describing the basic compound tenses in English (not including the syntagms for handling adverb insertion or negative and interrogative constructions), is the following:</Paragraph>
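    <Paragraph><![CDATA[The a-type and c-type restrictions described in this section lend themselves to a small Python sketch. It is illustrative only: the function names, the feature name and the value labels are hypothetical, and the code shows the congruency check and the positional pairing in isolation rather than the full matching procedure.

```python
def congruent(values):
    """Type (a): every occurrence must share the first matched value."""
    if not values:
        return True
    first = values[0]
    return all(value == first for value in values)


def map_positionally(source_values, target_values, matched_value):
    """Type (c): the k-th value on one side selects the k-th value on the other."""
    if len(source_values) != len(target_values):
        raise ValueError("paired restrictions must list the same number of values")
    k = source_values.index(matched_value)   # raises ValueError if not listed
    return target_values[k]


# Hypothetical paired restrictions over a feature called "tense".
PATTERN_TENSE = ["present", "past"]                  # on a <pattern> element
RESULT_TENSE = ["present-perfect", "past-perfect"]   # on the <result>

# Analysis goes from <pattern> to <result>; generation swaps the two lists.
analysed = map_positionally(PATTERN_TENSE, RESULT_TENSE, "past")
generated = map_positionally(RESULT_TENSE, PATTERN_TENSE, "present-perfect")
```

The same function serves both directions because the pairing is purely positional.]]></Paragraph>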
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
THE ENDINGS TREE AND THE
PARADIGMS TABLE
</SectionTitle>
    <Paragraph position="0"> The endings tree (a discrimination tree) is a knowledge source for the parsing process. Internally, it represents all the known endings (we use the term 'ending' without regard to its possible internal structure, e.g. suffix + desinence) and their morphological feature values. The nodes are labelled with letters appearing in different endings. A proper ending is represented by the concatenation of the letters labelling the nodes along a certain path, starting from a terminal node towards the root of the tree (this organization is due to the retrograde parsing strategy (Kotkova 1985) used in our system). A terminal node is not necessarily a leaf node, because one ending may be included in a longer one. Such a case is called intrinsic ambiguity. All terminal nodes are attached to the paradigmatic information specific to the endings they stand for. More often than not, an ending does not uniquely identify a paradigm but a set of paradigms. In this case, the ending is called extrinsically ambiguous. Both types of ambiguity are, in principle, resolved by checking the congruency between the paradigmatic information attached to the respective endings (taken from the endings tree) and the candidate roots (taken from their dictionary entries).</Paragraph>
    <Paragraph position="1"> The paradigms table is the data structure used during the word-form generation process. The paradigms are automatically classified during the learning (acquisition) phase (Tufis 1990) into an inheritance hierarchy. A compilation phase transforms this hierarchy into the paradigms table. The internally assigned code of a given paradigm is used as the index in the paradigms table, an entry of which has the following structure:</Paragraph>
    <Paragraph position="3"> The &lt;fixed-feature-values&gt; field represents a list of morphological features with predetermined values for the paradigm in question. These feature-values (if any) are collected while compiling the paradigms hierarchy and represent the discrimination criteria according to which a more general paradigm is split into different specific paradigms.</Paragraph>
    <Paragraph position="4"> The &lt;variable-feature-values&gt; field represents a list of (ordered) morphological features which may take any of their legal values. An efficient numeric algorithm converts an arbitrary ordered set of feature-values into a code used as a displacement identifying the appropriate &lt;ending&gt; in the current entry of the table. Note that the variable features have default values, so that an inflected word-form is still generated even if the set of generation criteria is not completely specified.</Paragraph>
    <Paragraph position="5"> Moreover, if the endings tree or the paradigms table is not defined, the system does not crash but instead functions as if it had been designed for a word-form dictionary (the trivial morphology approach).</Paragraph>
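    <Paragraph><![CDATA[The retrograde lookup described above can be approximated by a trie built over reversed endings. The Python sketch below is a simplified illustration under stated assumptions: the class and function names, the paradigm labels and the English sample endings are hypothetical, and the congruency check is reduced to a set intersection between the paradigms attached to an ending and those recorded for a candidate root.

```python
class Node:
    def __init__(self):
        self.children = {}       # letter -> Node
        self.paradigms = set()   # non-empty marks a terminal node


def add_ending(tree, ending, paradigm):
    """Insert an ending letter by letter, starting from its last letter."""
    node = tree
    for letter in reversed(ending):
        node = node.children.setdefault(letter, Node())
    node.paradigms.add(paradigm)


def candidate_splits(tree, word):
    """Scan the word right to left, collecting every (root, ending, paradigms)."""
    splits, node = [], tree
    for i in range(len(word) - 1, -1, -1):
        node = node.children.get(word[i])
        if node is None:
            break
        if node.paradigms:       # terminal nodes need not be leaves
            splits.append((word[:i], word[i:], sorted(node.paradigms)))
    return splits


def resolve(splits, root_paradigms):
    """Keep only splits whose root is known and shares a paradigm with the ending."""
    kept = []
    for root, ending, paradigms in splits:
        common = sorted(set(paradigms) & root_paradigms.get(root, set()))
        if common:
            kept.append((root, ending, common))
    return kept


# Hypothetical endings and paradigm labels.
tree = Node()
add_ending(tree, "s", "N1")
add_ending(tree, "s", "V1")    # same ending, several paradigms: extrinsic ambiguity
add_ending(tree, "es", "N2")   # "s" is contained in "es": intrinsic ambiguity

splits = candidate_splits(tree, "boxes")
parses = resolve(splits, {"box": {"N2"}})
```

Here the scan proposes both "boxe" + "s" and "box" + "es", and the check against the toy root dictionary keeps only the second.]]></Paragraph>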
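    <Paragraph><![CDATA[The text does not spell out the numeric algorithm, so the Python sketch below uses one plausible choice, a mixed-radix encoding, purely as an assumption. The feature inventory, the placeholder endings, the function names and the convention that the first legal value is the default are all hypothetical.

```python
# Hypothetical ordered variable features with their legal values.
VARIABLE_FEATURES = [
    ("number", ["sg", "pl"]),
    ("case", ["nom", "gen", "dat"]),
]

# Placeholder endings for one table entry, one per feature-value combination.
ENDINGS = ["e0", "e1", "e2", "e3", "e4", "e5"]


def displacement(requested, features=VARIABLE_FEATURES):
    """Mixed-radix code of the requested values; unspecified features use defaults."""
    code = 0
    for name, legal_values in features:
        # Assumption: the default of a feature is its first legal value.
        value = requested.get(name, legal_values[0])
        code = code * len(legal_values) + legal_values.index(value)
    return code


def generate(root, requested):
    """Pick the ending at the computed displacement and attach it to the root."""
    return root + ENDINGS[displacement(requested)]
```

Because every unspecified feature falls back to a default, a call such as generate("root-", {}) still yields a word-form, which mirrors the behaviour described above.]]></Paragraph>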
  </Section>
</Paper>