<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1078">
  <Title>References</Title>
  <Section position="3" start_page="0" end_page="471" type="metho">
    <SectionTitle>
1. The Problem
</SectionTitle>
    <Paragraph position="0"> In German there exists a large class of verbs which behave like aufhören ('stop'), illustrated in (1).</Paragraph>
    <Paragraph position="1">  (1) a. Anna glaubt, dass Bernard aufhört. ('Anna believes that Bernard stops') b. Claudia hört jetzt auf. ('Claudia stops now PRT') c. Daniel versucht aufzuhören. ('Daniel tries to_stop')</Paragraph>
    <Paragraph position="2"> In subordinate clauses as in (1a), the particle auf and the inflected part of the verb hört are written together. In main clauses such as (1b), the inflected form hört is moved by verb-second, leaving the particle stranded. In infinitive clauses with the particle zu ('to'), zu separates the two components of the verb and all three elements are written together.</Paragraph>
    <Paragraph position="3"> In analysis, the problem of separable verbs is to combine the two parts of the verb in contexts such as (1b) and (1c). Such a combination is necessary because syntactic and semantic properties of aufhören are the same, irrespective of whether the two parts are written together or not, but they cannot be deduced from the syntactic and semantic properties of the parts. Therefore, a solution to the problem of separable verbs will treat (1b) as if it read (2a) and (1c) as (2b):</Paragraph>
    <Paragraph position="4"> (2) a. Claudia aufhört jetzt. b. Daniel versucht zu aufhören.</Paragraph>
    <Paragraph position="5"> The problem arises in a very similar fashion in Dutch, as the Dutch translations (3) of the sentences in (1) show. The only difference is that the infinitive in (3c) is not written  together.</Paragraph>
    <Paragraph position="6"> (3) a. Anna gelooft dat Bernard ophoudt. b. Claudia houdt nu op.</Paragraph>
    <Paragraph position="7"> c. Daniel probeert op te houden.</Paragraph>
    <Paragraph position="8">  On the other hand, the problem of separable verbs in German and Dutch differs from the corresponding one in English, because English verbs such as look up are multi-word units in all contexts. A treatment of these cases which is in line with the solution proposed here is described by Tschichold (forthcoming).</Paragraph>
    <Paragraph position="9"> As suggested by the English translation, separable verbs in German and Dutch are lexemes. Therefore, an important issue in evaluating a mechanism for dealing with them is how it fits in with the reusability of lexical resources.</Paragraph>
    <Paragraph position="10"> Given the importance of the orthographic component in the problem, it is not surprising that it is hardly if ever treated in the linguistic literature.</Paragraph>
  </Section>
  <Section position="4" start_page="471" end_page="474" type="metho">
    <SectionTitle>
2. Previous Approaches
</SectionTitle>
    <Paragraph position="0"> In existing systems or resources for NLP, separable verbs are usually treated as a lexicographic and syntactic problem. Two typical approaches can be illustrated on the basis of Celex and Rosetta.</Paragraph>
    <Paragraph position="1"> Celex (http://www.kun.nl/celex) is a lexical database project offering a German dictionary with 50'000 entries and a Dutch dictionary with 120'000 entries.</Paragraph>
    <Paragraph position="2"> In these dictionaries separable verbs are listed with a feature conveying the information that they belong to the class of separable verbs and a bracketing structure showing the decomposition into a prefix and a base, e.g. (auf)(hören).</Paragraph>
    <Paragraph position="3"> Celex dictionaries are reusable, but the rule component for the interpretation of the information on separable verbs, i.e. the mechanism for going from (1b-c) to (2), remains to be developed by each NLP system using the dictionaries.</Paragraph>
    <Paragraph position="4"> Rosetta is a machine translation system which includes Dutch as one of the source and target languages. Rosetta (1994:78-79) describes how separable verbs are treated.</Paragraph>
    <Paragraph position="5"> For the verb ophouden illustrated in (3), there are three lexical entries, ophouden for the continuous forms as in (3a), and houden and op for the discontinuous forms as in (3b-c). When a form of houden is found in a text, it is multiply ambiguous, because it can be a form of the simple verb houden ('hold') or of one of the separable verbs ophouden ('stop'), aanhouden ('arrest'), afhouden ('withhold'), etc. The entry for houden as part of ophouden contains the information that it must be combined with a particle op.</Paragraph>
    <Paragraph position="6"> At the same time, op is ambiguous between a reading as preposition or particle. In syntax, there is a rule combining the two elements in a sentence such as (3b). It is clear that, while this approach may work, it is far from elegant. It creates ambiguity and redundancies, because ophouden written together is treated in a different entry from op + houden as a discontinuous unit. These properties make the resulting dictionaries less transparent and do not favour reusability.</Paragraph>
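    <Paragraph>The ambiguity described for the Rosetta-style treatment can be made concrete with a minimal sketch. The dictionary layout, the field names and the function readings are assumptions made for illustration only; they do not reproduce Rosetta's actual lexicon format. The sketch shows that a single lookup of houden has to consult one entry per separable verb built on it, and that the presence of op in the sentence does not by itself remove the reading as a simple verb.</Paragraph>
    <Paragraph><![CDATA[
```python
# Hypothetical miniature lexicon: the continuous form and the two
# discontinuous parts are stored as separate entries (illustrative only).
LEXICON = {
    "ophouden": [{"lexeme": "ophouden", "gloss": "stop", "needs_particle": None}],
    "houden": [
        {"lexeme": "houden", "gloss": "hold", "needs_particle": None},
        {"lexeme": "ophouden", "gloss": "stop", "needs_particle": "op"},
        {"lexeme": "aanhouden", "gloss": "arrest", "needs_particle": "aan"},
        {"lexeme": "afhouden", "gloss": "withhold", "needs_particle": "af"},
    ],
    "op": [
        {"lexeme": "op", "category": "preposition"},
        {"lexeme": "op", "category": "particle"},
    ],
}


def readings(base, particles_in_sentence):
    """Return the lexeme readings of `base` compatible with the particles present."""
    result = []
    for entry in LEXICON.get(base, []):
        needed = entry.get("needs_particle")
        # An entry survives if it needs no particle or its particle occurs.
        if needed is None or needed in particles_in_sentence:
            result.append(entry["lexeme"])
    return result


print(len(LEXICON["houden"]))      # entries consulted for a single form
print(readings("houden", {"op"}))  # the simple verb and ophouden both survive
```
]]></Paragraph>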
    <Paragraph position="7"> It should be pointed out that Celex and Rosetta were not chosen because their solution to the problem of separable verbs is worse than others. They are representative examples of currently used strategies, chosen mainly because they are relatively well-documented.</Paragraph>
    <Paragraph position="8">  morphological dictionaries. It includes rules for inflection and derivation (WM proper) and for clitics and multi-word units (Phrase Manager, PM). We will use WM here as a name for the combination of the two components. A general description of the design of WM, with references to various publications where the formalism is discussed in more detail, can be found in ten Hacken &amp; Domenig (1996).</Paragraph>
    <Paragraph position="9"> The German WM dictionary consists of a comprehensive set of inflectional and word formation rules describing the full range of morphological processes in German. In the last two years we have specified more than 100'000 database entries by classification of lexemes in terms of inflection rules (for morphologically simple entries) and by the application of word formation rules (for morphologically complex entries). In addition, the PM module contains a set of rules for clitics and multi-word units which covers German periphrastic inflection patterns and separable verbs.</Paragraph>
    <Paragraph position="10"> The rule types invoked in the treatment of separable verbs in WM include Inflection Rules (IRules), Word Formation Rules (WFRules), Periphrastic Inflection (PIRules), and Clitic Rules (CRules). We will describe each of them in turn.</Paragraph>
    <Section position="1" start_page="471" end_page="472" type="sub_section">
      <SectionTitle>
3.1. Inflection
</SectionTitle>
      <Paragraph position="0"> In inflection, aufhören is treated as a verb with a detachable prefix auf. The detachable prefix is defined as an underspecified IFormative. This means that, in the same way as for stems, its specification is distributed over a class specification and a specification of the individual string. The class is defined by the linguist in the specification of inflection processes. The specification of the string is part of the lexicographic specification, i.e. the string specification is the result of the application of the word formation rule the lexicographer chooses for the definition of an individual entry. In the IRules, detachable prefixes are referred to as formatives in the formulae generating the word forms. Fig. 1 gives the relevant rule of the database for otherwise regular separable verbs, such as aufhören.</Paragraph>
    </Section>
    <Section position="2" start_page="472" end_page="472" type="sub_section">
      <SectionTitle>
3.2. Word Formation
</SectionTitle>
      <Paragraph position="0"> Word Formation Rules consist of a source definition and a target definition. The source definition determines what (kind of) formatives are taken to form a new word.</Paragraph>
      <Paragraph position="1"> The target definition specifies how the source formatives are combined, and which inflection rule the new word is assigned to.</Paragraph>
      <Paragraph position="2"> Separable verbs are the result of WFRules which are remarkable because of their target.</Paragraph>
      <Paragraph position="3"> The target specification is as in Fig. 2. This specification departs from the usual specification of a target in a WFRule in two respects. First, instead of concatenating the source formatives, the rule lists them, leaving concatenation to the IRule. This is necessary to form the past participle aufgehört, where the two formatives are separated by the prefix ge- (cf. last line of Fig. 1). Separable verbs are specified by the lexicographer by linking a word to a WFRule having a target specification as in Fig. 2. In the case of aufhören, this is a rule for prefixing in which &quot;1&quot; in Fig. 2 matches a closed set of predefined prefixes. The IRules and WFRules described so far cover the non-separated occurrences as in (1a).</Paragraph>
      <Paragraph position="4"> The second special property of the specification in Fig. 2 is the system keyword &quot;separable&quot; in the second line. It assigns the result of the WFRule to the predefined class %separable. This class, whose name is defined in the WM-formalism, can be used to establish a link between the result of word formation and the input to the periphrastic inflection mechanism used to recognize occurrences such as in (1b).</Paragraph>
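      <Paragraph>The reason for listing the formatives instead of concatenating them can be illustrated with a small sketch. It is not written in the WM formalism, and Figs. 1 and 2 are not reproduced here; the class SeparableVerb and the three functions are hypothetical names, and the endings assume an otherwise regular weak verb. Because prefix and stem are kept apart, the inflection step is free to place ge- or zu between them.</Paragraph>
      <Paragraph><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeparableVerb:
    """Result of a hypothetical word formation step: formatives listed, not joined."""
    prefix: str  # detachable prefix, e.g. "auf"
    stem: str    # verb stem, e.g. "hör"


def third_singular(verb):
    # Non-separated finite form, as in subordinate clauses: prefix + stem + t
    return verb.prefix + verb.stem + "t"


def past_participle(verb):
    # ge- intervenes between the two listed formatives (regular weak verbs only)
    return verb.prefix + "ge" + verb.stem + "t"


def zu_infinitive(verb):
    # Infinitival zu separates prefix and infinitive; all three are written together
    return verb.prefix + "zu" + verb.stem + "en"


aufhoeren = SeparableVerb(prefix="auf", stem="hör")
print(third_singular(aufhoeren))
print(past_participle(aufhoeren))
print(zu_infinitive(aufhoeren))
```
]]></Paragraph>
      <Paragraph>Had the word formation step stored the single string aufhör, the participle and the zu-infinitive could only be produced by cutting that string open again.</Paragraph>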
    </Section>
    <Section position="3" start_page="472" end_page="473" type="sub_section">
      <SectionTitle>
3.3. Periphrastic Inflection
</SectionTitle>
      <Paragraph position="0"> The mechanism for periphrastic inflection in WM consists of two parts. PIClasses are used to identify the components and PIRules to turn them into a single word form. The PIRule for separable verbs in German is given in Fig. 3. The rule in Fig. 3 consists of a name and a body, which in turn consists of input and output specifications separated by &quot;=&quot;. The input specifies a finite verb form (infinitive and participles are excluded by &quot;^&quot;) and a detachable prefix. The output combines them in the position of the verb, with the form prefix + verb, and with the features percolated from the verb (person, number, etc.). This yields (2a) as a step in the analysis of (1b).</Paragraph>
      <Paragraph position="1"> The possibilities for specifying the relative position of the two elements to be combined are the same as the possibilities for multi-word units in general. In the PIClass for German it is specified that the finite verb always precedes the particle when the two are separated. In Dutch this is not the case, as illustrated by (3c), so that a different specification is required.</Paragraph>
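      <Paragraph>The effect of the periphrastic inflection step on (1b) can be sketched as follows. The sketch does not use the PIRule notation of Fig. 3; the two lookup tables and the function combine_separable are assumptions introduced for illustration, and a real system would take finite forms and prefixes from the morphological analysis. It only encodes the German ordering constraint stated above, namely that the finite verb precedes the separated particle.</Paragraph>
      <Paragraph><![CDATA[
```python
# Hypothetical stand-ins for the output of morphological analysis.
DETACHABLE_PREFIXES = {"auf", "an", "ab"}
FINITE_FORMS = {"hört": {"person": 3, "number": "sg"}}


def combine_separable(tokens):
    """Combine a finite verb with a later detached prefix, in the verb's position.

    Returns (new_tokens, features), or None if the rule does not apply.
    """
    for i, token in enumerate(tokens):
        if token not in FINITE_FORMS:
            continue
        # German: the finite verb precedes the particle when the two are separated.
        for j in range(i + 1, len(tokens)):
            if tokens[j] in DETACHABLE_PREFIXES:
                combined = tokens[j] + token  # prefix + verb
                new_tokens = tokens[:i] + [combined] + tokens[i + 1:j] + tokens[j + 1:]
                # Features (person, number, ...) percolate from the verb.
                return new_tokens, FINITE_FORMS[token]
    return None


print(combine_separable(["Claudia", "hört", "jetzt", "auf"]))
```
]]></Paragraph>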
    </Section>
    <Section position="4" start_page="473" end_page="473" type="sub_section">
      <SectionTitle>
3.4. Clitic Rules
</SectionTitle>
      <Paragraph position="0"> The clitic rule mechanism is used to analyse aufzuhören in (1c) and produce zu aufhören as in (2b). The CRule used is given in Fig. 4.</Paragraph>
      <Paragraph position="1"> Again input and output are separated by &quot;=&quot;. The input consists of the concatenation of three elements: a detachable prefix, infinitival zu, and an infinitive. Graphic concatenation is indicated by &quot;+&quot;. The CElement zu is defined elsewhere as a form of the infinitival zu, rather than the homonymous preposition, in order not to lose information. The output consists of two words, as indicated by the comma, the second of which concatenates the prefix and the verb.</Paragraph>
    </Section>
    <Section position="5" start_page="473" end_page="474" type="sub_section">
      <SectionTitle>
3.5. Recognition and Generation
</SectionTitle>
      <Paragraph position="0"> In recognition, the input is the largest domain over which components of multi-word units (MWUs) can be spread. In practice, this coincides with the sentence.</Paragraph>
      <Paragraph position="1"> Since WM does not contain a parser, larger chunks of input will result in spurious recognition of potential MWUs. Let us assume as an example that the sentences in (1) are given as input.</Paragraph>
      <Paragraph position="3"> The first component to act is the clitics component. It leaves everything unchanged except (1c), which is replaced by (2b): aufzuhören =&gt; zu aufhören. Then the rules of WM proper are activated. They replace each word form by a set of analyses in terms of a string and feature set. In (1a), aufhört is analysed as third person singular or second person plural of the present tense of aufhören, in (1b) hört and auf are analysed separately, and in (1c) aufhören, which was given the feature infinitive by the CRule in Fig. 4, only as infinitive, not as any of the homonymous forms in the paradigm. The next step is periphrastic inflection. It applies to (1a) and (1c) vacuously, but combines hört and auf in (1b), producing the feature description corresponding to (2a): hört auf =&gt; aufhört. Finally, the idiom recognition component (not treated here) applies vacuously.</Paragraph>
      <Paragraph position="4"> A general remark on recognition is in order here. The rule components of PM, i.e. clitics, periphrastic inflection and idiom recognition, add their results to the set of intermediate representations available at the relevant point.</Paragraph>
      <Paragraph position="5"> Thus, after the clitic component, aufzuhören continues to exist alongside zu aufhören in the analysis of (1c). Since the former cannot be analysed by WM proper, it is discarded. Likewise, hört will survive in (1b) after periphrastic inflection and indeed as part of the final result. This is necessary in examples such as (4): (4) Der Hund hört auf den Namen Wurzel. ('The dog answers to the name [of] Wurzel')</Paragraph>
      <Paragraph position="6"> Since rules in WM are not inherently directional, it is also possible to generate all forms of a lexeme such as aufhören in the way they may occur in a text. The client application required for this task can also include codes indicating places in the string where other material may intervene, because this information is available in the relevant PIClass of the database.</Paragraph>
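      <Paragraph>Returning to recognition, the additive behaviour of the PM rule components can be sketched for example (4). The word lists and the function candidate_readings are hypothetical and deliberately minimal; the point is only that the combined reading is added to the set of intermediate representations while the uncombined one is kept, so that a later component (a parser, which WM does not contain) can choose between them.</Paragraph>
      <Paragraph><![CDATA[
```python
DETACHABLE_PREFIXES = {"auf"}  # hypothetical, minimal
FINITE_VERBS = {"hört"}        # hypothetical, minimal


def candidate_readings(tokens):
    """Return all intermediate representations: the input plus any combination."""
    readings = [list(tokens)]  # the uncombined reading always survives
    for i, verb in enumerate(tokens):
        if verb not in FINITE_VERBS:
            continue
        for j in range(i + 1, len(tokens)):
            if tokens[j] in DETACHABLE_PREFIXES:
                combined = tokens[:i] + [tokens[j] + verb] + tokens[i + 1:j] + tokens[j + 1:]
                readings.append(combined)  # added, not substituted
    return readings


for reading in candidate_readings("Der Hund hört auf den Namen Wurzel".split()):
    print(" ".join(reading))
```
]]></Paragraph>
      <Paragraph>For (4) the sketch yields both the sentence as written, with hört as a simple verb followed by the preposition auf, and a spurious candidate containing aufhört. Only the first is correct here, which is why the simple form must not be overwritten.</Paragraph>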
    </Section>
  </Section>
</Paper>