<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0310">
  <Title>Assigning Grammatical Relations with a Back-off Model</Title>
  <Section position="4" start_page="90" end_page="91" type="metho">
    <SectionTitle>
2 Collecting Training and Test Data
</SectionTitle>
    <Paragraph position="0"> Shallow parsing techniques are used to collect training and test data from a text corpus. The corpus is tokenized, morphologically analyzed, lemmatized, and parsed using a standard CFG parser with a hand-written grammar to identify clauses containing a finite verb taking a nominative NP as its subject and an accusative NP as its object.</Paragraph>
    <Paragraph position="1"> Constructs covered by the grammar include verb-second and verb-final clauses. Each clause is segmented into phrase-like constituents, including nominative (NC), prepositional (PC), and verbal (VC) constituents. Their definition is non-standard; for instance, all prepositional phrases, whether complement or not, are left unattached. As an example, the shallow parse structure for the sentence in (4) is shown in (4') below.</Paragraph>
    <Paragraph position="2"> (4) Die Gesellschaft erwartet in diesem Jahr the society expects in this year in Siidostasien einen Umsatz in southeast Asia a turnover von 125 Millionen DM.</Paragraph>
    <Paragraph position="3"> from 125 million DM 'The society expects this year in southeast Asia a turnover of 125 million DM.'  Nominal and verbal constituents display person and number information; nominal constituents also display case information. For instance in the structure above, 3 denotes third person, s denotes singular number, nora and acc denote nominative and accusative case, respectively. The set {nora, acc} indicates that the first nominal constituent in the structure is ambiguous with respect to case; it may be nominative or accusative.</Paragraph>
    <Paragraph position="4"> Test and training tuples are obtained from shallow structures containing a verbal constituent and two nominative/accusative nominal constituents. Note that no subcategorization information is used; it suffices for a verb to occur in a clause with two nominative/accusative NCs for it to be considered testdeg ing/training data.</Paragraph>
    <Paragraph position="5"> Training data consists of tuples (nl,v, n2,x), where v is a verb, nl and n2 are nouns, and x E {1,0} indicates whether nl is the subject of the verb. Test data consists of ambiguous tuples (nx,v, n2) for which it cannot be established which noun is the subject/object of the verb based on morpho-syntacticai information alone.</Paragraph>
    <Paragraph position="6"> The set of training and test tuples for a given corpus is obtained as follows. For each shallow structure s in the corpus containing one verbal and two nominative/accusative nominal constituents, let nl, v, n2 be such that v is the main verb in s, and nl and n2 are the heads of the nominative/accusative NCs in s such that nl precedes n2 in s. In the rules below, i,j e {1,2},j ~ i, and g(i) = 1 if i = 1, and 0 otherwise. Note that the last element in a training  tuple indicates whether the first NC in the structure is the subject of the verb (1 if so, 0 otherwise). Case Nominative Rule. If ni is masculine, and the NC headed by ni is unambiguously nominative 1, then (nx, v, n2, g(i)) is a training tuple, Case Accusative Rule. If ni is masculine, and the  NC headed by ni is unambiguously accusative, then (nl, v, n2, g(j)) is a training tuple, Agreement Rule. If ni but not nj agrees with v in person and number, then (nl,v, n2,g(i)) is a training tuple, Heuristic Rule. If the shallow structure consists of a verb-second clause with an adverbial in the first position, or of a verb-final clause introduced by a conjunction or a complementizer, then (nl, v, n2, 1) is a training tuple (see below for examples), Default Rule. (hi, v, n2) is a test triple.</Paragraph>
    <Paragraph position="7"> For instance, the training tuple (Gesellschaft, erwarren, Umsatz, 1) ('society, expect, turnover') is obtained from the structure (4') above with the Case Accusative Rule, since the NC headed by the masculine noun Umsatz ('turnover') is unambiguously accusative and hence the object of the verb. The training tuple (Inflationsrate, erwarten, Okonom, O) ('inflation rate, expect, economist') and (Okonom, erwarten, Inflationsrate, 1) ('economist, expect, inflation rate') are obtained from sentences  (2) and (3) with the Case Nominative and Agreement Rules, respectively, and the test tuple (Inflationsrate, erwarten, Okonomin) ('inflation rate, expect, economist' ) from the ambiguous sentence in (1) by the Default Rule.</Paragraph>
    <Paragraph position="8">  The Heuristic Rule is based on the observation that in the constructs stipulated by the rule, although the object may potentially precede the sub-ject of the verb, this does not (usually) occur in written text. (5) and (6) are sentences to which this rule applies.</Paragraph>
    <Paragraph position="9"> (5) In diesem Jahr erwartet die Okonomin in this year expects the economist eine hohe Inflationsrate.</Paragraph>
    <Paragraph position="10"> a high inflation rate 'This year the economist expects a high inflation rate.&amp;quot; (6) Weil die Okonomin eine hohe Inflationsrate because the economist a high inflation rate erwartet,...</Paragraph>
    <Paragraph position="11"> expects 'Because the economist expects a high inflation rate,... ' Note that the Heuristic Rule does not apply to verb-final clauses introduced by a relative or interrogative item, such as in (7): (7) Die Rate, die die Okonomin erwartet, ...</Paragraph>
    <Paragraph position="12"> the rate which the economist expects, ...</Paragraph>
  </Section>
  <Section position="5" start_page="91" end_page="92" type="metho">
    <SectionTitle>
3 Testing
</SectionTitle>
    <Paragraph position="0"> The testing algorithm makes use of the back-off model (Katz, 1987) in order to determine the subject/object in an ambiguous test tuple. The model, developed within the context of speech recognition, consists of a recursive procedure to estimate n-gram probabilities from sparse data. Its generality makes it applicable to other areas; the method has been used, for instance, to solve prepositional phrase attachment in (Collins and Brooks, 1995).</Paragraph>
    <Section position="1" start_page="91" end_page="91" type="sub_section">
      <SectionTitle>
3.1 Katz's back-off model
</SectionTitle>
      <Paragraph position="0"> Let w~ denote the n-gram Wl,...,wn, and ff(w~) denote the number of times it occurred in a sample text. The back-off estimate computes the probability of a word given the n - 1 preceding words. It is defined recursively as follows. (In the formulae below, O~(W~ -1) is a normalizing factor and dr a discount coefficient. See (Katz, 1987) for a detailed account of the model.) P~ &amp;quot;w 'W n-l&amp;quot; fP(w. lwF--1), if P(w.lw~ -1) &gt; 0 hot nl 1 ): ~c~(w~-l)Pbo(Wnlw~-l), otherwise, where P(w.lw7 -1) is defined as follows: ~ iff(w\[ -1)~0 P(w.lwl '-1) = dI(~7)//(wl-, ~-,), 0, otherwise.</Paragraph>
    </Section>
    <Section position="2" start_page="91" end_page="92" type="sub_section">
      <SectionTitle>
3.2 The Revised Model
</SectionTitle>
      <Paragraph position="0"> In the current context, instead of estimating the probability of a word given the n-1 preceding words, we estimate the probability that the first noun nz in a test triple (nl,v, n2) is the subject of the verb v, i.e., P(S = ilNi = nl, V = v, N2 = n2) where S is an indicator random variable iS = I if the first noun in the triple is the subject of the verb, 0 otherwise).</Paragraph>
      <Paragraph position="1"> In the estimate Pbo(WnlW~ -I) only one relation-the precedence relation--is relevant to the problem; in the current setting, one would like to make use of two implicit relations in the training tuplc subject and object--in order to produce an estimate for P(l\[nl,v, n2). The model below is similar to that in iCollins and Brooks, 1995).</Paragraph>
      <Paragraph position="2"> Let PS be the set of lemmata occurring in the training triples obtained from a sample text, and let c(nl,v, n2,x) denote the frequency count obtained for the training tuple (nl,v, n2,x) (x E {0, 1}). We define the count fso(nl,v, n2) : c(nl,v, n2, 1) + c(n2, v, nx, 0) of nl as the subject and n2 as the object of v. Further, we define the count fs (nl, v) = ~n2ePSfso(nl,v, n2) of nl as the subject of v with any object, and analogously, the count fo(nx,v) of nl as the object of v with any subject. Further, we define the counts fs(v) = ~nl,n2eL c(nl, v, n2, 1)</Paragraph>
      <Paragraph position="4"> where the counts ci(nl,v, n2), and ti(nl,v, n2) are defined as follows: fso(nx,v, n2), ifi = 3</Paragraph>
      <Paragraph position="6"> The defnition of P3 (llnl, v, n2) is analogous to that of Pbo(Wnlw~-X). In the case where the counts are  positive, the numerator in the latter is the number of times the word Wn followed the n-gram w~ -1 in training data, and in the former, the number of times nl occurred as the subject with n2 as the object of v. This count is divided, in the latter, by the number of times the n-gram w~ -1 was seen in training data, and in the former, by the number of times nl was seen as the subject or object of v with n2 as its object/subject respectively.</Paragraph>
      <Paragraph position="7"> However, the definition of P2(1\]nl, v, n2) is somewhat different; it makes use of both the subject and object relations implicit in the tuple. In P2(llnl, v, n2), one combines the evidence for nl as the subject of v (with any object) with that of n2 as the object of v (with any subject).</Paragraph>
      <Paragraph position="8"> At the P1 level, only the counts obtained for the verb are used in the estimate; although for certain verbs some nouns may have definite preferences for appearing in the subject or object position, this information was deemed on empirical grounds not to be appropriate for all verbs.</Paragraph>
      <Paragraph position="9"> When the verb v in a test tuple (nl,v, n2) does not occur in any training tuple, the default Po(llnl,v, n2 ) = 1.0 is used; it reflects the fact that constructs in which the first noun is the subject of the verb are more common.</Paragraph>
    </Section>
    <Section position="3" start_page="92" end_page="92" type="sub_section">
      <SectionTitle>
3.3 Decision Algorithm
</SectionTitle>
      <Paragraph position="0"> The decision algorithm determines for a given test tuple (nl,v, n2), which noun is the subject of the verb v. In case one of the nouns in the tuple is a pronoun, it does not make sense to predict that it is subject/object of a verb based on how often it occurred unambiguously as such in a sample text. In this case, only the information provided by training data for the noun in the test tuple is used. Further, in case both heads in a test tuple are pronouns, the tuple is not considered. The algorithm is as follows.</Paragraph>
      <Paragraph position="1"> If nl and n2 are both nouns, then nl is the subject of v if P3 (llnl, v, n2) &gt; 0.5, else its object.</Paragraph>
      <Paragraph position="2"> In case n2 (but not nl) is a pronoun, redefine ci and ti as follows:</Paragraph>
      <Paragraph position="4"> and calculate P2(llnl,v, n2 ) with these new definitions. If P2(l\[nl,v, n2) &gt; 0.5, then nl is the subject of the verb v, else its object. We proceed analogously in case nl (but not n2) is a pronoun.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="92" end_page="92" type="metho">
    <SectionTitle>
3.4 Related Work
</SectionTitle>
    <Paragraph position="0"> In (Collins and Brooks, 1995) the back-off model is used to decide PP attachment given a tuple (v, nl,p, n2), where v is a verb, nl and n2 are nouns, and p a preposition such that the PP headed by p may be attached either to the verb phrase headed by v or to the NP headed by nx, and n: is the head of the NP governed by p.</Paragraph>
    <Paragraph position="1"> The model presented in section 3.2 is similar to that in (Collins and Brooks, 1995), however, unlike (Collins and Brooks, 1995), who use examples from a treebank to train their model, the procedure described in this paper uses training data automatically obtained from sample text. Accordingly, the model must cope with the fact that training data is much more likely to contain errors. The next section evaluates the decision algorithm as well as the training data obtained by the learning procedure.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML