<?xml version="1.0" standalone="yes"?> <Paper uid="P04-1082"> <Title>Using linguistic principles to recover empty categories</Title> <Section position="3" start_page="1" end_page="2" type="metho"> <SectionTitle> 3 Previous work </SectionTitle> <Paragraph position="0"> Previous approaches to this task have all been learning-based. Collins' (1997) Model 3 integrates the detection and resolution of WH-traces in relative clauses into a lexicalized PCFG. Collins' results are not directly comparable to the works cited below, since he does not provide a separate evaluation of the empty category detection and resolution task.</Paragraph> <Paragraph position="1"> Johnson (2002) proposes a pattern-matching algorithm, in which the minimal connected tree fragments containing an empty node and its antecedent(s) are extracted from the training corpus, and matched at runtime to an input tree.</Paragraph> <Paragraph position="2"> As in the present approach, Johnson inserts empty nodes as a post-process on an existing tree. He proposes an evaluation metric (discussed further below), and presents results for both detection and detection plus resolution, given two different kinds of input: perfect trees (with empty nodes removed) and parser output.</Paragraph> <Paragraph position="3"> Dienes and Dubey (2003a,b), on the other hand, integrate their empty node resolution algorithm into their own PCFG parser. They first locate empty nodes in the string, taking a POS-tagged string as input, and outputting a POS-tagged string with labeled empty nodes inserted. The PCFG parser is then trained, using the enhanced strings as input, without inserting any additional empty nodes. Antecedent resolution is handled by a separate post-process. Using Johnson's (2002) evaluation metric, Dienes and Dubey present results on the detection task alone (i.e., inserting empty categories into the POS-tagged string), as well as on the combined detection and resolution tasks in combination with their parser.</Paragraph> <Paragraph position="4"> Higgins (2003) considers only the detection and resolution of WH-traces, and only evaluates the results given perfect input. Higgins' method, like Johnson's (2002) and the present one, involves post-processing of trees. Higgins' results are not directly comparable to the other works cited, since he assumes all WH-phrases as given, even those that are themselves empty.</Paragraph> </Section> <Section position="4" start_page="2" end_page="3" type="metho"> <SectionTitle> 4 The recovery algorithm </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="2" end_page="3" type="sub_section"> <SectionTitle> 4.1 The algorithm </SectionTitle> <Paragraph position="0"> The proposed algorithm for recovering empty categories is shown in Figure 1; the algorithm walks the tree from top to bottom, at each node X deterministically inserting an empty category of a given type (usually as a daughter of X) if the syntactic context for that type is met by X. 
It makes four separate passes over the tree, on each pass applying a different set of rules.</Paragraph> <Paragraph position="1"> Figure 1: The algorithm for recovering empty categories.
1 for each tree, iterate over nodes from top down
2   for each node X
3     try to insert NP* in X
4     try to insert 0 in X
5     try to insert WHNP 0 or WHADVP 0 in X
6     try to insert *U* in X
7     try to insert a VP ellipsis site in X
8     try to insert S*T* or SBAR in X
9     try to insert trace of topicalized XP in X
10    try to insert trace of extraposition in X
11  for each node X
12    try to insert WH-trace in X
13  for each node X
14    try to insert NP-SBJ * in finite clause X
15  for each node X
16    if X = NP*, try to find antecedent for X</Paragraph> <Paragraph position="2"> The rules called by this algorithm that try to insert empty categories of a particular type specify the syntactic context in which that type of empty category can occur and, if the context exists, specify where to insert the empty category. For example, the category NP*, which conflates the GB categories NP-trace and PRO, occurs typically as the object of a passive verb or as the subject of an infinitive. [Footnote: It is unclear whether Dienes and Dubey's evaluation of empty category detection is based on actual tags provided by the annotation (perfect input), or on the output of a POS-tagger.] [Footnote: NP* is used in roles that go beyond the GB notions of NP-trace and PRO, including e.g. the subject of imperatives; see below.] The rule which tries to insert this category and assign it a function tag is called in line 3 of Figure 1 and given in pseudo-code in Figure 2. Some additional rules are given in the Appendix.</Paragraph> <Paragraph position="3"> Figure 2: The rule that inserts NP*.
if X is a passive VP & X has no complement S
   if there is a postmodifying dangling PP Y
   then insert NP* before all postmodifiers of Y
   else insert NP* before all postmodifiers of X
else if X is a non-finite S and X has no subject
   then insert NP-SBJ* after all premodifiers of X
This rule, which accounts for about half the empty category tokens in the PTB, makes no use of lexical information such as the valency of the verb. This is potentially a problem, since in GB the infinitives that can have NP-trace or PRO as subjects (raising and control infinitives) are distinguished from those that can have overt NPs or WH-trace as subjects (exceptional-Case-marked, or ECM, infinitives), and the distinction relies on the class of the governing verb.</Paragraph> <Paragraph position="4"> Nevertheless, the rules that insert empty nodes do not have access to a lexicon, and very little lexical information is encoded in the rules: reference is made in the rules to individual function words such as complementizers, auxiliaries, and the infinitival marker to, but never to lexical properties of content words such as valency or the raising/ECM distinction. In fact, the only reference to content words at all is in the rule which tries to insert null WH-phrases, called in line 5 of Figure 1: when this rule has found a relative clause in which it needs to insert a null WH-phrase, it checks whether the head of the NP the relative clause modifies is reason(s), way(s), time(s), day(s), or place(s); if it is, then it inserts WHADVP with the appropriate function tag, rather than WHNP.</Paragraph> <Paragraph position="5"> The rule shown in Figure 2 depends for its successful application on the system's being able to identify passives, non-finite sentences, heads of phrases (to identify pre- and post-modifiers), and functional information such as subject; similar information is accessed by the other rules used in the algorithm.
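To make the rule in Figure 2 concrete, a minimal Python sketch is given below. The Node class, the helper predicates (is_passive_vp, has_complement_s, is_nonfinite_s, has_subject, dangling_pp), and the head-based modifier split are illustrative assumptions made for this example, not the paper's actual implementation; the real system derives this information from treebank labels and function tags as described in the text.

# Illustrative sketch of the NP* insertion rule of Figure 2 over a toy tree
# structure.  All helper predicates are crude, hypothetical approximations
# of the treebank machinery the paper assumes.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str                              # e.g. "VP", "S", "NP-SBJ", "-NONE-"
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None              # terminal string, if any
    head: int = 0                           # index of the head daughter

def empty(label: str, trace: str) -> Node:
    """Build an empty-category node such as (NP (-NONE- *))."""
    return Node(label, [Node("-NONE-", word=trace)])

def is_passive_vp(x: Node) -> bool:
    # crude: a VP whose head daughter is a past participle
    return x.label == "VP" and len(x.children) > 0 and x.children[x.head].label == "VBN"

def has_complement_s(x: Node) -> bool:
    return any(c.label.split("-")[0].startswith("S") for c in x.children)

def is_nonfinite_s(x: Node) -> bool:
    # crude: an S whose VP daughter starts with "to" or a gerund
    return x.label.split("-")[0] == "S" and any(
        c.label == "VP" and c.children and c.children[0].label in ("TO", "VBG")
        for c in x.children)

def has_subject(x: Node) -> bool:
    return any("-SBJ" in c.label for c in x.children)

def dangling_pp(x: Node) -> Optional[Node]:
    # a postmodifying PP with no overt NP object
    for c in x.children[x.head + 1:]:
        if c.label == "PP" and not any(g.label.startswith("NP") for g in c.children):
            return c
    return None

def try_insert_np_star(x: Node) -> None:
    """The rule of Figure 2: insert NP* (or NP-SBJ*) in node X if its context matches."""
    if is_passive_vp(x) and not has_complement_s(x):
        target = dangling_pp(x) or x
        i = target.head + 1                           # position just after the head
        target.children.insert(i, empty("NP", "*"))   # before all postmodifiers
    elif is_nonfinite_s(x) and not has_subject(x):
        # insert the subject after all premodifiers of X, i.e. before the head VP
        i = next((j for j, c in enumerate(x.children) if c.label == "VP"), len(x.children))
        x.children.insert(i, empty("NP-SBJ", "*"))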
Simple functions to identify passives, etc. are therefore called by the implemented versions of these rules. Functional information, such as subject, can be gleaned from the function tags in the treebank annotation; the rules make frequent use of a variety of function tags as they occur on various nodes.</Paragraph> <Paragraph position="6"> The output of Charniak's parser (Charniak, 2000), however, does not include function tags, so in order for the algorithm to work properly on parser output (see Section 5), additional functions were written to approximate the required tags. Presumably, the accuracy of the algorithm on parser output would be enhanced by accurate prior assignment of the tags to all relevant nodes, as in Blaheta and Charniak (2000) (see also Section 5).</Paragraph> <Paragraph position="7"> Each empty category insertion rule, in addition to inserting an empty node in the tree, may also assign a function tag to the empty node. This is illustrated in Figure 2, where the final line inserts NP* with the function tag SBJ in the case where it is the subject of an infinitival clause.</Paragraph> <Paragraph position="8"> The rule that inserts WH-trace (called in line 12 of Figure 1) takes a WHXP needing a trace as input, and walks the tree until an appropriate insertion site is found (see the Appendix for a fuller description). Since this rule requires a WHXP as input, and that WHXP may itself be an empty category (inserted by an earlier rule), it is handled in a separate pass through the tree.</Paragraph> <Paragraph position="9"> A separate rule inserts NP* as the subject in sentences which have no overt subject and which have not had a subject inserted by any of the other rules. Most commonly, these are imperative sentences, but calling this rule in a separate pass through the tree, as in Figure 1, ensures that any subject position missed by the other rules is filled. Finally, a separate rule tries to find an antecedent for NP* under certain conditions. The antecedent of NP* may be an empty node inserted by rules in any of the first three passes through the tree, even the subject of an imperative; therefore this rule is applied in a separate pass through the tree. This rule is also fairly simple, assigning the local subject as antecedent for a non-subject NP*, while for an NP* in the subject position of a non-finite S it searches up the tree, given certain locality conditions, for another NP subject.</Paragraph> <Paragraph position="10"> All the rules that insert empty categories are fairly simple, and derive straightforwardly from standard GB theory and from the annotation guidelines. The most complex is the rule that inserts WH-trace when it finds a WHXP daughter of SBAR; most are about as simple as the rule shown in Figure 2, some more so. Representative examples are given in the Appendix.</Paragraph>
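To make the last of these passes concrete, here is a minimal Python sketch of the antecedent-finding rule for NP* as described above. The Node structure repeats the toy representation of the earlier sketch, the clause and subject tests are simplistic stand-ins, and the locality conditions mentioned in the text are deliberately omitted.

# Sketch of the fourth pass: antecedent resolution for NP*.
# A non-subject NP* takes the subject of its local clause as antecedent;
# an NP* in subject position searches up the tree for a higher NP subject.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None
    antecedent: Optional["Node"] = None

def is_np_star(n: Node) -> bool:
    return n.label.startswith("NP") and len(n.children) == 1 and \
        n.children[0].label == "-NONE-" and n.children[0].word == "*"

def is_clause(n: Node) -> bool:
    return n.label.split("-")[0] in ("S", "SINV", "SBAR", "SBARQ", "SQ")

def subject_of(clause: Node) -> Optional[Node]:
    return next((c for c in clause.children if "-SBJ" in c.label), None)

def resolve_np_star(root: Node) -> None:
    """Walk the tree and attach an antecedent to every NP* for which one is found."""
    def walk(node: Node, ancestors: List[Node]) -> None:
        for child in node.children:
            if is_np_star(child):
                if "-SBJ" not in child.label:
                    # non-subject NP*: antecedent is the local subject
                    clause = next((a for a in reversed(ancestors + [node])
                                   if is_clause(a)), None)
                    child.antecedent = subject_of(clause) if clause else None
                else:
                    # subject NP* (of a non-finite S): look up the tree for an
                    # NP subject; the real locality conditions are omitted here
                    for a in reversed(ancestors):
                        subj = subject_of(a)
                        if is_clause(a) and subj is not None and subj is not child:
                            child.antecedent = subj
                            break
            walk(child, ancestors + [node])
    walk(root, [])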
</Section> <Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 4.2 Development method </SectionTitle> <Paragraph position="0"> After the algorithm was implemented, it was run over sections 1, 3, and 11 of the WSJ portion of the PTB, followed by manual inspection of the trees to perform error analysis, with revisions made as necessary to correct errors. Initially sections 22 and 24 were used for development testing.</Paragraph> <Paragraph position="1"> However, it was found that these two sections differ from each other substantially with respect to the annotation of antecedents of NP* (which is described somewhat vaguely in the annotation guidelines), so all of sections 2-21 were used as a development test corpus. Section 23 was used only for the final evaluation, reported in Section 5 below.</Paragraph> </Section> </Section> <Section position="5" start_page="3" end_page="5" type="metho"> <SectionTitle> 5 Evaluation </SectionTitle> <Paragraph position="0"> Following Johnson (2002), the system was evaluated on two different kinds of input: first, on perfect input, i.e., PTB annotations stripped of all empty categories and information related to them; and second, on imperfect input, in this case the output of Charniak's (2000) parser. Each is discussed in turn below.</Paragraph> <Section position="1" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 5.1 Perfect input </SectionTitle> <Paragraph position="0"> The system was run on PTB trees stripped of all empty categories. To facilitate comparison to previous approaches, we used Johnson's label and string position evaluation metric, according to which an empty node is identified by its label plus its string position, and evaluated the detection task alone. We then evaluated detection and resolution combined, identifying each empty category as before, plus the label and string position of its antecedent, if any, again following Johnson's work.</Paragraph> <Paragraph position="1"> The results are shown in Table 2. Precision here and throughout is the percentage of empty nodes proposed by the system that are in the gold standard (section 23 of the PTB), recall is the percentage of empty nodes in the gold standard that are proposed by the system, and F is the harmonic mean of precision and recall.</Paragraph> [Table 2: Detection and resolution of empty categories given perfect input (label + string position method), expressed as percentages.] <Paragraph position="2"> These results compare favorably to previously reported results, exceeding them mainly by achieving higher recall. Johnson (2002) reports 93% precision and 83% recall (F = 88%) for the detection task alone, and 80% precision and 70% recall (F = 75%) for detection plus resolution. In contrast to Johnson (2002) and the present work, Dienes and Dubey (2003a) take a POS-tagged string, rather than a tree, as input; they report 86.5% precision and 72.9% recall (F = 79.1%) on the detection task. For Dienes and Dubey, the further task of finding antecedents for empty categories is integrated with their own PCFG parser, so they report no numbers directly relevant to the task of detection and resolution given perfect input.</Paragraph> </Section> <Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 5.2 Parser output </SectionTitle> <Paragraph position="0"> The system was also run using as input the output of Charniak's parser (Charniak, 2000). The results, again using the label and string position method, are given in Table 3.</Paragraph> [Table 3: Detection and resolution of empty categories on parser output (label + string position method), expressed as percentages.]
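As a concrete illustration of the label + string position scoring used in Tables 2 and 3 (following Johnson, 2002), a minimal sketch is given below; the tuple-based representation of empty nodes is an assumption made for the example, not the paper's actual data format.

# Sketch of label + string position scoring: an empty node counts as correct
# if the system proposes the same label at the same string position as the
# gold standard; for the resolution task the identifier is extended with the
# antecedent's label and string position.
from typing import Set, Tuple

def prf(gold: Set[Tuple], system: Set[Tuple]) -> Tuple[float, float, float]:
    correct = len(gold & system)
    p = correct / len(system) if system else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# detection: each empty node is identified as (empty label, string position)
gold = {("NP*", 3), ("WHNP 0", 7), ("*T*", 9)}
sys_out = {("NP*", 3), ("*T*", 9), ("NP*", 12)}
print(prf(gold, sys_out))        # (0.666..., 0.666..., 0.666...)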
<Paragraph position="1"> Again the results exceed those previously reported. Johnson (2002) reports 85% precision and 74% recall (F = 79%) for the detection task on the output of Charniak's parser, with lower figures for detection plus resolution.</Paragraph> <Paragraph position="2"> Dienes and Dubey (2003b) integrate the results of their detection task into their own PCFG parser, and report 81.5% precision and 68.7% recall (F = 74.6%) on the combined task of detection and resolution.</Paragraph> </Section> <Section position="3" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 5.3 Perfect input with no function tags </SectionTitle> <Paragraph position="0"> The lower results on parser output obviously reflect errors introduced by the parser, but may also be due to the parser not outputting function tags on any nodes. As mentioned in Section 4, it is believed that the results of the current method on parser output would improve if that output were reliably assigned function tags, perhaps along the lines of Blaheta and Charniak (2000).</Paragraph> <Paragraph position="1"> Testing this hypothesis directly is beyond the scope of the present work, but a simple experiment can give some idea of the extent to which the current algorithm relies on function tags in the input. The system was run on PTB trees with all nodes stripped of function tags; the results are given in Table 4.</Paragraph> [Table 4: Detection and resolution of empty categories given perfect input without function tags (label + string position method), expressed as percentages.] <Paragraph position="2"> While not as good as the results on perfect input with function tags, these results are much better than the results on parser output. This suggests that function tag assignment should improve the results shown on parser output, but that the greater part of the difference between the results on perfect input and on parser output is due to errors introduced by the parser.</Paragraph> </Section> <Section position="4" start_page="3" end_page="5" type="sub_section"> <SectionTitle> 5.4 Refining the evaluation </SectionTitle> <Paragraph position="0"> The results reported in the previous subsections are quite good, and demonstrate that the current approach outperforms previously reported approaches on the detection and resolution of empty categories. In this subsection some refinements to the evaluation method are considered.</Paragraph> <Paragraph position="1"> The label and string position method is useful if one sees the task as inserting empty nodes into a string, and thus is quite useful for evaluating systems that detect empty categories without parse trees, as in Dienes and Dubey (2003a). However, if the task is to insert empty nodes into a tree, then the method leads both to false positives and to false negatives. Suppose for example that the sentence When do you expect to finish? has the bracketing shown below, where '1' and '2' indicate two possible locations in the tree for the trace of the WHADVP:
[ When do you expect [ to finish 1 ] 2 ]
Suppose position 1 is correct; i.e. it represents the position of the trace in the gold standard. Since 1 and 2 correspond to the same string position, if a system inserts the trace in position 2, the string position evaluation method will count it as correct. This is a serious problem with the string-based method of evaluation, if one assumes, as seems reasonable, that the purpose of inserting empty categories into trees is to be able to recover semantic information such as predicate-argument structure and modification relations.
In the above example, it is clearly semantically relevant whether the system proposes that when modifies expect instead of finish.</Paragraph> <Paragraph position="2"> Conversely, suppose the sentence Who (besides me) cares? has the bracketing shown below, where positions 1 and 2, on either side of the parenthetical, are possible sites for the WHNP trace, and position 1 represents its placement in the gold standard:
[ Who 1 (besides me) 2 cares ]
If a system places the trace in position 2 instead, the string position method will count it as an error, since 1 and 2 have different string positions.</Paragraph> <Paragraph position="3"> However, it is not at all clear what it means to say that one of those two positions is correct and the other not, since there is no semantic, grammatical, or textual indicator of its exact position. If the task is to be able to recover semantic information using traces, then it does not matter in this case whether the system inserts the trace to the left or to the right of the parenthetical.</Paragraph> <Paragraph position="4"> Given that both false positives and false negatives are possible, I propose that future evaluations of this task should identify empty categories by their label and by their parent category, instead of, or perhaps in addition to, doing so by label and string position. Since the parent of an empty node is always an overt node [Footnote: The only exception is the 0 complementizer and S*T* daughters of the SBAR category in Table 1; but since the entire SBAR is treated as a single empty node for evaluation purposes, this does not pose a problem. I am indebted to two ACL reviewers for calling this to my attention.], the parent could be identified by its label and string position (left and right edges). Resolution is evaluated by a natural extension, by identifying the antecedent (which could itself be an empty category) according to its label and its parent's label and string position. This would serve to identify an empty category by its position in the tree, rather than in the string, and would avoid the false positives and false negatives described above. In addition to an evaluation based on tree position rather than string position, I propose to evaluate the entire recovery task, i.e., including function tag assignment, not just detection and resolution.</Paragraph> <Paragraph position="5"> The revised evaluation is still not perfect: when inserting an NP* or NP*T* into a double-object construction, it clearly matters semantically whether it is the first or the second object, though both positions have the same parent.</Paragraph> <Paragraph position="6"> Ideally, we would evaluate based on a richer set of grammatical relations than are annotated in the PTB, or perhaps based on thematic roles. However, it is difficult to see how to accomplish this without additional annotation. It is probable that constructions of this sort are relatively rare in the PTB in any case, so for now the proposed evaluation method, however imperfect, will suffice.</Paragraph> <Paragraph position="7"> The result of this revised evaluation, given perfect input, is presented in Table 5. The first two rows are comparable to the string-based results in Table 2; the last row, showing the results of the full recovery task (i.e., including antecedents and function tags), is not much lower, suggesting that labeling empty categories with function tags does not pose any serious difficulties.</Paragraph> [Table 5: Detection and resolution of empty categories given perfect input (label + parent method); columns are Task, Precision, Recall, and F, expressed as percentages.]
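A minimal sketch of how the proposed label + parent identification might be computed is given below; the toy Node representation and the span bookkeeping are assumptions made for the example (empty terminals are taken to occupy no string positions), not a specification of the actual evaluation software.

# Sketch of the proposed tree-based identification: an empty category is
# identified by its own label together with its parent's label and the
# parent's string span (left and right edges over overt words).
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None

def is_empty_category(n: Node) -> bool:
    return len(n.children) == 1 and n.children[0].label == "-NONE-"

def empty_node_ids(root: Node) -> Set[Tuple[str, str, int, int]]:
    ids: Set[Tuple[str, str, int, int]] = set()

    def span(node: Node, left: int) -> int:
        """Return the right edge of node; record its empty-category daughters."""
        if node.word is not None:
            # only overt terminals advance the string position
            return left if node.label == "-NONE-" else left + 1
        right = left
        for child in node.children:
            right = span(child, right)
        for child in node.children:
            if is_empty_category(child):
                ids.add((child.label, node.label, left, right))
        return right

    span(root, 0)
    return ids

# The resulting identifier sets for a gold tree and a system tree can be
# scored with the same precision/recall/F computation as in the earlier sketch.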
<Paragraph position="9"> Three similar evaluations were also run, using parser output as input to the algorithm; the results are given in Table 6.</Paragraph> [Table 6: Detection and resolution of empty categories on parser output (label + parent method); columns are Task, Precision, Recall, and F, expressed as percentages.] <Paragraph position="10"> The results here are less impressive, no doubt reflecting errors introduced by the parser in the labeling and bracketing of the parent category, a problem which does not affect a string-based evaluation. However, it does not seem reasonable to have an effective evaluation of empty node insertion in parser output that does not depend to some extent on the correctness of the parse. The fact that our proposed evaluation metric depends more heavily on the accuracy of the input structure may be an unavoidable consequence of using a tree-based evaluation.</Paragraph> </Section> </Section> <Section position="6" start_page="5" end_page="5" type="metho"> <SectionTitle> 6 Discussion </SectionTitle> <Paragraph position="0"> The empty category recovery algorithm reported on here outperforms previously published approaches on the detection and resolution tasks; it also does well on the task of function tag assignment to empty categories, which has not been considered in other work. As suggested in the introduction, the reason a rule-based approach works so well in this domain may be that empty categories are not naturally in the text, but are only inserted by the annotator, who is consciously following explicit linguistic principles, in this case, the principles of early GB theory.</Paragraph> <Paragraph position="1"> As a result, the recovery of empty categories is, for the most part, more amenable to a rule-based approach than to a learning approach. It makes little sense to learn, for example, that NP* occurs as the object of a passive verb or as the subject of certain infinitives in the PTB, if that information is already explicit in the annotation guidelines.</Paragraph> <Paragraph position="2"> This is not to say that learning approaches have nothing to contribute to this task. Information about individual lexical items, such as valency, the raising/ECM distinction, or subject vs. object control, which is presumably most robustly acquired from large amounts of data, would probably help in the task of detecting certain empty categories.</Paragraph> <Paragraph position="3"> Consider for example an input structure V [S to VP]. GB principles, which are enforced in the annotation guidelines, dictate that an empty category must be inserted as the subject of the infinitival S; but exactly which empty category, NP* or NP*T*, depends on properties of the governing verb, including whether it is a raising or control verb, such as seem or try, or an ECM verb, such as believe. In the present algorithm, the rule that inserts NP* applies first, without access to lexical information of any kind, so NP* is inserted, instead of NP*T*, regardless of the value of V.</Paragraph> <Paragraph position="4"> This leads to some errors which might be corrected given learned lexical information.
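As an illustration only, the sketch below shows how a learned or hand-built verb-class lexicon might gate the choice between NP* and NP*T* in this configuration; the verb lists and the function are invented for the example and are not part of the algorithm described in this paper.

# Hypothetical extension: consult a verb-class lexicon before inserting the
# empty subject of an infinitival S governed by V.  Raising/control verbs
# take NP* (NP-trace/PRO); ECM verbs license a WH-trace (NP*T*) subject when
# a WH-dependency is pending.  The verb sets below are tiny illustrative
# samples, not a real lexicon.
RAISING_OR_CONTROL = {"seem", "appear", "try", "persuade"}
ECM = {"believe", "expect", "consider"}

def empty_subject_for(governing_verb: str, wh_dependency_pending: bool) -> str:
    """Choose the empty subject to insert under V [S to VP]."""
    lemma = governing_verb.lower()
    if lemma in ECM and wh_dependency_pending:
        return "NP*T*"          # WH-trace subject of an ECM infinitive
    if lemma in RAISING_OR_CONTROL:
        return "NP*"            # NP-trace / PRO subject
    return "NP*"                # default: the behavior of the current rule

print(empty_subject_for("believe", wh_dependency_pending=True))   # NP*T*
print(empty_subject_for("try", wh_dependency_pending=True))       # NP*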
Such errors are fewer than might have been expected, however: the present system achieved 97.7% precision, with F = 97.5%, on the isolated task of detecting NP*, even without lexical knowledge (see Table 7).</Paragraph> <Paragraph position="5"> A combined learning and rule-based algorithm might stand to make a bigger gain in the task of deciding whether an NP* in subject position has an antecedent or not, and if it does, whether the antecedent is a subject or not. The annotation guidelines and the theory that underlies them are less explicit on the principles underlying this task than they are on the other subtasks. As a result, the accuracy of the current system drops considerably when this task is taken into account, from 97.5% to 86.9% (see Table 7). Dienes and Dubey (2003a), on the other hand, claim this as one of the strengths of their learning-based system.</Paragraph> <Paragraph position="8"> [Table 7: Results for detection and resolution of empty categories by type, using perfect input (label + parent method), expressed as percentages.]</Paragraph> </Section> </Paper>