<?xml version="1.0" standalone="yes"?>
<Paper uid="P84-1114">
  <Title>INTERPRETING SYNTACTICALLY ILL-FORMED SENTENCES</Title>
  <Section position="4" start_page="0" end_page="535" type="metho">
    <SectionTitle>
GRAMMARS AND NATURAL LANGUAGE
</SectionTitle>
    <Paragraph position="0"> It is widely accepted (see Charniak 81) that syntactic knowledge constitutes one of the foundations needed to build natural language interpreters.</Paragraph>
    <Paragraph position="1"> Various kinds of grammatical formalisms have been devised to represent syntactic knowledge in an efficient, flexible and perspicuous way (Winograd 83).</Paragraph>
    <Paragraph position="2"> Even if the formalisms are quite different, the main characteristic shared by all grammars is that they are prescriptive (or normative) in nature. A grammar defines what a sentence is, that is, it specifies what sequences of words are acceptable. This is in sharp contrast with the normal use of language, whose main purpose is the communication of something. Of course all grammars can be (and have been) augmented in order to build a representation of the meaning of the sentences (i.e. something that should be able to carry most of their communicative content), but a meaning can only be obtained for correct sentences.</Paragraph>
    <Paragraph position="3"> Some efforts have recently been devoted to extending the coverage of grammars, in order to deal also with ill-formed sentences (Kwasny &amp; Sondheimer 81, Weischedel &amp; Sondheimer 82, Granger 82). This is usually done by relaxing the constraints imposed by some rules of the grammar, by adding new rules to take care of some kinds of ill-formedness, or by allowing the semantics to intervene when the syntax is not able to process the input. However, most of these approaches present some problems: either the perspicuity and readability of the grammar are reduced, or the control structure of the analyser is made considerably more complex.</Paragraph>
    <Paragraph position="4"> The sources of ill-formedness can be grouped in three classes: ellipsis, conjunctions, and syntactic errors.</Paragraph>
    <Paragraph position="5"> In the case of ellipsis, a fragment such as &quot;John&quot; or &quot;probably&quot; can be understood by a human listener without any particular difficulty, provided that a particular context is given. On the other hand, it is apparent that those fragments are not consistent with the rules defining well-formed sentences.</Paragraph>
    <Paragraph position="6"> Similar problems arise when the grammar attempts to cope with conjunctions. In general, ellipsis is meaningful only if a context external to the expression to be analysed is assumed to exist.</Paragraph>
    <Paragraph position="7"> The situation with conjunctions is rather different: in some sense, the context that must be used to interpret a conjunct is given by the previous conjunct(s), so that it is expressed inside the sentence that has to be analysed. The difficulty in the analysis of conjunctions depends on the fact that not only is the second conjunct often ill-formed (if it is considered as a stand-alone sentence), but it is the particular form of ill-formedness that provides the analyzer with the piece of information needed to decide what the syntactic role of that conjunct is (or, if we assume that the result of the syntactic analysis is represented in the form of a tree, to decide where the constituent expressed by the conjunct has to be appended in the syntactic tree). For this reason, in the following sentences the second conjuncts have quite different roles:
      John loves Mary and Susy (1)
      John loves Mary and Susy Fred (2)
      John loves Mary and hates Violet (3)
      Thus, as in the case of ellipsis, a syntactic analyser designed to handle conjunctions must be able to operate on ill-formed fragments, but with the additional difficulty of modifying the parse tree on the basis of the type of ill-formedness.</Paragraph>
    <Paragraph position="8"> The last source of ill-formedness that we will consider is syntactic errors. Unlike the previous cases, it is almost impossible to list all possible mistakes that a person could make in writing a sentence. Probably, most of them cannot be considered as syntactic errors (e.g. misspelling of words or wrong markers for a given case of a verb), but there are also errors that have purely syntactic grounds. Some noticeable examples are agreement errors, ordering errors and errors in verb tenses. An example of each of them is reported below:
      John love Mary (4)
      John is going probably to home (5)
      Yesterday I have eaten a good cake (6)
      Even if a more detailed discussion appears in the fifth section of this paper, it is worth noting three points here:
      - most native English speakers will probably never make such errors, but, firstly, they could easily be made by non-native speakers and, secondly, at least the error exemplified in (4) could result from a typing error;
      - errors of that kind are more frequent in Italian, since it is richly inflectional;
      - even if the first and third types of error can be (more or less) easily handled by means of relaxation techniques (Kwasny &amp; Sondheimer 81), this is not the case for ordering errors; this is due to the fact that the agreement and tense constraints are expressed &quot;explicitly&quot; in the grammar (e.g. by an augmentation), whereas the order is specified implicitly (i.e. rigidly embodied in the grammar itself).</Paragraph>
    <Paragraph position="9"> The analysis of the problems mentioned in this section, together with some other considerations that are not worth being discussed extensively here (regarding, for instance, garden paths), led us to the design of a formalism for representing syntactic knowledge that splits it into two levels.</Paragraph>
    <Paragraph position="10"> The first level contains a set of rules that, in our intention, characterize the meaningful sentences. It can be questioned whether rules regarding meaning can be considered as syntactic rules.</Paragraph>
    <Paragraph position="11"> Our opinion is that the syntactic categories associated with natural language words have a strong semantic bias (for a thorough discussion of this thesis, see Lyons 77, Chapt. 11). For this reason, we defined a set of node types that have to be used in building the tree representing the syntactic structure of the sentence. These node types (reported in table 1) are associated with the syntactic categories, and the topological constraints that govern the attachment of nodes constitute the basic filter which selects the &quot;meaningful&quot; fragments of a sentence. As an example of this kind of constraint, it is unreasonable to assume that an ADJ node can be attached anywhere other than to a REF node (with the exception of verbs having a copulative function, e.g. to be, to seem, to taste etc.). For this reason, independently of its position in the sentence, we can exclude some kinds of constructs (e.g. ADJ-ADJ attachment) as meaningless. When a rule of the first set is executed it (normally) involves the creation of a new node (possibly more than one) and its attachment to the syntactic tree which was built up to that time.
      [Table 1, caption fragment: ... the name (actual and extended); the second one contains the classical syntactic categories associated with the node type.]</Paragraph>
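    <Paragraph> As a rough illustration of the topological filter described above, the following Python sketch checks whether a node of one type may be attached to a node of another type. Only the ADJ and REF node types, the ADJ-ADJ exclusion and the copulative-verb exception come from the text; the table layout, the choice of a REL node as the attachment point for the copulative case and the name can_attach are assumptions made for the example, not the authors' implementation.

```python
# Hypothetical attachment table: child node type -> admissible parent types.
# Only the ADJ/REF pair is taken from the paper; the layout is illustrative.
ALLOWED_PARENTS = {
    "ADJ": {"REF"},
}

# Verbs with a copulative function (examples given in the text).
COPULATIVE_VERBS = {"be", "seem", "taste"}


def can_attach(child_type, parent_type, parent_head=None):
    """Return True if the topological constraints admit the attachment."""
    if parent_type in ALLOWED_PARENTS.get(child_type, set()):
        return True
    # Assumed exception: an ADJ may hang from a REL headed by a copulative verb.
    if child_type == "ADJ" and parent_type == "REL":
        return parent_head in COPULATIVE_VERBS
    return False


print(can_attach("ADJ", "REF"))          # True
print(can_attach("ADJ", "ADJ"))          # False: ADJ-ADJ is excluded
print(can_attach("ADJ", "REL", "seem"))  # True: copulative exception
```

    Because the check looks only at node types (plus the head verb in one case), it can be applied wherever the word occurs in the sentence, which is the property the text relies on.</Paragraph>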
    <Paragraph position="12"> Because of the limited knowledge used to hypothesize the attachment point, it can often happen that the parser makes the wrong choice. Such an error can be detected by using two different knowledge sources: higher-level syntactic constraints and semantics. The first of them contains the rules that define the well-formedness of sentences (in particular gender-number agreement rules and ordering rules), whereas the second knowledge source tells whether an attachment is semantically acceptable (of course, even if a REF-ADJ attachment is consistent with the topological constraints, not all adjectives can be used to qualify a given noun). The semantic checks are done by accessing a semantic net organized in two levels: the first of them (external) concerns the acceptable surface structures (e.g. case frames for verbs), whilst the second one (internal) is concerned with the actual semantics of the domain (e.g. subsetting among classes).</Paragraph>
    <Paragraph position="13"> Note: the rules embodying these constraints are expressed in procedural form. Even if the lack of a declarative representation makes the design and the maintenance of the rules more difficult, they are made more efficient in terms of execution time by taking into account the context where the word occurs (involving a limited one-word lookahead).</Paragraph>
    <Paragraph position="14"> Because of the frequency of this kind of wrong hypothesization, an effective computational tool must be used to restructure the tree: this tool consists of what we call &quot;natural changes&quot;, which are simple pattern-action rules able to move constituents around; their purpose is to provide the parser with an alternative hypothesis when a given one has failed. Whereas the natural changes are triggered in the same way whether the inconsistency is syntactic or semantic, different courses of action take place if the changes cannot produce any acceptable alternative hypothesis: if the error is of the syntactic type, then the first hypothesis is maintained but a warning message is sent to the user; if the error is semantic, then the current interpretation of the fragment is considered unacceptable and, if one or more choice points were previously met, the parser backtracks; otherwise the analysis fails. More details about the use of backup, as well as about other topics related to the parsing strategy, can be found in (Lesmo &amp; Torasso 83).</Paragraph>
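    <Paragraph> The different courses of action described above can be summarised in a few lines of Python. This is only a sketch of the decision policy as stated in the text; the function name, the string labels and the representation of choice points as a list are assumptions introduced for the example.

```python
def on_failed_natural_changes(error_kind, choice_points):
    """Decide what to do when no natural change yields an acceptable hypothesis."""
    if error_kind == "syntactic":
        # Weak constraint: keep the first hypothesis, but tell the user.
        return "keep-first-hypothesis-and-warn"
    if error_kind == "semantic":
        # Strong constraint: the current interpretation is rejected.
        return "backtrack" if choice_points else "fail"
    raise ValueError("unknown error kind: " + error_kind)


print(on_failed_natural_changes("syntactic", []))        # keep-first-hypothesis-and-warn
print(on_failed_natural_changes("semantic", ["cp1"]))    # backtrack
print(on_failed_natural_changes("semantic", []))         # fail
```

    The asymmetry is the point of the design: a syntactic violation never discards an interpretation on its own, whereas a semantic one always does.</Paragraph>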
    <Paragraph position="15"> A problem which must be faced when a natural change is stimulated is the choice of the best interpretation. Let us suppose that an agreement between an adjective and a noun is violated. In this case the natural change MOVE UP tries to attach the adjective to a REF node which is at a higher level than the REF which the adjective is currently attached to. The new attachment stimulates the rules of the second set (that is, the rules verifying the agreement and the word ordering) and the semantic ones. It is possible that the semantic rules signal that the new attachment is not admissible from a semantic point of view. At this point, if no alternative attachment is possible, the system has to consider the first interpretation as the best one, since it violates only the &quot;weak&quot; syntactic constraints.</Paragraph>
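    <Paragraph> A minimal Python sketch of the MOVE UP behaviour just described is given below. The Node class, the feature names, the semantically_ok callback and the Italian words used in the usage lines are all hypothetical; the paper does not specify how nodes or checks are represented, so this should be read as one possible rendering of the idea rather than the actual rule.

```python
class Node:
    def __init__(self, kind, word, features=None, parent=None):
        self.kind = kind              # e.g. "REF", "ADJ", "REL"
        self.word = word
        self.features = features or {}
        self.parent = parent


def agrees(adj, ref):
    """Weak syntactic check: gender and number must match."""
    return all(adj.features.get(f) == ref.features.get(f)
               for f in ("gender", "number"))


def move_up(adj, semantically_ok):
    """Try to re-attach adj to a higher REF; otherwise keep the first attachment."""
    candidate = adj.parent.parent
    while candidate is not None:
        if (candidate.kind == "REF" and agrees(adj, candidate)
                and semantically_ok(adj, candidate)):
            adj.parent = candidate
            return "moved"
        candidate = candidate.parent
    # No admissible alternative: the first interpretation only violates
    # the weak syntactic constraints, so it is kept with a warning.
    return "kept-with-warning"


# Invented example: a masculine adjective first attached to a feminine noun.
libro = Node("REF", "libro", {"gender": "m", "number": "sg"})
ragazza = Node("REF", "ragazza", {"gender": "f", "number": "sg"}, parent=libro)
rosso = Node("ADJ", "rosso", {"gender": "m", "number": "sg"}, parent=ragazza)

print(move_up(rosso, lambda adj, ref: True))  # moved
print(rosso.parent.word)                      # libro
```

    If the semantic callback rejects every higher REF, the function leaves the adjective where it was, mirroring the choice of the first interpretation as the best one.</Paragraph>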
  </Section>
  <Section position="5" start_page="535" end_page="536" type="metho">
    <SectionTitle>
ELLIPSIS
</SectionTitle>
    <Paragraph position="0"> &quot;Ellipsis&quot; is a Greek word (elleipsis) roughly corresponding to &quot;lack, omission&quot;, that is used, to take a dictionary definition, to stand for &quot;omission of one or more words that can easily be subsumed&quot;. Even if all components of the definition are fundamental, we want to stress the presence of the adverb &quot;easily&quot;. It is consistent with the observation that, whereas other phenomena occurring in natural language (e.g. garden paths) require a conscious effort in the listener, elliptical sentences are understood without any difficulty. On the other hand, most current grammatical formalisms are not able to account for this ease in understanding ellipsis; note the importance that is often placed on the ability to decide as soon as possible what the allowable form of a given constituent is (Buchenko et al. 83). This is due to the necessity of triggering in advance a suitably restricted set of grammar rules. In our case this is not required: the first-level rules will work the same way independently of the global context where a given word or constituent occurs (this is not true for &quot;local&quot; contexts in the current version of the system: see note 1); the consistency with the rules which govern the construction of well-formed sentences will be tested afterwards. This is particularly useful for handling elliptical fragments.</Paragraph>
    <Paragraph position="1"> Let us see through a pair of examples what the behaviour of the parser is in such situations.</Paragraph>
    <Paragraph position="2"> Example (1) is reported below:
      John (1)
      The rules associated with the category &quot;noun&quot; (note that the first-level rules are grouped in packets associated with syntactic categories), in case the analysis is at the beginning of the sentence, cause the building of the sentence reported below:</Paragraph>
    <Paragraph position="4"> When the end of the sentence is encountered, the structure is recognized as being incomplete and a pattern matching procedure applied to any preceding question can reconstruct its actual meaning. What must be noticed is that the first-level syntactic rules used to analyze the fragment are exactly the same as those used to analyze complete and correct sentences.</Paragraph>
  </Section>
  <Section position="6" start_page="536" end_page="536" type="metho">
    <SectionTitle>
CONJUNCTIONS
</SectionTitle>
    <Paragraph position="0"> The kind of processing that occurs in handling conjunctions requires the introduction of rather different constraints. The first interpretation produced for sentences 3) and 4) after the fragment &quot;John loves Mary and Susy&quot; has been analyzed is reported in fig. 1a. This interpretation is confirmed when the end of sentence 3) is encountered (so that the final structure is the one shown in fig. 1a).</Paragraph>
    <Paragraph position="1"> On the contrary, when the name &quot;Fred&quot; is scanned in sentence 4), it cannot be attached to &quot;Susy&quot; (excluding the possibility that &quot;Fred&quot; is her family name) and the attempt to move it up to &quot;loves&quot; causes a semantic error (three unmarked cases for &quot;love&quot;). At this point another &quot;natural change&quot; is triggered, which handles conjunctions. It tries to move up the &quot;and&quot; node, producing the structure of fig. 1b, which is accepted as the correct one. Note, however, that this kind of natural change is much more complex than the standard ones. For example, in the reported examples two new nodes have to be built: the empty REL node (this is done easily since only two nodes of the same type can be connected via &quot;and&quot;) and the &quot;UNMARKED&quot; connection (for which an explicit request of creation and attachment must be issued).
      A final observation regards the fact that the parser assumes that the first acceptable interpretation is the right one. This implies that a sentence of the form (see EX4 in Huang 83, p. 82) &quot;The man with the telescope and the woman with the umbrella kicked the ball&quot; would be interpreted as &quot;The man with the telescope and with the woman with the umbrella kicked the ball&quot;, which is not the most natural interpretation for a human listener. However, Italian always explicitly expresses the number of the verb (i.e. plural in this case), so that the Italian translation of the sentence would be analyzed correctly.</Paragraph>
  </Section>
  <Section position="7" start_page="536" end_page="538" type="metho">
    <SectionTitle>
SYNTACTIC ERRORS
</SectionTitle>
    <Paragraph position="0"> The system tolerates and possibly recovers from the following different kinds of errors:
      - lexical errors
      - agreement errors
      - errors in the ordering of the constituents
      - extra cases
      (note that only the second and the third kinds of errors are actual syntactic errors). As regards the errors at the lexical level, they are detected when the morphological analyzer tries to decompose a given word in &quot;root + suffix&quot; form. When no decomposition is possible or none of the obtained roots occurs in the dictionary, the system asks the user about the possibility that the input word is misspelled. In the affirmative case, the user can retype the word, whereas in the opposite case the system asks the user to provide it with some pieces of information such as the syntactic category of the word, its normalized form (i.e. its root), the gender, the number, etc.; moreover, the system asks what semantic object the word refers to. In this way the analysis of the sentence can go on and possibly an interpretation is constructed. However, it has to be pointed out that the information provided by the user during the analysis of the sentence is not always sufficient for the system to complete the analysis. In fact, the current version of the system does not have the capability of restructuring the semantic net dynamically, so that the system can continue the analysis only when the semantic object denoted by the unknown word is already present in the net.</Paragraph>
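    <Paragraph> The detection of lexical errors described above can be sketched as follows. The toy root list, the suffix list and the function names are invented for the example (the paper does not describe the morphological analyzer or its dictionary in detail), and the dialogue with the user is reduced to a returned label.

```python
# Toy lexicon and suffix list; both are invented for the example.
ROOTS = {"giornal", "compr"}
SUFFIXES = ["e", "i", "ato", "are"]


def decompositions(word):
    """Return every (root, suffix) split whose root is in the lexicon."""
    found = []
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            root = word[:len(word) - len(suffix)]
            if root in ROOTS:
                found.append((root, suffix))
    return found


def analyse(word):
    """Return the splits, or a request to ask the user if none exists."""
    splits = decompositions(word)
    if splits:
        return ("known", splits)
    # No decomposition, or no obtained root occurs in the dictionary.
    return ("ask-user-if-misspelled", [])


print(analyse("giornale"))   # ('known', [('giornal', 'e')])
print(analyse("giornake"))   # ('ask-user-if-misspelled', [])
```

    In the system described in the text, the second outcome would start the interaction in which the user either retypes the word or supplies its category, root, gender, number and semantic object.</Paragraph>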
    <Paragraph position="1"> As regards &quot;agreement errors&quot;, there is a large variety of error types grouped under this label: a) a first kind refers to the agreement in number and gender between the noun and the determiner and between the noun and the adjectives. It is worth noticing that this kind of error is uncommon in Italian, because the suffixes for masculine and feminine and for singular and plural are in many cases quite different.</Paragraph>
    <Paragraph position="2"> b) A slightly more frequent error concerns the agreement in number, gender and person between the subject and the verb. Since in Italian the suffixes indicating the different persons of the verb, its tense and mood are quite different, people whose mother tongue is Italian usually do not make this kind of mistake.</Paragraph>
    <Paragraph position="3"> c) Another kind of agreement refers to the relationships existing between the moods and the tenses of the verbs occurring in the main sentence and its subordinates. The rules, which are quite complex since they derive from the &quot;consecutio temporum&quot; of Latin, are often violated, so that this kind of error must be tolerated by the system. In this case the procedure which has the task of verifying the agreement emits a warning message when the rules are violated but, unlike cases a) and b), it does not try to restructure the parse tree via &quot;natural changes&quot;, since in most cases no alternative interpretation exists.</Paragraph>
    <Paragraph position="4"> The framework we have provided is particularly useful for treating errors in the ordering of the constituents: in fact, the order is checked only when a given sentence (possibly a subordinate) has been completed. This happens when the REL node that heads the clause (main or subordinate) is closed, that is, when a punctuation mark is encountered or a new node is attached to a node which is (in the parse tree) at a level higher than the REL currently being analyzed. Before stimulating the ordering rules, the system checks that the case frame of REL has been correctly filled, that is, that all the cases attached to REL are compatible with the head and with one another.</Paragraph>
    <Paragraph position="5"> Only in this case is a set of rules activated, depending on the sentence type (it is apparent that the constituent order is different in a declarative, interrogative or relative clause). Each rule represents a legitimate ordering of the constituents and the rules are ordered in decreasing degree of acceptability. The rules are matched in turn against the actual case frame of the verb acting as head of the clause under examination; if no rule matches, a warning is issued to signal to the user that something has gone wrong in the ordering; in any case, the interpretation of the clause obtained by accessing the semantic net is maintained and the analysis goes on if the entire sentence has not yet been scanned. A similar (but simpler) processing occurs for a REF node with respect to the adjectives attached to it.</Paragraph>
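    <Paragraph> The matching of ordering rules described above can be pictured with the following Python sketch. The rule table, the role labels and the function name are hypothetical: the paper states only that rules are grouped by sentence type, ordered by decreasing acceptability, and matched in turn, with a warning (not a rejection) when none applies.

```python
# Hypothetical ordering rules per clause type, most acceptable first.
ORDER_RULES = {
    "declarative": [
        ("SUBJ", "VERB", "OBJ"),
        ("VERB", "OBJ", "SUBJ"),
    ],
}


def check_order(clause_type, observed):
    """Return (rank, warning); rank is None when no rule matches."""
    for rank, rule in enumerate(ORDER_RULES.get(clause_type, [])):
        if tuple(observed) == rule:
            return rank, None
    # The interpretation is kept; the user is only warned.
    return None, "warning: unexpected constituent order"


print(check_order("declarative", ["SUBJ", "VERB", "OBJ"]))  # (0, None)
print(check_order("declarative", ["OBJ", "SUBJ", "VERB"]))  # (None, 'warning: unexpected constituent order')
```

    Returning the rank of the matching rule keeps the degree of acceptability available to the caller, while the warning path never discards the semantic interpretation.</Paragraph>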
    <Paragraph position="6"> There are also cases which are more difficult to treat than the ones involving violations in the word ordering. In fact, a sentence like &quot;Il giornale lo ha comprato Giovanni stamattina&quot; (literally &quot;The newspaper it has bought John this morning&quot;) involves not only word order violations (the syntactic object occurs in the first position in the sentence), but also a case denoted by &quot;lo&quot; (&quot;it&quot;) which duplicates the object. Such sentences are clearly incorrect from a syntactic point of view as well as, in principle, from a semantic one (wrong case frame), but they are perfectly understandable and quite frequent because they allow one to identify the object as the focus of the utterance without passivizing the sentence.</Paragraph>
    <Paragraph position="7"> The treatment of such kinds of errors requires only relatively inexpensive modifications to the way the semantic net is accessed. It is worth noticing, in fact, that the syntactic object (&quot;il giornale&quot;) is attached to a REL node which is empty when this attachment is performed. The semantic and agreement check procedures are stimulated but are immediately suspended since the REL node is empty.</Paragraph>
    <Paragraph position="8"> Similarly the pronoun &quot;lo&quot; is attached to the REL and the corresponding check procedures are suspended. When the REL node has been filled with &quot;comprato&quot; the suspended checks are resumed. The semantic procedure is able, by inspecting the semantic net, to state that &quot;giornale&quot; may fill the &quot;object&quot; role, so that when the previously suspended semantic check is executed, it concludes that &quot;lo&quot; (&quot;it&quot;) cannot be attached to the REL filled with &quot;comprare&quot; (&quot;buy&quot;) since the object role has already been filled.
      Instead of rejecting the current interpretation by stimulating the natural changes and possibly the backup mechanism, a modification of the parsing strategy consists in attaching a warning to the REF node containing the pronoun &quot;lo&quot; and in going on with the sentence analysis. When the sentence has been completely scanned and, consequently, it is possible to perform a global check on the actual case frame of &quot;comprare&quot;, the semantic procedure decides that &quot;lo&quot; is simply a repetition of the object and therefore it may be disregarded. In this way the interpretation of the sentence is possible, but the warning attached to the REF node containing &quot;lo&quot; is output to the user.</Paragraph>
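    <Paragraph> The handling of the duplicated object can be approximated by the sketch below. It is a simplification under stated assumptions: the suspended checks are modelled as a list of (word, role, is_pronoun) triples replayed once the verb is known, the decision that the pronoun is a repetition is taken immediately rather than in a final global check, and all names are hypothetical.

```python
def fill_case_frame(verb_roles, attachments):
    """Assign attachments to roles once the verb is known.

    verb_roles: roles the verb admits, e.g. ["subject", "object"].
    attachments: (word, role, is_pronoun) triples, in the order in which
    their suspended checks are resumed.
    """
    frame, warnings = {}, []
    for word, role, is_pronoun in attachments:
        if role not in verb_roles:
            raise ValueError("no such role for this verb: " + role)
        if role in frame:
            if is_pronoun:
                # Duplicate of an already filled role: warn and disregard.
                warnings.append(word + " repeats the " + role)
                continue
            raise ValueError("role filled twice: " + role)
        frame[role] = word
    return frame, warnings


frame, warnings = fill_case_frame(
    ["subject", "object"],
    [("giornale", "object", False),
     ("lo", "object", True),
     ("Giovanni", "subject", False)],
)
print(frame)     # {'object': 'giornale', 'subject': 'Giovanni'}
print(warnings)  # ['lo repeats the object']
```

    The interpretation is built from the frame, and the collected warning is what would be shown to the user at the end of the analysis.</Paragraph>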
  </Section>
</Paper>