<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1113">
  <Title>A FLEXIBLE EXAMPLE-BASED PARSER BASED ON THE SSTC&amp;quot;</Title>
  <Section position="3" start_page="687" end_page="688" type="metho">
    <SectionTitle>
2. NON-PROJECTIVE CORRESPONDENCES
IN NATURAL LANGUAGE SENTENCES
</SectionTitle>
    <Paragraph position="0"> In this section, we present some cases where a projective representation tree is found to be inadequate for characterizing the representation tree of certain natural language sentences. The cases illustrated here are featurisation, lexicalisation and crossed dependencies. An example containing a mixture of these non-projective correspondences will also be presented.</Paragraph>
    <Section position="1" start_page="687" end_page="687" type="sub_section">
      <SectionTitle>
2.1 Featurisation
</SectionTitle>
      <Paragraph position="0"> Featurisation occurs when a linguist decides that a particular substring in the sentence should not be represented as a subtree in the representation tree but rather as a collection of features. As illustrated in figure 1, this is the case for prepositions in arguments which can be interpreted as part of the predicate rather than the argument, and should be featurised into the predicate: the particle &quot;up&quot; is featurised as part of the feature properties of the verb &quot;pick&quot; (e.g. &quot;up&quot; in &quot;picks up&quot;).</Paragraph>
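As a minimal illustrative sketch (our own encoding, not the paper's), featurisation can be seen as folding the particle into the verb's feature bundle instead of giving it a node of its own:

```python
# Hypothetical feature-structure view of featurisation: the particle "up"
# is stored as a feature of the verb "pick" rather than as a tree node.
verb = {"lex": "pick", "cat": "v", "features": {}}

# Featurise the particle into the predicate.
verb["features"]["particle"] = "up"

assert verb["features"] == {"particle": "up"}
```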
      <Paragraph position="1"> Figure 1: &quot;He picks up the ball&quot;, with the particle &quot;up&quot; featurised into the verb &quot;picks&quot;.</Paragraph>
    </Section>
    <Section position="2" start_page="687" end_page="687" type="sub_section">
      <SectionTitle>
2.2 Lexicalisation
</SectionTitle>
      <Paragraph position="0"> Lexicalisation is the case where a particular subtree in the representation tree presents the meaning of some part of the string which is not overtly realized in phonological form. Lexicalisation may result from the correspondence of a subtree in the tree to an empty substring in the sentence, or of a substring in the sentence to more than one subtree in the tree. Figure 2 illustrates the sentence &quot;John eats the apple and Mary the pear&quot;, where &quot;eats&quot; in the sentence corresponds to more than one node in the tree.</Paragraph>
      <Paragraph position="1"> Figure 2: &quot;John eats the apple and Mary the pear&quot;.</Paragraph>
    </Section>
    <Section position="3" start_page="687" end_page="688" type="sub_section">
      <SectionTitle>
2.3 Crossed dependencies
</SectionTitle>
      <Paragraph position="0"> The most complicated case of string-tree correspondence is when dependencies are intertwined with each other. It is a very common phenomenon in natural language. In crossed dependencies, a subtree in the tree corresponds to a single substring in the sentence, but the words of the substring are distributed over the whole sentence in a discontinuous manner relative to the subtree they correspond to. An example of crossed dependencies occurs in sentences of the form a^n v b^n c^n (n&gt;0); figure 3 illustrates the representation tree for the string &quot;aa v bb cc&quot; (also written a.1 a.2 v b.1 b.2 c.1 c.2 to show the positions). This is akin to the 'respectively' problem in English sentences like &quot;John and Mary give Paul and Ann trousers and dresses respectively&quot; [4].</Paragraph>
      <Paragraph position="2"> Sometimes the sentence contains a mixture of these non-projective correspondences. Figure 4 illustrates the sentence &quot;He picks the ball up&quot;, which contains both featurisation and crossed dependencies: the particle &quot;up&quot; is separated from its verb &quot;picks&quot; by the noun phrase &quot;the ball&quot; in the string, and &quot;up&quot; is featurised into the verb &quot;picks&quot;.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="688" end_page="688" type="metho">
    <SectionTitle>
3. STRUCTURED STRING-TREE
CORRESPONDENCE (SSTC)
</SectionTitle>
    <Paragraph position="0"> The correspondence between the string on one hand, and its representation of meaning on the other, is defined in terms of finer subcorrespondences between substrings of the sentence and subtrees of the tree. Such a correspondence is made of two interrelated correspondences: one between nodes and substrings, and the other between subtrees and substrings (the substrings being possibly discontinuous in both cases).</Paragraph>
    <Paragraph position="1"> The notation used in the SSTC to denote a correspondence consists of a pair of intervals X/Y attached to each node in the tree, where X (SNODE) denotes the interval containing the substring that corresponds to the node, and Y (STREE) denotes the interval containing the substring that corresponds to the subtree having the node as its root [4].</Paragraph>
    <Paragraph position="2"> Figure 5 illustrates the sentence &quot;all cats eat mice&quot; with its corresponding SSTC. It is a simple projective correspondence. An interval is assigned to each word in the sentence, i.e. (0-1) for &quot;all&quot;, (1-2) for &quot;cats&quot;, (2-3) for &quot;eat&quot; and (3-4) for &quot;mice&quot;. A substring in the sentence that corresponds to a node in the representation tree is denoted by assigning the interval of the substring to the SNODE of the node, e.g.</Paragraph>
    <Paragraph position="3"> the node &quot;cats&quot; with SNODE interval (1-2) corresponds to the word &quot;cats&quot; in the string with the same interval. The correspondence between subtrees and substrings is denoted by the interval assigned to the STREE of each node, e.g. the subtree rooted at the node &quot;eat&quot; with STREE interval (0-4) corresponds to the whole sentence &quot;all cats eat mice&quot;.</Paragraph>
  </Section>
  <Section position="5" start_page="688" end_page="692" type="metho">
    <SectionTitle>
4. USES OF SSTC ANNOTATION IN
EXAMPLE-BASED PARSING
</SectionTitle>
    <Paragraph position="0"> In order to enhance the quality of example-based systems, sentences in the example-base are normally annotated with their constituency or dependency structures, which in turn allows example-based parsing to be established at the structural level. To facilitate such structural annotation, here we annotate the examples based on the Structured String-Tree Correspondence (SSTC). The SSTC is a general structure that can associate, to a string in a language, an arbitrary tree structure as desired by the annotator to be the interpretation structure of the string; more importantly, it provides the facility to specify the correspondence between the string and the associated tree, which can be interpreted for both analysis and synthesis in NLP. These features are very much desired in the design of an annotation scheme, in particular for the treatment of linguistic phenomena which are non-standard, e.g. crossed dependencies [5].</Paragraph>
    <Paragraph position="1"> Since the examples in the example-base are described in terms of SSTCs, each consisting of a sentence (the text), a dependency tree (the linguistic representation) and the mapping between the two (the correspondence), example-based parsing is performed by taking a new input sentence, retrieving the related examples (i.e. examples that contain the same words as the input sentence) from the example-base, and using them to compute the representation tree for the input sentence, guided by the correspondence between the string and the tree, as discussed in the following sections. Figure 6 illustrates the general schema for example-based NL parsing based on the SSTC.</Paragraph>
    <Paragraph position="2">  The example-based approach in MT [1], [2] or [3] relies on the assumption that if two sentences are &quot;close&quot;, their analyses should be &quot;close&quot; too. If the analysis of the first one is known, the analysis of the other can be obtained by making some modifications to the analysis of the first one.</Paragraph>
    <Paragraph position="3"> Each node is tagged with a syntactic category to enable substitution at the category level.</Paragraph>
    <Paragraph position="4">  Here, close means the distance is not too large, and modification means edit operations (insert, delete, replace) [6].</Paragraph>
    <Paragraph position="5"> In most cases, an identical sentence will not occur in the example-base, so the system utilizes examples closely related to the given input sentence (i.e. with a structure similar to the input sentence, or containing some of the words in the input sentence). For that, it is necessary to construct several sub-SSTCs (called substitutions hereafter) for phrases in the input sentence according to their occurrence in the examples from the example-base. These substitutions are then combined together to form a complete SSTC as the output.</Paragraph>
    <Paragraph position="6"> Suppose the system intends to parse the sentence &quot;the old man picks the green lamp up&quot;, depending on the following set of examples representing the example-base.</Paragraph>
    <Paragraph position="7">  The example-base is first processed to retrieve some knowledge related to each word in the example-base to form a knowledge index. Figure 7 shows the knowledge index constructed from the example-base given above. The knowledge retrieved for each word consists of: 1. Example number: The example number of one of the examples which contains this word with this knowledge. Note that each example in the example-base is assigned a number as its identifier.  2. Frequency: The frequency of occurrence in the example-base of this word with the same knowledge.</Paragraph>
    <Paragraph position="8"> 3. Category: Syntactic category of this word.</Paragraph>
    <Paragraph position="9"> 4. Type: Type of this word in the dependency tree (0: terminal, 1: non-terminal).</Paragraph>
    <Paragraph position="10"> - Terminal word: A word at the bottom level of the tree structure, namely a word without any son/s under it (i.e.</Paragraph>
    <Paragraph position="11"> STREE=SNODE in the SSTC annotation).</Paragraph>
    <Paragraph position="12"> - Non-terminal word: A word which is linked to other word/s at the lower level, namely a word that has son/s (i.e.</Paragraph>
    <Paragraph position="13"> STREE≠SNODE in the SSTC annotation).</Paragraph>
    <Paragraph position="14"> 5. Status: Status of this word in the dependency tree (0: root word, 1: non-root word, 2: friend word). - Friend word: In the case of featurisation, if a word is featurised into another word, this word is called a friend of that word, e.g. the word &quot;up&quot; is a friend of the word &quot;picks&quot; in figure 1.</Paragraph>
    <Paragraph position="15"> 6. Parent category: Syntactic category of the parent node of this word in the dependency tree.</Paragraph>
    <Paragraph position="16"> 7. Position: The position of the parent node in the  sentence (0: after this word, 1: before this word). 8. Next knowledge: A pointer to the next possible knowledge of this word. Note that a word might have more than one knowledge entry, e.g. &quot;man&quot; could be a verb or a noun.</Paragraph>
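The eight fields listed above can be sketched as a record; the class and field names below are our own illustration of the knowledge index, not the paper's implementation:

```python
from dataclasses import dataclass

# One knowledge record from the index, with the encodings used in the text:
# type 0 = terminal / 1 = non-terminal; status 0 = root / 1 = non-root /
# 2 = friend; position 0 = parent after this word / 1 = parent before.
@dataclass
class Knowledge:
    example_no: int
    frequency: int
    category: str
    node_type: int
    status: int
    parent_category: str
    parent_position: int
    next_kn: "Knowledge" = None  # next possible knowledge, nil if none

# The record for "the" described in the text: example 1, frequency 4,
# category det, terminal, non-root, parent category n appearing after it.
index = {"the": Knowledge(1, 4, "det", 0, 1, "n", 0)}

assert index["the"].category == "det"
assert index["the"].next_kn is None
```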
    <Paragraph position="17"> Based on the knowledge index constructed in figure 7, the system builds the following table of knowledge for the input sentence &quot;the old man picks the green lamp up&quot;, with intervals the (0-1), old (1-2), man (2-3), picks (3-4), the (4-5), green (5-6), lamp (6-7), up (7-8).  Note that for each word in the input sentence, the system builds a record which contains the word, its SNODE interval, and a linked list of the possible knowledge related to the word as recorded in the knowledge index. The following figure describes an example record for the word &lt;the&gt;: this means the word &lt;the&gt;, snode (0-1); one of the examples that contain the word with this knowledge is example 1; this knowledge is repeated 4 times in the example-base; the category of the word is &lt;det&gt;; it is a terminal node and a non-root node; the parent category is &lt;n&gt;; and the parent appears after it in the sentence.</Paragraph>
    <Paragraph position="18">  Word | Example No. | Frequency | Category | Type | Status | Parent category | Position | Next Kn. The | 1 | 4 | det | 0 | 1 | n | 0 | nil</Paragraph>
    <Paragraph position="19"> old | 4 | 1 | adj | 0 | 1 | n | 0 | nil</Paragraph>
    <Paragraph position="20"> he | 1 | 1 | n | 0 | 1 | v | 0 | nil</Paragraph>
    <Paragraph position="21"> turns | 2 | 1 | v | 1 | 0 | - | - | nil</Paragraph>
    <Paragraph position="22"> ball | 1 | 1 | n | 1 | 1 | v | 1 | nil</Paragraph>
    <Paragraph position="23"> green | 2 | 1 | adj | 0 | 1 | n | 0 | nil</Paragraph>
    <Paragraph position="24"> signal | 2 | 1 | n | 1 | 1 | v | 0 | nil</Paragraph>
    <Paragraph position="25"> on | 2 | 1 | adv | 0 | 1 | v | 1 | nil</Paragraph>
    <Paragraph position="26"> picks | 1 | 1 | v | 1 | 0 | - | - | nil</Paragraph>
    <Paragraph position="27"> off | 3 | 1 | adv | 0 | 1 | v | 1 | nil</Paragraph>
    <Paragraph position="28"> man | 4 | 1 | n | 1 | 1 | v | 0 | nil</Paragraph>
    <Paragraph position="29"> died | 4 | 1 | v | 1 | 0 | - | - | nil</Paragraph>
    <Paragraph position="30"> lamp | 3 | 1 | n | 1 | 1 | v | 0 | nil</Paragraph>
    <Paragraph position="31"> up | 1 | 1 | p | 0 | 2 | v | 1 | nil</Paragraph>
    <Paragraph position="32">  This knowledge will be used to build the substitutions for the input sentence, as we discuss in the next section.</Paragraph>
    <Paragraph position="33">  In order to build substitutions, the system first classifies the words in the input sentence into terminal words and non-terminal words. For each terminal word, the system tries to identify the non-terminal word it may be connected to, based on the syntactic category and the position of the non-terminal word in the input sentence (i.e. before or after the terminal word), guided by the SNODE interval. In the input sentence given above, the terminal words are &quot;the&quot;, &quot;old&quot; and &quot;green&quot;, and based on the knowledge table for the words in the input sentence, they may be connected as son nodes to the first non-terminal with category [n] which appears after them in the input sentence.</Paragraph>
    <Paragraph position="34"> For ( &amp;quot;the&amp;quot; 0-1, and &amp;quot;old&amp;quot; 1-2 ) they are connected as sons to the word (&amp;quot;man&amp;quot; 2-3).</Paragraph>
    <Paragraph position="35"> nowledge I\] Non-terminal I able II wordStn\] I For (&amp;quot;the&amp;quot; 4-5, and &amp;quot;green&amp;quot; 5-6 ) they are connected as sons to the word (&amp;quot;lamp&amp;quot; 6-7).</Paragraph>
    <Paragraph position="37"> The remaining non-terminal words, which are not connected to any terminal word, will be treated as separate substitutions.</Paragraph>
    <Paragraph position="38"> From the input sentence the system builds the following substitutions respectively: man[n] picks[v] lamp[n] up[p]</Paragraph>
    <Paragraph position="40"> Note that this approach is quite similar to the generation of constituents in bottom-up chart parsing, except that the problem of handling multiple overlapping constituents is not addressed here.</Paragraph>
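The attachment step described above can be sketched as follows. This is our own simplified encoding, not the paper's implementation: each word maps to a tuple (category, type, status, parent category, position) taken from the knowledge table, and friend words (status 2) are kept as separate substitutions.

```python
def build_substitutions(words, info):
    """words: tokens; info: word to (category, type, status, parent_cat, pos).
    type 0 = terminal; status 2 = friend; pos 0 = parent after, 1 = before."""
    heads = {}          # head index to list of attached son indices
    attached = set()
    for i, w in enumerate(words):
        cat, ntype, status, pcat, pos = info[w]
        if ntype != 0 or status == 2:
            continue    # only plain terminal words get attached
        # Search for the first non-terminal of the parent category,
        # to the right if pos == 0, otherwise to the left.
        search = range(i + 1, len(words)) if pos == 0 else range(i - 1, -1, -1)
        for j in search:
            if info[words[j]][1] == 1 and info[words[j]][0] == pcat:
                heads.setdefault(j, []).append(i)
                attached.add(i)
                break
    # Every unattached word heads its own substitution.
    return [(words[i], [words[s] for s in heads.get(i, [])])
            for i in range(len(words)) if i not in attached]

words = "the old man picks the green lamp up".split()
info = {
    "the":   ("det", 0, 1, "n", 0),
    "old":   ("adj", 0, 1, "n", 0),
    "man":   ("n",   1, 1, "v", 0),
    "picks": ("v",   1, 0, "",  0),
    "green": ("adj", 0, 1, "n", 0),
    "lamp":  ("n",   1, 1, "v", 0),
    "up":    ("p",   0, 2, "v", 1),
}
subs = build_substitutions(words, info)
# Matches the four substitutions in the text: man, picks, lamp, up.
assert [head for head, _ in subs] == ["man", "picks", "lamp", "up"]
```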
    <Paragraph position="41">  In order to combine the substitutions to form a complete SSTC, the system first finds the non-terminal words of the input sentence which appear as root words of some dependency trees in the example SSTCs. If more than one example is found (as in most cases), the system will calculate the distance between the input sentence and the examples, and the closest example  (namely the one with minimum distance) will be chosen to proceed further.</Paragraph>
    <Paragraph position="42"> In our example, the word &quot;picks&quot; is the only word in the sentence which can be the root word, so example (1), which contains &quot;picks&quot; as its root, will be used as the base to construct the output SSTC. The system first generates the substitutions for example (1) based on the same assumptions mentioned earlier in substitution generation, which are: he[n] picks[v] ball[n] up[p]</Paragraph>
    <Paragraph position="44"> Distance calculation: Here the system utilizes distance calculation to determine the most plausible example, whose SSTC structure will be used as a base to combine the substitutions of the input sentence. We define a heuristic to calculate the distance in terms of editing operations. The editing operations are insertion (ε → p), deletion (p → ε) and replacement (p → s). Edit distances, which have been proposed in many works [7], [8] and [9], reflect a sensible notion, and they can be represented as metrics under some hypotheses. These works define the edit distance as the number of editing operations needed to transfer one word into another form, i.e.</Paragraph>
    <Paragraph position="45"> how many characters need to be edited based on insertion, deletion or replacement. Since words are strings of characters and sentences are strings of words, edit distances are hence not confined to words; they may also be used on sentences [6].</Paragraph>
    <Paragraph position="46"> With a similar idea, we define the edit distance as follows: (i) The distance is calculated at the level of substitutions (i.e. only the root nodes of the substitutions will be considered, not all the words in the sentences). (ii) The edit operations are carried out based on the syntactic category of the root nodes (i.e. the comparison between the input sentence and an example is based on the syntactic categories of the root nodes of their substitutions, not on the words).</Paragraph>
    <Paragraph position="47"> The distance is calculated based on the number of editing operations (deletions and insertions) needed to transfer the input sentence substitutions into the example substitutions, by assigning a weight to each of these operations: 1 for insertion and 1 for deletion.</Paragraph>
    <Paragraph position="48"> e.g.: a) S1: The old man eats an apple.</Paragraph>
    <Paragraph position="49"> S2: He eats a sweet cake.</Paragraph>
    <Paragraph position="50"> [Figure: dependency trees for S1 and S2, both with substitution roots of categories n, v, n.] In (a), the distance between S1 and S2 is 0.</Paragraph>
    <Paragraph position="51">  In (b), the distance between S1 and S2 is (3+2)=5.</Paragraph>
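With insertions and deletions each weighted 1 and no replacement, the distance defined above reduces to the number of root categories outside a longest common subsequence. A small sketch under that reading (our own formulation, not the paper's code):

```python
def substitution_distance(cats_a, cats_b):
    """Deletions plus insertions (weight 1 each) needed to turn the
    substitution root-category sequence cats_a into cats_b."""
    n, m = len(cats_a), len(cats_b)
    # Longest common subsequence via dynamic programming.
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if cats_a[i] == cats_b[j]:
                lcs[i + 1][j + 1] = lcs[i][j] + 1
            else:
                lcs[i + 1][j + 1] = max(lcs[i][j + 1], lcs[i + 1][j])
    common = lcs[n][m]
    return (n - common) + (m - common)  # deletions + insertions

# Example (a): both sentences reduce to substitution roots n v n,
# so the distance is 0.
assert substitution_distance(["n", "v", "n"], ["n", "v", "n"]) == 0
# One extra substitution on either side costs 1.
assert substitution_distance(["n", "v", "n", "p"], ["n", "v", "n"]) == 1
```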
    <Paragraph position="52"> Note that when a substitution is to be deleted from the example, all the words of the related substitutions (i.e. the root of the substitution and all other words that may be linked to it as brothers or son/s) are deleted too. This set is determined by referring to an example containing this substitution in the example-base. For example, in (b) above, the substitution rooted at &quot;who&quot; must be deleted, hence the substitutions &quot;drinks&quot; and &quot;tea&quot; must be deleted too; similarly, &quot;in&quot; must be deleted, hence &quot;garden&quot; must be deleted too.</Paragraph>
    <Paragraph position="53"> Before making the replacement, the system must first check that the root node categories of the substitutions in both the example and the input sentence are the same, and that these substitutions occur in the same order (i.e. the distance is 0). If there exist additional substitutions in the input sentence (i.e. the distance ≠ 0), the system will either combine more than one substitution into a single substitution, based on the knowledge index, before the replacement is carried out, or treat it as an optional substitution which will be added as an additional subtree under the root. On the other hand, additional substitutions appearing in the example will be treated as optional substitutions and hence can be removed.</Paragraph>
    <Paragraph position="54"> Additional substitutions are determined during distance calculation.</Paragraph>
    <Paragraph position="55"> Replacement: Next, the substitutions in example (1) will be replaced by the corresponding substitutions generated from the input sentence to form the final SSTC. The replacement  process is done by traversing the SSTC tree structure of the example in preorder, replacing each substitution in the tree structure with its corresponding substitution from the input sentence. This approach is analogous to a top-down parsing technique.  The figure illustrates the final SSTC for the sentence &quot;the old man picks the green lamp up&quot; using example (1).</Paragraph>
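The preorder replacement step can be sketched as follows; the dict-based tree and the per-category queues are our own illustration under the simplifying assumption that substitution roots match by syntactic category in order:

```python
def replace_preorder(node, pool):
    """node: dict with 'word', 'cat', 'sons'; pool: per-category queue of
    replacement words drawn from the input sentence's substitutions."""
    queue = pool.get(node["cat"], [])
    if queue:
        node["word"] = queue.pop(0)   # replace the root word, keep structure
    for son in node["sons"]:          # preorder: root first, then sons
        replace_preorder(son, pool)
    return node

# Example (1) "he picks the ball up" as a category-labelled tree.
example = {"word": "picks", "cat": "v", "sons": [
    {"word": "he", "cat": "n", "sons": []},
    {"word": "ball", "cat": "n", "sons": []},
    {"word": "up", "cat": "p", "sons": []},
]}
# Substitution roots from "the old man picks the green lamp up".
pool = {"v": ["picks"], "n": ["man", "lamp"], "p": ["up"]}
result = replace_preorder(example, pool)

assert result["word"] == "picks"
assert [s["word"] for s in result["sons"]] == ["man", "lamp", "up"]
```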
  </Section>
</Paper>