<?xml version="1.0" standalone="yes"?>
<Paper uid="C90-3088">
  <Title>Parsing Long English Sentences with Pattern Rules</Title>
  <Section position="3" start_page="0" end_page="410" type="intro">
    <SectionTitle>
3. Our Approach
</SectionTitle>
    <Paragraph position="0"> In addition to some-path bottom-up parsing, which reduces the branching factor, the input sentence can be divided into several meaningful segments (i.e., reducing the search depth). Each segment is then parsed separately, without exchanging information with the other segments, and finally the parsing results of all segments are combined. The Chinese translation is based on the combined parsing result.</Paragraph>
    <Paragraph position="1"> The parser of ERSO-ECMT first checks whether the input sentence matches one of the long sentence patterns. If it does, the parser parses the sentence according to the matched pattern; otherwise it parses the sentence with the augmented context free grammar. If parsing with the long sentence pattern fails to produce a complete parse tree, the parser likewise falls back to the augmented context free grammar.</Paragraph>
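    <Paragraph> The control flow described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the string stand-in for a parse tree, and the toy matcher are all hypothetical and are not taken from ERSO-ECMT, which is implemented in LISP.

```python
from typing import Callable, List, Optional

# All names below are hypothetical stand-ins, not ERSO-ECMT functions.
Words = List[str]


def parse_sentence(
    words: Words,
    match_pattern: Callable[[Words], Optional[str]],
    parse_with_pattern: Callable[[str, Words], Optional[str]],
    parse_with_cfg: Callable[[Words], Optional[str]],
) -> Optional[str]:
    """Try the long sentence patterns first, then fall back to the CFG."""
    pattern = match_pattern(words)
    if pattern is not None:
        tree = parse_with_pattern(pattern, words)
        if tree is not None:
            return tree  # complete tree obtained through the pattern
    # No pattern matched, or the pattern failed to yield a complete tree.
    return parse_with_cfg(words)


# Toy stand-ins so that the control flow can be exercised.
def toy_match(words: Words) -> Optional[str]:
    return "IF-THEN" if "if" in words and "then" in words else None


def toy_pattern_parse(pattern: str, words: Words) -> Optional[str]:
    # Pretend the pattern-based parse fails when the input contains "fail".
    return None if "fail" in words else "pattern:" + pattern


def toy_cfg_parse(words: Words) -> Optional[str]:
    return "cfg"


print(parse_sentence("if a then b".split(), toy_match, toy_pattern_parse, toy_cfg_parse))
```

    Passing the three parsing steps as arguments keeps the fallback logic separate from any particular grammar, so both failure cases (no matching pattern, incomplete tree) end in the same call to the context free parser.</Paragraph>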
    <Paragraph position="2"> The procedure for parsing with the long English patterns is as follows:  a. Partitions the input long sentence into some meaningful segments: .Looks up the partition rules by pattern matching with unification.</Paragraph>
    <Paragraph position="3"> .If a resultant segment still has length greater than 40, partitions it recursively, until no more patterns can be used.</Paragraph>
    <Paragraph position="4"> Note: in general, the resultant segments, such as Declarative Sentence (SDEC), Infinitive Phrase (INF), and Verb Phrase (VP), are big structures with some key words or some special structures among them in the sentential form.</Paragraph>
    <Paragraph position="5"> b. Parses or translates each segment separately.</Paragraph>
    <Paragraph position="6"> c. Combines the results of all segments. d. Generates the corresponding Chinese sentence.</Paragraph>
    <Paragraph position="7"> Note: The parser can either combine the syntactic parsing results of all segments and then generate the corresponding Chinese sentence, or generate Chinese translations of all segments and then arrange them, by transformation rules, in a sequence whose order is not necessarily that of the original English segments in the input sentence.</Paragraph>
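    <Paragraph> Steps a to d can be sketched as a small recursive routine. This is a minimal sketch under assumptions: the key-word splitter stands in for the partition rules (which the text matches with unification), the per-segment translator is a placeholder, and the results are simply joined in the original order, although the text notes that transformation rules may reorder them. All function names are hypothetical.

```python
from typing import Callable, List, Optional

Words = List[str]
THRESHOLD = 40  # segment length threshold quoted in the text


def partition(
    words: Words,
    find_split: Callable[[Words], Optional[int]],
    threshold: int = THRESHOLD,
) -> List[Words]:
    """Split recursively until segments are short enough or no rule applies."""
    if len(words) > threshold:
        cut = find_split(words)
        # Only accept a cut that leaves two non-empty parts.
        if cut is not None and cut > 0 and len(words) > cut:
            left = partition(words[:cut], find_split, threshold)
            right = partition(words[cut:], find_split, threshold)
            return left + right
    return [words]


def split_at_keyword(words: Words) -> Optional[int]:
    # Hypothetical partition rule: cut before the first key word that is
    # not the first word of the segment.
    keywords = ("and", "that", "because")
    for i, word in enumerate(words):
        if i > 0 and word in keywords:
            return i
    return None


def translate_by_segments(
    words: Words,
    find_split: Callable[[Words], Optional[int]],
    translate_segment: Callable[[Words], str],
    threshold: int = THRESHOLD,
) -> str:
    """Partition, handle each segment on its own, then combine in order."""
    segments = partition(words, find_split, threshold)
    return " ".join(translate_segment(segment) for segment in segments)


# A threshold of 4 is used here only to keep the example short.
sentence = "a b c and d e f and g h".split()
print(partition(sentence, split_at_keyword, 4))
```

    Each segment is processed without reference to the others, which mirrors the statement that segments are parsed separately without exchanging information.</Paragraph>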
    <Paragraph position="8"> Before parsing, a sentence length threshold, say 40 for ERSO-ECMT, can be set to indicate how long a sentence must be for it to be parsed with the pattern rules.</Paragraph>
    <Paragraph position="9"> The format of the rules for long English patterns in ERSO-ECMT is as follows:</Paragraph>
    <Paragraph position="11"> where "(" and ")": all terminals; SEGRULE: a rule for long English segmentation; LHS: an augmented regular expression, which is composed of a regular expression and test(s); RHS: parsing action(s); test: a LISP function which implements the designated test; ving: a gerund, such as going, doing; ved: a verb with ending "ed", which indicates its past or past participle form; num: a number; english: an English word, or a symbol of punctuation; closure and plus: the functions corresponding to R* and R+, where R is a regular expression (these functions match the shortest pattern, covered by the functions, in the input sentence); opt: an optional item; Each symbol of the right-hand side of CAT: part-of-speech or category.</Paragraph>
    <Paragraph position="12"> parse: the LISP function for doing parsing; node: a grammar node, a nonterminal of the parsing tree; parsetransfersynthesis: the LISP function to do syntactic parsing, transformation, and generation; chinese: Chinese character(s); and %vl ... %vn: each of them being a variable to which a segment of the input English sentence will be bound.</Paragraph>
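    <Paragraph> The effect of shortest-pattern matching for closure and plus, together with binding segments to variables, can be illustrated with Python's re module, whose non-greedy quantifiers behave in a comparable way. This is an analogy only: the pattern below, the function name, and the dictionary keys %v1 and %v2 are invented for illustration and do not reproduce the ERSO-ECMT rule format or its LISP matcher.

```python
import re
from typing import Dict, Optional


def bind_segments(sentence: str, shortest: bool = True) -> Optional[Dict[str, str]]:
    """Bind two segments around the connective ' , and '.

    With shortest=True the first segment uses a non-greedy plus, so it
    stops at the first connective; with shortest=False it is greedy and
    stops at the last one.
    """
    first = "(.+?)" if shortest else "(.+)"
    pattern = re.compile("^" + first + " , and (.+)$")
    match = pattern.match(sentence)
    if match is None:
        return None
    # Hypothetical variable names modelled on the %v notation in the text.
    return {"%v1": match.group(1), "%v2": match.group(2)}


sentence = "he came , and she left , and they talked"
print(bind_segments(sentence, shortest=True))
print(bind_segments(sentence, shortest=False))
```

    With the shortest match, the first variable covers only "he came" and the remainder is left for further partitioning; the greedy variant would instead absorb the first connective into the first segment.</Paragraph>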
    <Paragraph position="13"> The reason for using regular expressions is that repeated elements can be covered. Although regular expressions have less expressive power than the augmented context free grammar already in the system, the two focus on different things. The augmented context free grammar takes care of detailed phrase structures; it can deal with long sentences, but in general not very well. The long sentence pattern rules handle some long sentences by breaking them down into segments of big structures, and the augmented context free grammar then takes care of each segment.</Paragraph>
  </Section>
</Paper>