<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-1027">
  <Title>Compiling and Using Finite-State Syntactic Rules</Title>
  <Section position="3" start_page="0" end_page="88" type="intro">
    <SectionTitle>
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992
</SectionTitle>
    <Paragraph position="0"> phrase structure grammars. Instead of using trees as a means of representing structures, we use syntactic tags associated with words, and the finite-state rules constrain the choice of tags. This style of representation was adopted from Karlsson's CG approach and an earlier Finnish parser called FPARSE (Karlsson 1985, 1990).</Paragraph>
    <Paragraph position="1"> The current approach employs a shallow, surface-oriented syntax. We expect it to be useful in syntactic tagging of large text corpora, in information retrieval, and as a starting point for more elaborate syntactic or semantic analysis.</Paragraph>
    <Section position="1" start_page="0" end_page="88" type="sub_section">
      <SectionTitle>
1.1 Representation of sentences
</SectionTitle>
      <Paragraph position="0"> We represent sentences as regular expressions or, equivalently, as finite-state networks, which list all the combinatory possibilities for interpreting them. Consider the sentence: the program runs.</Paragraph>
      <Paragraph position="1"> A (simplified) representation for the morphologically processed but syntactically unanalyzed sentence as a regular expression could be roughly as follows:  Here @@ represents a sentence boundary, @ a word boundary, @/ an ordinary clause boundary, @&lt; the beginning of a center-embedded clause, and @&gt; the end of such an embedding. Square brackets '[...]' are used for grouping, and vertical bars '|' separate alternatives. Each word has been assigned all the possible syntactic roles it could assume in sentences (e.g. @SUBJ or @OBJ or @PREDC). Note that between any two adjacent words there might be a clause boundary or a plain word boundary. The regular expression represents a number of strings (some 320), which we call the readings of the (unanalyzed) sentence. The following is one of them:</Paragraph>
    </Section>
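    <Paragraph position="2"> The way a sentence expands into readings, as described in Section 1.1, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tag alternatives in WORDS (including the tag @MAINV) and the helper names readings and count_readings are hypothetical, and the tag sets are heavily reduced, so the count produced here is not the figure of some 320 readings given for the full expression. Each word contributes one of its alternatives, each gap between adjacent words contributes one of the four boundary symbols, and the sentence is delimited by @@.
<![CDATA[
```python
from itertools import product

SENTENCE_BOUNDARY = "@@"
# Word boundary, ordinary clause boundary, start and end of a center embedding.
INNER_BOUNDARIES = ["@", "@/", "@<", "@>"]

# Hypothetical, heavily reduced alternatives for "the program runs".
# These tag sets are assumptions for illustration only.
WORDS = [
    ["the DET"],
    ["program N @SUBJ", "program N @OBJ", "program N @PREDC"],
    ["runs V @MAINV", "runs N @SUBJ", "runs N @OBJ"],
]


def readings(words, inner=INNER_BOUNDARIES):
    """Yield every reading: one alternative per word, one boundary per gap."""
    slots = []
    for index, alternatives in enumerate(words):
        if index > 0:
            slots.append(inner)  # boundary choice between adjacent words
        slots.append(alternatives)
    for choice in product(*slots):
        yield " ".join([SENTENCE_BOUNDARY, *choice, SENTENCE_BOUNDARY])


def count_readings(words, inner=INNER_BOUNDARIES):
    """Count readings without enumerating them (product of slot sizes)."""
    total = len(inner) ** max(len(words) - 1, 0)
    for alternatives in words:
        total *= len(alternatives)
    return total


if __name__ == "__main__":
    all_readings = list(readings(WORDS))
    print(count_readings(WORDS), len(all_readings))
    print(all_readings[0])
```
]]></Paragraph>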
  </Section>
</Paper>