<?xml version="1.0" standalone="yes"?> <Paper uid="C92-1027"> <Title>Compiling and Using Finite-State Syntactic Rules</Title> <Section position="3" start_page="0" end_page="88" type="intro"> <SectionTitle> PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 </SectionTitle> <Paragraph position="0"> phrase structure grammars. Instead of using trees to represent structures, we use syntactic tags associated with words, and the finite-state rules constrain the choice of tags. This style of representation was adopted from Karlsson's Constraint Grammar (CG) approach and from an earlier Finnish parser called FPARSE (Karlsson 1985, 1990).</Paragraph> <Paragraph position="1"> The current approach employs a shallow, surface-oriented syntax. We expect it to be useful in the syntactic tagging of large text corpora, in information retrieval, and as a starting point for more elaborate syntactic or semantic analysis.</Paragraph> <Section position="1" start_page="0" end_page="88" type="sub_section"> <SectionTitle> 1.1 Representation of sentences </SectionTitle> <Paragraph position="0"> We represent sentences as regular expressions or, equivalently, as finite-state networks, which list all the combinatory possibilities for interpreting them. Consider the sentence "the program runs".</Paragraph> <Paragraph position="1"> A (simplified) representation of this sentence after morphological processing, but before syntactic analysis, could be written as a regular expression roughly as follows: Here @@ represents a sentence boundary, @ a word boundary, @/ an ordinary clause boundary, @&lt; the beginning of a center-embedded clause, and @> the end of such an embedding. Square brackets '[...]' are used for grouping, and vertical bars '|' separate alternatives. Each word has been assigned all the syntactic roles it could possibly assume in a sentence (e.g. @SUBJ, @OBJ or @PREDC). Note that between any two words there may be either a clause boundary or a plain word boundary. 
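To make the notion of readings concrete, here is a minimal Python sketch, offered as an illustration rather than the authors' implementation. It assumes a toy encoding in which every word position holds a list of alternative readings, every gap between two words holds the alternatives '@' and '@/', and '@@' marks both ends of the sentence. A reading is one string obtained by picking one alternative per position, so the number of readings is the product of the alternative counts. The tags given to "the program runs" below, and the names WORD_READINGS, slots, count_readings and readings, are hypothetical, and the center-embedding markers are left out. The sketch enumerates readings explicitly, whereas the approach described here keeps them implicit in a regular expression or finite-state network.

```python
from itertools import product
from math import prod

# Hypothetical, simplified readings for "the program runs": each word position
# lists its candidate readings (word, category, syntactic tag). These tags are
# illustrative assumptions, not the analysis used in the paper.
WORD_READINGS = [
    ["the DET"],
    ["program N @SUBJ", "program N @OBJ", "program V @MAINV"],
    ["runs V @MAINV", "runs N @SUBJ", "runs N @OBJ"],
]

# Between two adjacent words there is either a plain word boundary "@" or an
# ordinary clause boundary "@/" (center-embedding markers omitted for brevity).
GAP = ["@", "@/"]
SENTENCE_BOUNDARY = "@@"


def slots(words):
    """Interleave word alternatives with boundary alternatives."""
    result = [[SENTENCE_BOUNDARY]]
    for i, readings_here in enumerate(words):
        if i:
            result.append(GAP)  # a boundary choice between consecutive words
        result.append(readings_here)
    result.append([SENTENCE_BOUNDARY])
    return result


def count_readings(words):
    """Number of strings denoted: the product of the alternative counts."""
    return prod(len(slot) for slot in slots(words))


def readings(words):
    """Enumerate every reading as a space-joined string."""
    for choice in product(*slots(words)):
        yield " ".join(choice)
```

For this toy input, count_readings(WORD_READINGS) returns 36 (2 x 3 x 2 x 3), and the first string produced by readings(WORD_READINGS) is "@@ the DET @ program N @SUBJ @ runs V @MAINV @@". Because every ambiguous word and every gap multiplies the count, listing readings one by one quickly becomes impractical for longer sentences.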
The regular expression represents a number of strings (some 320), which we call the readings of the (unanalyzed) sentence. The following is one of them:</Paragraph> </Section> </Section></Paper>