<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2028">
  <Title>Modelling of a Gazetteer Look-up Component</Title>
  <Section position="3" start_page="0" end_page="161" type="intro">
    <SectionTitle>
2 Preliminaries
</SectionTitle>
    <Paragraph position="0"> A deterministic finite-state automaton (DFSA) is a quintuple $M = (Q, \Sigma, \delta, q_0, F)$, where $Q$ is a finite set of states, $\Sigma$ is the alphabet of $M$, $\delta : Q \times \Sigma \to Q$ is the transition function, $q_0$ is the initial state and $F \subseteq Q$ is the set of final states. The transition function can be extended to $\delta^* : Q \times \Sigma^* \to Q$ by defining $\delta^*(q, \epsilon) = q$ and $\delta^*(q, wa) = \delta(\delta^*(q, w), a)$ for $a \in \Sigma$, $w \in \Sigma^*$.</Paragraph>
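    <Paragraph> As an illustration only, the recursive definition of the extended transition function can be sketched in Python. The transition table DELTA, the state names and the use of None for undefined transitions are assumptions made for this sketch; the definition above treats the transition function as total.

```python
# Hypothetical toy DFSA over the alphabet {a, b}; names are illustrative only.
DELTA = {
    ("q0", "a"): "q1",
    ("q1", "a"): "q1",
    ("q1", "b"): "q2",
}


def delta_star(q, w):
    """Extended transition function: state reached from q on w, or None."""
    if w == "":
        return q                    # delta*(q, epsilon) = q
    prefix, a = w[:-1], w[-1]       # split w into w'a
    p = delta_star(q, prefix)       # delta*(q, w')
    if p is None:
        return None                 # the prefix already left the automaton
    return DELTA.get((p, a))        # delta(delta*(q, w'), a)
```

    Under these assumptions, delta_star("q0", "aab") visits q0, q1, q1 and q2 in turn, whereas delta_star("q0", "b") is undefined because q0 has no transition on b.</Paragraph>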
    <Paragraph position="1"> The language accepted by an automaton $M$ is defined as $L(M) = \{ w \in \Sigma^* \mid \delta^*(q_0, w) \in F \}$.</Paragraph>
    <Paragraph position="2"> In turn, the right language of a state $q$ is defined as $L(q) = \{ w \in \Sigma^* \mid \delta^*(q, w) \in F \}$.</Paragraph>
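    <Paragraph> The two definitions can be made concrete with a small Python sketch. The automaton below (states q0 to q2, accepting the words a, ab and b) is a hypothetical example chosen for this illustration, and enumerating a right language by recursion assumes the automaton is acyclic.

```python
# Hypothetical acyclic DFSA accepting {a, ab, b}; not taken from the paper.
START = "q0"
FINAL = {"q1", "q2"}
DELTA = {
    ("q0", "a"): "q1",
    ("q0", "b"): "q2",
    ("q1", "b"): "q2",
}


def run(q, w):
    """Iterative form of the extended transition function (None if undefined)."""
    for a in w:
        q = DELTA.get((q, a))
        if q is None:
            return None
    return q


def accepts(w):
    """w is in L(M) iff the state reached from the initial state is final."""
    return run(START, w) in FINAL


def right_language(q):
    """All words leading from q to a final state, in lexicographic order."""
    words = [""] if q in FINAL else []
    for (p, a), target in sorted(DELTA.items()):
        if p == q:
            words += [a + w for w in right_language(target)]
    return words
```

    In this sketch the language of the automaton is the right language of its initial state, so right_language(START) lists exactly the words for which accepts returns True.</Paragraph>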
    <Paragraph position="3"> A path in a DFSA $M$ is a sequence of triples $\langle (p_0, a_0, p_1), \ldots, (p_{k-1}, a_{k-1}, p_k) \rangle$, where $(p_{i-1}, a_{i-1}, p_i) \in Q \times \Sigma \times Q$ and $\delta(p_i, a_i) = p_{i+1}$ for $0 \le i &lt; k$. The string $a_0 a_1 \ldots a_{k-1}$ is the label of the path. The first and last state in a path $\pi$ are denoted as $f(\pi)$ and $l(\pi)$, respectively. We call a path $\pi$ a cycle if $f(\pi) = l(\pi)$. Further, we call a path $\pi$ sequential if all intermediate states on $\pi$ are non-final and have exactly one incoming and one outgoing transition. Among all DFSAs recognizing the same language, the one with the minimal number of states is called minimal.</Paragraph>
    <Paragraph position="4"> Minimal acyclic DFSAs (MADFSAs) are the most compact data structure for storing and efficiently recognizing a finite set of words. They can be built in nearly linear time with the space-efficient incremental algorithm for constructing a MADFSA from a list of strings (4). Another finite-state device we refer to is the so-called numbered minimal acyclic deterministic finite-state automaton. Each state of such an automaton is associated with an integer representing the cardinality of its right language. An example is given in Figure 1. Numbered automata can be used for assigning each accepted word a unique numeric key, i.e., they implement perfect hashing. An index I(w) of a word w can be computed as follows.</Paragraph>
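    <Paragraph> The integers attached to the states of a numbered automaton can be computed bottom-up, as the following Python sketch suggests. The toy automaton is hypothetical and is not the automaton of Figure 1; the function name count and the dictionary layout are assumptions made for this illustration.

```python
from functools import lru_cache

# Hypothetical acyclic DFSA accepting {a, ab, b}; not the one in Figure 1.
DELTA = {
    "q0": {"a": "q1", "b": "q2"},
    "q1": {"b": "q2"},
    "q2": {},
}
FINAL = {"q1", "q2"}


@lru_cache(maxsize=None)
def count(q):
    """Cardinality of the right language of q (the integer stored at q)."""
    own = 1 if q in FINAL else 0    # the empty word belongs to L(q) iff q is final
    return own + sum(count(target) for target in DELTA[q].values())


# Number every state; memoisation keeps shared states from being recounted.
NUMBERS = {q: count(q) for q in DELTA}
```

    Here the initial state receives the number 3, which is the size of the accepted word set, and each other state receives the number of distinct suffixes that can still be read from it.</Paragraph>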
    <Paragraph position="5"> We start with an index I(w) equal to 1 and scan the input w with the automaton. While traversing the accepting path, in each state we increase the index by the sum of all integers associated with the target states of transitions lexicographically preceding the transition used. Once the final state has been reached, I(w) contains the unique index of w. Analogously, for a given index i, the corresponding word w such that I(w) = i can be computed by deducing the path, which would lead to</Paragraph>
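    <Paragraph> The word-to-index direction described above might be sketched in Python as follows. The automaton and its state numbers are hypothetical and not those of Figure 1. One step in the sketch is an assumption that the text does not spell out: the index is also increased by 1 whenever the traversal continues past a final state, so that a word and its proper prefix (here ab and a) do not receive the same index.

```python
# Hypothetical numbered acyclic DFSA accepting {a, ab, b}; not from Figure 1.
DELTA = {
    "q0": {"a": "q1", "b": "q2"},
    "q1": {"b": "q2"},
    "q2": {},
}
FINAL = {"q1", "q2"}
NUMBERS = {"q0": 3, "q1": 2, "q2": 1}   # right-language cardinalities


def index_of(w):
    """Return the 1-based index of w, or None if w is not accepted."""
    index, q = 1, "q0"
    for sym in w:
        if sym not in DELTA[q]:
            return None
        if q in FINAL:
            index += 1      # assumption: skip the word that ends in this state
        # Add the numbers of the targets of all transitions that precede
        # the transition taken in lexicographic order.
        index += sum(
            NUMBERS[target]
            for other, target in DELTA[q].items()
            if sym > other
        )
        q = DELTA[q][sym]
    return index if q in FINAL else None
```

    With this sketch the three accepted words are numbered in lexicographic order, and any string outside the language yields None.</Paragraph>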
  </Section>
</Paper>