<?xml version="1.0" standalone="yes"?> <Paper uid="P03-1006"> <Title>Generalized Algorithms for Constructing Statistical Language Models</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Preliminaries </SectionTitle> <Paragraph position="0"> Definition 1 A system $(\mathbb{K}, \oplus, \otimes, \overline{0}, \overline{1})$ is a semiring (Kuich and Salomaa, 1986) if: $(\mathbb{K}, \oplus, \overline{0})$ is a commutative monoid with identity element $\overline{0}$; $(\mathbb{K}, \otimes, \overline{1})$ is a monoid with identity element $\overline{1}$; $\otimes$ distributes over $\oplus$; and $\overline{0}$ is an annihilator for $\otimes$: for all $a \in \mathbb{K}$, $a \otimes \overline{0} = \overline{0} \otimes a = \overline{0}$. Thus, a semiring is a ring that may lack negation. Two semirings often used in speech processing are: the log semiring $\mathcal{L} = (\mathbb{R} \cup \{\infty\}, \oplus_{\log}, +, \infty, 0)$ (Mohri, 2002), which is isomorphic to the familiar real or probability semiring $(\mathbb{R}_+, +, \times, 0, 1)$ via a $\log$ morphism with, for all $a, b \in \mathbb{R} \cup \{\infty\}$: $a \oplus_{\log} b = -\log(\exp(-a) + \exp(-b))$ and the convention that $\exp(-\infty) = 0$ and $-\log(0) = \infty$; and the tropical semiring $\mathcal{T} = (\mathbb{R}_+ \cup \{\infty\}, \min, +, \infty, 0)$, which can be derived from the log semiring using the Viterbi approximation.</Paragraph> <Paragraph position="1"> Definition 2 A weighted finite-state transducer $T$ over a semiring $\mathbb{K}$ is an 8-tuple $T = (\Sigma, \Delta, Q, I, F, E, \lambda, \rho)$ where: $\Sigma$ is the finite input alphabet of the transducer; $\Delta$
is the finite output alphabet; $Q$ is a finite set of states; $I \subseteq Q$ the set of initial states; $F \subseteq Q$ the set of final states; $E \subseteq Q \times (\Sigma \cup \{\epsilon\}) \times (\Delta \cup \{\epsilon\}) \times \mathbb{K} \times Q$ a finite set of transitions; $\lambda : I \rightarrow \mathbb{K}$ the initial weight function; and $\rho : F \rightarrow \mathbb{K}$ the final weight function mapping $F$ to $\mathbb{K}$.</Paragraph> <Paragraph position="2"> A weighted automaton $A = (\Sigma, Q, I, F, E, \lambda, \rho)$ is defined in a similar way by simply omitting the output labels. We denote by $L(A) \subseteq \Sigma^*$ the set of strings accepted by an automaton $A$ and similarly by $L(X)$ the strings described by a regular expression $X$.</Paragraph> <Paragraph position="3"> Given a transition $e \in E$, we denote by $i[e]$ its input label, $p[e]$ its origin or previous state, $n[e]$ its destination or next state, $w[e]$ its weight, and $o[e]$ its output label (transducer case). Given a state $q \in Q$, we denote by $E[q]$ the set of transitions leaving $q$.</Paragraph> <Paragraph position="4"> A path $\pi = e_1 \cdots e_k$ is an element of $E^*$ with consecutive transitions: $n[e_{i-1}] = p[e_i]$, $i = 2, \ldots, k$. We extend $n$ and $p$ to paths by setting: $n[\pi] = n[e_k]$ and $p[\pi] = p[e_1]$. A cycle $\pi$ is a path whose origin and destination states coincide: $n[\pi] = p[\pi]$.
We denote by $P(q, x, q')$</Paragraph> <Paragraph position="6"> and $P(q, x, y, q')$ the set of paths from $q$ to $q'$ with input label $x \in \Sigma^*$ and output label $y$ (transducer case).</Paragraph> <Paragraph position="7"> These definitions can be extended to subsets $R, R' \subseteq Q$ by: $P(R, x, R') = \bigcup_{q \in R,\, q' \in R'} P(q, x, q')$. The labeling functions $i$ (and similarly $o$) and the weight function $w$ can also be extended to paths by defining the label of a path as the concatenation of the labels of its constituent transitions, and the weight of a path as the $\otimes$-product of the weights of its constituent transitions: $i[\pi] = i[e_1] \cdots i[e_k]$, $w[\pi] = w[e_1] \otimes \cdots \otimes w[e_k]$. We also extend $w$ to any finite set of paths $\Pi$ by setting: $w[\Pi] = \bigoplus_{\pi \in \Pi} w[\pi]$.</Paragraph> <Paragraph position="9"> Similarly, the output weight associated by a transducer $T$ to a pair of input-output strings $(x, y)$ is:</Paragraph> <Paragraph position="11"> A successful path in a weighted automaton or transducer $M$ is a path from an initial state to a final state. $M$ is unambiguous if for any string $x \in \Sigma^*$ there is at most one successful path labeled with $x$.
Thus, an unambiguous transducer defines a function.</Paragraph> <Paragraph position="12"> For any transducer $T$, denote by $\Pi_2(T)$ the automaton obtained by projecting $T$ on its output, that is, by omitting its input labels.</Paragraph> <Paragraph position="13"> Note that the second operation of the tropical semiring and the log semiring, as well as their identity elements, are identical. Thus the weight of a path in an automaton $A$ over the tropical semiring does not change if $A$ is viewed as a weighted automaton over the log semiring or vice versa.</Paragraph> </Section> </Paper>