<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3209">
  <Title>Learning Probabilistic Paradigms for Morphology in a Latent Class Model</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper introduces the probabilistic paradigm, a probabilistic, declarative model of morphological structure. We describe an algorithm that recursively applies Latent Dirichlet Allocation with an orthogonality constraint to discover morphological paradigms as the latent classes within a suffix-stem matrix. We apply the algorithm to data preprocessed in several different ways, and show that when suffixes are distinguished for part of speech and allomorphs or gender/conjugational variants are merged, the model is able to correctly learn morphological paradigms for English and Spanish. We compare our system with Linguistica (Goldsmith 2001), and discuss the advantages of the probabilistic paradigm over Linguistica's signature representation.</Paragraph>
  </Section>
</Paper>