<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1065">
  <Title>Disambiguation of Morphological Structure using a PCFG</Title>
  <Section position="3" start_page="516" end_page="516" type="intro">
    <SectionTitle>
2 Head-Lexicalized PCFGs
</SectionTitle>
    <Paragraph position="0"> A head-lexicalized parse tree is a parse tree in which each constituent is labeled with its category and its lexical head. The lexical head of a terminal symbol is the symbol itself and the lexical head of a non-terminal symbol is the lexical head of its (unique) head child.</Paragraph>
    <Paragraph position="1"> In a head-lexicalized PCFG (HL-PCFG) (Carroll and Rooth, 1998; Charniak, 1997), one symbol on the right-hand side of each rule is marked as the head. A HL-PCFG assumes that (i) the probability of a rule depends on the category and the lexical head of the expanded constituent and (ii) that the lexical head of a non-head node depends on its own category, and the category and the lexical head of the parent node. The probability of a head-lexicalized parse tree is therefore:</Paragraph>
    <Paragraph position="3"> where root is the root node of the parse tree cat(n) is the syntactic category of node n head(n) is the lexical head of node n rule(n) is the grammar rule which expands node n pcat(n) is the syntactic category of the parent of n phead(n) is the lexical head of the parent of n HL-PCFGs have a large number of parameters which need to estimated from training data. In order to avoid sparse data problems, the parameters usually have to be smoothed. HL-PCFGs can either be trained on labeled data (supervised training) or on unlabeled data (unsupervised training) using the Inside-Outside algorithm, an instance of the EM algorithm. Training on labeled data usually gives better results, but it requires a treebank which is expensive to create. In our experiments, we used unsupervised training with the LoPar parser which is available at http://www.ims.unistuttgart.de/projekte/gramotron/SOFTWARE/LoPar- null en.html.</Paragraph>
  </Section>
</Paper>