<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1002"> <Title>Conditional Structure versus Conditional Estimation in NLP Models</Title> <Section position="4" start_page="0" end_page="0" type="relat"> <SectionTitle> 4 Related Results </SectionTitle> <Paragraph position="0"> Johnson (2001) describes two parsing experiments.</Paragraph> <Paragraph position="1"> First, he examines a PCFG over the ATIS treebank, trained both with RFEs to maximize JL and with a CG method to maximize what we have been calling CL. He does not give results for the unconstrained CL, but even in the constrained case, the effects from Section 2 occur: CL and parsing accuracy are both higher using the CL estimates. He also describes a conditional shift-reduce parsing model, but notes that it underperforms the simpler joint formulation.</Paragraph> <Paragraph position="2"> We take these two results not as contradictory, but as confirmation that conditional estimation, though often slow, generally improves accuracy, while conditional model structures must be used with caution.</Paragraph> <Paragraph position="3"> The conditional shift-reduce parsing model he describes can be expected to exhibit the same type of competing-variable explaining-away issues that occur in MEMM tagging. As an extreme example, if all words have been shifted, the rest of the parser actions will be reductions with probability one.</Paragraph> <Paragraph position="4"> Goodman (1996) describes algorithms for parse selection in which the criterion being maximized is the same bracket-based accuracy measure by which parses are scored. He shows a test-set accuracy benefit from optimizing accuracy directly.</Paragraph> <Paragraph position="5"> Finally, model structure and parameter estimation are not the only factors that determine the behavior of a model. Model features are crucial, and the ability to incorporate richer features in a relatively sensible way also leads to improved models.
This is the main basis of the real-world benefit that has been derived from maxent models.</Paragraph> </Section> </Paper>