<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1085">
<Title>Estimation of Stochastic Attribute-Value Grammars using an Informative Sample</Title>
<Section position="8" start_page="591" end_page="591" type="concl">
<SectionTitle> 7 Comments </SectionTitle>
<Paragraph position="0"> We argued that RFM estimation for broad-coverage attribute-value grammars could be made computationally tractable by training upon an informative sample. Our small-scale experiments suggested that using those parses that could be efficiently unpacked (SCFG sampling) was almost as effective as sampling from all possible parses (Random sampling).</Paragraph>
<Paragraph position="1"> We also saw that models should not be both built and estimated using all possible parses. Better results can be obtained when models are built and trained using an informative sample.</Paragraph>
<Paragraph position="2"> Given the relationship between sample size and model complexity, we see that when there is a danger of overfitting, one should build models on the basis of an informative set. However, this leaves open the possibility of training such a model upon a superset of the informative set. Although we have not tested this scenario, we believe that this would lead to better results than those achieved here.</Paragraph>
<Paragraph position="3"> The larger-scale experiments showed that RFMs can be estimated using relatively long sentences.</Paragraph>
<Paragraph position="4"> They also showed that a simple Gaussian prior could reduce the effects of overfitting. However, they also showed that excessive overfitting probably required an alternative smoothing approach.</Paragraph>
<Paragraph position="5"> The smaller and larger experiments can both be viewed as (complementary) ways of dealing with overfitting. We conjecture that of the two approaches, the informative sample approach is preferable, as it deals with overfitting directly: overfitting results from fitting too complex a model with too little data.</Paragraph>
<Paragraph position="6"> Our ongoing research will concentrate upon stronger ways of dealing with overfitting in lexicalised RFMs. One line we are pursuing is to combine a compression-based prior with an exponential model. This blends MDL with Maximum Entropy.</Paragraph>
<Paragraph position="7"> We are also looking at alternative template sets.</Paragraph>
<Paragraph position="8"> For example, we would probably benefit from using templates that capture more of the syntactic context of a rule instantiation.</Paragraph>
</Section>
</Paper>
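The Gaussian prior mentioned in Section 7 is, in comparable work on log-linear (Random Field) models, usually realised as a quadratic penalty on the feature weights. A minimal sketch of that standard formulation (not taken from this paper; the symbols below are illustrative assumptions, not the authors' notation):

\[
\mathcal{L}'(\lambda) \;=\; \sum_{i} \log p_{\lambda}(x_i) \;-\; \sum_{j} \frac{\lambda_j^{2}}{2\sigma_j^{2}},
\qquad
p_{\lambda}(x) \;=\; \frac{1}{Z_{\lambda}} \exp\Big(\sum_{j} \lambda_j f_j(x)\Big),
\]

where the \(f_j\) are feature (template) counts, the \(\lambda_j\) their weights, and the \(\sigma_j^{2}\) the prior variances. Maximising \(\mathcal{L}'\) rather than the unpenalised log-likelihood shrinks the weights towards zero, which is the sense in which such a prior counteracts overfitting.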