<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1020">
  <Title>A Fast Algorithm for Feature Selection in Conditional Maximum Entropy Modeling</Title>
  <Section position="6" start_page="10" end_page="10" type="evalu">
    <SectionTitle>
5 Comparison and Conclusion
</SectionTitle>
    <Paragraph position="0"> Feature selection has been an important topic in both ME modeling and linear regression. In the past, most researchers resorted to count cutoff technique in selecting features for ME modeling (Rosenfeld, 1994; Ratnaparkhi, 1998; Reynar and Ratnaparkhi, 1997; Koeling, 2000). A more refined algorithm, the incremental feature selection algorithm by Berger et al (1996), allows one feature being added at each selection and at the same time keeps estimated parameter values for the features selected in the previous stages. As discussed in (Ratnaparkhi, 1998), the count cutoff technique works very fast and is easy to implement, but has the drawback of containing a large number of re- null feature set and the feature subsets through the IFS algorithm, the SGC algorithm, the SGC algorithm with 500 look-ahead, and the count cutoff algorithm.</Paragraph>
    <Paragraph position="1"> dundant features. In contrast, the IFS removes the redundancy in the selected feature set, but the speed of the algorithm has been a big issue for complex tasks. Having realized the drawback of the IFS algorithm, Berger and Printz (1998) proposed an f-orthogonal condition for selecting k features at the same time without affecting much the quality of the selected features. While this technique is applicable for certain feature sets, such as link features between words, the f-orthogonal condition usually does not hold if part-of-speech tags are dominantly present in a feature subset.</Paragraph>
    <Paragraph position="2"> Chen and Rosenfeld (1999) experimented on a feature selection technique that uses a c  test to see whether a feature should be included in the ME model, where the c  test is computed using the counts from a prior distribution and the counts from the real training data. It is a simple and probably effective technique for language modeling tasks. Since ME models are optimized using their likelihood or likelihood gains as the criterion, it is important to establish the relationship between c  test score and the likelihood gain, which, however, is absent.</Paragraph>
    <Paragraph position="3"> There is a large amount of literature on feature selection in linear regression, where least mean squared errors measure has been the primary optimization criterion. Two issues need to be addressed in order to effectively use these techniques. One is the scalability issue since most statistical literature on feature selection only concerns with dozens or hundreds of features, while our tasks usually deal with feature sets with a million of features. The other is the relationship between mean squared errors and likelihood, similar to the concern expressed in the previous paragraph.</Paragraph>
    <Paragraph position="4"> These are important issues and require further investigation. null In summary, this paper presents our new improvement to the incremental feature selection algorithm. The new algorithm runs hundreds to thousands times faster than the original incremental feature selection algorithm. In addition, the new algorithm selects the features of a similar quality as the original Berger et al algorithm, which has also shown to be better than the simple cutoff method in some cases.</Paragraph>
  </Section>
class="xml-element"></Paper>