<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1070">
  <Title>Using Bag-of-Concepts to Improve the Performance of Support Vector Machines in Text Categorization</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Bag-of-Concepts
</SectionTitle>
    <Paragraph position="0"> The standard BoW representations are usually refined before they are used as input to a classification algorithm. One refinement method is feature selection, in which words are removed from the representations based on statistical measures such as document frequency, information gain, χ², or mutual information (Yang and Pedersen, 1997). Another refinement method is feature extraction, in which &quot;artificial&quot; features are created from the original ones, either by using clustering methods, such as distributional clustering (Baker and McCallum, 1998), or by using factor analytic methods such as singular value decomposition. Note that feature extraction methods also handle problems with synonymy, either by grouping together words that mean similar things, or by restructuring the data (i.e., the number of features) according to a small number of salient dimensions, so that similar words get similar representations. Since these methods do not represent texts merely as collections of the words they contain, but rather as collections of the concepts they contain (whether these be synonym sets or latent dimensions), a more fitting label for these representations would be Bag-of-Concepts (BoC).</Paragraph>
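    <Paragraph position="1"> The contrast between the two representations can be illustrated with a minimal Python sketch. It is not taken from the paper: the synonym table, the function names, and the two toy documents are all assumptions made for illustration, and a hand-written table stands in for the clustering or factor analytic methods mentioned above. Two texts that share no words have no overlap as bags of words, but do overlap once synonymous words are mapped to a common concept.

```python
from collections import Counter

# Hypothetical synonym sets (an assumption for illustration); a real system
# would derive concepts from clustering or latent dimensions instead.
SYNSETS = {
    "car": "VEHICLE",
    "automobile": "VEHICLE",
    "film": "MOVIE",
    "movie": "MOVIE",
}


def bag_of_words(text):
    """Represent a text as counts of the words it contains."""
    return Counter(text.lower().split())


def bag_of_concepts(text, synsets=SYNSETS):
    """Represent a text as counts of concepts; unmapped words stay as-is."""
    return Counter(synsets.get(word, word) for word in text.lower().split())


def overlap(bag_a, bag_b):
    """Number of shared feature occurrences (multiset intersection size)."""
    return sum(min(bag_a[f], bag_b[f]) for f in bag_a if f in bag_b)


doc_a = "car review"
doc_b = "automobile test"

print(overlap(bag_of_words(doc_a), bag_of_words(doc_b)))        # 0
print(overlap(bag_of_concepts(doc_a), bag_of_concepts(doc_b)))  # 1
```

    Words without an entry in the table are kept unchanged, so the concept representation here never has more features than the word representation.</Paragraph>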
  </Section>
</Paper>