<?xml version="1.0" standalone="yes"?> <Paper uid="E99-1023"> <Title>Representing Text Chunks</Title> <Section position="3" start_page="0" end_page="174" type="intro"> <SectionTitle> 2 Methods and experiments </SectionTitle> <Paragraph position="0"> In this section we present and explain the data representation formats and the machine learning algorithm that we have used. In the final part we describe the feature representation used in our experiments.</Paragraph> <Section position="1" start_page="0" end_page="173" type="sub_section"> <SectionTitle> 2.1 Data representation </SectionTitle> <Paragraph position="0"> We have compared four complete and three partial data representation formats for the baseNP recognition task presented in (Ramshaw and Marcus, 1995). The four complete formats all use an I tag for words that are inside a baseNP and an O tag for words that are outside a baseNP. They differ in their treatment of chunk-initial and chunk-final words:</Paragraph> <Paragraph position="1"> IOB1 The first word inside a baseNP immediately following another baseNP receives a B tag (Ramshaw and Marcus, 1995).</Paragraph> <Paragraph position="2"> IOB2 All baseNP-initial words receive a B tag (Ratnaparkhi, 1998).</Paragraph> <Paragraph position="3"> IOE1 The final word inside a baseNP immediately preceding another baseNP receives an E tag.</Paragraph> <Paragraph position="4"> IOE2 All baseNP-final words receive an E tag.</Paragraph> <Paragraph position="5"> We wanted to compare these data representation formats with a standard bracket representation. We have chosen to divide the bracketing experiments into two parts: one for recognizing opening brackets and one for recognizing closing brackets. Additionally we have worked with another partial representation which seemed promising: a tagging representation which disregards boundaries between adjacent chunks. These boundaries can be recovered by combining this format with one of the bracketing formats. Our three partial representations are:</Paragraph> <Paragraph position="6"> [ All baseNP-initial words receive an [ tag, other words receive a . tag.</Paragraph> <Paragraph position="7"> ] All baseNP-final words receive a ] tag, other words receive a . tag.</Paragraph> <Paragraph position="8"> IO Words inside a baseNP receive an I tag, others receive an O tag.</Paragraph> <Paragraph position="9"> These partial representations can be combined in three pairs which encode the complete baseNP structure of the data:</Paragraph> <Paragraph position="10"> [ + ] A word sequence is regarded as a baseNP if the first word has received an [ tag, the final word has received a ] tag and these are the only brackets that have been assigned to words in the sequence.</Paragraph> <Paragraph position="11"> [ + IO In the IO format, tags of words that have received an I tag and an [ tag are changed into B tags. The result is interpreted as the IOB2 format.</Paragraph> <Paragraph position="12"> IO + ] In the IO format, tags of words that have received an I tag and a ] tag are changed into E tags. The result is interpreted as the IOE2 format.</Paragraph> <Paragraph position="13"> Examples of the four complete formats and the three partial formats can be found in Table 1.</Paragraph> <Paragraph position="14"> [Table 1: an example sentence ("... gold was quoted at $ 366.50 an ounce .") tagged in the seven different tagging formats. The I tag is used for words inside a baseNP, O for words outside a baseNP, B and [ for baseNP-initial words, and E and ] for baseNP-final words.]</Paragraph>
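As an illustration of how the partial representations recombine into complete ones, here is a minimal sketch (ours, not from the paper; the function names are ours and the baseNP bracketing of the example fragment is only an assumption) that applies the three combination rules described above.

```python
# Illustrative sketch (not part of the original paper): combining the partial
# representations ([, ], IO) into complete ones, as described in section 2.1.
# A tagging is a list of tags, one per word.

def io_plus_open_to_iob2(io_tags, open_tags):
    """[ + IO: words tagged I that also carry an opening bracket become B (IOB2)."""
    return ["B" if io == "I" and br == "[" else io
            for io, br in zip(io_tags, open_tags)]

def io_plus_close_to_ioe2(io_tags, close_tags):
    """IO + ]: words tagged I that also carry a closing bracket become E (IOE2)."""
    return ["E" if io == "I" and br == "]" else io
            for io, br in zip(io_tags, close_tags)]

def brackets_to_chunks(open_tags, close_tags):
    """[ + ]: a span is a baseNP if it starts with [, ends with ] and contains
    no other brackets. Returns (start, end) pairs with end exclusive."""
    chunks, start = [], None
    for i, (o, c) in enumerate(zip(open_tags, close_tags)):
        if o == "[":
            start = i                      # any earlier unclosed bracket is discarded
        if c == "]" and start is not None:
            chunks.append((start, i + 1))
            start = None
    return chunks

# Example fragment; the chunk bracketing below is only illustrative.
words      = ["gold", "was", "quoted", "at", "$", "366.50", "an", "ounce", "."]
io_tags    = ["I",    "O",   "O",      "O",  "I", "I",      "I",  "I",     "O"]
open_tags  = ["[",    ".",   ".",      ".",  "[", ".",      "[",  ".",     "."]
close_tags = ["]",    ".",   ".",      ".",  ".", "]",      ".",  "]",     "."]

print(io_plus_open_to_iob2(io_tags, open_tags))    # ['B', 'O', 'O', 'O', 'B', 'I', 'B', 'I', 'O']
print(io_plus_close_to_ioe2(io_tags, close_tags))  # ['E', 'O', 'O', 'O', 'I', 'E', 'I', 'E', 'O']
print(brackets_to_chunks(open_tags, close_tags))   # [(0, 1), (4, 6), (6, 8)]
```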
</Section> <Section position="2" start_page="173" end_page="174" type="sub_section"> <SectionTitle> 2.2 Memory-Based Learning </SectionTitle> <Paragraph position="0"> We have built a baseNP recognizer by training a machine learning algorithm with correctly tagged data and testing it with unseen data. The machine learning algorithm we used was a Memory-Based Learning algorithm (MBL). During training it stores a symbolic feature representation of a word in the training data together with its classification (chunk tag). In the testing phase the algorithm compares a feature representation of a test word with every training data item and chooses the classification of the training item which is closest to the test item.</Paragraph> <Paragraph position="1"> In the version of the algorithm that we have used, IB1-IG, the distances between feature representations are computed as the weighted sum of distances between individual features (Daelemans et al., 1998). Equal features are defined to have distance 0, while the distance between other pairs is some feature-dependent value. This value is equal to the information gain of the feature, an information theoretic measure which contains the normalized entropy decrease of the classification set caused by the presence of the feature. Details of the algorithm can be found in (Daelemans et al., 1998)¹.</Paragraph> <Paragraph position="2"> [Table: word/POS tag pair context sizes for the seven representation formats, evaluated with 5-fold cross-validation on section 15 of the WSJ corpus.]</Paragraph>
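The distance computation described above amounts to D(X, Y) = sum_i w_i * delta(x_i, y_i), where delta is 0 for equal feature values and 1 otherwise, and w_i is the information gain of feature i. The sketch below is not the TiMBL/IB1-IG implementation referenced in the paper; it is a minimal illustration of this weighted-overlap nearest-neighbour idea, with toy data and function names of our own.

```python
# Illustrative sketch (not the actual TiMBL/IB1-IG code): 1-nearest-neighbour
# classification with information-gain feature weighting, as described in 2.2.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(instances, labels, feature_index):
    """Entropy decrease of the class distribution caused by one feature."""
    by_value = {}
    for inst, lab in zip(instances, labels):
        by_value.setdefault(inst[feature_index], []).append(lab)
    remainder = sum(len(ls) / len(labels) * entropy(ls) for ls in by_value.values())
    return entropy(labels) - remainder

def ib1_ig_classify(test_item, instances, labels, weights):
    """Return the class of the training item with the smallest weighted overlap distance."""
    def distance(train_item):
        return sum(w for w, a, b in zip(weights, train_item, test_item) if a != b)
    best = min(range(len(instances)), key=lambda i: distance(instances[i]))
    return labels[best]

# Toy training data: (word, POS) feature pairs with chunk-tag classes.
train_x = [("the", "DT"), ("dog", "NN"), ("barked", "VBD"), ("a", "DT")]
train_y = ["B", "I", "O", "B"]
weights = [information_gain(train_x, train_y, i) for i in range(2)]
print(ib1_ig_classify(("cat", "NN"), train_x, train_y, weights))  # 'I'
```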
</Section> <Section position="3" start_page="174" end_page="174" type="sub_section"> <SectionTitle> 2.3 Representing words with features </SectionTitle> <Paragraph position="0"> An important decision in an MBL experiment is the choice of the features that will be used for representing the data. IB1-IG is thought to be less sensitive to redundant features because of the data-dependent feature weighting that is included in the algorithm. However, we have found that the presence of redundant features has a negative influence on the performance of the baseNP recognizer.</Paragraph> <Paragraph position="1"> In (Ramshaw and Marcus, 1995) a set of transformational rules is used for modifying the classification of words. The rules use context information of the words, the part-of-speech tags that have been assigned to them and the chunk tags that are associated with them. We will use the same information in our feature representation for words.</Paragraph> <Paragraph position="2"> In TBL, rules with different context information are used successively for solving different problems. We will use the same context information for all data. The optimal context size will be determined by comparing the results of different context sizes on the training data. Here we will perform four steps. We will start with testing different context sizes of words with their part-of-speech tags. After this, we will use the classification results of the best context size for determining the optimal context size for the classification tags. As a third step, we will evaluate combinations of classification results and find the best combination. Finally, we will examine the influence of an MBL algorithm parameter: the number of examined nearest neighbors.</Paragraph> <Paragraph position="3"> ¹ IB1-IG is part of the TiMBL software package, which is available from http://ilk.kub.nl.</Paragraph> </Section> </Section> </Paper>