MMR-based Active Machine Learning for Bio Named Entity Recognition

2 MMR-based Active Learning for Biomedical Named-entity Recognition

2.1 Active Learning

We integrate active learning methods into the POSBIOTM/NER system (Song et al. 2005) by the following procedure. Given an active learning scoring strategy S and a threshold value th, at each iteration t the learner uses the training corpus TM_t to train the NER module M_t. Each time a user wants to annotate a set of unlabeled sentences U, the system first tags the sentences using the current NER module M_t, and each tagged sentence is assigned a score according to the scoring strategy S. A sentence is marked if its score is larger than the threshold th. The tagging result is presented to the user; the marked sentences are corrected by the user and added to the training corpus. Once the training data has accumulated to a certain amount, the NER module M_t is retrained.

2.2 Uncertainty-based Sample Selection

We evaluate the degree of uncertainty that the current NER module holds for a given sentence in terms of the entropy of the sentence. Given an input sequence o, the set S of possible state sequences is finite, and P(s|o) is a probability distribution over S. Using the CRF formulation (Lafferty et al. 2001), we can calculate the probability of any possible state sequence s given the input sequence o. The entropy of P(s|o) is then defined as

$$H = -\sum_{s \in S} P(s \mid o) \log P(s \mid o).$$

The number of possible state sequences grows exponentially with the sentence length, so it is inconvenient, and unnecessary for measuring uncertainty, to compute the probability of every possible state sequence. Instead, we implement an N-best Viterbi search to find the N state sequences with the highest probabilities. The entropy H(N) is defined as the entropy of the renormalized distribution over the N-best state sequences s_1, ..., s_N:

$$H(N) = -\sum_{i=1}^{N} \frac{P(s_i \mid o)}{\sum_{j=1}^{N} P(s_j \mid o)} \log \frac{P(s_i \mid o)}{\sum_{j=1}^{N} P(s_j \mid o)}.$$

The range of H(N) is [0, log N].
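For concreteness, the H(N) computation can be sketched in a few lines of Python. The function below is our own minimal illustration, not part of the POSBIOTM/NER system; it assumes the N-best Viterbi search has already produced the sequence probabilities P(s_i | o), and the name `nbest_entropy` is hypothetical.

```python
import math
from typing import Sequence

def nbest_entropy(nbest_probs: Sequence[float]) -> float:
    """H(N): entropy of the distribution over the N-best state sequences.

    `nbest_probs` holds P(s_i | o) for the N highest-probability label
    sequences found by an N-best Viterbi search; the values are
    renormalized over the N-best list before the entropy is taken.
    """
    z = sum(nbest_probs)          # renormalize over the N-best list
    h = 0.0
    for p in nbest_probs:
        q = p / z
        if q > 0.0:               # 0 * log 0 is taken as 0
            h -= q * math.log(q)
    return h                      # lies in [0, log N]
```

In the annotation loop of Section 2.1, a sentence would be marked for manual correction when this value exceeds the threshold th.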
2.3 Diversity-based Sample Selection

We measure sentence-structure similarity in order to capture the most representative sentences, and thereby supply more diverse features to the machine learning-based classification system.

We propose a three-level hierarchy to represent the structure of a sentence: the first level is the NP chunk, the second level is the part-of-speech tag, and the third level is the word itself. Each word is represented using this hierarchical structure; for example, in the sentence "I am a boy", the word "boy" is represented as (NP, NN, boy). The similarity of two words is defined, counting from the top level, as the number of levels on which the two words agree. Under our three-level hierarchy scheme, each word representation has a depth of 3.

The structure of a sentence S is represented as the vector of word representations [w_1, w_2, ..., w_n]. We measure the similarity of two sentences by the standard cosine-similarity measure; the similarity score of two sentences S_1 and S_2 is defined as

$$sim(S_1, S_2) = \frac{S_1 \cdot S_2}{\|S_1\| \, \|S_2\|}.$$

2.4 MMR Combination for Sample Selection

We would like to score the sample sentences with respect to both uncertainty and diversity. The following MMR (Maximal Marginal Relevance) formula (Carbonell and Goldstein 1998) is used to calculate the active learning score:

$$score(s_i, M) = \lambda \cdot Uncertainty(s_i \mid M) - (1 - \lambda) \cdot \max_{s_j \in TM} Similarity(s_i, s_j),$$

where s_i is the sentence to be selected, Uncertainty is the entropy of s_i under the current NER module M, and Similarity measures how close s_i is to a sentence s_j in the training corpus TM of M. The combination rule can be interpreted as assigning a higher score to a sentence about which the NER module is uncertain and whose configuration differs from the sentences in the existing training corpus. The parameter \lambda balances these two aspects of a desirable sample sentence.

After initializing an NER module M and choosing an appropriate value of the parameter \lambda, we can assign each candidate sentence a score that reflects both its uncertainty and its diversity.
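The pieces of Sections 2.2-2.4 combine into a single scoring routine, sketched below as our own illustration. `word_sim` follows the three-level definition above; the inner product inside `sentence_sim` (summing word similarities over aligned positions) is an assumption, since the paper's exact instantiation of the cosine measure is not recoverable from this text; `mmr_score` and its argument names are likewise hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Three-level word representation: (NP chunk, POS tag, surface form),
# e.g. ("NP", "NN", "boy") for "boy" in "I am a boy".
Word = Tuple[str, str, str]
Sentence = List[Word]

def word_sim(w1: Word, w2: Word) -> int:
    """Number of levels, counted from the top of the hierarchy,
    on which the two words agree (an integer in 0..3)."""
    depth = 0
    for a, b in zip(w1, w2):
        if a != b:
            break
        depth += 1
    return depth

def sentence_sim(s1: Sentence, s2: Sentence) -> float:
    """Cosine-style structural similarity between two sentences.

    Assumption: S1 . S2 sums word_sim over aligned positions, and
    ||S|| = sqrt(S . S), which is sqrt(3 * len(S)) for depth-3 words."""
    dot = sum(word_sim(a, b) for a, b in zip(s1, s2))
    n1 = math.sqrt(sum(word_sim(a, a) for a in s1))
    n2 = math.sqrt(sum(word_sim(b, b) for b in s2))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def mmr_score(candidate: Sentence,
              uncertainty: float,   # e.g. nbest_entropy(...) from Section 2.2
              train_corpus: Sequence[Sentence],
              lam: float = 0.5) -> float:
    """lambda * Uncertainty - (1 - lambda) * max similarity to the corpus."""
    redundancy = max((sentence_sim(candidate, s) for s in train_corpus),
                     default=0.0)
    return lam * uncertainty - (1.0 - lam) * redundancy
```

Sentences whose score under this combination exceeds the threshold th of Section 2.1 would then be routed to the user for correction and added to the training corpus.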