File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/05/w05-1304_metho.xml
Size: 6,767 bytes
Last Modified: 2025-10-06 14:10:01
<?xml version="1.0" standalone="yes"?> <Paper uid="W05-1304"> <Title>A Machine Learning Approach to Acronym Generation</Title>
<Section position="4" start_page="25" end_page="25" type="metho"> <SectionTitle> SKIP </SectionTitle> <Paragraph position="0"> The generator skips the letter.</Paragraph> </Section>
<Section position="5" start_page="25" end_page="25" type="metho"> <SectionTitle> UPPER </SectionTitle> <Paragraph position="0"> If the target letter is uppercase, the generator outputs the same letter. If the target letter is lowercase, the generator converts it into the corresponding uppercase letter.</Paragraph> </Section>
<Section position="6" start_page="25" end_page="25" type="metho"> <SectionTitle> LOWER </SectionTitle> <Paragraph position="0"> If the target letter is lowercase, the generator outputs the same letter. If the target letter is uppercase, the generator converts it into the corresponding lowercase letter.</Paragraph> </Section>
<Section position="7" start_page="25" end_page="25" type="metho"> <SectionTitle> SPACE </SectionTitle> <Paragraph position="0"> The generator converts the letter into a space.</Paragraph> </Section>
<Section position="8" start_page="25" end_page="25" type="metho"> <SectionTitle> HYPHEN </SectionTitle> <Paragraph position="0"> The generator converts the letter into a hyphen.</Paragraph>
<Paragraph position="1"> From the probabilistic modeling point of view, this task is to find the sequence of actions $a_1 \cdots a_n$ that maximizes the following probability given the observation $o_1 \cdots o_n$:</Paragraph>
<Paragraph position="2"> $P(a_1 \cdots a_n \mid o_1 \cdots o_n)$ </Paragraph>
<Paragraph position="3"> Observations are the letters in the definition and various types of features derived from them. We decompose the probability in a left-to-right manner:</Paragraph>
<Paragraph position="4"> $P(a_1 \cdots a_n \mid o_1 \cdots o_n) = \prod_{i=1}^{n} P(a_i \mid a_1 \cdots a_{i-1}, o_1 \cdots o_n)$ </Paragraph>
<Paragraph position="5"> By making a first-order Markov assumption, the equation becomes</Paragraph>
<Paragraph position="6"> $P(a_1 \cdots a_n \mid o_1 \cdots o_n) = \prod_{i=1}^{n} P(a_i \mid a_{i-1}, o_1 \cdots o_n)$ </Paragraph>
<Paragraph position="7"> If we have training data containing a large number of definition-acronym pairs in which the definitions are annotated with action labels, we can estimate the parameters of this probabilistic model, and the best action sequence can be computed efficiently with a Viterbi decoding algorithm.</Paragraph>
<Paragraph position="8"> In this paper we adopt a maximum entropy model (Berger et al., 1996) to estimate the local probabilities</Paragraph>
<Paragraph position="9"> $P(a_i \mid a_{i-1}, o_1 \cdots o_n)$, </Paragraph>
<Paragraph position="10"> since it can incorporate diverse types of features with reasonable computational cost. This modeling, as a whole, is called Maximum Entropy Markov Modeling (MEMM).</Paragraph>
[Figure: an example of acronym generation as sequence labeling; the generated acronym is "DuIFN-gamma". Each letter in the acronym is generated from a letter in the definition following the action for that letter.]
<Paragraph position="11"> Regularization is important in maximum entropy modeling to avoid overfitting to the training data.</Paragraph>
<Paragraph position="12"> For this purpose, we use maximum entropy modeling with inequality constraints (Kazama and Tsujii, 2003). This model performs as well as maximum entropy modeling with Gaussian priors (Chen and Rosenfeld, 1999), and the resulting model is much smaller because most of the parameters become zero. This characteristic makes the model easy to handle and decoding fast, which is convenient when we perform experiments repeatedly. The modeling has one parameter to tune, called the width factor; we set it to 1.0 throughout the experiments.</Paragraph> </Section>
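To make the action semantics concrete, the following is a minimal sketch (ours, not the authors') of how a per-letter action sequence maps a definition to an acronym; the function name and the list encoding of the actions are illustrative assumptions.

```python
# Sketch: apply one of the five generator actions to each letter of a
# definition to produce an acronym (action names follow the sections above).

def apply_actions(definition: str, actions: list) -> str:
    assert len(definition) == len(actions)
    out = []
    for ch, act in zip(definition, actions):
        if act == "SKIP":
            continue                  # the generator skips the letter
        elif act == "UPPER":
            out.append(ch.upper())    # output the letter in uppercase
        elif act == "LOWER":
            out.append(ch.lower())    # output the letter in lowercase
        elif act == "SPACE":
            out.append(" ")           # convert the letter into a space
        elif act == "HYPHEN":
            out.append("-")           # convert the letter into a hyphen
        else:
            raise ValueError(f"unknown action: {act}")
    return "".join(out)

# Hypothetical example: generating "IFN" from "interferon".
acts = ["UPPER", "SKIP", "SKIP", "SKIP", "SKIP",
        "UPPER", "SKIP", "SKIP", "SKIP", "UPPER"]
print(apply_actions("interferon", acts))  # -> "IFN"
```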
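Decoding under the first-order model is a standard Viterbi search over the five actions. Below is a compact sketch, assuming a callable local_prob(prev_action, i, definition) that returns a dict of P(a_i | a_{i-1}, o); in the paper this local model is the maximum entropy model, which is stubbed out here.

```python
import math

ACTIONS = ["SKIP", "UPPER", "LOWER", "SPACE", "HYPHEN"]

def viterbi(definition, local_prob, actions=ACTIONS):
    """Best action sequence under prod_i P(a_i | a_{i-1}, o).

    local_prob(prev, i, definition) -> {action: probability} is an
    assumed interface standing in for the trained local model.
    """
    n = len(definition)
    best = [{} for _ in range(n)]  # best[i][a]: max log-prob of a prefix ending in a
    back = [{} for _ in range(n)]  # back[i][a]: previous action on that best prefix
    for a, p in local_prob(None, 0, definition).items():
        best[0][a] = math.log(p)
    for i in range(1, n):
        for a in actions:
            score, prev = max(
                (best[i - 1][q] + math.log(local_prob(q, i, definition)[a]), q)
                for q in best[i - 1]
            )
            best[i][a], back[i][a] = score, prev
    # Trace the back-pointers from the best final action.
    a = max(best[n - 1], key=best[n - 1].get)
    seq = [a]
    for i in range(n - 1, 0, -1):
        a = back[i][a]
        seq.append(a)
    return seq[::-1]
```

With a trained local model plugged in, apply_actions(definition, viterbi(definition, local_prob)) from the previous sketch yields the generated acronym.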
<Section position="9" start_page="25" end_page="27" type="metho"> <SectionTitle> 3 The Data for Training and Testing </SectionTitle>
<Paragraph position="0"> Since no training data is available for the machine learning task described in the previous section, we created the data manually. First, we extracted definition-acronym pairs from MEDLINE abstracts using the acronym acquisition method proposed by Schwartz and Hearst (2003). The abstracts used for constructing the data were randomly selected from those published in 2001. Duplicate pairs were removed from the set.</Paragraph>
<Paragraph position="1"> In acquiring the pairs from the documents, we focused only on the pairs that appear in the form "... expanded form (acronym) ...".</Paragraph>
<Paragraph position="2"> We then manually removed misrecognized pairs and annotated each pair with positional information. The positional information tells which letter in the definition corresponds to each letter in the acronym. Table 1 lists a portion of the data. For example, the positional information in the first pair indicates that the first letter 'i' in the definition corresponds to 'I' in the acronym, and the 12th letter 'm' corresponds to 'M'.</Paragraph>
[Table 1: definitions, acronyms, and the positional information.]
<Paragraph position="3"> With this positional information, we can create the training data for the sequence labeling task because there is a one-to-one correspondence between the sequence labels and the data with positional information. In other words, we can determine the appropriate action for each letter in the definition by comparing the letter with the corresponding letter in the acronym.</Paragraph> </Section>
<Section position="10" start_page="27" end_page="27" type="metho"> <SectionTitle> 4 Features </SectionTitle>
<Paragraph position="0"> Maximum entropy modeling allows us to incorporate diverse types of features. In this paper we use the following types of features in local classification.</Paragraph>
<Paragraph position="1"> As an example, consider the situation where we are going to determine the action at the letter 'f' in the word "interferon".</Paragraph>
<Paragraph position="2"> • Letters: 1. The sequence of the letters ranging from the beginning of the word to the target letter (i.e. "interf"). 2. The sequence of the letters ranging from the target letter to the end of the word (i.e. "feron"). 3. The word containing the target letter (i.e. "interferon").</Paragraph>
<Paragraph position="3"> • Distance (DIS): 1. The distance between the target letter and the beginning of the word (i.e. 6). 2. The distance between the target letter and the tail of the word (i.e. 5).</Paragraph> </Section> </Paper>
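As an illustration of the letter and distance features above, a small sketch; the feature-string names (head_letters, dis_head, and so on) are our own, not the paper's.

```python
def letter_features(word: str, pos: int) -> list:
    """Letter and distance features for the target letter word[pos]."""
    return [
        "head_letters=" + word[: pos + 1],  # beginning of the word .. target
        "tail_letters=" + word[pos:],       # target .. end of the word
        "word=" + word,                     # the word containing the target
        "dis_head=%d" % (pos + 1),          # distance to the beginning
        "dis_tail=%d" % (len(word) - pos),  # distance to the tail
    ]

# For 'f' in "interferon" (position 5) this reproduces the running example:
# ['head_letters=interf', 'tail_letters=feron', 'word=interferon',
#  'dis_head=6', 'dis_tail=5']
print(letter_features("interferon", 5))
```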
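Tying Section 3 back to the action labels: given a pair and its positional annotation, the action for each definition letter is mechanically determined, as the text notes. Below is a sketch, assuming (our encoding, not the paper's table format) that the positional information is a map from definition index to acronym index.

```python
def derive_actions(definition, acronym, alignment):
    """Derive one action label per definition letter.

    `alignment` maps a definition index to the acronym index it
    generates; unmapped definition letters are skipped.
    """
    actions = []
    for i, ch in enumerate(definition):
        if i not in alignment:
            actions.append("SKIP")
            continue
        target = acronym[alignment[i]]
        if target == " ":
            actions.append("SPACE")
        elif target == "-":
            actions.append("HYPHEN")
        elif target == ch.upper():
            actions.append("UPPER")
        elif target == ch.lower():
            actions.append("LOWER")
        else:
            raise ValueError(f"cannot align {ch!r} to {target!r} at {i}")
    return actions

# "interferon" -> "IFN": definition letters 0, 5 and 9 generate
# acronym letters 0, 1 and 2; every other letter is skipped.
print(derive_actions("interferon", "IFN", {0: 0, 5: 1, 9: 2}))
```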