<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2145"> <Title>Text Segmentation with Multiple Surface Linguistic Cues</Title> <Section position="5" start_page="881" end_page="882" type="metho"> <SectionTitle> 3 Automatically Weighting Multiple </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="881" end_page="882" type="sub_section"> <SectionTitle> Linguistic Cues </SectionTitle> <Paragraph position="0"> We think it is better to determine the weights automatically, because doing so avoids the need for expert hand tuning and can achieve performance that is at least locally optimal. We use training texts that are tagged with the correct segment boundaries. To train the weights automatically, we use multiple regression analysis (Jobson, 1991). We think this method can yield a set of weights better than one derived by a labor-intensive hand-tuning effort. Consider the following equation for S(n, n + 1) at each point:</Paragraph> <Paragraph position="2"> S(n, n + 1) = a + w_1 x_1 + w_2 x_2 + ... + w_p x_p, where a is a constant, p is the number of cues, x_i is the value of the i-th cue at the point, and w_i is the estimated weight for the i-th cue. We obtain one such equation for each point in the training texts. Therefore, by assigning a value to S at every point, we can calculate the weights w_i for each cue automatically by the method of least squares.</Paragraph> <Paragraph position="3"> In the multiple regression analysis, higher values should be given to S(n, n + 1) at segment boundary points than at non-boundary points. (Footnote 4: We use the Kadokawa Ruigo Shin Jiten (Oono and Hamanishi, 1981) as the Japanese thesaurus.)</Paragraph> <Paragraph position="4"> If we could assign values to S(n, n + 1) that reflect the real phenomena in the texts more precisely, we could expect better performance. However, since we have only the correct segment boundaries tagged in the training texts, we decided to assign 10 to S(n, n + 1) at each segment boundary point and -1 at each non-boundary point. 
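As an illustrative sketch (not the authors' code; the data and variable names here are invented), the least-squares weighting scheme just described can be written as:

```python
import numpy as np

# Each row of X holds the surface-cue values observed at one candidate
# point (n, n+1); the target S is 10 at true boundaries and -1 elsewhere.
rng = np.random.default_rng(0)
n_points, n_cues = 200, 5
X = rng.random((n_points, n_cues))
true_w = np.array([3.0, -1.0, 0.5, 2.0, 0.0])       # synthetic ground truth
is_boundary = (X @ true_w) > np.median(X @ true_w)  # synthetic boundaries
S = np.where(is_boundary, 10.0, -1.0)

# Least-squares fit of S(n, n+1) = a + sum_i w_i * x_i: prepend a column
# of ones so the intercept a is estimated together with the cue weights.
A = np.hstack([np.ones((n_points, 1)), X])
coef, *_ = np.linalg.lstsq(A, S, rcond=None)
a, w = coef[0], coef[1:]
print("intercept:", round(a, 3))
print("weights:", np.round(w, 3))
```

Candidate points whose fitted score exceeds a threshold would then be proposed as segment boundaries.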
These values were decided by the results of a preliminary experiment with four types of S.</Paragraph> <Paragraph position="5"> Watanabe (Watanabe, 1996) can be considered related work. He describes a system that automatically creates an abstract of a newspaper article by selecting important sentences of a given text. He applies multiple regression analysis to weight the surface features of a sentence in order to determine the importance of sentences. Each S of a sentence in the training texts is given a score equal to the number of human subjects who judged the sentence important, divided by the total number of subjects.</Paragraph> <Paragraph position="6"> We do not adopt the same method for assigning a value to S, because such a task by human subjects is labor-intensive.</Paragraph> </Section> </Section> <Section position="6" start_page="882" end_page="882" type="metho"> <SectionTitle> 4 Automatically Selecting Useful Cues </SectionTitle> <Paragraph position="0"> It is not clear which of the linguistic cues listed in section 2 are useful. Useless cues may adversely affect the calculation of weights in the multiple regression model. Furthermore, using too many linguistic cues relative to the size of the training data causes overfitting.</Paragraph> <Paragraph position="1"> If we can select only the useful cues from the entire set, we can obtain better weights and improve performance. However, we need an objective criterion for selecting useful cues. Fortunately, many parameter selection methods have already been developed for multiple regression analysis. We adopt one of these, the stepwise method, which is very popular for parameter selection (Jobson, 1991).</Paragraph> <Paragraph position="2"> The most commonly used criterion for the addition and deletion of variables in the stepwise method is based on the partial F-statistic. 
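For concreteness, the nested-model F-test underlying this criterion can be sketched as follows (a generic textbook computation on synthetic data, not code from the paper; all names are ours):

```python
import numpy as np

def sse(X, y):
    """Error sum of squares of an OLS fit with an intercept."""
    A = np.hstack([np.ones((len(y), 1)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def partial_f(X_full, X_reduced, y):
    """Partial F-statistic comparing a full model with p cues against a
    nested reduced model from which q cues have been dropped."""
    N, p = X_full.shape
    q = p - X_reduced.shape[1]                 # number of cues dropped
    sse_full, sse_red = sse(X_full, y), sse(X_reduced, y)
    return ((sse_red - sse_full) / q) / (sse_full / (N - p - 1))

rng = np.random.default_rng(1)
X = rng.random((100, 4))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)
# Dropping the informative first cue yields a large F value;
# dropping the irrelevant last cue yields a small one.
print(partial_f(X, X[:, 1:], y))   # cue 0 matters
print(partial_f(X, X[:, :3], y))   # cue 3 is noise
```

A stepwise loop would add the cue with the largest partial F when it exceeds F_in, and drop cues whose partial F falls below F_out.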
The partial F-statistic is given by F = ((SSR - SSRR) / q) / (SSE / (N - p - 1)),</Paragraph> <Paragraph position="4"> where SSR denotes the regression sum of squares, SSE denotes the error sum of squares, p is the number of linguistic cues, N is the number of training data, and q is the number of cues dropped at the selection step. SSR and SSE refer to the larger model with p cues plus an intercept, and SSRR refers to the reduced model with (p - q) cues and an intercept (Jobson, 1991).</Paragraph> <Paragraph position="5"> The stepwise method begins with a model that contains no cues. Next, the most significant cue is selected and added to the model to form a new model (A) if and only if the partial F-statistic of model (A) is greater than F_in. After adding the cue, some cues may be eliminated from model (A), forming a new model (B), if and only if the partial F-statistic of model (B) is less than F_out. These two processes repeat until a termination condition is detected. F_in and F_out are prescribed partial F-statistic limits.</Paragraph> <Paragraph position="6"> Although there are other popular methods for cue selection (for example, the forward selection method and the backward selection method), we use the stepwise method because it is expected to be superior to the others.</Paragraph> </Section> <Section position="7" start_page="882" end_page="884" type="metho"> <SectionTitle> 5 The Experiments </SectionTitle> <Paragraph position="0"> To provide evidence for the claims mentioned in the previous sections and summarized below, we carry out some preliminary experiments to show the effectiveness of our approach.</Paragraph> <Paragraph position="1"> * Combining multiple surface cues is effective for text segmentation.</Paragraph> <Paragraph position="2"> * Multiple regression analysis with the stepwise method is good for selecting the useful cues and weighting them automatically.</Paragraph> <Paragraph position="3"> We pick out 14 texts from Japanese language exam questions that ask the reader to partition a text into a given number of segments. A typical question is: &quot;Answer 3 points which partition the following text into semantic units.&quot; The system's performance is evaluated by comparing the system's outputs with the model answers attached to the exam questions.</Paragraph> <Paragraph position="4"> In our 14 texts, the average number of points (boundary candidates) is 20 (range: 12 to 47). The average number of correct answer boundaries in the model answers is 3.4 (range: 2 to 6). Here we do not take paragraph boundary information (such as indentation) into account at all, for two reasons: many of the exam question texts have no paragraph boundary marks, and, in the case of Japanese texts, it has been pointed out that paragraph boundaries and segment boundaries do not always coincide (Tokoro, 1987).</Paragraph> <Paragraph position="5"> In our experiments, the system generates outputs in decreasing order of the score scr(n, n + 1). We evaluate the performance in the cases where the system outputs 10%, 20%, 30%, and 40% of the number of boundary candidates. We use two measures, Recall and Precision, for the evaluation: Recall is the number of correctly identified boundaries divided by the total number of correct boundaries. Precision is the number of correctly identified boundaries divided by the number of generated boundaries.</Paragraph> <Paragraph position="6"> The experiments are made on the following cases: 1. Use all the information except lexical cohesion (cues 1 to 18 and 23).</Paragraph> <Paragraph position="7"> 2. Use the information of lexical cohesion (cues 19 to 22).</Paragraph> <Paragraph position="8"> 3. Use all linguistic cues mentioned in section 2. 
The weights are manually determined by one of the authors.</Paragraph> <Paragraph position="9"> 4. Use all linguistic cues mentioned in section 2. The weights are automatically determined by multiple regression analysis. We divide the 14 texts into 7 groups of 2 texts each, using 6 groups for training and the remaining group for testing. Rotating the test group, we evaluate the performance by cross validation (Weiss and Kulikowski, 1991).</Paragraph> <Paragraph position="10"> 5. Use only the cues selected by the stepwise method. As mentioned in section 4, we use the stepwise method to select useful cues on the training sets. The conditions are the same as in case 4 except for the cue selection.</Paragraph> <Paragraph position="11"> 6. Answers from five human subjects. With this experiment, we try to clarify the upper bound on the performance of the text segmentation task, which can be considered to indicate the degree of difficulty of the task (Passonneau and Litman, 1993; Gale et al., 1992).</Paragraph> <Paragraph position="12"> Figures 1 and 2 and table 1 show the results of the experiments. The two figures show the system's mean performance over the 14 texts. Table 1 shows the 5 subjects' mean performance over the 14 texts (experiment 6). We think table 1 shows the upper bound on the performance of the text segmentation task. We also calculate the lower bound on the performance of the task (&quot;lowerbound&quot; in figure 2), by considering the case where the system selects boundary candidates at random. In that case, the precision equals the mean probability that each candidate is a correct boundary, and the recall equals the output ratio. 
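The evaluation procedure just described can be sketched as follows (a toy example with invented data; the candidates are assumed to have already been ranked by their scr scores):

```python
# Output the top r% of ranked boundary candidates and score them
# against the gold boundaries with recall and precision.
def recall_precision(ranked_candidates, gold, ratio):
    n_out = max(1, round(len(ranked_candidates) * ratio))
    output = set(ranked_candidates[:n_out])
    hits = len(output & gold)
    recall = hits / len(gold)       # correct outputs / correct boundaries
    precision = hits / n_out        # correct outputs / generated boundaries
    return recall, precision

# 20 candidate points (the average in the experiments), 3 gold boundaries.
ranked = [4, 11, 7, 2, 18, 0, 9, 14, 1, 5, 3, 6, 8, 10, 12, 13, 15, 16, 17, 19]
gold = {4, 7, 14}
for ratio in (0.1, 0.2, 0.3, 0.4):
    r, p = recall_precision(ranked, gold, ratio)
    print(f"{int(ratio * 100)}% output: recall={r:.2f} precision={p:.2f}")
```

As the output ratio grows, recall can only increase while precision tends to fall, which is why both measures are reported at each ratio.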
In figure 1, comparing the performance of the case without lexical chains (&quot;ex.1&quot;), the case with only lexical chains (&quot;ex.2&quot;), and the case with multiple linguistic cues (&quot;ex.3&quot;), the results show that better performance is obtained by using the whole set of cues. In figure 2, comparing the performance of the case where hand-tuned weights are used for the multiple linguistic cues (&quot;ex.3&quot;) and the case where the weights are automatically determined from the training texts (&quot;ex.4.test&quot;), the results show that automatically training the weights generally yields better performance. Furthermore, since it avoids labor-intensive work and yields objective weights, automatic weighting is better than hand tuning. Comparing the performance of the case where the automatic weights are calculated with the entire set of cues (&quot;ex.4.test&quot; in figure 2) and the case where they are calculated with the selected cues (&quot;ex.5.test&quot;), the results show that better performance is obtained with the selected cues. The results also show that our cue selection method avoids the overfitting problem, in that the results for training and test data differ less: the difference between &quot;ex.5.training&quot; and &quot;ex.5.test&quot; is smaller than that between &quot;ex.4.training&quot; and &quot;ex.4.test&quot;. In our cue selection, the average number of selected cues is 7.4, though the same cues are not always selected. The cues that are always selected are the contrastive conjunctives (cue 9 in section 2) and the lexical chains (cues 19 and 20 in section 2).</Paragraph> <Paragraph position="14"> We also make an experiment with another answer set, in which we use the points that 3 or more of the five human subjects judged to be segment boundaries.</Paragraph> <Paragraph position="15"> The average number of correct answers is 3.5 (range: 2 to 6). 
With this answer set, our system yields results similar to those mentioned above.</Paragraph> <Paragraph position="16"> The work of Litman and Passonneau (Litman and Passonneau, 1995) can be considered related research, because they presented a method for text segmentation that uses multiple knowledge sources.</Paragraph> <Paragraph position="17"> Their model is trained on a corpus of spoken narratives using machine learning tools. An exact comparison is difficult. However, since the slightly lower upper bound for our task indicates that our task is somewhat more difficult than theirs, our performance is not inferior to theirs.</Paragraph> <Paragraph position="18"> Admittedly, our experiments may be too small in scale, with only a few texts, to fully establish the correctness of our claims and the effectiveness of our approach. However, we think the initial results described here are encouraging.</Paragraph> </Section> </Paper>