<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1092"> <Title>Automatic extraction of paraphrastic phrases from medium size corpora</Title> <Section position="6" start_page="1" end_page="1" type="evalu"> <SectionTitle> 6 Evaluation </SectionTitle> <Paragraph position="0"> The evaluation concerned the extraction of information about companies buying other companies from a French financial corpus. The corpus consists of 300 texts (200 texts for the training corpus, 100 texts for the test corpus).</Paragraph> <Paragraph position="1"> A system was first manually developed and evaluated. We then tried to perform the same task with automatically developed resources, so that the two could be compared. The corpus is first normalized. For example, all company names are replaced by the variable *c-company* using the named entity recognizer. In the semantic network, *c-company* is introduced as a synonym of company, so that all sequences containing a proper name that denotes a company can be extracted.</Paragraph> <Paragraph position="2"> For the slot corresponding to the company that is being bought, 6 seed example patterns were given to the semantic expansion module.</Paragraph> <Paragraph position="3"> This module acquired 25 new validated patterns from the corpus. Each example pattern generated 4.16 new patterns on average. For example, from the pattern rachat de *c-company* we obtain the following list: reprise de *c-company*; achat de *c-company*; acquerir *c-company*; racheter *c-company*; cession de *c-company*. This set of paraphrastic patterns includes nominal phrases (reprise de *c-company*) and verbal phrases (racheter *c-company*).</Paragraph> <Paragraph position="4"> The acquisition process concerns the head and the expansion at the same time. The simultaneous acquisition of different semantic classes can also be found in the co-training algorithm proposed for this kind of task by E. Riloff and R. Jones (Riloff and Jones, 1999). 
The proposed patterns must be filtered and validated by the end-user. We estimate that, in general, 25% of the acquired patterns should be rejected. However, this validation process is very rapid: only a few minutes were necessary to check the 31 proposed patterns and retain 25 of them.</Paragraph> <Paragraph position="5"> We then compared these results with the ones obtained with the manually elaborated system. The evaluation concerned the three slots that require a syntactic and semantic analysis: the company that is buying another one (arg1), the company that is being bought (arg2), and the company that sells (arg3). These slots involve nominal phrases, which can be complex, and a functional analysis is necessary most of the time (is the nominal phrase the subject or the direct object of the sentence?). We thus chose to perform an operational evaluation: what is evaluated is the ability of a given phrase or pattern to fill a given slot (also called textual entailment by Dagan and Glickman [2004]). This kind of evaluation avoids, as far as possible, the bias of human judgment on possibly ambiguous expressions.</Paragraph> <Paragraph position="6"> An overview of the results is given below (P refers to precision, R to recall, F to the harmonic mean of P and R): F: 70; F: 81.9; F: 77. We observed that the system running with automatically defined resources is about 10% less effective than the one with manually defined resources. The decrease in performance varies depending on the slot (the decrease is smaller for arg2 than for arg1 or arg3). Two kinds of errors are observed: (1) certain sequences are not found because a relation between words is missing in the semantic net; (2) some sequences are extracted by the semantic analysis but do not correspond to a transformation registered in the syntactic variation management module.</Paragraph> </Section> </Paper>