<?xml version="1.0" standalone="yes"?> <Paper uid="P03-1029"> <Title>An Improved Extraction Pattern Representation Model for Automatic IE Pattern Acquisition</Title> <Section position="6" start_page="0" end_page="0" type="evalu"> <SectionTitle> 5 Discussion </SectionTitle> <Paragraph position="0"> One of the advantages of the proposed model is its ability to capture more varied context. The Predicate-Argument model draws its context only from the predicate and its direct arguments. As a result, some Predicate-Argument patterns may be too general: they can match texts about a different scenario and extract entities from them by mistake. For example, ((&lt;C-ORG&gt;-SBJ) happyo-suru), &quot;&lt;C-ORG&gt; reports&quot;, may be used to extract an Organization in the Succession scenario, but it is so general that it can also match irrelevant sentences. The proposed Subtree model can instead acquire a more scenario-specific pattern, ((&lt;C-ORG&gt;-SBJ)((shunin-suru-REL) jinji-OBJ) happyo-suru), &quot;&lt;C-ORG&gt; reports a personnel affair to appoint&quot;. Any scoring function that penalizes the generality of a pattern match, such as inverse document frequency, can reduce the weight of overly general patterns. A detailed analysis of the experiment revealed that overly general patterns are penalized more severely in the Subtree model than in the Chain model. Although both models penalize general patterns in the same way, the Subtree model also promotes more scenario-specific patterns than the Chain model, which pushes the general patterns further down its ranking. 
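The following minimal Python sketch illustrates how a generality penalty can separate the two Succession patterns discussed here. It assumes a tf-idf-style score: the number of relevant documents a pattern matches, multiplied by log(N / df), where N is the collection size and df is the number of documents the pattern matches. This is an illustrative choice, not necessarily the exact function of Section 3.3. The document sets and the pattern strings (written without angle brackets) are hypothetical toy data.

```python
import math

# Hypothetical pattern strings (angle brackets dropped for simplicity).
GENERAL = "((C-ORG-SBJ) happyo-suru)"
SPECIFIC = "((C-ORG-SBJ)((shunin-suru-REL) jinji-OBJ) happyo-suru)"

# Hypothetical toy documents, each reduced to the set of patterns it matches.
# Assumption: RELEVANT are Succession articles; OTHER come from other
# business-domain scenarios.
RELEVANT = [
    {GENERAL, SPECIFIC},
    {GENERAL, SPECIFIC},
    {GENERAL},
]
OTHER = [
    {GENERAL},
    {GENERAL},
    {GENERAL},
    set(),
]


def score(pattern, relevant, collection):
    """Assumed tf-idf-style score (not necessarily Section 3.3's formula):
    relevant-document matches, discounted by log(N / df) over the collection."""
    tf = sum(1 for doc in relevant if pattern in doc)
    df = sum(1 for doc in collection if pattern in doc)
    if df == 0:
        return 0.0
    return tf * math.log(len(collection) / df)


collection = RELEVANT + OTHER
ranked = sorted(
    [GENERAL, SPECIFIC],
    key=lambda p: score(p, RELEVANT, collection),
    reverse=True,
)
for p in ranked:
    print(f"{score(p, RELEVANT, collection):.3f}  {p}")
```

With these toy counts the scenario-specific subtree pattern scores higher than the general Predicate-Argument pattern: the general pattern also matches most of the off-scenario documents, so its idf factor shrinks toward zero even though it matches more relevant documents.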
In Figure 3, the large drop was caused by the pattern ((&lt;C-DATE&gt;-ON) &lt;C-POST&gt;). This pattern is mainly used to describe the date of appointment to a &lt;C-POST&gt; in a list of a person's professional history (which is not regarded as a Succession event), but it is also used in other scenarios in the business domain, and by itself it has only 18% precision.</Paragraph> <Paragraph position="1"> Although the scoring function described in Section 3.3 is the same for both models, the Subtree model can also produce contributing patterns, such as ((&lt;C-PERSON&gt; &lt;C-POST&gt;-SBJ)(&lt;C-POST&gt;-TO) shunin-suru), &quot;&lt;C-PERSON&gt;, &lt;C-POST&gt; was appointed to &lt;C-POST&gt;&quot;, which were ranked higher than the problematic pattern. Because it does not generalize case marking for nominalized predicates, the Predicate-Argument model misses some highly contributing patterns with nominalized predicates, as several of the example patterns in Figure 4 show. Also, chains of modifiers can be extracted only by the Subtree and Chain models.</Paragraph> <Paragraph position="2"> A typical and highly relevant example of such a chain for the Succession scenario is (((daihyo-ken-SBJ) aru-REL) &lt;C-POST&gt;), &quot;&lt;C-POST&gt; with ministerial authority&quot;. Although the superiority of the Subtree model over the other models is not clear in the Arrest scenario, the general argument about its capability to capture additional context still holds. In Figure 4, the short pattern ((&lt;C-PERSON&gt; &lt;C-POST&gt;-APPOS) &lt;C-NUM&gt;), which is used for a general description of a person with his or her occupation and age, has relatively low precision (71%). However, patterns extended with more relevant context, such as &quot;arrest&quot; or &quot;unemployed&quot;, become more relevant to the Arrest scenario.</Paragraph> </Section> </Paper>