<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1210"> <Title>Efficient Deep Processing of Japanese</Title> <Section position="5" start_page="4" end_page="4" type="evalu"> <SectionTitle> 5 Evaluation </SectionTitle> <Paragraph position="0"> The grammar currently covers 93.4% of constructed examples for the banking domain (747 sentences) and 78.2% of realistic email correspondence data (316 sentences), concerning requests for documents. During three months of work, the coverage in the banking domain increased 48.49%. The coverage of the document request data increased 51.43% in the following two weeks.</Paragraph> <Paragraph position="1"> We applied the grammar to unseen data in one of the covered domains, namely the FAQ site of a Japanese bank. The coverage was 61%. 91.2% of the parses output were associated with all well-formed MRSs. That means that we could get correct MRSs in 55.61% of all sentences. Conclusion: We described a broad-coverage Japanese grammar, based on HPSG theory. It encodes syntactic, semantic, and pragmatic information. The grammar system is connected to a morphological analysis system and uses default entries for words unknown to the HPSG lexicon. Some basic constructions of the Japanese grammar were described. As the grammar is aimed at working in applications with real-world data, performance and robustness issues are important.</Paragraph> <Paragraph position="2"> The grammar is being developed in a multilingual context, where much value is placed on parallel and consistent semantic representations. The development of this grammar constitutes an important test of the cross-linguistic validity of the MRS formalism. The evaluation shows that the grammar is at a stage where domain adaptation is possible in a reasonable amount of time.
Thus, it is a powerful resource for linguistic applications for Japanese.</Paragraph> <Paragraph position="3"> In future work, this grammar could be further adapted to another domain, such as the EDR newspaper corpus (including a headline grammar). As each new domain is approached, we anticipate that the adaptation will become easier as resources from earlier domains are reused. Initial evaluation of the grammar on new domains and the growth curve of grammar coverage should bear this out.</Paragraph> </Section> </Paper>