<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1079">
  <Title>Best Analysis Selection in Inflectional Languages</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Figures of Merit
</SectionTitle>
    <Paragraph position="0"> The overall figure of merit of the syntactic analysis results is determined as a combination of several contributory FOMs that reflect particular language features, such as:
* frequency of syntactic constructs, represented by pre-computed rule probabilities
* an augmented n-gram model based on the occurrence of adjacent lexical heads standing for the corresponding subtrees
* affinity between constituents, modeled by valency frames of verbs, adjectives and nouns
The selected FOMs participate in the determination of the most probable analysis. A straightforward approach lies in their linear combination:</Paragraph>
    <Paragraph position="1"> FOM = l_1 x_1 + l_2 x_2 + ... + l_n x_n,</Paragraph>
    <Paragraph position="2"> where x_i are the FOMs' contributions and l_i are empirically assigned weights (usually taken as normalizing coefficients). However, our experiments showed that the weights l_i need to reflect the behaviour of particular lexical items, their categories or even the analysed constituents. We thus need to handle the l_i variables as functions of various parameters.</Paragraph>
    <Paragraph position="4"> The following sections deal with the figures of merit that play a crucial role in the search for the best output analysis.</Paragraph>
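    <Paragraph> The weighted combination above can be sketched as follows. This is a minimal illustration; the function names and the example weights are our assumptions, not part of the described system:
```python
def combined_fom(contributions, weight_fns, context):
    """Combine the per-feature FOM contributions x_i using weights l_i.
    Each weight is a callable so it can depend on the analysed context
    (lexical item, category, constituent) -- hypothetical names."""
    total = 0.0
    for x_i, l_i in zip(contributions, weight_fns):
        total += l_i(context) * x_i
    return total

# Constant weight functions reduce to the plain linear combination.
weights = [lambda ctx: 0.5, lambda ctx: 0.3, lambda ctx: 0.2]
score = combined_fom([0.8, 0.6, 0.9], weights, context=None)
```
Making the weights callables rather than constants is what allows them to behave as functions of various parameters, as required above.</Paragraph>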
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Rule-tied Actions and x1 FOM
</SectionTitle>
      <Paragraph position="0"> A key question, then, is what the good candidates for FOMs are. The use of probabilistic context-free grammars (PCFGs) suggests simple CF rule probabilities as a FOM (Chitrao and Grishman, 1990; Bobrow, 1991).</Paragraph>
      <Paragraph position="1"> The evaluation of the first FOM is based on the mechanism of contextual actions built into the metagrammar conception (Smrž and Horák, 2000). It distinguishes four kinds of contextual actions, tests or constraints:
1. rule-tied actions
2. agreement fulfilment constraints
3. post-processing actions
4. actions based on the derivation tree
The rule-based probability estimations are handled on the first level by the rule-tied actions, which also serve as rule parameterization modifiers.</Paragraph>
      <Paragraph position="2"> Agreement fulfilment constraints are used in generating the expanded grammar (Smrž and Horák, 1999), or they also serve as chart-pruning actions. In terms of (Maxwell III and Kaplan, 1991), the agreement fulfilment constraints represent the functional constraints, whose processing can be interleaved with that of phrasal constraints. The post-processing actions are not triggered until the chart is complete. The main part of the FOM computation for a particular input sentence is driven by actions on this level. Some figures of merit (e.g. the verb valency FOM, see Section 2.3) demand exponential resources when computed over the whole chart structure. This problem is solved by splitting the calculation process into a pruning part (run on the level of post-processing actions) and a reordering part, which is postponed until the actions based on the derivation tree.</Paragraph>
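    <Paragraph> The pruning/reordering split might be sketched as below. All names, the scoring functions and the pruning threshold are illustrative assumptions, not the actual implementation:
```python
def select_best_analyses(chart_edges, prune_fom, tree_fom, n=3):
    """Two-phase scheme: a cheap estimate prunes the chart and picks
    the n most probable derivations; only those are rescored with the
    expensive tree-level FOM (e.g. verb valencies)."""
    # Pruning part (post-processing level): drop unpromising edges.
    kept = [e for e in chart_edges if prune_fom(e) >= 0.1]
    # Select the n most probable derivations from the pruned chart.
    candidates = sorted(kept, key=prune_fom, reverse=True)[:n]
    # Reordering part (derivation-tree level): rescore only n trees,
    # avoiding the exponential cost of scoring the whole chart.
    return sorted(candidates, key=tree_fom, reverse=True)
```
The point of the design is that the expensive FOM touches at most n derivations instead of the exponentially many readings encoded in the chart.</Paragraph>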
      <Paragraph position="3"> The actions that do not need to work with the whole chart structure are run after the best or n most probable derivation trees have been selected. These actions are used, for example, to determine possible verb valencies within the input sentence, which can produce a new ordering of the selected trees.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Augmented n-grams and x2 FOM
</SectionTitle>
      <Paragraph position="0"> The x1 FOM is based on rule frequencies and is not capable of describing the contextual information in the input. A popular technique for capturing the relations between sentence constituents is the n-gram method, which takes advantage of a fast and efficient evaluation algorithm.</Paragraph>
      <Paragraph position="1"> For instance, (Caraballo and Charniak, 1998) present and evaluate different figures of merit in the context of best-first chart parsing. They recommend the boundary trigram estimate, which achieved the best performance on their two testing grammars. This technique, as well as stochastic POS tagging based on n-gram statistics, achieves satisfactory results for analytical languages (like English). However, in the case of free word order languages, current studies suggest that these simple stochastic techniques suffer considerably from the data sparseness problem and require a huge amount of training data.</Paragraph>
      <Paragraph position="2"> The reduction of the number of possible training schemata, which correctly keeps the correspondence with the syntactic tree structure, is achieved by an elaborate selection of n-gram candidates. While the standard n-gram techniques work on the surface level, this approach allows us to move up to the syntactic tree level. We advantageously use the ability of a lexical head to represent the key features of the subtree formed by its dependants (see Figure 2). The principle of lexical heads has proven to be fruitfully exploitable in the analysis of free word order languages. The obtained cut-down in the amount of training data may also be crucial to the usability of this stochastic technique.</Paragraph>
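    <Paragraph> A minimal sketch of n-gram statistics collected over lexical heads instead of surface words follows. The unsmoothed maximum-likelihood estimate and the function names are our simplifications; a real model would smooth the counts:
```python
from collections import Counter

def head_bigram_model(treebank_heads):
    """Bigram statistics over lexical heads of adjacent constituents.
    treebank_heads is a list of head sequences, one per training
    sentence, with each head standing in for its whole subtree."""
    unigrams, bigrams = Counter(), Counter()
    for heads in treebank_heads:
        for i in range(len(heads) - 1):
            unigrams[heads[i]] += 1
            bigrams[(heads[i], heads[i + 1])] += 1

    def prob(prev, curr):
        # Maximum-likelihood estimate P(curr | prev), zero if unseen.
        if unigrams[prev] == 0:
            return 0.0
        return bigrams[(prev, curr)] / unigrams[prev]

    return prob
```
Because each head represents an entire subtree, the model sees far fewer distinct event types than a surface n-gram model over full word forms, which is the cut-down in training data referred to above.</Paragraph>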
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Verb Valencies and x3 FOM
</SectionTitle>
      <Paragraph position="0"> Our experiments have shown that, in the case of a truly free word order language, the FOMs x_1 and x_2 are not always able to discover the correct ordering of analyses. To cope with the above-mentioned difficulties in Slavonic languages (namely Czech), we propose to exploit language-specific features. Preliminary results indicate that the most advantageous approach is the one based upon valencies of the verb phrase -- a crucial concept in traditional linguistics.</Paragraph>
      <Paragraph position="1"> The part of the system dedicated to exploiting the information obtained from a list of verb valencies (Pala and Ševeček, 1997) is necessary in particular for solving the prepositional attachment problem. During the analysis of noun groups and prepositional noun groups in the role of verb valencies in a given input sentence, one needs to be able to distinguish free adjuncts or modifiers from obligatory valencies. We are testing a set of heuristic rules that determine whether a found noun group typically serves as a free adjunct. The heuristics are based on lexico-semantic constraints (Smrž and Horák, 1999).</Paragraph>
      <Paragraph position="2"> An example of the application of the heuristics is depicted in Figure 3. In the presented Czech sentence, the expression na Karla (with Charles) is denoted as a verb argument by the valency list of the verb rozhněvat se (anger), while the prepositional noun phrase na schůzi (at the meeting) is classified as a free adjunct by the rule specifying that the preposition na (at) in combination with an &lt;ACTIVITY&gt; class member (in the locative) forms a location expression. The remaining constituent na mzdu (for payroll) is finally recommended as a modifier of the preceding noun phrase záloze ([about the] advance).</Paragraph>
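    <Paragraph> A heuristic of this kind might be sketched as a table of lexico-semantic rules. The rule format, the labels and the classifier function are illustrative assumptions, not the system's actual representation:
```python
# One hypothetical rule: the preposition "na" with an ACTIVITY-class
# noun in the locative forms a location expression (a free adjunct).
ADJUNCT_RULES = [
    {"prep": "na", "case": "locative", "noun_class": "ACTIVITY",
     "label": "location"},
]

def classify_prep_group(prep, case, noun_class, verb_frame_preps):
    """Classify a prepositional noun group: the verb's valency list
    takes precedence, then the lexico-semantic adjunct rules."""
    if prep in verb_frame_preps:
        return "verb argument"
    for rule in ADJUNCT_RULES:
        if (rule["prep"], rule["case"], rule["noun_class"]) == (prep, case, noun_class):
            return "free adjunct (" + rule["label"] + ")"
    # Otherwise the group is left as a candidate modifier of a
    # preceding noun phrase, as with na mzdu in the example.
    return "modifier candidate"
```
In the Figure 3 example, na schůzi would match the location rule, na Karla would be claimed by the valency frame of rozhněvat se, and na mzdu would fall through to the modifier case.</Paragraph>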
      <Paragraph position="3"> We also need to eliminate the dependence on the surface order. Therefore, before the system confronts the actual verb valencies from the input sentence with the list of valency frames found in the lexicon, all the valency expressions are reordered. By using the standard ordering of participants, the valency frames can be handled as pure sets, independent of the current position of the verb arguments.</Paragraph>
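    <Paragraph> The canonical reordering can be illustrated as follows; the role labels and their standard order are hypothetical stand-ins for the actual participant inventory:
```python
# Hypothetical standard ordering of participant roles.
CANONICAL = {"subject": 0, "object": 1, "dative": 2, "location": 3}

def normalize_frame(participants):
    """Reorder valency participants into the standard order so that
    frames can be compared as pure sets, independently of the surface
    position of the verb arguments."""
    return tuple(sorted(set(participants), key=lambda r: CANONICAL.get(r, 99)))

def frames_match(sentence_frame, lexicon_frame):
    """Compare the valencies found in the sentence with a frame from
    the valency lexicon after both are normalized."""
    return normalize_frame(sentence_frame) == normalize_frame(lexicon_frame)
```
After normalization, two frames that differ only in surface order compare equal, which is exactly the set-like behaviour described above.</Paragraph>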
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 Preferred Word Order
</SectionTitle>
      <Paragraph position="0"> In analytical languages, the word order is usually taken as rather fixed, which is why it can be employed in parse-tree pruning algorithms. However, in the case of inflectional languages, the approaches to word order analysis are diverse. The most influential theory works with the topic-focus articulation (Sgall et al., 1986). Although nearly all rules that could limit the order of constituents in Czech sentences can be fully relaxed, a standard order of participants can be defined. A corpus analysis of general texts confirms that this preferred word order is often followed and that it can be advantageously used as an arbiter for best analysis selection.</Paragraph>
      <Paragraph position="1"> Cases where the x_i FOMs do not unambiguously elect the best candidate can be resolved by the preferred word order, in the form of functional weights l_i( ) with appropriate parameters.</Paragraph>
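    <Paragraph> One simple way to turn the preferred word order into such an arbiter is an inversion count against the standard order. This is a sketch under our own assumptions (role labels, penalty definition), not a formula from the system:
```python
def word_order_penalty(observed_roles, preferred_order):
    """Count inversions of the observed participant roles against the
    preferred order; fewer inversions means a more typical sentence.
    Roles outside the preferred order are ignored."""
    rank = {role: i for i, role in enumerate(preferred_order)}
    seq = [rank[r] for r in observed_roles if r in rank]
    inversions = 0
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            if seq[i] > seq[j]:
                inversions += 1
    return inversions
```
Such a penalty could feed the parameterized weights l_i( ), breaking ties between analyses that the other FOMs score equally.</Paragraph>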
    </Section>
  </Section>
</Paper>