<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1008">
  <Title>Acceptability Prediction by Means of Grammaticality Quantification</Title>
  <Section position="3" start_page="57" end_page="58" type="metho">
    <SectionTitle>
2 Constraint-based parsing
</SectionTitle>
    <Paragraph position="0"> Constraints are generally used in linguistics as a control process, verifying that a syntactic structure (e.g. a tree) satisfies some well-formedness conditions. They can, however, play a more general role, making it possible to express syntactic information without any other mechanism (such as a generation function). Property Grammars (hereafter PG) are such a fully constraint-based formalism. In this approach, constraints stipulate different kinds of relations between categories, such as linear precedence, imperative co-occurrence, dependency, repetition, etc. Each of these syntactic relations corresponds to a type of constraint (also called a property); constituency, for instance, specifies the set of possible constituents of an NP. In PG, each category of the grammar is described by a set of properties, and a grammar is then made of a set of properties. Parsing an input consists in verifying, for each category of the description, the corresponding set of properties in the grammar. More precisely, the idea is to verify, for each subset of constituents, the properties for which they are relevant (i.e. the constraints that can be evaluated). Some of these properties are satisfied, others possibly violated. The result of a parse, for a given category, is the set of its relevant properties together with their evaluation.</Paragraph>
    <Paragraph position="1"> This result is called a characterization and is formed by the subset of satisfied properties, noted P+, and the subset of violated ones, noted P−.</Paragraph>
    <Paragraph position="2"> For example, the characterizations associated with the NPs "the book" and "book the" are, respectively, of the form: P+ = {Det ≺ N; Det ⇝ N; N ⇎ Pro; Uniq(Det); Oblig(N); etc.}, P− = ∅ and P+ = {Det ⇝ N; N ⇎ Pro; Uniq(Det); Oblig(N); etc.}, P− = {Det ≺ N}. This approach makes it possible to characterize any kind of syntactic object. In PG, following the proposal made in Construction Grammar (see (Fillmore98), (Kay99)), all such objects are called constructions. They correspond to phrases (NP, PP, etc.) as well as to syntactic patterns (clefts, wh-questions, etc.). All these objects are described by means of a set of properties (see (Blache05b)).</Paragraph>
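The characterization can be made concrete with a short sketch. The Python below evaluates a few NP properties on a sequence of category labels and sorts the relevant ones into P+ and P−; a property that cannot be evaluated returns None and ends up in neither set. It is an illustration under assumptions, not the PG parser: the helper names (`precedes`, `uniq`, `oblig`, `excl`, `characterize`), the label `Prec(Det, N)` for linear precedence and the four-property NP description are hypothetical, and dependency is left out because it cannot be checked on category labels alone.

```python
from typing import Callable, Optional

# A property maps a sequence of category labels to True (satisfied),
# False (violated) or None (not relevant, hence not evaluated).
Property = Callable[[list[str]], Optional[bool]]


def precedes(a: str, b: str) -> Property:
    """Linear precedence: every a must come before every b."""
    def check(cats: list[str]) -> Optional[bool]:
        pos_a = [i for i, c in enumerate(cats) if c == a]
        pos_b = [i for i, c in enumerate(cats) if c == b]
        if not pos_a or not pos_b:
            return None  # evaluable only when both categories occur
        return min(pos_b) > max(pos_a)
    return check


def uniq(a: str) -> Property:
    """Uniqueness: a occurs at most once (relevant only if a occurs)."""
    return lambda cats: None if a not in cats else cats.count(a) == 1


def oblig(a: str) -> Property:
    """Obligation: the head a must be realized."""
    return lambda cats: a in cats


def excl(a: str, b: str) -> Property:
    """Exclusion: a and b never co-occur (relevant if either occurs)."""
    def check(cats: list[str]) -> Optional[bool]:
        if a not in cats and b not in cats:
            return None
        return not (a in cats and b in cats)
    return check


# Hypothetical toy NP description: a small subset of a real PG grammar.
NP_PROPERTIES: dict[str, Property] = {
    "Prec(Det, N)": precedes("Det", "N"),
    "Uniq(Det)": uniq("Det"),
    "Oblig(N)": oblig("N"),
    "Excl(N, Pro)": excl("N", "Pro"),
}


def characterize(cats: list[str], properties: dict[str, Property]):
    """Return (P+, P-): the relevant properties split by their evaluation."""
    p_plus, p_minus = set(), set()
    for name, prop in properties.items():
        result = prop(cats)
        if result is True:
            p_plus.add(name)
        elif result is False:
            p_minus.add(name)
    return p_plus, p_minus


print(characterize(["Det", "N"], NP_PROPERTIES))  # "the book"
print(characterize(["N", "Det"], NP_PROPERTIES))  # "book the"
```

For "book the", only the precedence property is violated, as in the characterization given above.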
    <Paragraph position="3"> In terms of parsing, the mechanism consists in exhibiting the potential constituents of a given construction. In constraint-solving terms, this stage corresponds to the search for an assignment satisfying the constraint system. The particularity of PG comes from constraint relaxation: the goal is not to find an assignment that satisfies the whole constraint system, but the best assignment (i.e.</Paragraph>
    <Paragraph position="4"> the one satisfying the system as far as possible).</Paragraph>
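Constraint relaxation can be illustrated by a small search over candidate category assignments for an ambiguous input. In this hypothetical sketch (toy lexicon and properties, names of our own choosing), no candidate is discarded for violating a constraint; candidates are ranked by the number of violated properties and then by the number of satisfied ones, which is one possible reading of "satisfying the system as far as possible", not necessarily the ranking used by the PG parser.

```python
from itertools import product
from typing import Callable, Optional

# A property maps a tuple of categories to True, False, or None (not evaluable).
Property = Callable[[tuple], Optional[bool]]


def precedes(a: str, b: str) -> Property:
    def check(cats):
        if a not in cats or b not in cats:
            return None
        last_a = max(i for i, c in enumerate(cats) if c == a)
        first_b = min(i for i, c in enumerate(cats) if c == b)
        return first_b > last_a
    return check


def oblig(a: str) -> Property:
    return lambda cats: a in cats


def uniq(a: str) -> Property:
    return lambda cats: None if a not in cats else cats.count(a) == 1


# Hypothetical NP properties and a toy lexicon with one ambiguous word.
NP_PROPS = [precedes("Det", "N"), oblig("N"), uniq("Det")]
LEXICON = {"the": ["Det"], "book": ["N", "V"]}


def score(cats) -> tuple[int, int]:
    """(number of satisfied, number of violated) properties."""
    results = [p(cats) for p in NP_PROPS]
    return results.count(True), results.count(False)


def best_assignment(words):
    """Relaxed search: keep every candidate, prefer fewest violations,
    then most satisfied properties."""
    candidates = product(*(LEXICON[w] for w in words))

    def rank(cats):
        satisfied, violated = score(cats)
        return (-violated, satisfied)

    return max(candidates, key=rank)


print(best_assignment(["the", "book"]))  # ('Det', 'N')
print(best_assignment(["book", "the"]))  # ('N', 'Det')
```

Even for "book the", the search returns an analysis, simply one that carries a violated precedence property.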
    <Paragraph position="5"> In this way, the PG approach makes it possible to deal with sentences of varying degrees of grammaticality. Provided that some control mechanisms are added to the process, PG parsing can be robust and efficient (see (Blache06)) and can parse various kinds of material, including spoken language corpora.</Paragraph>
    <Paragraph position="6"> Using a constraint-based approach such as the one proposed here offers several advantages. First, constraint relaxation techniques make it possible to process any kind of input. When parsing non-canonical sentences, the system identifies precisely, for each constituent, the satisfied constraints as well as those which are violated.</Paragraph>
    <Paragraph position="7"> Being able to parse any kind of input is a prerequisite for identifying a graded scale of grammaticality. The second important advantage of constraints is that syntactic information is represented in a non-holistic or, in other words, decentralized way. This characteristic makes it possible to evaluate precisely the syntactic description associated with the input. As shown above, such a description is made of sets of satisfied and violated constraints.</Paragraph>
    <Paragraph position="8"> The idea is to take advantage of such a representation to propose a quantitative evaluation of these descriptions, built from different indicators such as the number of satisfied or violated constraints or the number of evaluated constraints.</Paragraph>
    <Paragraph position="9"> The hypothesis, from the perspective of a gradience account, is that a quantitative evaluation can be related to the level of grammaticality: the higher the evaluation value, the more grammatical the construction. The value is then an indication of the quality of the input according to a given grammar. In the next section, we propose a method for computing this value.</Paragraph>
  </Section>
  <Section position="4" start_page="58" end_page="60" type="metho">
    <SectionTitle>
3 Characterization evaluation
</SectionTitle>
    <Paragraph position="0"> The first idea that comes to mind when trying to quantify the quality of a characterization is to calculate the ratio of satisfied properties with respect to the total set of evaluated properties. This information is computed as follows. Let C be a construction defined in the grammar by means of a set of properties S_C, and let A_C be an assignment for the construction C. We note N+ the number of properties satisfied by A_C, N− the number of violated ones, E = N+ + N− the number of evaluated properties, and T the number of properties in S_C.</Paragraph>
    <Paragraph position="2"> The satisfaction ratio (SR) is the number of satisfied properties divided by the number of evaluated properties: SR = N+/E. The SR value varies between 0 and 1, the two extreme values indicating that no property is satisfied (SR = 0) or that none is violated (SR = 1). However, SR only relies on the evaluated properties. It is also necessary to indicate whether a characterization uses a small or a large subpart of the properties describing the construction in the grammar. For example, the VP in our grammar is described by means of 25 constraints, whereas the PP uses only 7. Imagine a case where 7 constraints can be evaluated for both constructions, with an equal SR. The two constructions nevertheless do not have the same quality: one relies on the evaluation of all its possible constraints (the PP), whereas the other uses only a few of them (the VP). The following formula takes these differences into account:</Paragraph>
    <Paragraph position="4"> The completeness coefficient (CC) is the number of evaluated properties divided by the number of properties describing the construction in the grammar: CC = E/T. These purely quantitative aspects need to be modulated according to the constraint types. Intuitively, for a given construction, some constraints play a more important role than others. For example, linear precedence in languages with poor morphology, such as English or French, may be more important than obligation (i.e. the necessity of realizing the head). In turn, obligation may be more important than uniqueness (i.e. the impossibility of repetition). In this case, violating a property has different consequences according to its relative importance. The following examples illustrate this aspect: (1) a. The the man who spoke with me is my brother. b. The who spoke with me man is my brother.</Paragraph>
    <Paragraph position="5"> In (1a), the determiner is repeated, violating a uniqueness constraint of the first NP, whereas (1b) violates a linearity constraint of the same NP. Clearly, (1a) seems more grammatical than (1b), although in both cases only one constraint is violated. This contrast has to be taken into account in the evaluation. Before detailing this aspect, it is important to note that this intuition does not mean that constraints have to be organized into a ranking scheme, as in Optimality Theory (see (Prince93)). The parsing mechanism remains the same with or without this information, and the hierarchization only plays the role of a process control. Identifying the relative importance of the types of constraints amounts to associating them with a weight. Note that at this stage we assign weights to constraint types, not directly to the constraints, unlike other approaches (cf. (Menzel98), (Foth05)). The experiment described in the following sections shows that this weighting level seems sufficient. However, if necessary, it remains possible to weight some constraints directly within a given construction, thus overriding the default weight assigned to their constraint type.</Paragraph>
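Type-level weighting with construction-specific overrides amounts to a two-level lookup. The sketch below is hypothetical: the weight values only follow the ordering suggested above (linearity, then obligation, then uniqueness), and the single override entry is invented to show the mechanism.

```python
# Hypothetical default weights per constraint type (placeholder values).
TYPE_WEIGHTS = {
    "linearity": 3.0,
    "obligation": 2.0,
    "uniqueness": 1.0,
}

# Hypothetical construction-specific overrides, keyed by (construction, constraint).
OVERRIDES = {
    ("NP", "Uniq(Det)"): 2.5,
}


def weight(construction: str, constraint: str, ctype: str) -> float:
    """Return the override for this constraint in this construction, if any,
    otherwise the default weight of its constraint type."""
    return OVERRIDES.get((construction, constraint), TYPE_WEIGHTS[ctype])


print(weight("NP", "Prec(Det, N)", "linearity"))  # 3.0 (type default)
print(weight("NP", "Uniq(Det)", "uniqueness"))    # 2.5 (overridden)
print(weight("VP", "Uniq(V)", "uniqueness"))      # 1.0 (type default)
```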
    <Paragraph position="6"> The notations presented hereafter are used to describe constraint weighting. Recall that P+ and P− denote the sets of satisfied and violated properties of a given construction.</Paragraph>
    <Paragraph position="8"> One indication of the relative importance of the constraints involved in the characterization of a construction is given by the following formula: * QI: the quality index of a construction</Paragraph>
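The QI formula itself is missing from this version of the text. A reconstruction consistent with what is said about QI (values between −1 and 1, negative when the violated constraints outweigh the satisfied ones) is given here as an assumption, with w(p) the weight of property p, and W+ and W− the summed weights of the satisfied and violated properties:

```latex
W^{+} = \sum_{p \in P^{+}} w(p), \qquad
W^{-} = \sum_{p \in P^{-}} w(p), \qquad
QI = \frac{W^{+} - W^{-}}{W^{+} + W^{-}}
```

Under this form, QI = 1 when no property is violated, QI = −1 when none is satisfied, and QI is negative exactly when W− exceeds W+.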
    <Paragraph position="10"> QI therefore varies between -1 and 1.</Paragraph>
    <Paragraph position="11"> A negative value indicates that the set of violated constraints has a greater importance than the set of satisfied ones. This does not mean that more constraints are violated than satisfied, but reflects the importance of the violated ones.</Paragraph>
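The three indicators can be computed together once each evaluated property carries a weight. This Python sketch assumes QI = (W+ − W−)/(W+ + W−), a reconstruction rather than the paper's printed formula, and uses placeholder weights of 1.0. It replays the PP/VP case discussed earlier with a hypothetical characterization: 7 evaluated properties for both phrases, one of them violated, and T = 7 for the PP against T = 25 for the VP.

```python
from dataclasses import dataclass


@dataclass
class Characterization:
    satisfied: list[float]  # weights of the properties in P+
    violated: list[float]   # weights of the properties in P-
    total: int              # T: number of properties describing the construction

    @property
    def evaluated(self) -> int:
        """E = N+ + N- (assumed non-zero)."""
        return len(self.satisfied) + len(self.violated)

    def sr(self) -> float:
        """Satisfaction ratio N+/E."""
        return len(self.satisfied) / self.evaluated

    def cc(self) -> float:
        """Completeness coefficient E/T."""
        return self.evaluated / self.total

    def qi(self) -> float:
        """Quality index, assumed form (W+ - W-)/(W+ + W-)."""
        w_plus, w_minus = sum(self.satisfied), sum(self.violated)
        return (w_plus - w_minus) / (w_plus + w_minus)


# Hypothetical case: 7 evaluated properties, 6 satisfied and 1 violated,
# all with weight 1.0; the PP has 7 properties in the grammar, the VP 25.
pp = Characterization([1.0] * 6, [1.0], total=7)
vp = Characterization([1.0] * 6, [1.0], total=25)
for name, c in (("PP", pp), ("VP", vp)):
    print(name, round(c.sr(), 3), round(c.cc(), 3), round(c.qi(), 3))
```

SR and QI come out identical for the two phrases; only CC (1.0 for the PP, 0.28 for the VP) separates them, which is the difference the completeness coefficient is meant to capture.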
    <Paragraph position="12"> We now have three different indicators that can be used in the evaluation of the characterization: the satisfaction ratio (SR), indicating the ratio of satisfied constraints; the completeness coefficient (CC), specifying the ratio of evaluated constraints; and the quality index (QI), reflecting the quality of the characterization according to the respective degrees of importance of the evaluated constraints. These three indices are used to form a global precision index (PI). Since the three indicators do not have the same impact on the evaluation of the characterization, they are balanced with coefficients in the normalized formula:</Paragraph>
    <Paragraph position="14"> As such, PI constitutes an evaluation of the characterization for a given construction. However, it is also necessary to take into account the "quality" of the constituents of the construction. A construction can satisfy all the constraints describing it, yet be made of embedded constituents that are more or less well formed. The overall indication of the quality of a construction therefore has to integrate the quality of each of its constituents. Ultimately, this evaluation depends on whether or not embedded constructions are present.</Paragraph>
    <Paragraph position="15"> In the case of a construction made of lexical constituents only, no embedded construction is present and the final evaluation is the precision index PI as described above. We will hereafter call the evaluation of the quality of the construction the "grammaticality index" (GI). It is calculated as follows: * Let d be the number of embedded constructions</Paragraph>
    <Paragraph position="17"> In this formula, GI(Ci) denotes the grammaticality index of the construction Ci. The general formula for a construction C is then a function of its precision index and of the sum of the grammaticality indices of its embedded constituents. This formula implements the propagation of the quality of each constituent: the grammaticality index of a construction can be lowered when its constituents violate some properties. Conversely, violating a property at an embedded level can be partially compensated at the upper levels (provided they have a good grammaticality index).</Paragraph>
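The bottom-up propagation of GI can be sketched as a recursive walk over the tree of constructions. The combination rule here is an assumption made for illustration: GI = PI when d = 0, and otherwise PI multiplied by the mean GI of the d embedded constructions. The text only states that GI is a function of PI and of the sum of the embedded GIs, so this sketch should not be expected to reproduce the figures reported in Section 4. The class and function names and the PI values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Construction:
    label: str
    pi: float  # precision index of the construction itself
    children: list["Construction"] = field(default_factory=list)  # embedded constructions


def gi(c: Construction) -> float:
    """Grammaticality index, propagated bottom-up (assumed combination rule)."""
    d = len(c.children)
    if d == 0:
        return c.pi  # lexical constituents only: GI = PI
    # Assumption: scale PI by the mean GI of the d embedded constructions.
    return c.pi * sum(gi(child) for child in c.children) / d


# Toy tree with invented PI values: a poor NP next to a well-formed PP.
np_bad = Construction("NP", 0.4)
pp_good = Construction("PP", 0.9, [Construction("NP", 0.9)])
vp = Construction("VP", 0.95, [np_bad, pp_good])
print(round(gi(np_bad), 3), round(gi(pp_good), 3), round(gi(vp), 3))
```

The weak NP lowers the GI of the VP, while the well-formed PP sibling partly offsets it, in line with the compensation effect described above.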
  </Section>
  <Section position="5" start_page="60" end_page="61" type="metho">
    <SectionTitle>
4 Grammaticality index from PG
</SectionTitle>
    <Paragraph position="0"> In the remainder of the paper, we describe the predictions of the model as well as the results of a psycholinguistic evaluation of these predictions. The idea is to evaluate, for a given set of sentences, on the one hand the grammaticality index (computed automatically on the basis of a PG grammar) and on the other hand the acceptability judgments given by a set of subjects. This experiment was carried out for French; the data and the experiment itself are presented in the next section.</Paragraph>
    <Paragraph position="1"> In this section, we present the evaluation of the grammaticality index.</Paragraph>
    <Paragraph position="2"> Before describing the calculation of the different indicators, we have to specify the constraint weights and the balancing coefficients used in PI.</Paragraph>
    <Paragraph position="3"> These values are language-dependent; they are chosen intuitively and partly based on earlier analyses, and this choice is evaluated by the experiment described in the next section. In the remainder, the following values are used: Concerning the balancing coefficients, we give greater importance to the quality index (coefficient k = 2), which seems to have important consequences for acceptability, as shown in the previous section. The two other coefficients are significantly less important, the satisfaction ratio being in the middle position (coefficient l = 1) and the completeness coefficient the lowest (coefficient m = 0.5).</Paragraph>
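With k = 2, l = 1 and m = 0.5, the precision index can be sketched as a weighted average of QI, SR and CC. Dividing by k + l + m is our reading of "normalized formula" in Section 3 and is an assumption; the point of the sketch is that, under these coefficients, lowering QI by a given amount costs twice as much PI as lowering SR by the same amount, and four times as much as lowering CC.

```python
# Balancing coefficients from the text: k for QI, l for SR, m for CC.
K_QI, L_SR, M_CC = 2.0, 1.0, 0.5


def precision_index(qi: float, sr: float, cc: float) -> float:
    """PI as a weighted average of QI, SR and CC (assumed normalization)."""
    return (K_QI * qi + L_SR * sr + M_CC * cc) / (K_QI + L_SR + M_CC)


perfect = precision_index(1.0, 1.0, 1.0)
# Cost in PI of lowering each indicator by 0.5, the others staying at 1.
cases = (("QI", (0.5, 1.0, 1.0)), ("SR", (1.0, 0.5, 1.0)), ("CC", (1.0, 1.0, 0.5)))
for name, args in cases:
    print(name, round(perfect - precision_index(*args), 3))
```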
    <Paragraph position="4"> Let us start with a first example, illustrating the process in the case of a sentence satisfying all constraints. (2) Marie a emprunté un très long chemin pour le retour. Mary took a very long way for the return.</Paragraph>
    <Paragraph position="5"> The first NP contains one lexical constituent, Marie. Three constraints, among the 14 describing the NP, are evaluated and all are satisfied: Oblig(N), stipulating that the head is realized; Const(N), indicating the category N as a possible constituent; and Excl(N, Pro), verifying that N is not realized together with a pronoun. The following values come from this characterization:</Paragraph>
    <Paragraph position="7"> Since all evaluated constraints are satisfied, QI and SR equal 1. However, the fact that only 3 constraints among 14 are evaluated lowers the grammaticality index. As no constructions are embedded, this last value is the same as PI.</Paragraph>
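The indicators for the NP "Marie" follow from the counts given above (3 evaluated properties out of 14, none violated). In this sketch QI uses the assumed form (W+ − W−)/(W+ + W−), which equals 1 whatever the weights when nothing is violated, and PI uses the assumed weighted-average normalization; the printed PI and GI are therefore illustrative and are not taken from the paper's table.

```python
K_QI, L_SR, M_CC = 2.0, 1.0, 0.5  # balancing coefficients from the text

# NP "Marie": Oblig(N), Const(N) and Excl(N, Pro) evaluated, all satisfied,
# out of 14 NP properties in the grammar.
satisfied_weights = [1.0, 1.0, 1.0]  # hypothetical weights; QI does not depend on them here
violated_weights: list[float] = []
total = 14

n_plus, n_minus = len(satisfied_weights), len(violated_weights)
evaluated = n_plus + n_minus

sr = n_plus / evaluated  # 1.0
cc = evaluated / total   # 3/14
w_plus, w_minus = sum(satisfied_weights), sum(violated_weights)
qi = (w_plus - w_minus) / (w_plus + w_minus)  # 1.0 since W- = 0 (assumed QI form)
pi = (K_QI * qi + L_SR * sr + M_CC * cc) / (K_QI + L_SR + M_CC)  # assumed normalization
gi = pi  # no embedded construction, so GI = PI

print(f"SR={sr:.2f} CC={cc:.3f} QI={qi:.2f} PI={pi:.3f} GI={gi:.3f}")
```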
    <Paragraph position="8"> These results can be compared with those of another constituent of the same sentence, the VP. This construction also contains only satisfied properties. Its characterization is the following:</Paragraph>
    <Paragraph position="10"> In addition to its evaluated constraints (9 among the possible 25), the VP includes two embedded constructions: a PP and an NP. A grammaticality index has been calculated for each of them: GI(PP) = 1.24 and GI(NP) = 1.23. The following table indicates the different values involved in the calculation of the GI.</Paragraph>
    <Paragraph position="12"> The final GI of the VP reaches a high value. It benefits on the one hand from its own quality (indicated by PI) and on the other hand from that of its embedded constituents. In the end, the final GI obtained at the sentence level is a function of its own PI (very good) and of the NP and VP GIs, as shown in the table:</Paragraph>
    <Paragraph position="14"> Let us now compare these evaluations with those obtained for sentences with violated constraints, as in the following examples: (3) a. Marie a emprunté très long chemin un pour le retour. Mary took very long way a for the return.</Paragraph>
    <Paragraph position="15"> b. Marie a emprunté un très chemin pour le retour. Mary took a very way for the return.</Paragraph>
    <Paragraph position="16"> In (3a), two linear precedence constraints are violated: the determiner follows both a noun and an AP in "très long chemin un". Here are the figures calculated for this NP:</Paragraph>
    <Paragraph position="18"> The QI indicator is very low, the violated constraints being of heavy weight. The grammaticality index is a little higher because many constraints are also satisfied. The NP GI is then propagated to its dominating construction, the VP. This phrase is well formed and also contains a well-formed construction (the PP) as sister of the NP. Note that in the following table summarizing the VP indicators, the GI product of the embedded constituents is higher than the GI of the NP. This is due to the well-formed PP constituent. In the end, the GI of the VP is better than that of the NP.</Paragraph>
    <Paragraph position="20"> For the same reasons, the higher-level construction S also compensates for the bad score of the NP.</Paragraph>
    <Paragraph position="21"> However, in the end, the final GI of the sentence is much lower than that of the corresponding well-formed sentence (see above).</Paragraph>
    <Paragraph position="23"> The figures for sentence (3b) show that the violation of a single constraint (in this case Oblig(Adj), indicating the absence of the head of the AP) can lead to a lower global GI than the violation of two heavy constraints, as in (3a).</Paragraph>
    <Paragraph position="24"> In this case, this is due to the fact that the AP contains only one constituent (a modifier), which does not suffice to compensate for the violated constraint. The following table indicates the indices of the different phrases. Note that in this table, each phrase is a constituent of the following one (i.e. the AP belongs to the NP, which itself belongs to the VP, and so on).</Paragraph>
  </Section>
  <Section position="6" start_page="61" end_page="62" type="metho">
    <SectionTitle>
5 Judging acceptability of violations
</SectionTitle>
    <Paragraph position="0"> We ran a questionnaire study presenting participants with 60 experimental sentences like (11) to (55) below. Forty-four native speakers of French completed the questionnaire, giving acceptability judgements following the Magnitude Estimation technique. Twenty counterbalanced forms of the questionnaire were constructed. Three of the 60 experimental sentences appeared in each version in each form of the questionnaire, and across the 20 forms, each experimental sentence appeared once in each condition. Each sentence was followed by a question concerning its acceptability. These 60 sentences were combined with 36 sentences of various forms, varying in complexity (simple main clauses, simple embeddings and doubly nested embeddings) and plausibility (from fully plausible to fairly implausible according to the intuitions of the experimenters). Each form was randomized once.</Paragraph>
    <Paragraph position="1"> Procedure: The rating technique used was magnitude estimation (ME, see (Bard96)). Participants were instructed to provide a numeric score indicating how much better (or worse) the current sentence was compared to a given reference sentence (for example, if the reference sentence was given the reference score of 100, judging a target sentence five times better would result in 500, and judging it five times worse in 20). Judging the acceptability ratio of a sentence in this way results in a scale which is open-ended on both sides. For this reason, ME has been shown to be more sensitive than fixed rating scales, especially for scores that would approach the ends of such scales (cf. (Bard96)). Each questionnaire began with written instructions familiarizing the participant with the task by means of two examples. Participants were then presented with a reference sentence for which they had to provide a reference score. All following sentences had to be judged in relation to the reference sentence. Individual judgements were log-transformed (to arrive at a linear scale) and normalized (z-standardized) before statistical analyses.</Paragraph>
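The normalization of magnitude-estimation judgements can be sketched as a log transform followed by z-standardization. The sketch assumes base-10 logarithms and standardization within each participant, neither of which is specified in the text; the base does not actually matter, since changing it only rescales the log values and z-standardization removes that scale. The ratings are invented and reuse the 100 / 500 / 20 example above.

```python
import math
from statistics import mean, pstdev


def normalize(ratings: dict[str, float]) -> dict[str, float]:
    """Log-transform one participant's ME ratings, then z-standardize them."""
    logs = {item: math.log10(score) for item, score in ratings.items()}
    values = list(logs.values())
    mu, sigma = mean(values), pstdev(values)
    return {item: (v - mu) / sigma for item, v in logs.items()}


# Invented ratings: reference sentence scored 100, one sentence judged
# five times better (500) and one five times worse (20).
z = normalize({"ref": 100.0, "better": 500.0, "worse": 20.0})
print({item: round(v, 3) for item, v in z.items()})
```

After the log transform, "five times better" and "five times worse" sit symmetrically around the reference, so their z-scores are opposite (about ±1.225) and the reference gets 0.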
    <Paragraph position="2"> Global mean scores are presented in Figure 1. We tested the reliability of the results for different randomly chosen subsets of the materials. Constructions for which the judgements remain highly stable across subsets of sentences are marked by an asterisk (rs &gt; 0.90; p &lt; 0.001). The mean reliability across subsets is rs &gt; 0.65 (p &lt; 0.001).</Paragraph>
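Reliability across random subsets can be estimated by rank-correlating the mean scores per construction obtained from two subsets of the materials. The sketch below computes Spearman's rs as the Pearson correlation of ranks, with tied values sharing their mean rank. How the subsets were drawn is not detailed in the text, so the two score vectors here are invented, and no p-value is computed.

```python
from statistics import mean


def ranks(xs: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    positions: dict[float, list[int]] = {}
    for pos, v in enumerate(sorted(xs), start=1):
        positions.setdefault(v, []).append(pos)
    return [mean(positions[v]) for v in xs]


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rs as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented mean z-scores for five conditions, from two random subsets of items.
subset_a = [0.5, -0.6, -0.2, -0.9, 0.0]
subset_b = [0.4, -0.7, -0.1, -0.8, -0.3]
print(round(spearman(subset_a, subset_b), 3))  # 0.9
```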
    <Paragraph position="3"> These data show in particular that violations within prepositional phrases are not judged very consistently. The way they are judged appears to be highly dependent on the preposition used and on the syntactic/semantic context. This is actually a very plausible result, given that heads of prepositional phrases are closed-class items that are much more predictable in many syntactic and semantic environments than heads of noun phrases and verb phrases. We will therefore base our further analyses mainly on violations within noun phrases, verb phrases, and adjectival phrases. Results including prepositional phrases will be given in parentheses. Since the constraints described above do not make any predictions for semantic violations, we excluded examples 25, 34, 45, and 55 from further analyses.</Paragraph>
  </Section>
  <Section position="7" start_page="62" end_page="63" type="metho">
    <SectionTitle>
6 Acceptability versus grammaticality index
</SectionTitle>
    <Paragraph position="0"> In this section, we compare the acceptability measurements described in Section 5 with the values of the grammaticality index obtained as proposed in Section 4.</Paragraph>
    <Paragraph position="1"> From the sample of 20 sentences presented in Figure 1, we discarded 4 sentences, namely sentences 25, 34, 45 and 55, for which the property violation is semantic in nature (see above). We are left with 16 sentences: the reference sentence, which satisfies all the constraints, and 15 sentences each violating one of the syntactic constraints. The results are presented in Figure 2, where the acceptability judgement (ordinate) is plotted against the grammaticality index (abscissa) for each sentence. We observe a high coefficient of correlation (r = 0.76) between the two distributions, indicating that the grammaticality index derived from PG is a fairly good tracer of the observed acceptability measurements.</Paragraph>
    <Paragraph position="2"> The main contribution to the grammaticality index comes from the quality index QI (r = 0.69), while the contributions of the satisfaction ratio SR and the completeness coefficient CC, although significant, are more modest (r = 0.18 and r = 0.17 respectively).
Figure 1: global mean acceptability scores (z-standardized); * marks judgements that remain highly stable across subsets.
No violations
11. Marie a emprunté un très long chemin pour le retour: 0.465
NP violations
21. Marie a emprunté très long chemin un pour le retour: -0.643 *
22. Marie a emprunté un très long chemin chemin pour le retour: -0.161 *
23. Marie a emprunté un très long pour le retour: -0.871 *
24. Marie a emprunté très long chemin pour le retour: -0.028 *
25. Marie a emprunté un très heureux chemin pour le retour: -0.196 *
AP violations
31. Marie a emprunté un long très chemin pour le retour: -0.41 *
32. Marie a emprunté un très long long chemin pour le retour: -0.216
33. Marie a emprunté un très chemin pour le retour: -0.619
34. Marie a emprunté un grossièrement long chemin pour le retour: -0.058 *
PP violations
41. Marie a emprunté un très long chemin le retour pour: -0.581
42. Marie a emprunté un très long chemin pour pour le retour: -0.078
43. Marie a emprunté un très long chemin le retour: -0.213
44. Marie a emprunté un très long chemin pour: -0.385
45. Marie a emprunté un très long chemin dans le retour: -0.415</Paragraph>
    <Paragraph position="3"> Figure 3 presents the correlation between acceptability judgements and grammaticality indices after the removal of the 4 sentences presenting PP violations. The analysis of the experiment described in Section 5 indeed shows that acceptability measurements for the PP-violation sentences are less reliable than for other phrases. We thus expect that removing these data from the sample will strengthen the correlation between the two distributions. The coefficient of correlation of the Finally, the adequacy of the PG grammaticality indices to the measurements was investigated by means of resultant analysis. We adapted the parameters of the model in order to arrive at a good fit based on half of the sentence materials (randomly chosen from the full set), with a correlation of r = 0.85 (r = 0.76 including PPs) between the grammaticality index and acceptability judgements. Surprisingly, we arrived at the best fit with only two different weights: a weight of 2 for Exclusion, Uniqueness, and Requirement, and a weight of 5 for Obligation, Linearity, and Constituency. This result converges with the distinction between hard and soft constraints proposed by (Keller00).</Paragraph>
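The two-level weighting obtained in the fit can be written as a small configuration. To show its effect, the sketch computes QI for two hypothetical NP characterizations that differ only in which single property type is violated, echoing examples (1a) and (1b) of Section 3. QI is computed with the assumed form (W+ − W−)/(W+ + W−) over constraint types, and the property lists are invented.

```python
# Two-level weights reported for the best fit.
WEIGHTS = {
    "exclusion": 2, "uniqueness": 2, "requirement": 2,
    "obligation": 5, "linearity": 5, "constituency": 5,
}


def qi(satisfied: list[str], violated: list[str]) -> float:
    """Assumed quality index over constraint types: (W+ - W-) / (W+ + W-)."""
    w_plus = sum(WEIGHTS[t] for t in satisfied)
    w_minus = sum(WEIGHTS[t] for t in violated)
    return (w_plus - w_minus) / (w_plus + w_minus)


# Hypothetical NP characterizations: same other properties, one violation each.
common = ["constituency", "obligation", "exclusion"]
repeated_det = qi(common + ["linearity"], ["uniqueness"])  # like (1a)
misplaced = qi(common + ["uniqueness"], ["linearity"])     # like (1b)
print(round(repeated_det, 3), round(misplaced, 3))
```

The uniqueness violation yields QI = 15/19 ≈ 0.79 and the linearity violation QI = 9/19 ≈ 0.47, matching the intuition that (1a) is more acceptable than (1b).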
    <Paragraph position="4"> The fact that the grammaticality index is based on these properties as well as on the number of constraints to be evaluated, the number of constraints to be satisfied, and the goodness of embedded constituents apparently results in a fine-grained and highly adequate prediction, even with this very basic distinction between constraints.</Paragraph>
    <Paragraph position="5"> With these parameters fixed, we validated the predictions of the model on the remaining half of the materials. Here we arrived at a highly reliable correlation of r = 0.86 (r = 0.67 including PPs) between PG grammaticality indices and acceptability judgements.</Paragraph>
  </Section>
</Paper>