<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1096"> <Title>Generation of Relative Referring Expressions based on Perceptual Grouping</Title> <Section position="3" start_page="2" end_page="3" type="metho"> <SectionTitle> 2 Data Collection </SectionTitle> <Paragraph position="0"> We conducted a psychological experiment with 42 Japanese undergraduate students to collect referring expressions in which perceptual groups are used. In order to evaluate the collected expressions, we conducted another experiment with a different group of 44 Japanese undergraduate students. There is no overlap between the subjects of those two experiments. Details of these experiments are described in the following subsections.</Paragraph> <Section position="1" start_page="2" end_page="3" type="sub_section"> <SectionTitle> 2.1 Collecting Referring Expressions </SectionTitle> <Paragraph position="0"> Method Subjects were presented with 2-dimensional bird's-eye images in which several objects of the same color and the same size were arranged, and the subjects were requested to indicate a target object to a third person drawn in the same image. We used 12 images of arrangements. In each image, three to nine objects were arranged manually so that the objects were distributed non-uniformly. An example of the images presented to subjects is shown in Figure 2.</Paragraph> <Paragraph position="1"> Labels a,...,f,x in the image are assigned for purposes of illustration and are not assigned in the actual images presented to the subjects. Each subject was asked to give a command so that the person in the image would pick the target object, which is enclosed with dotted lines. 
When a subject could not think of a proper expression, she/he was allowed to abandon that arrangement and proceed to the next one.</Paragraph> <Paragraph position="2"> Referring expressions designating the target object were collected from these subjects' commands.</Paragraph> <Paragraph position="3"> Analysis We presented 12 arrangements to 42 subjects and obtained 476 referring expressions.</Paragraph> <Paragraph position="4"> Twenty-eight judgments were abandoned in the experiment. Observing the collected expressions, we found that, starting from a group with all of the objects, subjects generally narrow down the group to a singleton group that contains the target object. Therefore, a referring expression can be formalized as a sequence of groups (SOG) reflecting the subject's narrowing-down process.</Paragraph> <Paragraph position="5"> The following example shows an observed expression describing the target x in Figure 2 with the corresponding SOG representation below it.</Paragraph> <Paragraph position="6"> &quot;hidari oku ni aru mittu no tama no uti no itiban migi no tama.&quot; (the rightmost ball among the three balls at the back left) SOG: [{a,b,c,d,e,f,x},{a,b,x},{x}]</Paragraph> <Paragraph position="8"> where {a,b,c,d,e,f,x} denotes all objects in the image (total set), {a,b,x} denotes the three objects at the back left, and {x} denotes the target.</Paragraph> <Paragraph position="9"> We denote an SOG representation by enclosing groups with square brackets.</Paragraph> <Paragraph position="10"> Since narrowing down starts from the total set, the SOG representation starts with a set of all objects and ends with a singleton group with the target. Translating the collected referring expressions into the SOG representation enables us to abstract and classify the expressions. On average, we obtained about 40 expressions for each arrangement, and classified them into 8.4 different SOG representations. 
Although there are two types of relations between groups as we mentioned in Section 1, the expressions using only intra-group relations made up about 70% of the total.</Paragraph> </Section> <Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 2.2 Evaluating the Collected Expressions </SectionTitle> <Paragraph position="0"> Method Subjects were presented with expressions collected in the experiment described in Section 2.1 together with the corresponding images, and were requested to indicate the objects referred to by the expressions. The presented images are the same as those used in the previous experiment except that there are no marks on the targets. At the same time, subjects were requested to express their confidence in selecting the target, and to evaluate the conciseness and the naturalness of the given expressions on a scale of 1 to 8.</Paragraph> <Paragraph position="1"> Because the number of expressions that we could evaluate with subjects was limited, we chose a maximum of 10 frequent expressions for each arrangement. The expressions were chosen so that as many different SOG representations as possible were included. If an arrangement had fewer than 10 SOGs, several expressions that had the same SOG but different surface realizations were chosen. The resultant 117 expressions were evaluated by 49 subjects. Each subject evaluated about 29.5 expressions.</Paragraph> <Paragraph position="2"> Analysis Discarding incomplete answers, we obtained 1,429 evaluations in total. 
On average, 12.2 evaluations were obtained for each expression.</Paragraph> <Paragraph position="3"> We measured the quality of each expression in terms of an evaluation value that is defined in (1).</Paragraph> <Paragraph position="4"> This measure is used to analyze what kinds of expressions are preferred and to set up a scoring function (6) for machine-generated expressions as described in Section 3.1.</Paragraph> <Paragraph position="6"> According to our analysis, the expressions with only intra-group relations (84 samples) obtained high accuracies (Ave. 79.3%) and high evaluation values (Ave. 33.1), while the expressions with inter-group relations (33 samples) obtained lower accuracies (Ave. 69.1%) and lower evaluation values (Ave. 19.7).</Paragraph> <Paragraph position="7"> The expressions with only intra-group relations were observed more than twice as often as the expressions with inter-group relations. To contrast those two types of expressions, we provide a couple of example expressions indicating object x in Figure 2 below.</Paragraph> <Paragraph position="8"> * without inter-group relations - &quot;the rightmost ball among the three balls at the back left&quot; * with inter-group relations - &quot;the ball behind the two front balls&quot; In addition, expressions explicitly mentioning all the objects obtained lower evaluation values. Considering these observations, we built a generation algorithm that uses only intra-group relations and does not mention all the objects explicitly.</Paragraph> <Paragraph position="9"> Among these expressions, we selected those with which the subjects successfully identified the target with more than 90% accuracy. 
These expressions are used to extract parameters of our generation algorithm in the following sections.</Paragraph> </Section> </Section> <Section position="4" start_page="3" end_page="3" type="metho"> <SectionTitle> 3 Generating Referring Expressions </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 3.1 Generation Algorithm </SectionTitle> <Paragraph position="0"> Given an arrangement of objects and a target, our algorithm generates referring expressions by the following three steps: Step 1: enumerate perceptual groups based on the proximity between objects; Step 2: generate the SOG representations by combining the groups; Step 3: translate the SOG representations into linguistic expressions. In the rest of this section, we illustrate how these three steps generate referring expressions in the situation shown in Figure 2. [Figure 3: SOG: [{a,b,c,d,e,f,x},{a,b,x},{x}] → E(R({a,b,c,d,e,f,x},{a,b,x})) + E({a,b,x}) + E(R({a,b,x},{x})) + E({x}) → &quot;hidari oku no&quot; (at the back left) + &quot;mittu no tama&quot; (three balls) + &quot;no uti no migihasi no&quot; (rightmost ... among) + &quot;tama&quot; (ball)]</Paragraph> <Paragraph position="1"> To generate perceptual groups from an arrangement, Thórisson's algorithm (Thórisson, 1994) is adopted. Given a list of objects in an arrangement, the algorithm generates groups based on the proximity of the objects and returns a list of groups. Only groups containing the target, that is x, are chosen because we handle intra-group relations only, as mentioned before, which implies that all groups mentioned in an expression must include the target. Then, the groups are sorted in descending order of group size. Finally, a singleton group consisting of the target is added to the end of the list if such a group is missing from the list. 
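The filtering, sorting, and completion just described can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function name build_group_list is hypothetical, and the perceptual groups are assumed to be already available (Thórisson's grouping algorithm itself is not reproduced here). The input groups are the ones the text reports for the arrangement in Figure 2.

```python
from typing import FrozenSet, List

Group = FrozenSet[str]


def build_group_list(groups: List[Group], target: str) -> List[Group]:
    """Filter, sort, and complete perceptual groups as described for Step 1.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Keep only the groups that contain the target object.
    with_target = [g for g in groups if target in g]
    # Sort in descending order of group size (largest group first).
    with_target.sort(key=len, reverse=True)
    # Append the singleton target group if it is not already present.
    singleton = frozenset({target})
    if singleton not in with_target:
        with_target.append(singleton)
    return with_target


# Groups recognized for the arrangement in Figure 2 (values taken from the text).
recognized = [
    frozenset("abcdefx"),
    frozenset("abcdx"),
    frozenset("abx"),
    frozenset("cd"),
    frozenset("ef"),
]
group_list = build_group_list(recognized, "x")
print([sorted(g) for g in group_list])
# [['a', 'b', 'c', 'd', 'e', 'f', 'x'], ['a', 'b', 'c', 'd', 'x'], ['a', 'b', 'x'], ['x']]
```

Frozen sets are used so that groups can be compared for equality regardless of the order in which their members are listed.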
The resultant group list, GL, is the output of Step 1.</Paragraph> <Paragraph position="2"> For example, the algorithm recognizes the following groups given the arrangement shown in Figure 2: {{a,b,c,d,e,f,x},{a,b,c,d,x},{a,b,x},{c,d},{e,f}}.</Paragraph> <Paragraph position="3"> After filtering out the groups without the target and adding a singleton group with the target, we obtain the following list: {{a,b,c,d,e,f,x},{a,b,c,d,x},{a,b,x},{x}}. (2)</Paragraph> <Paragraph position="4"> In Step 2, the SOG representations introduced in Section 2 are generated from the GL of Step 1, which generally has a form like (3), where G</Paragraph> <Paragraph position="6"> denotes a group, and G is a group of all the objects. Here, we narrow down the objects starting from the should be included in the SOG representation. For example, from a group list of {G In the last step, the SOG representations are translated into linguistic expressions. Since Japanese is a head-final language, the order of the linguistic expressions for groups is retained in the final linguistic expression for the SOG representation. That is, an SOG representation [G</Paragraph> <Paragraph position="8"> As described in Section 2.2, expressions that explicitly mention all the objects obtain lower evaluation values, and expressions using intra-group relations obtain high evaluation values. Considering these observations, our algorithm does not use the linguistic expression corresponding to all the objects, that is E(G ), and only uses intra-group relations for R(X,Y).</Paragraph> <Paragraph position="9"> Possible expressions of X are collected from the experimental data in Section 2.1, and the first applicable expression is selected when realizing a linguistic expression for X, i.e., E(X). 
Therefore, this algorithm produces one linguistic expression for each SOG even though there are some other possible expressions.</Paragraph> <Paragraph position="10"> For example, the SOG representation (4) is realized as shown in Figure 3.</Paragraph> <Paragraph position="11"> Note that there is no mention of all the objects, {a,b,c,d,e,f,x}, in the linguistic expression.</Paragraph> </Section> <Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 3.2 Evaluation of Generated Expressions </SectionTitle> <Paragraph position="0"> We implemented the algorithm described in Section 3.1, and evaluated the output with 23 undergraduate students. The subjects were different from those of the previous experiments but were of the same age group, and the experimental environment was the same. The evaluation of the output was performed in the same manner as that of Section 2.2. The results are shown in Table 1. &quot;Human-12-all&quot; shows the average values of all expressions collected from humans with 12 arrangements as described in Section 2.2. &quot;Human-12-90&quot; and &quot;Human-12-100&quot; show the average values of expressions by humans that achieved more than 90% accuracy and 100% accuracy, respectively, in the same evaluation experiment.</Paragraph> <Paragraph position="1"> &quot;System-12&quot; shows the average values of expressions generated by the algorithm for the 12 arrangements used in the data collection experiment described in Section 2.1. The algorithm generated 18 expressions for the 12 arrangements, which were presented to each subject in random order for evaluation. &quot;System-20&quot; shows the average values of expressions generated by the algorithm for 20 randomly generated arrangements that yield at least two linguistic expressions each. 
The algorithm generated 48 expressions for these 20 arrangements, which were evaluated in the same manner as that of &quot;System-12&quot;.</Paragraph> <Paragraph position="2"> &quot;System-Average&quot; shows the micro average of expressions of both &quot;System-12&quot; and &quot;System-20&quot;. &quot;Accuracy&quot; shows the rates at which the subjects could identify the correct target objects from the given expressions. Comparing the accuracies of &quot;Human-12-*&quot; and &quot;System-12&quot;, we find that the algorithm generates good expressions. Moreover, the algorithm is superior to humans in terms of &quot;Naturalness&quot; and &quot;Conciseness&quot;. However, this result should be interpreted carefully. Further investigation of the expressions revealed that humans often sacrificed naturalness and conciseness in order to describe the target as precisely as possible for complex arrangements.</Paragraph> </Section> </Section> <Section position="5" start_page="3" end_page="3" type="metho"> <SectionTitle> 4 Scoring SOG Representations </SectionTitle> <Paragraph position="0"> The algorithm presented in the previous section outputs several possible expressions. Therefore, we have to choose one of the expressions by calculating their scores.</Paragraph> <Paragraph position="1"> The scores can be computed using various measures, such as the complexity of expressions and the salience of referent objects. In this section, we investigate whether the adequacy of a course of narrowing down can be predicted: that is, whether meaningful scores of SOG representations can be calculated.</Paragraph> <Section position="1" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 4.1 Method for SOG Scoring </SectionTitle> <Paragraph position="0"> An SOG representation has a form as stated in (3).</Paragraph> <Paragraph position="1"> We presumed that, when a speaker tries to narrow down an object group from G</Paragraph> <Paragraph position="3"> . 
In other words, narrowing down a group from a very big one to a very small one might cause hearers to become confused.</Paragraph> <Paragraph position="4"> For example, consider the following two expressions that both indicate object x in Figure 2. Hearers would prefer (i) to (ii), though (ii) is simpler than (i). (i) &quot;the rightmost ball among the three balls at the back left&quot; (ii) &quot;the fourth ball from the right&quot; In fact, we found (i) among the expressions collected in Section 2.1, but did not find (ii) among them. Our algorithm generated both (i) and (ii) in Section 3.2, and the two expressions received evaluation values of 44.4 and 32.1, respectively.</Paragraph> <Paragraph position="5"> If our presumption is correct, we can expect to choose better expressions by choosing expressions that have adequate dimension ratios between groups.</Paragraph> </Section> <Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> Calculation Formula </SectionTitle> <Paragraph position="0"> The total score of an SOG representation is calculated from functions whose parameters are the dimension ratios between two consecutive groups, as given in (6), where n is the number of groups in the SOG.</Paragraph> <Paragraph position="1"> The dimension of a group, dim, is defined as the average distance between the centroid of the group and that of each object. The dimension of the singleton group {x} is defined as a constant value. Because of this idiosyncrasy of the singleton group {x} compared to other groups, f even though both functions represent the same concept, as described below. The optimal ratio between two groups and that from a group to the target were found through quadratic regression analysis of the data collected in the experiment described in Section 2.2. 
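The dimension of a group, as defined above, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: objects are represented by the 2-D coordinates of their centroids, the function names are hypothetical, the singleton constant is not given in the text and appears here only as a placeholder parameter, and the direction of the ratio (next group over current group) is likewise an assumption.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def dimension(points: List[Point], singleton_dim: float = 1.0) -> float:
    """Average distance between the group centroid and each object.

    singleton_dim is a placeholder: the text only says the singleton
    group has a constant dimension, without stating its value.
    """
    if not points:
        raise ValueError("a group must contain at least one object")
    if len(points) == 1:
        return singleton_dim
    # Centroid of the group (mean of the object coordinates).
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    # Mean Euclidean distance from the centroid to each object.
    return sum(math.hypot(x - cx, y - cy) for x, y in points) / len(points)


def dimension_ratios(sog: List[List[Point]], singleton_dim: float = 1.0) -> List[float]:
    """Ratios between consecutive groups of an SOG (assumed: next / current)."""
    dims = [dimension(group, singleton_dim) for group in sog]
    return [dims[i + 1] / dims[i] for i in range(len(dims) - 1)]


# Hypothetical coordinates, chosen only to make the arithmetic easy to follow.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
pair = [(0.0, 0.0), (2.0, 0.0)]
target = [(0.0, 0.0)]
print(dimension_ratios([square, pair, target]))
```

The regression functions that map these ratios to scores are not reproduced, because their form is not recoverable from the text here.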
f the two regression curves found through analysis representing correlations between dimension ratios and values calculated based on human evaluation as in formula (1). We could not find direct correlations between dimension ratios and accuracies.</Paragraph> </Section> </Section> </Paper>