<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1096">
  <Title>Generation of Relative Referring Expressions based on Perceptual Grouping</Title>
  <Section position="7" start_page="3" end_page="3" type="concl">
    <SectionTitle>
5 Concluding Remarks and Future Work
</SectionTitle>
    <Paragraph position="0"> This paper proposed an algorithm that generates referring expressions using perceptual groups and n-ary relations among them. The algorithm was designed on the basis of an analysis of expressions collected through psychological experiments. Its output was evaluated by 23 subjects, and the results were promising.</Paragraph>
    <Paragraph position="1"> In the following, we look at future work to be done.</Paragraph>
    <Paragraph position="2"> Recognizing salient geometric formations: Thórisson's algorithm (Thórisson, 1994) cannot recognize a linear arrangement of objects as a group, although such arrangements are highly salient to humans. This is one reason for the discrepancy between the evaluations produced by the algorithm and those given by the human subjects.</Paragraph>
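As a minimal sketch of what recognizing a linear arrangement might involve, the hypothetical helper `is_linear_arrangement` fits a straight line through a candidate group's centroid along its principal axis (a total-least-squares fit) and accepts the group when no object lies far from that line. It is not part of Thórisson's algorithm, and the tolerance of 0.1, measured relative to the group's extent along the line, is an arbitrary assumption.

```python
import math


def is_linear_arrangement(points, tol=0.1):
    """Return True if the 2-D points lie roughly on one straight line.

    Hypothetical helper: the line passes through the centroid along the
    principal axis of the scatter matrix; the group is accepted when every
    perpendicular deviation is at most `tol` times the group's extent.
    """
    if 3 > len(points):
        return False  # two points are trivially collinear, not a salient "line"
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Orientation of the major principal axis: tan(2*theta) = 2*sxy / (sxx - syy).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    along = [(x - cx) * ux + (y - cy) * uy for x, y in points]
    across = [abs(-(x - cx) * uy + (y - cy) * ux) for x, y in points]
    extent = max(along) - min(along)
    if extent == 0:
        return False  # all points coincide
    return tol * extent >= max(across)


row = [(0.0, 0.0), (1.0, 0.05), (2.0, -0.03), (3.0, 0.02)]
cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.5)]
print(is_linear_arrangement(row))      # True
print(is_linear_arrangement(cluster))  # False
```

Other salient formations such as circles or squares would need their own tests of this kind; a single relative tolerance is only one possible design choice.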
    <Paragraph position="3"> Most geometric arrangements that are salient to human subjects can be enumerated by consulting geometric terms found in lexicons and thesauri, such as "line", "circle", and "square". Thórisson's algorithm should be extended to recognize these arrangements. Using relations other than positional relations: In this paper, we focused on positional relations among perceptual groups. Other relations, such as those concerning color and size, should be treated in the same manner.</Paragraph>
    <Paragraph position="4"> Thórisson's original algorithm (Thórisson, 1994) takes these relations into account, as well as positional relations, when calculating the similarity between objects to form groups. However, if groups are generated using multiple relations simultaneously, the assumption used in Step 1 of our algorithm, namely that any two groups in an output list are either disjoint or in a subsumption relation, no longer holds. Therefore, the mechanism for generating SOG representations (Step 2 in Section 3.1) must be reconsidered. Resolving reference frames and differences of perspective: We assumed that all participants in a conversation share the same reference frame.</Paragraph>
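Returning to the Step 1 assumption discussed above: it requires the groups in an output list to be pairwise disjoint or nested, a property sometimes called a laminar family. The sketch below checks it mechanically and shows how adding a group formed by a non-positional relation can break it. The object IDs and both groupings are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def compatible(a, b):
    """Two groups are compatible if they are disjoint or one subsumes the other."""
    return a.isdisjoint(b) or a.issubset(b) or b.issubset(a)


def crossing_pairs(groups):
    """Pairs of groups that intersect without either subsuming the other."""
    return [(a, b) for a, b in combinations(groups, 2) if not compatible(a, b)]


def is_laminar(groups):
    """True if every pair of groups is disjoint or nested."""
    return not crossing_pairs(groups)


# Hypothetical objects 1-5. Groups formed from position alone: nested or disjoint.
positional = [frozenset({1, 2, 3}), frozenset({1, 2}), frozenset({4, 5})]
# A hypothetical color-based group (say, objects 3 and 4 are both red).
with_color = positional + [frozenset({3, 4})]

print(is_laminar(positional))           # True
print(is_laminar(with_color))           # False
print(len(crossing_pairs(with_color)))  # 2
```

The color-based group {3, 4} partially overlaps both {1, 2, 3} and {4, 5}, so no single containment hierarchy can hold all four groups.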
    <Paragraph position="5"> When our method is applied to conversational agent systems, e.g., (Cavazza et al., 2002; Tanaka et al., 2004), a shared reference frame cannot be assumed, and the reference frame must be determined appropriately each time a referring expression is generated. Although reference frames have been studied extensively, e.g., (Clark, 1973; Herskovits, 1986; Levinson, 2003), little attention has been paid to how they are determined with respect to perceptual groups and their elements.</Paragraph>
    <Paragraph position="6"> In addition to reference frames, differences in perspective must also be taken into account to produce appropriate referring expressions, because humans often perceive spatial relations among objects in three-dimensional space by projecting them onto a two-dimensional plane. In our experiments, the subjects were shown two-dimensional bird's-eye images. The results might have differed had we used three-dimensional images instead, because projection changes the apparent sizes of objects and the spatial relations among them.</Paragraph>
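The effect of projection mentioned above can be illustrated with an idealized pinhole camera, an assumption made here purely for illustration: a point at depth z is scaled by focal / z. The `Ball` class, coordinates, and units below are hypothetical; the camera sits at the origin looking along the positive z axis, with y pointing up.

```python
from dataclasses import dataclass


@dataclass
class Ball:
    x: float       # lateral offset from the camera axis
    y: float       # vertical offset from the camera axis (negative = below the camera)
    z: float       # depth along the viewing direction; must be positive
    radius: float  # physical radius


def project(ball, focal=1.0):
    """Idealized pinhole projection: image coordinates (u, v) and apparent radius."""
    if not ball.z > 0:
        raise ValueError("object must be in front of the camera")
    scale = focal / ball.z
    return ball.x * scale, ball.y * scale, ball.radius * scale


# Two balls of equal physical size on the floor, one twice as far away as the other.
near = Ball(x=-0.5, y=-1.0, z=2.0, radius=0.1)
far = Ball(x=0.5, y=-1.0, z=4.0, radius=0.1)

for name, ball in (("near", near), ("far", far)):
    u, v, r = project(ball)
    print(f"{name}: image position ({u:.3f}, {v:.3f}), apparent radius {r:.3f}")
# near: image position (-0.250, -0.500), apparent radius 0.050
# far: image position (0.125, -0.250), apparent radius 0.025
```

Under this assumption the farther ball appears half as large and higher in the image, so a front/back distinction that is explicit in a bird's-eye view is conveyed only indirectly, through vertical position and size, after projection.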
    <Paragraph position="7"> Integration with conventional methods: In this paper, we focused on a restricted situation in which the inherent attributes of objects serve no identifying function, which is not the case in general. To handle the general case, an algorithm that integrates conventional attribute-based methods with the proposed method should be formulated.</Paragraph>
    <Paragraph position="8"> One possible direction is to extend the algorithm proposed by Krahmer et al. (2003). They formalize an object arrangement (scene) as a labeled directed graph in which vertices model objects and edges model attributes and binary relations, and they treat content selection as a subgraph construction problem. Their algorithm performs a search over the graph, guided by a cost function, to find a subgraph that uniquely identifies the target object.</Paragraph>
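To make the graph formulation concrete, the following sketch encodes a small hypothetical scene as labeled edges (attributes as loops, relations as edges between objects) and returns the cheapest connected subgraph containing the target whose only possible referent is the target. It replaces Krahmer et al.'s cost-directed search with exhaustive enumeration ordered by cost, which is feasible only for tiny scenes; the scene, labels, and costs are all assumptions.

```python
from itertools import combinations, permutations

# Hypothetical scene: each edge is (source, label, destination); attributes are loops.
SCENE = [
    ("b1", "ball", "b1"), ("b2", "ball", "b2"),
    ("t", "table", "t"), ("c", "chair", "c"),
    ("b1", "on", "t"), ("b2", "on", "c"),
]


def cost(edge):
    """Hypothetical cost: 1 for an attribute (loop), 2 for a relation."""
    return 1 if edge[0] == edge[2] else 2


def vertices(edges):
    return sorted({v for u, _, w in edges for v in (u, w)})


def connected(edges):
    """True if the edges form one connected graph (ignoring direction)."""
    verts = vertices(edges)
    seen, frontier = {verts[0]}, [verts[0]]
    while frontier:
        x = frontier.pop()
        for u, _, w in edges:
            for a, b in ((u, w), (w, u)):
                if a == x and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return len(seen) == len(verts)


def referents(description, target, scene):
    """Scene objects that `target` can denote under some injective,
    label-preserving embedding of the description into the scene."""
    dverts = vertices(description)
    scene_edges = set(scene)
    found = set()
    for image in permutations(vertices(scene), len(dverts)):
        m = dict(zip(dverts, image))
        if all((m[u], lab, m[w]) in scene_edges for u, lab, w in description):
            found.add(m[target])
    return found


def cheapest_distinguishing(target, scene):
    """Cheapest connected sub-description containing `target` that fits only `target`."""
    candidates = [
        subset
        for k in range(1, len(scene) + 1)
        for subset in combinations(scene, k)
        if target in vertices(subset) and connected(subset)
    ]
    candidates.sort(key=lambda s: sum(cost(e) for e in s))
    for subset in candidates:
        if referents(subset, target, scene) == {target}:
            return set(subset)
    return None  # no distinguishing description exists


print(sorted(cheapest_distinguishing("b1", SCENE)))
# [('b1', 'on', 't'), ('t', 'table', 't')]
```

With these costs the search returns "the one on the table" without the type "ball", which shows how directly the generated description depends on the cost function.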
    <Paragraph position="9"> If a perceptual group is treated as an ordinary object, as shown in Figure 4, their algorithm becomes applicable. It can then handle not only intra-group relations (e.g., the edges labeled "front", "middle", and "back" in Figure 4) but also inter-group relations (e.g., the edge from "Group 1" to "Table" in Figure 4). However, introducing perceptual groups as vertices makes the cost function harder to design, and a well-designed cost function is indispensable for generating concise and comprehensible expressions. Without one, an expression such as "a ball in front of a ball in front of a ball" would be generated for the situation shown in Figure 1.</Paragraph>
  </Section>
</Paper>