<?xml version="1.0" standalone="yes"?> <Paper uid="P06-3006"> <Title>Semantic Discourse Segmentation and Labeling for Route Instructions</Title> <Section position="5" start_page="31" end_page="34" type="metho"> <SectionTitle> 3 Semantic Analysis </SectionTitle> <Paragraph position="0"> Note that in this paper, given an instruction, one step in the instruction corresponds to one action shown to the subject, one episode of action detection and tracking, and one segment of the text.</Paragraph> <Paragraph position="1"> In order to annotate unambiguously, we need to detect and track both landmarks and actions. A landmark is a hallway or a door, and an action is a sequence of a few moves one will make with respect to a specific landmark.</Paragraph> <Paragraph position="2"> The moves one can make in this map are: (M1). Advancing to x, (M2). Turning left/right to face x, and (M3). Entering x.</Paragraph> <Paragraph position="3"> Here, x is a landmark. Note that all three moves have to do with the same landmark, and two or three moves on the same landmark constitute one action. An action is ambiguous until x is filled with an unambiguous landmark. The following is a made-up example in which each move in an action is mentioned explicitly.</Paragraph> <Paragraph position="4"> a. &quot;Go down the hallway to the second door on the right. Turn right. Enter the door.&quot; But you could break it down even further.</Paragraph> <Paragraph position="5"> b. &quot;Go down the hallway. You will see two doors on the right. Turn right and enter the second.&quot; One can add any amount of extra information to an instruction and make it longer, which people seem to do. However, we see the following as well. c. &quot;Enter the second door on the right.&quot; In one sentence, this sample contains the advance, the turn and the entering. In the corpus, the norm is to assume the move (M1) when an expression indicating the move (M2) is present. 
Similarly, an expression of move (M3) often implicitly assumes the moves (M1) and (M2). However, in some cases they are explicitly stated, and when this happens, the action that involves the same landmark must be tracked across the sentences.</Paragraph> <Paragraph position="6"> Since all three samples result in the same action, for the back-end it is best not to differentiate the three. In order to do this, actions must be tracked just like landmarks in the corpus.</Paragraph> <Paragraph position="7"> The following two samples illustrate the need to track actions.</Paragraph> <Paragraph position="8"> d. &quot;Go down the hallway until you see two doors. Turn right and enter the second door on the right.&quot; In this case, there is only one action in the instruction, and &quot;turn right&quot; belongs to the action &quot;advance to the second door on the right, and then turn right to face it, and then enter it.&quot; e. &quot;Proceed to the first hallway on the right. Turn right and enter the second door on the right.&quot; There are two actions in this instruction. The first is &quot;advance to the first hallway on the right, and then turn right to face the hallway.&quot; The phrase &quot;turn right&quot; belongs to this first action. The second action is the same as the one in example (d). Unless we can differentiate between the two, executing the unnecessary turn results in failure when following the instructions in case (d). This illustrates the need to track actions across a few sentences. In the last example, it is important to realize that &quot;turn right&quot; has something to do with a door, so that it means &quot;turn right to face a door&quot;. Furthermore, since &quot;enter the second door on the right&quot; contains &quot;turning right to face a door&quot; in its semantics as well, they can be thought of as the same action. 
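The normalization argued for above can be made concrete with a small data structure. The following is a hypothetical sketch, not part of the paper's system: an action bundles up to three moves around one landmark, so that differently worded instructions such as samples (a) and (c) collapse to the same value.

```python
# Hypothetical sketch: an action bundles the moves M1-M3 around one landmark,
# so surface variants of an instruction normalize to the same back-end action.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Action:
    landmark: str               # e.g. "second door on the right"
    advance: bool = True        # M1: advancing to the landmark
    turn: Optional[str] = None  # M2: "left"/"right" to face the landmark
    enter: bool = False         # M3: entering the landmark

# Samples (a) and (c) word the same behavior differently, but both
# normalize to one Action, so the back-end need not differentiate them.
sample_a = Action("second door on the right", turn="right", enter=True)
sample_c = Action("second door on the right", turn="right", enter=True)
assert sample_a == sample_c
```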
Thus, the critical feature required in the annotation scheme is to track actions and landmarks.</Paragraph> <Paragraph position="9"> The simplest annotation scheme that can show how actions are tracked across the sentences is to segment the instruction into different episodes of action detection and tracking. Note that each episode corresponds to exactly one action shown to the subject during the experiment. The annotation is based on the semantics, not on the mentions of moves or landmarks. Since each segment involves exactly one landmark, we can label the segment with an action and a specific landmark.</Paragraph> <Paragraph position="10"> For example, GHR1 := &quot;advance to the first hallway on the right, then turn right to face it.&quot; EDR2 := &quot;advance to the second door on the right, then turn right to face it, then enter it.&quot; GHLZ := &quot;advance to the hallway on the left at the end of the hallway, then turn left to face it.&quot; EDSZ := &quot;advance to the door straight ahead of you, then enter it.&quot; Note that GH=go-hall, ED=enter-door, R1=first-right, LZ=left-at-end, SZ=ahead-of-you.</Paragraph> <Paragraph position="11"> The total number of possible actions is 15.</Paragraph> <Paragraph position="12"> This way, we can reduce the front-end task to a sequence of tagging tasks, much like the noun phrase chunking in the CoNLL-2000 shared task (Tjong Kim Sang and Buchholz, 2000). Given a sequence of input tokens forming a route instruction, we prepared a sequence of output labels, one label per input token. We annotated with the BIO tagging scheme used in syntactic chunkers (Ramshaw and Marcus, 1995).</Paragraph> <Paragraph position="13"> From the output labels, we create the parts in a linear-chain undirected graph (Table 1). Our use of the term part is based on (Bartlett et al., 2004). 
For each pair (xi,yi) in the training set, xi is the token (in the first column, Table 1), and yi is the part (in the second and third columns, Table 1). There are two kinds of parts: node and transition. A node part tells us the position and the label: <B-GHL1,0>, <I-GHL1,1>, and so on. A transition part encodes a transition. For example, between tokens 0 and 1 there is a transition from tag B-GHL1 to I-GHL1. The part that describes this transition is: <B-GHL1,I-GHL1,0,1>.</Paragraph> <Paragraph position="14"> We factor the score of this linear node-transition structure as the sum of the scores of all the parts in y, where the score of a part is again the sum of the feature weights for that part.</Paragraph> <Paragraph position="15"> To score a pair (xi,yi) in the training set, we take each part in yi and check the features associated with it via lexicalization. For example, a part <I-GHL1,1> gives rise to binary features that conjoin the label with lexical properties of the token at that position. The features used in this experiment are listed in Table 2.</Paragraph> <Paragraph position="16"> If a feature is present, the feature weight is added. The sum of the weights of all the parts is the score of the pair (xi,yi). To represent this summation, we write s(xi,yi) = w·f(xi,yi), where f represents the feature vector and w is the weight vector. We could also have w·f(xi,{p}), where p is a single part, in which case we just write s(p).</Paragraph> <Paragraph position="17"> Assuming an appropriate feature representation as well as a weight vector w, we would like to find the highest scoring y = argmaxy' w·f(x,y') given an input sequence x. We next present a version of this decoding algorithm that returns the best y consistent with the map.</Paragraph> <Paragraph position="18"> Inferring the Path in the Map. The action labels are unambiguous; given the current position, the map, and the action label, there is only one position one can go to. This back-end computation can be integrated into the Viterbi algorithm. 
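A minimal sketch of the two pieces just described, the part-factored score and the map-constrained decoding, might look like the following. This is our own simplification: decoding is shown per segment rather than per token, and the feature keys, weights, and toy map are invented for illustration.

```python
# Part-factored scoring: the score of (x, y) is the sum over node parts
# <tag, i> and transition parts <tag, tag', i-1, i>, each scored by the
# weights of its active (lexicalized) features.
def score(tokens, tags, weights):
    s = 0.0
    for i, (tok, tag) in enumerate(zip(tokens, tags)):
        s += weights.get(("tok", tok, tag), 0.0)             # node part
        if i > 0:
            s += weights.get(("trans", tags[i - 1], tag), 0.0)  # transition part
    return s

# Map-constrained decoding: keep only label sequences executable in the map.
# 'go' returns the end position for (label, position), or None if the action
# is illegal there, mirroring the paper's legality condition on paths.
def decode(segment_scores, labels, start, go):
    beam = {start: (0.0, [])}          # map position -> (best score, labels)
    for scores in segment_scores:
        nxt = {}
        for pos, (cost, seq) in beam.items():
            for lab in labels:
                end = go(lab, pos)
                if end is None:        # action illegal here: prune
                    continue
                cand = (cost + scores.get(lab, float("-inf")), seq + [lab])
                if end not in nxt or cand[0] > nxt[end][0]:
                    nxt[end] = cand
        beam = nxt
    # best legal labeling and the destination it implies
    return max(beam.items(), key=lambda kv: kv[1][0]) if beam else None

# Toy map: GHR1 is executable only from A (ending at B), EDR2 only from B.
def go(lab, pos):
    return {"GHR1": {"A": "B"}, "EDR2": {"B": "C"}}.get(lab, {}).get(pos)

dest, (total, labels) = decode(
    [{"GHR1": 1.0, "EDR2": 0.5}, {"EDR2": 2.0}], ["GHR1", "EDR2"], "A", go)
```

Because `go` rejects EDR2 at position A, the higher-scoring but illegal first segment label is pruned, which is exactly the role of the map constraint in the decoder.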
The function 'go' takes a pair of (action label, start position) and returns the end position, or null if the action cannot be executed at the start position according to the map. The algorithm chooses the best among the label sequences with a legal path in the map, as required by the condition (cost > bestc ∧ end ≠ null). Once the model is trained, we can then use the modified version of the Viterbi algorithm (Algorithm 4.1) to find the destination in the map.</Paragraph> <Paragraph position="20"> Given the above problem formulation, we trained the linear-chain undirected graphical model as a Conditional Random Field (Lafferty et al., 2001; Sha and Pereira, 2003), one of the best-performing approaches to chunking. We assume the probability of seeing y given x is P(y|x) = exp(s(x,y)) / Σy' exp(s(x,y')),</Paragraph> <Paragraph position="22"> where y' ranges over all possible labelings of x. Now, given a training set T = {(xi,yi)}, i = 1,...,m, we can learn the weights by maximizing the log-likelihood, Σi log P(yi|xi). A detailed description of CRFs can be found in (Lafferty et al., 2001; Sha and Pereira, 2003; Malouf, 2002; Peng and McCallum, 2004). We used an implementation called CRF++ (Kudo, 2005).</Paragraph> <Section position="1" start_page="34" end_page="34" type="sub_section"> <SectionTitle> 4.2 System 2: Baseline </SectionTitle> <Paragraph position="0"> Suppose we have clean data and there is no need to track an action across sentences or phrases. 
Then, the properties of an action are mentioned exactly once for each episode.</Paragraph> <Paragraph position="1"> For example, in &quot;go straight and make the first left you can, then go into the first door on the right side and stop&quot;, LEFT and FIRST occur exactly once for the first action, and FIRST, DOOR and RIGHT are found exactly once in the next action.</Paragraph> <Paragraph position="2"> In a case like that, the following baseline algorithm should work well.</Paragraph> <Paragraph position="3"> * Find all the mentions of LEFT/RIGHT. * For each occurrence of LEFT/RIGHT, look for an ordinal number, LAST, or END (= end of the hallway) nearby. * Also, for each LEFT/RIGHT, look for a mention of DOOR. If DOOR is mentioned, the action is about entering a door.</Paragraph> <Paragraph position="4"> * If DOOR is not mentioned around LEFT/RIGHT, then the action is about going to a hallway by default. * If DOOR is mentioned at the end of an instruction without LEFT/RIGHT, then the action is to go straight into the room.</Paragraph> <Paragraph position="5"> * Put the sequence of action labels together according to the mentions collected.</Paragraph> <Paragraph position="6"> In this case, all that is required is a dictionary of how a word maps to a concept such as DOOR. In this corpus, &quot;door&quot;, &quot;office&quot;, &quot;room&quot;, &quot;doorway&quot; and their plural forms map to DOOR, and the ordinal number 1 is represented by &quot;first&quot; and &quot;1st&quot;, and so on.</Paragraph> </Section> </Section> <Section position="6" start_page="34" end_page="34" type="metho"> <SectionTitle> 5 Dataset </SectionTitle> <Paragraph position="0"> As noted, we have 427 route instructions, and the average number of steps per instruction was 1.86. We had 189 cases in which a sentence boundary was found in the middle of a step. 
Table 3 shows how often action steps occurred in the corpus and the average length of the segments.</Paragraph> <Paragraph position="1"> One thing we noticed is that people tend not to use a short phrase to say the equivalent of &quot;enter the door straight ahead of you&quot;, as seen in the average length of EDSZ. Also, it is more common to say the equivalent of &quot;take a right at the end of the hallway&quot; than of &quot;go to the second hallway on the right&quot;, as seen in the counts of GHR2 and GHRZ. The distribution is highly skewed; there are far more instances of GHL1 than of GHL2.</Paragraph> </Section> </Paper>