Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 945–952,
Sydney, July 2006. c©2006 Association for Computational Linguistics
HAL-based Cascaded Model for Variable-Length 
Semantic Pattern Induction from Psychiatry Web Resources 
 
Liang-Chih Yu and Chung-Hsien Wu 
Department of Computer Science and Information Engineering
National Cheng Kung University 
Tainan, Taiwan, R.O.C. 
{lcyu, chwu}@csie.ncku.edu.tw 
Fong-Lin Jang 
Department of Psychiatry 
Chi-Mei Medical Center 
Tainan, Taiwan, R.O.C. 
jcj0429@seed.net.tw
  
 
Abstract 
Negative life events play an important 
role in triggering depressive episodes. 
Developing psychiatric services that can 
automatically identify such events is 
beneficial for mental health care and pre-
vention. Before these services can be 
provided, some meaningful semantic pat-
terns, such as <lost, parents>, have to be 
extracted. In this work, we present a text 
mining framework capable of inducing 
variable-length semantic patterns from 
unannotated psychiatry web resources. 
This framework integrates a cognitive 
motivated model, Hyperspace Analog to 
Language (HAL), to represent words as 
well as combinations of words. Then, a 
cascaded induction process (CIP) boot-
straps with a small set of seed patterns 
and incorporates relevance feedback to 
iteratively induce more relevant patterns. 
The experimental results show that by 
combining the HAL model and relevance 
feedback, the CIP can induce semantic 
patterns from the unannotated web cor-
pora so as to reduce the reliance on anno-
tated corpora. 
1 Introduction 
Depressive disorders have become a major threat 
to mental health. People in their daily life may 
suffer from some negative or stressful life events, 
such as death of a family member, arguments 
with a spouse, loss of a job, and so forth. Such 
life events play an important role in triggering 
depressive symptoms, such as depressed mood, 
suicide attempts, and anxiety. Therefore, it is 
desired to develop a system capable of identify-
ing negative life events to provide more effective 
psychiatric services. For example, through the 
negative life events, the health professionals can 
know the background information about subjects 
so as to make more correct decisions and sugges-
tions. Negative life events are often expressed in 
natural language segments (e.g., sentences). To 
identify them, the critical step is to transform the 
segments into machine-interpretable semantic 
representation. This involves the extraction of 
key semantic patterns from the segments. Con-
sider the following example. 
Two years ago, I lost my parents.     (Event) 
Since that, I have attempted to kill myself 
several times.   (Suicide) 
In this example, the semantic pattern <lost, par-
ents> is constituted by two words, which indi-
cates that the subject suffered from a negative 
life event that triggered the symptom “Suicide”. 
A semantic pattern can be considered as a se-
mantically plausible combination of k words, 
where k is the length of the pattern. Accordingly, 
a semantic pattern may have variable length. In 
Wu et al.’s study (2005), they have presented a 
methodology to identify depressive symptoms. In 
this work, we go a further step to devise a text 
mining framework for variable-length semantic 
pattern induction from psychiatry web resources. 
Traditional approaches to semantic pattern in-
duction can be generally divided into two 
streams: knowledge-based approaches and cor-
pus-based approaches (Lehnert et al., 1992; 
Muslea, 1999). Knowledge-based approaches 
rely on exploiting expert knowledge to design 
handcrafted semantic patterns. The major limita-
tions of such approaches include the requirement 
of significant time and effort on designing the 
handcrafted patterns. Besides, when applying to 
a new domain, these patterns have to be redes-
igned. Such limitations form a knowledge acqui-
sition bottleneck. A possible solution to reducing 
the problem is to use a general-purpose ontology 
945
such as WordNet (Fellbaum, 1998), or a domain-
specific ontology constructed using automatic 
approaches (Yeh et al., 2004). These ontologies 
contain rich concepts and inter-concept relations 
such as hypernymy-hyponymy relations. How-
ever, an ontology is a static knowledge resource, 
which may not reflect the dynamic characteris-
tics of language. For this consideration, we in-
stead refer to the web resources, or more restrict-
edly, the psychiatry web resources as our knowl-
edge resource. 
Corpus-based approaches can automatically 
learn semantic patterns from domain corpora by 
applying statistical methods. The corpora have to 
be annotated with domain-specific knowledge 
(e.g., events). Then, various statistical methods 
can be applied to induce variable-length semantic 
patterns from all possible combinations of words 
in the corpora. However, statistical methods may 
suffer from data sparseness problem, thus they 
require large corpora with annotated information 
to obtain more reliable parameters. For some ap-
plication domains, such annotated corpora may 
be unavailable. Therefore, we propose the use of 
web resources as the corpora. When facing with 
the web corpora, traditional corpus-based ap-
proaches may be infeasible. For example, it is 
impractical for health professionals to annotate 
the whole web corpora. Besides, it is also im-
practical to enumerate all possible combinations 
of words from the web corpora, and then search 
for the semantic patterns.  
To address the problems, we take the notion of 
weakly supervised (Stevenson and Greenwood, 
2005) or unsupervised learning (Hasegawa, 2004; 
Grenager et al., 2005) to develop a framework 
able to bootstrap with a small set of seed patterns, 
and then induce more relevant patterns form the 
unannotated psychiatry web corpora. By this 
way, the reliance on annotated corpora can be 
significantly reduced. The proposed framework 
is divided into two parts: Hyperspace Analog to 
Language (HAL) model (Burgess et al., 1998; 
Bai et al., 2005), and a cascaded induction proc-
ess (CIP). The HAL model, which is a cognitive 
motivated model, provides an informative infra-
structure to make the CIP capable of learning 
from unannotated corpora. The CIP treats the 
variable-length induction task as a cascaded 
process. That is, it first induces the semantic pat-
terns of length two, then length three, and so on. 
In each stage, the CIP initializes the set of se-
mantic patterns to be induced based on the better 
results of the previous stage, rather than enumer-
ating all possible combinations of words. This 
would be helpful to avoid noisy patterns propa-
gating to the next stage, and the search space can 
also be reduced. 
A crucial step for semantic pattern induction is 
the representation of words as well as combina-
tions of words. The HAL model constructs a 
high-dimensional context space for the psychia-
try web corpora. Each word in the HAL space is 
represented as a vector of its context words, 
which means that the sense of a word can be in-
ferred through its contexts. Such notion is de-
rived from the observation of human behavior. 
That is, when an unknown word occurs, human 
beings may determine its sense by referring to 
the words appearing in the contexts. Based on 
the cognitive behavior, if two words share more 
common contexts, they are more semantically 
similar. To further represent a semantic pattern, 
the HAL model provides a mechanism to com-
bine its constituent words over the HAL space. 
Once the HAL space is constructed, the CIP 
takes as input a seed pattern per run, and in turn 
induces the semantic patterns of different lengths. 
For each length, the CIP first creates the initial 
set based on the results of the previous stage. 
Then, the induction process is iteratively per-
formed to induce more patterns relevant to the 
given seed pattern by comparing their context 
distributions. In addition, we also incorporate 
expert knowledge to guide the induction process 
by using relevance feedback (Baeza-Yates and 
Ribeiro-Neto, 1999), the most popular query re-
formulation strategy in the information retrieval 
(IR) community. The induction process is termi-
nated until the termination criteria are satisfied. 
In the remainder of this paper, Section 2 pre-
sents the overall framework for variable-length 
semantic pattern induction. Section 3 describes 
the process of constructing the HAL space. Sec-
tion 4 details the cascaded induction process. 
Section 5 summarizes the experiment results. 
Finally, Section 6 draws some conclusions and 
suggests directions for future work.  
2 Framework for Variable-Length Se-
mantic Pattern Induction 
The overall framework, as illustrated in Figure 1, 
is divided into two parts: the HAL model and the 
cascaded induction process. First of all, the HAL 
space is constructed for the psychiatry web 
corpora after word segmentation. Then, each 
word in HAL space is evaluated by computing its 
distance to a given seed pattern. A smaller 
distance   represents   that    the   word   is   more  
946
Distance
Evaluation
Stop
Induced 
Patterns
Psychiatry
Web Corpora
HAL Space
Construction
Seed
Patterns
Word
Segmentation
HAL model
Iteration +1
Quality
Concepts
length 2 length 3 length k
...
No
Relevance 
Feedback
Iteration=0
Initial Set
(length k)
Yes
k +1
Cascaded Induction Process
Induced
Patterns
Relevant 
Patterns
 
Figure 1. Framework for variable-length seman-
tic pattern induction. 
semantically related to the seed pattern. 
According to the distance measure, the CIP 
generates quality concepts, i.e., a set of 
semantically related words to the seed pattern. 
The quality concepts and the better semantic 
patterns induced in the previous stage are 
combined to generate the initial set for each 
length. For example, in the beginning stage, i.e., 
length two, the initial set is the all possible 
combinations of two quality concepts. In the later 
stages, each initial set is generated by adding a 
quality concept to each of the better semantic 
patterns. After the initial set for a particular 
length is created, each semantic pattern and the 
seed pattern are represented in the HAL space for 
further computing their distance. The more 
similar the context distributions between two 
patterns, the closer they are. Once all the 
semantic patterns are evaluated, the relevance 
feedback is applied to provide a set of relevant 
patterns judged by the health professionals. 
According to the relevant information, the seed 
pattern can be refined to be more similar to the 
relevant set. The refined seed pattern will be 
taken as the reference basis in the next iteration. 
The induction process for each stage is 
performed iteratively until no more patterns are 
judged as relevant or a maximum number of 
iteration is reached. The relevant set produced at 
the last iteration is considered as the result of the 
semantic patterns. 
3 HAL Space Construction 
The HAL model represents each word in the vo-
cabulary  using   a   vector  representation.  Each  
w
1
w
2
wl
-2
wl
-1
wl
nullObservation window of length 
weight =1
2
 
Figure 2. Weighting scheme of the HAL model. 
 two years ago I lost my parents
two 0 0 0 0 0 0 0 
years 5 0 0 0 0 0 0 
ago 4 5 0 0 0 0 0 
I 3 4 5 0 0 0 0 
lost 2 3 4 5 0 0 0 
my 1 2 3 4 5 0 0 
parents 0 1 2 3 4 5 0 
Table 1. Example of HAL Space (window size=5) 
dimension of the vector is a weight representing 
the strength of association between the target 
word and its context word. The weights are com-
puted by applying an observation window of 
length l over the corpus. All words within the 
window are considered as co-occurring with each 
other. Thus, for any two words of distance d 
within the window, the weight between them is 
computed as 1ld− + . Figure 2 shows an exam-
ple. The HAL space views the corpus as a se-
quence of words. Thus, after moving the window 
by one word increment over the whole corpus, 
the HAL space is constructed. The resultant HAL 
space is an N N×  matrix, where N is the vo-
cabulary size. In addition, each word in the HAL 
space is called a concept. Table 1 presents the 
HAL space for the example text “Two years ago, 
I lost my parents.”  
3.1 Representation of a Single Concept 
For each concept in Table 1, the correspond-
ing row vector represents its left context infor-
mation, i.e., the weights of the words preceding it. 
Similarly, the corresponding column vector 
represents its right context information. Accord-
ingly, each concept can be represented by a pair 
of vectors. That is, 
()
12 1 2
(, )
 ,  ,  . . . ,  , ,  ,  . . . ,  ,
ii
i i iN i i iN
left right
icc
left left left right right right
ct ct ct ct ct ct
cvv
ww w w w w
=
=
               (1) 
where 
i
left
c
v and 
i
right
c
v represent the vectors of the 
left context information and right context infor-
mation of a concept 
i
c , respectively, 
ij
ct
w denotes  
947
11 1
 
          ...          
N
Left Context
left left
ct ct
ww
nullnullnullnullnullnullnullnullnullnullnull
1
c
N
c
.
.
.
11 1
 
         ...          
N
Right Context
right right
ct ct
ww
nullnullnullnullnullnullnullnullnullnullnull
 
Figure 3. Conceptual representation of the HAL 
space. 
the weight of the j-th dimension (
j
t ) of a vector, 
and N is the dimensionality of a vector, i.e., vo-
cabulary size. The conceptual representation is 
depicted in Figure 3. 
The weighting scheme of the HAL model is 
frequency-based. For some extremely infrequent 
words, we consider them as noises and remove 
them from the vocabulary. On the other hand, a 
high frequent word tends to get a higher weight, 
but this does not mean the word is informative, 
because it may also appear in many other vectors. 
Thus, to measure the informativeness of a word, 
the number of the vectors the word appears in 
should be taken into account. In principle, the 
more vectors the word appears in, the less infor-
mation it carries to discriminate the vectors. Here 
we use a weighting scheme analogous to TF-IDF 
(Baeza-Yates and Ribeiro-Neto, 1999) to re-
weight the dimensions of each vector, as de-
scribed in Equation (2). 
*log ,
()
ij ij
vector
ct ct
j
N
ww
vf t
=            (2) 
where 
vector
N  denotes the total number of vectors, 
and ( )
j
vf t  denotes the number of vectors with 
j
t  
as the dimension. After each dimension is re-
weighted, the HAL space is transformed into a 
probabilistic framework. Accordingly, each 
weight can be redefined as 
(|) ,
ij
ij
ij
ct
ct j i
ct
j
w
wPtc
w
≡=
∑
           (3) 
where (|)
j i
P tc is the probability that 
j
t  appears 
in the vector of 
i
c . 
3.2 Concept Combination 
A semantic pattern is constituted by a set of con-
cepts, thus it can be represented through concept 
combination over the HAL space. This forms a 
new concept in the HAL space. Let 
1
( ,..., )
S
spc c=  be a semantic pattern with S con-
stituent concepts, i.e., length S. The concept 
combination is defined as 
12 3
((...( ) ) ... ),
sS
cccc c⊕ ≡⊕⊕⊕⊕           (4) 
where ⊕  denotes the symbol representing the 
combination operator over the HAL space, 
s
c⊕  
denotes a new concept generated by the concept 
combination. The new concept is the representa-
tion of a semantic pattern, also a vector represen-
tation. That is, 
()
11
() () () ()
(, )
     ,  . . . ,  , ,  . . . ,  ,
ss
ssNssN
left right
scc
left left right right
ct ct ct ct
cvv
wwww
⊕⊕
⊕⊕⊕⊕
⊕=
=
               (5) 
The combination operator, ⊕ , is implemented 
by the product of the weights of the constituent 
concepts, described as follows. 
()
1
1
         ( | ),
sj sj
S
ct ct
s
S
j s
s
ww
Pt c
⊕
=
=
=
=
∏
∏
            (6) 
where 
()
s j
ct
w
⊕
denotes the weight of the j-th di-
mension of the new concept 
s
c⊕ . 
4 Cascaded Induction Process 
Given a seed pattern, the CIP is to induce a set of 
relevant semantic patterns with variable lengths 
(from 2 to k). Let 
1
( ,..., )
seed R
spcc=  be a seed 
pattern of length R, and 
1
( ,..., )
S
spc c=  be a 
semantic pattern of length S. The formal 
description of the CIP is presented as 
{ }
{} ()
11
  |      
( ,..., )  |  ( ,..., )     iff   , ,
seed
RS rs
sp sp
cc cc Distccλ
−
≡ −∀⊕≤
                (7) 
where |− denotes the symbol representing the 
cascaded induction, 
r
c⊕  and 
s
c⊕  are the two 
new concepts representing 
seed
sp  and sp , respec-
tively, and (  ,  )Dist ii represents the distance 
between two semantic patterns. The main steps 
in the CIP include the initial set generation, dis-
tance measure, and relevance feedback. 
4.1 Initial Set Generation 
The initial set for a particular length contains a 
set of semantic patterns to be induced, i.e., the 
search space. Reducing the search space would 
be helpful for speeding up the induction process, 
948
especially for inducing those patterns with a lar-
ger length. For this purpose, we consider that the 
words and the semantic patterns similar to a 
given seed pattern are the better candidates for 
creating the initial sets. Therefore, we generate 
quality concepts, a set of semantically related 
words to a seed pattern, as the basis to create the 
initial set for each length. Thus, each seed pattern 
will be associated with a set of quality concepts. 
In addition, the better semantic patterns induced 
in the previous stage are also considered. The 
goodness of words and semantic patterns is 
measured by their distance to a seed pattern. 
Here, a word is considered as a quality concept if 
its distance is smaller than the average distance 
of the vocabulary. Similarly, only the semantic 
patterns with a distance smaller than the average 
distance of all semantic patterns in the previous 
stage are preserved to the next stage. By the way, 
the semantically unrelated patterns, possibly 
noisy patterns, will not be propagated to the next 
stage, and the search space can also be reduced. 
The principles of creating the initial sets of se-
mantic patterns are summarized as follows.  
• In the beginning stage, the aim is to cre-
ate the initial set for the semantic pat-
terns with length two. Thus, the initial 
set is the all possible combinations of 
two quality concepts. 
• In the latter stages, each initial set is cre-
ated by adding a quality concept to each 
of the better semantic patterns induced in 
the previous stage. 
4.2 Distance Measure 
The distance measure is to measure the distance 
between the seed patterns and semantic patterns 
to be induced. Let 
1
( ,..., )
S
spc c=  be a semantic 
pattern and 
1
( ,..., )
seed R
spcc=  be a given seed 
pattern, their distance is defined as 
(),(,),
seed s r
Dist sp sp Dist c c=⊕⊕           (8) 
where ( , )
s r
Dist c c⊕⊕  denotes the distance be-
tween two semantic patterns in the HAL space. 
As mentioned earlier, after concept combination, 
a semantic pattern becomes a new concept in the 
HAL space, which means the semantic pattern 
can be represented by its left and right contexts. 
Thus, the distance between two semantic patterns 
can be computed through their context distance. 
Equation (8) thereby can be written as 
( ),(,)(,).
sr s r
left left Right Right
seed c c c c
Dist sp sp Dist v v Dist v v
⊕⊕ ⊕ ⊕
=+  (9) 
Because the weights of the vectors are repre-
sented using a probabilistic framework, each 
vector of a concept can be considered as a prob-
abilistic distribution of the context words. Ac-
cordingly, we use the Kullback-Liebler (KL) dis-
tance (Manning and Schütze, 1999) to compute 
the distance between two probabilistic distribu-
tions, as shown in the following. 
1
()
()()log ,
()
sr
N
js
cc js
j
jr
Pt c
Dv v Pt c
Pt c
⊕⊕
=
⊕
=⊕
⊕
∑
          (10) 
where (    )D ii denotes the KL distance be-
tween two probabilistic distributions. When 
Equation (10) is ill-conditioned, i.e., zero de-
nominator, the denominator will be set to a small 
value (10
-6
). For the consideration of a symmet-
ric distance, we use the divergence measure, 
shown as follows. 
(,) ( ) ( ).
sr s r r s
cc cc cc
Div v v D v v D v v
⊕⊕ ⊕⊕ ⊕⊕
=+       (11) 
By this way, the distance between two probabil-
istic distributions can be computed by their KL 
divergence. Thus, Equation (9) becomes 
(,) (,) ( , ).
sr sr s r
left left Right Right
cc cc c c
Dist v v Div v v Div v v
⊕⊕ ⊕⊕ ⊕ ⊕
=+ (12) 
After each semantic pattern is evaluated, a 
ranked list is produced for relevance judgment. 
4.3 Relevance Feedback 
In the induction process, some non-relevant se-
mantic patterns may have smaller distance to a 
seed pattern, which may decrease the precision 
of the final results. To overcome the problem, 
one possible solution is to incorporate expert 
knowledge to guide the induction process. For 
this purpose, we use the technique of relevance 
feedback. In the IR community, the relevance 
feedback is to enhance the original query from 
the users by indicating which retrieved docu-
ments are relevant. For our task, the relevance 
feedback is applied after each semantic pattern is 
evaluated. Then, the health professionals judge 
which semantic patterns are relevant to the seed 
pattern. In practice, only the top n semantic pat-
terns are presented for relevance judgment. Fi-
nally, the semantic patterns judged as relevant 
are considered to form the relevant set, and the 
others form the non-relevant set. According to 
the relevant and non-relevant information, the 
seed pattern can be refined to be more similar to 
the relevant set, such that the induction process 
can induce more relevant patterns and move 
away from noisy patterns in the future iterations. 
949
The refinement of the seed pattern is to adjust 
its context distributions (left and right). Such ad-
justment is based on re-weighting the dimensions 
of the context vectors of the seed pattern. The 
dimensions more frequently regarded as relevant 
patterns are more significant for identifying rele-
vant patterns. Hence, such dimensions of the 
seed pattern should be emphasized. The signifi-
cance of a dimension is measured as follows. 
()
()
() ,
ik
i
jk
j
ct
cR
k
ct
cR
w
Sig t
w
⊕
⊕∈
⊕
⊕∈
=
∑
∑
          (13) 
where ()
k
Sig t  denotes the significance of the di-
mension 
k
t , 
i
c⊕  and 
j
c⊕  denote the semantic 
patterns of the relevant set and non-relevant set, 
respectively, and 
()
ik
ct
w
⊕
 and 
()
j k
ct
w
⊕
 denote the 
weights of 
k
t  of 
i
c⊕  and 
j
c⊕ , respectively. The 
higher the ratio, the more significant the dimen-
sion is. In order to smooth ()
k
Sig t  to the range 
from zero to one, the following formula is used: 
1
() ( )
1
() .
1
ik jk
i j
k
ct c t
cR cR
Sig t
ww
−
⊕⊕
⊕∈ ⊕∈
=
⎛⎞
⎜⎟
+
⎝⎠
∑∑
    (14) 
The corresponding dimension of the seed pattern 
seed r
spc=⊕  is then re-weighted by 
() ()
().
rk rk
ct ct k
wwSigt
⊕⊕
=+          (15) 
Once the context vectors of the seed pattern 
are re-weighted, they are also transformed into a 
probabilistic form using Equation (3). The re-
fined seed pattern will be taken as the reference 
basis in the next iteration. The relevance feed-
back is performed iteratively until no more se-
mantic patterns are judged as relevant or a 
maximum number of iteration is reached. At the 
same time, the induction process for a particular 
length is also stopped. The whole CIP process is 
stopped until the seed patterns are exhausted 
5 Experimental Results 
To evaluate the performance of the CIP, we built 
a prototype system and provided a set of seed 
patterns. The seed patterns were collected by re-
ferring to the well-defined instruments for as-
sessing negative life events (Brostedt and Peder-
sen, 2003; Pagano et al., 2004). A total of 20 
seed patterns were selected by the health profes-
sionals. Then, the CIP randomly selects one seed 
pattern per run without replacement from the 
seed set, and iteratively induces relevant patterns 
from the psychiatry web corpora. The psychiatry 
web corpora used here include some professional 
mental health web sites, such as PsychPark 
(http://www.psychpark.org) (Bai, 2001) and John 
Tung Foundation (http://www.jtf.org.tw). 
In the following sections, we describe some 
experiments to in turn examine the effect of us-
ing relevance feedback or not, and the coverage 
on real data using the semantic patterns induced 
by different approaches. Because the semantic 
patterns with a length larger than 4 are very rare 
to express a negative life event, we limit the 
length k to the range of 2 to 4. 
5.1 Evaluation on Relevance Feedback 
The relevance feedback employed in this study 
provides the relevant and non-relevant informa-
tion for the CIP so that it can refine the seed pat-
tern to induce more relevant patterns. The rele-
vance judgment is carried out by three experi-
enced psychiatric physicians. For practical con-
sideration, only the top 30 semantic patterns are 
presented to the physicians. During relevance 
judgment, a majority vote mechanism is used to 
handle the disagreements among the physicians. 
That is, a semantic pattern is considered as rele-
vant if any two or more physicians judged it as 
relevant. Finally, the semantic patterns with ma-
jority votes are obtained to form the relevant set. 
To evaluate the effectiveness of the relevance 
feedback, we construct three variants of the CIP, 
RF(5), RF(10), and RF(20), implemented by ap-
plying the relevance feedback for 5, 10, and 20 
iterations, respectively. These three CIP variants 
are then compared to the one without using the 
relevance feedback, denoted as RF(－) . We use 
the evaluation metric, precision at 30 (prec@30), 
over all seed patterns to examine if the relevance 
feedback can help the CIP induce more relevant 
patterns. For a particular seed pattern, prec@n is 
computed as the number of relevant semantic 
patterns ranked in the top n of the ranked list, 
divided by n. Table 2 presents the results for k=2. 
The results reveal that the relevance feedback 
can help the CIP induce more relevant semantic 
patterns. Another observation indicates that ap-
plying the relevance feedback for more iterations  
 RF(－) RF(5) RF(10) RF(20)
prec@30 0.203 0.263 0.318 0.387
Table 2. Effect of applying relevance feedback 
for different number of iterations or not. 
950
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0 5 10 15 20 25 30 35 40 45 50
Num. of Iterations
pr
e
c
@
3
0
RF(10)+pseudo
RF(20)
RF(─)
 
Figure 4. Effect of using the combination of rele-
vance feedback and pseudo-relevance feedback. 
can further improve the precision. However, it is 
usually impractical for experts to involve in the 
guiding process for too many iterations. Conse-
quently, we further consider pseudo-relevance 
feedback to automate the guiding process. The 
pseudo-relevance feedback carries out the rele-
vance judgment based on the assumption that the 
top ranked semantic patterns are more likely to 
be the relevant ones. Thus, this approach usually 
relies on setting a threshold or selecting only the 
top n semantic patterns to form the relevant set. 
However, determining the threshold is not trivial, 
and the threshold may be different with different 
seed patterns. Therefore, we apply the pseudo-
relevance feedback only after certain expert-
guided iterations, rather than applying it 
throughout the induction process. The notion is 
that we can get a more reliable threshold value 
by observing the behavior of the relevant seman-
tic patterns in the ranked list for a few iterations. 
 To further examine the effectiveness of the 
combined approach, we additionally construct a 
CIP variant, RF(10)+pseudo, by applying the 
pseudo-relevance feedback after 10 expert-
guided iterations. The threshold is determined by 
the physicians during their judgments in the 10-
th iteration. The results are presented in Figure 4. 
The precision of RF(10)+pseudo is inferior to 
that of RF(20) before the 25-th iteration. Mean-
while, after the 30-th iteration, RF(10)+pseudo 
achieves higher precision than the other methods. 
This indicates that the pseudo-relevance feed-
back can also contribute to semantic pattern in-
duction in the stage without expert intervention.  
5.2 Coverage on Real Data 
The final results of the semantic patterns are the 
relevant sets of the last iteration produced by 
RF(10)+pseudo, denoted as 
CIP
SP . Parts of them 
are shown in Table 3.  
Seed 
Pattern 
< boyfriend, argue > 
Induced 
Patterns 
<girlfriend, break up>; <friend, fight>
Table 3. Parts of induced semantic patterns. 
We compare 
CIP
SP    to    those    created    by   a   
corpus-based approach. The corpus-based ap-
proach relies on an annotated domain corpus and 
a learning mechanism to induce the semantic 
patterns. Thus, we collected 300 consultation 
records from the PsychPark as the domain corpus, 
and each sentence in the corpus is annotated with 
a negative life event or not by the three physi-
cians. After the annotation process, the sentences 
with negative life events are together to form the 
training set. Then, we adopt Mutual Information 
(Manning and Schütze, 1999) to learn variable-
length semantic patterns. The mutual information 
between k words is defined as 
1
11
1
( ,..., )
(,.., ) (,.., )log
()
k
kk
k
i
i
Pw w
MI w w P w w
Pw
=
=
∏
    (16) 
where 
1
( ,... )
k
Pw w  is the probability of the k 
words co-occurring in a sentence in the training 
set, and ( )
i
Pw  is the probability of a single word 
occurring in the training set. Higher mutual in-
formation indicates that the k words are more 
likely to form a semantic pattern of length k. 
Here the length k also ranges from 2 to 4. For 
each k, we compute the mutual information for 
all possible combinations of words in the training 
set, and those with their mutual information 
above a threshold are selected to be the final re-
sults of the semantic patterns, denoted as 
MI
SP . 
In order to obtain reliable mutual information 
values, only words with at least the minimum 
number of occurrences (>5) are considered. 
To examine the coverage of 
CIP
SP  and 
MI
SP  on 
real data, 15 human subjects are involved in cre-
ating a test set. The subjects provide their experi-
enced negative life events in the form of natural 
language sentences. A total of 69 sentences are 
collected to be the test set, of which 39 sentences 
contain a semantic pattern of length two, 21 sen-
tences contain a semantic pattern of length three, 
and 9 sentences contain a semantic pattern of 
length four. The evaluation metric used is out-of-
pattern (OOP) rate, a ratio of unseen patterns 
occurring in the test set. Thus, the OOP can be 
defined as the number of test sentences contain-
ing the semantic patterns not occurring in the 
training set, divided by the total number of sen-
tences in the test set. Table 4 presents the results. 
951
 k=2 k=3 k=4 
CIP
SP  0.36 (14/39) 0.48 (10/21) 0.44 (4/9)
MI
SP  0.51 (20/39) 0.62 (13/21) 0.67 (6/9)
Table 4. OOP rate of the CIP and a corpus-based 
approach. 
The results show that the OOP of 
MI
SP  is 
higher than that of 
CIP
SP . The main reason is the 
lack of a large enough domain corpus with anno-
tated life events. In this circumstance, many se-
mantic patterns, especially for those with a larger 
length, could not be learned, because the number 
of their occurrences would be very rare in the 
training set. With no doubt, one could collect a 
large amount of domain corpus to reduce the 
OOP rate. However, increasing the amount of 
domain corpus also increases the amount of an-
notation and computation complexity. Our ap-
proach, instead, exploits the quality concepts to 
reduce the search space, also applies the rele-
vance feedback to guide the induction process, 
thus it can achieve better results with time-
limited constraints. 
6 Conclusion 
This study has presented an HAL-based cascaded 
model for variable-length semantic pattern in-
duction. The HAL model provides an informa-
tive infrastructure for the CIP to induce semantic 
patterns from the unannotated psychiatry web 
corpora. Using the quality concepts and preserv-
ing the better results from the previous stage, the 
search space can be reduced to speed up the in-
duction process. In addition, combining the rele-
vance feedback and pseudo-relevance feedback, 
the induction process can be guided to induce 
more relevant semantic patterns. The experimen-
tal results demonstrated that our approach can 
not only reduce the reliance on annotated corpora 
but also obtain acceptable results with time-
limited constraints. Future work will be devoted 
to investigating the detection of negative life 
events using the induced patterns so as to make 
the psychiatric services more effective. 
References 
R. Baeza-Yates and B. Ribeiro-Neto. 1999. Modern 
Information Retrieval. Addison-Wesley, Reading, 
MA. 
Y. M. Bai, C. C. Lin, J. Y. Chen, and W. C. Liu. 2001. 
Virtual Psychiatric Clinics. American Journal of 
Psychiatry, 158(7):1160-1161.  
J. Bai, D. Song, P. Bruza, J. Y. Nie, and G. Cao. 2005. 
Query Expansion Using Term Relationships in 
Language Models for Information Retrieval. In 
Proc. of the 14th ACM International Conference 
on Information and Knowledge Management, 
pages 688-695. 
E. M. Brostedt and N. L. Pedersen. 2003. Stressful 
Life Events and Affective Illness. Acta Psychiat-
rica Scandinavica, 107:208-215. 
C. Burgess, K. Livesay, and K. Lund. 1998. Explora-
tions in Context Space: Words, Sentences, Dis-
course. Discourse Processes. 25(2&3):211-257. 
C. Fellbaum. 1998. WordNet: An Electronic Lexical 
Database. Cambridge, MA: MIT Press. 
T. Grenager, D. Klein, and C. D. Manning. 2005. Un-
supervised Learning of Field Segmentation Models 
for Information Extraction. In Proc. of the 43th 
Annual Meeting of the ACL, pages 371-378. 
T. Hasegawa, S. Sekine, R. Grishman. 2004. Discov-
ering Relations among Named Entities from Large 
Corpora. In Proc. of the 42th Annual Meeting of 
the ACL,  pages 415-422. 
W.Lehnert, C. Cardie, D. Fisher, J. McCarthy, E. 
Riloff, and S. Soderland. 1992. University of Mas-
sachusetts: Description of the CIRCUS System 
used for MUC-4. In Proc. of the Fourth Message 
Understanding Conference, pages 282-288. 
C. Manning and H. Schütze. 1999. Foundations of 
Statistical Natural Language Processing. MIT 
Press. Cambridge, MA. 
I. Muslea. 1999. Extraction Patterns for Information 
Extraction Tasks: A Survey. In Proc. of the AAAI-
99 Workshop on Machine Learning for Information 
Extraction, pages 1-6. 
M. E. Pagano, A. E. Skodol, R. L. Stout, M. T. Shea, 
S. Yen, C. M. Grilo, C.A. Sanislow, D. S. Bender, 
T. H. McGlashan, M. C. Zanarini, and J. G. Gun-
derson. 2004. Stressful Life Events as Predictors of 
Functioning: Findings from the Collaborative Lon-
gitudinal Personality Disorders Study. Acta Psy-
chiatrica Scandinavica, 110:421-429. 
M. Stevenson and M. A. Greenwood. 2005. A Seman-
tic Approach to IE Pattern Induction. In Proc. of 
the 43th Annual Meeting of the ACL, pages 379-
386. 
C. H. Wu, L. C. Yu, and F. L. Jang. 2005. Using Se-
mantic Dependencies to Mine Depressive Symp-
toms from Consultation Records. IEEE Intelligent 
System, 20(6):50-58.  
J. F. Yeh, C. H. Wu, M. J. Chen, and L. C. Yu. 2004. 
Automated Alignment and Extraction of Bilingual 
Domain Ontology for Cross-Language Domain-
Specific Applications. In Proc. of the 20th COL-
ING, pages 1140-1146. 
952
