File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/94/c94-1042_metho.xml

Size: 2,580 bytes

Last Modified: 2025-10-06 14:13:38

<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-1042">
  <Title>Comlex Syntax: Building a Computational Lexicon</Title>
  <Section position="4" start_page="268" end_page="269" type="metho">
    <SectionTitle>
3 Methods
</SectionTitle>
    <Paragraph position="0"> Our basic approach has been to create an initial lexicon manually and then to use a variety of resources, both commercial and corpus-derived, to refine this lexicon.</Paragraph>
    <Paragraph position="1"> Although methods have been developed over the last few years for automatically identifying some subcategorization constraints through corpus analysis [2,5], these methods are still limited in the range of distinctions they can identify and in their ability to deal with low-frequency words. Consequently, we have chosen to use manual entry to create our initial dictionary.</Paragraph>
    <Paragraph position="2"> The entry of lexical information is being performed by four graduate linguistics students, referred to as elves (&quot;elf&quot; = enterer of lexical features). The elves are provided with a menu-based interface coded in Common Lisp using the Garnet GUI package and running on Sun workstations. This interface also provides access to a large text corpus; as a word is being entered, instances of the word can be viewed in one of the windows. Elves rely on citations from the corpus, on definitions and citations from any of several printed dictionaries, and on their own linguistic intuitions in assigning features to words.</Paragraph>
    <Paragraph position="3"> Dictionary entry began in April 1993. An initial dictionary containing entries for all the nouns, verbs and adjectives in the OALD was completed in May 1994.³ We expect to check this dictionary against several sources. We intend to compare the manual subcategorizations for verbs against those in the OALD, and would be pleased to make comparisons against other broad-coverage dictionaries if those can be made available for this purpose. We also intend to make comparisons against several corpus-derived lists: at the very least, with verb/preposition and verb/particle pairs with high mutual information [3] and, if possible, with the results of recently-developed procedures for extracting subcategorization frames from corpora [2,5]. While this corpus-derived information may not be detailed or accurate enough for fully-automated lexicon</Paragraph>
  </Section>
</Paper>