<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0104">
  <Title>A Core-Tools Statistical NLP Course</Title>
  <Section position="3" start_page="0" end_page="23" type="intro">
    <SectionTitle>
2 Audience
</SectionTitle>
    <Paragraph position="0"> The audience of the class began as a mix of CS PhD students (mostly AI, but some systems students), some linguistics graduate students, and a few advanced CS undergrads. What became apparent after the first homework assignment (see section 4.2) was that while the CS students could at least muddle through the course with weak (or absent) linguistics backgrounds, the linguistics students were unable to acquire the math and programming skills quickly enough to keep up. I have no good ideas about how to address this issue. Moreover, even among the CS students, some of the systems students had trouble with the math, and some of the AI/theory students had issues with coding scalable solutions. The course was certainly not optimized for broad accessibility, but the approximately 80% of students who stuck it out did what I considered to be extremely impressive work. For example, one student built a language model that took the probability mass reserved for new words and distributed it according to a character n-gram model. Another student invented a non-iterative word alignment heuristic that outperformed IBM Model 4 on small and medium training corpora. A third student built a maxent part-of-speech tagger with a per-word accuracy of 96.7%, certainly in the state-of-the-art range.</Paragraph>
  </Section>
</Paper>