<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2170">
  <Title>Jurilinguistic Engineering in Cantonese Chinese: An N-gram-based Speech to Text Transcription System</Title>
  <Section position="5" start_page="1124" end_page="1124" type="concl">
    <SectionTitle>
5. Conclusion
</SectionTitle>
    <Paragraph position="0"> We have created a Cantonese Chinese CAT system which uses the phonologically-based stenograph machine. The system delivers encouragingly accurate transcription in a language which has many homophonous characters. To resolve problematic ambiguity in the conversion from a phonologically-based code to the logographic Chinese characters, we made use of the N-gram statistical model. The Viterbi algorithm has enabled us to identify the most probable sequence of characters from the sets of possible homophonous characters. With the additional use of special encoding and domain-specific training, the Cantonese CAT system has attained 96% transcription accuracy. The success of the Jurilinguistic Engineering project can further enhance the efforts by the Hong Kong Judiciary to conduct trials in the language of the majority population. Further improvement to the system will include (1) more domain-specific training and testing across different case types, (2) fine-tuning for the optimal weights in the trigram formula, and (3) optimizing the balance between training corpus size and shallow linguistic processing.</Paragraph>
  </Section>
</Paper>