The ACL RD-TEC

The dataset is organized into several folders; each folder may in turn contain several sub-folders or archive files. The archive files contain tab-separated text files. In each of these files, the first line starts with the character “#” and describes the content of the file.
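The file layout described above can be read with a few lines of Python. The sketch below is a minimal, hypothetical reader: `read_rdtec_tsv` is our own name, not part of the dataset. It assumes UTF-8 text, treats a first line beginning with “#” as the description, and splits every other non-empty line on tab characters. It does not unpack the archive files, and the sample content in the demo is invented for illustration.

```python
import io
from typing import List, Optional, TextIO, Tuple


def read_rdtec_tsv(stream: TextIO) -> Tuple[Optional[str], List[List[str]]]:
    """Split one tab-separated file into its '#' description line and its records.

    Hypothetical helper: assumes the description, if present, is the first line
    and starts with '#', and that each remaining non-empty line is one record.
    """
    description: Optional[str] = None
    rows: List[List[str]] = []
    for line_no, line in enumerate(stream):
        line = line.rstrip("\r\n")  # drop only the line terminator, keep tabs
        if line_no == 0 and line.startswith("#"):
            description = line[1:].strip()  # text after the leading '#'
            continue
        if line:  # skip blank lines
            rows.append(line.split("\t"))
    return description, rows


if __name__ == "__main__":
    # Invented sample content; a real file would be opened with
    # open(path, encoding="utf-8") after extracting it from its archive.
    sample = "# hypothetical description of the columns\nmachine translation\t42\nparsing\t17\n"
    description, rows = read_rdtec_tsv(io.StringIO(sample))
    print(description)
    for row in rows:
        print(row)
```

Reading the stream line by line keeps memory use flat for large files. If fields could contain quoted tabs, `csv.reader(stream, delimiter="\t")` would be the safer choice; this sketch assumes plain, unquoted fields.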

Below is the list of files and folders available for download. An additional description is provided for each category and file.


Name                              Description
annotation/                       Files that represent the set of manually annotated candidate terms.
annotation_guideline/             Annotation guidelines and relevant documentation.
annotator_agreement_test_files/   Additional annotations used for calculating the annotator agreement.
candidate_term/                   The set of all the extracted candidate terms.
cleansed_text/                    Raw text files in XML format, cleansed and segmented at paragraph level.
external_resource/                Resources from the ACL ARC.
licenses/                         Relevant license files.
misc/                             Additional helpful materials.
sepid_corpus/                     The pre-processed, segmented, tagged and indexed ACL ARC.

This page was last edited on 06 October 2025.



