A Maximum Entropy Approach to Sentence Boundary Detection of Vietnamese Texts





1 KIWI - Knowledge Information and Web Intelligence, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
2 MSI - Modélisation et Simulation Informatique de systèmes complexes

Abstract: We present, for the first time, a system for detecting sentence boundaries in Vietnamese texts. The system is based on a maximum entropy model. Training requires no hand-crafted rules, lexicon, or domain-specific information. Given a corpus annotated with sentence boundaries, the model learns to classify each occurrence of a potential end-of-sentence punctuation mark as either a valid or an invalid sentence boundary. On a Vietnamese corpus, the system achieves a recall of about 95%. The approach has been implemented as a software tool named vnSentDetector, a plug-in of the open source framework vnToolkit, which is intended to be a general framework integrating useful tools for processing Vietnamese texts.
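
The classification step described in the abstract can be illustrated with a minimal sketch. This is not the vnSentDetector implementation: the abstract does not give the feature set, the candidate marks, or the training algorithm, so everything below is an assumption made for illustration. The sketch treats ".", "!" and "?" as candidate marks, uses a handful of hypothetical context features, and trains a two-class maximum entropy model (equivalent to binary logistic regression) by stochastic gradient ascent on a toy corpus invented for this example.

```python
import math
import re
from collections import defaultdict

# Assumption: candidate end-of-sentence marks (the abstract does not list them).
CANDIDATES = re.compile(r"[.!?]")

# Toy annotated corpus, invented for illustration: one sentence per entry.
TOY_CORPUS = [
    "Giá là 3.5 triệu đồng.",
    "Ông T. Nam đến Hà Nội.",
    "Bạn khỏe không?",
    "Tôi mua 2.5 kg gạo.",
]

def features(text, i):
    """Hypothetical binary context features for the candidate mark at index i."""
    left = text[:i].split(" ")[-1]                  # token fragment before the mark
    right = text[i + 1:].lstrip(" ").split(" ")[0]  # token after the mark
    feats = ["bias", "mark=" + text[i]]
    if text[i + 1:i + 2] in ("", " "):
        feats.append("followed_by_space_or_end")
    if right[:1].isupper():
        feats.append("next_is_capitalized")
    if left[-1:].isdigit():
        feats.append("prev_ends_with_digit")
    if len(left) == 1:
        feats.append("prev_is_single_char")
    return feats

def build_examples(sentences):
    """Join sentences with spaces; label each candidate 1 if it ends a sentence."""
    text = " ".join(sentences)
    boundaries, pos = set(), 0
    for s in sentences:
        pos += len(s)
        boundaries.add(pos - 1)  # index of the sentence-final character
        pos += 1                 # skip the joining space
    return [(features(text, m.start()), 1 if m.start() in boundaries else 0)
            for m in CANDIDATES.finditer(text)]

def predict(weights, feats):
    """Model probability that the candidate is a valid sentence boundary."""
    score = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-score))

def train(examples, epochs=500, lr=0.5):
    """Fit the weights by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, label in examples:
            error = label - predict(weights, feats)
            for f in feats:
                weights[f] += lr * error
    return weights

def split_sentences(weights, text, threshold=0.5):
    """Cut the text after every candidate the model accepts as a boundary."""
    sentences, start = [], 0
    for m in CANDIDATES.finditer(text):
        if predict(weights, features(text, m.start())) > threshold:
            sentences.append(text[start:m.end()].strip())
            start = m.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

weights = train(build_examples(TOY_CORPUS))
print(split_sentences(weights, "Anh đi 1.5 km. Cô ấy về nhà."))
```

No rule says that a period between digits is not a boundary; the model has to learn that from the labeled candidates, which mirrors the abstract's point that training needs only a boundary-annotated corpus. A realistic system would need a much larger corpus and a richer feature set than this sketch.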





Authors: Hong Phuong Le, Tuong Vinh Ho

Source: https://hal.archives-ouvertes.fr/






