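Similarités induites par mesure de comparabilité : signification et utilité pour le clustering et l'alignement de textes comparables (Similarities induced by a comparability measure: meaning and usefulness for the clustering and alignment of comparable texts)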





Affiliation: SEASIDE (SEarch, Analyze, Synthesize and Interact with Data Ecosystems), IRISA (Institut de Recherche en Informatique et Systèmes Aléatoires), UBS (Université de Bretagne Sud)

Abstract: Given bilingual comparable corpora, it is natural to embed the data in two distinct linguistic representation spaces, in each of which a (computational) notion of similarity can be defined. Provided these bilingual data are comparable in the sense of a comparability measure that is itself computable (Li and Gaussier, 2010), we can connect the two representation spaces through a weighted mapping, which can be represented as a weighted bidirectional comparability graph. In this paper we study the conceptual and practical consequences of this similarity-comparability connection and develop an algorithm, Hit-ComSim, based on the concept of similarities induced by the topology of the comparability graph. We attempt to assess the benefit of this algorithm on some preliminary categorization or clustering tasks over bilingual English-French documents collected from RSS feeds.
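To make the idea of an induced similarity concrete, here is a minimal sketch. It is not the Hit-ComSim algorithm, whose details are not given in the abstract. It only assumes that a similarity between two documents in one language can be obtained by propagating the similarity of the other language through the weighted edges of the comparability graph. The function name `induced_similarity`, the row normalisation, and the toy matrices are all illustrative assumptions.

```python
def induced_similarity(comparability, target_similarity):
    """Similarity between source documents induced via the comparability graph.

    comparability[i][k]     : non-negative edge weight between source document i
                              and target document k (assumed given).
    target_similarity[k][l] : similarity between target documents k and l.
    Returns an n x n matrix over the source documents.
    """
    n = len(comparability)
    m = len(target_similarity)

    # Assumption: normalise each row so that every source document spreads
    # a total weight of 1 over the target documents it is comparable to.
    weights = []
    for row in comparability:
        total = sum(row)
        weights.append([w / total if total > 0 else 0.0 for w in row])

    # induced[i][j] = sum over k, l of w[i][k] * sim[k][l] * w[j][l]
    induced = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            induced[i][j] = sum(
                weights[i][k] * target_similarity[k][l] * weights[j][l]
                for k in range(m)
                for l in range(m)
            )
    return induced


# Toy data (hypothetical): 3 English documents, 2 French documents.
comparability = [
    [1.0, 0.0],  # English doc 0 is comparable to French doc 0
    [1.0, 0.0],  # English doc 1 is comparable to French doc 0
    [0.0, 1.0],  # English doc 2 is comparable to French doc 1
]
french_similarity = [
    [1.0, 0.2],
    [0.2, 1.0],
]

english_induced = induced_similarity(comparability, french_similarity)
```

In this toy case, English documents 0 and 1 receive a high induced similarity because both attach to the same French document, while document 2 inherits only the weak similarity between the two French documents. The resulting matrix could then be fed to any standard clustering procedure.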

Keywords: Clustering, Comparability graph, Induced similarities, Comparable documents.





Authors: Pierre-François Marteau, Gildas Ménier

Source: https://hal.archives-ouvertes.fr/






