Finding Syntactic Similarities Between XML Documents





XML documents, clustering


Subject-Keyword: XML documents clustering

Type of item: Computing Science Technical Report

Computing science technical report ID: TR05-16

Language: English


Description: Technical report TR05-16. We present a concise and accurate structural summary of XML documents and show that this summary can be used to effectively cluster documents that belong to a structurally similar class. We present efficient formulations of similarity between structural summaries that lead to better detection of documents that conform to the same DTD. Our formulation is based on the intuition that two documents are likely to be generated by the same DTD if a large fraction of the paths in the two documents are the same or similar. Our experimental evaluation shows that this method does an excellent job of grouping documents generated by the same DTD, outperforming some of the previously proposed solutions based on tree comparison.
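
To illustrate the intuition in the description, the sketch below summarizes each document as its set of distinct root-to-element tag paths and scores two documents by the fraction of paths they share. This is a minimal sketch, not the report's formulation: the function names `path_summary` and `path_similarity` are hypothetical, the Jaccard ratio is a swapped-in stand-in for the similarity measures the report defines, and only exactly matching paths count (the report also considers similar paths).

```python
import xml.etree.ElementTree as ET


def path_summary(xml_text):
    """Return the set of distinct root-to-element tag paths (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)  # repeated siblings collapse into one path
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def path_similarity(xml_a, xml_b):
    """Jaccard ratio of shared paths; an assumed stand-in, not the report's measure."""
    pa, pb = path_summary(xml_a), path_summary(xml_b)
    return len(pa & pb) / len(pa | pb)


# Two documents with a similar structure and one with an unrelated structure.
doc1 = "<lib><book><title>A</title><author>X</author></book></lib>"
doc2 = "<lib><book><title>B</title></book><book><title>C</title></book></lib>"
doc3 = "<cat><item><price>1</price></item></cat>"

print(path_similarity(doc1, doc2))  # shares 3 of 4 distinct paths
print(path_similarity(doc1, doc3))  # shares no paths
```

Because the summary is a set, a document with many repeated elements has the same summary as one with a single instance of each, which is what lets documents of very different sizes from the same DTD score as similar.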

Date created: 2005

DOI: doi:10.7939/R3Q814T79

License information: Creative Commons Attribution 3.0 Unported






Author: Rafiei, Davood; Moise, Daniel; Sun, Dabo

Source: https://era.library.ualberta.ca/


Teaser



Finding Syntactic Similarities Between XML Documents
Davood Rafiei, Daniel Moise, Dabo Sun
Dept. of Computing Science, University of Alberta, Canada
drafiei@cs.ualberta.ca, moise@cs.ualberta.ca, dabo@cs.ualberta.ca
...




