Harnessing the Deep Web: Present and Future - Computer Science > Databases





Abstract: Over the past few years, we have built a system that has exposed large volumes of Deep-Web content to Google.com users. The content that our system exposes contributes to more than 1000 search queries per second and spans over 50 languages and hundreds of domains. The Deep Web has long been acknowledged to be a major source of structured data on the web, and hence accessing Deep-Web content has long been a problem of interest in the data management community. In this paper, we report on where we believe the Deep Web provides value and where it does not. We contrast two very different approaches to exposing Deep-Web content: the surfacing approach that we used, and the virtual integration approach that has often been pursued in the data management literature. We emphasize where the value of each of the two approaches lies and caution against potential pitfalls. We outline important areas of future research and, in particular, emphasize the value that can be derived from analyzing large collections of potentially disparate structured data on the web.



Authors: Jayant Madhavan, Google Inc.; Loredana Afanasiev, Universiteit van Amsterdam; Lyublena Antova, Cornell University; Alon Halevy, Googl

Source: https://arxiv.org/






