Abstract: Over the past few years, we have built a system that has exposed large volumes of Deep-Web content to Google.com users. The content that our system exposes contributes to more than 1000 search queries per second and spans over 50 languages and hundreds of domains. The Deep Web has long been acknowledged to be a major source of structured data on the web, and hence accessing Deep-Web content has long been a problem of interest in the data management community. In this paper, we report on where we believe the Deep Web provides value and where it does not. We contrast two very different approaches to exposing Deep-Web content: the surfacing approach that we used, and the virtual integration approach that has often been pursued in the data management literature. We emphasize where the value of each approach lies and caution against potential pitfalls. We outline important areas of future research and, in particular, emphasize the value that can be derived from analyzing large collections of potentially disparate structured data on the web.

