In the early days of the web, we created portals (DMOZ, the Yahoo! directory). As the web scaled up, portals were no longer viable, and we moved to metadata search (AltaVista, Lycos). Search was more scalable, but results were unreliable and easily gamed. The third generation of web discovery came with PageRank-style search (Google), which drew on many more cues, including an understanding of content, usage, and linking.
On the web of data, we are still in the portal era. We are often asked “do you know where I can get X data?”, and there is an expectation in the wider world that the ODI “has all the data”.
We will evolve data discovery by working with and improving the Open Data Certificate and Open Data Monitor indexes to help create effective metadata-driven search engines for data, most likely in partnership with member organisations working on similar products. This is analogous to the early web search engines, which relied on publisher-defined metadata to characterise pages.
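To make the metadata-driven approach concrete, here is a minimal sketch of this kind of search: an inverted index built from publisher-supplied descriptions, matched against query terms. The catalogue entries, dataset IDs, and the `search` helper are all invented for illustration; real indexes such as those above would hold far richer metadata.

```python
from collections import defaultdict

# Hypothetical catalogue of publisher-supplied dataset descriptions.
catalogue = {
    "bus-timetables": "Bus timetables for local routes, updated weekly",
    "air-quality": "Hourly air quality readings from city monitoring stations",
    "road-traffic": "Annual road traffic counts by vehicle type",
}

# Build an inverted index: word -> set of dataset IDs mentioning it.
index = defaultdict(set)
for dataset_id, description in catalogue.items():
    for word in description.lower().split():
        index[word.strip(",.")].add(dataset_id)

def search(query):
    """Return dataset IDs whose metadata contains every query term."""
    terms = [t.lower() for t in query.split()]
    results = [index.get(t, set()) for t in terms]
    return set.intersection(*results) if results else set()
```

The limitation is the same as on the early web: the index can only surface what publishers chose to write about their datasets, which is exactly what motivates the richer approaches below.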
However, by analogy with the web, proper data search goes far beyond this basic level. Our researchers will look at automated methods of summarising and characterising datasets, allowing a search engine to truly understand the content of datasets. We will also look at how analysing the usage and popularity of datasets enables better search. This is similar to the PageRank era of search, which transformed the early web into something that “just works”.
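The PageRank analogy can be sketched directly: if we treat datasets that reference or build on one another as a graph, a simple power-iteration PageRank ranks heavily-referenced datasets higher. The graph below and the dataset names in it are hypothetical, purely to illustrate the idea; this is not a description of any planned ODI implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each dataset to the datasets it references."""
    nodes = set(links) | {n for targets in links.values() for n in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node, targets in links.items():
            if targets:
                # Pass this node's rank evenly to the datasets it references.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: redistribute its rank evenly across all nodes.
                for n in nodes:
                    new_rank[n] += damping * rank[node] / len(nodes)
        rank = new_rank
    return rank

# Invented example: two datasets both reference "postcode-lookup".
links = {
    "census-2011": ["postcode-lookup"],
    "spending-data": ["postcode-lookup", "census-2011"],
    "postcode-lookup": [],
}
ranks = pagerank(links)
```

As with web pages, the widely-referenced `postcode-lookup` dataset ends up ranked above datasets that nothing else points to, which is the usage-and-linking signal the paragraph above describes.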