The end of crawling and privacy
posted by shuri on 2007-05-16 12:29:43
tags: news
Many things are changing, and sometimes it's hard to notice. Crawling used to be the only way of discovering pages and fetching content. Sitemaps were one of the first signs that this was changing: web sites push lists of URLs to search engines and notify them when the site is updated. The popularity of RSS feeds also provides a never-ending stream of URLs with new content.
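To make the push model concrete, here is a minimal Python sketch of what a search engine might do with a sitemap, assuming a document in the sitemaps.org 0.9 format. The function names `urls_from_sitemap` and `changed_since` and the example.com URLs are hypothetical, chosen only for illustration. Instead of discovering pages by following links, the consumer reads the list the site published and only schedules pages whose optional `<lastmod>` date is recent (or missing).

```python
import xml.etree.ElementTree as ET
from datetime import date

# XML namespace defined by the Sitemaps protocol (sitemaps.org, version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return (loc, lastmod) pairs from a sitemap; lastmod may be None."""
    root = ET.fromstring(xml_text)
    pairs = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc:
            pairs.append((loc, lastmod))
    return pairs


def changed_since(pairs, since):
    """Keep URLs modified on or after `since`, plus URLs with no usable date."""
    out = []
    for loc, lastmod in pairs:
        try:
            # Only the YYYY-MM-DD prefix is used; other W3C datetime forms
            # (e.g. year-only) fail to parse and are treated as undated.
            modified = date.fromisoformat(lastmod[:10]) if lastmod else None
        except ValueError:
            modified = None
        if modified is None or modified >= since:
            out.append(loc)
    return out


# Hypothetical sitemap used only to demonstrate the functions above.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc><lastmod>2007-05-15</lastmod></url>
  <url><loc>http://www.example.com/old.html</loc><lastmod>2007-01-02</lastmod></url>
  <url><loc>http://www.example.com/nodate.html</loc></url>
</urlset>"""

if __name__ == "__main__":
    pairs = urls_from_sitemap(SAMPLE)
    # Prints ['http://www.example.com/', 'http://www.example.com/nodate.html']
    print(changed_since(pairs, date(2007, 5, 1)))
```

The undated URL is kept on purpose: without a `<lastmod>` hint the consumer cannot tell whether the page changed, so it falls back to fetching it, much as a conventional crawler would.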
An interesting paper at WWW2007 discusses "Navigation-Aided Retrieval", which augments the retrieval model with the assumption that the user is willing to navigate a bit to find what he wants. Could this mean that a less exhaustive crawl would be just as good?
Google's Web History is another interesting application that appeared recently. Using the Google Toolbar, and any other method available to them, they record the URLs you visit. I do not know whether they did this before, but they are doing it now. Everybody is talking about the implications for personalization; I see it as another hit for crawling and a bit of a hit for privacy.
Regarding privacy and toolbars, Microsoft presented a paper at WWW2007 that analyzed password strength. They collected the data for this research using the Windows Live Toolbar. Most of the people who attended the presentation seemed more interested in the use of private data than in password strength.
To summarize, there seem to be two loosely related trends: brute-force crawling is getting slightly less important than it used to be, and you can safely assume that you do not have any privacy.