- concurrency
- consumption of web resources (HTTP/JSON/XML)
- sorting
- data sets
- data storage
The goal of this exercise is to reimplement a tool I was using in the dial-up, pre-Google era: Copernic 2000!
Copernic was a desktop application that managed your searches by forwarding them to the major search engines of the day, such as AltaVista, Deja.com, Excite, HotBot, Infoseek, Lycos, Magellan, WebCrawler and Yahoo. The results were cached locally and ranked by relevance, so you could disconnect the modem and still browse them offline. Once you had gone through the results, you could reconnect to follow any links that had not yet been downloaded. The software also refreshed the results daily to let you know whether anything new had appeared.
Given a keyword, your goal is to gather results from multiple sources, cross-reference them, sort them, and make them available to the user (a minimal sketch of this fetch/merge/sort flow follows the list of APIs below). Some possible sources:
- https://developers.google.com/custom-search/json-api/v1/introduction
- http://www.bing.com/developers/s/apibasics.html
- https://datamarket.azure.com/dataset/5BA839F1-12CE-4CCE-BF57-A49D98D29A44
- http://www.faroo.com/hp/api/api.html
- http://www.entireweb.com/search_api/implementation/
- https://duckduckgo.com/api
- https://dev.twitter.com/docs/api/1.1/get/search/tweets
- http://developer.github.com/v3/
- https://developers.facebook.com/docs/graph-api/
- http://developer.baidu.com/map/webservice.htm
- http://www.nsa.gov/
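
Below is a minimal sketch, in Go, of one way to structure the exercise: query several sources concurrently, normalize their results, cross-reference duplicates by URL, and sort the merged list. The `Source` interface, the example endpoints, and the JSON shape (`{"items":[{"title":..,"link":..}]}`) are all illustrative assumptions, not the actual formats of the APIs listed above; each real API would need its own adapter.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"sort"
	"sync"
)

// Result is the normalized form every source is mapped into.
type Result struct {
	Title  string
	URL    string
	Source string
	Rank   int // position reported by the source, 1 = best
}

// Source fetches results for a keyword; each API gets its own implementation.
type Source interface {
	Search(keyword string) ([]Result, error)
	Name() string
}

// jsonSource is a hypothetical JSON API returning {"items":[{"title":..,"link":..}]}.
type jsonSource struct {
	name, endpoint string
}

func (s jsonSource) Name() string { return s.name }

func (s jsonSource) Search(keyword string) ([]Result, error) {
	resp, err := http.Get(s.endpoint + "?q=" + url.QueryEscape(keyword))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var payload struct {
		Items []struct {
			Title string `json:"title"`
			Link  string `json:"link"`
		} `json:"items"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&payload); err != nil {
		return nil, err
	}

	results := make([]Result, 0, len(payload.Items))
	for i, it := range payload.Items {
		results = append(results, Result{Title: it.Title, URL: it.Link, Source: s.name, Rank: i + 1})
	}
	return results, nil
}

// search queries every source concurrently, merges duplicates by URL and
// sorts the merged list so URLs reported by many sources come first.
func search(keyword string, sources []Source) []Result {
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		all []Result
	)
	for _, src := range sources {
		wg.Add(1)
		go func(src Source) {
			defer wg.Done()
			res, err := src.Search(keyword)
			if err != nil {
				fmt.Println(src.Name(), "failed:", err) // degrade gracefully
				return
			}
			mu.Lock()
			all = append(all, res...)
			mu.Unlock()
		}(src)
	}
	wg.Wait()

	// Cross-reference: count how many sources returned each URL,
	// keeping the best rank seen for that URL.
	hits := map[string]int{}
	best := map[string]Result{}
	for _, r := range all {
		hits[r.URL]++
		if cur, ok := best[r.URL]; !ok || r.Rank < cur.Rank {
			best[r.URL] = r
		}
	}
	merged := make([]Result, 0, len(best))
	for _, r := range best {
		merged = append(merged, r)
	}
	sort.Slice(merged, func(i, j int) bool {
		if hits[merged[i].URL] != hits[merged[j].URL] {
			return hits[merged[i].URL] > hits[merged[j].URL]
		}
		return merged[i].Rank < merged[j].Rank
	})
	return merged
}

func main() {
	// Hypothetical endpoints, for illustration only.
	sources := []Source{
		jsonSource{name: "example-a", endpoint: "https://api.example-a.test/search"},
		jsonSource{name: "example-b", endpoint: "https://api.example-b.test/search"},
	}
	for _, r := range search("golang", sources) {
		fmt.Printf("%-10s %s\n", r.Source, r.URL)
	}
}
```

One design choice worth noting: each source runs in its own goroutine and a failing API is simply skipped, so one slow or unavailable engine does not block the others. Local caching, offline consultation and daily refreshes, as Copernic did, could be layered on top by persisting the merged `Result` slice.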