Quick Demo

To see the tool in action, open a terminal window and go into the /demo directory. Then type the following command:

python ../server.py config.demo.xml

This will initialize the server, setting it to distribute the resources stored in the file resources.csv. Open another terminal window, go into the /demo directory and type:

python ../manager.py config.demo.xml -s extended

The manager will show you extended status information about the data collection process, as reported by the server. Under the Global Info section, note that there are 10 resources available to be crawled and 1 already crawled. Type the following command:

python ../client.py config.demo.xml

This will initialize a client. The client is configured to use the crawling code of the DemoCrawler class, which can be found in the file crawler.py. The crawler just receives a resource ID, waits for some time and then returns some information related to the resource to the server.
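The behavior described above (receive a resource ID, wait, return some information) can be sketched as follows. This is only an illustration: the class name SketchCrawler, its crawl method, the delay bounds and the keys of the returned dictionary are all hypothetical and do not reproduce the actual DemoCrawler code in crawler.py.

```python
import random
import time


class SketchCrawler:
    """Hypothetical stand-in for a crawler; not the actual DemoCrawler code."""

    def __init__(self, min_delay=0.0, max_delay=0.01):
        # Delay bounds (in seconds) are assumptions chosen for illustration.
        self.min_delay = min_delay
        self.max_delay = max_delay

    def crawl(self, resource_id):
        # Simulate the time a real crawl would take.
        delay = random.uniform(self.min_delay, self.max_delay)
        time.sleep(delay)
        # Return some information about the resource, to be reported back.
        return {"resource_id": resource_id, "elapsed_seconds": delay}


if __name__ == "__main__":
    crawler = SketchCrawler()
    print(crawler.crawl(7))
```

A real crawler would replace the sleep with the actual collection work for the given resource.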

In addition, two filters are used to save the new resource IDs returned by the client. These new IDs are stored in the files new_resources.csv and new_resources.json. For the JSON file, a RolloverFilePersistenceHandler is used, configured to save a maximum of 5 resources per file. As a result, at the end of the demo run you should see 3 JSON files in the folder: resources.json, new_resources.json and new_resources.json.1. The last file is created automatically by the rollover handler.
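To make the rollover idea concrete, here is a minimal sketch of how items could be split across files with a limit of 5 per file. It is not the RolloverFilePersistenceHandler implementation: the function names are hypothetical, and the naming scheme (first file keeps the base name, later files get a numeric suffix) is an assumption inferred from the file names mentioned above.

```python
def rollover_filename(base_name, item_index, max_per_file=5):
    """Return the file name the item at item_index (0-based) would go to."""
    file_number = item_index // max_per_file
    # Assumed scheme: the first file keeps the base name,
    # later files get a numeric suffix (.1, .2, ...).
    return base_name if file_number == 0 else f"{base_name}.{file_number}"


def distribute(base_name, items, max_per_file=5):
    """Group items by target file name, preserving insertion order."""
    files = {}
    for index, item in enumerate(items):
        name = rollover_filename(base_name, index, max_per_file)
        files.setdefault(name, []).append(item)
    return files


if __name__ == "__main__":
    # Seven made-up resource IDs, used only for illustration.
    for name, ids in distribute("new_resources.json", list(range(11, 18))).items():
        print(name, ids)
```

Under this assumed scheme, any number of items from 6 to 10 ends up in exactly two files, the base file and one with the .1 suffix.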
