Quick Setup
fghso edited this page Apr 9, 2015
To set up the collection program you basically have to write a crawler class with an implementation of the `crawl` method (inside the file `crawler.py`) that suits your scenario and adjust the settings inside the XML configuration file. Assuming the simplest configuration possible, the setup workflow would be as follows:
- Implement the `crawl` method inside `crawler.py`
- Create an XML configuration file, for example `config.xml`
- For global settings, specify:
- Hostname of the server's machine
- Port where the server will be listening for new connections
- For server settings, indicate the persistence handler to be used (as well as the handler-specific configurations)
- For client settings, specify the name of the crawler class
- On the server's machine, run the command `python ./server.py config.xml`
- On each client machine, start one or more clients by running the command `python ./client.py config.xml`
- Monitor and manage the collection process using the script `manager.py`
- Wait for the collection to be finished
- Enjoy your new data!
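
As a sketch of the first step, a crawler class might look like the one below. The exact base-class interface and the signature of `crawl` are defined by the framework's `crawler.py`, so the argument and return value shown here are assumptions for illustration only:

```python
# Hypothetical crawler class. The real interface is defined in the
# framework's crawler.py; the signature and return format below are
# assumptions made for this sketch.
class MyCrawler:
    def crawl(self, resource_id):
        # Fetch and process the resource identified by resource_id,
        # then hand the collected data back to the framework.
        data = {"id": resource_id, "status": "collected"}
        return data
```

The class name chosen here (`MyCrawler`) is the one referenced later in the `<class>` element of the configuration file.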
The final `config.xml` file would be something like this:
```xml
<?xml version="1.0" encoding="ISO8859-1" ?>
<config>
    <global>
        <connection>
            <address>myserver</address>
            <port>7777</port>
        </connection>
    </global>
    <server>
        <persistence>
            <!-- Handler specific configurations -->
        </persistence>
    </server>
    <client>
        <crawler>
            <class>MyCrawler</class>
        </crawler>
    </client>
</config>
```
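
Before launching the server and clients, it can be worth sanity-checking that the configuration file parses and that the key elements are present. This is not part of the framework; it is just a quick standalone check using Python's standard `xml.etree.ElementTree`, with the element names taken from the example above:

```python
# Standalone sanity check of a config.xml-style document; element
# names follow the example configuration shown above.
import xml.etree.ElementTree as ET

config_xml = """<?xml version="1.0" encoding="ISO8859-1" ?>
<config>
    <global>
        <connection>
            <address>myserver</address>
            <port>7777</port>
        </connection>
    </global>
    <server>
        <persistence></persistence>
    </server>
    <client>
        <crawler>
            <class>MyCrawler</class>
        </crawler>
    </client>
</config>"""

# fromstring() rejects str input that carries an encoding declaration,
# so feed it bytes encoded as declared.
root = ET.fromstring(config_xml.encode("iso-8859-1"))

address = root.findtext("global/connection/address")
port = int(root.findtext("global/connection/port"))
crawler_class = root.findtext("client/crawler/class")
print(address, port, crawler_class)  # myserver 7777 MyCrawler
```

If any of these lookups returns `None`, the configuration is missing a required element and the server or clients would likely fail at startup.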