Configurations
All server and client configuration is done in an XML configuration file. Logging and verbose options are also available on the command line and override the corresponding settings in the configuration file. Run `python .\server.py -h` or `python .\client.py -h` for more information about the command-line options.
### General
The example below shows a complete configuration file with all available options. For each option, it also shows the value type, whether the option is required, and the default value. String options must be written without quotes. Boolean options accept the following values for `True`: `true`, `t`, `yes`, `y`, `on`, `1`; and the following values for `False`: `false`, `f`, `no`, `n`, `off`, `0`. All boolean values are case-insensitive.
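The boolean rules above can be sketched in Python as follows. This is only an illustration of the accepted values, not the project's actual parser: the name `parse_bool`, the whitespace stripping, and the `ValueError` raised for unknown values are assumptions.

```python
# Accepted spellings, taken from the list above (compared case-insensitively).
TRUE_VALUES = {"true", "t", "yes", "y", "on", "1"}
FALSE_VALUES = {"false", "f", "no", "n", "off", "0"}


def parse_bool(text):
    """Convert a configuration string into a bool, ignoring case.

    Hypothetical helper: stripping whitespace and raising ValueError on
    unknown values are assumptions, not documented behaviour.
    """
    value = text.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a valid boolean value: {text!r}")


print(parse_bool("Yes"), parse_bool("OFF"))
```
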
<?xml version="1.0" encoding="ISO8859-1" ?>
<config>
    <global>
        <connection>
            <address>String - Required</address>
            <port>Integer - Required</port>
        </connection>
        <feedback>Boolean - Optional (Default: False)</feedback>
        <echo>
            <!-- The echo section itself is optional -->
            <verbose>Boolean - Optional (Default: False)</verbose>
            <logging>Boolean - Optional (Default: True)</logging>
            <loggingpath>String - Optional (Default: . , i.e., a single dot, meaning current directory)</loggingpath>
            <loggingfilemode>String - Optional (Default: overwrite)</loggingfilemode>
        </echo>
    </global>
    <server>
        <loopforever>Boolean - Optional (Default: False)</loopforever>
        <persistence>
            <class>String - Required</class>
            <echo>
                <!-- The echo section itself is optional -->
                <verbose>Boolean - Optional (Default: specified in the global section)</verbose>
                <logging>Boolean - Optional (Default: specified in the global section)</logging>
                <loggingpath>String - Optional (Default: specified in the global section)</loggingpath>
            </echo>
            <!-- Other handler specific configurations -->
        </persistence>
        <filtering>
            <!-- The filtering section itself is optional -->
            <filter>
                <class>String - Required</class>
                <name>String - Optional (Default: class name)</name>
                <parallel>Boolean - Optional (Default: False)</parallel>
                <echo>
                    <!-- The echo section itself is optional -->
                    <verbose>Boolean - Optional (Default: specified in the global section)</verbose>
                    <logging>Boolean - Optional (Default: specified in the global section)</logging>
                    <loggingpath>String - Optional (Default: specified in the global section)</loggingpath>
                </echo>
                <!-- Other filter specific configurations -->
            </filter>
        </filtering>
        <echo>
            <!-- The echo section itself is optional -->
            <verbose>Boolean - Optional (Default: specified in the global section)</verbose>
            <logging>Boolean - Optional (Default: specified in the global section)</logging>
            <loggingpath>String - Optional (Default: specified in the global section)</loggingpath>
        </echo>
    </server>
    <client>
        <crawler>
            <class>String - Required</class>
            <echo>
                <!-- The echo section itself is optional -->
                <verbose>Boolean - Optional (Default: specified in the global section)</verbose>
                <logging>Boolean - Optional (Default: specified in the global section)</logging>
                <loggingpath>String - Optional (Default: specified in the global section)</loggingpath>
            </echo>
        </crawler>
        <echo>
            <!-- The echo section itself is optional -->
            <verbose>Boolean - Optional (Default: specified in the global section)</verbose>
            <logging>Boolean - Optional (Default: specified in the global section)</logging>
            <loggingpath>String - Optional (Default: specified in the global section)</loggingpath>
        </echo>
    </client>
</config>
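To make the layout above more concrete, here is a minimal sketch of reading such a file with Python's standard `xml.etree.ElementTree` module. It is not the project's own loader. The host name, port number, and class names in the sample are placeholders invented for this example, as are the helpers `read_required` and `read_optional`.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal configuration containing only the required options.
# All values are placeholders; the XML declaration is omitted because the
# document is parsed from an in-memory string.
EXAMPLE_CONFIG = """
<config>
    <global>
        <connection>
            <address>localhost</address>
            <port>7777</port>
        </connection>
    </global>
    <server>
        <persistence>
            <class>ExamplePersistenceHandler</class>
        </persistence>
    </server>
    <client>
        <crawler>
            <class>ExampleCrawler</class>
        </crawler>
    </client>
</config>
"""


def read_optional(root, path, default):
    """Return the stripped text at `path`, or `default` if absent or empty."""
    node = root.find(path)
    if node is None or node.text is None or not node.text.strip():
        return default
    return node.text.strip()


def read_required(root, path):
    """Return the stripped text at `path`, or raise ValueError if missing."""
    value = read_optional(root, path, None)
    if value is None:
        raise ValueError(f"missing required option: {path}")
    return value


root = ET.fromstring(EXAMPLE_CONFIG)
address = read_required(root, "global/connection/address")
port = int(read_required(root, "global/connection/port"))
crawler_class = read_required(root, "client/crawler/class")
# Optional option: fall back to the documented default when it is absent.
feedback = read_optional(root, "global/feedback", "false")

print(address, port, crawler_class, feedback)
```

Optional sections such as `echo` and `filtering` are left out of the sample on purpose, since the template marks them as optional.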
The general options are explained below:
- `global`
    - `connection`
        - `address`: hostname of the server machine.
        - `port`: port on which the server listens for new connections.
    - `feedback`: enables or disables saving of new resources sent to the server. When clients send new resources to the server and feedback is enabled, these resources are saved in the same location as the original resources and become part of the list of resources to be crawled. This can be used for snowball crawling, for example.
    - `echo`: default output configuration. The values defined here specify the default echo behaviour for all subsequent configuration sections. Local settings override the default values.
        - `verbose`: enables or disables informational messages on screen.
        - `logging`: enables or disables logging to a file.
        - `loggingpath`: defines the path of the logging file.
        - `loggingfilemode`: defines the mode in which the logging file is opened. The possible values are "overwrite" (the file is opened in "w" mode) and "append" (the file is opened in "a" mode). Both values are case-insensitive.
- `server`
    - `loopforever`: defines whether the server finishes or waits once all available resources have been crawled. If disabled, the program terminates when no resource is available. Otherwise, the server periodically asks the persistence handler for a new resource, waiting until one is returned. This is especially useful in conjunction with the `feedback` option when new resources are inserted more slowly than existing resources are crawled, because it prevents the server from finishing prematurely.
    - `persistence`
        - `class`: name of the persistence handler class to use.
        - `echo`: local output configuration. The options are the same as in the global section.
        - Other handler-specific configurations.
    - `filtering`: more than one filter can be specified here. Sequential filters are executed in the order in which they appear here.
        - `filter`
            - `class`: name of the filter class to use.
            - `name`: filter name. If no name is given, the filter is named after its class.
            - `parallel`: defines whether the filter is sequential or parallel. If enabled, a thread is started every time the filter is executed (for both the `apply` and `callback` methods).
            - `echo`: local output configuration. The options are the same as in the global section.
            - Other filter-specific configurations.
    - `echo`: local output configuration. The options are the same as in the global section.
- `client`
    - `crawler`
        - `class`: name of the crawler class to use.
        - `echo`: local output configuration. The options are the same as in the global section.
    - `echo`: local output configuration. The options are the same as in the global section.
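The way local `echo` settings override the global defaults can be sketched as follows. This is a minimal illustration built only from the defaults and file modes listed above. The names `resolve_echo` and `open_mode`, and the use of plain dictionaries, are assumptions and do not come from the project's code.

```python
# Defaults documented for the global echo section.
BUILTIN_DEFAULTS = {
    "verbose": False,
    "logging": True,
    "loggingpath": ".",
    "loggingfilemode": "overwrite",
}

# Documented mapping from loggingfilemode values to file open modes.
FILE_MODES = {"overwrite": "w", "append": "a"}


def resolve_echo(global_echo, local_echo):
    """Merge settings so that local values win over global ones,
    and global values win over the documented defaults (hypothetical helper)."""
    settings = dict(BUILTIN_DEFAULTS)
    settings.update(global_echo)
    settings.update(local_echo)
    return settings


def open_mode(loggingfilemode):
    """Translate a case-insensitive loggingfilemode into an open() mode."""
    return FILE_MODES[loggingfilemode.lower()]


# Example: the global section turns verbose on and appends to the log file,
# while a local (e.g. crawler) section turns verbose off and moves the log.
global_echo = {"verbose": True, "loggingfilemode": "Append"}
crawler_echo = {"verbose": False, "loggingpath": "logs"}

settings = resolve_echo(global_echo, crawler_echo)
print(settings, open_mode(settings["loggingfilemode"]))
```
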
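The waiting behaviour described for `loopforever` could look roughly like the loop below. This is a sketch under stated assumptions, not the server's implementation: `select_resource` is a hypothetical stand-in for the persistence handler, and the polling interval is an invented parameter, since the text only says the server asks "from time to time".

```python
import time


def next_resource(select_resource, loop_forever, poll_interval=1.0, sleep=time.sleep):
    """Return the next resource, or None when the server should terminate.

    select_resource: hypothetical callable standing in for the persistence
    handler; it returns a resource, or None when nothing is available.
    """
    while True:
        resource = select_resource()
        if resource is not None:
            return resource
        if not loop_forever:
            # loopforever disabled: nothing left to crawl, so finish.
            return None
        # loopforever enabled: wait a while, then ask again.
        sleep(poll_interval)


# Example: the first two requests find nothing, the third returns a resource.
answers = iter([None, None, "resource-3"])
waits = []
result = next_resource(lambda: next(answers), loop_forever=True,
                       poll_interval=0.5, sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter is only a convenience for the example, so that the waiting can be observed without actually pausing.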
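The difference between sequential and parallel filters can be illustrated with the standard `threading` module. The sketch below is an assumption about the general shape of the mechanism, not the project's code: `ExampleFilter`, the single `resource_id` argument of `apply` and `callback`, and the helper `run_filter_method` are all hypothetical.

```python
import threading


class ExampleFilter:
    """Hypothetical filter that records which of its methods were called."""

    def __init__(self):
        self.calls = []

    def apply(self, resource_id):
        self.calls.append(("apply", resource_id))

    def callback(self, resource_id):
        self.calls.append(("callback", resource_id))


def run_filter_method(method, resource_id, parallel):
    """Run a filter method inline, or in a new thread when parallel is set.

    Returns the started thread for parallel filters, otherwise None.
    """
    if parallel:
        thread = threading.Thread(target=method, args=(resource_id,))
        thread.start()
        return thread
    method(resource_id)
    return None


example = ExampleFilter()
# Sequential execution: the call returns only after apply() has finished.
run_filter_method(example.apply, 42, parallel=False)
# Parallel execution: a new thread is started for this call.
worker = run_filter_method(example.callback, 42, parallel=True)
worker.join()  # wait here only so the example can inspect the result
print(example.calls)
```
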