Skip to content
paynejd edited this page Mar 21, 2013 · 5 revisions

Summary

This PHP command-line script is used to automate synchronizing form data between Dimagi's CommCare HQ and a local relational database (i.e. MySql).

NOTE: Dimagi is in the process fo developing their own export functionality that should replace this script for most use cases. (Expected to be ready by May 2013)

HQSync Setup Instructions

  1. Download code from: https://github.com/paynejd/hqsync
  2. Unzip code into an executable directory
  3. Create the hqsync database in mysql
  4. Run the hqsync_init.sql script in the newly created hqsync database
  5. Update hqsync.hqsync to sync the desired reports (refer to section below for documentation)
  6. Create databases for each domain (e.g. pathfinder) - this is where the data for each domain will be stored; tables are created automatically
  7. Finally, update configuration in sync.php to use

How to upgrade the sync.php:

  1. Backup existing data (use your'e own username, you will be prompted for password). From the command line, enter:
mysql -u <username> -p pathfinder > pathfinder_backup.sql
mysql -u <username> -p hqsync > hqsync_backup.sql
  1. Drop all pathfinder tables that were created by hqsync - these should all be in the pathfinder database; you can see the list of tables to drop in hqsync.hqsync_table. The sql command is:
drop table [table_name_goes_here];
  1. Purge the hqsync_table and hqsync_log tables
truncate hqsync.hqsync_table
truncate hqsync.hqsync_log
  1. Replace sync.php with new version

    • Browse to location of the sync.php scripts (I think this is under the root user's home directory)
    • Rename existing sync.php (mv sync.php sync.php.backup)
    • Copy in the attached updated sync script
  2. Execute the script (note that this might take 30+ min to finish since it is downloading all of the data)

./sync.sh
  1. Confirm that it works - check the logs to confirm that the import was successful
select * from hqsync.hqsync_log;

How to add a domain to hqsync:

See https://wiki.commcarehq.org/display/commcarepublic/Export+API for corresponding documentation on CommCare HQ

  1. Add the user account that will be performing the syncs to the domain using CommCare HQ (take a look at the hqsync table for an example).
  2. Go to the Reports -> Export Submissions to Excel section of HQ for the domain you would like to add.
  3. Create a database in Mysql named after the domain to store the synced data if you have not already done so
  4. Add a record to hqsync for each form you want to sync. Records should follow this format:
    • domain - should match the domain name in HQ exactly
    • dbname - should match the database name that you created in step #3 exactly
    • url - should follow the format below (let's wait for Cory's response to finalize): https://www.commcarehq.org/a/pih-cholera/reports/export/?export_tag=%22http://openrosa.org/formdesigner/1679BE38-C8DD-4564-B34D-F706E43B0807%22&filename=Done%20Kolera&format=csv
    • form_name - should match the name of the form in HQ exactly
    • uid - this is the username used to login to HQ (NOTE: Don't use your personal account. Its better to use a separate sync account, such as the one I created to sync Pathfinder)
    • pwd - this is the password used to login to HQ
    • active - 0 to inactivate; 1 to activate
    • use_token - this should be 1 for tables that store data sequentially and you don't want to download the entire dataset each time. basically, set this to 1 for all forms; set it to 0 for the user/referrals/case export
    • purge_before_import - this empties the table out before every import; generally only set this to 1 for the user/referrals/case export

HQSync User Documentation

HQSync is a PHP command line utility to synchronize forms from CommCare HQ domains to a mysql database. The utility consists of 2 code files (sync.php, HqSync.inc.php) and 3 database tables (hqsync, hqsync_table, hqsync_log). Execution creates an additional log file and stores data files fetched from the CommCare HQ server. sync.php is the executable and is typically run with a command like this:

php sync.php –v >> hqsync.log 2>&1

This runs sync.php in verbose mode and appends both the standard and error outputs to hqsync.log. During normal usage, a script command like above would be stored in a bash script and set as an automated cron job (i.e. hourly). sync.sh is provided as an example of this. For additional commands, use:

php sync.php –help

There are 3 database tables, but only hqsync should be edited manually. Each record in hqsync corresponds with a single form or report in an HQ domain to be imported. CommCare HQ exports are returned as zip archives of zero or more comma-separated value (CSV) files. If a token is used during a fetch (default behavior), then only records submitted since the token will be returned. Some exports, such as the user/case/referral export, do not use tokens because they are not longitudinal, but rather represent the current state of the user/case/referral tables. In this case, set use_token = 0 and purge_before_import = 1. The other tables are updated automatically by sync.php in the following manner:

  • hqsync_table – This table is used to map the individual CSV files contained in the zip exports to database tables that contain the imported data.
  • hqsync_log – One record is created for each record in hqsync for each execution of the script to log one of 3 possible results: 100 (SUCCESS), 200 (EMPTY_EXPORT), -100 (ERROR). The table also records the input token used to fetch the result and the export token returned in the export header. If an error occurs, the error message is saved to the comments field.

Maintenance

If a log file is used (e.g. hqsync.log) it should be archived regularly using an automated cron job. archivelogs.sh is provided as an example script and contains the following:

FILENAME=hqsynclog_$(date +%Y%m%d)
zip /root/hqsync/logs/$FILENAME.zip /root/hqsync/hqsync.log
rm /root/hqsync/hqsync.log

The hqsync_log database table should also be archived periodically to save space and improve performance. A third script is under development for just this purpose that should be run when appropriate (possibly every 3-6 months).