To keep the git repo from bloating, some datasets are included as git submodules rather than versioned directly. To fetch them too, clone this repo with the --recursive flag, e.g.:
git clone --recursive https://github.com/infochimps-data/infochimps-data
Note: a recursive clone downloads many gigabytes of data.