Row 70 of EDGI: Uncrawlable Content refers to DATA TOOLS & MODELS, which acts as a portal to information about different kinds of energy.
Following the Electricity Data Files link led to a page of links to pages for different surveys.
I ignored the link to consolidated data, figuring that the original sources were more important, and the link to eia860, since that was already on the EDGI sheet. The remaining links point to:
Each of these pages contains lists of links to .zip, .xls, or .xlsx files, so I wanted to download only those.
In the Linux bash shell, I set up a for loop over the above links, running
wget -r -A 'zip,xls,xlsx' $url
(where $url is replaced by each URL in the above list).
I meant to add the option that limits the recursion depth to 1 (-l 1), but forgot, so wget followed links well beyond the survey pages themselves. The net result is a huge directory tree of files.
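
For reference, here is a minimal sketch of what the loop might have looked like with the depth limit included. The function name fetch_survey_files, the WGET override variable, and the example.org URLs are placeholders for illustration, not part of the original run. The wget flags themselves (-r, -l, -A) are standard.

```shell
#!/usr/bin/env bash
# Sketch: download only .zip/.xls/.xlsx files linked from each survey page.
# WGET is a hypothetical override hook: set WGET=echo to preview the
# commands without downloading anything.
WGET="${WGET:-wget}"

fetch_survey_files() {
    local url
    for url in "$@"; do
        # -r    recurse into links found on the page
        # -l 1  but only one level deep (the option that was forgotten)
        # -A    keep only files whose names end in these suffixes
        $WGET -r -l 1 -A 'zip,xls,xlsx' "$url"
    done
}

# Placeholder usage; substitute the survey page URLs from the list above:
# fetch_survey_files "https://example.org/survey-a/" "https://example.org/survey-b/"
```

Running it once with WGET=echo is a cheap way to check the generated commands before committing to a long download.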