David Megginson edited this page Jun 12, 2020 · 5 revisions

The Coding quick start introduced the hxl.data() function for loading HXL-hashtagged data from a file or URL:

```python
import hxl
source = hxl.data("http://www.example.org/data.csv")
```

The function has several optional parameters to support special data-loading requirements, especially over the web:

| Option | Description | Value type | Default |
|---|---|---|---|
| `allow_local` | If `True`, allow opening local files (in addition to URLs). | boolean | `False` |
| `sheet_index` | When opening an Excel workbook (.xls or .xlsx), the sheet number to use, starting at 1 for the first tab. If not specified (`None`), the library scans the workbook for the first sheet that contains HXL hashtags. Has no effect for non-Excel sources. | integer | `None` |
| `timeout` | Number of seconds to wait before giving up on connecting to a remote server to download a file. If `None`, use the system default behaviour. (The remote server may still time out earlier on its own.) Has no effect when loading data from local files. | integer | `None` |
| `verify_ssl` | If `True`, accept HTTPS connections only when the server's TLS/SSL certificate can be verified. You may want to set this to `False` if you're working with a self-signed certificate in an internal environment. Has no effect when loading data from local files. | boolean | `True` |
| `http_headers` | A Python dictionary of HTTP headers and values to send with the request (e.g. `"Authorization"`). Has no effect when loading data from local files. | dict | `None` |
| `selector` | A top-level property name or JSONPath expression pointing to the top of the HXL dataset inside a JSON document. If not specified, the library searches through the JSON data for something that looks like a dataset. Has no effect when loading data from non-JSON sources. | JSONPath (string) | `None` |
| `encoding` | The character encoding for CSV data. If not specified, the library uses the encoding provided by the web server (for an HTTP connection), or assumes UTF-8. Has no effect for non-CSV files. | encoding name (string) | `None` |
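
The precedence described for `encoding` (an explicit value first, then the charset declared by the web server, then UTF-8) can be sketched in plain Python. This is a minimal illustration under that reading of the table, not libhxl's own code: `choose_csv_encoding` is a hypothetical helper, and reading the charset with the standard `email.message.Message` class is just one convenient way to parse a `Content-Type` header.

```python
from email.message import Message
from typing import Optional


def choose_csv_encoding(explicit: Optional[str], content_type: Optional[str]) -> str:
    """Hypothetical helper mirroring the documented `encoding` fallback order.

    Illustration only; this is not the library's implementation.
    """
    # 1. An encoding passed explicitly by the caller always wins.
    if explicit:
        return explicit
    # 2. Otherwise use the charset from the server's Content-Type header, if any.
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lower-cased, or None if absent
        if charset:
            return charset
    # 3. Fall back to UTF-8 (e.g. no header, or no charset parameter in it).
    return "utf-8"


print(choose_csv_encoding(None, "text/csv; charset=ISO-8859-1"))  # iso-8859-1
print(choose_csv_encoding("cp1252", "text/csv; charset=utf-8"))    # cp1252
print(choose_csv_encoding(None, None))                             # utf-8
```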

Examples

Load data from a web server with a timeout of 60 seconds:

```python
source = hxl.data("http://example.org/data.csv", timeout=60)
```

Send an HTTP "Authorization" header with the value "FOOBAR" along with the request:

```python
headers = {"Authorization": "FOOBAR"}
source = hxl.data("http://example.org/data.csv", http_headers=headers)
```

Load the second sheet from a local Excel file:

```python
source = hxl.data("dataset.xlsx", allow_local=True, sheet_index=2)
```
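
When `sheet_index` is left as `None`, the library instead looks for the first sheet that contains HXL hashtags. The sketch below illustrates that idea on sheets that have already been read into lists of rows; it is not libhxl's implementation. The helper names, the simplified hashtag-row test (every non-empty cell starts with `#`), and the 25-row scan limit are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence

Row = Sequence[str]


def looks_like_hashtag_row(row: Row) -> bool:
    """Simplified test (assumption): every non-empty cell starts with '#'."""
    cells = [str(cell).strip() for cell in row if str(cell).strip()]
    return bool(cells) and all(cell.startswith("#") for cell in cells)


def find_hxl_sheet(sheets: List[List[Row]], max_rows: int = 25) -> Optional[int]:
    """Hypothetical helper: return the 1-based number of the first sheet with a
    hashtag row in its first `max_rows` rows, or None if no sheet has one."""
    for number, rows in enumerate(sheets, start=1):
        if any(looks_like_hashtag_row(row) for row in rows[:max_rows]):
            return number
    return None


# Example workbook: a notes tab first, then the HXL data on the second tab.
sheets = [
    [["Notes"], ["Prepared for internal use"]],
    [["Organisation", "Sector"], ["#org", "#sector"], ["UNICEF", "WASH"]],
]
print(find_hxl_sheet(sheets))  # 2
```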

Read HXL data from the value of the top-level "results" property in a JSON file:

```python
source = hxl.data("http://example.org/dataset.json", selector="results")
```
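
To make the `selector` option more concrete, the sketch below shows the kind of JSON document the previous example assumes, and how either a top-level property name (`"results"`) or a simple JSONPath (`"$.results"`) leads to the rows. It handles only dot-notation paths; `select_dataset` and the sample document are hypothetical and are not part of libhxl.

```python
import json

# Hypothetical JSON document with the HXL rows under the top-level "results" key.
payload = json.loads("""
{
  "status": "ok",
  "results": [
    ["Organisation", "Sector"],
    ["#org", "#sector"],
    ["UNICEF", "WASH"]
  ]
}
""")


def select_dataset(data, selector):
    """Resolve a plain property name ("results") or a dot-notation
    JSONPath ("$.results", "$.response.results") to a value.

    Simplified sketch: brackets, wildcards and filters are not supported.
    """
    if selector == "$":
        return data  # "$" is the JSONPath root
    path = selector[2:] if selector.startswith("$.") else selector
    node = data
    for key in path.split("."):
        node = node[key]
    return node


rows = select_dataset(payload, "$.results")
print(rows[1])  # ['#org', '#sector']
```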