Data loading
David Megginson edited this page Jun 12, 2020 · 5 revisions
The Coding quick start introduced the hxl.data() function for loading HXL-hashtagged data from a file or URL:
import hxl
source = hxl.data("http://www.example.org/data.csv")
The function has several optional parameters to support special data-loading requirements, especially over the web:
Option | Description | Value type | Default |
---|---|---|---|
allow_local | If True, allow opening local files (in addition to URLs). | boolean | False |
sheet_index | If opening an Excel workbook (.xls or .xlsx), this specifies the sheet number to use, starting at 1 for the first tab. If not specified (None), the library will scan the Excel workbook for the first sheet that contains HXL hashtags. Has no effect for non-Excel sources. | integer | None |
timeout | Number of seconds to wait before giving up trying to connect to a remote server to download a file. If None, use the system default behaviour. (Note that the remote server may time out earlier on its own.) Has no effect when loading data from local files. | integer | None |
verify_ssl | If True, allow HTTPS connections only to servers with verified TLS/SSL certificates. You may want to set this to False if you're working with a self-signed certificate in an internal environment. Has no effect when loading data from local files. | boolean | True |
http_headers | A Python dictionary of HTTP headers and values to send with your request (e.g. "Authorization"). Has no effect when loading data from local files. | dict | None |
selector | A top-level property name or JSONPath expression pointing to the top of the HXL dataset inside a JSON document. If not specified, the library will search through the JSON data for something that looks like a dataset. Has no effect when loading data from non-JSON sources. | JSONPath (string) | None |
encoding | Specify the character encoding for CSV data. If not specified, the library will use the encoding provided by the web server (for an HTTP connection), or assume UTF-8. Has no effect for non-CSV files. | Encoding (string) | None |
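Several of these options can be combined in a single call. The following is a minimal sketch, assuming hxl.data() accepts the keyword arguments exactly as listed in the table (check the signature in your installed version of the library). The helper names `internal_csv_options` and `load_internal_csv` and the default values are hypothetical, chosen only for illustration:

```python
def internal_csv_options(encoding="latin-1", timeout=30):
    """Build keyword arguments for hxl.data() when reading CSV from an internal server.

    Hypothetical helper: the option names come from the table above,
    the default values are illustrative assumptions.
    """
    return {
        "verify_ssl": False,   # accept a self-signed certificate
        "encoding": encoding,  # override the encoding reported by the server
        "timeout": timeout,    # seconds to wait when connecting
    }


def load_internal_csv(url, **overrides):
    """Call hxl.data() with the options above, letting the caller override any of them."""
    import hxl  # imported lazily so the options helper works without the library installed

    options = internal_csv_options()
    options.update(overrides)
    return hxl.data(url, **options)
```

Keeping the options in a plain dictionary makes it easy to override a single value, for example `load_internal_csv(url, encoding="utf-16")`, without repeating the rest.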
Load data from a web server with a timeout of 60 seconds:
source = hxl.data("http://example.org/data.csv", timeout=60)
Send the HTTP "Authorization" header with the value "FOOBAR" with the request:
headers = { "Authorization": "FOOBAR" }
source = hxl.data("http://example.org/data.csv", http_headers=headers)
Load the second sheet from a local Excel file:
source = hxl.data("dataset.xlsx", allow_local=True, sheet_index=2)
Read HXL data from the value of the top-level "results" property in a JSON file:
source = hxl.data("http://example.org/dataset.json", selector="results")
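To make the selector concrete, the sketch below builds a small JSON document in which the HXL rows sit under a top-level "results" property, then picks them out with plain Python. The document layout, the "metadata" key, the sample rows, and the `select_top_level` helper are all invented for illustration. This only mimics what a top-level property selector points at; it is not how the library resolves selectors internally.

```python
import json

# Hypothetical JSON payload: the HXL dataset is nested under "results".
payload = json.dumps({
    "metadata": {"source": "example"},
    "results": [
        ["Organisation", "Sector"],  # text header row
        ["#org", "#sector"],         # HXL hashtag row
        ["Org A", "WASH"],           # data row
    ],
})


def select_top_level(json_text, selector):
    """Return the value of one top-level property of a JSON document."""
    return json.loads(json_text)[selector]


rows = select_top_level(payload, "results")
hashtag_row = rows[1]
```

With selector="results", the loader would start reading at the array held in `rows` and skip the surrounding "metadata" property.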
Standard: http://hxlstandard.org | Mailing list: hxlproject@googlegroups.com