Responsible for importing a raw data set into a table in a Postgres database
We often have data in a file on our filesystem and we want to import it into a
table in a Postgres database. Other database management systems like Microsoft
SQL Server offer a graphical import wizard, but Postgres does not. The
postgres-import-wizard (henceforth known as the wizard) fills this gap.
The wizard is used to copy a flat-data file such as a CSV or tab-delimited
text file into a Postgres table. Although Postgres provides a built-in
COPY command, it
requires the table to already exist; otherwise there's nothing to copy the data
into. To circumvent this issue, the wizard reads the header of the file you
want to load and uses those fields to define a table with text columns in
a database you specify.
No installation is required. The wizard is meant to be run directly from
source using Python 3.
That said, it's recommended to run this in a Python virtual
environment and install what's in
requirements.txt (which is really just
psycopg).
usage: postgres-import-wizard [-h] [--clean] [--delimiter {,,|, }] --file FILE
[--postgres_connection POSTGRES_CONNECTION]
[--schema SCHEMA] --table TABLE
Responsible for importing a raw data set into a table in a Postgres database
optional arguments:
-h, --help show this help message and exit
--clean Delete any table that already exists at
"SCHEMA"."TABLE" (default: False)
--delimiter {,,|, } Delimiter separating fields in FILE (default: ,)
--file FILE Path to data file to import (default: None)
--postgres_connection POSTGRES_CONNECTION
Postgres connection string for raw database (default:
postgresql://postgres@localhost:5433/raw)
--schema SCHEMA Name of schema in which to create table (default:
public)
--table TABLE Name of table to create (default: None)
Executing the wizard will print log messages on standard out within your
shell. It will also persist messages from your most recent run in a file under
a logs/ folder in your working directory. Log settings can be configured in
log.py, and the logs/ folder can safely be removed whenever.
In this section we provide a few examples illustrating usage of the wizard.
All examples are run using the default Postgres connection string provided by
the utility, so be sure that if you run them verbatim you have a local Postgres
instance running on your machine, available at port 5433, with a database named
raw. Also be sure to use a Postgres password
file to specify
credentials for your database user.
The example data sets provided under the examples/ folder are subsets of the
Private School Universe Survey (PSS) for
the 2017-18 school year provided by the National Center for Education
Statistics. The PSS is one of Niche's data
sources and provides a biennial census of
private schools in the United States of America.
Importing the example CSV data set provided under examples/ can be done by
executing
$ python main.py --file examples/data.csv --table "example1"Issuing that will load the comma-separated data set into a table named
example1 within the default public schema of the database.
We can reimport the data:
$ python main.py --clean --file examples/data.csv --table "example1"If we had omitted the --clean flag the wizard would have thrown an error
like
CRITICAL [2019-10-04 08:56:20] Exception occurred while loading raw data: relation "example1" already existsThis example functions much the same as CSV, Public Schema above except we try to load the data into a non-public schema:
$ python main.py --file examples/data.csv --schema "sonic" --table "example1"
INFO [2019-10-04 09:12:27] Starting the wizard
...
CRITICAL [2019-10-04 09:12:27] Exception occurred while loading raw data: schema "sonic" does not exist
LINE 2: CREATE TABLE "sonic"."example1" (
^
INFO [2019-10-04 09:12:27] Exiting the wizard. Goodbye.Oh no! We need to create the schema before we can create tables in it. After
creating a schema called sonic we can rerun the command above and everything
should complete fine.
The only tricky thing with importing a tab-delimited data file is that we need
to specify a literal \t as the delimiter. This can be accomplished by
escaping it in our argument list with $'\t', like the following
$ python main.py --delimiter $'\t' --file examples/data.txt --table "example2"