Bringing together the power of SQL, Python, and JavaScript. Run raw, multi-threaded SQL in an IPython Notebook while concurrently executing Python cells, plus a ton of other features.
See the tutorials here:
• Most recent feature
• Installation and Configuration
• Features
i. Parameters
ii. Flags
iii. Pass python variables to SQL
iv. psql metacommands
v. Multi-threading
vi. Buttons
vii. Inline editing
viii. Easy-to-read Query Plan Table
ix. Easy-to-read Query Plan Graph
x. Alter column name and type via the UI
xi. Notifications
xii. pg_dump support
xiii. Switch Engines (To be documented...)
• In development
i. Built-in PostGIS preview (inspired by postgis-preview)
ii. MSSQL support.
• To-dos
i. Add UI elements to perform basic, common database tasks, such as adding columns, creating tables, etc.
ii. Need to confirm install process is smooth on non-Mac computers.
iii. Add support for MySQL commands.
iv. Add modifiers and constraints via the UI
Open issues can be found here.
Added 12/04/2016: SQLCell now offers the option to view the Query Plan as a sankey graph built with D3.js.
Just clone the repo and cp the sqlcell_app.py file to Jupyter's startup directory (on my computer, the directory is ~/.ipython/profile_default/startup, but it may be different depending on your OS and version of IPython/Jupyter):
$ cd .ipython/profile_default/startup # or wherever your startup directory is
$ git clone https://github.com/tmthyjames/SQLCell.git
$ cp SQLCell/sqlcell_app.py sqlcell_app.py # place sqlcell_app.py in the startup folder so it will get executed
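If you're not sure where your startup directory is, IPython can locate the profile directory for you (the startup folder sits inside it; the output below is just an example path):

$ ipython locate profile
/Users/you/.ipython/profile_default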
Then define your connection variables in the engine_config.py file. If you don't add them to engine_config.py, you'll have to pass a connection string to the ENGINE parameter every time you use %%sql, like so:
In [2]: %%sql ENGINE='postgresql://username:password@host:port/database'
SELECT * FROM table;
To save the engines:
%%sql --declare_engines new
LOCAL=postgresql://username:password@localhost:5432/
DEV=postgresql://username:password@random.domain.com/
See more about this option in the Declare Engines section.
Now you are ready to ditch pgAdmin or whatever SQL interface you use. Continue reading to see all the available options, like writing results to a CSV, using SQLAlchemy named parameters and more.
Available parameters:
• DB: Determines which database to query. On the first run, this parameter is required; after that, SQLCell remembers which database was chosen. To change databases, use this parameter again. Default is the last-specified database.
• PATH: Writes the results of the query to a CSV (this can also be done through the UI). No default.
• MAKE_GLOBAL: Passes the results of the query to the variable you name. If this parameter is specified but the RAW parameter is not, the results will be a pandas DataFrame. If RAW is set to True, the results will be the raw RowProxy returned from the database. No default.
• RAW: Determines whether the data will be of type DataFrame or RowProxy. Default: False.
• DISPLAY: Determines whether or not to render the results as a table. This is best used in conjunction with the MAKE_GLOBAL parameter, because displaying a table in a busy workflow can sometimes be cumbersome and annoying.
• ENGINE: Specifies which host and database to connect to. Default is the connection specified in the engine_config.py file. If engine_config.py is not configured, the ENGINE parameter is required.
• TRANSACTION_BLOCK: Determines whether the query executes inside a transaction block. This is useful when creating a database, dropping a database, running VACUUM ANALYZE on a database, or any other statement that cannot be run inside a transaction block. Default: True.
• EDIT: Enables inline editing. To use this, you must query exactly one table, and that table must have a primary key. Default: False.
• NOTIFY: Disables notifications for finished queries. Default: True.
Examples of how to use these are below.
After adding your connection details to engine_config.py, run your first query with the DB argument:
In [3]: %%sql DB=bls
SELECT *
FROM la_unemployment
LIMIT 3
|   | series_id | year | period | value | footnote_codes |
|---|-----------|------|--------|-------|----------------|
| 1 | LASST470000000000003 | 1976 | M01 | 6.2 | None |
| 2 | LASST470000000000003 | 1976 | M02 | 6.1 | None |
| 3 | LASST470000000000003 | 1976 | M03 | 6.0 | None |
For the rest of the session, you won't have to use the DB argument unless you want to change databases. And the last-used DB will be persisted even after you shut down Jupyter and start it back up next time.
In [4]: %%sql
SELECT *
FROM avg_price LIMIT 3
|   | series_id | year | period | value |
|---|-----------|------|--------|-------|
| 1 | APU0000701111 | 1980 | M01 | 0.203 |
| 2 | APU0000701111 | 1980 | M02 | 0.205 |
| 3 | APU0000701111 | 1980 | M03 | 0.211 |
To switch databases, just invoke the DB argument again with a different database:
In [5]: %%sql DB=sports
SELECT *
FROM nba LIMIT 3
|   | dateof | team | opp | pts | fg | fg_att | ft | ft_att | fg3 | fg3_att | off_rebounds | def_rebounds | asst | blks | fouls | stls | turnovers |
|---|--------|------|-----|-----|----|--------|----|--------|-----|---------|--------------|--------------|------|------|-------|------|-----------|
| 1 | 2015-10-27 | DET | ATL | 106 | 37 | 96 | 20 | 26 | 12 | 29 | 23 | 36 | 23 | 3 | 15 | 5 | 15 |
| 2 | 2015-10-27 | ATL | DET | 94 | 37 | 82 | 12 | 15 | 8 | 27 | 7 | 33 | 22 | 4 | 25 | 9 | 15 |
| 3 | 2015-10-27 | CLE | CHI | 95 | 38 | 94 | 10 | 17 | 9 | 29 | 11 | 39 | 26 | 7 | 21 | 5 | 11 |
To write the data to a CSV, use the PATH argument:
In [6]: %%sql PATH='/<path>/<to>/<file>.csv'
SELECT *
FROM nba LIMIT 3
|   | dateof | team | opp | pts | fg | fg_att | ft | ft_att | fg3 | fg3_att | off_rebounds | def_rebounds | asst | blks | fouls | stls | turnovers |
|---|--------|------|-----|-----|----|--------|----|--------|-----|---------|--------------|--------------|------|------|-------|------|-----------|
| 1 | 2015-10-27 | DET | ATL | 106 | 37 | 96 | 20 | 26 | 12 | 29 | 23 | 36 | 23 | 3 | 15 | 5 | 15 |
| 2 | 2015-10-27 | ATL | DET | 94 | 37 | 82 | 12 | 15 | 8 | 27 | 7 | 33 | 22 | 4 | 25 | 9 | 15 |
| 3 | 2015-10-27 | CLE | CHI | 95 | 38 | 94 | 10 | 17 | 9 | 29 | 11 | 39 | 26 | 7 | 21 | 5 | 11 |
And my favorite: you can assign the DataFrame to a variable using the MAKE_GLOBAL argument:
In [9]: %%sql MAKE_GLOBAL=WHATEVER_NAME_YOU_WANT DB=bls
SELECT *
FROM la_unemployment
WHERE year = 1976
AND period = 'M01'
LIMIT 3
|   | series_id | year | period | value | footnote_codes |
|---|-----------|------|--------|-------|----------------|
| 1 | LASST470000000000003 | 1976 | M01 | 6.2 | None |
| 2 | LASST470000000000004 | 1976 | M01 | 111152.0 | None |
| 3 | LASST470000000000005 | 1976 | M01 | 1691780.0 | None |
And call the variable:
In [10]: WHATEVER_NAME_YOU_WANT
|   | series_id | year | period | value | footnote_codes |
|---|-----------|------|--------|-------|----------------|
| 1 | LASST470000000000003 | 1976 | M01 | 6.2 | None |
| 2 | LASST470000000000004 | 1976 | M01 | 111152.0 | None |
| 3 | LASST470000000000005 | 1976 | M01 | 1691780.0 | None |
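Because MAKE_GLOBAL (without RAW) hands you an ordinary pandas DataFrame, the full pandas API is available on the result; a quick sketch (the CSV filename here is just an example):

In [11]: WHATEVER_NAME_YOU_WANT['value'].mean()  # it's a regular pandas DataFrame
In [12]: WHATEVER_NAME_YOU_WANT.to_csv('unemployment_1976.csv', index=False)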
You can also return the raw RowProxy from SQLAlchemy by setting the RAW argument to True and using the MAKE_GLOBAL argument:
In [10]: %%sql MAKE_GLOBAL=data RAW=True
SELECT *
FROM la_unemployment
LIMIT 3
In [11]: data
[(u'LASST470000000000003', 1976, u'M01', 6.2, None),
(u'LASST470000000000003', 1976, u'M02', 6.1, None),
(u'LASST470000000000003', 1976, u'M03', 6.0, None)]
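Each row supports positional indexing, and SQLAlchemy 1.x RowProxy objects also allow key-based access; a quick sketch using the result above:

In [12]: [row[3] for row in data]        # positional access
Out[12]: [6.2, 6.1, 6.0]
In [13]: [row['value'] for row in data]  # key-based access (SQLAlchemy 1.x RowProxy)
Out[13]: [6.2, 6.1, 6.0]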
Query the data without rendering the table (useful if the result set is prohibitively large and displaying the table breaks things) by setting the DISPLAY parameter to False. It makes sense to use this parameter in conjunction with the MAKE_GLOBAL parameter so the data is passed to the variable but the table isn't rendered:
In [10]: %%sql MAKE_GLOBAL=data DISPLAY=False
SELECT *
FROM la_unemployment
The ENGINE parameter accepts any connection string and creates a connection based on it:
In [10]: %%sql ENGINE='postgresql://username:password@host:port/DB'
SELECT *
FROM la_unemployment
LIMIT 3
Some SQL statements (VACUUM, CREATE <db>, DROP <db>, etc.) must be executed outside of a transaction block, which is done by setting the isolation level to 0 (see this):
In [10]: %%sql TRANSACTION_BLOCK=False
VACUUM ANALYZE <table_name>
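Under the hood, this is roughly equivalent to executing the statement on an autocommit connection; a minimal sketch in plain SQLAlchemy 1.x (an assumed equivalent, not SQLCell's exact implementation):

from sqlalchemy import create_engine

# Run the statement outside a transaction block by using autocommit
# isolation (assumed equivalent, not SQLCell's exact internals).
engine = create_engine('postgresql://username:password@host:port/database')
conn = engine.connect().execution_options(isolation_level='AUTOCOMMIT')
conn.execute('VACUUM ANALYZE la_unemployment')
conn.close()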
The EDIT parameter enables inline editing:
In [10]: %%sql EDIT=True
SELECT *
FROM la_unemployment
LIMIT 3
Will display a table where the cells can be clicked on and edited.
In [11]: %%sql NOTIFY=False
SELECT * FROM la_unemployment LIMIT 1
Will disable notifications for the remainder of your Jupyter session. To re-enable notifications, just set NOTIFY=True.
• declare_engines: Makes adding engines to the engine_config.py file easy.
• pg_dump: Run pg_dump commands from your Jupyter Notebook.
All engines should be in this format: name=connection_string. For example:
%%sql --declare_engines new
LOCAL=postgresql://username:password@localhost:5432/
DEV=postgresql://username:password@random.domain.com/
This will create the ENGINES object with only LOCAL and DEV in it. "LOCAL" will be the text that goes on the corresponding engine button.
To append new engines to an existing ENGINES object:
%%sql --declare_engines append
LOCAL_test=postgresql://username:password@localhost:5432/
DEV_test=postgresql://username:password@random.domain.com/
Where I previously had the engines LOCAL, DEV, and PROD, I now also have LOCAL_test and DEV_test.
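Conceptually, --declare_engines just maintains a name-to-connection-string mapping; a sketch of the idea (not SQLCell's exact internals):

# Conceptually (not SQLCell's exact internals): --declare_engines
# maintains a name -> connection string map.
ENGINES = {
    'LOCAL': 'postgresql://username:password@localhost:5432/',
    'DEV': 'postgresql://username:password@random.domain.com/',
}
# 'new' replaces the whole map; 'append' adds entries while
# keeping the existing ones:
ENGINES.update({
    'LOCAL_test': 'postgresql://username:password@localhost:5432/',
    'DEV_test': 'postgresql://username:password@random.domain.com/',
})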
And the --pg_dump flag in action:
In[17]: %%sql --pg_dump
-t nba sports --schema-only
Will output the following:
SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
--
-- Name: nba; Type: TABLE; Schema: public; Owner: postgres
--
CREATE TABLE nba (
dateof date,
team character varying(5),
opp character varying(5),
pts bigint,
fg integer,
fg_att integer,
ft integer,
ft_att integer,
fg3 integer,
fg3_att integer,
off_rebounds integer,
def_rebounds integer,
asst integer,
blks integer,
fouls integer,
stls integer,
turnovers integer
);
ALTER TABLE nba OWNER TO postgres;
To pass Python variables to your queries, just do the following.
In[7]: # define your parameters in a Python cell
year = '1976'
period = 'M01'
series_id = ('LASST470000000000005', 'LASST470000000000004', 'LASST470000000000003')
Now in a %%sql cell:
In [8]: %%sql DB=bls
SELECT *
FROM la_unemployment
WHERE year = %(year)s
AND period = %(period)s AND series_id IN %(series_id)s
LIMIT 3
You can also use a colon to indicate your variables:
In [8]: %%sql DB=bls
SELECT *
FROM la_unemployment
WHERE year = :year
AND period = :period AND series_id IN :series_id
LIMIT 3
Both output the following table:
|   | series_id | year | period | value | footnote_codes |
|---|-----------|------|--------|-------|----------------|
| 1 | LASST470000000000003 | 1976 | M01 | 6.2 | None |
| 2 | LASST470000000000004 | 1976 | M01 | 111152.0 | None |
| 3 | LASST470000000000005 | 1976 | M01 | 1691780.0 | None |
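For reference, the %(name)s style is psycopg2's pyformat parameter syntax, and the :name style mirrors SQLAlchemy's text() bound parameters. A minimal sketch of the equivalent binding outside SQLCell (assuming the SQLAlchemy 1.x engine.execute API):

from sqlalchemy import create_engine, text

# Equivalent named-parameter binding in plain SQLAlchemy 1.x.
engine = create_engine('postgresql://username:password@host:port/bls')
query = text("""
    SELECT *
    FROM la_unemployment
    WHERE year = :year AND period = :period
    LIMIT 3
""")
rows = engine.execute(query, year='1976', period='M01').fetchall()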
SQLCell also supports psql metacommands, like \dp, \d, and \COPY:
In [1]: %%sql DB=bls
\dp
|   | Schema | Name | Type | Access privileges | Column privileges | Policies |
|---|--------|------|------|-------------------|-------------------|----------|
| 1 | public | avg_price | table | nan | nan | nan |
| 2 | public | la_unemployment | table | nan | nan | nan |
| 3 | public | tu_atus | table | nan | nan | nan |
In [2]: %%sql
\d avg_price
|   | Column | Type | Modifiers |
|---|--------|------|-----------|
| 1 | series_id | character varying(17) | nan |
| 2 | year | integer | nan |
| 3 | period | character varying(3) | nan |
| 4 | value | real | nan |
In [3]: %%sql DB=sports
\COPY public.nba (dateof, team, opp, pts, fouls) to '/<path>/<to>/<file>.csv'
Out[3]: COPY 3092
All queries are executed on their own thread, so you can run as many queries as your box will allow while concurrently executing python code.
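For example, you can kick off a slow query and keep executing Python while it runs (pg_sleep is PostgreSQL's built-in sleep function, standing in here for a long-running query):

In [12]: %%sql
SELECT pg_sleep(30)  -- runs on its own thread; the notebook stays responsive

In [13]: total = sum(range(10**6))  # executes immediately, without waiting on In [12]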
Buttons include:
• Viewing the Query Plan with a D3.js sankey graph
• Running EXPLAIN ANALYZE on your query
• Executing the query
• Executing the query and returning the SQLAlchemy results in a variable
• Saving to a TSV
• Stopping the query
• Switching between user-defined engines (button group on the right; see Declare Engines for instructions on how to define engines)
Set the EDIT parameter to True to enable inline editing. As long as you are querying one table and that table has a primary key, you can edit it using the UI.
The Query Plan table includes a heatmap-like color scale to indicate problem spots in your query.
The Query Plan graph is a sankey diagram, built with D3.js, that uses the same heatmap-like color scale to indicate problem spots in your query.
To edit the column info via the UI, use the \d <table-name> metacommand with the EDIT parameter:
In[15]: %%sql DB=sports EDIT=True
\d nba
SQLCell now includes "Growl"-like, Bootstrap-styled notifications using mouse0270's awesome bootstrap-notify. The entire query is in the notification's pre tag and is scrollable, and clicking the notification will focus the window on the results of that query.
And that's it.
Enjoy and contribute.