class: middle, center
- Introduction with example
- Types of databases
- Relational and non relational
- SQL
- Workshop
-
Problem: storing information about 70,000 lines worth of DVC courses in a text file is inefficient, hard to manage, and slow at insertion, updating, and retrieval
-
Parsing the file, then storing the information in data structures such as hash tables is not great, becuase all that information is stored in RAM
-
Facebook stores 300 PB of data, an impossible amount to store in RAM
-
Updating information requires writing to the file
-
Many applications may want to access the data at one time
- Understanding databases is a big plus when finding internships - many ask for some knowledge of SQL
- Database Management Systems (DBMS) - Software that allows access to a database
- "Database" may refer to the both the data and the DBMS
- DBMSs structure data according to different models
- DBMSs use different querying languages to access data
- SQL is the most popular, and is used with many DBMSs
- Others use their own query languages
- Use tables of columns and rows to represent data
- Relate tables together using keys
- Most use SQL (Structured Query Language)
- Not very hip
- Examples: Oracle, MySQL, SQLite
-
A broad term referring to databases that do not use the relational table model
-
Key-Value Stores
- Act similarly to a hash table
- Records can hold different types of data
- Examples: Redis, Amazon DynamoDB
- Document Stores
- Similar to key-value stores - items may have different structures
- can have a nested structure, like JSON
- useful for holding arrays
- Examples: MongoDB, RethinkDB
Think about what data actually needs to be stored. Store it in a way that minimizes data redundancy. Think about what data types you will need and how many objects you have. Each object usually gets its own table.
Primary keys
Each table has a Primary Key (PK) column that can be used to uniquely identify a specific row. Also called IDs. They should be created whenever a new row is made and never changed. These are used when referring to data from other tables. Can be as simple as an auto-incrementing number.
-
One-to-many, aka arrays. Example: a teacher has many classes, but a class only has one teacher. Can't create a column of type array, so need to create a new table.
-
Many-to-many: a customer's order may contain one or more products; and a product can appear in many orders
-
One-to-one: Product may contain optional data. We only create a row in
productDetails
if the optional data is chosen.
- Sign up at heroku.com
- Grab the heroku toolbelt at toolbelt.heroku.com
- If on Mac,
brew install heroku-toolbelt
- If on Mac,
$ heroku login
Enter your Heroku credentials.
Email: jesse@example.com
Password (typing will be hidden):
Authentication successful.
$ mkdir intro-db
$ cd intro-db
$ git init
Initialized empty Git repository in ./intro-db/.git/
$ heroku create
Creating app... done, stack is cedar-14
https://***.herokuapp.com/ | https://git.heroku.com/***.git
-
Create the database
$ heroku addons:create heroku-postgresql:hobby-dev Creating postgresql-trapezoidal-55564... done, (free) Adding postgresql-trapezoidal-55564 to ...... done Setting DATABASE_URL and restarting ...... done, v3 Database has been created and is available ! This database is empty. If upgrading, you can transfer ! data from another database with pg:copy Use `heroku addons:docs heroku-postgresql` to view documentation.
-
Download data
$ heroku run bash Running bash on ....... up, run.7217 ~ $ wget https://github.com/dvcoders/intro-db/releases/download/1.0.0/StateNames.csv.tar.gz ... Saving to: 'StateNames.csv.tar.gz' 100%[=================================>] 31,686,560 41.7MB/s in 0.7s 2016-04-07 20:37:54 (41.7 MB/s) - 'StateNames.csv.tar.gz' saved [31686560/31686560] ~ $ tar -xf StateNames.csv.tar.gz
-
Connect to the database
~ $ psql $DATABASE_URL Running psql $DATABASE_URL on ....... up, run.1083 ... d2bh5qbs8qns39=>
-
Create the table
=> CREATE TABLE names ( id int, name varchar(50), year int, gender varchar(1), state varchar(20), count int );
-
Import data
=> \copy names FROM './StateNames.csv' WITH CSV; COPY 5647426
Structure of database
=> \d
List of relations
Schema | Name | Type | Owner
--------+-------+-------+----------------
public | names | table | cjbawbbpgwxjuc
(1 row)
=> \d names
Table "public.names"
Column | Type | Modifiers
--------+-----------------------+-----------
id | integer |
name | character varying(50) |
year | integer |
gender | character varying(1) |
state | character varying(20) |
count | integer |
Sample queries
=> SELECT * FROM names WHERE name='Jesse' and state='CA' and year=1996;
id | name | year | gender | state | count
--------+-------+------+--------+-------+-------
491828 | Jesse | 1996 | F | CA | 28
658580 | Jesse | 1996 | M | CA | 1191
(2 rows)
=> SELECT name, state FROM names WHERE year=1988 ORDER BY count DESC LIMIT 5;
name | state
-------------+-------
Michael | CA
Jessica | CA
Christopher | CA
Michael | NY
Daniel | CA
(5 rows)
=> SELECT SUM(count) FROM names WHERE year BETWEEN 1999 AND 2001;
sum
---------
9780944
(1 row)
More sample queries
=> SELECT DISTINCT name FROM names ORDER BY name ASC LIMIT 5;
name
---------
Aaban
Aadan
Aadarsh
Aaden
Aadhav
(5 rows)
=> SELECT year, SUM(count) FROM names GROUP BY year ORDER BY sum DESC LIMIT 5;
year | sum
------+---------
1957 | 4002231
1959 | 3955363
1960 | 3948284
1958 | 3933693
1961 | 3929909
(5 rows)