What you'll learn
- Analyze data within a database using SQL and Python.
- Create a relational database and work with multiple tables using DDL commands.
- Construct basic to intermediate level SQL queries using DML commands.
- Compose more powerful queries with advanced SQL techniques like views, transactions, stored procedures, and joins.
Skills you'll gain: Python Programming, Cloud Databases, Relational Database Management System (RDBMS), SQL, Jupyter notebooks
- Module 1: Getting Started with SQL
- Module 2: Introduction to Relational Databases and Tables
- Module 3: Intermediate SQL
- Module 4: Accessing Databases using Python
- Module 5: Course Assignment
- Module 6: Bonus Module: Advanced SQL for Data Engineers (Honors)
- Lab
- References
- Median based salary(wage) : 110k
- job satisfaction score : 4.4/5
- Top spot on Glassoor's best jobs in America
- enables to communicate with databases
- every app needs to store data sommewhere : => big data => Table with a few rows => Small start up => Big Database
- Advantages : ** Boost your professional profile ** Good understanding of relation databases ** SQL stataments are FAST
=> ref : Course 2 : TOOLS FOR DATA SCIENCE => SQL
=> Facts (words, numbers) => Pictures => One of the most critical assets of any business => /!\ need to be secure & STORE => DATABASE
- A repository of data
- prodives the functionality for adding, modifying and querying the* data
- differents kinds of db, sotres data in different forms
- Relational DB (collection of two dimmensional tables like row, column) : => relationship between table
- sw to manage dbs
- controls : access, organization, storage ...
- uses in many industries : bank, tansportation, health ...
- tools : MySQL, Oracle DB, IBM Db2 ..
- Create a table
- Insert
- Select(read/retrieve)
- Update
- Delete
-
See/Retrive row from a table
-
select statement : query
-
result from the query : Result set/table
-
cmd : => select* from
-
retrieve subset or specific column => SELECT <column 1>, <column 2> from
- Restricts the result set
- Always requires a Predicate : => Evaluate to: True, False or unknown => Used in the search condition of the Where clause Ex: select <table_id> , title from WHERE <table_id>='COLOMN_LABLE'
- IBM Dev Lab notes
- COUNT : retrieves the number of rows matching the query criteria statement(query) : select COUNT(*) from tablename
- DISTINCT: remove duplicate values from a result set => retrieve unique values in a columns : statement(query) : select DISTINCT columnname from tablename
- LIMIT: retrieves the number of rows retrieved from the database statement (query) : select * from tablename LIMIT 10 => like head function in pandas
Hands-on Lab: COUNT, DISTINCT, LIMIT Ungraded External Tool : ok
- Create the table (CREATE TABLE Statement)
- Populate table w/ data => add new rows to a table (INSERT Statement) => INSERT INTO [tablename] <([ColumnName], ...)> VALUES([Value], ...) ... ([Value], ...) => Multiple rows -- INSERT INTO table_name (column1, column2, ... ) VALUES (value1, value2, ... ); --
- UPDATE : update a row value => UPDATE [tablename] SET[[ColumnName]=[Value]] SET([Value], ...) <WHERE [Condition]>
- DELETE : remove a row value => DELETE FROM [tablename] <WHERE [Condition]>
Hands-on Lab: INSERT, UPDATE, and DELETE Ungraded External Tool
Practice Quiz Practice Quiz Graded Quiz: Basic SQL Quiz•3 questions
///W2 : Introduction to Relational Databases and Tables///
- Most used data model
- Allows for data independence (logical data, physical data physical storage)
- Data is stored in a tables
- Entity Relationship Diagram (ERD) : => Entity : table (object) => Relationship : dependencies
- collections of entities
- tool to design relation databases : data type, dependencies...
- Entity : person, noun, place ... => attributes : charateristics of the entity or data elements
- Table : combination of rows and columns
- Entities => tables
- attributes get translated into columns => some data values to each of columns the completes the table form
Primary Keys : unique/id value to prevent duplication of data Foreign Keys : to create a link w/ others tables/entities
-
Cloud databases : database server build / accessed throught a CLoud platform
-
ease of use and acess => API => Web Interface
=> Cloud ou Remote Applications -
Scalability & Economics => Expand/Shrink Storage & Compute resources => Pay per use
-
Disaster Recovery => Cloud Backups
- IBM Bd2
- Databases for PostgresSQL
- Oracle Database CLoud Service
- MS Azure SQL Database
- Amazon Relational Database Services (RDS)
Available as : => VMs or Managed Service => Single or Mini-tenant
- DBaaS provides users access to Database resources in cloud w/out setting up and installing sw
- Database service instance holds data in data objects/tables
- use web interface & API => to get the data once data is loaded
(user API/web interface) ===(query)===> (CLoud DB) || <====== Resultset=====||
- On ibm cloud catalog => Db2 services
=> Defining Database Objects(define, change, drop data) : CREATE, ALTER, DROP(delete tables), RENAME, TRUNCATE(donot delete the table)
=> Updating database content(read and modify data) : INSERT, UPDATE, DELETE => Database searches : SELECT
=> Assurance of data integrity and security : GRANT, REVOKE, COMMIT, ROLLBACK...
DDL statement for creating entities(tables) in a relation db
- Syntax :
CREATE TABLE table_name ( column_name_1 datatype optional_parameters, column_name_2 datatype , ... column_name_n datatype , );
where : optional_parameters => constraints : Primary key, Foreign key ..
-- ALTER : => add or remove columns => modify the datatype of the column => add or remove a Keys => add or remove a constraints
- Syntax : => add a column ALTER TABLE <table_name> ADD COLUMN <column_name_1> datatype ... ADD COLUMN <column_name_n> datatype
=> modify a type ALTER TABLE <table_name> ALTER COLUMN <column_name_1> set DATA TYPE <new_datatype>
=> delete a column ALTER TABLE table_name DROP COLUMN telephone_number;
=> delete a table ALTER TABLE table_name;
=> delete the row values of table without deleting the table 'object' TRUNCATE TABLE table_name IMMEDIATE;
///W3 : Refining your Results///
- Syntax : SELECT from TABLE_NAME WHERE LIKE
: %(wild card charater) => use before/after the missing letter between ... and : intervals/range of numbers in ('values1', 'values2') : where valuesx => are columns
-
GROUP BY : => Syntax : => select country, count(country) from table_name GROUP BY country => select country, count(country) as new_col_result_set from table_name GROUP BY country
-
HAVING : => Syntax : => select country, count(country) as new_col_result_set from table_name GROUP BY country having count(country)>4
///w3: Functions, Multiple Tables, and Sub-queries///
- Most databases comme w/ built-in SQL functions
- Build-in functions can be included as part of SQL statements
- Database functions can significantly reduce the amount of data that needs to be retrieved
- Can speed up data processing
- INPUT : Collection of values(eg : entire column)
- OUTPUT : Single value
- Examples : SUM(), MIN(), MAX(), AVG(), etc
- SCALAR : ROUND(),
- STRING : LENGHT, UCASE, LCASE ...
- to extract day, month, year ...
- DATE : YYYYMMDD
- TIME : HHMMSS
- TIMESTAMP (20 digits) : YYYYXXDDHHMMSSZZZZZZ => where Z ... : microseconds
- Date / Time functions : => YEAR(), MONTH(), DAY, DAYOFMONTH, DYOFWEEK, DAYOFYEAR, WEEK, HOUR, MINUTE, SECOND, ...
=> Eg : Select DAY(RESCUEDATE) from PETRESCUE where ANIMAL='Cat'
- Date and Time Arithmetic is also possible : CURENT_DATE, CURRENT_TIME => Result : YMMDD
- Sub-Queries : A query inside another query
- Form more powerful queries
- eg : select COLUMN1 from TABLE where COLUMN2 = (select MAX(COLUMN2) from TABLE)
- 3 scenarios : => From the table => From specific COLUMN (Column Expressions) => From clause : Substitute TABLE name w/ a sub-query(Table Expressions)
-
Sub-queries
-
Implicit JOIN
-
JOIN Operators (INNER JOIN, OUTER JOIN, etc) => CROSS JOIN (also known as Cartesian Join) -- syntax : SELECT column_name(s) FROM table1, table2;
=> INNER JOIN -- syntax : SELECT column_name(s) FROM table1, table2 WHERE table1.column_name = table2.column_name;
SELECT * FROM EMPLOYEES WHERE DEP_ID IN (SELECT DEPT_ID_DEP FROM DEPARTMENTS); WHERE LOC_ID = 'L0002' --> SPECIFIC Value
-- SELECT * FROM EMPLOYEES, DEPARTMENTS; => The result is a full join(or Cartesian join): => Every row in the first table is joined with every row in the 2 one.
=> operations : retrieving elements/records which matches from 2 tables -- SELECT * FROM EMPLOYEES, DEPARTMENTS -- WHERE EMPLOYEES.DEP_ID = DEPARTMENTS.DEPT_ID_DEP
=> Using Aliases : -- SELECT * FROM EMPLOYEES E, DEPARTMENTS D -- WHERE E.DEP_ID = D.DEPT_ID_DEP;
///////////////////////////////////////////// /// W4 : Accessing Databases using Python /// /////////////////////////////////////////////
- Python power & efficiency
- Matlab notebook vs Jupyter notebook vs Apache Zeppelin, apache Spark, Databricks cloud ..
- Python communication w/ DBMS (Client) => (Notebook) => (API calls) => (DBMS)
- Contains functions and operators to communicate w/ DBMS
- requests, errors handling
(SQL-based DBMS ) (SQL API)
--------- MySQL -------- MySQL C API --------- PostgreSQL -------- psyscopg2 --------- IBM DB2 -------- ibm_db --------- SQL Server -------- dblib API --------- Database acces for MS -------- ODBC --------- Oracle -------- OCI --------- Java -------- JDBC
- Python'q standard API for accessing relational databases (RDB)
- Allows a single program that to work with multiple kinds of RDB
- Learn DB-API functions once, use them w/ any database
- Easy to implement and understand
- Encourages similarity between the Python modules used to access db
- Achieve consistency
- Portable acress databases
- Broad reach of database connectivity from Python (DB) (DB API) --------- IBM Db2 -------- ibm_db --------- MySQL -------- MySQL Connector/Python --------- PostgreSQL -------- psyscopg2 --------- MongoDB -------- PyMongo
- Connection Objects => Database connections => Manage transactions
- Curso Objects => Database Queries => Scroll trough result set => Retrieve results
Connection methods
- cursor, commit, roobacj, close ... Cursors methods
- callproc, execute, executemany, fetchone ...
[App] = [Cursor object] => [Database]
- cursor sees a db records as list of files ...
- keep the program current position
- import the dbmodule import connect
- create connection object
- create a cursor object
- run queries
- free resources (Cursor.close, connection.close)
- provides a variety of useful Python functions for acessing and manipulating data an IBM data server Database
- uses IBM Data Server Driver for ODBC and CLI APIs to connect to IBM DB2 and Informix
- using the python db api
- ibm_db.exec_immediate(conn_obj, "sql_queries") =>
- ibm_db.exec_immediate(conn_obj, "sql_queries") =>
- ibm_db.fetching(stmt)
- Using pandas => read_sql("sql_queries", conn_obj )
///W5: Assignment Preparation: Working with real-world data sets and built-in SQL functions///
- Working with CSV files (.CSV : COMMA SEPARATED VALUES)
- the first corresponds of the name of columns
- Data is non-sensitive (low != upper case) => SOLUTION "Id" == ID
- Spaces => Under_cores
- \ backslash are great to scape characters and go to the next line in case of long queries
-
Getting the List of tables in the database => DB2 : SYSCAT.TABLES => SQL Server : INFORMATION_SCHEMA.TABLES => Oracle : ALL_TABLES or USER_TABLES
-
syntax : SELECT * FROM SYSCAT.TABLES or SELECT TABSCHEMA, TABNAME, CREATE_TIME FROM SYSCAT.TABLES WHERE TABSCHEMA = 'DB2_USERNAME'
SELECT * FROM SYSCAT.COLUMNS WHERE TABNAME = 'TABLE_NAME'
or SELECT DISTINCT(NAME), COLTYPE, LENGHT FROM SYSIBM.SYSCOLUMNS WHERE TBNAME = 'TABLE_NAME'
///////////////////////////////////////////// /// W5 : Accessing Databases using Python /// ///////////////////////////////////////////// Assignment => OK
This course is part of: -IBM Data Analyst Professional Certificate -Applied Data Science Specialization -IBM Data Science Professional Certificate -IBM Data Engineering Foundations Specialization -IBM Data Engineering Professional Certificate (Coming in the Future)
As a next step, you can explore other courses in these programs, starting with: -Data Engineering and Data Analytics tracks: Data Analysis with Python -Data Engineering track: NoSQL Fundamentals (Coming soon...)
///////////////////////////////////////////// /// W6 : Advanced SQL For DataScience Engineer/// /////////////////////////////////////////////
/// Views, stored procedures and transactions ///
/// Joins statements ///
///Lab/// ///Quiz///
Notebooks: