Computer Science Projects - Portfolio Project: Creating A Database

This portfolio project was created as part of Codecademy's Computer Science Career Path course.

Project Overview

"Welcome to the portfolio project for CS103: Databases. In this project, you will create a database on your own computer that contains all of the elements that we’ve covered in this course.

If you have worked on other Off-Platform Projects over the course of CS103 (specifically the “Designing A Database From Scratch” project), you will likely already have a database with some of the foundational elements already set up. In that case, feel free to use that database as a starting point to get a bit of a head start." - Codecademy

Project objectives

Research the topic of your choosing to create a entity relationship diagram for your database.
Create the schema of the database using PostgreSQL.
Populate your database with some data.
Update your database to ensure your data stays intact. This could include adding constraints and roles.
Update your database to ensure your database stays fast. This could include adding indexes, normalising your database, and actively monitoring and maintaining your database.

Database: Greenwood High School

For this particular project I decided to extend the “Designing A Database From Scratch” project which had some of the foundational parts already completed for this project.

Database design

The database is based around a hypothetical UK secondary school. I focused the database around people who would be closely associated with the school and how they are linked together to decide on what information to add and how to organise it. I designed the following schema for the database, some modifications were made from the original design that I had (I've included the DBML database markup language file):

Database schema

The database schema was implemented using Postbird (which is an open source PostgreSQL GUI client).

SQL File URL: School Schema Tables

Database data

I got ChatGPT to generate hypothetical data based of the schema:

While using ChatGPT helped speed up the generation of data, it didn't keep the data consistent to what it was supposed to add for each individual. For instance, it added a staff member with a note to state they were an art teacher, but never added art teacher as a job. So there were some inconsistencies when trying to add the data that I needed to clear up when inserting data into the table.

SQL File URL: School Data

Database roles

I added several roles to group different access capabilities. Thereby limiting data to the various different groups of people as appropriate when accessing the database. Various test queries were run to check role permissions were set up correctly for accessing only the information expected for the assigned role.

admin_su (admin super user role):
- Can access and change anything in the database
- Assigned to Headteacher and Deputy Headteacher
student_role:
- Can select their own data
- Can select staff name, job title and department
- Assigned to all students
governor_role:
- Can select their own data
- Can select staff name, job title and department
- Can select governor's start and end date
- Assigned to all governors
staff_role:
- Can select their own data
- Can select staff name, job title and department
- Can view student subject choices

N.B. I only assigned some of the people in the data to the various roles, for testing the role permissions, notes have been left in the file where to create and grant appropriate roles to all users.

SQL File URL: School Roles

Database indexes and constraints

I also added indexes for the person table to help speed up searches on the database. Indexes were created on the columns first_name, last_name and a combined column index for (last_name, first_name). As the majority of the searches in the database will involve searching and looking up a person's name along with associated data about them.

The governor table was altered to add a constraint onto the end_date so that it would always be later than the current date.

SQL File URL: School Indexes

Database maintenance

Database maintenance consists of checking table sizes, analysing tables to check for dead tuples and then using VACUUM to tidy up and free up space within the database.

N.B. As the database currently consists of a small amount of data and no large data changes the maintenance will not have a significant impact on speed and space of the database. However, for more significant amounts of data carrying out such maintenance would have a more significant impact.

SQL File URL: School Maintenance

Process

Think of an idea to base the database around - what topic will it be focused on.
Project brainstorming on what data will be needed, what type of data it is and how to split that up into relevant tables.
Create the schema for the database.
Set up a GitHub repository.
Set up Git version control.
Write SQL queries to create the tables in the database.
Use ChatGPT to help create data to populate the database.
Add roles and grant roles to users.
- Test out the role permissions.
Add indexes.
- Test indexes.
Add constraints.
- Test constraints.
Carry out database maintenance:
- Check table sizes.
- Check for dead tuples.
- Vacuum to tidy up database and free up space.
Create blog post about project. (This README file is my post about the project I have created.)

What I learned

I had some issues with setting of access to users. When I tried to test the james_smith role, it had issues with selecting from certain tables. Upon checking, I found that the admin_su role had been assigned to both users but they hadn't been granted SUPERUSER access with it, so altered users to have superuser access.
```
CREATE ROLE admin_su WITH SUPERUSER;

CREATE USER james_smith WITH SUPERUSER LOGIN PASSWORD 'j.smith_password';
CREATE USER elizabeth_taylor WITH SUPERUSER LOGIN PASSWORD 'e.taylor_password';

GRANT admin_su TO james_smith;
GRANT admin_su TO elizabeth_taylor;
```

The most difficult part I had in this project was to make all of the role permissions work correctly. As I was restricting access to tables, and in some cases only allowing parts of tables to be accessed as well by different roles. This led me to learn about various ways to constrain with the use of policies e.g.:

CREATE POLICY student_person_policy ON person FOR SELECT
  USING (LOWER(first_name) || '_' || LOWER(last_name) = CURRENT_USER);

CREATE POLICY student_staff_policy ON person FOR SELECT
  USING (EXISTS(SELECT 1 FROM staff WHERE staff.person_id = person.id));

CREATE POLICY staff_person_select_policy ON person FOR SELECT
  USING (pg_has_role(current_user, 'staff_role', 'member'));

I learnt how to concatenate results into a string. I did a test query where it showed all subjects for students in years 10 and 11, in a single table cell so that the information about a particular student was all on one row.
```
string_agg(sub.subject, ' -- ') AS subjects
```

I also learnt how to print out size of all tables at once:

SELECT
  table_name,
  pg_size_pretty(pg_total_relation_size(table_schema || '.' || table_name)) AS total_size
FROM
    information_schema.tables
WHERE
    table_type = 'BASE TABLE'
    AND table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY
    pg_total_relation_size(table_schema || '.' || table_name) DESC;

Potential improvements to database

Complete creating all users and granting them appropriate role permissions for all the people in the database.
Depending upon regular searches further indexes could be created to speed up searches.
More thorough analysis of database maintenance to take place if a significantly larger dataset used.

Useful resources

Author

V. Tickner

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
README.md		README.md
chatgpt_school_generated_data.jpg		chatgpt_school_generated_data.jpg
schema_school.png		schema_school.png
school.dbml		school.dbml
school_data.sql		school_data.sql
school_indexes.sql		school_indexes.sql
school_maintenance.sql		school_maintenance.sql
school_roles.sql		school_roles.sql
school_schema_tables.sql		school_schema_tables.sql

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Computer Science Projects - Portfolio Project: Creating A Database

Table of contents

Project Overview

Project objectives

Database: Greenwood High School

Database design

Database schema

Database data

Database roles

Database indexes and constraints

Database maintenance

Process

What I learned

Potential improvements to database

Useful resources

Author

About

Releases

Packages

VTickner/codecademy-creating-a-database

Folders and files

Latest commit

History

Repository files navigation

Computer Science Projects - Portfolio Project: Creating A Database

Table of contents

Project Overview

Project objectives

Database: Greenwood High School

Database design

Database schema

Database data

Database roles

Database indexes and constraints

Database maintenance

Process

What I learned

Potential improvements to database

Useful resources

Author

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages