This repository has been archived by the owner on Apr 25, 2023. It is now read-only.

This is a repo for a Python package that creates a database, seeds it with SEC filings, and provides tools for further analysis of the filings.

lancekrogers/edgerdb

edgerdb

This package builds a local PostgreSQL database of SEC filings taken from ftp.sec.gov.

This package can be installed with pip into the desired directory:

pip install edgerdb

To create the database and insert the index files from ftp.sec.gov, do the following:

from edgerdb import EdgerDb
edger = EdgerDb()
edger.create_and_load()

This installs a database with three tables.

  • filings
  • loaded_master_files
  • last_updated

filings is the table that will contain information on all the SEC filings.

loaded_master_files contains a list of all the files currently loaded into the filings table.

last_updated holds the time at which the last file was loaded into the database.


To remove the database and user, run:

edger.delete_everything()

Some functions are built in and can be used by importing helper_functions:


from edgerdb import helper_functions as hlp

The most commonly used functions are db(), old_db(), statement(), clear_sessions() and retrieve_document().

db() is used to open a connection object to the postgres database.

It is important to close the connection after every operation is performed.


con = hlp.db()

con.close()
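
One way to make sure the connection is always closed, even when an operation raises, is to wrap it in contextlib.closing from the standard library. The sketch below is only an illustration: FakeConnection and with_connection are hypothetical names that are not part of edgerdb, and the stand-in models nothing beyond the close() method that the object returned by db() is shown to have above.

```python
from contextlib import closing


class FakeConnection:
    """Hypothetical stand-in for the object returned by hlp.db()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def with_connection(connect, work):
    """Open a connection, run work(con), and close the connection afterwards.

    closing() calls con.close() on exit, whether work() returns or raises.
    """
    with closing(connect()) as con:
        return work(con)


# With edgerdb installed, this would presumably be called as
# with_connection(hlp.db, some_function); a fake is used here so the
# sketch runs without a database.
fake = FakeConnection()
result = with_connection(lambda: fake, lambda con: "done")
```
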

statement() is used to run SQL queries on the database. It takes the SQL query as a string and a connection object, plus optional keyword arguments. close defaults to True, which automatically closes the connection after the query is run.

  statement(statement, connection, commit=False, close=True, output=True)

Ex:

top_five_paths = hlp.statement("select path from filings limit 5;", hlp.db(), close=True)

retrieve_document() requires the path to a file, taken from the filings table. It downloads a copy of that file from EDGAR and stores it in a "sec_filings" directory in the same directory as your project. The location can be changed with the optional directory keyword argument.

Ex:

for path in top_five_paths:
    hlp.retrieve_document(path)
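
The shape of the rows returned by statement() is not documented here. If it turns out to return one-element tuples per row, as many Python database drivers do, the paths would need unpacking before being passed on. The helper below is a hypothetical sketch under that assumption; first_column is not part of edgerdb, and the sample paths are made up for illustration.

```python
def first_column(rows):
    """Return the first field of each row, passing plain values through."""
    values = []
    for row in rows:
        if isinstance(row, (tuple, list)):
            values.append(row[0])
        else:
            values.append(row)
    return values


# Made-up rows in the tuple-per-row shape assumed above.
sample_rows = [("edgar/data/1/example-a.txt",), ("edgar/data/2/example-b.txt",)]
paths = first_column(sample_rows)
```

If the rows are already plain strings, the helper returns them unchanged, so it is safe to apply in either case.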

clear_sessions() can be used to clear running sessions on either the sec database or the main postgres database. The function requires two arguments.

clear_sessions(dbname, connection)

dbname is the name of the database and connection is a connection object. To clear sessions on the edgar database use db() and for the generic database use old_db().

Ex:

hlp.clear_sessions('edgar', hlp.db())
hlp.clear_sessions('edgar', hlp.old_db())

The database can be updated by passing the list of daily files together with the date of the most recent file already in the database.

Ex:

from edgerdb import helper_functions as hlp

daily_files = hlp.generate_daily_file_paths()

last_date_in_db = int(hlp.latest_index_in_db('filings', hlp.db())[0])

hlp.load_latest_files(daily_files, last_date=last_date_in_db)
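
The idea behind the update, loading only files dated after the last one in the database, can be sketched as a simple date comparison. This is not the package's implementation: files_newer_than is a hypothetical helper, the sample paths are invented, and the sketch assumes each daily file path embeds its date as eight digits (YYYYMMDD) that compare correctly as integers.

```python
import re

# Assumption: each daily index path embeds its date as eight digits (YYYYMMDD).
DATE_PATTERN = re.compile(r"(\d{8})")


def files_newer_than(paths, last_date):
    """Return the paths whose embedded date is strictly later than last_date."""
    newer = []
    for path in paths:
        match = DATE_PATTERN.search(path)
        if match and int(match.group(1)) > last_date:
            newer.append(path)
    return newer


# Invented paths, for illustration only.
sample_paths = [
    "daily-index/master.20160104.idx",
    "daily-index/master.20160105.idx",
    "daily-index/master.20160106.idx",
]
pending = files_newer_than(sample_paths, 20160104)
```
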

dir() can be used to explore the other functions that come with helper_functions:

dir(hlp)
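
The output of dir() also includes private and dunder names. A small filter narrows the listing to public names. In this sketch public_names is a hypothetical helper, and the standard-library json module stands in for helper_functions so the example runs without edgerdb installed.

```python
import json


def public_names(module):
    """Return the names in a module that do not start with an underscore."""
    return [name for name in dir(module) if not name.startswith("_")]


# json is only a stand-in; with edgerdb installed this would be public_names(hlp).
names = public_names(json)
```
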
