This package builds a local Postgres database of SEC filings taken from ftp.sec.gov.
This package can be pip installed into the desired directory:
pip install edgerdb
To create the database and insert the index files from ftp.sec.gov do the following:
from edgerdb import EdgerDb
edger = EdgerDb()
edger.create_and_load()
This installs a database with three tables.
- filings
- loaded_master_files
- last_updated
filings contains information on all the SEC filings.
loaded_master_files lists all the files currently loaded into the filings table.
last_updated holds the time at which the last file was loaded into the database.
To remove the database and user run:
edger.delete_everything()
Some functions are built in and can be used by importing helper_functions:
from edgerdb import helper_functions as hlp
The most used functions will be db(), old_db(), statement(), clear_sessions() and retrieve_document().
db() opens a connection to the Postgres database and returns the connection object.
It is important to close the connection after every operation.
con = hlp.db()
con.close()
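One way to make sure the connection is always closed, even when a query raises an exception, is contextlib.closing from the standard library. The sketch below uses a stand-in FakeConnection class (hypothetical, not part of edgerdb) so that it runs without a database. With the real package you would pass hlp.db() instead, assuming the object it returns exposes close() as in the example above.

```python
from contextlib import closing


class FakeConnection:
    """Hypothetical stand-in for the object returned by hlp.db()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def open_connection():
    # With edgerdb installed this would be: return hlp.db()
    return FakeConnection()


con = open_connection()
with closing(con):
    # Run queries here. close() is called when the block exits,
    # including when an exception is raised inside it.
    pass

print(con.closed)
```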
statement() is used to run SQL queries on the database. It takes the SQL query as a string and a connection object, plus optional keyword arguments. close defaults to True, which automatically closes the connection after the query is run.
statement(statement, connection, commit=False, close=True, output=True)
Ex:
top_five_paths = hlp.statement("select path from filings limit 5;", hlp.db(), close=True)
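To make the keyword arguments concrete, here is a minimal sketch of a helper with the same signature. It is not edgerdb's implementation: run_statement is a hypothetical name, the behavior of commit and output is an assumption based on their names, and an in-memory SQLite database stands in for Postgres so the example runs anywhere.

```python
import sqlite3


def run_statement(sql, connection, commit=False, close=True, output=True):
    """Illustrative helper mirroring the documented keyword arguments.

    This is NOT edgerdb's code, only a sketch of plausible semantics.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        # Assumption: output=True means "fetch and return the rows".
        rows = cursor.fetchall() if output else None
        if commit:
            connection.commit()
        return rows
    finally:
        cursor.close()
        if close:
            connection.close()


# In-memory SQLite stands in for the Postgres database.
con = sqlite3.connect(":memory:")
run_statement("create table filings (path text);", con,
              commit=True, close=False, output=False)
run_statement("insert into filings values ('a.txt'), ('b.txt');", con,
              commit=True, close=False, output=False)

# close is left at its default, so the connection is closed after this call.
paths = run_statement("select path from filings order by path limit 5;", con)
print(paths)
```

With a DB-API driver each row usually comes back as a tuple, so in this sketch a path is read as row[0]. Whether hlp.statement() unwraps rows is not stated here, so check the shape of its return value before passing results on.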
retrieve_document() takes the path of a file from the filings table, downloads a copy of that file from EDGAR, and stores it in a "sec_filings" directory inside your project directory. The destination can be changed with the optional directory keyword argument.
Ex:
for path in top_five_paths:
    hlp.retrieve_document(path)
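For a sense of how a filings path could map to a local file, the sketch below builds a destination inside a "sec_filings" directory. destination_for is a hypothetical helper, the example path is made up, and the file naming is an assumption. The package may name its local copies differently.

```python
from pathlib import Path
import tempfile


def destination_for(path, directory="sec_filings"):
    """Hypothetical helper: where a filing at `path` could be stored locally."""
    target_dir = Path(directory)
    # Create the directory on first use, as a download helper would need to.
    target_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the local copy keeps the file name from the remote path.
    return target_dir / Path(path).name


# A temporary directory keeps the example from touching your project.
with tempfile.TemporaryDirectory() as tmp:
    target = destination_for(
        "edgar/data/1000045/0001193125-15-000123.txt",  # made-up example path
        directory=Path(tmp) / "sec_filings",
    )
    print(target.name, target.parent.name)
```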
clear_sessions() can be used to clear running sessions on either the sec database or the main postgres database. The function requires two arguments.
clear_sessions(dbname, connection)
dbname is the name of the database and connection is a connection object. To clear sessions on the edgar database use db() and for the generic database use old_db().
Ex:
hlp.clear_sessions('edgar', hlp.db())
hlp.clear_sessions('edgar', hlp.old_db())
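In Postgres, sessions on a database are typically ended by calling pg_terminate_backend() on the rows of pg_stat_activity. The sketch below shows that query pattern. It is an assumption that clear_sessions() does something similar. terminate_sessions and the recording classes are hypothetical stand-ins so the example runs without a server, and the %s placeholder assumes a psycopg2-style driver.

```python
TERMINATE_SQL = (
    "SELECT pg_terminate_backend(pid) "
    "FROM pg_stat_activity "
    "WHERE datname = %s AND pid <> pg_backend_pid();"
)


class RecordingCursor:
    """Hypothetical cursor that records what would be sent to Postgres."""

    def __init__(self, log):
        self.log = log

    def execute(self, sql, params=None):
        self.log.append((sql, params))

    def close(self):
        pass


class RecordingConnection:
    """Hypothetical stand-in for hlp.db() or hlp.old_db()."""

    def __init__(self):
        self.executed = []
        self.closed = False

    def cursor(self):
        return RecordingCursor(self.executed)

    def close(self):
        self.closed = True


def terminate_sessions(dbname, connection):
    # End every session on dbname except the one issuing the query,
    # then close the connection (an assumption about the desired behavior).
    cursor = connection.cursor()
    try:
        cursor.execute(TERMINATE_SQL, (dbname,))
    finally:
        cursor.close()
        connection.close()


con = RecordingConnection()
terminate_sessions("edgar", con)
print(con.executed[0][1])
```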
The database can be updated by providing the last date from the files in the database and the list of daily files.
Ex:
from edgerdb import helper_functions as hlp
daily_files = hlp.generate_daily_file_paths()
last_date_in_db = int(hlp.latest_index_in_db('filings', hlp.db())[0])
hlp.load_latest_files(daily_files, last_date=last_date_in_db)
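The update step amounts to keeping only the daily files dated after the last one already loaded. The sketch below illustrates that filter on its own. files_after is a hypothetical helper, and it assumes that each path embeds a YYYYMMDD date (as in the made-up sample names) and that last_date is an integer in the same format.

```python
import re

# First run of eight digits in the path, read as YYYYMMDD (an assumption).
DATE_IN_NAME = re.compile(r"(\d{8})")


def files_after(daily_files, last_date):
    """Return the paths whose embedded date is later than last_date."""
    newer = []
    for path in daily_files:
        match = DATE_IN_NAME.search(path)
        if match and int(match.group(1)) > last_date:
            newer.append(path)
    return sorted(newer)


# Made-up sample paths for illustration only.
sample_files = [
    "edgar/daily-index/master.20160104.idx",
    "edgar/daily-index/master.20160105.idx",
    "edgar/daily-index/master.20160106.idx",
]
to_load = files_after(sample_files, last_date=20160104)
print(to_load)
```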
dir() can be used to explore the other functions that come with helper_functions
dir(hlp)
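dir() also returns private names and imported modules, so it can help to filter the list. The sketch below keeps only public functions. public_functions is a hypothetical helper, and the standard-library json module stands in for hlp so the example runs without edgerdb installed.

```python
import inspect
import json  # stand-in module; with edgerdb installed, pass hlp instead


def public_functions(module):
    """Names of the public functions exposed by a module, sorted."""
    return sorted(
        name
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        if not name.startswith("_")
    )


print(public_functions(json))
```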