A better version of sec-edgar-downloader. Includes an alternative implementation (a wrapper instead of a fork), to keep compatibility with new sec-edgar-downloader releases. This library partially uses nbdev.
Advantages over sec-edgar-downloader:
Flexibility in Download Process
- Tailored for choosing what, where, and how to download.
- Files stored in memory for faster operations and no unnecessary disk clutter.
Separate Metadata and File Downloads
- Easily skip unneeded files.
- Download metadata first, then selectively download files.
- Option to save metadata for better organization.
More Input Options
- Ticker or CIK (e.g., AAPL, 0000320193) for latest filings.
- Accession Number (e.g., 0000320193-23-000077). Not supported in sec-edgar-downloader.
- SEC EDGAR URL (e.g., https://www.sec.gov/ix?doc=/Archives/edgar/data/0001067983/000119312523272204/d564412d8k.htm). Not supported in sec-edgar-downloader.
pip install sec-downloader
Note: The company name and email address are used to form a user-agent string that adheres to SEC EDGAR's fair access policy for programmatic downloading.
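To make the note above concrete, here is a minimal sketch of how a company name and an email address could be combined into a user-agent value. It assumes the "Company Name contact@example.com" shape that the SEC shows as a sample in its fair access guidance. The function name `build_user_agent` is hypothetical, and how sec-downloader assembles the string internally is not shown in this README.

```python
def build_user_agent(company_name: str, email_address: str) -> str:
    """Join a company name and a contact email into one User-Agent value.

    Hypothetical helper for illustration; not part of sec-downloader.
    """
    company_name = company_name.strip()
    email_address = email_address.strip()
    if not company_name:
        raise ValueError("company_name must not be empty")
    if "@" not in email_address:
        raise ValueError("email_address does not look like an email address")
    return f"{company_name} {email_address}"


# Headers that a plain HTTP client could send along with a request to sec.gov.
headers = {"User-Agent": build_user_agent("MyCompanyName", "email@example.com")}
print(headers["User-Agent"])
```

The validation is deliberately loose: it only catches an empty name or an obviously malformed address before a request is made.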
from sec_downloader import Downloader
dl = Downloader("MyCompanyName", "email@example.com")
Find a filing with an Accession Number
metadatas = dl.get_filing_metadatas("AAPL/0000320193-23-000077")
print(metadatas)
[FilingMetadata(accession_number='0000320193-23-000077',
form_type='10-Q',
primary_doc_url='https://www.sec.gov/Archives/edgar/data/320193/000032019323000077/aapl-20230701.htm',
items='',
primary_doc_description='10-Q',
filing_date='2023-08-04',
report_date='2023-07-01',
cik='0000320193',
company_name='Apple Inc.',
tickers=[Ticker(symbol='AAPL', exchange='Nasdaq')])]
Alternatively, you can also use any of these to get the same answer:
metadatas = dl.get_filing_metadatas("aapl/000032019323000077")
metadatas = dl.get_filing_metadatas("320193/000032019323000077")
metadatas = dl.get_filing_metadatas("320193/0000320193-23-000077")
metadatas = dl.get_filing_metadatas("0000320193/0000320193-23-000077")
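# Assumption: CompanyAndAccessionNumber lives in sec_downloader.types, next to RequestedFilings
from sec_downloader.types import CompanyAndAccessionNumber
metadatas = dl.get_filing_metadatas(CompanyAndAccessionNumber(ticker_or_cik="320193", accession_number="0000320193-23-000077"))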
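The equivalent inputs above differ only in letter case, CIK zero-padding, and dashes in the accession number. As a rough illustration of why they can resolve to the same filing, here is a stdlib-only sketch that normalizes such strings. All function names are hypothetical, and this is not the library's actual parsing code.

```python
import re


def normalize_accession_number(raw: str) -> str:
    """Return the dashed 10-2-6 form of an 18-digit accession number."""
    digits = raw.replace("-", "")
    if not re.fullmatch(r"\d{18}", digits):
        raise ValueError(f"not an accession number: {raw!r}")
    return f"{digits[:10]}-{digits[10:12]}-{digits[12:]}"


def normalize_company(raw: str) -> str:
    """Zero-pad a numeric CIK to 10 digits; upper-case anything else as a ticker."""
    return raw.zfill(10) if raw.isdigit() else raw.upper()


def parse_company_and_accession(query: str) -> tuple[str, str]:
    """Split 'company/accession' and normalize both halves."""
    company, accession = query.split("/")
    return normalize_company(company), normalize_accession_number(accession)


queries = [
    "AAPL/0000320193-23-000077",
    "aapl/000032019323000077",
    "320193/000032019323000077",
    "0000320193/0000320193-23-000077",
]
for query in queries:
    print(parse_company_and_accession(query))
```

Mapping a ticker to its CIK requires a lookup against SEC data, so the sketch leaves tickers as they are.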
Find the filing matching a SEC EDGAR Filing URL. Only CIK and Accession Number are used from the URL:
metadatas = dl.get_filing_metadatas(
"https://www.sec.gov/ix?doc=/Archives/edgar/data/0001067983/000119312523272204/d564412d8k.htm"
)
print(metadatas)
[FilingMetadata(accession_number='0001193125-23-272204',
form_type='8-K',
primary_doc_url='https://www.sec.gov/Archives/edgar/data/1067983/000119312523272204/d564412d8k.htm',
items='2.02,9.01',
primary_doc_description='8-K',
filing_date='2023-11-07',
report_date='2023-11-04',
cik='0001067983',
company_name='BERKSHIRE HATHAWAY INC',
tickers=[Ticker(symbol='BRK-B', exchange='NYSE'),
Ticker(symbol='BRK-A', exchange='NYSE')])]
Alternatively, you can also use URLs in other formats and get the same answer:
metadatas = dl.get_filing_metadatas("https://www.sec.gov/Archives/edgar/data/1067983/000119312523272204/d564412d8k.htm")
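Since only the CIK and Accession Number are taken from the URL, both URL formats above carry the same information. The following sketch shows one way to extract the two values with a regular expression. The pattern and the name `parse_filing_url` are assumptions made for illustration, not the library's implementation.

```python
import re

# Matches ".../Archives/edgar/data/<CIK>/<18-digit accession number>/..."
_FILING_PATH = re.compile(
    r"/Archives/edgar/data/(?P<cik>\d+)/(?P<accession>\d{18})/"
)


def parse_filing_url(url: str) -> tuple[str, str]:
    """Return (10-digit CIK, dashed accession number) found in an EDGAR URL."""
    match = _FILING_PATH.search(url)
    if match is None:
        raise ValueError(f"no CIK/accession number found in {url!r}")
    digits = match["accession"]
    accession_number = f"{digits[:10]}-{digits[10:12]}-{digits[12:]}"
    return match["cik"].zfill(10), accession_number


viewer_url = "https://www.sec.gov/ix?doc=/Archives/edgar/data/0001067983/000119312523272204/d564412d8k.htm"
archive_url = "https://www.sec.gov/Archives/edgar/data/1067983/000119312523272204/d564412d8k.htm"

print(parse_filing_url(viewer_url))
print(parse_filing_url(archive_url))
```

Both calls yield the CIK `0001067983` and the accession number `0001193125-23-272204`, which match the metadata printed above.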
Find latest filings by company ticker or CIK:
from sec_downloader.types import RequestedFilings
metadatas = dl.get_filing_metadatas(
RequestedFilings(ticker_or_cik="MSFT", form_type="10-K", limit=2)
)
print(metadatas)
[FilingMetadata(accession_number='0000950170-23-035122',
form_type='10-K',
primary_doc_url='https://www.sec.gov/Archives/edgar/data/789019/000095017023035122/msft-20230630.htm',
items='',
primary_doc_description='10-K',
filing_date='2023-07-27',
report_date='2023-06-30',
cik='0000789019',
company_name='MICROSOFT CORP',
tickers=[Ticker(symbol='MSFT', exchange='Nasdaq')]),
FilingMetadata(accession_number='0001564590-22-026876',
form_type='10-K',
primary_doc_url='https://www.sec.gov/Archives/edgar/data/789019/000156459022026876/msft-10k_20220630.htm',
items='',
primary_doc_description='10-K',
filing_date='2022-07-28',
report_date='2022-06-30',
cik='0000789019',
company_name='MICROSOFT CORP',
tickers=[Ticker(symbol='MSFT', exchange='Nasdaq')])]
Alternatively, you can also use any of these to get the same answer:
metadatas = dl.get_filing_metadatas("2/msft/10-K")
metadatas = dl.get_filing_metadatas("2/789019/10-K")
metadatas = dl.get_filing_metadatas("2/0000789019/10-K")
The parameters limit and form_type are optional. If omitted, limit defaults to 1, and form_type defaults to "10-Q".
metadatas = dl.get_filing_metadatas("NFLX")
print(metadatas)
[FilingMetadata(accession_number='0001065280-23-000273',
form_type='10-Q',
primary_doc_url='https://www.sec.gov/Archives/edgar/data/1065280/000106528023000273/nflx-20230930.htm',
items='',
primary_doc_description='10-Q',
filing_date='2023-10-20',
report_date='2023-09-30',
cik='0001065280',
company_name='NETFLIX INC',
tickers=[Ticker(symbol='NFLX', exchange='Nasdaq')])]
Alternatively, you can also use any of these to get the same answer:
metadatas = dl.get_filing_metadatas("nflx")
metadatas = dl.get_filing_metadatas("1/NFLX")
metadatas = dl.get_filing_metadatas("NFLX/10-Q")
metadatas = dl.get_filing_metadatas("1/NFLX/10-Q")
metadatas = dl.get_filing_metadatas(RequestedFilings(ticker_or_cik="NFLX"))
metadatas = dl.get_filing_metadatas(RequestedFilings(limit=1, ticker_or_cik="NFLX", form_type="10-Q"))
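To illustrate how the string shorthands and their defaults relate, here is a small sketch of a parser for the "limit/ticker/form_type" forms shown above. `ParsedRequest`, `parse_request`, and `KNOWN_FORM_TYPES` are hypothetical names. Using a short list of form types to tell "NFLX/10-Q" apart from "1/NFLX" is an assumption of this sketch, and the library may resolve the ambiguity differently.

```python
from dataclasses import dataclass

# Illustrative subset only; EDGAR has many more form types.
KNOWN_FORM_TYPES = {"10-K", "10-Q", "8-K"}


@dataclass(frozen=True)
class ParsedRequest:
    ticker_or_cik: str
    form_type: str = "10-Q"  # default stated in this README
    limit: int = 1           # default stated in this README


def parse_request(query: str) -> ParsedRequest:
    """Parse 'TICKER', 'LIMIT/TICKER', 'TICKER/FORM' or 'LIMIT/TICKER/FORM'."""
    parts = query.split("/")
    if len(parts) == 1:
        return ParsedRequest(ticker_or_cik=parts[0].upper())
    if len(parts) == 2:
        first, second = parts
        if second.upper() in KNOWN_FORM_TYPES:
            return ParsedRequest(ticker_or_cik=first.upper(), form_type=second.upper())
        if first.isdigit():
            return ParsedRequest(ticker_or_cik=second.upper(), limit=int(first))
        raise ValueError(f"cannot interpret {query!r}")
    if len(parts) == 3:
        limit, ticker_or_cik, form_type = parts
        return ParsedRequest(ticker_or_cik.upper(), form_type.upper(), int(limit))
    raise ValueError(f"cannot interpret {query!r}")


for query in ["NFLX", "nflx", "1/NFLX", "NFLX/10-Q", "1/NFLX/10-Q", "2/msft/10-K"]:
    print(query, "->", parse_request(query))
```

In this sketch the first five inputs produce the same `ParsedRequest`, mirroring the equivalences listed above.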
After obtaining the Primary Document URL, for example from the metadata, you can proceed to download the HTML using this URL.
for metadata in metadatas:
    html = dl.download_filing(url=metadata.primary_doc_url).decode()
    print(html[:50])
    break  # same for all filings, let's just print the first one
'<?xml version="1.0" ?><!--XBRL Document Created wi'
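Because metadata and file downloads are separate steps, the metadata can be filtered before anything is fetched. The sketch below shows that idea with a stand-in `Meta` class that mirrors three of the FilingMetadata fields printed above, plus an injected `fetch` callable. Both are hypothetical. In real use, `fetch` could be something like `lambda url: dl.download_filing(url=url)`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Meta:
    """Stand-in with a subset of the FilingMetadata fields shown above."""
    form_type: str
    items: str
    primary_doc_url: str


def download_selected(
    metadatas: Iterable[Meta],
    wanted_item: str,
    fetch: Callable[[str], bytes],
) -> Dict[str, bytes]:
    """Call fetch(url) only for filings whose items field lists wanted_item."""
    results = {}
    for metadata in metadatas:
        if wanted_item in metadata.items.split(","):
            results[metadata.primary_doc_url] = fetch(metadata.primary_doc_url)
    return results


calls = []


def fake_fetch(url: str) -> bytes:
    # Records the URL instead of touching the network.
    calls.append(url)
    return b"<html>stub</html>"


metadatas = [
    Meta("8-K", "2.02,9.01", "https://www.sec.gov/Archives/edgar/data/1067983/000119312523272204/d564412d8k.htm"),
    Meta("10-Q", "", "https://www.sec.gov/Archives/edgar/data/1065280/000106528023000273/nflx-20230930.htm"),
]
downloaded = download_selected(metadatas, wanted_item="2.02", fetch=fake_fetch)
print(list(downloaded))
```

Only the 8-K that reports item 2.02 is fetched; the 10-Q is skipped without any download.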
Files are downloaded to a temporary folder, immediately read into memory, and then deleted. Let’s demonstrate how to download a single file (latest 10-Q filing details in HTML format) to memory. The “glob” pattern is used to select which files are read to memory.
from sec_edgar_downloader import Downloader as SecEdgarDownloader
from sec_downloader.download_storage import DownloadStorage
ONLY_HTML = "**/*.htm*"
storage = DownloadStorage(filter_pattern=ONLY_HTML)
with storage as path:
    dl = SecEdgarDownloader("MyCompanyName", "email@example.com", path)
    dl.get("10-Q", "AAPL", limit=1, download_details=True)
# all files are now deleted and only stored in memory
content = storage.get_file_contents()[0].content
print(f"{content[:50]}...")
"<?xml version='1.0' encoding='ASCII'?>\n<html xmlns..."
Downloading multiple documents:
storage = DownloadStorage()
with storage as path:
    dl = SecEdgarDownloader("MyCompanyName", "email@example.com", path)
    dl.get("10-K", "GOOG", limit=2)
# all files are now deleted and only stored in memory
for path, content in storage.get_file_contents():
    print(f"Path: {path}\nContent [len={len(content)}]: {content[:30]}...\n")
('Path: sec-edgar-filings/GOOG/10-K/0001652044-24-000022/full-submission.txt\n'
'Content [len=13927595]: <SEC-DOCUMENT>0001652044-24-00...\n')
('Path: sec-edgar-filings/GOOG/10-K/0001652044-23-000016/full-submission.txt\n'
'Content [len=15264470]: <SEC-DOCUMENT>0001652044-23-00...\n')
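The "download to a temporary folder, read into memory, then delete" mechanism described earlier can be approximated with the standard library alone. The class below is a hypothetical stand-in written for illustration, not the actual DownloadStorage implementation. It assumes that files are read as text and that the glob pattern is applied relative to the temporary folder.

```python
import tempfile
from pathlib import Path


class InMemoryStorage:
    """Hypothetical stdlib-only stand-in for the behaviour described above."""

    def __init__(self, filter_pattern: str = "**/*"):
        self.filter_pattern = filter_pattern
        self.files = {}  # relative path -> text content

    def __enter__(self) -> Path:
        self._tmp = tempfile.TemporaryDirectory()
        return Path(self._tmp.name)

    def __exit__(self, exc_type, exc, tb) -> bool:
        root = Path(self._tmp.name)
        # Read every file matching the glob pattern before the folder is removed.
        for file in sorted(root.glob(self.filter_pattern)):
            if file.is_file():
                self.files[file.relative_to(root).as_posix()] = file.read_text()
        self._tmp.cleanup()
        return False  # do not swallow exceptions


storage = InMemoryStorage(filter_pattern="**/*.htm*")
with storage as path:
    # Simulate a downloader writing two files into the temporary folder.
    (path / "filing").mkdir()
    (path / "filing" / "primary-document.html").write_text("<html>10-Q</html>")
    (path / "filing" / "full-submission.txt").write_text("<SEC-DOCUMENT>")
    tmp_root = path

print(storage.files)
print(tmp_root.exists())
```

After the `with` block, only the file matching the pattern is kept in memory and the temporary folder no longer exists.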
Follow these steps to install the project locally for development:
- Install the project with the command:
  pip install -e ".[dev]"
Note: We highly recommend using virtual environments for Python development. If you'd like to use virtual environments, follow these steps instead:
- Create a virtual environment:
  python3 -m venv .venv
- Activate the virtual environment:
  source .venv/bin/activate
- Install the project with the command:
  pip install -e ".[dev]"