Skip to content

Elijas/sec-downloader

Repository files navigation

sec-downloader

GitHub Workflow Status PyPI - Python Version PyPI version Licence

A better version of sec-edgar-downloader. Includes an alternative implementation (a wrapper instead of a fork), to keep compatibility with new sec-edgar-downloader releases. This library partially uses nbdev.

Features

Advantages over sec-edgar-downloader:

Flexibility in Download Process

  • Tailored for choosing what, where, and how to download.
  • Files stored in memory for faster operations and no unnecessary disk clutter.

Separate Metadata and File Downloads

  • Easily skip unneeded files.
  • Download metadata first, then selectively download files.
  • Option to save metadata for better organization.

More Input Options

  • Ticker or CIK (e.g., AAPL, 0000320193) for latest filings.
  • Accession Number (e.g., 0000320193-23-000077). Not supported in sec-edgar-downloader.
  • SEC EDGAR URL (e.g., https://www.sec.gov/ix?doc=/Archives/edgar/data/0001067983/000119312523272204/d564412d8k.htm). Not supported in sec-edgar-downloader.

Install

pip install sec-downloader

How to use

Download the metadata

Note The company name and email address are used to form a user-agent string that adheres to the SEC EDGAR’s fair access policy for programmatic downloading. Source

from sec_downloader import Downloader

dl = Downloader("MyCompanyName", "email@example.com")

Find a filing with an Accession Number

metadatas = dl.get_filing_metadatas("AAPL/0000320193-23-000077")
print(metadatas)
[FilingMetadata(accession_number='0000320193-23-000077',
                form_type='10-Q',
                primary_doc_url='https://www.sec.gov/Archives/edgar/data/320193/000032019323000077/aapl-20230701.htm',
                items='',
                primary_doc_description='10-Q',
                filing_date='2023-08-04',
                report_date='2023-07-01',
                cik='0000320193',
                company_name='Apple Inc.',
                tickers=[Ticker(symbol='AAPL', exchange='Nasdaq')])]

Alternatively, you can also use any of these to get the same answer:

metadatas = dl.get_filing_metadatas("aapl/000032019323000077")
metadatas = dl.get_filing_metadatas("320193/000032019323000077")
metadatas = dl.get_filing_metadatas("320193/0000320193-23-000077")
metadatas = dl.get_filing_metadatas("0000320193/0000320193-23-000077")
metadatas = dl.get_filing_metadatas(CompanyAndAccessionNumber(ticker_or_cik="320193", accession_number="0000320193-23-000077"))

Find the filing matching a SEC EDGAR Filing URL. Only CIK and Accession Number are used from the URL:

metadatas = dl.get_filing_metadatas(
    "https://www.sec.gov/ix?doc=/Archives/edgar/data/0001067983/000119312523272204/d564412d8k.htm"
)
print(metadatas)
[FilingMetadata(accession_number='0001193125-23-272204',
                form_type='8-K',
                primary_doc_url='https://www.sec.gov/Archives/edgar/data/1067983/000119312523272204/d564412d8k.htm',
                items='2.02,9.01',
                primary_doc_description='8-K',
                filing_date='2023-11-07',
                report_date='2023-11-04',
                cik='0001067983',
                company_name='BERKSHIRE HATHAWAY INC',
                tickers=[Ticker(symbol='BRK-B', exchange='NYSE'),
                         Ticker(symbol='BRK-A', exchange='NYSE')])]

Alternatively, you can also URLs in other formats and get the same answer:

metadatas = dl.get_filing_metadatas("https://www.sec.gov/Archives/edgar/data/1067983/000119312523272204/d564412d8k.htm")

Find latest filings by company ticker or CIK:

from sec_downloader.types import RequestedFilings

metadatas = dl.get_filing_metadatas(
    RequestedFilings(ticker_or_cik="MSFT", form_type="10-K", limit=2)
)
print(metadatas)
[FilingMetadata(accession_number='0000950170-24-087843',
                form_type='10-K',
                primary_doc_url='https://www.sec.gov/Archives/edgar/data/789019/000095017024087843/msft-20240630.htm',
                items='',
                primary_doc_description='10-K',
                filing_date='2024-07-30',
                report_date='2024-06-30',
                cik='0000789019',
                company_name='MICROSOFT CORP',
                tickers=[Ticker(symbol='MSFT', exchange='Nasdaq')]),
 FilingMetadata(accession_number='0000950170-23-035122',
                form_type='10-K',
                primary_doc_url='https://www.sec.gov/Archives/edgar/data/789019/000095017023035122/msft-20230630.htm',
                items='',
                primary_doc_description='10-K',
                filing_date='2023-07-27',
                report_date='2023-06-30',
                cik='0000789019',
                company_name='MICROSOFT CORP',
                tickers=[Ticker(symbol='MSFT', exchange='Nasdaq')])]

Alternatively, you can also use any of these to get the same answer:

metadatas = dl.get_filing_metadatas("2/msft/10-K")
metadatas = dl.get_filing_metadatas("2/789019/10-K")
metadatas = dl.get_filing_metadatas("2/0000789019/10-K")

The parameters limit and form_type are optional. If omitted, limit defaults to 1, and form_type defaults to ‘10-Q’.

metadatas = dl.get_filing_metadatas("NFLX")
print(metadatas)
[FilingMetadata(accession_number='0001065280-24-000287',
                form_type='10-Q',
                primary_doc_url='https://www.sec.gov/Archives/edgar/data/1065280/000106528024000287/nflx-20240930.htm',
                items='',
                primary_doc_description='10-Q',
                filing_date='2024-10-18',
                report_date='2024-09-30',
                cik='0001065280',
                company_name='NETFLIX INC',
                tickers=[Ticker(symbol='NFLX', exchange='Nasdaq')])]

Alternatively, you can also use any of these to get the same answer:

metadatas = dl.get_filing_metadatas("nflx")
metadatas = dl.get_filing_metadatas("1/NFLX")
metadatas = dl.get_filing_metadatas("NFLX/10-Q")
metadatas = dl.get_filing_metadatas("1/NFLX/10-Q")
metadatas = dl.get_filing_metadatas(RequestedFilings(ticker_or_cik="NFLX"))
metadatas = dl.get_filing_metadatas(RequestedFilings(limit=1, ticker_or_cik="NFLX", form_type="10-Q"))

Download the HTML files

After obtaining the Primary Document URL, for example from the metadata, you can proceed to download the HTML using this URL.

for metadata in metadatas:
    html = dl.download_filing(url=metadata.primary_doc_url).decode()
    print(html[:50])
    break  # same for all filings, let's just print the first one
"<?xml version='1.0' encoding='ASCII'?>\n<!--XBRL Do"

Alternative implementation: Wrapper

Files are downloaded to a temporary folder, immediately read into memory, and then deleted. Let’s demonstrate how to download a single file (latest 10-Q filing details in HTML format) to memory. The “glob” pattern is used to select which files are read to memory.

from sec_edgar_downloader import Downloader as SecEdgarDownloader
from sec_downloader.download_storage import DownloadStorage

ONLY_HTML = "**/*.htm*"

storage = DownloadStorage(filter_pattern=ONLY_HTML)
with storage as path:
    dl = SecEdgarDownloader("MyCompanyName", "email@example.com", path)
    dl.get("10-Q", "AAPL", limit=1, download_details=True)
# all files are now deleted and only stored in memory

content = storage.get_file_contents()[0].content
print(f"{content[:50]}...")
"<?xml version='1.0' encoding='ASCII'?>\n<!--XBRL Do..."

Downloading multiple documents:

storage = DownloadStorage()
with storage as path:
    dl = SecEdgarDownloader("MyCompanyName", "email@example.com", path)
    dl.get("10-K", "GOOG", limit=2)
# all files are now deleted and only stored in memory

for path, content in storage.get_file_contents():
    print(f"Path: {path}\nContent [len={len(content)}]: {content[:30]}...\n")
('Path: sec-edgar-filings/GOOG/10-K/0001652044-25-000014/full-submission.txt\n'
 'Content [len=14773739]: <SEC-DOCUMENT>0001652044-25-00...\n')
('Path: sec-edgar-filings/GOOG/10-K/0001652044-24-000022/full-submission.txt\n'
 'Content [len=13927595]: <SEC-DOCUMENT>0001652044-24-00...\n')
from sec_downloader import Downloader
dl = Downloader("MyCompanyName", "my.email@domain.com")
metadata = dl.get_filing_metadatas("https://www.sec.gov/Archives/edgar/data/2488/000000248825000009/0001193125-10-034401-index.htm", include_amends=True)
metadata
[FilingMetadata(accession_number='0000002488-25-000009', form_type='8-K', primary_doc_url='https://www.sec.gov/Archives/edgar/data/2488/000000248825000009/amd-20250204.htm', items='2.02,7.01,9.01', primary_doc_description='8-K', filing_date='2025-02-04', report_date='2025-02-04', cik='0000002488', company_name='ADVANCED MICRO DEVICES INC', tickers=[Ticker(symbol='AMD', exchange='Nasdaq')])]

Frequently Asked Questions (FAQ)

Q: I’m encountering a ValueError stating ‘8-K/A’ forms are not supported. How can I download “8-K/A” filings?

A: To download amended filings such as “8-K/A”, you need to set the parameter include_amends=True. For example, to download this filing, you would use:

from sec_downloader import Downloader
dl = Downloader("MyCompanyName", "my.email@domain.com")
str_cik = "0000002488"
str_acc = "0001193125-10-034401"
metadata = dl.get_filing_metadatas(f"{str_cik}/{str_acc}", include_amends=True)
metadata
[FilingMetadata(accession_number='0001193125-10-034401', form_type='8-K', primary_doc_url='https://www.sec.gov/Archives/edgar/data/2488/000119312510034401/d8ka.htm', items='5.02', primary_doc_description='FORM 8-K/A', filing_date='2010-02-19', report_date='2010-02-16', cik='0000002488', company_name='ADVANCED MICRO DEVICES INC', tickers=[Ticker(symbol='AMD', exchange='Nasdaq')])]

Contributing

Follow these steps to install the project locally for development:

  1. Install the project with the command pip install -e ".[dev]".

Note We highly recommend using virtual environments for Python development. If you’d like to use virtual environments, follow these steps instead:

  • Create a virtual environment python3 -m venv .venv
  • Activate the virtual environment source .venv/bin/activate
  • Install the project with the command pip install -e ".[dev]"