CP standardizer : TradingSymbol, SecurityExchangeName #12

Closed
ebadi opened this issue Aug 23, 2024 · 2 comments

@ebadi
Contributor

ebadi commented Aug 23, 2024

Hello,
I tried using the code in the medium_intro_secfsds.ipynb notebook to extract the TradingSymbol and SecurityExchangeName tags from the cover page (CP) statements.

[Screenshot from 2024-08-23 21-51-01]

As you can see in my Jupyter notebook, these tags are present in pre_df. However, when I try to merge pre_df with num_df to get their values, I am not successful. In the CP statements, I also find other tags, such as EntityCommonStockSharesOutstanding and EntityAddressStateOrProvince, interesting. I really hope that I won't need to write a new CP standardizer.

What is the simplest way to look up the TradingSymbol and SecurityExchangeName for a given cik, or even for a given adsh?

In general, it would be great if there were a function (similar to ZipCollector) that returns the data columns (tags), without any standardization, when we pass

  • a report or a company
  • a filter (on column, statement, form, ...)
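A rough sketch of what such a function could look like, assuming pre_df and num_df are plain pandas DataFrames that use the data set's column names (adsh, stmt, tag, version in PRE; adsh, tag, version, ddate, qtrs, uom, value in NUM). The name `raw_tag_values` and the toy rows are hypothetical and not part of the library's API.

```python
import pandas as pd


def raw_tag_values(pre_df: pd.DataFrame, num_df: pd.DataFrame,
                   adsh: str, stmt: str) -> pd.DataFrame:
    """Hypothetical helper: one column per tag of one report/statement, no standardization."""
    # Tags that the report presents in the requested statement (BS, IS, CF, CP, ...)
    pre_sel = pre_df.loc[(pre_df["adsh"] == adsh) & (pre_df["stmt"] == stmt),
                         ["adsh", "tag", "version"]].drop_duplicates()
    # Inner join: only tags that actually have a numeric value in NUM survive
    merged = pre_sel.merge(num_df, on=["adsh", "tag", "version"], how="inner")
    if merged.empty:
        return pd.DataFrame()
    # One row per (ddate, qtrs, uom), one column per tag; "first" keeps one value per cell
    return merged.pivot_table(index=["ddate", "qtrs", "uom"], columns="tag",
                              values="value", aggfunc="first")


# Toy data (hypothetical values) for a single filing
pre_df = pd.DataFrame({
    "adsh": ["0000000000-24-000001"] * 3,
    "stmt": ["BS", "BS", "CP"],
    "tag": ["Assets", "Liabilities", "TradingSymbol"],
    "version": ["us-gaap/2023", "us-gaap/2023", "dei/2023"],
})
num_df = pd.DataFrame({
    "adsh": ["0000000000-24-000001"] * 2,
    "tag": ["Assets", "Liabilities"],
    "version": ["us-gaap/2023", "us-gaap/2023"],
    "ddate": [20231231, 20231231],
    "qtrs": [0, 0],
    "uom": ["USD", "USD"],
    "value": [100.0, 60.0],
})
print(raw_tag_values(pre_df, num_df, "0000000000-24-000001", "BS"))
```

Filtering first on PRE and then joining NUM keeps the result limited to the tags the report itself presents in that statement.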

Footnote

I also noticed that the version column in num_df has adsh values!


@HansjoergW
Owner

Hi @ebadi
The NUM_df, as the name suggests, only contains numeric values. The SEC financial statement data sets (https://www.sec.gov/about/dera_financial-statement-data-set) contain numeric values only. See also the comments in my Medium article (https://medium.com/@hansjoerg.wingeier/understanding-the-sec-financial-statement-data-sets-6148e07d1715) where I describe the "stmt" field for the PRE file:

stmt: Statement type. This can be BS (Balance Sheet), IS (Income Statement), CF (Cash Flow), EQ (Equity), CI (Comprehensive Income), UN (Unclassifiable Statement), CP (Cover Page). The most important ones are BS, IS, and CF. You will only find a few CP entries in the Financial Statement Data Sets, since this data set does not include text values.
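A quick way to see this effect in your own data is a left join from PRE to NUM with pandas' `indicator=True`, which marks PRE rows that have no numeric counterpart. The rows below are hypothetical toy data: a numeric dei tag such as EntityCommonStockSharesOutstanding can have a NUM row, while text tags such as TradingSymbol cannot.

```python
import pandas as pd

# Toy rows (hypothetical values) mimicking the PRE and NUM entries of one filing's cover page
pre_df = pd.DataFrame({
    "adsh": ["0000000000-24-000001"] * 3,
    "stmt": ["CP", "CP", "CP"],
    "tag": ["EntityCommonStockSharesOutstanding", "TradingSymbol", "SecurityExchangeName"],
    "version": ["dei/2023"] * 3,
})
num_df = pd.DataFrame({
    "adsh": ["0000000000-24-000001"],
    "tag": ["EntityCommonStockSharesOutstanding"],
    "version": ["dei/2023"],
    "value": [1_000_000.0],
})

# Left join keeps every PRE row; "_merge" == "left_only" means no numeric value exists
check = pre_df.merge(num_df, on=["adsh", "tag", "version"], how="left", indicator=True)
missing = check.loc[check["_merge"] == "left_only", "tag"].tolist()
print(missing)  # ['TradingSymbol', 'SecurityExchangeName']
```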

If we want "text" data, we would need to use the SEC financial statement and notes data set (https://www.sec.gov/data-research/financial-statement-notes-data-sets). However, that one is about 10 times larger. I might write a version 2 of the library at some point in the future that uses that larger data set, but I haven't decided yet.
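For the text values themselves, a minimal sketch of collecting the cover page tags per filing, assuming the notes data set ships a tab-separated text file with at least adsh, tag, and value columns (check the data set's readme for the exact file and column names). The function `cover_page_texts` is hypothetical.

```python
import csv
import io
from collections import defaultdict

WANTED = {"TradingSymbol", "SecurityExchangeName"}


def cover_page_texts(tsv_file, wanted=WANTED):
    """Collect text values per adsh and tag from a tab-separated file (assumed layout)."""
    result = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        if row["tag"] in wanted:
            # A filing can list several securities, so keep every value, not just one
            result[row["adsh"]][row["tag"]].append(row["value"])
    return {adsh: dict(tags) for adsh, tags in result.items()}


# Hypothetical sample rows
sample = io.StringIO(
    "adsh\ttag\tversion\tvalue\n"
    "0000000000-24-000001\tTradingSymbol\tdei/2023\tABC\n"
    "0000000000-24-000001\tSecurityExchangeName\tdei/2023\tNASDAQ\n"
    "0000000000-24-000001\tEntityAddressStateOrProvince\tdei/2023\tCA\n"
)
print(cover_page_texts(sample))
```

Values are kept as lists because one filing can report several trading symbols, for example one per share class.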

So bottom line: There is no textual information in the SEC financial statement data set.

However, the SEC does maintain a list with a cik-to-symbol mapping:

There is also a Python library addressing this, but its last release was over 2 years ago:

Now about your footnote:

If you see an adsh number in the "version" column, it means that the company is not using the official US-GAAP definition of that tag but a custom tag of its own. It might have the same name as the official tag, but the company might use a somewhat different definition of it. That should be explained somewhere in the notes of the report.
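As a rough illustration of that distinction, a check can test whether the version value has the shape of an accession number (10 digits, 2 digits, 6 digits) rather than a taxonomy identifier such as "us-gaap/2023". This is a hypothetical approximation of the idea, not necessarily how the library's filter is implemented; the tag "AdjustedRevenue" is made up.

```python
import re

# Accession numbers (adsh) have the shape 0000000000-24-000001: 10, 2 and 6 digits
ADSH_PATTERN = re.compile(r"\d{10}-\d{2}-\d{6}")


def is_custom_tag(version: str) -> bool:
    """True if 'version' holds an accession number instead of a taxonomy id."""
    return ADSH_PATTERN.fullmatch(version) is not None


rows = [
    {"tag": "Assets", "version": "us-gaap/2023"},
    {"tag": "AdjustedRevenue", "version": "0000000000-24-000001"},  # hypothetical custom tag
]
official_only = [r for r in rows if not is_custom_tag(r["version"])]
print([r["tag"] for r in official_only])  # ['Assets']
```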

This is actually what the OfficialTagsOnlyXXFilter filters out.

Fun fact as a side note: the SEC actually tracks the usage of custom tags on a special page:
https://www.sec.gov/data-research/gaap-xbrl-custom-tags

@ebadi
Contributor Author

ebadi commented Aug 25, 2024

Thank you for another comprehensive response.

I am currently using sec-cik-mapper, but it misses many records, e.g. unlisted companies (jadchaar/sec-cik-mapper#5). Its current mapping includes 7,975 entries. I will try the list provided by the SEC, which appears to be more comprehensive with 12,084 lines. This might resolve my issue.
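A minimal sketch of such a lookup, assuming the SEC list is a plain-text file with one `ticker<TAB>cik` pair per line (verify this against the actual file). An adsh is first resolved to a cik through the adsh and cik columns of the SUB file, then to tickers. Both function names are hypothetical, and the exchange name is not part of the assumed format.

```python
import io
from collections import defaultdict


def load_cik_to_tickers(lines):
    """Build cik -> list of tickers from 'ticker<TAB>cik' lines (assumed file layout)."""
    mapping = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ticker, cik = line.split("\t")
        # One cik can have several tickers (e.g. share classes), so keep them all
        mapping[int(cik)].append(ticker.upper())
    return dict(mapping)


def tickers_for_adsh(adsh, adsh_to_cik, cik_to_tickers):
    """adsh -> cik (from the SUB data), then cik -> tickers; [] if either step is unknown."""
    return cik_to_tickers.get(adsh_to_cik.get(adsh), [])


sample = io.StringIO("abc\t1234\nabc-p\t1234\nxyz\t5678\n")  # hypothetical lines
cik_to_tickers = load_cik_to_tickers(sample)
# Hypothetical; could be built with dict(zip(sub_df["adsh"], sub_df["cik"]))
adsh_to_cik = {"0000001234-24-000001": 1234}
print(tickers_for_adsh("0000001234-24-000001", adsh_to_cik, cik_to_tickers))  # ['ABC', 'ABC-P']
```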

ebadi closed this as completed Aug 25, 2024