This accelerator will help you process Financial Annual Reports (10K filings) or even Wikipedia data about companies, using John Snow Labs Finance NLP Named Entity Recognition, Relation Extraction and Assertion Status, to extract the following information about companies:
- Information about the Company itself (Trading Symbol, State, Address, Contact Information) and other names the Company is known by (alias, former name).
- People (usually management and C-level) working in that company and their past experiences, including roles and companies.
- Acquisitions events, including the acquisition dates.
- Subsidiaries mentioned.
- Other Companies mentioned in the report as competitors: we will also run a "Competitor check" to understand whether another company is just in the ecosystem / supply chain of the company or is really a competitor.
- Temporality (past, present, future) and Certainty (possible) of the events described, including Forward-looking statements.
John Snow Labs also provides offline modules for checking the Edgar database (Entity Linking to resolve an organization name to its official name, and Chunk Mappers to map a normalized name to the Edgar database), which are updated quarterly. We will use them to retrieve the official name of a company, its former names, the dates on which names were changed, etc.
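The kinds of information listed above can be thought of as (subject, relation, object) triples, which is the shape a graph library such as networkx would later consume. The following is only a minimal sketch of that data structure in plain Python: the `Triple` type, the relation labels and the company names are all hypothetical illustrations, not the actual output format of Finance NLP.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One extracted fact. Field names are illustrative assumptions."""
    subject: str
    relation: str
    obj: str


# Hypothetical, hand-written examples. A real run would derive these from
# the NER and Relation Extraction results, not from a hard-coded list.
TRIPLES = [
    Triple("Acme Corp", "has_trading_symbol", "ACME"),
    Triple("Acme Corp", "has_former_name", "Acme Widgets Inc"),
    Triple("Jane Roe", "works_for", "Acme Corp"),
    Triple("Acme Corp", "acquired", "Beta Labs"),
    Triple("Acme Corp", "has_subsidiary", "Acme Europe"),
    Triple("Acme Corp", "competes_with", "Gamma Systems"),
]


def build_graph(triples):
    """Return an adjacency map: subject -> relation -> list of objects."""
    graph = defaultdict(lambda: defaultdict(list))
    for t in triples:
        graph[t.subject][t.relation].append(t.obj)
    return graph


def neighbors(graph, subject, relation):
    """List the objects linked to `subject` by `relation` (empty if none)."""
    return list(graph.get(subject, {}).get(relation, []))


graph = build_graph(TRIPLES)
print(neighbors(graph, "Acme Corp", "competes_with"))
```

Keeping the relation label on every edge makes it straightforward to ask targeted questions, for example listing only competitors or only subsidiaries of a given company.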
- Juan Martinez @ John Snow Labs juan@johnsnowlabs.com
- john.doe@databricks.com
© 2022 Databricks, Inc. All rights reserved. The source in this notebook is provided subject to the Databricks License [https://databricks.com/db-license-source]. All included or referenced third party libraries are subject to the licenses set forth below.
library | description | license | source |
---|---|---|---|
johnsnowlabs==4.2.3 | Financial NLP library | Proprietary | https://www.johnsnowlabs.com/finance-nlp/ |
networkx==2.5 | Knowledge Graph creation | 3-clause BSD | https://networkx.org/ |
decorator==5.0.9 | Python decorators | 2-clause BSD | https://github.com/micheles/decorator |
plotly==5.1.0 | Visualization library | MIT | https://plotly.com/ |
To run this accelerator, set up JSL Partner Connect (AWS or Azure) and navigate to the My Subscriptions tab. Make sure you have a valid subscription for the workspace you clone this repo into, then install on the cluster as shown in the screenshot below, with the default options. You will receive an email from JSL when the installation completes.
Once the JSL installation completes successfully, clone this repo into a Databricks workspace. Attach the RUNME notebook to any cluster and execute the notebook via Run-All. A multi-step-job describing the accelerator pipeline will be created, and the link will be provided. Execute the multi-step-job to see how the pipeline runs.