This repository has been archived by the owner on Oct 29, 2024. It is now read-only.

databricks-industry-solutions/jsl-financial-nlp


POC

Financial Solution Accelerator: Drawing a Company Ecosystem Graph

This accelerator helps you process Financial Annual Reports (10-K filings), or even Wikipedia data about companies, using John Snow Labs Finance NLP Named Entity Recognition, Relation Extraction and Assertion Status to extract the following information about companies:

  • Information about the Company itself (Trading Symbol, State, Address, Contact Information) and other names the Company is known by (alias, former name).
  • People (usually management and C-level) working in that company and their past experiences, including roles and companies
  • Acquisition events, including acquisition dates, and any subsidiaries mentioned.
  • Other companies mentioned in the report as competitors. A "Competitor check" is also run to determine whether another company is a true competitor or merely part of the company's ecosystem or supply chain.
  • Temporality (past, present, future) and Certainty (possible) of events described, including Forward-looking statements.
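
As a rough illustration of how the extracted items listed above could be assembled into an ecosystem graph, here is a minimal sketch in plain Python. The triples, entity names and relation labels are all hypothetical placeholders, not output from the Finance NLP models, and the adjacency map stands in for the networkx graph that the accelerator lists for Knowledge Graph creation.

```python
# Hypothetical (subject, relation, object) triples standing in for the output
# of the NER and Relation Extraction steps. Names and labels are illustrative.
triples = [
    ("Example Corp", "has_alias", "ExCo"),
    ("Example Corp", "has_ticker", "EXC"),
    ("Jane Doe", "works_for", "Example Corp"),
    ("Example Corp", "acquired", "Sample Ltd"),
    ("Example Corp", "competes_with", "Rival Inc"),
]


def build_graph(triples):
    """Return an adjacency map: node -> list of (relation, neighbor) pairs."""
    graph = {}
    for subject, relation, obj in triples:
        graph.setdefault(subject, []).append((relation, obj))
        # Register the object too, so nodes without outgoing edges still exist.
        graph.setdefault(obj, [])
    return graph


def neighbors(graph, node, relation):
    """List the nodes reachable from `node` through edges labeled `relation`."""
    return [target for rel, target in graph.get(node, []) if rel == relation]


graph = build_graph(triples)
competitors = neighbors(graph, "Example Corp", "competes_with")
acquisitions = neighbors(graph, "Example Corp", "acquired")
```

Keeping the relation label on each edge is what lets a later step separate competitors from acquisitions or employees when the graph is drawn.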

John Snow Labs also provides offline modules for querying the Edgar database: Entity Linking, which resolves an organization name to its official name, and Chunk Mappers, which map a normalized name to Edgar database records. These modules are updated quarterly. We will use them to retrieve a company's official name, former names, the dates on which names were changed, etc.
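
To make the idea of resolving a mention to an official name concrete, here is a toy sketch that uses a plain dictionary lookup. This is not how the John Snow Labs Entity Linking or Chunk Mapper modules work internally; the lookup table, the company names and the `normalize` and `resolve` helpers are all hypothetical.

```python
import re

# Hypothetical lookup table standing in for an offline, Edgar-derived mapping.
OFFICIAL_NAMES = {
    "example": {
        "official_name": "EXAMPLE CORP",
        "former_names": ["SAMPLE HOLDINGS INC"],
    },
}

# Common legal suffixes dropped before lookup (illustrative, not exhaustive).
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "co"}


def normalize(name):
    """Lowercase the name, strip punctuation and drop legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def resolve(name):
    """Return the record for a company mention, or None if it is unknown."""
    return OFFICIAL_NAMES.get(normalize(name))


record = resolve("Example, Inc.")
unknown = resolve("Unknown LLC")
```

A real linker has to handle misspellings and abbreviations that exact-match normalization like this would miss.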



John Snow Labs Financial Solution Accelerator


© 2022 Databricks, Inc. All rights reserved. The source in this notebook is provided subject to the Databricks License [https://databricks.com/db-license-source]. All included or referenced third party libraries are subject to the licenses set forth below.

| library | description | license | source |
|---|---|---|---|
| johnsnowlabs==4.2.3 | Financial NLP library | Proprietary | https://www.johnsnowlabs.com/finance-nlp/ |
| networkx==2.5 | Knowledge Graph creation | 3-clause BSD | https://networkx.org/ |
| decorator==5.0.9 | Python decorators | 2-clause BSD | https://github.com/micheles/decorator |
| plotly==5.1.0 | Visualization library | MIT | https://plotly.com/ |

Instructions

To run this accelerator, set up JSL Partner Connect (AWS, Azure) and navigate to the My Subscriptions tab. Make sure you have a valid subscription for the workspace you clone this repo into, then install on the cluster as shown in the screenshot below, using the default options. You will receive an email from JSL when the installation completes.


Once the JSL installation completes successfully, clone this repo into a Databricks workspace. Attach the RUNME notebook to any cluster and execute it via Run-All. This creates a multi-step job describing the accelerator pipeline and provides a link to it. Execute the multi-step job to see how the pipeline runs.