
Document Analysis using NLP, Neo4j and Elastic. For example clustering of clients into sectors based on shared documents, scoring them on Cyber Security.


Smart Documents and Analysis

Analyze a set of documents (from companies) to give a useful snapshot or summary. Original documents are left unmodified; the only files written are the reports in the 'output' folder.

Table of Contents

  1. Expected folder structure
  2. Scripts
  3. Script Structure
  4. What can you do with this project?
  5. Keeping confidential information confidential by default
  6. Key sections in this guide
  7. Underlying technologies
  8. Configuration
  9. First time Setup
  10. Running the application

Expected folder structure

Most scripts have a parameter to point to the client documents (e.g. '..'). There are also parameters to exclude folders (such as the z_scripts folder itself) from the analysis.

__todo: check__

  • CLIENT DOCS
      • Client 1 Folder
          • sample word doc
          • sub folder
              • sample xl doc
      • Client 2 Folder
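The folder parameters described above can be sketched as a simple directory walk. This is a minimal illustration, not the project's actual code; the function name, default root and the excluded folder names are assumptions:

```python
import os

def find_client_docs(root="..", exclude_dirs=("z_scripts", "output")):
    """Walk the client document tree, skipping excluded folders.

    `root` and `exclude_dirs` mirror the script parameters described
    above; the defaults here are assumptions, not the project's values.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded folders in place so os.walk does not descend into them
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return found
```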

Scripts

The key user scripts are in the main app directory. The top of each script contains parameters to modify its behaviour.

  • one_client_pdf.py - gathers all client data into one PDF
  • snapshot.py - makes a summary of client documents (e.g. last updated, size, number, sentiment)
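The kind of summary snapshot.py produces can be sketched as below. This is a minimal illustration under assumed names; the real script also adds NLP measures such as sentiment:

```python
import os
from datetime import datetime

def snapshot_stats(paths):
    """Summarise a set of documents: count, total size, most recent update.

    A hypothetical sketch of the snapshot described above, covering only
    the filesystem-derived fields (last updated, size, number).
    """
    stats = [os.stat(p) for p in paths]
    return {
        "number": len(paths),
        "total_size_bytes": sum(s.st_size for s in stats),
        "last_updated": (
            datetime.fromtimestamp(max(s.st_mtime for s in stats)).isoformat()
            if stats else None
        ),
    }
```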

Script Structure

add text ..

What can you do with this project?

This project allows you to:

  • Quickly summarize key information from a collection of documents.
  • Identify trends and patterns in client communications.
  • Automate the process of answering common business questions.
  • Integrate with other tools like Power Automate for automated workflows.

add text ..

Other key features:

add text ..

Keeping confidential information confidential by default

For obvious reasons, only generic code is shared in this GitHub project; no client information or knowledge is included. This has the benefit that you add only your own documents when you run your secure local copy. Other key project features with confidentiality in mind:

  • Information is stored locally by default in Elastic Search
  • The LLM (Llama) runs locally (with options to use other remote LLMs)
  • Confidential information is redacted before being sent to the LLM (see the setting in the config file if you wish to turn this off)

To further limit exposure, we recommend ingesting only emails and documents that have already been sent externally. Since the project is open source, you can fully audit the code before use.
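The redaction step mentioned above could look something like the following. This is a minimal regex-based sketch under assumed patterns; the project's actual redaction (configurable in config.conf) may work differently, e.g. via NLP entity recognition:

```python
import re

# Hypothetical patterns for confidential tokens; the real project's
# redaction rules may be broader (names, account numbers, etc.).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d \-]{7,}\d"),
}

def redact(text):
    """Replace confidential tokens with placeholders before any LLM call."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```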

Key sections in this guide

add text ..

Underlying technologies

add text ..

Configuration

The main configuration file is located at app/config/config.conf. This file controls various aspects of the application, including:

  • Data source locations
  • LLM selection (local or remote)
  • Redaction settings
  • API keys

Refer to the comments within the app/config/config.conf file for detailed explanations of each setting.
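A config file covering the aspects listed above might look like the sketch below. Every section and key name here is hypothetical; the authoritative reference is the commented app/config/config.conf file itself:

```ini
; Hypothetical sketch of app/config/config.conf -- real keys may differ
[data]
client_docs_dir = ..
exclude_folders = z_scripts, output

[llm]
provider = llama_local      ; or a remote LLM endpoint
redact_before_send = true   ; see "Keeping confidential information confidential"

[keys]
; API keys/tokens are requested on first run and stored locally
```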

__todo: implement this__

  • The main configuration file is in app/config/config.conf. This config file is shared by the ingest script, the Bot and the Application. Please edit it using the notes in the app/config folder.
  • Some APIs (Copilot, OpenAI, Teamworks helpdesk) require tokens the first time they are run. Please consult the documentation of these tools to retrieve these.
  • The script will ask you for these tokens and store them locally in a plain-text JSON file (token-storage-local.json). While this file is excluded from git, you may wish to review who has local access to it, as it stores sensitive information.
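The first-run token flow described above can be sketched as follows. This is an illustration only, assuming a flat JSON mapping of service name to token; the project's actual storage format may differ:

```python
import json
import os

TOKEN_FILE = "token-storage-local.json"  # excluded from git; guard local access

def get_token(service):
    """Return a stored API token, prompting for it on first use.

    A hypothetical sketch of the first-run flow described above.
    """
    tokens = {}
    if os.path.exists(TOKEN_FILE):
        with open(TOKEN_FILE) as f:
            tokens = json.load(f)
    if service not in tokens:
        # First run for this service: ask the user and persist the token
        tokens[service] = input(f"Enter API token for {service}: ")
        with open(TOKEN_FILE, "w") as f:
            json.dump(tokens, f)
    return tokens[service]
```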

First time Setup

To set up the project on your local machine and run it for the first time:

  1. Check out or download the project as a folder onto the host computer from the source __todo: update link__ https://github.com/paulbrowne-irl/smart-document-analysis

  2. Install Python (3.12 or higher) in the usual way. The pip and virtualenv tools are also needed.

  3. Install Python dependencies - in a terminal window, at the project root:

```shell
# Create the virtual environment
virtualenv venv
# Activate the virtual environment
source venv/bin/activate
# Install Python dependencies for this environment
pip install -r requirements.txt
```

Running the application

add text ..


Contributing

Contributions to this project are welcome! Please see the CONTRIBUTING.md file for guidelines on how to contribute.
