This repository has been archived by the owner on Apr 10, 2019. It is now read-only.

chakki-works/csr_analysis


csr_analysis

This repository generates word clouds from the CSR reports of Japanese companies.

The application consists of two steps:

  1. PDF to text
  2. Make word cloud from text file

Installation

Python 3 is required.

Install the required packages from requirements.txt:

pip install -r requirements.txt

Usage

PDF to text

Extract the text of a CSR report with PDFMiner.

First, place the PDF file in the raw directory (./data/raw/). Next, edit the following parameters to match your file:

input_file = './data/raw/XXX.pdf'
interim_dir = './data/interim/XXX/'
processed_file = './data/processed/XXX.txt'
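The three parameters above share one report name, so they can be derived from it. The following is a minimal sketch of that idea, not code from this repository: `build_paths` and `ensure_output_dirs` are hypothetical helpers, and the `data_dir` default simply mirrors the layout shown above.

```python
from pathlib import Path


def build_paths(report_name, data_dir="./data"):
    """Return the three paths for one report, following the layout above.

    Hypothetical helper: it only mirrors the raw/interim/processed structure
    used by the parameters in this README.
    """
    base = Path(data_dir)
    return {
        "input_file": base / "raw" / f"{report_name}.pdf",
        "interim_dir": base / "interim" / report_name,
        "processed_file": base / "processed" / f"{report_name}.txt",
    }


def ensure_output_dirs(paths):
    """Create the interim directory and the parent of the processed file."""
    paths["interim_dir"].mkdir(parents=True, exist_ok=True)
    paths["processed_file"].parent.mkdir(parents=True, exist_ok=True)


if __name__ == "__main__":
    # "XXX" is the placeholder name used in this README.
    for key, value in build_paths("XXX").items():
        print(f"{key} = {value}")
```

Deriving all three paths from one name avoids a mismatch when switching to another report.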

The text file is generated by running the following command:

python pdf_to_text.py
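For orientation, the extraction step could look roughly like the sketch below. It is an illustration, not the contents of pdf_to_text.py. It assumes a recent pdfminer.six that provides `pdfminer.high_level.extract_text`; the PDFMiner version pinned in requirements.txt may differ. It also ignores the interim directory. `clean_text` and `pdf_to_text` are hypothetical names.

```python
from pathlib import Path


def clean_text(raw):
    """Turn form feeds (page breaks) into newlines, trim lines, drop empty ones."""
    lines = (line.strip() for line in raw.replace("\f", "\n").splitlines())
    return "\n".join(line for line in lines if line)


def pdf_to_text(input_file, processed_file):
    """Extract text from one PDF and write it to a UTF-8 text file."""
    # Imported lazily so clean_text() can be used without pdfminer.six installed.
    # Assumption: pdfminer.six with the high-level API is available.
    from pdfminer.high_level import extract_text

    text = clean_text(extract_text(str(input_file)))
    out = Path(processed_file)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(text, encoding="utf-8")
    return out


if __name__ == "__main__":
    # Placeholder paths from this README; replace XXX with a real file name.
    pdf_to_text("./data/raw/XXX.pdf", "./data/processed/XXX.txt")
```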

Make word cloud from text file

The word cloud is made in an IPython notebook. Start the notebook server:

ipython notebook

Open wordcloud.ipynb and set the parameter in the first cell to the name of the file produced in the previous step:

file_name = './data/processed/XXX.txt'

After that, run all cells.
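As a rough picture of what such a notebook does, the sketch below counts words and renders them with the `wordcloud` package. It is not taken from wordcloud.ipynb. Japanese text has no spaces between words, so the tokens must come from a morphological analyzer; which one the notebook uses is not stated here, so the sketch takes an already tokenized list. `count_words` and `render_cloud` are hypothetical names, and `font_path` must point to a font with Japanese glyphs on your machine.

```python
from collections import Counter


def count_words(tokens, stopwords=(), min_length=2):
    """Count tokens, skipping stopwords and tokens shorter than min_length."""
    stop = set(stopwords)
    return Counter(t for t in tokens if len(t) >= min_length and t not in stop)


def render_cloud(frequencies, font_path, out_path="wordcloud.png"):
    """Draw a word cloud from a word -> count mapping and save it as an image."""
    # Imported lazily so count_words() works without the wordcloud package.
    from wordcloud import WordCloud

    cloud = WordCloud(
        font_path=font_path,  # assumption: a font that contains Japanese glyphs
        width=800,
        height=600,
        background_color="white",
    )
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(out_path)
    return out_path
```

Filtering single-character tokens and stopwords before counting keeps particles and auxiliary verbs from dominating the image. The threshold and the stopword list are choices to tune per report.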
