Riskcovry-Hackathon


PROBLEM STATEMENT CHOSEN:

Valid Discharge Summary Prediction (Problem 2)

Working website on a public GCP URL: http://34.70.84.140:8000/ (if the link doesn't work, please reach out to us at jigyas15@gmail.com or nayak.amit.blr@gmail.com)


UPDATE => WORKS AS AN API

api

Note that the PDF must be sent as form-data, with the key set to 'file'.
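
As a rough illustration of the note above, here is a minimal client sketch that uses only the Python standard library. The endpoint URL (the site root is assumed here), the helper names `build_multipart` and `post_pdf`, and the placeholder filename are all assumptions made for this example; the response format is not documented in this README, so the raw bytes are returned as-is.

```python
import urllib.request
import uuid


def build_multipart(field, filename, data, boundary=None):
    """Encode one file as a multipart/form-data body.

    Returns a (body_bytes, content_type_header) tuple.
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_pdf(url, pdf_path):
    """POST the PDF under the form key 'file' and return the raw response bytes."""
    with open(pdf_path, "rb") as fh:
        body, content_type = build_multipart("file", "summary.pdf", fh.read())
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()
```

A call might look like `post_pdf("http://34.70.84.140:8000/", "discharge_summary.pdf")`, assuming the API is served at the root path.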

TECHSTACK

Django, HTML, CSS, JavaScript, PyTorch + fastai, Google Cloud Platform (VM, Cloud Storage)

BASIC OVERVIEW

  • Batched text extraction from multipage PDFs using Google's Vision API
  • Transfer learning on an AWD-LSTM text classification model
  • Display of results on an embedded Django server running on a Google Cloud VM

DETAILS

  • Trained on a small handpicked dataset that includes edge cases such as health records and medical research papers. All files are stored in a GCP bucket.
  • ~89% accuracy attained with the AWD-LSTM model after 3 epochs on a training set of ~50 PDF files
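
The transfer-learning step described above could look roughly like the sketch below. It assumes the fastai v2 API (`fastai.text.all`); the notebook in this repository may use a different fastai version with different names. The function name, the `text` and `label` column names, the batch size, the dropout multiplier, and the export filename are all hypothetical choices made for illustration.

```python
def fine_tune_classifier(df, epochs=3, export_name="export.pkl"):
    """Fine-tune a pretrained AWD-LSTM on a DataFrame with 'text' and 'label' columns."""
    # Imported lazily so this module still loads where fastai is not installed.
    from fastai.text.all import (
        AWD_LSTM,
        TextDataLoaders,
        accuracy,
        text_classifier_learner,
    )

    # A small batch size, since the dataset holds only ~50 documents (assumed value).
    dls = TextDataLoaders.from_df(
        df, text_col="text", label_col="label", valid_pct=0.2, bs=8
    )
    learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
    # fine_tune runs one epoch with the pretrained encoder frozen,
    # then unfreezes it and trains for `epochs` more epochs.
    learn.fine_tune(epochs)
    learn.export(export_name)
    return learn
```

With so few documents, the validation split is only a handful of files, so an accuracy figure measured this way can shift noticeably between runs.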

CODE OVERVIEW

  • TextExtraction.ipynb: handpicked PDF dataset on Cloud Storage -> Vision API -> JSON on Cloud Storage
  • Model.ipynb: JSON on Cloud Storage -> training of AWD-LSTM -> exported model
  • scripts/testmodel.py: runs the fine-tuned model for the Django front end
  • bytes: Django project files
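
To illustrate the hand-off between the two notebooks, here is a small sketch of how the text might be recovered from one Vision API output JSON file before training. It assumes the standard output layout for file annotation (a top-level `responses` list whose entries carry `fullTextAnnotation.text`); the function name is hypothetical and not taken from the notebooks.

```python
import json


def text_from_vision_output(raw_json):
    """Join the per-page text found in one Vision API output file."""
    doc = json.loads(raw_json)
    pages = []
    for response in doc.get("responses", []):
        # Pages with no detected text carry no fullTextAnnotation entry.
        text = response.get("fullTextAnnotation", {}).get("text", "")
        if text:
            pages.append(text)
    return "\n".join(pages)
```

Because a multipage PDF is split across several output files, the strings returned for each file would still need to be concatenated per document.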

HAVE A LOOK

front-end

TEAM BYTES

  • Jigya Shah
  • Amitha Nayak