Home
Welcome to the homepage of the Textension wiki. Here you can find technical details about the implementation as well as the key features.
- Flask - The web framework used (Python 3)
- Jinja2 - Template engine
- Bootstrap - Front-end component library
- Docker - Container / Dependency management
This project is being developed using an iterative approach. Therefore, no releases have been made yet, and the project is subject to drastic changes. No versioning practices will be followed until release. For a history of changes made to this project, see the commit history.
All of the key features relate to the actual text interaction object/page (`interact.html`), with some supporting features (file upload, etc.) not covered.
- Upload File
- Upload Image
Function | Description |
---|---|
|
- Readme.md - project readme, getting started
- file_upload.py - Main entrypoint for the Flask app; set `export FLASK_APP=file_upload.py` to run
- run.py - Alternate entrypoint that redirects to `file_upload.py`
Data storage for data used in the operation of the server (such as sample files)
Cascading stylesheets are found here, along with some of the image resources used.
Clientside javascript used to drive the UI/UX.
- autosize.min.js
  - Autosize 4.0.0
  - http://www.jacklmoore.com/autosize
- event.js
  - While `interact.js` implements the interactive functionality of the various tools, `event.js` is what actually creates and updates the corresponding objects in the DOM.
  - Details here.
- interact.js
  - Bulk of the interaction resides here
  - Details found here
- linguistic.js
  - Implementation of the core functionality of the “dictionary” and “context map” tools
  - Details found here
- main.js
  - Entrypoint of the clientside JavaScript. Initializes values and then enters `interact.js`
- textensionModel.js
  - New class (intended to be expanded if/when a rewrite happens) that will be the single source of truth for the actual textension data
    - That means: the OCR’d text, confidence levels, image/mesh map, and locations of the images (TODO: currently stores images inline as base64; make it async file requests instead)
Libraries that provide discrete functionality are stored here.
- Bootstrap 3
- Capture
  - Custom library for capturing from webcams
- Dropzone
- jQuery
Bulk of the Python server-side components.
Note: `file_upload.py` modifies its own `sys.path` so that it can `import` the files within this directory directly, without having to address them through the intermediate directories (e.g. `import pdf_text_extraction` instead of `import static.py.pdf_text_extraction`).
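
The path adjustment described in the note above could look roughly like the following. This is a minimal sketch, not the project's actual code: the helper name `add_import_dir` is hypothetical, and the `static/py` layout is assumed from the import example in the note.

```python
import os
import sys


def add_import_dir(base_dir, *parts):
    """Put base_dir/parts... at the front of sys.path (hypothetical helper)."""
    target = os.path.abspath(os.path.join(base_dir, *parts))
    if target not in sys.path:
        sys.path.insert(0, target)
    return target


# Assumed layout: the server-side modules live in static/py below the
# directory the app is started from.
module_dir = add_import_dir(os.getcwd(), "static", "py")

# After this, a module stored in that directory can be imported by its bare
# name, e.g.
#   import pdf_text_extraction
# instead of
#   import static.py.pdf_text_extraction
```

Inserting at the front of `sys.path` means these modules take precedence over same-named modules elsewhere, which is worth keeping in mind when naming new files.
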
Contains all the Flask (Jinja2) templates that are rendered server-side before being sent to the client.
- interact.html - Template for the interact page (the main page used when interacting with the site).
- index.html - Index (home) page.
- base.html - Basic reusable base used in `interact.html`
This list is non-exhaustive; it is meant as a place to start.
File | Note |
---|---|
interact.html | HTML element id `#download_data`, `<a>` links named “Download Text” and “Download Page” |
main.js | JavaScript function `downloadText` |
interact.js | JavaScript function `print` (actually downloads the image of the page) |
“whole backend” | Downloading the data from a page downloads all of the data generated about it, which the whole backend is involved in generating. |
File | Note |
---|---|
interact.js | Functions `toggleMode`, `disableMode`, `setActiveMode`, `openSpaces`, `closeSpaces`, `toggleSingeSpace` |
interact.html | `<a>` links with text “Open All Spaces” & “Close all Spaces” |
CAIS.py | Opening and closing spaces between lines of text only works if there are lines to expand; this script (“Content Aware Image Slicing”) produces them. |
file_upload.py | Responds to the web route `/interact`, calls all the routines for processing the data, and then templates the result into `interact.html` |
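
The idea of slicing a page into lines that can be spread apart can be illustrated as follows. The actual algorithm in CAIS.py is not documented here, so this is only a simplified sketch under stated assumptions: the page is a hypothetical grid of 0/1 pixels (1 = ink), and a line of text is taken to be any run of rows that contains ink. The function name `find_text_bands` is made up for the example.

```python
def find_text_bands(rows):
    """Return (start, end) row ranges, end exclusive, for each run of inked rows."""
    bands = []
    start = None
    for y, row in enumerate(rows):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                  # a new band of text begins
        elif not has_ink and start is not None:
            bands.append((start, y))   # a blank row closes the band
            start = None
    if start is not None:
        bands.append((start, len(rows)))
    return bands


# Tiny hypothetical page: two text lines separated by blank rows.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(find_text_bands(page))  # [(1, 3), (5, 6)]
```

The gaps between consecutive bands are the places where extra space could be opened or closed.
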
Similar to Open/Close All Spaces, the Vertical/Horizontal space mode buttons change the mode that those buttons use by calling a function, with the data embedded right in the HTML tag, e.g. `<input id="vertical-space" type="checkbox" onclick="toggleMode(this, false, false, false, false);" data-mode="vertical"/>`
File | Note |
---|---|
interact.html | `ocr-text` element id and associated elements within it. At the bottom of the file, find `var ocr = {{ ocr }}`; this is where Flask templates data into JS on load |
interact.js | `injectMetaData()` loads data from the aforementioned `var ocr = {{ ocr }}` onto the associated HTML |
event.js | Event handler for user interaction events; in this case, the `#ocr-text` on-click event, which loads/unloads the OCR data into the appropriate fields |
textension.py | Container class pulling together all the functionality implemented in other files |
ocr_*.py | Various OCR-related files; they use Tesseract. |
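
How server-side data can end up in a JavaScript variable such as `var ocr = {{ ocr }}` is sketched below. To stay self-contained, the sketch uses only the standard library and a plain string replacement as a stand-in for Jinja2 rendering; the payload structure and the one-line template are hypothetical, and the project's real serialization may differ.

```python
import json

# Hypothetical OCR payload; the real structure produced by the backend
# is not documented here.
ocr_data = [
    {"text": "Hello", "confidence": 96},
    {"text": "world", "confidence": 88},
]

# Stand-in for the line at the bottom of interact.html.
template = "<script>var ocr = {{ ocr }};</script>"

# json.dumps produces text that a browser can also parse as a JavaScript
# literal, so it can be dropped straight into the script block.
rendered = template.replace("{{ ocr }}", json.dumps(ocr_data))
print(rendered)
```

Serializing to JSON on the server keeps the client-side code simple: the value is already a native JavaScript array by the time `injectMetaData()` or similar code reads it.
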
All the same files as OCR, except pertaining specifically to the uncertainty values generated by tesseract.
File | Note |
---|---|
interact.html | `#ngram` element id and associated elements within it. At the bottom of the file, find `var ngram_plot = {{ ngram_plot }}`; this is where Flask templates data into JS on load |
interact.js | `setUniqueness()` and `drawOverlay` both interact with the n-gram usage plots |
event.js | Event handler for user interaction events; in this case, the `#ngram` on-click event, which hides/shows the usage plots |
textension.py | Container class pulling together all the functionality implemented in other files |
getngrams.py | Gets the n-gram chart from the Google n-gram usage-over-time portal. |
File | Note |
---|---|
interact.html | `#locations` element id and associated elements within it. |
interact.js | `drawLocations` turns on/off the drawing of location maps |
event.js | Event handler for location-related on-click events. |
googleMaps.py | Gets the map image from the Google Maps API |
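
Requesting a map image for a location could be sketched as below. This assumes the Google Maps Static API is the endpoint in use, which the table above does not confirm; the helper name `build_map_url`, the default zoom and size, and the placeholder API key are all made up for illustration. The sketch only builds the request URL and does not perform the network call.

```python
from urllib.parse import urlencode

# Assumption: the Static Maps endpoint is used to fetch map images.
STATIC_MAPS_ENDPOINT = "https://maps.googleapis.com/maps/api/staticmap"


def build_map_url(place, api_key, zoom=6, size=(400, 300)):
    """Build a Static Maps request URL for a place name (hypothetical helper)."""
    params = {
        "center": place,                  # place name or "lat,lng"
        "zoom": zoom,
        "size": "{}x{}".format(*size),    # width x height in pixels
        "key": api_key,
    }
    return STATIC_MAPS_ENDPOINT + "?" + urlencode(params)


print(build_map_url("Vancouver, BC", "YOUR_API_KEY"))
```

Using `urlencode` keeps place names with spaces or commas safe to embed in the query string.
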
Very similar to the OCR/OCR Uncertainty headings with the same files, but also:
File | Note |
---|---|
textensionModel.js | The beginnings of cleaning up the clientside data so that only one source of truth exists. |
File | Note |
---|---|
interact.html | `#draw` element id and associated elements within it (draw-color, draw-line, etc.) |
interact.js | `draw` turns drawing on/off. `changeMode` changes to drawing mode. |
event.js | Event handler for location-related on-click events. |
File | Note |
---|---|
interact.html | `#dictionary` element id and associated elements within it (draw-color, draw-line, etc.) |
linguistic.js | `defineWord()` |
event.js | Event handler for location-related on-click events. |
File | Note |
---|---|
interact.html | `#context*` element ids and associated elements within them (draw-color, draw-line, etc.) |
linguistic.js | `createContextMap()` |
event.js | Event handler for location-related on-click events. |