Merge pull request #117 from SuffolkLITLab/docx-labeler
Add an MVP feature that lets you automatically add labels to a DOCX file with OpenAI
nonprofittechy authored Nov 21, 2023
2 parents 4153990 + af3914a commit 5946b56
Showing 6 changed files with 446 additions and 3 deletions.
158 changes: 158 additions & 0 deletions docassemble/ALDashboard/data/questions/docx_wrangling.yml
@@ -0,0 +1,158 @@
---
include:
- nav.yml
---
modules:
- .docx_wrangling
---
objects:
- new_docx: DAFile
---
id: interview order
mandatory: True
code: |
  docx_file
  if not started_task.ready():
    waiting_screen
  show_stats
  ask_about_labels
  save_changes
  show_final_docx
---
event: waiting_screen
question: |
  Please wait while we process your file
subquestion: |
  <div class="spinner-border" role="status">
    <span class="visually-hidden">Processing...</span>
  </div>
reload: True
---
continue button field: show_stats
question: |
  Your DOCX file has been processed
subquestion: |
  GPT-4 found ${ len(draft_labels) } labels in your DOCX file.
  On the next screen, you can review and make any necessary changes
  to the draft Jinja2 labels.
---
question: |
  Upload a DOCX file
subquestion: |
  We will use GPT-4 to try to add variables in the
  [AssemblyLine
  convention](https://suffolklitlab.org/docassemble-AssemblyLine-documentation/docs/label_variables)
  to your DOCX file.
  Your upload can have up to 300 pages, but the result cannot be larger than about 4,000 words. The result will
  only include the modified paragraphs.
fields:
  - DOCX file: docx_file
    datatype: file
    accept: |
      ".docx, application/vnd.openxmlformats-officedocument.wordprocessingml.document"
  - Include custom people names: include_custom_people_names
    datatype: yesno
    default: False
  - Custom people names: custom_people_names_text
    datatype: area
    show if: include_custom_people_names
    help: |
      Enter a list of custom nouns to use, one per line. Optionally, add
      an explanation after each name, separated by a colon. For example:
      ```
      agents: people making decisions in healthcare proxy or power of attorney
      attorneys_in_fact: people granted power of attorney
      ```
      These names will be used to label variables
      in the DOCX file.
---
code: |
  started_task = background_action('task_draft_labels')
---
event: task_draft_labels
code: |
  if include_custom_people_names:
    custom_people_names = [tuple(line.split(':')) for line in custom_people_names_text.split('\n')]
  else:
    custom_people_names = None
  draft_labels = get_labeled_docx_runs(docx_file[0].path(), custom_people_names = custom_people_names)
  background_response_action('save_draft_labels', draft_labels=draft_labels)
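Review note: the list comprehension in `task_draft_labels` splits each line on every colon, so a description that itself contains a colon produces a tuple with more than two items, and a blank line produces `('',)`. Below is a minimal sketch of a more tolerant parser. It assumes the consumer wants `(name, description)` pairs and that a missing description can be an empty string; neither assumption is confirmed by this diff, and `parse_custom_people_names` is a hypothetical helper, not part of the commit.

```python
from typing import List, Optional, Tuple


def parse_custom_people_names(text: str) -> Optional[List[Tuple[str, str]]]:
    """Turn 'name: description' lines into (name, description) pairs.

    Hypothetical helper: assumes 2-tuples are wanted downstream and that
    an absent description may be represented as an empty string.
    """
    pairs: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines instead of emitting ('',)
        # Split on the first colon only, so descriptions may contain colons.
        name, _, description = line.partition(":")
        pairs.append((name.strip(), description.strip()))
    # Mirror the interview's convention of None when nothing was entered.
    return pairs or None
```

Using `str.partition` keeps every result the same shape whether or not the user typed an explanation, which makes unpacking in a later `for name, description in ...` loop safe.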
---
event: save_draft_labels
code: |
  draft_labels = action_argument('draft_labels')
  background_response()
---
objects:
- final_labels: DAList.using(object_type=DAObject, auto_gather=False, gathered=True)
---
code: |
  import docx
  original_doc = docx.Document(docx_file[0].path())
  label_question = []
  for idx, item in enumerate(draft_labels):
    new_obj = final_labels.appendObject()
    new_obj.paragraph = item[0]
    new_obj.run = item[1]
    new_obj.draft_label_text = item[2]
    # Results will be a tuple of paragraph number, run, modified text with label
    label_question.append({
      'label': original_doc.paragraphs[item[0]].runs[item[1]].text,
      'field': f'final_labels[{idx}].label',
      'default': item[2],
      'label above field': True,
      'grid': 8,
      'hide if': f'final_labels[{idx}].leave_unchanged'
    })
    label_question.append({
      'label': 'Leave unchanged',
      'field': f'final_labels[{idx}].leave_unchanged',
      'datatype': 'yesno',
      'grid': {
        'width': 4,
        'end': True,
      },
      'label above field': True,
    })
  del docx
  del original_doc
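Review note: the loop that builds `label_question` pairs each draft label with an editable text field and a "Leave unchanged" checkbox. The sketch below reproduces that pairing outside docassemble so it can be run on its own. It substitutes a nested list of strings for the python-docx document and assumes each draft label is a 3-tuple of paragraph index, run index, and suggested text; `build_label_fields` and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def build_label_fields(
    run_texts: List[List[str]],
    draft_labels: List[Tuple[int, int, str]],
) -> List[Dict[str, Any]]:
    """Build two field dicts per draft label: an editable label and a skip checkbox.

    run_texts[p][r] stands in for original_doc.paragraphs[p].runs[r].text.
    """
    fields: List[Dict[str, Any]] = []
    for idx, (paragraph, run, draft_text) in enumerate(draft_labels):
        fields.append({
            "label": run_texts[paragraph][run],  # original run text
            "field": f"final_labels[{idx}].label",
            "default": draft_text,  # suggested replacement text
            "label above field": True,
            "grid": 8,
            "hide if": f"final_labels[{idx}].leave_unchanged",
        })
        fields.append({
            "label": "Leave unchanged",
            "field": f"final_labels[{idx}].leave_unchanged",
            "datatype": "yesno",
            "grid": {"width": 4, "end": True},
            "label above field": True,
        })
    return fields


# Illustrative data only; the label names are made up for this example.
sample_runs = [["Dear ", "John Smith", ","], ["Signed on ", "January 1"]]
sample_labels = [(0, 1, "{{ users[0] }}"), (1, 1, "{{ signature_date }}")]
fields = build_label_fields(sample_runs, sample_labels)
```

Each draft label contributes exactly two entries, so the field at position `2 * idx` is the text box and the one at `2 * idx + 1` is its checkbox.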
---
code: |
  new_doc_obj = update_docx(
    docx_file[0].path(),
    [
      (item.paragraph, item.run, item.label, 0)
      for item in final_labels
      if not item.leave_unchanged
    ]
  )
  new_docx.initialize(filename=docx_file[0].filename)
  new_doc_obj.save(new_docx.path())
  new_docx.commit()
  del new_doc_obj
  save_changes = True
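Review note: the `save_changes` block filters out items marked "Leave unchanged" and passes the rest to `update_docx` as 4-tuples. The sketch below shows that filter-then-apply step on plain lists. It assumes `update_docx` replaces the text of the run at the given paragraph and run index; the meaning of the trailing `0` is not shown in this diff, so the sketch carries it along without interpreting it. `collect_edits` and `apply_edits` are hypothetical names.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple

Edit = Tuple[int, int, str, int]


def collect_edits(final_labels: Iterable) -> List[Edit]:
    """Keep only reviewed items the user did not mark as unchanged."""
    return [
        (item.paragraph, item.run, item.label, 0)
        for item in final_labels
        if not item.leave_unchanged
    ]


def apply_edits(run_texts: List[List[str]], edits: List[Edit]) -> List[List[str]]:
    """Return a copy of run_texts with each edited run's text replaced.

    Assumption: this stands in for update_docx; the fourth tuple element
    is ignored because its meaning is not visible in the diff.
    """
    updated = [list(paragraph) for paragraph in run_texts]
    for paragraph, run, text, _ in edits:
        updated[paragraph][run] = text
    return updated


# Illustrative data only.
reviewed = [
    SimpleNamespace(paragraph=0, run=1, label="{{ users[0] }}", leave_unchanged=False),
    SimpleNamespace(paragraph=1, run=1, label="ignored", leave_unchanged=True),
]
runs = [["Dear ", "John Smith", ","], ["Signed on ", "January 1"]]
edits = collect_edits(reviewed)
new_runs = apply_edits(runs, edits)
```

Working on a copy mirrors the interview's behavior of writing the result to a new file (`new_docx`) rather than modifying the upload in place.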
---
continue button field: ask_about_labels
question: |
  Review the labels
subquestion: |
  On the left is the original text. On the right, you will see the modified
  text with Jinja2 added. Make any changes you need.
fields:
  - code: label_question
---
event: show_final_docx
question: |
  Here is your new DOCX file
subquestion: |
  ${ new_docx }
2 changes: 2 additions & 0 deletions docassemble/ALDashboard/data/questions/menu.yml
@@ -55,6 +55,8 @@ buttons:
    image: paperclip
  - PDF tools: ${ interview_url(i=user_info().package + ":pdf_wrangling.yml", reset=1) }
    image: file-pdf
  - Auto-label DOCX: ${ interview_url(i=user_info().package + ":docx_wrangling.yml", reset=1) }
    image: file-word
  - Compile Bootstrap theme: ${ interview_url(i=user_info().package + ":compile_bootstrap.yml", reset=1) }
    image: bootstrap
  - Install fonts: ${ interview_url(i=user_info().package + ":install_fonts.yml", reset=1)}