-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #117 from SuffolkLITLab/docx-labeler
Add an MVP feature that lets you automatically add labels to a DOCX file with OpenAI
- Loading branch information
Showing
6 changed files
with
446 additions
and
3 deletions.
There are no files selected for viewing
158 changes: 158 additions & 0 deletions
158
docassemble/ALDashboard/data/questions/docx_wrangling.yml
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,158 @@ | ||
--- | ||
include: | ||
- nav.yml | ||
--- | ||
modules: | ||
- .docx_wrangling | ||
--- | ||
objects: | ||
- new_docx: DAFile | ||
--- | ||
id: interview order | ||
mandatory: True | ||
code: | | ||
docx_file | ||
if not started_task.ready(): | ||
waiting_screen | ||
show_stats | ||
ask_about_labels | ||
save_changes | ||
show_final_docx | ||
--- | ||
event: waiting_screen | ||
question: | | ||
Please wait while we process your file | ||
subquestion: | | ||
<div class="spinner-border" role="status"> | ||
<span class="visually-hidden">Processing...</span> | ||
</div> | ||
reload: True | ||
--- | ||
continue button field: show_stats | ||
question: | | ||
Your DOCX file has been processed | ||
subquestion: | | ||
GPT-4 found ${ len(draft_labels) } labels in your DOCX file. | ||
On the next screen, you can review and make any necessary changes | ||
to the draft Jinja2 labels. | ||
--- | ||
question: | | ||
Upload a DOCX file | ||
subquestion: | | ||
We will use GPT-4 to try to add variables in the | ||
[AssemblyLine | ||
convention](https://suffolklitlab.org/docassemble-AssemblyLine-documentation/docs/label_variables) | ||
to your DOCX file. | ||
Your upload can have up to 300 pages, but the result cannot be larger than about 4,000 words. The result will | ||
only include the modified paragraphs. | ||
fields: | ||
- DOCX file: docx_file | ||
datatype: file | ||
accept: | | ||
".docx, application/vnd.openxmlformats-officedocument.wordprocessingml.document" | ||
- Include custom people names: include_custom_people_names | ||
datatype: yesno | ||
default: False | ||
- Custom people names: custom_people_names_text | ||
datatype: area | ||
show if: include_custom_people_names | ||
help: | | ||
Enter a list of custom nouns to use, one per line. Optionally, add | ||
an explanation after each name, separated by a colon. For example: | ||
``` | ||
agents: people making decisions in healthcare proxy or power of attorney | ||
attorneys_in_fact: people granted power of attorney | ||
``` | ||
These names will be used to label variables | ||
in the DOCX file. | ||
--- | ||
code: | | ||
started_task = background_action('task_draft_labels') | ||
--- | ||
event: task_draft_labels | ||
code: | | ||
if include_custom_people_names: | ||
custom_people_names = [tuple(line.split(':')) for line in custom_people_names_text.split('\n')] | ||
else: | ||
custom_people_names = None | ||
draft_labels = get_labeled_docx_runs(docx_file[0].path(), custom_people_names = custom_people_names) | ||
background_response_action('save_draft_labels', draft_labels=draft_labels) | ||
--- | ||
event: save_draft_labels | ||
code: | | ||
draft_labels = action_argument('draft_labels') | ||
background_response() | ||
--- | ||
objects: | ||
- final_labels: DAList.using(object_type=DAObject, auto_gather=False, gathered=True) | ||
--- | ||
code: | | ||
import docx | ||
original_doc = docx.Document(docx_file[0].path()) | ||
label_question = [] | ||
for idx, item in enumerate(draft_labels): | ||
new_obj = final_labels.appendObject() | ||
new_obj.paragraph = item[0] | ||
new_obj.run = item[1] | ||
new_obj.draft_label_text = item[2] | ||
# Results will be a tuple of paragraph number, run, modified text with label | ||
label_question.append({ | ||
'label': original_doc.paragraphs[item[0]].runs[item[1]].text, | ||
'field': f'final_labels[{idx}].label', | ||
'default': item[2], | ||
'label above field': True, | ||
'grid': 8, | ||
'hide if': f'final_labels[{idx}].leave_unchanged' | ||
}) | ||
label_question.append({ | ||
'label': 'Leave unchanged', | ||
'field': f'final_labels[{idx}].leave_unchanged', | ||
'datatype': 'yesno', | ||
'grid': { | ||
'width': 4, | ||
'end': True, | ||
}, | ||
'label above field': True, | ||
}) | ||
del docx | ||
del original_doc | ||
--- | ||
code: | | ||
new_doc_obj = update_docx( | ||
docx_file[0].path(), | ||
[ | ||
(item.paragraph, item.run, item.label, 0) | ||
for item in final_labels | ||
if not item.leave_unchanged | ||
] | ||
) | ||
new_docx.initialize(filename=docx_file[0].filename) | ||
new_doc_obj.save(new_docx.path()) | ||
new_docx.commit() | ||
del new_doc_obj | ||
save_changes = True | ||
--- | ||
continue button field: ask_about_labels | ||
question: | | ||
Review the labels | ||
subuestion: | | ||
On the left is the original text. On the right, you will see the modified | ||
text with Jinja2 added. Make any changes you need. | ||
fields: | ||
- code: label_question | ||
--- | ||
event: show_final_docx | ||
question: | | ||
Here is your new DOCX file | ||
subquestion: | | ||
${ new_docx } |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.