1 change: 1 addition & 0 deletions modules/ROOT/nav.adoc
@@ -2,6 +2,7 @@
* xref:index.adoc[IDP Overview]
* xref:release-notes.adoc[Release Notes]
* xref:document-processing.adoc[]
* xref:document-quality-and-model-performance.adoc[]
* xref:analyzing-documents-with-einstein.adoc[]
* xref:creating-document-actions.adoc[]
** xref:enhancing-data-extraction-with-einstein.adoc[]
11 changes: 11 additions & 0 deletions modules/ROOT/pages/_partials/document-preparation.adoc
@@ -0,0 +1,11 @@
// tag::documentPreparation[]
== Document Preparation and Testing

Before creating document actions, ensure your sample documents represent the quality and variety of the documents you'll process in production. The accuracy of your document actions depends significantly on the quality and diversity of your sample documents.

Include both high-quality and challenging examples in your test set. Cover a range of document layouts and formats, font styles and sizes, and content such as tables, forms, and other complex layouts.

Start with high-quality native digital PDFs to establish baseline accuracy, then gradually test with more challenging documents, such as scanned PDFs or images. Monitor confidence scores across document types, and adjust prompts and thresholds based on the results.
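
To illustrate what monitoring confidence scores across document types can look like, the following Python sketch groups test results by document type, averages them, and flags fields that fall below a threshold. It's a minimal sketch under stated assumptions: the record layout, document type labels, field names, scores (expressed as percentages), and the 80% threshold are all hypothetical and don't reflect the MuleSoft IDP response format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical test results: (document type, field name, confidence score in percent).
# These values are illustrative only; they are not output from MuleSoft IDP.
results = [
    ("native_pdf", "invoice_number", 97),
    ("native_pdf", "total", 95),
    ("scanned_pdf", "invoice_number", 82),
    ("scanned_pdf", "total", 64),
    ("image", "invoice_number", 71),
]


def mean_confidence_by_type(rows):
    """Group confidence scores by document type and return the mean for each type."""
    scores = defaultdict(list)
    for doc_type, _field, confidence in rows:
        scores[doc_type].append(confidence)
    return {doc_type: mean(values) for doc_type, values in scores.items()}


def low_confidence_fields(rows, threshold):
    """Return (document type, field) pairs whose confidence is below the threshold."""
    return [(doc_type, field) for doc_type, field, confidence in rows if confidence < threshold]


averages = mean_confidence_by_type(results)
# Print document types from highest to lowest average confidence.
for doc_type, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(f"{doc_type}: {avg:.1f}%")

# Fields that may need prompt changes or human review (hypothetical 80% threshold).
print(low_confidence_fields(results, threshold=80))
```

Comparing the per-type averages against your native digital PDF baseline shows which document types need prompt adjustments or a different review threshold.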

For more details, see xref:document-quality-and-model-performance.adoc[].
// end::documentPreparation[]
3 changes: 3 additions & 0 deletions modules/ROOT/pages/analyzing-documents-with-einstein.adoc
@@ -24,6 +24,9 @@ include::partial$permissions.adoc[tag=permissionBuild]

include::partial$einstein.adoc[tags=einsteinRequisites;!shortIntro]

//Document Preparation and Testing
include::partial$document-preparation.adoc[tag=documentPreparation]

== Create a Generic Document Action and Enable Customize Schema

To analyze documents and fully customize the output structure, create a document action of the Generic type and enable *Customize Schema*:
@@ -24,17 +24,18 @@ include::partial$document-action.adoc[tag=modelUsage]

The Submit Document to MuleSoft IDP action step executes document actions by impersonating a user in your organization. Therefore, you must use authentication credentials of a user that has the Execute Published Actions permission in Anypoint Platform.

See xref:rpa-builder::toolbox-mulesoft-idp-submit-document-to-mulesoft-idp.adoc[] for configuration details.
See xref:rpa-builder::toolbox-mulesoft-idp-submit-document-to-mulesoft-idp.adoc[MuleSoft RPA: Submit Document to MuleSoft IDP] for configuration details.

== Retrieve the Results of the Execution

To retrieve the results of a document action execution, use the Retrieve Results from MuleSoft IDP action step in RPA. This action step queries the results of a document action execution by using the Execution ID that you provided in the corresponding Submit Document to MuleSoft IDP action step.

include::partial$document-action.adoc[tag=modelUsage]

See xref:rpa-builder::toolbox-mulesoft-idp-retrieve-results-from-mulesoft-idp.adoc[] for configuration details.
See xref:rpa-builder::toolbox-mulesoft-idp-retrieve-results-from-mulesoft-idp.adoc[MuleSoft RPA: Retrieve Results from MuleSoft IDP] for configuration details.

== See Also

* xref:document-quality-and-model-performance.adoc[]
* xref:creating-document-actions.adoc[]
* xref:publishing-document-actions.adoc[]
@@ -106,7 +106,8 @@ To confirm the endpoints to call to trigger document action executions and retri

== See Also

* xref:rpa-builder::toolbox-mulesoft-idp-submit-document-to-mulesoft-idp.adoc[]
* xref:rpa-builder::toolbox-mulesoft-idp-retrieve-results-from-mulesoft-idp.adoc[]
* xref:document-quality-and-model-performance.adoc[]
* xref:rpa-builder::toolbox-mulesoft-idp-submit-document-to-mulesoft-idp.adoc[MuleSoft RPA: Submit Document to MuleSoft IDP]
* xref:rpa-builder::toolbox-mulesoft-idp-retrieve-results-from-mulesoft-idp.adoc[MuleSoft RPA: Retrieve Results from MuleSoft IDP]
* xref:creating-document-actions.adoc[]
* xref:publishing-document-actions.adoc[]
3 changes: 3 additions & 0 deletions modules/ROOT/pages/creating-document-actions.adoc
@@ -28,6 +28,9 @@ include::partial$permissions.adoc[tag=permissionManage]

include::partial$permissions.adoc[tag=permissionBuild]

//Document Preparation and Testing
include::partial$document-preparation.adoc[tag=documentPreparation]

[[upload-files]]
== Upload Sample Files and Preview the Results

1 change: 1 addition & 0 deletions modules/ROOT/pages/document-processing.adoc
@@ -41,6 +41,7 @@ For configuration and usage instructions, see: xref:automate-document-processing

== See Also

* xref:document-quality-and-model-performance.adoc[]
* xref:creating-document-actions.adoc[]
* xref:publishing-document-actions.adoc[]
* xref:reviewing-processed-documents.adoc[]
63 changes: 63 additions & 0 deletions modules/ROOT/pages/document-quality-and-model-performance.adoc
@@ -0,0 +1,63 @@
= Document Quality and Model Performance

The accuracy of data extraction in MuleSoft IDP depends significantly on the type and quality of the documents you process. Understanding how document types and their characteristics affect extraction accuracy helps you set realistic expectations and achieve the best results.

== Native Digital Documents

Native digital documents contain embedded text that is directly accessible within the document's internal structure. When processing these documents:

* LLMs can extract text without optical character recognition (OCR) processing.
* Extraction typically yields high-accuracy results, with confidence scores of 90% or higher.
* Use these documents whenever possible to achieve the best extraction performance.

== Scanned Documents and Images

Scanned documents and images require OCR processing to convert visual elements into machine-readable text. When processing these documents:

* Model accuracy depends heavily on the accuracy of the underlying OCR technology.
* Results vary based on image quality and document complexity.
* These documents might require human review more frequently than native digital documents.

== Factors Affecting Data Extraction

The following factors impact the accuracy of data extraction from scanned documents and images:

* *Image Quality*
+
Higher-resolution images produce better results. Clear, sharp images with good contrast improve extraction accuracy, while background artifacts, shadows, and blurring reduce it.

* *Document Layout*
+
Documents with multiple columns, overlapping elements, or irregular layouts are more challenging to process. Inconsistent spacing, unusual fonts, and mixed formatting styles can also reduce accuracy. Skewed or rotated documents might require preprocessing.

* *Text Characteristics*
+
Standard fonts are easier to process than decorative or unusual fonts. Very small or very large text may be difficult to extract accurately. Most models struggle with handwritten content.

== Improving Extraction Results

If extraction results are inaccurate, consider the following improvements:

* *Document Quality Improvements*

** Use higher quality source documents when possible.
** Improve scanning resolution and quality.
** Standardize document formats across your organization.

* *Prompt Optimization*

** Be specific about field locations and expected formats.
** Include examples in prompts for complex fields.
** Test prompts with various document qualities.
** Iterate and refine based on results.

* *Model Selection*

** Test different models with your specific document types.

== See Also

* xref:document-processing.adoc[]
* xref:creating-document-actions.adoc[]
* xref:supported-models.adoc[]
* xref:analyzing-documents-with-einstein.adoc[]
@@ -30,4 +30,5 @@ include::partial$document-action.adoc[tag=modelUsage]
== See Also

* xref:example-einstein-prompts.adoc[]
* xref:document-quality-and-model-performance.adoc[]
* xref:creating-document-actions.adoc[]
1 change: 1 addition & 0 deletions modules/ROOT/pages/index.adoc
@@ -37,6 +37,7 @@ Einstein doesn't use customer data to train any models for document analysis in
== See Also

* xref:document-processing.adoc[]
* xref:document-quality-and-model-performance.adoc[]
* xref:analyzing-documents-with-einstein.adoc[]
* xref:creating-document-actions.adoc[]
* xref:publishing-document-actions.adoc[]
5 changes: 5 additions & 0 deletions modules/ROOT/pages/ms-automation-credits-2.adoc
@@ -12,6 +12,11 @@ Analyzing documents with Einstein consumes Automation Credits and Einstein Reque

The services contain features that use generative AI technology that may be provided by one or more third-parties as listed in the documentation applicable to the services. This documentation provides information and product requirements specific to these generative AI features and providers, including applicable third-party acceptable use policies which the customer must comply with when using the generative AI technology. Due to the nature of generative AI, the output that it generates may be unpredictable, and may include inaccurate or harmful responses. Before using any generative AI output, the customer is solely responsible for reviewing the output for accuracy, safety, and compliance with applicable laws and third-party acceptable use policies. The customer assumes all responsibility for output generated by the services and, as between Salesforce and the customer, such output is customer data.

include::idp::partial$einstein-model.adoc[]

[NOTE]
Einstein doesn't use customer data to train any models for document analysis in IDP.

== See Also

* xref:ms-automation-credits-usage-types.adoc[]
1 change: 1 addition & 0 deletions modules/ROOT/pages/reviewing-processed-documents.adoc
@@ -30,6 +30,7 @@ If there are more than one results page, click *Submit and Next* and continue th

== See Also

* xref:document-quality-and-model-performance.adoc[]
* xref:automate-document-processing-with-the-idp-api.adoc[]
* xref:automate-document-processing-with-rpa.adoc[]
* xref:adding-reviewers.adoc[]
1 change: 1 addition & 0 deletions modules/ROOT/pages/supported-models.adoc
@@ -62,5 +62,6 @@ Select *Show properties* under *choices* to see the details.

== See Also

* xref:document-quality-and-model-performance.adoc[]
* https://platform.openai.com/docs/guides/text#prompt-engineering[OpenAI's Prompt Engineering Guide]
* xref:analyzing-documents-with-einstein.adoc[]