utils.converters.PDF2Text (new) convert PDF to text #347

DavidDurman · 2025-01-17T11:38:50Z

No description provided.

vtalas · 2025-01-24T14:37:49Z

src/appmixer/utils/converters/converters.js

+        // getDocument() does not work over streams.
+        const loadingTask = pdfjslib.getDocument({ data: await arrayBuffer(readStream) });
+        const pdfDoc = await loadingTask.promise;
+        for (let i = 0; i < pdfDoc.numPages; i++) {


Based on this code, it seems that the entire PDF file is not loaded into memory at once but is processed page by page. Is that correct?

However, the entire output text is held in memory. I think we should either write the output text to a stream in chunks or enforce a strict limit on the size of the input PDF. With large PDF files containing a lot of text, we could hit the memory limit on the instance.
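
To make the two options above concrete, here is a minimal sketch, not the code in this PR. It assumes `pdfDoc` is the resolved `loadingTask.promise` from the diff and that each page exposes pdf.js's `getTextContent()`. The helper names `pageTexts`, `pdfTextStream` and `readWithLimit`, and the `maxBytes` parameter, are hypothetical.

```javascript
const { Readable } = require('node:stream');

// Hypothetical helper: yields the text of one page at a time, so only a
// single page's text is held in memory instead of the whole output.
async function* pageTexts(pdfDoc) {
    // pdf.js page numbers are 1-based.
    for (let pageNumber = 1; pageNumber <= pdfDoc.numPages; pageNumber++) {
        const page = await pdfDoc.getPage(pageNumber);
        const content = await page.getTextContent();
        // Items without a `str` property contribute an empty string.
        yield content.items.map((item) => item.str || '').join(' ') + '\n';
    }
}

// Hypothetical helper: wraps the generator in a Readable so the caller can
// pipe the text to its destination chunk by chunk.
function pdfTextStream(pdfDoc) {
    return Readable.from(pageTexts(pdfDoc));
}

// Hypothetical guard: buffers the input stream (getDocument() needs the data
// up front) but rejects as soon as more than `maxBytes` have been read.
async function readWithLimit(readStream, maxBytes) {
    const chunks = [];
    let total = 0;
    for await (const chunk of readStream) {
        total += chunk.length;
        if (total > maxBytes) {
            throw new Error(`PDF exceeds the ${maxBytes} byte limit`);
        }
        chunks.push(chunk);
    }
    return new Uint8Array(Buffer.concat(chunks));
}
```

With something like this, the converter could call `pdfjslib.getDocument({ data: await readWithLimit(readStream, limit) })` and return `pdfTextStream(pdfDoc)` rather than a single string. The right value for the limit would depend on the instance's memory budget.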

DavidDurman added 2 commits January 17, 2025 12:40

utils.converters.PDF2Text (new) convert PDF to text

5a95487

fix lint; increase version in bundle

d3abda1

DavidDurman force-pushed the pdf2text branch from adb9066 to d3abda1 Compare January 17, 2025 11:41

vtalas requested changes Jan 24, 2025
