This project implements a simulated Optical Character Recognition (OCR) service that extracts text from PDF files uploaded by users. It reads the text layer embedded in each PDF rather than performing image-based recognition. Built with Node.js, Express, Multer, and pdf-parse, the application is designed to be easy to set up and to integrate into other systems that need PDF text extraction.
- PDF Text Extraction: Allows users to upload PDF files and extracts readable text from them.
- File Upload Management: Uses Multer to handle multipart file uploads, with configurable storage options.
- Error Handling: Handles failures so the service stays stable, and returns meaningful error messages to the client.
- Node.js: The script runs in a Node.js environment.
- express: Web framework for Node.js.
- multer: Middleware for handling multipart/form-data, used for uploading files.
- pdf-parse: Library to parse and extract text from PDF files.
- fs.promises: The promise-based API of the Node.js File System module, used for file operations.
- path: Utilities for handling and transforming file paths.
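The dependency list names fs.promises and path but does not show how they are used. One plausible role, and this is an assumption rather than a description of the project's code, is managing a temporary on-disk copy of each upload and guaranteeing it is deleted afterwards. The helper name `withTempPdf` and the fixed file name `upload.pdf` are hypothetical.

```javascript
const fs = require("node:fs").promises;
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: writes an uploaded PDF buffer into a fresh temporary
// directory, passes the file path to `work`, and always deletes the
// directory afterwards, even when `work` throws.
async function withTempPdf(buffer, work) {
  // mkdtemp appends six random characters to the prefix, so each call
  // gets its own directory.
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "pdf-extract-"));
  // A fixed file name avoids trusting the client-supplied original name.
  const filePath = path.join(dir, "upload.pdf");
  try {
    await fs.writeFile(filePath, buffer);
    return await work(filePath);
  } finally {
    // Remove the whole temporary directory, ignoring it if already gone.
    await fs.rm(dir, { recursive: true, force: true });
  }
}
```

Inside `work`, the file could be read back with `fs.readFile` and handed to pdf-parse, whose result exposes the extracted text as `data.text`. If Multer is configured with memory storage, the buffer can go straight to pdf-parse and this temporary-file step is unnecessary.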
Before installing, ensure that Node.js and npm (Node Package Manager) are installed on your system. You can download and install Node.js from the official Node.js website.
To install and use pdf-extract-api-digitalocean, follow these steps:
Clone the Repository: Clone the pdf-extract-api-digitalocean repository to your local machine:
git clone https://github.com/samestrin/pdf-extract-api-digitalocean/
Configure the Port: Set the PORT environment variable to choose the port the server listens on (default: 3000).
Navigate to the project's root directory, install the dependencies, and start the server:
npm install
npm start
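The PORT fallback can be sketched as a small helper. A common Node.js idiom is `process.env.PORT || 3000`; this version additionally rejects values that are not valid TCP ports. The name `resolvePort` is hypothetical and not taken from the project.

```javascript
// Hypothetical helper: reads PORT from an environment object and falls back
// to 3000, the default stated in this README.
function resolvePort(env = process.env, fallback = 3000) {
  const raw = env.PORT;
  // Treat a missing or blank PORT as "use the default".
  if (raw === undefined || raw.trim() === "") return fallback;
  const port = Number(raw);
  // Reject non-integers and values outside the usable TCP port range.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid PORT value: ${raw}`);
  }
  return port;
}
```

In an Express server the result would be passed to `app.listen(resolvePort())`. Failing fast on a malformed PORT surfaces configuration mistakes at startup instead of leaving the server bound to an unexpected port.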
Endpoint: /extract
Method: POST
Extract text from a PDF file.
Parameters:
- file: The PDF file to process, sent as a multipart/form-data field.
Use a tool like Postman or curl to make a request:
curl -F "file=@path_to_pdf_file.pdf" http://localhost:[PORT]/extract
The server will process the uploaded file and return the extracted text in JSON format.
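To call the endpoint from another Node.js service instead of curl, a minimal client sketch follows. It assumes Node.js 18 or later for the global `fetch`, `FormData`, and `Blob`, and it reuses the `file` field name from the curl example. The function names `buildExtractForm` and `extractText` are hypothetical. Because the exact shape of the JSON response is not documented here, the parsed body is returned unchanged.

```javascript
const fs = require("node:fs").promises;
const path = require("node:path");

// Builds a multipart/form-data body with the PDF under the "file" field,
// the same field the curl example sends.
async function buildExtractForm(pdfPath) {
  const bytes = await fs.readFile(pdfPath);
  const form = new FormData();
  form.append(
    "file",
    new Blob([bytes], { type: "application/pdf" }),
    path.basename(pdfPath)
  );
  return form;
}

// Posts the PDF to /extract and returns the parsed JSON body.
// fetch sets the multipart Content-Type (with boundary) automatically.
async function extractText(pdfPath, baseUrl = "http://localhost:3000") {
  const response = await fetch(new URL("/extract", baseUrl), {
    method: "POST",
    body: await buildExtractForm(pdfPath),
  });
  if (!response.ok) {
    // Error bodies may not be JSON, so read them as plain text.
    const detail = await response.text();
    throw new Error(`POST /extract failed: HTTP ${response.status} ${detail}`);
  }
  return response.json();
}
```

For example, `extractText("./sample.pdf").then(console.log)` would print the server's JSON response, where `sample.pdf` stands in for any local PDF.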
When a request fails, the API returns one of the following HTTP status codes with an error message:
- 400 Bad Request: Invalid request parameters.
- 500 Internal Server Error: Unexpected server error.
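One way the 400 and 500 cases could be separated is sketched below. This is an illustration under stated assumptions, not the project's actual handler. It assumes a Multer-style file object held in memory (with a `buffer` property), and the helper names `validateUpload` and `toErrorResponse` are hypothetical. The PDF check looks for the `%PDF-` header at the very start of the file. Some PDF readers tolerate leading bytes before the header, so this check is deliberately strict.

```javascript
// Every well-formed PDF begins with this ASCII signature.
const PDF_MAGIC = Buffer.from("%PDF-", "ascii");

// Hypothetical validation step: returns null when the upload looks like a
// PDF, or a { status, body } pair describing a 400 response otherwise.
function validateUpload(file) {
  if (!file) {
    return { status: 400, body: { error: "No file uploaded; send it in the 'file' field." } };
  }
  // Compare the first bytes of the upload with the PDF signature.
  const header = file.buffer ? file.buffer.subarray(0, PDF_MAGIC.length) : Buffer.alloc(0);
  if (!header.equals(PDF_MAGIC)) {
    return { status: 400, body: { error: "Uploaded file is not a PDF." } };
  }
  return null;
}

// Maps an unexpected exception (for example, a parser crash) to a 500
// response, logging details server-side without exposing them to the client.
function toErrorResponse(err) {
  console.error(err);
  return { status: 500, body: { error: "Internal server error." } };
}
```

In an Express route, the result could be applied with `res.status(problem.status).json(problem.body)`, so that client mistakes become 400 responses and only genuinely unexpected failures reach the 500 path.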
Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes or improvements.
This project is licensed under the MIT License - see the LICENSE file for details.