Document Processing and Speech API

A TypeScript-based Express API for document processing (PDF/DOCX), speech-to-text, and text-to-speech conversion using Google Cloud services.

Features

📄 Document Processing (PDF & DOCX)
🎤 Speech to Text Conversion
🔊 Text to Speech Conversion
⚡ Rate Limiting
🔒 Type Safety
📝 Standardized API Responses

Prerequisites

Node.js (v14 or higher)
TypeScript (v4 or higher)
Google Cloud Account with Speech & Text-to-Speech APIs enabled
Service Account Key from Google Cloud

Installation

Clone the repository:

git clone <repository-url>
cd document-speech-api

Install dependencies:

npm install

Set up environment variables by creating a .env file in the root directory:

PORT=3000
GOOGLE_APPLICATION_CREDENTIALS="path/to/your/service-account-key.json"
RATE_LIMIT_WINDOW_MS=900000
RATE_LIMIT_MAX_REQUESTS=100
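Below is a minimal sketch of how these variables might be read at startup. It assumes the app calls `dotenv.config()` (or `import 'dotenv/config'`) first, so the `.env` values are already in `process.env`. The `loadConfig` name, the `AppConfig` shape, and the fallback defaults (which mirror the values above) are hypothetical.

```typescript
// Hypothetical config loader; assumes dotenv has already populated process.env.
interface AppConfig {
  port: number;
  credentialsPath: string | undefined;
  rateLimitWindowMs: number;
  rateLimitMaxRequests: number;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, using the fallback when the variable is unset or empty.
function intFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Env): AppConfig {
  return {
    port: intFromEnv(env, "PORT", 3000),
    credentialsPath: env.GOOGLE_APPLICATION_CREDENTIALS,
    rateLimitWindowMs: intFromEnv(env, "RATE_LIMIT_WINDOW_MS", 900000), // 15 minutes
    rateLimitMaxRequests: intFromEnv(env, "RATE_LIMIT_MAX_REQUESTS", 100),
  };
}
```

Usage: `const config = loadConfig(process.env);`. Failing on a malformed number at startup is usually easier to debug than a rate limiter that silently receives `NaN`.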

Project Structure

src/
├── controllers/
│   ├── documentReaderController.ts
│   ├── speechToTextController.ts
│   └── textToSpeechController.ts
├── services/
│   ├── documentReaderService.ts
│   ├── speechToTextService.ts
│   └── textToSpeechService.ts
├── routes/
│   └── apiRoutes.ts
├── types/
│   ├── express.d.ts
│   └── api_response.ts
├── utils/
│   ├── api_response.ts
│   └── logger.ts
├── middlewares/
│   ├── errorHandlerMiddleware.ts
│   └── rateLimitMiddleware.ts
├── app.ts
└── server.ts

Dependencies

{
  "dependencies": {
    "@google-cloud/speech": "latest",
    "@google-cloud/text-to-speech": "latest",
    "express": "latest",
    "multer": "latest",
    "pdf-parse": "latest",
    "mammoth": "latest",
    "express-rate-limit": "latest",
    "dotenv": "latest"
  },
  "devDependencies": {
    "@types/express": "latest",
    "@types/multer": "latest",
    "@types/node": "latest",
    "typescript": "latest",
    "ts-node": "latest",
    "nodemon": "latest"
  }
}

API Endpoints

1. Document Reading

POST /api/read-document
Content-Type: multipart/form-data

Request

file: PDF or DOCX file

Response

{
  "code": 0,
  "status": "success",
  "message": "Document read successfully",
  "data": {
    "text": "extracted text content",
    "fileName": "document.pdf",
    "fileType": "application/pdf"
  }
}
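Because the endpoint accepts two formats, the service presumably branches on the uploaded file's MIME type. Below is a minimal sketch of that dispatch. The extractors are injected so the routing logic stands on its own, and `TextExtractor`, `Extractors` and `extractDocumentText` are hypothetical names. In the real service the PDF branch would likely wrap pdf-parse (in its 1.x API, `(await pdf(buffer)).text`) and the DOCX branch mammoth (`(await mammoth.extractRawText({ buffer })).value`).

```typescript
// Hypothetical MIME-type dispatch for the document reader.
// Multer's Buffer is a Uint8Array subclass, so it can be passed in directly.
type TextExtractor = (bytes: Uint8Array) => Promise<string>;

const PDF_MIME = "application/pdf";
const DOCX_MIME =
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

interface Extractors {
  pdf: TextExtractor;  // e.g. a wrapper around pdf-parse
  docx: TextExtractor; // e.g. a wrapper around mammoth.extractRawText
}

async function extractDocumentText(
  bytes: Uint8Array,
  mimeType: string,
  extractors: Extractors,
): Promise<string> {
  switch (mimeType) {
    case PDF_MIME:
      return extractors.pdf(bytes);
    case DOCX_MIME:
      return extractors.docx(bytes);
    default:
      // Only PDF and DOCX are documented for this endpoint.
      throw new Error(`Unsupported file type: ${mimeType}`);
  }
}
```

Injecting the extractors also makes the dispatch testable without real PDF or DOCX fixtures.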

2. Speech to Text

POST /api/speech-to-text
Content-Type: multipart/form-data

Request

audio: Audio file (MP3)

Response

{
  "code": 0,
  "status": "success",
  "message": "Audio transcribed successfully",
  "data": {
    "text": "transcribed text",
    "audioFileName": "audio.mp3",
    "duration": 10.5
  }
}
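On the server, an uploaded MP3 presumably ends up in a `recognize` call on the `@google-cloud/speech` client, which accepts the audio inline as base64 along with a recognition config. Below is a minimal sketch that builds that request object, assuming in-memory uploads (Multer's memory storage exposes the bytes as `file.buffer`). `buildRecognizeRequest` and the `en-US` default are hypothetical. Support for MP3 as an encoding depends on the API version (Google's reference has listed it under `v1p1beta1`), and fields such as `sampleRateHertz` may also be needed depending on the file.

```typescript
// Hypothetical builder for a Google Speech-to-Text recognize() request.
interface RecognizeRequest {
  audio: { content: string }; // base64-encoded audio bytes
  config: {
    encoding: "MP3"; // assumption: uploads are MP3, as the endpoint docs state
    languageCode: string;
  };
}

function buildRecognizeRequest(
  audio: Buffer,
  languageCode: string = "en-US", // hypothetical default
): RecognizeRequest {
  if (audio.length === 0) {
    throw new Error("Audio file is empty");
  }
  return {
    audio: { content: audio.toString("base64") },
    config: { encoding: "MP3", languageCode },
  };
}
```

A handler could then call `const [response] = await client.recognize(request);` on a `SpeechClient` instance and join the transcripts from `response.results`.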

3. Text to Speech

POST /api/text-to-speech
Content-Type: application/json

Request

{
  "text": "Text to convert to speech",
  "voice": "en-US",
  "speed": 1.0
}

text is required; voice and speed are optional.
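Below is a minimal sketch of how this body might be validated and turned into a `synthesizeSpeech` request for `@google-cloud/text-to-speech`. It assumes `voice` carries a language code (as in the example above), that `speed` maps to Google's `speakingRate` (documented range 0.25 to 4.0), and that audio is requested as MP3 to match the `audio/mpeg` response. The function name and the `en-US` default are hypothetical.

```typescript
// Hypothetical validation of the /api/text-to-speech body and mapping to a
// Google Text-to-Speech synthesizeSpeech() request object.
interface TextToSpeechBody {
  text?: unknown;
  voice?: unknown;
  speed?: unknown;
}

interface SynthesizeRequest {
  input: { text: string };
  voice: { languageCode: string };
  audioConfig: { audioEncoding: "MP3"; speakingRate: number };
}

function buildSynthesizeRequest(body: TextToSpeechBody): SynthesizeRequest {
  const text = body.text;
  // text is the only required field.
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("text is required");
  }
  // Assumption: "voice" is a language code such as "en-US".
  const languageCode = typeof body.voice === "string" ? body.voice : "en-US";
  const speakingRate = body.speed === undefined ? 1.0 : Number(body.speed);
  // Google documents speakingRate as valid in the range [0.25, 4.0].
  if (!Number.isFinite(speakingRate) || speakingRate < 0.25 || speakingRate > 4.0) {
    throw new Error("speed must be a number between 0.25 and 4.0");
  }
  return {
    input: { text },
    voice: { languageCode },
    audioConfig: { audioEncoding: "MP3", speakingRate },
  };
}
```

A handler could then call `const [response] = await client.synthesizeSpeech(request);` and send `response.audioContent` after `res.type('audio/mpeg')`.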

Response

Audio stream (audio/mpeg) if successful
Error response if failed:

{
  "code": 1,
  "status": "error",
  "message": "Error message"
}
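The JSON bodies above share one envelope (`code`, `status`, `message`, and an optional `data`), which is presumably what `utils/api_response.ts` produces. Below is a minimal sketch of such helpers. The names `ApiResponse`, `successResponse` and `errorResponse` are hypothetical; the `code` values (0 for success, 1 for errors) come from the examples.

```typescript
// Hypothetical shape of the standardized response envelope.
interface ApiResponse<T = undefined> {
  code: number;
  status: "success" | "error";
  message: string;
  data?: T;
}

function successResponse<T>(message: string, data: T): ApiResponse<T> {
  return { code: 0, status: "success", message, data };
}

// Error bodies carry no data field, matching the error example above.
function errorResponse(message: string): ApiResponse {
  return { code: 1, status: "error", message };
}
```

Usage in a controller: `res.status(400).json(errorResponse("No file uploaded"));`, with the HTTP status chosen from the list below.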

Error Codes

0: Success
400: Bad Request
401: Unauthorized
403: Forbidden
404: Not Found
500: Internal Server Error

Usage Examples

Using axios

import axios from 'axios';

// Document Reading
const readDocument = async (file: File) => {
  const formData = new FormData();
  formData.append('file', file);
  
  try {
    const response = await axios.post('/api/read-document', formData, {
      headers: {
        'Content-Type': 'multipart/form-data'
      }
    });
    return response.data;
  } catch (error) {
    console.error('Error reading document:', error);
    throw error;
  }
};

// Speech to Text
const convertSpeechToText = async (audioFile: File) => {
  const formData = new FormData();
  formData.append('audio', audioFile);
  
  try {
    const response = await axios.post('/api/speech-to-text', formData, {
      headers: {
        'Content-Type': 'multipart/form-data'
      }
    });
    return response.data;
  } catch (error) {
    console.error('Error converting speech to text:', error);
    throw error;
  }
};

// Text to Speech
const convertTextToSpeech = async (text: string) => {
  try {
    const response = await axios.post('/api/text-to-speech', 
      { text },
      { responseType: 'blob' }
    );
    return response.data;
  } catch (error) {
    console.error('Error converting text to speech:', error);
    throw error;
  }
};

Running the Application

Development mode:

npm run dev

Production mode:

npm run build
npm start

Setting Up Google Cloud Credentials

Create a project in Google Cloud Console
Enable Speech-to-Text and Text-to-Speech APIs
Create a service account and download the key file
Set the path to your key file in the GOOGLE_APPLICATION_CREDENTIALS environment variable
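Google's client libraries read `GOOGLE_APPLICATION_CREDENTIALS` on their own, so a wrong path typically surfaces only when the first API call fails. The hypothetical helper below (not something the project is known to include) fails earlier, at startup.

```typescript
import { existsSync } from "fs";

// Hypothetical startup check: fail fast if the key file path is unset or wrong.
function assertCredentialsFile(env: Record<string, string | undefined>): string {
  const keyPath = env.GOOGLE_APPLICATION_CREDENTIALS;
  if (!keyPath) {
    throw new Error("GOOGLE_APPLICATION_CREDENTIALS is not set");
  }
  if (!existsSync(keyPath)) {
    throw new Error(`Credentials file not found: ${keyPath}`);
  }
  return keyPath;
}
```

Call `assertCredentialsFile(process.env)` after the .env file has been loaded and before the server starts listening.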

Rate Limiting

The API includes rate limiting to prevent abuse. Default settings:

100 requests per 15-minute window (RATE_LIMIT_WINDOW_MS=900000, RATE_LIMIT_MAX_REQUESTS=100)
Customize these values in the .env file
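The project uses express-rate-limit, whose default in-memory store counts requests per client within fixed time windows. To make the default numbers concrete, here is a hypothetical fixed-window counter (an illustration, not the library's code) that uses the window and limit from .env. Within one 15-minute (900000 ms) window a client's 101st request is rejected, and the count resets once the window expires. In the real middleware these two values would be passed as the `windowMs` and `limit` options (older releases call the latter `max`).

```typescript
// Hypothetical fixed-window limiter illustrating the default settings.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly windowMs: number;
  private readonly maxRequests: number;

  constructor(windowMs: number, maxRequests: number) {
    this.windowMs = windowMs;
    this.maxRequests = maxRequests;
  }

  // Returns true if a request from clientKey at time `now` (ms) is allowed.
  allow(clientKey: string, now: number = Date.now()): boolean {
    const current = this.windows.get(clientKey);
    if (!current || now - current.start >= this.windowMs) {
      // First request, or the previous window has expired: open a new window.
      this.windows.set(clientKey, { start: now, count: 1 });
      return true;
    }
    if (current.count < this.maxRequests) {
      current.count += 1;
      return true;
    }
    return false; // express-rate-limit answers this case with HTTP 429 by default
  }
}
```

Usage: `const limiter = new FixedWindowLimiter(900000, 100);` then `limiter.allow(req.ip)` per request.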

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Repository: Cypher-O/voice-bridge