A TypeScript-based Express API for document processing (PDF/DOCX), speech-to-text, and text-to-speech conversion using Google Cloud services.
- Document Processing (PDF & DOCX)
- Speech to Text Conversion
- Text to Speech Conversion
- Rate Limiting
- Type Safety
- Standardized API Responses
- Node.js (v14 or higher)
- TypeScript (v4 or higher)
- Google Cloud Account with Speech & Text-to-Speech APIs enabled
- Service Account Key from Google Cloud
- Clone the repository:
git clone <repository-url>
cd document-speech-api
- Install dependencies:
npm install
- Set up environment variables:
Create a .env file in the root directory:
PORT=3000
GOOGLE_APPLICATION_CREDENTIALS="path/to/your/service-account-key.json"
RATE_LIMIT_WINDOW_MS=900000
RATE_LIMIT_MAX_REQUESTS=100
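The variables above could be read and validated at startup roughly as follows. This is a minimal sketch, not the project's actual configuration code: the `AppConfig` interface and the `loadConfig` and `parsePositiveInt` helpers are hypothetical names, and the fallbacks simply mirror the sample values shown in the `.env` file.

```typescript
// Hypothetical config shape; field names are illustrative, not taken from the project.
interface AppConfig {
  port: number;
  credentialsPath: string | undefined;
  rateLimitWindowMs: number;
  rateLimitMaxRequests: number;
}

// Parse a positive integer from an env value, falling back to a default
// when the value is missing or is not a positive whole number.
function parsePositiveInt(value: string | undefined, fallback: number): number {
  if (value === undefined) {
    return fallback;
  }
  const parsed = Number(value);
  const isWholeNumber = isFinite(parsed) && Math.floor(parsed) === parsed;
  return isWholeNumber && parsed > 0 ? parsed : fallback;
}

// Build the config from an env-like object (pass process.env in a real app).
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  return {
    port: parsePositiveInt(env["PORT"], 3000),
    credentialsPath: env["GOOGLE_APPLICATION_CREDENTIALS"],
    rateLimitWindowMs: parsePositiveInt(env["RATE_LIMIT_WINDOW_MS"], 900000),
    rateLimitMaxRequests: parsePositiveInt(env["RATE_LIMIT_MAX_REQUESTS"], 100),
  };
}
```

Taking the environment as a parameter rather than reading `process.env` directly keeps the function easy to test with plain objects.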
src/
├── controllers/
│   ├── documentReaderController.ts
│   ├── speechToTextController.ts
│   └── textToSpeechController.ts
├── services/
│   ├── documentReaderService.ts
│   ├── speechToTextService.ts
│   └── textToSpeechService.ts
├── routes/
│   └── apiRoutes.ts
├── types/
│   ├── express.d.ts
│   └── api_response.ts
├── utils/
│   ├── api_response.ts
│   └── logger.ts
├── middlewares/
│   ├── errorHandlerMiddleware.ts
│   └── rateLimitMiddleware.ts
├── app.ts
└── server.ts
{
  "dependencies": {
    "@google-cloud/speech": "latest",
    "@google-cloud/text-to-speech": "latest",
    "express": "latest",
    "multer": "latest",
    "pdf-parse": "latest",
    "mammoth": "latest",
    "express-rate-limit": "latest",
    "dotenv": "latest"
  },
  "devDependencies": {
    "@types/express": "latest",
    "@types/multer": "latest",
    "@types/node": "latest",
    "typescript": "latest",
    "ts-node": "latest",
    "nodemon": "latest"
  }
}
POST /api/read-document
Content-Type: multipart/form-data
file: PDF or DOCX file
{
  "code": 0,
  "status": "success",
  "message": "Document read successfully",
  "data": {
    "text": "extracted text content",
    "fileName": "document.pdf",
    "fileType": "application/pdf"
  }
}
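Since the endpoint accepts only PDF and DOCX files, the upload's MIME type could be checked before any parsing is attempted. The sketch below is an assumption about how such a check might look; `SUPPORTED_DOCUMENT_TYPES` and `isSupportedDocument` are hypothetical names, not identifiers from this project.

```typescript
// MIME types for the two formats the endpoint is documented to accept.
const SUPPORTED_DOCUMENT_TYPES: readonly string[] = [
  "application/pdf",
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
];

// Returns true when the uploaded file's MIME type is PDF or DOCX.
// The comparison is case-insensitive because MIME types are not case-sensitive.
function isSupportedDocument(mimeType: string): boolean {
  return SUPPORTED_DOCUMENT_TYPES.indexOf(mimeType.toLowerCase()) !== -1;
}
```

A check like this relies on the MIME type reported by the client, so a stricter implementation might also inspect the file contents.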
POST /api/speech-to-text
Content-Type: multipart/form-data
audio: Audio file (MP3)
{
  "code": 0,
  "status": "success",
  "message": "Audio transcribed successfully",
  "data": {
    "text": "transcribed text",
    "audioFileName": "audio.mp3",
    "duration": 10.5
  }
}
POST /api/text-to-speech
Content-Type: application/json
{
  "text": "Text to convert to speech",
  "voice": "en-US", // optional
  "speed": 1.0 // optional
}
- Audio stream (audio/mpeg) if successful
- Error response if failed:
{
  "code": 1,
  "status": "error",
  "message": "Error message"
}
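The text-to-speech request body has one required field and two optional ones, so a controller would likely validate it before calling Google Cloud. The following is a sketch under stated assumptions: `validateTextToSpeechRequest` and its types are hypothetical, the defaults are taken from the example values above, and the 0.25 to 4.0 range for `speed` assumes it is passed through as the Text-to-Speech `speakingRate`.

```typescript
// Validated request with defaults applied.
interface TextToSpeechRequest {
  text: string;
  voice: string;
  speed: number;
}

type ValidationResult =
  | { ok: true; value: TextToSpeechRequest }
  | { ok: false; message: string };

// Validate an untrusted JSON body and fill in defaults for optional fields.
function validateTextToSpeechRequest(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, message: "Request body must be a JSON object" };
  }
  const raw = body as Record<string, unknown>;
  const rawText = raw["text"];
  const rawVoice = raw["voice"];
  const rawSpeed = raw["speed"];

  if (typeof rawText !== "string" || rawText.trim() === "") {
    return { ok: false, message: "'text' must be a non-empty string" };
  }

  let voice = "en-US"; // assumed default, mirrors the example request
  if (rawVoice !== undefined) {
    if (typeof rawVoice !== "string") {
      return { ok: false, message: "'voice' must be a string" };
    }
    voice = rawVoice;
  }

  let speed = 1.0; // assumed default, mirrors the example request
  if (rawSpeed !== undefined) {
    // The range check is written so that NaN is rejected as well.
    if (typeof rawSpeed !== "number" || !(rawSpeed >= 0.25 && rawSpeed <= 4.0)) {
      return { ok: false, message: "'speed' must be a number between 0.25 and 4.0" };
    }
    speed = rawSpeed;
  }

  return { ok: true, value: { text: rawText, voice: voice, speed: speed } };
}
```

Returning a result object instead of throwing lets the caller map a failed validation directly onto an error response.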
- 0: Success
- 400: Bad Request
- 401: Unauthorized
- 403: Forbidden
- 404: Not Found
- 500: Internal Server Error
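The response examples and codes above share one envelope (`code`, `status`, `message`, and `data` on success). A typed helper pair for building that envelope might look like the sketch below; the `ApiResponse` interface and the `successResponse` and `errorResponse` functions are hypothetical and are not taken from the project's `api_response.ts` files.

```typescript
// Envelope shape inferred from the example responses; "data" appears only on success.
interface ApiResponse<T> {
  code: number;
  status: "success" | "error";
  message: string;
  data?: T;
}

// Build a success envelope; code 0 is the documented success value.
function successResponse<T>(message: string, data: T): ApiResponse<T> {
  return { code: 0, status: "success", message: message, data: data };
}

// Build an error envelope with one of the documented error codes.
function errorResponse(code: number, message: string): ApiResponse<never> {
  return { code: code, status: "error", message: message };
}

// Example usage: serialize both kinds of response.
const okBody = JSON.stringify(successResponse("Document read successfully", { text: "hello" }));
const errorBody = JSON.stringify(errorResponse(400, "Bad Request"));
console.log(okBody);
console.log(errorBody);
```

Keeping the envelope in one generic type means each endpoint only has to describe the shape of its own `data` payload.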
import axios from 'axios';

// Document Reading
const readDocument = async (file: File) => {
  const formData = new FormData();
  formData.append('file', file);
  try {
    const response = await axios.post('/api/read-document', formData, {
      headers: {
        'Content-Type': 'multipart/form-data'
      }
    });
    return response.data;
  } catch (error) {
    console.error('Error reading document:', error);
    throw error;
  }
};

// Speech to Text
const convertSpeechToText = async (audioFile: File) => {
  const formData = new FormData();
  formData.append('audio', audioFile);
  try {
    const response = await axios.post('/api/speech-to-text', formData, {
      headers: {
        'Content-Type': 'multipart/form-data'
      }
    });
    return response.data;
  } catch (error) {
    console.error('Error converting speech to text:', error);
    throw error;
  }
};

// Text to Speech
const convertTextToSpeech = async (text: string) => {
  try {
    const response = await axios.post('/api/text-to-speech',
      { text },
      { responseType: 'blob' }
    );
    return response.data;
  } catch (error) {
    console.error('Error converting text to speech:', error);
    throw error;
  }
};
- Development mode:
npm run dev
- Production mode:
npm run build
npm start
- Create a project in Google Cloud Console
- Enable Speech-to-Text and Text-to-Speech APIs
- Create a service account and download the key file
- Set the path to your key file in the GOOGLE_APPLICATION_CREDENTIALS environment variable
The API includes rate limiting to prevent abuse. Default settings:
- 100 requests per 15-minute window (RATE_LIMIT_MAX_REQUESTS=100, RATE_LIMIT_WINDOW_MS=900000)
- Customize these values in the .env file
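To illustrate what a limit of 100 requests per 15-minute window means in practice, here is a minimal in-memory fixed-window counter. It is a teaching sketch only: the project lists express-rate-limit as a dependency, and this is not that library's implementation or API. The `FixedWindowRateLimiter` class and its `allow` method are hypothetical names.

```typescript
// Per-client counter for the current window.
interface WindowState {
  count: number;
  windowStart: number;
}

class FixedWindowRateLimiter {
  // Prototype-free object so arbitrary client IDs are safe to use as keys.
  private readonly clients: Record<string, WindowState> = Object.create(null);

  constructor(
    private readonly windowMs: number,
    private readonly maxRequests: number
  ) {}

  // Returns true if the request is allowed, false if the client is over the limit.
  // "now" is injectable (in milliseconds) so the behavior can be tested without waiting.
  allow(clientId: string, now: number = Date.now()): boolean {
    let state = this.clients[clientId];
    // Start a fresh window for new clients or when the old window has expired.
    if (state === undefined || now - state.windowStart >= this.windowMs) {
      state = { count: 0, windowStart: now };
      this.clients[clientId] = state;
    }
    if (state.count >= this.maxRequests) {
      return false;
    }
    state.count += 1;
    return true;
  }
}

// Example usage with the documented defaults: 100 requests per 900000 ms.
const defaultLimiter = new FixedWindowRateLimiter(900000, 100);
console.log(defaultLimiter.allow("203.0.113.7"));
```

A fixed window is the simplest scheme to reason about, though it permits short bursts around window boundaries; this sketch also never evicts idle clients, which a long-running server would need to handle.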
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.