A modern, real-time speech-to-text transcription tool built with Next.js 15, TypeScript, and the Web Speech API. It features live audio visualization and one-click copy to clipboard.
- Real-time Transcription - Live speech-to-text using Web Speech API
- Live Captions Visualization - Dynamic live-caption indicators
- Smart Restart Logic - Minimizes word loss during recognition restarts
- Copy to Clipboard - Copy transcripts or download them as text files
- Pause/Resume - Full control over recording sessions
- TypeScript - Full type safety and excellent DX
- Responsive Design - Works on desktop and mobile devices
- Error Handling - Graceful degradation and user-friendly messages
- Performance Optimized - Efficient audio processing and rendering
- Modern Interface - Clean, intuitive design with Tailwind CSS
- Dark/Light Mode - Customizable appearance (coming soon)
- Settings Panel - Language selection and customization options (coming soon)
- Real-time Status - Visual indicators for recording state (coming soon)
- Live Captions - Live Captions and metrics
- Node.js 18.0 or later
- npm, yarn, or pnpm
- Modern browser with Web Speech API support
# Clone the repository
git clone https://github.com/yourusername/live-transcription-tool.git
cd live-transcription-tool
# Install dependencies
npm install
# Start development server
npm run dev

Open http://localhost:3000 in your browser.
# Build for production
npm run build
# Start production server
npm start

- Start Recording: Click the "Start Recording" button
- Grant Permissions: Allow microphone access when prompted
- Speak Naturally: The tool will transcribe your speech in real-time
- Copy Transcript: Copy to clipboard
- Clear Chat: Clear conversation
#### useAudioTranscription Hook
The core hook that manages:
- Speech recognition lifecycle
- Audio level monitoring
- Error handling and recovery
- Transcript state management
```typescript
const {
  isListening,
  transcript,
  startRecording,
  stopRecording,
  // ... other methods
} = useAudioTranscription();
```
Prevents word loss during recognition restarts:
- 100ms delay between restarts
- Duplicate detection and prevention
- Context-aware error recovery
- Confidence-based filtering
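
The restart behaviour listed above could be approximated as follows. This is a minimal sketch, not the project's actual implementation: the `Segment` shape, the `appendSegment` and `scheduleRestart` helpers, and the `0.5` confidence threshold are all assumptions made for illustration. Only the 100ms delay comes from the list.

```typescript
// Hypothetical shape of a finalized recognition result (not taken from the project).
interface Segment {
  text: string;
  confidence: number; // 0..1, e.g. from SpeechRecognitionAlternative.confidence
}

const RESTART_DELAY_MS = 100; // the delay named in the list above
const MIN_CONFIDENCE = 0.5; // assumed threshold, chosen only for illustration

// Normalize text so "Hello world." and "hello world" compare as equal.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[.,!?;:]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Append a segment unless it is low-confidence, empty, or repeats the last entry.
function appendSegment(transcript: string[], segment: Segment): string[] {
  if (segment.confidence < MIN_CONFIDENCE) return transcript;
  const incoming = normalize(segment.text);
  if (incoming === "") return transcript;
  const last = normalize(transcript[transcript.length - 1] ?? "");
  if (last === incoming) return transcript; // duplicate emitted around a restart
  return [...transcript, segment.text.trim()];
}

// Restart recognition after a short delay; `start` is whatever restarts it.
function scheduleRestart(
  start: () => void,
  delayMs: number = RESTART_DELAY_MS
): ReturnType<typeof setTimeout> {
  return setTimeout(start, delayMs);
}
```

Comparing only against the last entry keeps the check cheap. A real implementation may need to handle partial overlaps between the end of one result and the start of the next.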
npm run dev # Start development server
npm run build # Build for production
npm run start # Start production server
npm run lint # Run ESLint
npm run lint:fix # Fix ESLint issues
npm run format # Format code with Prettier
npm run type-check # TypeScript type checking
npm run test # Run tests
npm run test:watch # Run tests in watch mode

This project uses comprehensive tooling for code quality:
- ESLint - Code linting with TypeScript rules
- Prettier - Code formatting
- Husky - Git hooks for quality checks
- lint-staged - Run linters on staged files
- Commitlint - Conventional commit messages
Every commit automatically runs:
- ESLint fixes
- Prettier formatting
- TypeScript type checking
- Related tests
- Commit message validation
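
To illustrate what commit message validation checks, here is a simplified sketch. It is not Commitlint's actual rule set, which is configurable. The `isConventionalCommit` helper and the regular expression are assumptions, and the accepted types are the ones used in this README's commit examples.

```typescript
// Commit types used in this README's examples; a real Commitlint config may allow others.
const COMMIT_TYPES = ["feat", "fix", "docs", "style", "refactor", "test", "chore"] as const;

// type, optional "(scope)", optional "!", then ": " followed by a non-empty subject.
const COMMIT_PATTERN = new RegExp(
  `^(${COMMIT_TYPES.join("|")})(\\([a-z0-9-]+\\))?!?: \\S.*$`
);

// Returns true when the first line of a commit message matches the pattern.
function isConventionalCommit(header: string): boolean {
  return COMMIT_PATTERN.test(header);
}
```

For example, `isConventionalCommit("fix: resolve audio dropout issue")` is accepted, while a header without a type prefix is rejected.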
# Run all tests
npm test
# Run tests in watch mode
npm run test:watch
# Run tests with coverage
npm run test:coverage

__tests__/
├── components/
│   ├── LiveTranscriptionTool.test.tsx
│   └── AudioVisualizer.test.tsx
├── hooks/
│   ├── useAudioTranscription.test.ts
│   └── useAudioPermissions.test.ts
└── utils/
    ├── audio.test.ts
    └── export.test.ts
- Chrome 25+
- Edge 79+
- Safari 14.1+
- Firefox (limited support)
- Mobile browsers (iOS Safari, Chrome Mobile)
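
Because support varies across these browsers, an app can check for the API before enabling recording. The sketch below is an illustration, not the project's code: `SpeechGlobals`, `getSpeechRecognition`, and `isSpeechRecognitionSupported` are hypothetical names. It relies on the fact that Chrome and Safari expose the constructor under the prefixed name `webkitSpeechRecognition`.

```typescript
// Minimal structural type for the parts of the global object we inspect.
interface SpeechGlobals {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

// Returns the recognition constructor if present, preferring the unprefixed name.
function getSpeechRecognition(globals: SpeechGlobals): unknown {
  return globals.SpeechRecognition ?? globals.webkitSpeechRecognition ?? null;
}

// True when either the standard or the prefixed constructor is available.
function isSpeechRecognitionSupported(globals: SpeechGlobals): boolean {
  return getSpeechRecognition(globals) !== null;
}
```

In a browser, the argument would be `window`. Taking it as a parameter keeps the check testable and safe to call during server-side rendering, where `window` is undefined.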
- Web Speech API - For speech recognition
- MediaDevices API - For microphone access
- Web Audio API - For audio level monitoring
- File API - For transcript exports
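
As an illustration of audio level monitoring with the Web Audio API, the sketch below turns one frame of samples into a 0..1 level. It assumes the samples come from `AnalyserNode.getByteTimeDomainData`, which writes unsigned bytes centered on 128. The `rmsLevel` helper is hypothetical, and the project may compute levels differently.

```typescript
// Compute a 0..1 loudness estimate (root mean square) from byte time-domain
// samples, where 128 represents silence.
function rmsLevel(samples: Uint8Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    // Map 0..255 to roughly -1..1 around the 128 midpoint.
    const centered = ((samples[i] ?? 128) - 128) / 128;
    sumSquares += centered * centered;
  }
  return Math.sqrt(sumSquares / samples.length);
}
```

A visualizer would typically call this once per animation frame and map the result to a bar height or opacity.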
The application is fully responsive and works on mobile devices:
- Touch-friendly interface
- Mobile-optimized audio processing
- Responsive design with Tailwind CSS
- Support for mobile browsers
# Install Vercel CLI
npm i -g vercel
# Deploy
vercel

We welcome contributions! Please see our Contributing Guide.
- Fork the repository
- Clone your fork locally
- Create a new branch for your feature
- Make your changes with tests
- Run quality checks:
npm run lint && npm run type-check && npm test
- Commit using conventional commits
- Push and create a Pull Request
feat: add new transcription feature
fix: resolve audio dropout issue
docs: update API documentation
style: format code with prettier
refactor: improve error handling
test: add unit tests for hooks
chore: update dependencies

- Initial JS: ~150KB gzipped
- First Load: ~200KB total
- Runtime: Minimal memory footprint
- Audio Processing: Optimized with Web Audio API
- Performance: 100
- Accessibility: 93
- Best Practices: 100
- SEO: 100
- No Server Processing - All transcription happens locally
- No Data Storage - Transcripts remain on your device
- Secure by Default - HTTPS required for microphone access
- Privacy First - No tracking or analytics
This project is licensed under the MIT License - see the LICENSE file for details.
- Web Speech API - Core speech recognition
- Next.js - React framework
- Tailwind CSS - Styling
- Lucide React - Icons
- Husky - Git hooks
- 🐛 Bug Reports: GitHub Issues
- 💡 Feature Requests: GitHub Discussions
- 📧 Email: sagarvpednekar@gmail.com
- Cloud speech recognition APIs (Google, AWS, Azure)
- Real-time collaboration
- Advanced export formats (PDF, DOCX)
- Custom vocabulary support
- Speaker identification
- Dark mode theme
- Custom Language Support
We plan to support multiple languages:
- English (US/UK)
- Spanish
- French
- German
- Japanese
- Chinese (Mandarin)
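
Language selection would most likely work by setting the recognizer's `lang` property to a BCP 47 tag. The mapping below is a sketch under that assumption: the `LANGUAGE_TAGS` table and `resolveLanguageTag` helper are hypothetical, and the exact tags a given browser's recognizer accepts may differ.

```typescript
// Assumed BCP 47 tags for the planned languages (illustrative, not from the project).
const LANGUAGE_TAGS: Record<string, string> = {
  "English (US)": "en-US",
  "English (UK)": "en-GB",
  Spanish: "es-ES",
  French: "fr-FR",
  German: "de-DE",
  Japanese: "ja-JP",
  "Chinese (Mandarin)": "zh-CN",
};

// Look up the tag for a display name, falling back to a default when unknown.
function resolveLanguageTag(name: string, fallback: string = "en-US"): string {
  return LANGUAGE_TAGS[name] ?? fallback;
}
```

The result would then be assigned before starting recognition, for example `recognition.lang = resolveLanguageTag("French")`.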
- WebSocket integration
- Multi-language detection
- Transcript search and filtering
- Integration with popular platforms
- Mobile app (React Native)
⭐ Star this repo • 🍴 Fork it • 🐛 Report issues
Made with ❤️ by Sagar Pednekar