sagarpednekar/live-transcript-app

🎙️ Live Audio Transcription Tool

Next.js TypeScript React Tailwind CSS License: MIT CI

A modern, real-time speech-to-text transcription tool built with Next.js 15, TypeScript, and the Web Speech API. It features live audio visualization and one-click copying of transcripts to the clipboard.

✨ Features

🎯 Core Functionality

  • Real-time Transcription - Live speech-to-text using Web Speech API
  • Live Captions Visualization - Dynamic live-caption indicators
  • Smart Restart Logic - Minimizes word loss during recognition restarts
  • Copy to Clipboard - Copy the transcript to the clipboard
  • Pause/Resume - Full control over recording sessions

🛠️ Technical Features

  • TypeScript - Full type safety and excellent DX
  • Responsive Design - Works on desktop and mobile devices
  • Error Handling - Graceful degradation and user-friendly messages
  • Performance Optimized - Efficient audio processing and rendering

🎨 UI/UX

  • Modern Interface - Clean, intuitive design with Tailwind CSS
  • Dark/Light Mode - Customizable appearance (coming soon)
  • Settings Panel - Language selection and customization options (coming soon)
  • Real-time Status - Visual indicators for recording state (coming soon)
  • Live Captions - Live captions and metrics

🚀 Quick Start

Prerequisites

  • Node.js 18.18 or later (the minimum supported by Next.js 15)
  • npm, yarn, or pnpm
  • Modern browser with Web Speech API support
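
Whether a browser meets the Web Speech API prerequisite can be checked at runtime. The following is a minimal sketch, not the project's own code: the names `SpeechGlobals`, `RecognitionCtor`, and `findSpeechRecognition` are hypothetical, and the lookup assumes the recognition constructor is exposed either unprefixed or as `webkitSpeechRecognition`.

```typescript
// Hypothetical type for a speech recognition constructor.
type RecognitionCtor = new () => object;

// Minimal shape of the globals we probe; in a browser this would be `window`.
interface SpeechGlobals {
  SpeechRecognition?: RecognitionCtor;
  webkitSpeechRecognition?: RecognitionCtor;
}

// Returns the recognition constructor if one is exposed, otherwise null.
// The unprefixed name is preferred; the prefixed name is the fallback.
function findSpeechRecognition(globals: SpeechGlobals): RecognitionCtor | null {
  return globals.SpeechRecognition ?? globals.webkitSpeechRecognition ?? null;
}
```

In a browser, the function could be called with `window as unknown as SpeechGlobals`; a `null` result would be the cue to show an "unsupported browser" message instead of the record button.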

Installation

# Clone the repository
git clone https://github.com/sagarpednekar/live-transcript-app.git
cd live-transcript-app

# Install dependencies
npm install

# Start development server
npm run dev

Open http://localhost:3000 in your browser.

Production Build

# Build for production
npm run build

# Start production server
npm start

🎮 Usage

Basic Usage

  1. Start Recording: Click the "Start Recording" button
  2. Grant Permissions: Allow microphone access when prompted
  3. Speak Naturally: The tool will transcribe your speech in real-time
  4. Copy Transcript: Copy to clipboard
  5. Clear Chat: Clear conversation
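
The copy step above could be sketched as follows. This is an illustration under stated assumptions rather than the app's implementation: `ClipboardLike` and `copyTranscript` are hypothetical names, and the clipboard is passed in so the function can run outside a browser. In a browser, `navigator.clipboard` provides a matching `writeText` method.

```typescript
// Subset of the Clipboard API that this sketch needs.
interface ClipboardLike {
  writeText(text: string): Promise<void>;
}

// Copies the trimmed transcript. Resolves to false when there is nothing
// to copy or when the clipboard write is rejected (for example, denied permission).
async function copyTranscript(
  transcript: string,
  clipboard: ClipboardLike
): Promise<boolean> {
  const text = transcript.trim();
  if (text.length === 0) return false;
  try {
    await clipboard.writeText(text);
    return true;
  } catch {
    return false;
  }
}
```

Returning a boolean instead of throwing lets the UI show a simple "Copied" or "Copy failed" message without its own error handling.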

🏗️ Architecture

Project Structure

useAudioTranscription Hook

The core hook that manages:

  • Speech recognition lifecycle
  • Audio level monitoring
  • Error handling and recovery
  • Transcript state management

```typescript
const {
  isListening,
  transcript,
  startRecording,
  stopRecording,
  // ... other methods
} = useAudioTranscription();
```

Smart Restart Logic

Prevents word loss during recognition restarts:

  • 100ms delay between restarts
  • Duplicate detection and prevention
  • Context-aware error recovery
  • Confidence-based filtering
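
The restart behaviour listed above could look roughly like this. It is a minimal sketch, not the project's code: `appendResult`, `scheduleRestart`, and `FinalResult` are hypothetical names, the 0.5 confidence threshold is an assumed value, and only the 100ms delay comes from the list above.

```typescript
const RESTART_DELAY_MS = 100; // delay between restarts, from the list above
const MIN_CONFIDENCE = 0.5; // assumed threshold, not taken from the project

interface FinalResult {
  text: string;
  confidence: number; // expected range 0..1
}

// Appends a final result to the transcript unless it is empty, falls below
// the confidence threshold, or repeats the words the transcript already ends with.
function appendResult(transcript: string, result: FinalResult): string {
  const text = result.text.trim();
  if (text.length === 0 || result.confidence < MIN_CONFIDENCE) return transcript;

  const previous = transcript.toLowerCase();
  const next = text.toLowerCase();
  // Compare on word boundaries so "bar" is not treated as a repeat of "foobar".
  if (previous === next || previous.endsWith(` ${next}`)) return transcript;

  return transcript.length === 0 ? text : `${transcript} ${text}`;
}

// Schedules a restart after the delay; `start` would typically call
// the recognition object's start() method.
function scheduleRestart(
  start: () => void,
  delayMs: number = RESTART_DELAY_MS
): ReturnType<typeof setTimeout> {
  return setTimeout(start, delayMs);
}
```

A suffix comparison like this also drops a phrase the speaker deliberately repeats, so a real implementation would likely combine it with timing or result-index information.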

🔧 Development

Available Scripts

npm run dev          # Start development server
npm run build        # Build for production
npm run start        # Start production server
npm run lint         # Run ESLint
npm run lint:fix     # Fix ESLint issues
npm run format       # Format code with Prettier
npm run type-check   # TypeScript type checking
npm run test         # Run tests
npm run test:watch   # Run tests in watch mode

Code Quality

This project uses comprehensive tooling for code quality:

  • ESLint - Code linting with TypeScript rules
  • Prettier - Code formatting
  • Husky - Git hooks for quality checks
  • lint-staged - Run linters on staged files
  • Commitlint - Conventional commit messages

Pre-commit Hooks

Every commit automatically runs:

  • ESLint fixes
  • Prettier formatting
  • TypeScript type checking
  • Related tests
  • Commit message validation

🧪 Testing

Running Tests

# Run all tests
npm test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm run test:coverage

Test Structure

__tests__/
├── components/
│   ├── LiveTranscriptionTool.test.tsx
│   └── AudioVisualizer.test.tsx
├── hooks/
│   ├── useAudioTranscription.test.ts
│   └── useAudioPermissions.test.ts
└── utils/
    ├── audio.test.ts
    └── export.test.ts

🔍 Browser Compatibility

Supported Browsers

  • ✅ Chrome 25+
  • ✅ Edge 79+
  • ✅ Safari 14.1+
  • ⚠️ Firefox (limited support; speech recognition is not enabled by default)
  • ✅ Mobile browsers (iOS Safari, Chrome Mobile)

Required APIs

  • Web Speech API - For speech recognition
  • MediaDevices API - For microphone access
  • Web Audio API - For audio level monitoring
  • File API - For transcript exports
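
The audio level monitoring that the Web Audio API is listed for above could be sketched as follows. This is a minimal sketch, assuming the app reads unsigned 8-bit time-domain samples from an `AnalyserNode`; the function name `rmsLevel` is hypothetical and not taken from the project.

```typescript
// Converts unsigned 8-bit time-domain samples (silence = 128), the format
// written by AnalyserNode.getByteTimeDomainData, into an RMS level in [0, 1].
function rmsLevel(samples: Uint8Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    const centered = (samples[i] - 128) / 128; // map 0..255 to roughly -1..1
    sumSquares += centered * centered;
  }
  return Math.sqrt(sumSquares / samples.length);
}
```

In a browser, a buffer filled by `analyser.getByteTimeDomainData(buffer)` on each animation frame could be passed in directly, and the result used to drive a level indicator.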

📱 Mobile Support

The application is fully responsive and works on mobile devices:

  • Touch-friendly interface
  • Mobile-optimized audio processing
  • Responsive design with Tailwind CSS
  • Support for mobile browsers

🚀 Deployment

Vercel (Recommended)

# Install Vercel CLI
npm i -g vercel

# Deploy
vercel

🤝 Contributing

We welcome contributions! Please see our Contributing Guide.

Development Workflow

  1. Fork the repository
  2. Clone your fork locally
  3. Create a new branch for your feature
  4. Make your changes with tests
  5. Run quality checks: npm run lint && npm run type-check && npm test
  6. Commit using conventional commits
  7. Push and create a Pull Request

Commit Convention

feat: add new transcription feature
fix: resolve audio dropout issue
docs: update API documentation
style: format code with prettier
refactor: improve error handling
test: add unit tests for hooks
chore: update dependencies
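
The header format used in the examples above can be checked with a small function. This is a simplified sketch, not a substitute for Commitlint, whose rules are more extensive: `COMMIT_TYPES`, `HEADER_PATTERN`, and `isConventionalHeader` are hypothetical names, and the allowed types are limited to the seven shown above.

```typescript
// Commit types taken from the examples above.
const COMMIT_TYPES = ["feat", "fix", "docs", "style", "refactor", "test", "chore"] as const;

// type, optional "(scope)", optional "!", then ": " and a non-empty subject.
const HEADER_PATTERN = new RegExp(
  `^(${COMMIT_TYPES.join("|")})(\\([\\w-]+\\))?!?: \\S.*$`
);

// Returns true when the first line of a commit message matches the pattern.
function isConventionalHeader(header: string): boolean {
  return HEADER_PATTERN.test(header);
}
```

For example, `isConventionalHeader("fix: resolve audio dropout issue")` passes, while a message without a type prefix or without the space after the colon does not.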

📊 Performance

Bundle Size

  • Initial JS: ~150KB gzipped
  • First Load: ~200KB total
  • Runtime: Minimal memory footprint
  • Audio Processing: Optimized with Web Audio API

Lighthouse Scores

  • Performance: 100
  • Accessibility: 93
  • Best Practices: 100
  • SEO: 100

🔒 Privacy & Security

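  • No App Server - The app has no backend of its own; note that some browsers (Chrome, for example) send audio to a vendor speech service to perform Web Speech API recognition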
  • No Data Storage - Transcripts remain on your device
  • Secure by Default - HTTPS required for microphone access
  • Privacy First - No tracking or analytics
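
The HTTPS requirement above follows from browsers exposing microphone access only in secure contexts. The check below approximates that rule and is a sketch under stated assumptions: `isMicrophoneAllowedOrigin` is a hypothetical helper, it treats only HTTPS and loopback hosts as secure, and in a real page `window.isSecureContext` is the authoritative value.

```typescript
// Approximates the secure-context rule: HTTPS anywhere, or a loopback
// host (as used by `npm run dev`) over plain HTTP.
function isMicrophoneAllowedOrigin(url: string): boolean {
  const { protocol, hostname } = new URL(url);
  if (protocol === "https:") return true;
  return hostname === "localhost" || hostname === "127.0.0.1" || hostname === "[::1]";
}
```

This explains why `http://localhost:3000` works during development while the same build served over plain HTTP from another host cannot request the microphone.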

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

📞 Support

🗺️ Roadmap

v2.0.0 (Coming Soon)

  • Cloud speech recognition APIs (Google, AWS, Azure)
  • Real-time collaboration
  • Advanced export formats (PDF, DOCX)
  • Custom vocabulary support
  • Speaker identification
  • Dark mode theme
  • Custom language support. We plan to support multiple languages:
    • English (US/UK)
    • Spanish
    • French
    • German
    • Japanese
    • Chinese (Mandarin)

v2.1.0

  • WebSocket integration
  • Multi-language detection
  • Transcript search and filtering
  • Integration with popular platforms
  • Mobile app (React Native)

⭐ Star this repo • 🍴 Fork it • 📝 Report issues

Made with ❤️ by Sagar Pednekar