This repository was created as an exploratory project to bridge my background in signal processing (EEG / biosignals) with audio processing and ASR systems.
This work does not aim to deliver a production-ready solution; rather, its goals are to:
- understand how real-world, noise-heavy audio scenarios affect Transformer-based ASR models,
- experiment with basic audio preprocessing techniques (a minimal sketch follows this list),
- and gain hands-on experience in model evaluation and reporting, aligned with LLM & Speech-focused R&D roles.
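The preprocessing explored here is intentionally basic. As a rough, non-authoritative illustration of the kind of step involved, the sketch below applies a high-pass filter (to attenuate low-frequency motor hum) followed by peak normalization; the file name, cutoff frequency, and filter order are hypothetical choices for illustration, not values fixed by this project.

```python
# Hypothetical sketch of basic preprocessing: high-pass filtering to reduce
# low-frequency motor hum, then peak normalization. File paths, the 100 Hz
# cutoff, and the 4th-order filter are illustrative assumptions only.
import numpy as np
import soundfile as sf
from scipy.signal import butter, sosfiltfilt

def preprocess(path: str, cutoff_hz: float = 100.0):
    audio, sr = sf.read(path)
    if audio.ndim > 1:                      # down-mix stereo to mono
        audio = audio.mean(axis=1)
    sos = butter(4, cutoff_hz, btype="highpass", fs=sr, output="sos")
    filtered = sosfiltfilt(sos, audio)      # zero-phase high-pass filter
    peak = np.max(np.abs(filtered))
    if peak > 0:                            # peak-normalize to [-1, 1]
        filtered = filtered / peak
    return filtered, sr

if __name__ == "__main__":
    cleaned, sr = preprocess("samples/robot_walk_01.wav")
    sf.write("samples/robot_walk_01_clean.wav", cleaned, sr)
```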
The audio samples used in this study were not collected as a standardized dataset. Instead, a small number of scenario-driven recordings were selected from publicly available LTC (Jidoka) quadruped robot videos on YouTube.
These videos were intentionally chosen to:
- simulate realistic operational noise (motor hum, footsteps),
- keep the analysis controlled and interpretable,
- and support qualitative error analysis rather than statistical benchmarking.
This design choice reflects the exploratory nature of the project.
A simple demo interface was used to inspect ASR outputs and preprocessing effects during the experiments.
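The exact stack behind the demo is not specified here; as a minimal sketch, assuming Gradio for the interface and an openai/whisper-small checkpoint loaded through the Hugging Face transformers pipeline, an interface of this kind could look like:

```python
# Minimal sketch of a demo interface for inspecting ASR outputs.
# Gradio and the openai/whisper-small checkpoint are assumptions for
# illustration; the project's actual demo may use a different stack.
import gradio as gr
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

def transcribe(audio_path: str) -> str:
    # Gradio supplies the uploaded or recorded clip as a file path
    # because the Audio component is configured with type="filepath".
    return asr(audio_path)["text"]

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(type="filepath"),
    outputs=gr.Textbox(label="Transcription"),
    title="ASR noise-robustness demo (sketch)",
)

if __name__ == "__main__":
    demo.launch()
```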