This project aims to create a simulation system in Unity, utilising the Steam Audio plugin, for generating multichannel spatial audio data. The primary objective is to assess how effectively spatial audio processing software can strengthen speech-enhancing machine learning algorithms. High-quality audio recordings from diverse environments are essential for training algorithms that improve the clarity of noisy speech signals, but assembling a comprehensive database of such recordings is time-intensive. To address this challenge, the project proposes generating these recordings in software instead, potentially streamlining the process. The simulation employs a virtual microphone in Unity to produce audio in various simulated acoustical spaces, and the quality of the simulated audio renders is evaluated by comparing them with real-life equivalents.
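One way such a comparison between simulated renders and real recordings can be quantified is with a spectral distance measure. The sketch below is purely illustrative and assumes nothing about the evaluation metrics actually used in this project; it computes a simple log-spectral distance between two equal-length signals using only NumPy.

```python
import numpy as np

def log_spectral_distance(real, simulated, eps=1e-10):
    """Log-spectral distance between two equal-length signals.

    Lower values indicate the simulated render is spectrally closer
    to the real recording. (Illustrative metric only; the project's
    actual quality measures may differ.)
    """
    R = np.abs(np.fft.rfft(real)) + eps      # magnitude spectrum, real recording
    S = np.abs(np.fft.rfft(simulated)) + eps  # magnitude spectrum, simulated render
    return np.sqrt(np.mean((20 * np.log10(R / S)) ** 2))

# Toy check: identical signals have (near-)zero distance.
t = np.linspace(0, 1, 16000, endpoint=False)
tone_440 = np.sin(2 * np.pi * 440 * t)
tone_880 = np.sin(2 * np.pi * 880 * t)
```

Here `log_spectral_distance(tone_440, tone_440)` is effectively zero, while comparing tones of different frequencies yields a large distance, matching the intuition that spectrally dissimilar signals score worse.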
This project aimed to determine whether simulated acoustical data, generated in a 3D development suite with a spatial audio plugin, can effectively serve as input for voice-enhancing algorithms. The results of the jammer suppression test and the voice quality measurements demonstrate that simulated acoustical data rendered this way does perform well enough to be utilised as input for voice-enhancing algorithms: it exhibits only a slight degradation in voice quality compared with its real counterpart, although jammer suppression performance declines noticeably. Notably, the Unity-trained model used significantly less data than the other models, indicating substantial potential for improvement. The system requirements status shows that customisation of room dimensions remains unfinished; it can still be achieved by creating a new virtual room, albeit through a somewhat more time-consuming process. The same applies to GUI customisation of the render duration.
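As a rough illustration of what a jammer suppression measurement captures, a suppression gain can be expressed as the improvement in speech-to-jammer ratio before versus after enhancement. The sketch below is a minimal, assumed formulation, not the test procedure used in this project: it assumes access to the separated speech and jammer components of the mixture.

```python
import numpy as np

def snr_db(speech, noise, eps=1e-12):
    """Speech-to-noise power ratio in decibels, given separated components."""
    p_s = np.mean(speech ** 2)
    p_n = np.mean(noise ** 2) + eps  # eps guards against division by zero
    return 10 * np.log10(p_s / p_n)

def jammer_suppression_gain(speech, jammer_in, jammer_out):
    """How much the enhancer attenuated the jammer relative to the speech:
    output speech-to-jammer ratio minus input speech-to-jammer ratio."""
    return snr_db(speech, jammer_out) - snr_db(speech, jammer_in)

# Toy example with synthetic signals (hypothetical data, not project results):
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
jammer = rng.standard_normal(16000)
# An enhancer that halves the jammer amplitude yields ~6 dB of suppression.
gain = jammer_suppression_gain(speech, jammer, 0.5 * jammer)
```

Halving the jammer amplitude quarters its power, so the gain is 10·log10(4) ≈ 6.02 dB; a model trained on simulated data would simply score lower on this measure than one trained on real recordings.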