Whistle Detection
The UNSW Australia team thanks Thomas Hamboeck and the Austrian Kangaroos for their contribution to the Standard Platform League code releases, and Alexandre Mazel for his Stack Overflow answer that allowed earlier versions of this code to work with Naoqi under the 1.14 toolchain.
This script was designed to work as a (more or less) standalone Python process:
# On PC or Mac, this should run pc_wav_test()
# which retests the implementation against audio recordings in rUNSWift/test/audio
python whistle_detector.py
The same whistle_detector.py Python module also runs on the Nao under the 2.1 toolchain / Nao V4 & V5.
Try changing VERBOSITY to a higher integer value to get a feel for what is going on, then follow the print statements.
For example, you will see that it has self-terminating code built in. This self-termination was designed to preserve the existing runswift interface, so most developers shouldn't need to know how to start the whistle detector independently of their main executable, or need to worry about rogue whistle detectors. It should be easy to update if you wish to run with a different main executable, or to comment out for easier debugging.
Whistle detection analysis was performed with the help of the Audacity spectrogram visualisation, since humans are good at spotting patterns. This was used to develop the algorithms and tune the parameters.
Conceptually, on the spectrogram, you can think of whistle_detector as looking for a red-on-blue or white-on-red rectangle somewhere in the 2000-4000 Hz range that lasts for at least 250 ms, subject to the other conditions in the Developer configurable settings section.
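As a rough illustration of the "rectangle in the 2000-4000 Hz range" idea, the sketch below checks whether a single chunk of audio has a peak in that band that stands well above the rest of the spectrum. It is not the rUNSWift code: the 1024-sample chunk size, the Hann window, the `band_stands_out` name and the `RATIO` threshold are all assumptions made for this example. Only the 48000 Hz sample rate, the band limits and the use of numpy.fft.rfft come from this page.

```python
import numpy as np

SAMPLE_RATE = 48000      # Hz, input rate stated on this page
CHUNK = 1024             # samples per spectrum (assumption, roughly 21 ms)
BAND_HZ = (2000, 4000)   # whistle band from this page
RATIO = 10.0             # illustrative threshold, not a tuned rUNSWift value


def band_stands_out(samples, sample_rate=SAMPLE_RATE, band=BAND_HZ, ratio=RATIO):
    """Return True if the strongest in-band bin exceeds ratio x the mean out-of-band level."""
    window = np.hanning(len(samples))              # reduce spectral leakage
    magnitudes = np.abs(np.fft.rfft(samples * window))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    background = magnitudes[~in_band].mean() + 1e-12   # avoid comparing against exactly zero
    return bool(magnitudes[in_band].max() > ratio * background)


# Synthetic inputs: a 3000 Hz tone (inside the band) and a 300 Hz hum (outside it).
t = np.arange(CHUNK) / SAMPLE_RATE
whistle_like = np.sin(2 * np.pi * 3000 * t)
hum = np.sin(2 * np.pi * 300 * t)

print(band_stands_out(whistle_like))
print(band_stands_out(hum))
```

A single chunk only says that the band is loud right now. The minimum duration of 250 ms has to be enforced separately by requiring several consecutive chunks to pass.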
The main algorithm is in the interrogate function. The algorithm:
- Assumes it can get 48000 Hz input data from pyalsaaudio or some other audio source.
- Performs a numpy.fft.rfft
- Checks that it has at least spectra_per_second = 47 spectra of data (about one second), which it uses to estimate the background noise over time.
- Calculates the whistle_threshold as the maximum of the spectrum_threshold and temporal_threshold (on review, this could probably be done a bit later, which may save CPU if we can exit early on background noise).
- Adaptively grows background noise zones based on the sound spectrum to focus on the whistle "signal".
- Filters the remaining spectrum based on the whistle_threshold determined earlier.
- If all of these checks succeed, increments a counter. Once the counter reaches 12 successes (~250 ms), this counts as a "whistle heard".
- Resets the counter after 4 misses (~83 ms).
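The hit and miss counting in the last two steps can be sketched as a small state machine. This is a minimal sketch, not the code in the interrogate function: the `WhistleCounter` class is hypothetical, and it assumes that the 4 misses must be consecutive and that any hit clears the miss streak, which this page does not specify. The values 12 and 4 are the ones quoted above.

```python
HITS_NEEDED = 12   # ~250 ms of passing spectra, from the list above
MAX_MISSES = 4     # ~83 ms of failing spectra, from the list above


class WhistleCounter:
    """Hypothetical helper that tracks per-spectrum hits and misses."""

    def __init__(self, hits_needed=HITS_NEEDED, max_misses=MAX_MISSES):
        self.hits_needed = hits_needed
        self.max_misses = max_misses
        self.hits = 0
        self.misses = 0

    def update(self, spectrum_passed):
        """Feed one per-spectrum result; return True when a whistle is heard."""
        if spectrum_passed:
            self.hits += 1
            self.misses = 0          # assumption: a hit clears the miss streak
        else:
            self.misses += 1
            if self.misses >= self.max_misses:
                self.hits = 0        # too many misses in a row: start over
                self.misses = 0
        if self.hits >= self.hits_needed:
            self.hits = 0            # re-arm for the next whistle
            self.misses = 0
            return True
        return False


counter = WhistleCounter()
# 6 hits, a short 3-spectrum dropout, then 6 more hits.
stream = [True] * 6 + [False] * 3 + [True] * 6
heard = [counter.update(passed) for passed in stream]
print(heard[-1])
```

Tolerating a few misses lets a whistle survive brief dropouts, while the reset stops unrelated short noises from slowly adding up to a detection.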
As whistle_detector.py consumes around 30-40% of the Nao's Atom CPU, we decided only to run whistle detection in the READY and SET game states. Running in READY is partly a historical trade-off from the 1.14 toolchain, where we spent up to 10 seconds longer telling Naoqi to release the microphones so we could use them.
On the Nao, whistle_detector.py will generate ~1-2 second long timestamped files when it detects a whistle. These are then read by the GameController thread.
Some of those files were extracted from the robot and manually classified so they can be run by pc_wav_test(). They are stored in rUNSWift/test/audio.
We found this to be very useful in testing gear noises, which caused many false positives in earlier implementations. It was also useful early on at competition, where we suspected a noisy crowd would cause false positives. It did, which required further parameter tuning (and thankfully no further algorithmic development, though we were prepared for that possibility).
In the runswift GameController thread, we also implemented voting over WiFi, as we found the Aldebaran microphones sometimes mysteriously failed in our lab games (it seemed like loose wires, as they also sometimes worked perfectly). This was particularly annoying when only our striker had the bad microphone.
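The voting rule itself is not described here, so the following is only one plausible shape for it. It assumes each robot shares the time it last heard a whistle, and that the team accepts the whistle when enough robots report one within a short window. The `team_heard_whistle` function, the `reports` layout, `window_s` and `min_votes` are all hypothetical.

```python
def team_heard_whistle(reports, now, window_s=2.0, min_votes=2):
    """Decide whether the team as a whole heard a whistle.

    reports maps player number -> timestamp in seconds of that robot's most
    recent whistle detection, or None if it has not reported one.
    """
    votes = sum(
        1
        for heard_at in reports.values()
        if heard_at is not None and 0.0 <= now - heard_at <= window_s
    )
    return votes >= min_votes


# Player 2 has a dead microphone and player 4's report is stale.
reports = {1: 100.0, 2: None, 3: 100.5, 4: 90.0, 5: None}
print(team_heard_whistle(reports, now=101.0))
```

With a rule like this, a robot whose microphone has failed still reacts to the whistle as long as enough teammates hear it.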
whistle_detector.py currently uses around 30-40% of the Nao's Atom CPU, as measured by the top command.
- The implementation would probably work with 8000 Hz input data as a possible future optimisation to lower CPU usage. This would also be a good opportunity to review the developer configurable settings as many feel duplicated and clunkily hard-coded to the specific soccer whistle sound.
- It may also be beneficial to integrate at a different part of the rather complicated Linux sound stack to the current ALSA, as sometimes we think Naoqi or libagent unexpectedly locked the microphones, requiring a system reboot. It may also allow audioin or audiodevice to be put back into naoqi/autoload.ini.
- Future work could look into PyPy or additional Python profiling.
- It might also be nice to remove the wtb_pip hacks if we are prepared to easy_install pip as part of our robot setup process. Unfortunately pip was not bundled with python until very recent versions. It is disappointing that Aldebaran does not ship more recent versions of Python 3 with the Nao system image (ours has 2.7.x and 3.1.x, while Python 3.5 was recently released).
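For the 8000 Hz idea in the first bullet above, a quick back-of-the-envelope check compares the two sample rates. The `summarise` helper is hypothetical, and it assumes one spectrum is computed per 1/47 of a second, which may not match how whistle_detector.py actually sizes its chunks.

```python
SPECTRA_PER_SECOND = 47          # from the algorithm description
WHISTLE_BAND_HZ = (2000, 4000)   # from the whistle band quoted earlier


def summarise(sample_rate_hz):
    """Return (nyquist_hz, samples_per_spectrum, band_fits) for a sample rate."""
    nyquist_hz = sample_rate_hz / 2.0
    samples_per_spectrum = sample_rate_hz // SPECTRA_PER_SECOND
    band_fits = WHISTLE_BAND_HZ[1] <= nyquist_hz
    return nyquist_hz, samples_per_spectrum, band_fits


for rate in (48000, 8000):
    print(rate, summarise(rate))
```

At 8000 Hz each spectrum covers about a sixth as many samples, which is where the CPU saving would come from. The top of the whistle band then coincides with the Nyquist frequency, so there is no headroom, and the input would need low-pass filtering before downsampling to avoid aliasing.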