Representing Sounds as Neural Amplitude Fields: A Benchmark of Coordinate-MLPs and A Fourier Kolmogorov-Arnold Framework
[AAAI 2025]
Linfei Li · Lin Zhang* · Zhong Wang · Fengyi Zhang · Zelin Li · Ying Shen
- Configure a Python environment and install related dependencies.
pip install -r requirements.txt
- Download the required dataset from the following websites.
- Organize the dataset according to the following file structure.
--data
    --demo
        --gt_bach.wav
        --gt_counting.wav
        --gt_blues00000.wav  # from GTZAN dataset blues_00000.wav
    --gtzan
        --genres
            --blues
            ...
    --VCTK
        --wav48_silence_trimmed
            --p231
            ...
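Before launching the benchmark scripts, it can help to confirm that the data folder matches the structure above. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` and the `EXPECTED` list are hypothetical, and only the entries spelled out in the structure are checked (the `...` entries are skipped because their contents are not listed).

```python
from pathlib import Path

# Entries copied from the file structure above. The "..." entries are
# omitted because the full listing is not given in this README.
EXPECTED = [
    "demo/gt_bach.wav",
    "demo/gt_counting.wav",
    "demo/gt_blues00000.wav",
    "gtzan/genres/blues",
    "VCTK/wav48_silence_trimmed/p231",
]


def missing_entries(data_root):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root, next to "data".
    missing = missing_entries("data")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Data layout looks complete.")
```

An empty result only means the listed files and folders are present; it does not verify the audio content.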
- Testing coordinate-MLPs on Bach, Counting, and Blues.
bash scripts/benchmark_MLPs_demo.sh
- Testing coordinate-MLPs on the CSTR VCTK dataset.
bash scripts/benchmark_MLPs_vctk.sh
- Testing coordinate-MLPs on the GTZAN dataset.
bash scripts/benchmark_MLPs_gtzan.sh
- Testing KANs on Bach, Counting, and Blues.
bash scripts/benchmark_KANs_demo.sh
- Testing KANs on the CSTR VCTK dataset.
bash scripts/benchmark_KANs_vctk.sh
- Testing KANs on the GTZAN dataset.
bash scripts/benchmark_KANs_gtzan.sh
- RFF positional encoding is sensitive to the dimension parameter $L$.
bash scripts/benchmark_FFN_L.sh
- RFF positional encoding is sensitive to the variance parameter $\sigma$.
bash scripts/benchmark_FFN_sigma.sh
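To make the roles of $L$ and $\sigma$ concrete, here is a dependency-free sketch of a random Fourier feature (RFF) encoding for a scalar time coordinate. It follows the common RFF formulation and is not taken from this repository: the function names are hypothetical, the $2\pi$ factor and the cos/sin ordering are assumptions, and $\sigma$ is treated as the standard deviation of the sampled frequencies. Under these assumptions, $L$ sets the number of frequencies (output width $2L$) and $\sigma$ sets how high those frequencies typically reach.

```python
import math
import random


def make_rff_frequencies(L, sigma, seed=0):
    """Draw L frequencies from a zero-mean Gaussian with std sigma (assumed)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(L)]


def rff_encode(t, freqs):
    """Map a scalar coordinate t to a 2L-dimensional feature vector."""
    cos_part = [math.cos(2.0 * math.pi * b * t) for b in freqs]
    sin_part = [math.sin(2.0 * math.pi * b * t) for b in freqs]
    return cos_part + sin_part


if __name__ == "__main__":
    # A larger sigma spreads the sampled frequencies over a wider band.
    for sigma in (1.0, 100.0):
        freqs = make_rff_frequencies(L=8, sigma=sigma)
        features = rff_encode(0.25, freqs)
        print(sigma, len(features), max(abs(b) for b in freqs))
```

The frequencies are sampled once and kept fixed, so the same `freqs` list must be reused for every coordinate.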
- NeRF positional encoding is sensitive to the dimension parameter $L$.
bash scripts/benchmark_NeRF_L.sh
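For comparison with the random features, the sketch below shows the standard NeRF-style positional encoding for a scalar coordinate, where the frequencies are fixed powers of two rather than sampled. This is an illustration under assumptions, not the repository's code: the function name is hypothetical, and whether the repository appends the raw coordinate or uses a different scaling is not stated here. In this form the highest frequency is $2^{L-1}\pi$, which is why the choice of $L$ matters.

```python
import math


def nerf_encode(t, L):
    """Encode scalar t as [sin(2^k*pi*t), cos(2^k*pi*t)] for k = 0..L-1."""
    feats = []
    for k in range(L):
        freq = (2.0 ** k) * math.pi
        feats.append(math.sin(freq * t))
        feats.append(math.cos(freq * t))
    return feats


if __name__ == "__main__":
    # Each extra level doubles the highest frequency in the encoding.
    for L in (4, 8, 16):
        print(L, len(nerf_encode(0.25, L)), (2.0 ** (L - 1)) * math.pi)
```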
- Gaussian-type activation functions are sensitive to the variance factor $a$.
bash scripts/benchmark_gaussian.sh
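The effect of $a$ can be illustrated with a single Gaussian-type activation. The sketch below assumes the form $\exp(-(a x)^2)$, which is one common convention; the repository may parameterize it differently (for example as $\exp(-x^2 / (2a^2))$), in which case the direction of the effect is reversed. The function name is hypothetical.

```python
import math


def gaussian_activation(x, a):
    """Gaussian-type activation exp(-(a*x)^2); assumed form, see note above."""
    return math.exp(-((a * x) ** 2))


if __name__ == "__main__":
    # With this convention, a larger `a` gives a narrower bump around x = 0,
    # so the same pre-activation value is suppressed more strongly.
    for a in (1.0, 10.0, 30.0):
        print(a, gaussian_activation(0.1, a))
```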
- Sine-type activation functions are sensitive to the frequency factor $\omega$.
# Sine
bash scripts/benchmark_siren.sh
# Incode-Sine
bash scripts/benchmark_incode-sine.sh
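As a rough illustration of what $\omega$ controls, the sketch below applies a plain sine activation $\sin(\omega x)$ to inputs in $[-1, 1]$ and counts how often the output changes sign. It assumes the basic SIREN-style form only; the Incode-Sine variant adds further modulation that is not modeled here, and both function names are hypothetical.

```python
import math


def sine_activation(x, omega):
    """Plain sine activation sin(omega * x) (assumed SIREN-style form)."""
    return math.sin(omega * x)


def count_sign_changes(omega, n=2001):
    """Count sign changes of sin(omega * x) on an n-point grid over [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    # Drop exact zeros so a crossing that lands on a grid point is still counted.
    ys = [y for y in (sine_activation(x, omega) for x in xs) if y != 0.0]
    return sum(1 for y0, y1 in zip(ys, ys[1:]) if (y0 < 0.0) != (y1 < 0.0))


if __name__ == "__main__":
    # A larger omega makes the activation oscillate more over the same inputs.
    for omega in (3.0, 30.0, 300.0):
        print(omega, count_sign_changes(omega))
```

More oscillations per unit input let a layer represent higher-frequency content, which is one way to read the sensitivity to $\omega$ noted above.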
bash scripts/benchmark_sensitive_init.sh
When model capacity is limited, larger
bash scripts/benchmark_Fourier_omega.sh