Daily ArXiv Paper: Research Related to Direction of Arrival (DoA)

hengxt/DailyArXiv

 
 


Daily Papers - DoA Estimation

Automatically fetches the latest arXiv papers on Direction of Arrival (DoA) estimation and Array Signal Processing. Strictly filtered for Signal Processing (eess.SP, eess.AS) and Audio (cs.SD) fields.

Last update: 2026-02-11

MUSIC Array

Title Date Abstract Comment
Joint single-shot ToA and DoA estimation for VAA-based BLE ranging with phase ambiguity: A deep learning-based approach 2026-01-21

Conventional direction-of-arrival (DoA) estimation methods rely on multi-antenna arrays, which are costly to implement on size-constrained Bluetooth Low Energy (BLE) devices. Virtual antenna array (VAA) techniques enable DoA estimation with a single antenna, making angle estimation feasible on such devices. However, BLE only provides a single-shot two-way channel frequency response (CFR) with a binary phase ambiguity issue, which hinders the direct application of VAA. To address this challenge, we propose a unified model that combines VAA with BLE two-way CFR, and introduce a neural-network-based phase recovery framework that employs row/column predictors with a voting mechanism to resolve the ambiguity. The recovered one-way CFR then enables super-resolution algorithms such as MUSIC for joint time of arrival (ToA) and DoA estimation. Simulation results demonstrate that the proposed method achieves superior performance under non-uniform VAAs, with mean square errors approaching the Cramér-Rao bound at SNR $\geq$ 5 dB.

An Fluid Antenna Array-Enabled DOA Estimation Method: End-Fire Effect Suppression 2025-12-22

Direction of Arrival (DOA) estimation serves as a critical sensing technology poised to play a vital role in future intelligent and ubiquitous communication systems. Despite the development of numerous mature super-resolution algorithms, the inherent end-fire effect problem in fixed antenna arrays remains inadequately addressed. This work proposes a novel array architecture composed of fluid antennas. By exploiting the spatial reconfigurability of the antenna positions to equivalently modulate the array steering vector, and integrating it with the classical MUSIC algorithm, the approach achieves high-precision DOA estimation. Simulation results demonstrate that the proposed method delivers outstanding estimation performance even in highly challenging end-fire regions.

Adaptive MIMO Radar Architecture for Energy-Efficient Wireless Sensing in the D-Band 2025-12-12

The D-band offering an untapped wide bandwidth is promising for high data rate communication and high-resolution wireless sensing. However, these potentials are hindered by the low performance and energy efficiency of the D-band circuits and systems. We present an adaptive multi-input multi-output (MIMO) radar architecture for energy-efficient wireless sensing in the D-band, leveraging a reconfigurable 2D array of radar transceiver front-ends, a scaling approach for the receiver (RX) signal-to-noise ratio (SNR) and the transmitter (TX) output power ($P_{\rm TX}$) with target distance, and dynamic selection of the direction-of-arrival (DOA) estimation algorithm. The reconfigurable radar array, providing an adaptive radar resolution, enhances the energy efficiency by reducing power consumption in the radar RF front-end and lowering the computational complexity in the radar back-end. The RX SNR and the TX output power are scaled with the distance as ${\rm SNR} \propto d^{-p}$ and $P_{\rm TX} \propto d^{4-p}$, where $0 < p < 4$, leading to more efficient resource allocation in varying target distance conditions. Additionally, DOA estimation results using MUSIC and MVDR algorithms indicate that the optimum algorithm, in terms of the accuracy and computational complexity, should be selected based on the number of radar array elements. Furthermore, we develop a hardware model for the MIMO radar RF front-end to evaluate the power consumption of the TX, RX, and local oscillator (LO) distribution network. It is shown that the power consumption of the LO distribution network, which can dominate the power consumption for a large MIMO radar, can be minimized through a distribution strategy for LO amplifiers employed for compensating passive losses. Performance of the adaptive MIMO radar is evaluated in the free-space and the through-wall indoor sensing scenarios in the D-band.
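The scaling rule quoted in this abstract can be checked numerically. The sketch below assumes the standard radar equation, in which the received power falls off as $d^{-4}$ for a fixed transmit power, and a noise floor that does not depend on distance. The function names `scale_link` and `snr_from_radar_equation`, the reference distance, and the unit reference values are illustrative assumptions and are not taken from the paper.

```python
def scale_link(d, p, d_ref=1.0, snr_ref=1.0, ptx_ref=1.0):
    """Return (SNR, P_TX) at distance d under SNR ∝ d^-p and P_TX ∝ d^(4-p).

    All quantities are linear (not dB) and relative to hypothetical
    reference values defined at d_ref.
    """
    if not 0 < p < 4:
        raise ValueError("the abstract restricts p to 0 < p < 4")
    ratio = d / d_ref
    snr = snr_ref * ratio ** (-p)
    ptx = ptx_ref * ratio ** (4 - p)
    return snr, ptx


def snr_from_radar_equation(d, ptx, d_ref=1.0, snr_ref=1.0, ptx_ref=1.0):
    """SNR implied by a d^-4 two-way path loss for a given transmit power."""
    return snr_ref * (ptx / ptx_ref) * (d / d_ref) ** (-4)


# Doubling the distance with p = 2: SNR drops by 4x while P_TX rises by 4x.
snr, ptx = scale_link(d=2.0, p=2.0)
print(snr, ptx)
# The two scaling laws are consistent with the d^-4 path loss.
print(snr_from_radar_equation(2.0, ptx))
```

Under these assumptions, choosing `p` close to 4 keeps the transmit power nearly constant and lets the SNR fall with distance, while `p` close to 0 holds the SNR nearly constant at the cost of transmit power.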

DoA Estimation with Sparse Arrays: Effects of Antenna Element Patterns and Nonidealities 2025-11-28

This paper studies the effects of directional antenna element complex gain patterns and nonidealities in direction of arrival (DoA) estimation. We compare sparse arrays and classical uniform linear arrays, harnessing EM simulation tools to accurately model the electromagnetic behavior of both patch and Vivaldi antenna elements, including mutual coupling effects. We show that with sparse array configurations, the performance impacts are significant in terms of DoA estimation accuracy and operable SNR ranges. Specifically, in the scenarios considered, both the usage of directional antenna elements and a sparse array result in over 90% reduction in average direction finding error, compared to a uniform omnidirectional array with the same number of elements (in this case eight), when estimating the directions of two sources using the MUSIC algorithm. For a fixed angular RMSE, the improvements in array sensitivity are shown to yield a 4- to 15-fold increase in one-way coverage distance (assuming free-space path loss). Among the studied options, the best performance was obtained using sparse arrays with either patch or Vivaldi elements for fields of view of 100$^\circ$ or 120$^\circ$, respectively.

Spatial Signal Focusing and Noise Suppression for Direction-of-Arrival Estimation in Large-Aperture 2D Arrays under Demanding Conditions 2025-10-13

Direction-of-Arrival (DOA) estimation in sensor arrays faces limitations under demanding conditions, including low signal-to-noise ratio, single-snapshot scenarios, coherent sources, and unknown source counts. Conventional beamforming suffers from sidelobe interference; adaptive methods (e.g., MVDR) and subspace algorithms (e.g., MUSIC) degrade with limited snapshots or coherent signals; and sparse-recovery approaches (e.g., L1-SVD) incur high computational complexity for large arrays. In this article, we introduce the concept of an optimal spatial filter that exploits the sparsity of spatial signals, and we recast the DOA estimation problem as the problem of solving for this filter. We propose the Spatial Signal Focusing and Noise Suppression (SSFNS) algorithm, a novel DOA estimation framework grounded in the theoretical existence of an optimal spatial filter, to solve for the filter and obtain the DOAs. Experiments show that the proposed algorithm is suitable for large-aperture two-dimensional arrays and outperforms other algorithms in scenarios with few snapshots or even a single snapshot, low signal-to-noise ratio, coherent signals, and an unknown number of signals.

Single-Snapshot Localization Using Sparse Extremely Large Aperture Arrays 2025-09-22

This paper investigates single-snapshot direction-of-arrival (DOA) estimation and target localization with coherent sparse extremely large aperture arrays (ELAAs) in automotive radar applications. Far-field and near-field signal models are formulated for distributed bistatic configurations. To enable noncoherent processing, a single-snapshot MUSIC (SS-MUSIC) algorithm is proposed to fuse local spectra from individual subarrays and extended to near-field localization via geometric intersection. For coherent processing, a single-snapshot ESPRIT (SS-ESPRIT) method with ambiguity dealiasing is developed to fully exploit the aperture of sparse ELAAs for high-resolution angle estimation. Simulation results demonstrate that SS-ESPRIT provides superior angular resolution for closely spaced far-field targets, while SS-MUSIC offers robustness in near-field localization and flexibility in hybrid scenarios.

ICASSP 2026 manuscript under review

Direction of Arrival Estimation: A Tutorial Survey of Classical and Modern Methods 2025-09-02

Direction of arrival (DOA) estimation is a fundamental problem in array signal processing with applications spanning radar, sonar, wireless communications, and acoustic signal processing. This tutorial survey provides a comprehensive introduction to classical and modern DOA estimation methods, specifically designed for students and researchers new to the field. We focus on narrowband signal processing using uniform linear arrays, presenting step-by-step mathematical derivations with geometric intuition. The survey covers classical beamforming methods, subspace-based techniques (MUSIC, ESPRIT), maximum likelihood approaches, and sparse signal processing methods. Each method is accompanied by Python implementations available in an open-source repository, enabling reproducible research and hands-on learning. Through systematic performance comparisons across various scenarios, we provide practical guidelines for method selection and parameter tuning. This work aims to bridge the gap between theoretical foundations and practical implementation, making DOA estimation accessible to beginners while serving as a comprehensive reference for the field. See https://github.com/AmgadSalama/DOA for detailed implementations of the methods.

DOA Survey, 44 pages, Not published yet
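For readers new to the MUSIC entries in this list, the following is a minimal sketch of narrowband MUSIC on a uniform linear array, the setting this survey focuses on. It is not the code from the survey's repository. The function names (`steering_vector`, `music_spectrum`, `pick_peaks`), the half-wavelength spacing, the 0.5 degree search grid, and the simulated scenario are all assumptions chosen for illustration, and the number of sources is assumed known.

```python
import numpy as np


def steering_vector(theta_deg, n_sensors, spacing=0.5):
    """ULA steering vector; spacing is given in wavelengths."""
    k = np.arange(n_sensors)
    return np.exp(2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))


def music_spectrum(snapshots, n_sources, grid_deg, spacing=0.5):
    """MUSIC pseudo-spectrum from an (n_sensors, n_snapshots) data matrix."""
    n_sensors, n_snap = snapshots.shape
    R = snapshots @ snapshots.conj().T / n_snap      # sample covariance
    _, eigvecs = np.linalg.eigh(R)                   # eigenvalues ascending
    En = eigvecs[:, : n_sensors - n_sources]         # noise subspace
    A = np.stack([steering_vector(t, n_sensors, spacing) for t in grid_deg], axis=1)
    denom = np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)
    return 1.0 / np.maximum(denom, 1e-12)


def pick_peaks(spectrum, grid_deg, n_sources):
    """Return the grid angles of the n_sources largest local maxima."""
    is_peak = (spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] >= spectrum[2:])
    idx = np.where(is_peak)[0] + 1
    best = idx[np.argsort(spectrum[idx])[::-1][:n_sources]]
    return np.sort(grid_deg[best])


# Simulated scenario (assumed values): 8 sensors, 200 snapshots, 20 dB SNR.
rng = np.random.default_rng(0)
true_doas = np.array([-20.0, 35.0])
n_sensors, n_snap = 8, 200
A = np.stack([steering_vector(t, n_sensors) for t in true_doas], axis=1)
S = (rng.standard_normal((2, n_snap)) + 1j * rng.standard_normal((2, n_snap))) / np.sqrt(2)
N = 0.1 * (rng.standard_normal((n_sensors, n_snap))
           + 1j * rng.standard_normal((n_sensors, n_snap))) / np.sqrt(2)
X = A @ S + N

grid = np.arange(-90.0, 90.5, 0.5)
est = pick_peaks(music_spectrum(X, 2, grid), grid, 2)
print(est)
```

The estimate is limited to the search grid, so a finer grid or a root-finding variant would be needed for higher precision; coherent sources or very few snapshots would also require changes that this sketch does not cover.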

Fluid Antenna Enabled Direction-of-Arrival Estimation Under Time-Constrained Mobility 2025-08-14

Fluid antenna (FA) technology has emerged as a promising approach in wireless communications due to its capability of providing increased degrees of freedom (DoFs) and exceptional design flexibility. This paper addresses the challenge of direction-of-arrival (DOA) estimation for aligned received signals (ARS) and non-aligned received signals (NARS) by designing two specialized uniform FA structures under time-constrained mobility. For ARS scenarios, we propose a fully movable antenna configuration that maximizes the virtual array aperture, whereas for NARS scenarios, we design a structure incorporating a fixed reference antenna to reliably extract phase information from the signal covariance. To overcome the limitations of large virtual arrays and limited sample data inherent in time-varying channels (TVC), we introduce two novel DOA estimation methods: TMRLS-MUSIC for ARS, combining Toeplitz matrix reconstruction (TMR) with linear shrinkage (LS) estimation, and TMR-MUSIC for NARS, utilizing sub-covariance matrices to construct virtual array responses. Both methods employ Nystrom approximation to significantly reduce computational complexity while maintaining estimation accuracy. Theoretical analyses and extensive simulation results demonstrate that the proposed methods achieve underdetermined DOA estimation using minimal FA elements, outperform conventional methods in estimation accuracy, and substantially reduce computational complexity.

13 pages

DOA Estimation via Continuous Aperture Arrays: MUSIC and CRLB 2025-07-28

Direction-of-arrival (DOA) estimation using continuous aperture arrays (CAPAs) is studied. Compared to the conventional spatially discrete array (SPDA), a CAPA significantly enhances the spatial degrees of freedom (DoFs) for DOA estimation, but its infinite-dimensional continuous signals render conventional estimation algorithms inapplicable. To address this challenge, a new multiple signal classification (MUSIC) algorithm is proposed for CAPAs. In particular, an equivalent continuous-discrete transformation is proposed to facilitate the eigendecomposition of continuous operators. Subsequently, the MUSIC spectrum is accurately approximated using Gauss-Legendre quadrature, effectively reducing the computational complexity. Furthermore, the Cramér-Rao lower bounds (CRLBs) for DOA estimation using CAPAs are analyzed for both the case with and the case without a priori knowledge of the snapshot signals. It is theoretically proved that CAPAs significantly improve the DOA estimation accuracy compared to traditional SPDAs. Numerical results further validate this insight and demonstrate the effectiveness of the proposed MUSIC algorithm for CAPAs. The proposed method achieves near-optimal estimation performance while maintaining a low computational complexity.

Submit to possible IEEE journal

Near Field Localization via AI-Aided Subspace Methods 2025-06-27

The increasing demands for high-throughput and energy-efficient wireless communications are driving the adoption of extremely large antennas operating at high-frequency bands. In these regimes, multiple users will reside in the radiative near-field, and accurate localization becomes essential. Unlike conventional far-field systems that rely solely on DOA estimation, near-field localization exploits spherical wavefront propagation to recover both DOA and range information. While subspace-based methods, such as MUSIC and its extensions, offer high resolution and interpretability for near-field localization, their performance is significantly impacted by model assumptions, including non-coherent sources, well-calibrated arrays, and a sufficient number of snapshots. To address these limitations, this work proposes AI-aided subspace methods for near-field localization that enhance robustness to real-world challenges. Specifically, we introduce NF-SubspaceNet, a deep learning-augmented 2D MUSIC algorithm that learns a surrogate covariance matrix to improve localization under challenging conditions, and DCD-MUSIC, a cascaded AI-aided approach that decouples angle and range estimation to reduce computational complexity. We further develop a novel model-order-aware training method to accurately estimate the number of sources, that is combined with casting of near field subspace methods as AI models for learning. Extensive simulations demonstrate that the proposed methods outperform classical and existing deep-learning-based localization techniques, providing robust near-field localization even under coherent sources, miscalibrations, and few snapshots.

Under review for publication in the IEEE

Physically Parameterized Differentiable MUSIC for DoA Estimation with Uncalibrated Arrays 2025-03-20

Direction of arrival (DoA) estimation is a common sensing problem in radar, sonar, audio, and wireless communication systems. It has gained renewed importance with the advent of the integrated sensing and communication paradigm. To fully exploit the potential of such sensing systems, it is crucial to take into account potential hardware impairments that can negatively impact the obtained performance. This study introduces a joint DoA estimation and hardware impairment learning scheme following a model-based approach. Specifically, a differentiable version of the multiple signal classification (MUSIC) algorithm is derived, allowing efficient learning of the considered impairments. The proposed approach supports both supervised and unsupervised learning strategies, showcasing its practical potential. Simulation results indicate that the proposed method successfully learns significant inaccuracies in both antenna locations and complex gains. Additionally, the proposed method outperforms the classical MUSIC algorithm in the DoA estimation task.

Direction Finding for Software Defined Radios with Switched Uniform Circular Arrays 2025-02-12

Accurate Direction of Arrival (DoA) estimation is critical for applications in robotics and communication, but high costs and complexity of coherent multi-channel receivers hinder accessibility. This work proposes a cost-effective DoA estimation system for continuous wave (CW) signals in the 2.4 GHz ISM band. A two-channel software-defined radio (SDR) with time-division multiplexing (TDM) enables pseudo-coherent sampling of an eight-element uniform circular array (UCA) with low hardware complexity. A central reference antenna mitigates phase jitter and sampling errors. The system applies an enhanced MUSIC algorithm with spatial smoothing to handle light multipath interference in indoor and outdoor environments. Experiments in an anechoic chamber validate accuracy under ideal conditions, while real-world tests confirm robust performance in multipath-prone scenarios. With 5 Hz DoA updates and post-processing to enhance tracking, the system provides an accessible and reliable solution for DoA estimation in real-world environments.

4 pages, 8 figures, IEEE IMS 2025

A Hybrid Dynamic Subarray Architecture for Efficient DOA Estimation in THz Ultra-Massive Hybrid MIMO Systems 2025-01-30

Terahertz (THz) communication combined with ultra-massive multiple-input multiple-output (UM-MIMO) technology is promising for 6G wireless systems, where fast and precise direction-of-arrival (DOA) estimation is crucial for effective beamforming. However, finding DOAs in THz UM-MIMO systems faces significant challenges: while reducing hardware complexity, the hybrid analog-digital (HAD) architecture introduces inherent difficulties in spatial information acquisition; the large-scale antenna array causes significant deviations in eigenvalue decomposition results; and conventional two-dimensional DOA estimation methods incur prohibitively high computational overhead, hindering fast and accurate realization. To address these challenges, we propose a hybrid dynamic subarray (HDS) architecture that strategically divides antenna elements into subarrays, ensuring phase differences between subarrays correlate exclusively with single-dimensional DOAs. Leveraging this architectural innovation, we develop two efficient algorithms for DOA estimation: a reduced-dimension MUSIC (RD-MUSIC) algorithm that enables fast processing by correcting large-scale array estimation bias, and an improved version that further accelerates estimation by exploiting THz channel sparsity to obtain initial closed-form solutions through a specialized two-RF-chain configuration. Furthermore, we develop a theoretical framework through Cramér-Rao lower bound analysis, providing fundamental insights for different HDS configurations. Extensive simulations demonstrate that our solution achieves both superior estimation accuracy and computational efficiency, making it particularly suitable for practical THz UM-MIMO systems.

Performance evaluation of non-uniform sensor spacing in a linear array configuration for MUSIC algorithm 2025-01-25

In this paper, the performance of non-uniform sensor spacing is evaluated for the MUSIC algorithm, which estimates the direction of arrival (DOA) of a narrowband plane wave impinging on an array of sensors. Unlike the uniform arrangement, where sensors are equidistant (spaced at half the wavelength), we consider a non-uniform arrangement in which the distance between consecutive sensors increases progressively. We observe that the non-uniform configuration, with fewer sensors, can provide similar or better DOA estimation accuracy than the uniform configuration, which uses more sensors at an identical array length.

Published in the Proceedings of IEEE International Conference on Signal and Image Processing, 7-9 Dec 2006, Hubli, India (ICSIP 2006)
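The geometry contrasted in this abstract can be illustrated with a few lines of code. The abstract does not state how the spacing grows, so the sketch below assumes, purely for illustration, that each gap exceeds the previous one by a fixed increment. The function names and the specific values (eight uniform sensors, five non-uniform sensors, gaps in wavelengths) are hypothetical.

```python
def uniform_positions(n_sensors, spacing=0.5):
    """Sensor positions (in wavelengths) for equidistant spacing."""
    return [i * spacing for i in range(n_sensors)]


def progressive_positions(n_sensors, first_gap=0.5, increment=0.25):
    """Sensor positions where each gap is `increment` larger than the last.

    The arithmetic growth of the gaps is an assumption made for this sketch.
    """
    positions = [0.0]
    gap = first_gap
    for _ in range(n_sensors - 1):
        positions.append(positions[-1] + gap)
        gap += increment
    return positions


uniform = uniform_positions(8)          # 8 sensors at half-wavelength spacing
progressive = progressive_positions(5)  # 5 sensors with gaps 0.5, 0.75, 1.0, 1.25

print(uniform[-1], progressive[-1])     # both arrays span the same length
print(progressive)
```

With these assumed values both arrays span 3.5 wavelengths, the first with eight sensors and the second with five, which is the kind of equal-length comparison the abstract describes. The sketch only builds the geometries and does not evaluate estimation accuracy.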

Completing Sets of Prototype Transfer Functions for Subspace-based Direction of Arrival Estimation of Multiple Speakers 2025-01-13

To estimate the direction of arrival (DOA) of multiple speakers, subspace-based prototype transfer function matching methods such as multiple signal classification (MUSIC) or relative transfer function (RTF) vector matching are commonly employed. In general, these methods require calibrated microphone arrays, which are characterized by a known array geometry or a set of known prototype transfer functions for several directions. In this paper, we consider a partially calibrated microphone array, composed of a calibrated binaural hearing aid and a (non-calibrated) external microphone at an unknown location with no available set of prototype transfer functions. We propose a procedure for completing sets of prototype transfer functions by exploiting the orthogonality of subspaces, allowing matching-based DOA estimation methods to be applied with partially calibrated microphone arrays. For the MUSIC and RTF vector matching methods, experimental results for two speakers in noisy and reverberant environments clearly demonstrate that for all locations of the external microphone DOAs can be estimated more accurately with completed sets of prototype transfer functions than with incomplete sets. ©20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Accepted for ICASSP 2025

Low Complexity DoA-ToA Signature Estimation for Multi-Antenna Multi-Carrier Systems 2024-09-13

Accurate direction of arrival (DoA) and time of arrival (ToA) estimation is a stringent requirement for several wireless systems such as sonar, radar, communications, and dual-function radar communication (DFRC). Due to the use of high carrier frequencies and bandwidths, most of these systems are designed with multiple antennas and subcarriers. Although the resolution is high in the large array regime, the accuracy of practical on-grid DoA-ToA estimation methods still suffers from the spectral leakage effect. In this article, we propose DoA-ToA estimation methods for multi-antenna multi-carrier systems with an orthogonal frequency division multiplexing (OFDM) signal. In the first method, we apply discrete Fourier transform (DFT) based coarse signature estimation and propose a low-complexity multistage fine-tuning step that greatly enhances the estimation accuracy. The second method is based on compressed sensing (CS), where we achieve super-resolution by taking a 2D overcomplete angle-delay dictionary larger than the actual antenna and subcarrier basis. Unlike the vectorized 1D-OMP method, we apply the low-complexity 2D-OMP method on the matrix data model, which makes the use of CS methods practical in large array regimes. Through numerical simulations, we show that our proposed methods achieve performance similar to that of the subspace-based 2D-MUSIC method with a significant reduction in computational complexity.

5 pages, 4 figures, 1 table

Direction of Arrival Estimation with Sparse Subarrays 2024-08-17

This paper proposes design techniques for partially-calibrated sparse linear subarrays and algorithms to perform direction-of-arrival (DOA) estimation. First, we introduce array architectures that incorporate two distinct array categories, namely type-I and type-II arrays. The former breaks down a known sparse linear geometry into as many pieces as we need, and the latter employs each subarray such that it fits a preplanned sparse linear geometry. Moreover, we devise two DOA estimation algorithms that are suitable for partially-calibrated array scenarios within the coarray domain. The algorithms are capable of estimating a greater number of sources than the number of available physical sensors, while maintaining the hardware and computational complexity within practical limits for real-time implementation. To this end, we exploit the intersection of projections onto affine spaces by devising the Generalized Coarray Multiple Signal Classification (GCA-MUSIC) in conjunction with the estimation of a refined projection matrix related to the noise subspace, as proposed in the GCA root-MUSIC algorithm. An analysis is performed for the devised subarray configurations in terms of degrees of freedom, as well as the computation of the Cramér-Rao Lower Bound for the utilized data model, in order to demonstrate the good performance of the proposed methods. Simulations assess the performance of the proposed design methods and algorithms against existing approaches.

15 pages, 8 figures

Analysis of Partially-Calibrated Sparse Subarrays for Direction Finding with Extended Degrees of Freedom 2024-08-06

This paper investigates the problem of direction-of-arrival (DOA) estimation using multiple partially-calibrated sparse subarrays. In particular, we present the Generalized Coarray Multiple Signal Classification (GCA-MUSIC) DOA estimation algorithm for scenarios with partially-calibrated sparse subarrays. The proposed GCA-MUSIC algorithm exploits the difference coarray for each subarray, followed by a specific pseudo-spectrum merging rule that is based on the intersection of the signal subspaces associated with each subarray. This rule assumes that there is no a priori knowledge about the cross-covariance between subarrays. In that way, only the second-order statistics of each subarray are used to estimate the directions with increased degrees of freedom, i.e., the estimation procedure preserves the coarray Multiple Signal Classification and sparse-array properties to estimate more sources than the number of physical sensors in each subarray. Numerical simulations show that the proposed GCA-MUSIC has better performance than other similar strategies.

6 pages, 5 figures

Near-Field Localization with an Exact Propagation Model in Presence of Mutual Coupling 2024-07-28

Localizing near-field sources considering practical arrays is important in wireless communications. Array-based apertures exhibit mutual coupling between the array elements, which can significantly degrade the performance of the localization method. In this paper, we propose two methods to localize near-field sources by direction of arrival (DOA) and range estimations in the presence of mutual coupling. The first method utilizes a two-dimensional search to estimate DOA and the range of the source. Therefore, it suffers from a high computational load. The second method reduces the two-dimensional search to one-dimensional, thus decreasing the computational complexity while offering similar DOA and range estimation performance. Besides, our second method reduces computational time by over 50% compared to the multiple signal classification (MUSIC) algorithm.

Proceedings of 2024 IEEE 99th Vehicular Technology Conference (VTC2024-Spring)

SBL Array

Title Date Abstract Comment
Spatially Filtered Sparse Bayesian Learning for Direction-of-Arrival Estimation with Leaky-Wave Antennas 2025-10-12

Direction-of-arrival (DoA) estimation with leaky-wave antennas (LWAs) offers a compact and cost-effective alternative to conventional antenna arrays but remains challenging in the presence of coherent sources. To address this issue, we propose a spatially filtered sparse Bayesian learning (SF-SBL) framework. Firstly, the field of view (FoV) is divided into angular sectors according to the frequency beam-scanning property of LWAs, and Bayesian inverse problems are then solved within each sector to improve efficiency and reduce computational cost. Both on-grid SBL and off-grid SBL formulations are developed. Simulation results show that the proposed approach achieves robust and accurate DoA estimation, even with coherent sources.

Preprint submitted to ICASSP 2026. 4 pages, 3 figures

Sparse Bayesian Learning for DOA Estimation in Heteroscedastic Noise 2017-11-08

The paper considers direction of arrival (DOA) estimation from long-term observations in a noisy environment. In such an environment the noise source might evolve, causing the stationary models to fail. Therefore a heteroscedastic Gaussian noise model is introduced where the variance can vary across observations and sensors. The source amplitudes are assumed independent zero-mean complex Gaussian distributed with unknown variances (i.e. the source powers), inspiring stochastic maximum likelihood DOA estimation. The DOAs of plane waves are estimated from multi-snapshot sensor array data using sparse Bayesian learning (SBL) where the noise is estimated across both sensors and snapshots. This SBL approach is more flexible and performs better than high-resolution methods since they cannot estimate the heteroscedastic noise process. An alternative to SBL is simple data normalization, whereby only the phase across the array is utilized. Simulations demonstrate that taking the heteroscedastic noise into account improves DOA estimation.

Submitted to IEEE TSP
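The "simple data normalization" alternative mentioned in this abstract, in which only the phase across the array is used, can be sketched as follows. This is one plausible reading, assuming each complex sample is scaled to unit magnitude; the function name `phase_only`, the `eps` guard, and the toy plane-wave data are assumptions and not the paper's implementation.

```python
import numpy as np


def phase_only(snapshots, eps=1e-12):
    """Scale every array sample to unit magnitude, keeping only its phase."""
    magnitude = np.abs(snapshots)
    return snapshots / np.maximum(magnitude, eps)


# Toy data (assumed): one plane wave whose amplitude varies strongly per snapshot.
rng = np.random.default_rng(1)
n_sensors, n_snap = 6, 4
phases = np.exp(1j * np.pi * 0.5 * np.arange(n_sensors))[:, None]
gains = rng.uniform(0.1, 10.0, size=(1, n_snap))
X = phases * gains          # shape (n_sensors, n_snap)

Y = phase_only(X)
print(np.abs(Y))            # all magnitudes are 1 after normalization
```

In this noise-free toy case the normalization removes the snapshot-to-snapshot amplitude fluctuation and leaves the array phase pattern unchanged. With noise present the magnitudes would still be equalized, but the phases would be perturbed.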

Subspace Array

Title Date Abstract Comment
Sensing for Free: Learn to Localize More Sources than Antennas without Pilots 2026-01-08

Integrated sensing and communication (ISAC) represents a key paradigm for future wireless networks. However, existing approaches require waveform modifications, dedicated pilots, or overhead that complicates standards integration. We propose sensing for free - performing multi-source localization without pilots by reusing uplink data symbols, making sensing occur during transmission and directly compatible with 3GPP 5G NR and 6G specifications. With ever-increasing devices in dense 6G networks, this approach is particularly compelling when combined with sparse arrays, which can localize more sources than uniform arrays via an enlarged virtual array. Existing pilot-free multi-source localization algorithms first reconstruct an extended covariance matrix and apply subspace methods, incurring cubic complexity and limited to second-order statistics. Performance degrades under non-Gaussian data symbols and few snapshots, and higher-order statistics remain unexploited. We address these challenges with an attention-only transformer that directly processes raw signal snapshots for grid-less end-to-end direction-of-arrival (DOA) estimation. The model efficiently captures higher-order statistics while being permutation-invariant and adaptive to varying snapshot counts. Our algorithm greatly outperforms state-of-the-art AI-based benchmarks with over 30x reduction in parameters and runtime, and enjoys excellent generalization under practical mismatches. Applied to multi-user MIMO beam training, our algorithm can localize uplink DOAs of multiple users during data transmission. Through angular reciprocity, estimated uplink DOAs prune downlink beam sweeping candidates and improve throughput via sensing-assisted beam management. This work shows how reusing existing data transmission for sensing can enhance both multi-source localization and beam management in 3GPP efforts towards 6G.

17 pages, 14 figures, 1 table. This paper was accepted by the IEEE Journal on Selected Areas in Communications (JSAC) on Jan. 5, 2026

Efficient Decoders for Sensing Subspace Code 2025-12-04

Sparse antenna array sensing of sources/targets via direction of arrival (DoA) estimation motivates the design of the sensing framework in joint communication and sensing (JCAS) systems for sixth generation (6G) communication systems. It was recently established by Mahdavifar, Rajamäki, and Pal that the array geometry of sparse arrays has fundamental connections with the design of subspace codes in coding theory. This was then utilized to design efficient sensing subspace codes that estimate the DoA with good resolution. Specifically, the Bose-Chowla sensing subspace code provides a near-optimal code design for unique DoA estimation with a tight theoretical upper bound on the error performance. However, the currently known decoder for these codes, used to estimate the DoA, is a traditional maximum a posteriori (MAP) decoder with complexity that is cubic in the number of antennas. In this work, we propose novel efficient decoding algorithms for sensing subspace codes that reduce the complexity to quadratic while providing new knobs to trade off complexity against error performance. The decoders are further evaluated via Monte Carlo simulations over a range of SNRs, demonstrating promising performance that smoothly approaches the MAP performance as the complexity grows from quadratic to cubic in the number of antennas.

This paper was accepted for presentation at the 59th Annual Asilomar Conference on Signals, Systems, and Computers

Spatial Signal Focusing and Noise Suppression for Direction-of-Arrival Estimation in Large-Aperture 2D Arrays under Demanding Conditions 2025-10-13
Show

Direction-of-Arrival (DOA) estimation in sensor arrays faces limitations under demanding conditions, including low signal-to-noise ratio, single-snapshot scenarios, coherent sources, and unknown source counts. Conventional beamforming suffers from sidelobe interference, adaptive methods (e.g., MVDR) and subspace algorithms (e.g., MUSIC) degrade with limited snapshots or coherent signals, while sparse-recovery approaches (e.g., L1-SVD) incur high computational complexity for large arrays. In this article, we introduce the concept of the optimal spatial filter, which exploits the sparsity of spatial signals, and use it to recast the DOA estimation problem as the problem of solving for that filter. We propose the Spatial Signal Focusing and Noise Suppression (SSFNS) algorithm, a novel DOA estimation framework grounded in the theoretical existence of an optimal spatial filter, to solve for the filter and obtain the DOAs. Experiments show that the proposed algorithm is suitable for large-aperture two-dimensional arrays, where it outperforms other algorithms in scenarios with few snapshots or even a single snapshot, low signal-to-noise ratio, coherent signals, and unknown source counts.

Joint DOA and Attitude Sensing Based on Tri-Polarized Continuous Aperture Array 2025-10-02
Show

This paper investigates joint direction-of-arrival (DOA) and attitude sensing using tri-polarized continuous aperture arrays (CAPAs). By employing electromagnetic (EM) information theory, the spatially continuous received signals in tri-polarized CAPA are modeled, thereby enabling accurate DOA and attitude estimation. To facilitate subspace decomposition for continuous operators, an equivalent continuous-discrete transformation technique is developed. Moreover, both self- and cross-covariances of tri-polarized signals are exploited to construct a tri-polarized spectrum, significantly enhancing DOA estimation performance. Theoretical analyses reveal that the identifiability of attitude information fundamentally depends on the availability of prior target snapshots. Accordingly, two attitude estimation algorithms are proposed: one capable of estimating partial attitude information without prior knowledge, and the other achieving full attitude estimation when such knowledge is available. Numerical results demonstrate the feasibility and superiority of the proposed framework.

13 pages, 10 figures
Direction of Arrival Estimation: A Tutorial Survey of Classical and Modern Methods 2025-09-02
Show

Direction of arrival (DOA) estimation is a fundamental problem in array signal processing with applications spanning radar, sonar, wireless communications, and acoustic signal processing. This tutorial survey provides a comprehensive introduction to classical and modern DOA estimation methods, specifically designed for students and researchers new to the field. We focus on narrowband signal processing using uniform linear arrays, presenting step-by-step mathematical derivations with geometric intuition. The survey covers classical beamforming methods, subspace-based techniques (MUSIC, ESPRIT), maximum likelihood approaches, and sparse signal processing methods. Each method is accompanied by Python implementations available in an open-source repository, enabling reproducible research and hands-on learning. Through systematic performance comparisons across various scenarios, we provide practical guidelines for method selection and parameter tuning. This work aims to bridge the gap between theoretical foundations and practical implementation, making DOA estimation accessible to beginners while serving as a comprehensive reference for the field. See https://github.com/AmgadSalama/DOA for detailed implementations of the methods.

DOA Survey, 44 pages, Not published yet

Near Field Localization via AI-Aided Subspace Methods 2025-06-27
Show

The increasing demands for high-throughput and energy-efficient wireless communications are driving the adoption of extremely large antennas operating at high-frequency bands. In these regimes, multiple users will reside in the radiative near-field, and accurate localization becomes essential. Unlike conventional far-field systems that rely solely on DOA estimation, near-field localization exploits spherical wavefront propagation to recover both DOA and range information. While subspace-based methods, such as MUSIC and its extensions, offer high resolution and interpretability for near-field localization, their performance is significantly impacted by model assumptions, including non-coherent sources, well-calibrated arrays, and a sufficient number of snapshots. To address these limitations, this work proposes AI-aided subspace methods for near-field localization that enhance robustness to real-world challenges. Specifically, we introduce NF-SubspaceNet, a deep learning-augmented 2D MUSIC algorithm that learns a surrogate covariance matrix to improve localization under challenging conditions, and DCD-MUSIC, a cascaded AI-aided approach that decouples angle and range estimation to reduce computational complexity. We further develop a novel model-order-aware training method to accurately estimate the number of sources, that is combined with casting of near field subspace methods as AI models for learning. Extensive simulations demonstrate that the proposed methods outperform classical and existing deep-learning-based localization techniques, providing robust near-field localization even under coherent sources, miscalibrations, and few snapshots.

Under review for publication in the IEEE

Mainlobe Jamming Suppression Using MIMO-STCA Radar 2025-05-14
Show

Radar jamming suppression, particularly against mainlobe jamming, has become a critical focus in modern radar systems. This article investigates advanced mainlobe jamming suppression techniques utilizing a novel multiple-input multiple-output space-time coding array (MIMO-STCA) radar. Extending the capabilities of traditional MIMO radar, the MIMO-STCA framework introduces additional degrees of freedom (DoFs) in the range domain through the utilization of transmit time delays, offering enhanced resilience against interference. One of the key challenges in mainlobe jamming scenarios is the difficulty in obtaining interference-plus-noise samples that are free from target signal contamination. To address this, the study introduces a cumulative sampling-based non-homogeneous sample selection (CS-NHSS) algorithm to remove target-contaminated samples, ensuring accurate interference-plus-noise covariance matrix estimation and effective noise subspace separation. Building on this, the subsequent step is to apply the proposed noise subspace-based jamming mitigation (NSJM) algorithm, which leverages the orthogonality between noise and jamming subspace for effective jamming mitigation. However, NSJM performance can degrade due to spatial frequency mismatches caused by DoA or range quantization errors. To overcome this limitation, the study further proposes the robust jamming mitigation via noise subspace (RJNS) algorithm, incorporating adaptive beampattern control to achieve a flat-top mainlobe and broadened nulls, enhancing both anti-jamming effectiveness and robustness under non-ideal conditions. Simulation results verify the effectiveness of the proposed algorithms. Significant improvements in mainlobe jamming suppression are demonstrated through transmit-receive beampattern analysis and enhanced signal-to-interference-plus-noise ratio (SINR) curve.

A Comparative Study of Invariance-Aware Loss Functions for Deep Learning-based Gridless Direction-of-Arrival Estimation 2025-03-16
Show

Covariance matrix reconstruction has been the most widely used guiding objective in gridless direction-of-arrival (DoA) estimation for sparse linear arrays. Many semidefinite programming (SDP)-based methods fall under this category. Although deep learning-based approaches enable the construction of more sophisticated objective functions, most methods still rely on covariance matrix reconstruction. In this paper, we propose new loss functions that are invariant to the scaling of the matrices and provide a comparative study of losses with varying degrees of invariance. The proposed loss functions are formulated based on the scale-invariant signal-to-distortion ratio between the target matrix and the Gram matrix of the prediction. Numerical results show that a scale-invariant loss outperforms its non-invariant counterpart but is inferior to the recently proposed subspace loss that is invariant to the change of basis. These results provide evidence that designing loss functions with greater degrees of invariance is advantageous in deep learning-based gridless DoA estimation.

5 pages. Accepted at ICASSP 2025

Completing Sets of Prototype Transfer Functions for Subspace-based Direction of Arrival Estimation of Multiple Speakers 2025-01-13
Show

To estimate the direction of arrival (DOA) of multiple speakers, subspace-based prototype transfer function matching methods such as multiple signal classification (MUSIC) or relative transfer function (RTF) vector matching are commonly employed. In general, these methods require calibrated microphone arrays, which are characterized by a known array geometry or a set of known prototype transfer functions for several directions. In this paper, we consider a partially calibrated microphone array, composed of a calibrated binaural hearing aid and a (non-calibrated) external microphone at an unknown location with no available set of prototype transfer functions. We propose a procedure for completing sets of prototype transfer functions by exploiting the orthogonality of subspaces, allowing matching-based DOA estimation methods to be applied with partially calibrated microphone arrays. For the MUSIC and RTF vector matching methods, experimental results for two speakers in noisy and reverberant environments clearly demonstrate that for all locations of the external microphone DOAs can be estimated more accurately with completed sets of prototype transfer functions than with incomplete sets.

Accepted for ICASSP 2025

Low Complexity DoA-ToA Signature Estimation for Multi-Antenna Multi-Carrier Systems 2024-09-13
Show

Accurate direction of arrival (DoA) and time of arrival (ToA) estimation is a stringent requirement for several wireless systems like sonar, radar, communications, and dual-function radar communication (DFRC). Due to the use of high carrier frequency and bandwidth, most of these systems are designed with multiple antennae and subcarriers. Although the resolution is high in the large array regime, the accuracy of practical on-grid DoA-ToA estimation methods still suffers from the spectral leakage effect. In this article, we propose DoA-ToA estimation methods for multi-antenna multi-carrier systems with an orthogonal frequency division multiplexing (OFDM) signal. In the first method, we apply discrete Fourier transform (DFT) based coarse signature estimation and propose a low complexity multistage fine-tuning for extreme enhancement in the estimation accuracy. The second method is based on compressed sensing (CS), where we achieve super-resolution by using a 2D angle-delay dictionary that is overcomplete relative to the actual number of antenna and subcarrier basis vectors. Unlike the vectorized 1D-OMP method, we apply the low complexity 2D-OMP method to the matrix data model, which makes the use of CS methods practical in large array regimes. Through numerical simulations, we show that our proposed methods achieve performance similar to that of the subspace-based 2D-MUSIC method with a significant reduction in computational complexity.

5 pages, 4 figures, 1 table

Direction of Arrival Estimation with Sparse Subarrays 2024-08-17
Show

This paper proposes design techniques for partially-calibrated sparse linear subarrays and algorithms to perform direction-of-arrival (DOA) estimation. First, we introduce array architectures that incorporate two distinct array categories, namely type-I and type-II arrays. The former breaks down a known sparse linear geometry into as many pieces as we need, and the latter employs each subarray such that it fits a preplanned sparse linear geometry. Moreover, we devise two DOA estimation algorithms that are suitable for partially-calibrated array scenarios within the coarray domain. The algorithms are capable of estimating a greater number of sources than the number of available physical sensors, while maintaining the hardware and computational complexity within practical limits for real-time implementation. To this end, we exploit the intersection of projections onto affine spaces by devising the Generalized Coarray Multiple Signal Classification (GCA-MUSIC) in conjunction with the estimation of a refined projection matrix related to the noise subspace, as proposed in the GCA root-MUSIC algorithm. An analysis is performed for the devised subarray configurations in terms of degrees of freedom, as well as the computation of the Cramér-Rao Lower Bound for the utilized data model, in order to demonstrate the good performance of the proposed methods. Simulations assess the performance of the proposed design methods and algorithms against existing approaches.

15 pages, 8 figures
Analysis of Partially-Calibrated Sparse Subarrays for Direction Finding with Extended Degrees of Freedom 2024-08-06
Show

This paper investigates the problem of direction-of-arrival (DOA) estimation using multiple partially-calibrated sparse subarrays. In particular, we present the Generalized Coarray Multiple Signal Classification (GCA-MUSIC) DOA estimation algorithm for scenarios with partially-calibrated sparse subarrays. The proposed GCA-MUSIC algorithm exploits the difference coarray for each subarray, followed by a specific pseudo-spectrum merging rule that is based on the intersection of the signal subspaces associated with each subarray. This rule assumes that there is no a priori knowledge about the cross-covariance between subarrays. In that way, only the second-order statistics of each subarray are used to estimate the directions with increased degrees of freedom, i.e., the estimation procedure preserves the properties of coarray Multiple Signal Classification and sparse arrays to estimate more sources than the number of physical sensors in each subarray. Numerical simulations show that the proposed GCA-MUSIC has better performance than other similar strategies.

6 pages, 5 figures
SubspaceNet: Deep Learning-Aided Subspace Methods for DoA Estimation 2024-07-11
Show

Direction of arrival (DoA) estimation is a fundamental task in array processing. A popular family of DoA estimation algorithms are subspace methods, which operate by dividing the measurements into distinct signal and noise subspaces. Subspace methods, such as Multiple Signal Classification (MUSIC) and Root-MUSIC, rely on several restrictive assumptions, including narrowband non-coherent sources and fully calibrated arrays, and their performance is considerably degraded when these do not hold. In this work we propose SubspaceNet, a data-driven DoA estimator which learns how to divide the observations into distinguishable subspaces. This is achieved by utilizing a dedicated deep neural network to learn the empirical autocorrelation of the input, by training it as part of the Root-MUSIC method, leveraging the inherent differentiability of this specific DoA estimator, while removing the need to provide a ground-truth decomposable autocorrelation matrix. Once trained, the resulting SubspaceNet serves as a universal surrogate covariance estimator that can be applied in combination with any subspace-based DoA estimation method, allowing its successful application in challenging setups. SubspaceNet is shown to enable various DoA estimation algorithms to cope with coherent sources, wideband signals, low SNR, array mismatches, and limited snapshots, while preserving the interpretability and the suitability of classic subspace methods.

Under review for publication in the IEEE

Subspace Coding for Spatial Sensing 2024-07-03
Show

A subspace code is defined as a collection of subspaces of an ambient vector space, where each information-encoding codeword is a subspace. This paper studies a class of spatial sensing problems, notably direction of arrival (DoA) estimation using multisensor arrays, from a novel subspace coding perspective. Specifically, we demonstrate how a canonical (passive) sensing model can be mapped into a subspace coding problem, with the sensing operation defining a unique structure for the subspace codewords. We introduce the concept of sensing subspace codes following this structure, and show how these codes can be controlled by judiciously designing the sensor array geometry. We further present a construction of sensing subspace codes leveraging a certain class of Golomb rulers that achieve near-optimal minimum codeword distance. These designs inspire novel noise-robust sparse array geometries achieving high angular resolution. We also prove that codes corresponding to conventional uniform linear arrays are suboptimal in this regard. This work is the first to establish connections between subspace coding and spatial sensing, with the aim of leveraging insights and methodologies in one field to tackle challenging problems in the other.

©2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Gridless Parameter Estimation in Partly Calibrated Rectangular Arrays 2024-06-23
Show

Spatial frequency estimation from a mixture of noisy sinusoids finds applications in various fields. While subspace-based methods offer cost-effective super-resolution parameter estimation, they demand precise array calibration, posing challenges for large antennas. In contrast, sparsity-based approaches outperform subspace methods, especially in scenarios with limited snapshots or correlated sources. This study focuses on direction-of-arrival (DOA) estimation using a partly calibrated rectangular array with fully calibrated subarrays. A gridless sparse formulation leveraging shift invariances in the array is developed, yielding two competitive algorithms under the alternating direction method of multipliers (ADMM) and successive convex approximation frameworks, respectively. Numerical simulations show the superior error performance of our proposed method, particularly in highly correlated scenarios, compared to the conventional subspace-based methods. It is demonstrated that the proposed formulation can also be adopted in the fully calibrated case to improve the robustness of the subspace-based methods to the source correlation. Furthermore, we provide a generalization of the proposed method to a more challenging case where a part of the sensors is unobservable due to failures.

16 pages, 5 figures. This work has been submitted to the IEEE Transactions on Signal Processing for possible publication

Auto-Calibration and 2D-DOA Estimation in UCAs via an Integrated Wideband Dictionary 2024-04-26
Show

In this paper, we present a novel auto-calibration scheme for the joint estimation of the two-dimensional (2-D) direction-of-arrival (DOA) and the mutual coupling matrix (MCM) for a signal measured using uniform circular arrays. The method employs an integrated wideband dictionary to mitigate the detrimental effects of the discretization of the continuous parameter space over the considered azimuth and elevation angles. This leads to a reduction of the computational complexity and obtaining of more accurate DOA estimates. Given the more reliable DOA estimates, the method also allows for the estimation of more accurate mutual coupling coefficients. The method utilizes an integrated dictionary in order to iteratively refine the active parameter space, thereby reducing the required computational complexity without reducing the overall performance. The complexity is further reduced by employing only the dominant subspace of the measured signal. Furthermore, the proposed method does not require a constraint on the prior knowledge of the number of nonzero coupling coefficients nor suffer from ambiguity problems. Moreover, a simple formulation for 2-D non-numerical integration is presented. Simulation results show the effectiveness of the proposed method.

This is a completed version of a work which will be sent to 2024 Asilomar Conference on Signals, Systems, and Computers

Sparse Spatial Smoothing: Reduced Complexity and Improved Beamforming Gain via Sparse Sub-Arrays 2024-03-10
Show

This paper addresses the problem of single snapshot Direction-of-Arrival (DOA) estimation, which is of great importance in a wide-range of applications including automotive radar. A popular approach to achieving high angular resolution when only one temporal snapshot is available is via subspace methods using spatial smoothing. This involves leveraging spatial shift-invariance in the antenna array geometry, typically a uniform linear array (ULA), to rearrange the single snapshot measurement vector into a spatially smoothed matrix that reveals the signal subspace of interest. However, conventional approaches using spatially shifted ULA sub-arrays can lead to a prohibitively high computational complexity due to the large dimensions of the resulting spatially smoothed matrix. Hence, we propose to instead employ judiciously designed sparse sub-arrays, such as nested arrays, to reduce the computational complexity of spatial smoothing while retaining the aperture and identifiability of conventional ULA-based approaches. Interestingly, this idea also suggests a novel beamforming method which linearly combines multiple spatially smoothed matrices corresponding to different sets of shifts of the sparse (nested) sub-array. This so-called shift-domain beamforming method is demonstrated to boost the effective SNR, and thereby resolution, in a desired angular region of interest, enabling single snapshot low-complexity DOA estimation with identifiability guarantees.

TransMUSIC: A Transformer-Aided Subspace Method for DOA Estimation with Low-Resolution ADCs 2024-01-04
Show

Direction of arrival (DOA) estimation employing low-resolution analog-to-digital converters (ADCs) has emerged as a challenging and intriguing problem, particularly with the rise in popularity of large-scale arrays. The substantial quantization distortion complicates the extraction of signal and noise subspaces from the quantized data. To address this issue, this paper introduces a novel approach that leverages the Transformer model to aid the subspace estimation. In this model, multiple snapshots are processed in parallel, enabling the capture of global correlations that span them. The learned subspace empowers us to construct the MUSIC spectrum and perform gridless DOA estimation using a neural network-based peak finder. Additionally, the acquired subspace encodes the vital information of model order, allowing us to determine the exact number of sources. These integrated components form a unified algorithmic framework referred to as TransMUSIC. Numerical results demonstrate the superiority of the TransMUSIC algorithm, even when dealing with one-bit quantized data. The results highlight the potential of Transformer-based techniques in DOA estimation.

5 pages, 5 figures
Deep Learning-Aided Subspace-Based DOA Recovery for Sparse Arrays 2023-12-17
Show

Sparse arrays enable resolving more directions of arrival (DoAs) than antenna elements using non-uniform arrays. This is typically achieved by reconstructing the covariance of a virtual large uniform linear array (ULA), which is then processed by subspace DoA estimators. However, these methods assume that the signals are non-coherent and the array is calibrated; the latter is often challenging to achieve in sparse arrays, where one cannot access the virtual array elements. In this work, we propose Sparse-SubspaceNet, which leverages deep learning to enable subspace-based DoA recovery from sparse miscalibrated arrays with coherent sources. Sparse-SubspaceNet utilizes a dedicated deep network to learn from data how to compute a surrogate virtual array covariance that is divisible into distinguishable subspaces. By doing so, we learn to cope with coherent sources and miscalibrated sparse arrays, while preserving the interpretability and the suitability of model-based subspace DoA estimators.

Project is still under work
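
Most entries in the table above build on the MUSIC pseudo-spectrum for a uniform linear array (ULA). The following is a minimal sketch of that baseline, not the implementation of any listed paper. It assumes half-wavelength element spacing, narrowband uncorrelated sources, a known source count, and a grid search for peaks; the function names (`steering`, `simulate`, `music_spectrum`, `estimate_doas`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def steering(angles_deg, n_sensors, spacing=0.5):
    """ULA steering vectors as columns; spacing is in wavelengths (assumed 0.5)."""
    theta = np.deg2rad(np.atleast_1d(angles_deg).astype(float))
    k = np.arange(n_sensors)[:, None]
    return np.exp(2j * np.pi * spacing * k * np.sin(theta)[None, :])


def simulate(angles_deg, n_sensors, n_snapshots, snr_db, seed=0):
    """Synthetic snapshots: unit-power complex Gaussian sources plus white noise."""
    rng = np.random.default_rng(seed)
    n_src = len(angles_deg)
    s = (rng.standard_normal((n_src, n_snapshots))
         + 1j * rng.standard_normal((n_src, n_snapshots))) / np.sqrt(2)
    noise = (rng.standard_normal((n_sensors, n_snapshots))
             + 1j * rng.standard_normal((n_sensors, n_snapshots))) / np.sqrt(2)
    sigma = 10 ** (-snr_db / 20)  # noise standard deviation for unit-power sources
    return steering(angles_deg, n_sensors) @ s + sigma * noise


def music_spectrum(snapshots, n_sources, grid_deg):
    """MUSIC pseudo-spectrum evaluated on a grid of candidate angles."""
    n_sensors, n_snapshots = snapshots.shape
    cov = snapshots @ snapshots.conj().T / n_snapshots  # sample covariance
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    noise_subspace = eigvecs[:, : n_sensors - n_sources]
    a = steering(grid_deg, n_sensors)
    # Squared norm of each steering vector's projection onto the noise subspace.
    denom = np.sum(np.abs(noise_subspace.conj().T @ a) ** 2, axis=0)
    return 1.0 / denom


def estimate_doas(snapshots, n_sources, grid_deg):
    """Return the n_sources highest local maxima of the pseudo-spectrum."""
    grid_deg = np.asarray(grid_deg, dtype=float)
    p = music_spectrum(snapshots, n_sources, grid_deg)
    peaks = np.where((p[1:-1] > p[:-2]) & (p[1:-1] > p[2:]))[0] + 1
    top = peaks[np.argsort(p[peaks])[-n_sources:]]
    return np.sort(grid_deg[top])


if __name__ == "__main__":
    grid = np.arange(-90.0, 90.5, 0.5)
    x = simulate([-20.0, 30.0], n_sensors=8, n_snapshots=200, snr_db=20)
    print(estimate_doas(x, n_sources=2, grid_deg=grid))
```

The grid search keeps the sketch short, but it ties accuracy to the grid step. Several of the listed papers replace exactly this step with gridless or learned alternatives.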

Speech

Title Date Abstract Comment
A Study of Binaural Deep Beamforming With Interpretable Beampatterns Guided by Time-Varying RTF 2025-11-13
Show

In this work, a deep beamforming framework for speech enhancement in dynamic acoustic environments is studied. The time-varying beamformer weights are estimated from the noisy multichannel signals by minimizing an SI-SDR loss. The estimation is guided by the continuously tracked relative transfer functions (RTFs) of the moving target speaker. The spatial behavior of the network is evaluated through both narrowband and wideband beampatterns under three settings: (i) oracle guidance using true RTFs, (ii) estimated RTFs obtained by a subspace tracking method, and (iii) without the RTF guidance. Results show that RTF-guided models produce smoother, spatially consistent beampatterns that accurately track the target's direction of arrival. In contrast, the model fails to maintain a clear spatial focus when guidance is absent. Using the estimated RTFs as guidance closely matches the oracle RTF behavior, confirming the effectiveness of the tracking scheme. The model also outputs a binaural signal to preserve the speaker's spatial cues, which promotes hearing aid and hearables applications.

5 pages, 6 figures
DOA Estimation with Lightweight Network on LLM-Aided Simulated Acoustic Scenes 2025-11-11
Show

Direction-of-Arrival (DOA) estimation is critical in spatial audio and acoustic signal processing, with wide-ranging real-world applications. Most existing DOA models are trained on synthetic data by convolving clean speech with room impulse responses (RIRs), which limits their generalizability due to constrained acoustic diversity. In this paper, we revisit DOA estimation using a recently introduced dataset constructed with the assistance of large language models (LLMs), which provides more realistic and diverse spatial audio scenes. We benchmark several representative neural-based DOA methods on this dataset and propose LightDOA, a lightweight DOA estimation model based on depthwise separable convolutions, specifically designed for multi-channel input in varying environments. Experimental results show that LightDOA achieves satisfactory accuracy and robustness across various acoustic scenes while maintaining low computational complexity. This study not only highlights the potential of spatial audio synthesized with the assistance of LLMs in advancing robust and efficient DOA estimation research, but also highlights LightDOA as an efficient solution for resource-constrained applications.

Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of Moving Talkers 2025-09-25
Show

We propose a novel mixture of experts framework for field-of-view enhancement in binaural signal matching. Our approach enables dynamic spatial audio rendering that adapts to continuous talker motion, allowing users to emphasize or suppress sounds from selected directions while preserving natural binaural cues. Unlike traditional methods that rely on explicit direction-of-arrival estimation or operate in the Ambisonics domain, our signal-dependent framework combines multiple binaural filters in an online manner using implicit localization. This allows for real-time tracking and enhancement of moving sound sources, supporting applications such as speech focus, noise reduction, and world-locked audio in augmented and virtual reality. The method is agnostic to array geometry offering a flexible solution for spatial audio capture and personalized playback in next-generation consumer audio devices.

5 pages, 3 figures
GAN-Based Multi-Microphone Spatial Target Speaker Extraction 2025-09-22
Show

Spatial target speaker extraction isolates a desired speaker's voice in multi-speaker environments using spatial information, such as the direction of arrival (DoA). Although recent deep neural network (DNN)-based discriminative methods have shown significant performance improvements, the potential of generative approaches, such as generative adversarial networks (GANs), remains largely unexplored for this problem. In this work, we demonstrate that a GAN can effectively leverage both noisy mixtures and spatial information to extract and generate the target speaker's speech. By conditioning the GAN on intermediate features of a discriminative spatial filtering model in addition to DoA, we enable steerable target extraction with high spatial resolution of 5 degrees, outperforming state-of-the-art discriminative methods in perceptual quality-based objective metrics.

Learning Robust Spatial Representations from Binaural Audio through Feature Distillation 2025-08-28
Show

Recently, deep representation learning has shown strong performance in multiple audio tasks. However, its use for learning spatial representations from multichannel audio is underexplored. We investigate the use of a pretraining stage based on feature distillation to learn a robust spatial representation of binaural speech without the need for data labels. In this framework, spatial features are computed from clean binaural speech samples to form prediction labels. These clean features are then predicted from corresponding augmented speech using a neural network. After pretraining, we throw away the spatial feature predictor and use the learned encoder weights to initialize a DoA estimation model which we fine-tune for DoA estimation. Our experiments demonstrate that the pretrained models show improved performance in noisy and reverberant environments after fine-tuning for direction-of-arrival estimation, when compared to fully supervised models and classic signal processing methods.

To appear in Proc. WASPAA 2025, October 12-15, 2025, Tahoe, US. Copyright (c) 2025 IEEE. 5 pages, 2 figures, 2 tables

Sound Source Localization for Human-Robot Interaction in Outdoor Environments 2025-07-29
Show

This paper presents a sound source localization strategy that relies on a microphone array embedded in an unmanned ground vehicle and an asynchronous close-talking microphone near the operator. A signal coarse alignment strategy is combined with a time-domain acoustic echo cancellation algorithm to estimate a time-frequency ideal ratio mask to isolate the target speech from interferences and environmental noise. This allows selective sound source localization, and provides the robot with the direction of arrival of sound from the active operator, which enables rich interaction in noisy scenarios. Results demonstrate an average angle error of 4 degrees and an accuracy within 5 degrees of 95% at a signal-to-noise ratio of 1 dB, which is significantly better than state-of-the-art localization methods.

End-to-End DOA-Guided Speech Extraction in Noisy Multi-Talker Scenarios 2025-07-28
Show

Target Speaker Extraction (TSE) plays a critical role in enhancing speech signals in noisy and multi-speaker environments. This paper presents an end-to-end TSE model that incorporates Direction of Arrival (DOA) and beamwidth embeddings to extract speech from a specified spatial region centered around the DOA. Our approach efficiently captures spatial and temporal features, enabling robust performance in highly complex scenarios with multiple simultaneous speakers. Experimental results demonstrate that the proposed model not only significantly enhances the target speech within the defined beamwidth but also effectively suppresses interference from other directions, producing a clear and isolated target voice. Furthermore, the model achieves remarkable improvements in downstream Automatic Speech Recognition (ASR) tasks, making it particularly suitable for real-world applications.

Accepted by INTERSPEECH 2025

End-to-end multi-channel speaker extraction and binaural speech synthesis 2025-07-11
Show

Speech clarity and spatial audio immersion are the two most critical factors in enhancing remote conferencing experiences. Existing methods are often limited, either by the lack of spatial information when using only one microphone, or because their performance depends heavily on the accuracy of direction-of-arrival estimation when using a microphone array. To overcome this issue, we introduce an end-to-end deep learning framework that can map multi-channel noisy and reverberant signals directly to clean and spatialized binaural speech. This framework unifies source extraction, noise suppression, and binaural rendering into one network. In this framework, a novel magnitude-weighted interaural level difference loss function is proposed that aims to improve the accuracy of spatial rendering. Extensive evaluations show that our method outperforms established baselines in terms of both speech quality and spatial fidelity.

Multi-Channel Acoustic Echo Cancellation Based on Direction-of-Arrival Estimation 2025-06-06
Show

Acoustic echo cancellation (AEC) is an important speech signal processing technology that can remove echoes from microphone signals to enable natural-sounding full-duplex speech communication. While single-channel AEC is widely adopted, multi-channel AEC can leverage spatial cues afforded by multiple microphones to achieve better performance. Existing multi-channel AEC approaches typically combine beamforming with deep neural networks (DNN). This work proposes a two-stage algorithm that enhances multi-channel AEC by incorporating sound source directional cues. Specifically, a lightweight DNN is first trained to predict the sound source directions, and then the predicted directional information, multi-channel microphone signals, and single-channel far-end signal are jointly fed into an AEC network to estimate the near-end signal. Evaluation results show that the proposed algorithm outperforms baseline approaches and exhibits robust generalization across diverse acoustic environments.

Accepted by Interspeech 2025

Spatial Audio Processing with Large Language Model on Wearable Devices 2025-04-25
Show

Integrating spatial context into large language models (LLMs) has the potential to revolutionize human-computer interaction, particularly in wearable devices. In this work, we present a novel system architecture that incorporates spatial speech understanding into LLMs, enabling contextually aware and adaptive applications for wearable technologies. Our approach leverages microstructure-based spatial sensing to extract precise Direction of Arrival (DoA) information using a monaural microphone. To address the lack of an existing dataset for microstructure-assisted speech recordings, we synthetically create a dataset called OmniTalk by using the LibriSpeech dataset. This spatial information is fused with linguistic embeddings from OpenAI's Whisper model, allowing each modality to learn complementary contextual representations. The fused embeddings are aligned with the input space of the LLaMA-3.2 3B model and fine-tuned with the lightweight adaptation technique LoRA to optimize for on-device processing. SING supports spatially-aware automatic speech recognition (ASR), achieving a mean error of $25.72^\circ$ (a substantial improvement over the $88.52^\circ$ median error in existing work) with a word error rate (WER) of 5.3. SING also supports soundscaping, for example, inferring how many people were talking and their directions, with up to 5 people and a median DoA error of $16^\circ$. Our system demonstrates superior performance in spatial speech understanding while addressing the challenges of power efficiency, privacy, and hardware constraints, paving the way for advanced applications in augmented reality, accessibility, and immersive experiences.

On Ambisonic Source Separation with Spatially Informed Non-negative Tensor Factorization 2025-01-17
Show

This article presents a Non-negative Tensor Factorization based method for sound source separation from Ambisonic microphone signals. The proposed method enables the use of prior knowledge about the Directions-of-Arrival (DOAs) of the sources, incorporated through a constraint on the Spatial Covariance Matrix (SCM) within a Maximum a Posteriori (MAP) framework. Specifically, this article presents a detailed derivation of four algorithms that are based on two types of cost functions, namely the squared Euclidean distance and the Itakura-Saito divergence, which are then combined with two prior probability distributions on the SCM, that is the Wishart and the Inverse Wishart. The experimental evaluation of the baseline Maximum Likelihood (ML) and the proposed MAP methods is primarily based on first-order Ambisonic recordings, using four different source signal datasets, three with musical pieces and one containing speech utterances. We consider under-determined, determined, as well as over-determined scenarios by separating two, four and six sound sources, respectively. Furthermore, we evaluate the proposed algorithms for different spherical harmonic orders and at different reverberation time levels, as well as in non-ideal prior knowledge conditions, for increasingly more corrupted DOAs. Overall, in comparison with beamforming and a state-of-the-art separation technique, as well as the baseline ML methods, the proposed MAP approach offers superior separation performance in a variety of scenarios, as shown by the analysis of the experimental evaluation results, in terms of the standard objective separation measures, such as the SDR, ISR, SIR and SAR.

Robust Target Speaker Direction of Arrival Estimation 2024-12-25
Show

In multi-speaker environments, the direction of arrival (DOA) of a target speaker is key for improving speech clarity and extracting the target speaker's voice. However, traditional DOA estimation methods often struggle in the presence of noise and reverberation, and particularly when competing speakers are present. To address these challenges, we propose RTS-DOA, a robust real-time DOA estimation system. This system innovatively uses the registered speech of the target speaker as a reference and leverages full-band and sub-band spectral information from a microphone array to estimate the DOA of the target speaker's voice. Specifically, the system comprises a speech enhancement module for initially improving speech quality, a spatial module for learning spatial information, and a speaker module for extracting voiceprint features. Experimental results on the LibriSpeech dataset demonstrate that our RTS-DOA system effectively tackles multi-speaker scenarios and establishes new optimal benchmarks.

HRTF Estimation using a Score-based Prior 2024-10-02
Show

We present a head-related transfer function (HRTF) estimation method which relies on a data-driven prior given by a score-based diffusion model. The HRTF is estimated in reverberant environments using natural excitation signals, e.g. human speech. The impulse response of the room is estimated along with the HRTF by optimizing a parametric model of reverberation based on the statistical behaviour of room acoustics. The posterior distribution of HRTF given the reverberant measurement and excitation signal is modelled using the score-based HRTF prior and a log-likelihood approximation. We show that the resulting method outperforms several baselines, including an oracle recommender system that assigns the optimal HRTF in our training set based on the smallest distance to the true HRTF at the given direction of arrival. In particular, we show that the diffusion prior can account for the large variability of high-frequency content in HRTFs.

Improved direction of arrival estimations with a wearable microphone array for dynamic environments by reliability weighting 2024-09-22
Show

Direction-of-arrival estimation of multiple speakers in a room is an important task for a wide range of applications. In particular, challenging environments with moving speakers, reverberation and noise, lead to significant performance degradation for current methods. With the aim of better understanding factors affecting performance and improving current methods, in this paper multi-speaker direction-of-arrival (DOA) estimation is investigated using a modified version of the local space domain distance (LSDD) algorithm in a noisy, dynamic and reverberant environment employing a wearable microphone array. This study utilizes the recently published EasyCom speech dataset, recorded using a wearable microphone array mounted on eyeglasses. While the original LSDD algorithm demonstrates strong performance in static environments, its efficacy significantly diminishes in the dynamic settings of the EasyCom dataset. Several enhancements to the LSDD algorithm are developed following a comprehensive performance and system analysis, which enable improved DOA estimation under these challenging conditions. These improvements include incorporating a weighted reliability approach and introducing a new quality measure that reliably identifies the more accurate DOA estimates, thereby enhancing both the robustness and accuracy of the algorithm in challenging environments.

PlumberNet: Fixing interference leakage after GEV beamforming 2024-09-11
Show

Spatial filters can exploit deep-learning-based speech enhancement models to increase their reliability in scenarios with multiple speech sources. To further improve speech quality, it is common to perform postfiltering on the estimated target speech obtained with spatial filtering. In this work, Generalized Eigenvalue (GEV) beamforming is employed to provide the leakage estimation, along with the estimation of the target speech, to be later used for postfiltering. This improves the enhancement performance over a postfilter that uses the target speech and a reference microphone signal. This work also demonstrates that the spatial covariance matrices (SCMs) can be accurately estimated from the direction of arrival (DoA) of the target and a discriminative selection amongst the pairwise estimated time-frequency masks.

Steered Response Power-Based Direction-of-Arrival Estimation Exploiting an Auxiliary Microphone 2024-09-03
Show

Accurately estimating the direction-of-arrival (DOA) of a speech source using a compact microphone array (CMA) is often complicated by background noise and reverberation. A commonly used DOA estimation method is the steered response power with phase transform (SRP-PHAT) function, which has been shown to work reliably in moderate levels of noise and reverberation. Since for closely spaced microphones the spatial coherence of noise and reverberation may be high over an extended frequency range, this may negatively affect the SRP-PHAT spectra, resulting in DOA estimation errors. Assuming the availability of an auxiliary microphone at an unknown position which is spatially separated from the CMA, in this paper we propose to compute the SRP-PHAT spectra between the microphones of the CMA based on the SRP-PHAT spectra between the auxiliary microphone and the microphones of the CMA. For different levels of noise and reverberation, we show how far the auxiliary microphone needs to be spatially separated from the CMA for the auxiliary microphone-based SRP-PHAT spectra to be more reliable than the SRP-PHAT spectra without the auxiliary microphone. These findings are validated based on simulated microphone signals for several auxiliary microphone positions and two different noise and reverberation conditions.

5 pages, 3 figures, conference: EUSIPCO 2024 in Lyon
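
As background for the SRP-PHAT function named in this abstract, the sketch below computes a phase-transform-weighted cross-correlation (GCC-PHAT) for a single microphone pair; an SRP-PHAT spectrum is commonly built by summing such pairwise scores at the lags implied by each candidate direction. It is a minimal illustration under stated assumptions, not the auxiliary-microphone method of the paper: the helper names `dft` and `gcc_phat`, the 64-sample noise signal, and the circular 3-sample delay are all hypothetical choices made for the example.

```python
import cmath
import random
from math import pi


def dft(x):
    """Naive discrete Fourier transform (fine for short illustrative signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * pi * k * t / n) for t in range(n))
            for k in range(n)]


def gcc_phat(x1, x2, max_lag):
    """Return {lag: score}; the best lag is the delay of x2 relative to x1."""
    n = len(x1)
    spec1, spec2 = dft(x1), dft(x2)
    # Cross-spectrum, then PHAT weighting: keep only the phase of each bin.
    cross = [b * a.conjugate() for a, b in zip(spec1, spec2)]
    phat = [c / max(abs(c), 1e-12) for c in cross]
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        acc = sum(p * cmath.exp(2j * pi * k * lag / n) for k, p in enumerate(phat))
        scores[lag] = acc.real / n
    return scores


# Assumed test signal: white noise at microphone 1, and a circularly delayed
# copy at microphone 2 (circular shift keeps the example exact).
rng = random.Random(0)
delay_samples = 3
x1 = [rng.gauss(0.0, 1.0) for _ in range(64)]
x2 = [x1[(t - delay_samples) % len(x1)] for t in range(len(x1))]

scores = gcc_phat(x1, x2, max_lag=8)
best_lag = max(scores, key=scores.get)
print("estimated delay (samples):", best_lag)
```

Because the PHAT weighting discards magnitude, the score depends only on inter-microphone phase, which is why coherent noise and reverberation across closely spaced microphones can distort it.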

Direction of Arrival Correction through Speech Quality Feedback 2024-08-13
Show

Real-time speech enhancement has begun to rise in performance, and the Demucs Denoiser model has recently demonstrated strong performance in multiple-speech-source scenarios when accompanied by a location-based speech target selection strategy. However, it has been shown to be sensitive to errors in the direction-of-arrival (DOA) estimation. In this work, a DOA correction scheme is proposed that uses the real-time estimated speech quality of its enhanced output as the observed variable in an Adam-based optimization feedback loop to find the correct DOA. In spite of the high variability of the speech quality estimation, the proposed system is able to correct in real time an error of up to $15^\circ$ using only the speech quality as its guide. Several insights are provided for future versions of the proposed system to speed up convergence and further reduce the speech quality estimation variability.

Submitted to Digital Signal Processing

All Neural Low-latency Directional Speech Extraction 2024-07-05
Show

We introduce a novel all neural model for low-latency directional speech extraction. The model uses direction of arrival (DOA) embeddings from a predefined spatial grid, which are transformed and fused into a recurrent neural network based speech extraction model. This process enables the model to effectively extract speech from a specified DOA. Unlike previous methods that relied on hand-crafted directional features, the proposed model trains DOA embeddings from scratch using speech enhancement loss, making it suitable for low-latency scenarios. Additionally, it operates at a high frame rate, taking in DOA with each input frame, which brings in the capability of quickly adapting to changing scene in highly dynamic real-world scenarios. We provide extensive evaluation to demonstrate the model's efficacy in directional speech extraction, robustness to DOA mismatch, and its capability to quickly adapt to abrupt changes in DOA.

Accepted for publication at INTERSPEECH 2024

Exploring the Potential of Data-Driven Spatial Audio Enhancement Using a Single-Channel Model 2024-04-22
Show

One key aspect differentiating data-driven single- and multi-channel speech enhancement and dereverberation methods is that both the problem formulation and complexity of the solutions are considerably more challenging in the latter case. Additionally, with limited computational resources, it is cumbersome to train models that require the management of larger datasets or those with more complex designs. In this scenario, an unverified hypothesis that single-channel methods can be adapted to multi-channel scenarios simply by processing each channel independently holds significant implications, boosting compatibility between sound scene capture and system input-output formats, while also allowing modern research to focus on other challenging aspects, such as full-bandwidth audio enhancement, competitive noise suppression, and unsupervised learning. This study verifies this hypothesis by comparing the enhancement promoted by a basic single-channel speech enhancement and dereverberation model with two other multi-channel models tailored to separate clean speech from noisy 3D mixes. A direction of arrival estimation model was used to objectively evaluate its capacity to preserve spatial information by comparing the output signals with ground-truth coordinate values. Consequently, a trade-off arises between preserving spatial information with a more straightforward single-channel solution at the cost of obtaining lower gains in intelligibility scores.

Acoustic

Title Date Abstract Comment
A framework for diffuseness evaluation using a tight-frame microphone array configuration 2026-02-04
Show

This work presents a unified framework for estimating both sound-field direction and diffuseness using practical microphone arrays with different spatial configurations. Building on covariance-based diffuseness models, we formulate a velocity-only covariance approach that enables consistent diffuseness evaluation across heterogeneous array geometries without requiring mode whitening or spherical-harmonic decomposition. Three array types -- an A-format array, a rigid-sphere array, and a newly proposed tight-frame array -- are modeled and compared through both simulations and measurement-based experiments. The results show that the tight-frame configuration achieves near-isotropic directional sampling and reproduces diffuseness characteristics comparable to those of higher-order spherical arrays, while maintaining a compact physical structure. We further examine the accuracy of direction-of-arrival estimation based on acoustic intensity within the same framework. These findings connect theoretical diffuseness analysis with implementable array designs and support the development of robust, broadband methods for spatial-sound-field characterization.

16 pages including 16 files: This version has been substantially revised in response to reviewers' comments, with clarified theoretical assumptions and extended comparative evaluations
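
To illustrate what direction-of-arrival estimation "based on acoustic intensity" can look like, the sketch below uses the common pseudo-intensity idea: time-average the product of an omnidirectional (pressure-like) channel with three dipole (velocity-like) channels and read the direction off the resulting vector. This is an assumption-laden toy, not the velocity-only covariance framework of the paper: the functions `encode_first_order` and `intensity_doa`, the unnormalised first-order encoding, and the single free-field plane wave are all hypothetical.

```python
import random
from math import atan2, cos, degrees, radians, sin, sqrt


def encode_first_order(signal, azimuth_deg, elevation_deg):
    """Encode a plane wave into omni (w) and dipole (x, y, z) channels.

    Assumed convention: unnormalised gains, dipoles point along +x, +y, +z.
    """
    az, el = radians(azimuth_deg), radians(elevation_deg)
    w = list(signal)
    x = [cos(az) * cos(el) * s for s in signal]
    y = [sin(az) * cos(el) * s for s in signal]
    z = [sin(el) * s for s in signal]
    return w, x, y, z


def intensity_doa(w, x, y, z):
    """Estimate (azimuth, elevation) in degrees from the pseudo-intensity vector."""
    ix = sum(a * b for a, b in zip(w, x))
    iy = sum(a * b for a, b in zip(w, y))
    iz = sum(a * b for a, b in zip(w, z))
    azimuth = degrees(atan2(iy, ix))
    elevation = degrees(atan2(iz, sqrt(ix * ix + iy * iy)))
    return azimuth, elevation


rng = random.Random(1)
source = [rng.gauss(0.0, 1.0) for _ in range(256)]
w, x, y, z = encode_first_order(source, azimuth_deg=40.0, elevation_deg=10.0)
est_az, est_el = intensity_doa(w, x, y, z)
print(f"azimuth {est_az:.2f} deg, elevation {est_el:.2f} deg")
```

With a single noiseless plane wave the estimate is exact; in a diffuse field the averaged vector shrinks toward zero, which is the intuition behind intensity-based diffuseness measures.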

SoundCompass: Navigating Target Sound Extraction With Effective Directional Clue Integration In Complex Acoustic Scenes 2026-01-27
Show

Recent advances in target sound extraction (TSE) utilize directional clues derived from direction of arrival (DoA), which represent an inherent spatial property of sound available in any acoustic scene. However, previous DoA-based methods rely on hand-crafted features or discrete encodings, which lose fine-grained spatial information and limit adaptability. We propose SoundCompass, an effective directional clue integration framework centered on a Spectral Pairwise INteraction (SPIN) module that captures cross-channel spatial correlations in the complex spectrogram domain to preserve full spatial information in multichannel signals. The input feature expressed in terms of spatial correlations is fused with a DoA clue represented as spherical harmonics (SH) encoding. The fusion is carried out across overlapping frequency subbands, inheriting the benefits reported in the previous band-split architectures. We also incorporate the iterative refinement strategy, chain-of-inference (CoI), in the TSE framework, which recursively fuses DoA with sound event activation estimated from the previous inference stage. Experiments demonstrate that SoundCompass, combining SPIN, SH embedding, and CoI, robustly extracts target sources across diverse signal classes and spatial configurations.

5 pages, 4 figures, accepted to ICASSP 2026

Vector Signal Reconstruction Sparse and Parametric Approach of direction of arrival Using Single Vector Hydrophone 2025-12-25
Show

This article discusses the application of single vector hydrophones in the field of underwater acoustic signal processing for Direction Of Arrival (DOA) estimation. Addressing the limitations of traditional DOA estimation methods in multi-source environments and under noise interference, this study introduces a Vector Signal Reconstruction Sparse and Parametric Approach (VSRSPA). This method involves reconstructing the signal model of a single vector hydrophone, converting its covariance matrix into a Toeplitz structure suitable for the Sparse and Parametric Approach (SPA) algorithm. The process then optimizes it using the SPA algorithm to achieve more accurate DOA estimation. Through detailed simulation analysis, this research has confirmed the performance of the proposed algorithm in single and dual-target DOA estimation scenarios, especially under various signal-to-noise ratio (SNR) conditions. The simulation results show that, compared to traditional DOA estimation methods, this algorithm has significant advantages in estimation accuracy and resolution, particularly in multi-source signals and low SNR environments. The contribution of this study lies in providing an effective new method for DOA estimation with single vector hydrophones in complex environments, introducing new research directions and solutions in the field of vector hydrophone signal processing.

The authors have determined that the simulation results presented are preliminary and insufficient. Further simulation work is required to validate the conclusions. The text also requires major linguistic improvements

A Study of Binaural Deep Beamforming With Interpretable Beampatterns Guided by Time-Varying RTF 2025-11-13
Show

In this work, a deep beamforming framework for speech enhancement in dynamic acoustic environments is studied. The time-varying beamformer weights are estimated from the noisy multichannel signals by minimizing an SI-SDR loss. The estimation is guided by the continuously tracked relative transfer functions (RTFs) of the moving target speaker. The spatial behavior of the network is evaluated through both narrowband and wideband beampatterns under three settings: (i) oracle guidance using true RTFs, (ii) estimated RTFs obtained by a subspace tracking method, and (iii) without the RTF guidance. Results show that RTF-guided models produce smoother, spatially consistent beampatterns that accurately track the target's direction of arrival. In contrast, the model fails to maintain a clear spatial focus when guidance is absent. Using the estimated RTFs as guidance closely matches the oracle RTF behavior, confirming the effectiveness of the tracking scheme. The model also outputs a binaural signal to preserve the speaker's spatial cues, which promotes hearing aid and hearables applications.

5 pages, 6 figures

DOA Estimation with Lightweight Network on LLM-Aided Simulated Acoustic Scenes 2025-11-11
Show

Direction-of-Arrival (DOA) estimation is critical in spatial audio and acoustic signal processing, with wide-ranging applications in the real world. Most existing DOA models are trained on synthetic data by convolving clean speech with room impulse responses (RIRs), which limits their generalizability due to constrained acoustic diversity. In this paper, we revisit DOA estimation using a recently introduced dataset constructed with the assistance of large language models (LLMs), which provides more realistic and diverse spatial audio scenes. We benchmark several representative neural-based DOA methods on this dataset and propose LightDOA, a lightweight DOA estimation model based on depthwise separable convolutions, specifically designed for multi-channel input in varying environments. Experimental results show that LightDOA achieves satisfactory accuracy and robustness across various acoustic scenes while maintaining low computational complexity. This study not only highlights the potential of spatial audio synthesized with the assistance of LLMs in advancing robust and efficient DOA estimation research, but also highlights LightDOA as an efficient solution for resource-constrained applications.

Consensus Tracking of an Underwater Vehicle Using Weighted Harmonic Mean Density 2025-11-05
Show

This paper addresses an underwater target tracking problem in which a large number of sonobuoy sensors are deployed over a surveillance region. The region is divided into several sub-regions, in each of which a single tracker capable of generating tracks is installed. Each sonobuoy can measure the direction of arrival of acoustic signals (known as bearing angles) and communicate the measurements to the local tracker. Further, each local tracker can communicate with all other trackers, so that they exchange their estimates and finally reach a consensus. We propose weighted harmonic mean density (HMD) based tracking to reach a consensus and provide a solution for the fusion of Gaussian densities. In this approach, optimal weights are assigned by minimizing the Kullback-Leibler divergence measure. Performance of the proposed method is measured using root mean square error, percentage of track divergence, and normalized estimation error squared. Simulation results demonstrate that the optimized HMD-based fusion outperforms existing fusion methods in distributed tracking.

State Space and Self-Attention Collaborative Network with Feature Aggregation for DOA Estimation 2025-10-29
Show

Accurate direction-of-arrival (DOA) estimation for sound sources is challenging due to the continuous changes in acoustic characteristics across time and frequency. In such scenarios, accurate localization relies on the ability to aggregate relevant features and model temporal dependencies effectively. In time series modeling, achieving a balance between model performance and computational efficiency remains a significant challenge. To address this, we propose FA-Stateformer, a state space and self-attention collaborative network with feature aggregation. The proposed network first employs a feature aggregation module to enhance informative features across both temporal and spectral dimensions. This is followed by a lightweight Conformer architecture inspired by the squeeze-and-excitation mechanism, where the feedforward layers are compressed to reduce redundancy and parameter overhead. Additionally, a temporal shift mechanism is incorporated to expand the receptive field of convolutional layers while maintaining a compact kernel size. To further enhance sequence modeling capabilities, a bidirectional Mamba module is introduced, enabling efficient state-space-based representation of temporal dependencies in both forward and backward directions. The remaining self-attention layers are combined with the Mamba blocks, forming a collaborative modeling framework that achieves a balance between representation capacity and computational efficiency. Extensive experiments demonstrate that FA-Stateformer achieves superior performance and efficiency compared to conventional architectures.

Perceptual Compensation of Ambisonics Recordings for Reproduction in Room 2025-10-13
Show

Ambisonics is a method for capturing and rendering a sound field accurately, assuming that the acoustics of the playback room does not significantly influence the sound field. However, in practice, the acoustics of the playback room may lead to a noticeable degradation in sound quality. We propose a recording and rendering method based on Ambisonics that utilizes a perceptually-motivated approach to compensate for the reverberation of the playback room. The recorded direct and reverberant sound field components in the spherical harmonics (SHs) domain are spectrally and spatially compensated to preserve the relevant auditory cues including the direction of arrival of the direct sound, the spectral energy of the direct and reverberant sound components, and the Interaural Coherence (IC) across each auditory band. In contrast to the conventional Ambisonics, a flexible number of Ambisonics channels can be used for audio rendering. Listening test results show that the proposed method provides a perceptually accurate rendering of the originally recorded sound field, outperforming both conventional Ambisonics without compensation and even ideal Ambisonics rendering in a simulated anechoic room. Additionally, subjective evaluations of listeners seated at the center of the loudspeaker array demonstrate that the method remains robust to head rotation and minor displacements.

The manuscript was submitted to the JASA and is under review

OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models 2025-09-30
Show

Spatial reasoning is fundamental to auditory perception, yet current audio large language models (ALLMs) largely rely on unstructured binaural cues and single-step inference. This limits both perceptual accuracy in direction and distance estimation and the capacity for interpretable reasoning. Recent work such as BAT demonstrates spatial QA with binaural audio, but its reliance on coarse categorical labels (left, right, up, down) and the absence of explicit geometric supervision constrain resolution and robustness. We introduce the $\textbf{Spatial-Acoustic Geometry Encoder (SAGE)}$, a geometry-aware audio encoder that aligns binaural acoustic features with 3D spatial structure using panoramic depth images and room-impulse responses at training time, while requiring only audio at inference. Building on this representation, we present $\textbf{OWL}$, an ALLM that integrates $\textbf{SAGE}$ with a spatially grounded chain-of-thought to rationalize over direction-of-arrival (DoA) and distance estimates. Through curriculum learning from perceptual QA to multi-step reasoning, $\textbf{OWL}$ supports o'clock-level azimuth and DoA estimation. To enable large-scale training and evaluation, we construct and release $\textbf{BiDepth}$, a dataset of over one million QA pairs combining binaural audio with panoramic depth images and room impulse responses across both in-room and out-of-room scenarios. Across two benchmark datasets, our new $\textbf{BiDepth}$ and the public SpatialSoundQA, $\textbf{OWL}$ reduces mean DoA error by $\textbf{11}^{\circ}$ through $\textbf{SAGE}$ and improves spatial reasoning QA accuracy by up to $\textbf{25}$% over BAT.

Adaptive Bayesian Beamforming for Imaging by Marginalizing the Speed of Sound 2025-09-15
Show

Imaging methods based on array signal processing often require a fixed propagation speed of the medium, or speed of sound (SoS) for methods based on acoustic signals. The resolution of the images formed using these methods is strongly affected by the assumed SoS, which, due to multipath, nonlinear propagation, and non-uniform mediums, is challenging at best to select. In this letter, we propose a Bayesian approach to marginalize the influence of the SoS on beamformers for imaging. We adapt Bayesian direction-of-arrival estimation to an imaging setting and integrate a popular minimum variance beamformer over the posterior of the SoS. To solve the Bayesian integral efficiently, we use numerical Gauss quadrature. We apply our beamforming approach to shallow water sonar imaging where multipath and nonlinear propagation is abundant. We compare against the minimum variance distortionless response (MVDR) beamformer and demonstrate that its Bayesian counterpart achieves improved range and azimuthal resolution while effectively suppressing multipath artifacts.

Direction of Arrival Estimation: A Tutorial Survey of Classical and Modern Methods 2025-09-02

Direction of arrival (DOA) estimation is a fundamental problem in array signal processing with applications spanning radar, sonar, wireless communications, and acoustic signal processing. This tutorial survey provides a comprehensive introduction to classical and modern DOA estimation methods, specifically designed for students and researchers new to the field. We focus on narrowband signal processing using uniform linear arrays, presenting step-by-step mathematical derivations with geometric intuition. The survey covers classical beamforming methods, subspace-based techniques (MUSIC, ESPRIT), maximum likelihood approaches, and sparse signal processing methods. Each method is accompanied by Python implementations available in an open-source repository, enabling reproducible research and hands-on learning. Through systematic performance comparisons across various scenarios, we provide practical guidelines for method selection and parameter tuning. This work aims to bridge the gap between theoretical foundations and practical implementation, making DOA estimation accessible to beginners while serving as a comprehensive reference for the field. See https://github.com/AmgadSalama/DOA for detailed implementations of the methods.

DOA Survey, 44 pages, Not published yet

MASSLOC: A Massive Sound Source Localization System based on Direction-of-Arrival Estimation 2025-08-16

Acoustic indoor localization offers the potential for highly accurate position estimation while generally exhibiting low hardware requirements compared to Radio Frequency (RF)-based solutions. Furthermore, angular-based localization significantly reduces installation effort by minimizing the number of required fixed anchor nodes. In this contribution, we propose the so-called MASSLOC system, which leverages sparse two-dimensional array geometries to localize and identify a large number of concurrently active sources. Additionally, the use of complementary Zadoff-Chu sequences is introduced to enable efficient, beamforming-based source identification. These sequences provide a trade-off between favorable correlation properties and accurate, unsynchronized direction-of-arrival estimation by exhibiting a spectrally balanced waveform. The system is evaluated in both a controlled anechoic chamber and a highly reverberant lobby environment with a reverberation time of 1.6 s. In a laboratory setting, successful direction-of-arrival estimation and identification of up to 14 simultaneously emitting sources are demonstrated. Adopting a Perspective-n-Point (PnP) calibration approach, the system achieves a median three-dimensional localization error of 55.7 mm and a median angular error of 0.84 deg with dynamic source movement of up to 1.9 m/s in the challenging reverberant environment. The multi-source capability is also demonstrated and evaluated in that environment with a total of three tags. These results indicate the scalability and robustness of the MASSLOC system, even under challenging acoustic conditions.

IEEE Transactions on Instrumentation and Measurement

Beamformed 360° Sound Maps: U-Net-Driven Acoustic Source Segmentation and Localization 2025-08-01

We introduce a U-Net model for 360° acoustic source localization formulated as a spherical semantic segmentation task. Rather than regressing discrete direction-of-arrival (DoA) angles, our model segments beamformed audio maps (azimuth and elevation) into regions of active sound presence. Using delay-and-sum (DAS) beamforming on a custom 24-microphone array, we generate signals aligned with drone GPS telemetry to create binary supervision masks. A modified U-Net, trained on frequency-domain representations of these maps, learns to identify spatially distributed source regions while addressing class imbalance via the Tversky loss. Because the network operates on beamformed energy maps, the approach is inherently array-independent and can adapt to different microphone configurations without retraining from scratch. The segmentation outputs are post-processed by computing centroids over activated regions, enabling robust DoA estimates. Our dataset includes real-world open-field recordings of a DJI Air 3 drone, synchronized with 360° video and flight logs across multiple dates and locations. Experimental results show that the U-Net generalizes across environments and provides improved angular precision, offering a new paradigm for dense spatial audio understanding beyond traditional Sound Source Localization (SSL).

Sound Source Localization for Human-Robot Interaction in Outdoor Environments 2025-07-29

This paper presents a sound source localization strategy that relies on a microphone array embedded in an unmanned ground vehicle and an asynchronous close-talking microphone near the operator. A signal coarse alignment strategy is combined with a time-domain acoustic echo cancellation algorithm to estimate a time-frequency ideal ratio mask to isolate the target speech from interferences and environmental noise. This allows selective sound source localization, and provides the robot with the direction of arrival of sound from the active operator, which enables rich interaction in noisy scenarios. Results demonstrate an average angle error of 4 degrees and an accuracy within 5 degrees of 95% at a signal-to-noise ratio of 1 dB, which is significantly superior to the state-of-the-art localization methods.

Resnet-conformer network with shared weights and attention mechanism for sound event localization, detection, and distance estimation 2025-07-23

This technical report outlines our approach to Task 3A of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024, focusing on Sound Event Localization and Detection (SELD). SELD provides valuable insights by estimating sound event localization and detection, aiding in various machine cognition tasks such as environmental inference, navigation, and other sound localization-related applications. This year's challenge evaluates models using either audio-only (Track A) or audiovisual (Track B) inputs on annotated recordings of real sound scenes. A notable change this year is the introduction of distance estimation, with evaluation metrics adjusted accordingly for a comprehensive assessment. Our submission is for Track A of the Challenge, which focuses on audio-only input. Our approach uses log-mel spectrograms and intensity vectors and employs multiple data augmentations. We propose an EINV2-based [1] network architecture, achieving improved results: an F-score of 40.2%, Angular Error (DOA) of 17.7 degrees, and Relative Distance Error (RDE) of 0.32 on the test set of the Development Dataset [2, 3].

This paper has been submitted as a technical report outlining our approach to Task 3A of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024 and can be found in DCASE2024 technical reports

Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach 2025-07-08

Acoustic mapping techniques have long been used in spatial audio processing for direction of arrival estimation (DoAE). Traditional beamforming methods for acoustic mapping, while interpretable, often rely on iterative solvers that can be computationally intensive and sensitive to acoustic variability. On the other hand, recent supervised deep learning approaches offer feedforward speed and robustness but require large labeled datasets and lack interpretability. Despite their strengths, both methods struggle to consistently generalize across diverse acoustic setups and array configurations, limiting their broader applicability. We introduce the Latent Acoustic Mapping (LAM) model, a self-supervised framework that bridges the interpretability of traditional methods with the adaptability and efficiency of deep learning methods. LAM generates high-resolution acoustic maps, adapts to varying acoustic conditions, and operates efficiently across different microphone arrays. We assess its robustness on DoAE using the LOCATA and STARSS benchmarks. LAM achieves comparable or superior localization performance to existing supervised methods. Additionally, we show that LAM's acoustic maps can serve as effective features for supervised models, further enhancing DoAE accuracy and underscoring its potential to advance adaptive, high-performance sound localization systems.

Multi-Channel Acoustic Echo Cancellation Based on Direction-of-Arrival Estimation 2025-06-06

Acoustic echo cancellation (AEC) is an important speech signal processing technology that can remove echoes from microphone signals to enable natural-sounding full-duplex speech communication. While single-channel AEC is widely adopted, multi-channel AEC can leverage spatial cues afforded by multiple microphones to achieve better performance. Existing multi-channel AEC approaches typically combine beamforming with deep neural networks (DNN). This work proposes a two-stage algorithm that enhances multi-channel AEC by incorporating sound source directional cues. Specifically, a lightweight DNN is first trained to predict the sound source directions, and then the predicted directional information, multi-channel microphone signals, and single-channel far-end signal are jointly fed into an AEC network to estimate the near-end signal. Evaluation results show that the proposed algorithm outperforms baseline approaches and exhibits robust generalization across diverse acoustic environments.

Accepted by Interspeech 2025

CST-former: Multidimensional Attention-based Transformer for Sound Event Localization and Detection in Real Scenes 2025-04-17

Sound event localization and detection (SELD) is a task for the classification of sound events and the identification of direction of arrival (DoA) utilizing multichannel acoustic signals. For effective classification and localization, a channel-spectro-temporal transformer (CST-former) was suggested. CST-former employs multidimensional attention mechanisms across the spatial, spectral, and temporal domains to enlarge the model's capacity to learn the domain information essential for event detection and DoA estimation over time. In this work, we present an enhanced version of CST-former with multiscale unfolded local embedding (MSULE) developed to capture and aggregate domain information over multiple time-frequency scales. Also, we propose finetuning and post-processing techniques beneficial for conducting the SELD task over limited training datasets. In-depth ablation studies of the proposed architecture and detailed analysis on the proposed modules are carried out to validate the efficacy of multidimensional attentions on the SELD task. Empirical validation through experimentation on STARSS22 and STARSS23 datasets demonstrates the remarkable performance of CST-former and post-processing techniques without using external data.

12 pages, 10 figures, Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

Broadband

Title Date Abstract Comment
A framework for diffuseness evaluation using a tight-frame microphone array configuration 2026-02-04

This work presents a unified framework for estimating both sound-field direction and diffuseness using practical microphone arrays with different spatial configurations. Building on covariance-based diffuseness models, we formulate a velocity-only covariance approach that enables consistent diffuseness evaluation across heterogeneous array geometries without requiring mode whitening or spherical-harmonic decomposition. Three array types -- an A-format array, a rigid-sphere array, and a newly proposed tight-frame array -- are modeled and compared through both simulations and measurement-based experiments. The results show that the tight-frame configuration achieves near-isotropic directional sampling and reproduces diffuseness characteristics comparable to those of higher-order spherical arrays, while maintaining a compact physical structure. We further examine the accuracy of direction-of-arrival estimation based on acoustic intensity within the same framework. These findings connect theoretical diffuseness analysis with implementable array designs and support the development of robust, broadband methods for spatial-sound-field characterization.

16 pages including 16 files: This version has been substantially revised in response to reviewers' comments, with clarified theoretical assumptions and extended comparative evaluations

Ambiguity-Free Broadband DOA Estimation Relying on Parameterized Time-Frequency Transform 2025-03-05

An ambiguity-free direction-of-arrival (DOA) estimation scheme is proposed for sparse uniform linear arrays under low signal-to-noise ratios (SNRs) and non-stationary broadband signals. First, to achieve better DOA estimation performance than conventional frequency-difference (FD) paradigms at low SNRs with non-stationary signals, we propose parameterized time-frequency transform-based FD processing. Then, the unambiguous compressive FD beamforming is conceived to compensate for the resolution loss induced by the difference operation. Finally, we further derive a coarse-to-fine histogram statistics scheme to alleviate the perturbation in compressive FD beamforming with good DOA estimation accuracy. Simulation results demonstrate the superior performance of our proposed algorithm regarding robustness, resolution, and DOA estimation accuracy.

6 figures

Fully Bayesian Wideband Direction-of-Arrival Estimation and Detection via RJMCMC 2024-12-12

We propose a fully Bayesian approach to wideband, or broadband, direction-of-arrival (DoA) estimation and signal detection. Unlike previous works in wideband DoA estimation and detection, where the signals were modeled in the time-frequency domain, we directly model the time-domain representation and treat the non-causal part of the source signal as latent variables. Furthermore, our Bayesian model allows for closed-form marginalization of the latent source signals by leveraging conjugacy. To further speed up computation, we exploit the sparse "stripe matrix structure" of the considered system, which stems from the circulant matrix representation of linear time-invariant (LTI) systems. This drastically reduces the time complexity of computing the likelihood from $\mathcal{O}(N^3 k^3)$ to $\mathcal{O}(N k^3)$, where $N$ is the number of samples received by the array and $k$ is the number of sources. These computational improvements allow for efficient posterior inference through reversible jump Markov chain Monte Carlo (RJMCMC). We use the non-reversible extension of RJMCMC (NRJMCMC), which often achieves lower autocorrelation and faster convergence than the conventional reversible variant. Detection, estimation, and reconstruction of the latent source signals can then all be performed in a fully Bayesian manner through the samples drawn using NRJMCMC. We evaluate the detection performance of the procedure by comparing against generalized likelihood ratio testing (GLRT) and information criteria.

Comparison of Frequency-Fusion Mechanisms for Binaural Direction-of-Arrival Estimation for Multiple Speakers 2024-01-15

To estimate the direction of arrival (DOA) of multiple speakers with methods that use prototype transfer functions, frequency-dependent spatial spectra (SPS) are usually constructed. To make the DOA estimation robust, SPS from different frequencies can be combined. According to how the SPS are combined, frequency fusion mechanisms are categorized into narrowband, broadband, or speaker-grouped, where the latter mechanism requires a speaker-wise grouping of frequencies. For a binaural hearing aid setup, in this paper we propose an interaural time difference (ITD)-based speaker-grouped frequency fusion mechanism. By exploiting the DOA dependence of ITDs, frequencies can be grouped according to a common ITD and be used for DOA estimation of the respective speaker. We apply the proposed ITD-based speaker-grouped frequency fusion mechanism for different DOA estimation methods, namely the multiple signal classification, steered response power and a recently published method based on relative transfer function (RTF) vectors. In our experiments, we compare DOA estimation with different fusion mechanisms. For all considered DOA estimation methods, the proposed ITD-based speaker-grouped frequency fusion mechanism results in a higher DOA estimation accuracy compared with the narrowband and broadband fusion mechanisms.

Accepted for ICASSP 2024

Gridless DOA Estimation with Multiple Frequencies 2023-02-06

Direction-of-arrival (DOA) estimation is widely applied in acoustic source localization. A multi-frequency model is suitable for characterizing the broadband structure in acoustic signals. In this paper, the continuous (gridless) DOA estimation problem with multiple frequencies is considered. This problem is formulated as an atomic norm minimization (ANM) problem. The ANM problem is equivalent to a semi-definite program (SDP) which can be solved by an off-the-shelf SDP solver. The dual certificate condition is provided to certify the optimality of the SDP solution so that the sources can be localized by finding the roots of a polynomial. We also construct the dual polynomial to satisfy the dual certificate condition and show that such a construction exists when the source amplitude has a uniform magnitude. In multi-frequency ANM, spatial aliasing of DOAs at higher frequencies can cause challenges. We discuss this issue extensively and propose a robust solution to combat aliasing. Numerical results support our theoretical findings and demonstrate the effectiveness of the proposed method.

This work has been accepted by IEEE Transactions on Signal Processing

DA-MUSIC: Data-Driven DoA Estimation via Deep Augmented MUSIC Algorithm 2023-01-11

Direction of arrival (DoA) estimation of multiple signals is pivotal in sensor array signal processing. A popular multi-signal DoA estimation method is the multiple signal classification (MUSIC) algorithm, which enables high-performance super-resolution DoA recovery while being highly applicable in practice. MUSIC is a model-based algorithm, relying on an accurate mathematical description of the relationship between the signals and the measurements and assumptions on the signals themselves (non-coherent, narrowband sources). As such, it is sensitive to model imperfections. In this work we propose to overcome these limitations of MUSIC by augmenting the algorithm with specifically designed neural architectures. Our proposed deep augmented MUSIC (DA-MUSIC) algorithm is thus a hybrid model-based/data-driven DoA estimator, which leverages data to improve performance and robustness while preserving the interpretable flow of the classic method. DA-MUSIC is shown to learn to overcome limitations of the purely model-based method, such as its inability to successfully localize coherent sources as well as estimate the number of coherent signal sources present. We further demonstrate the superior resolution of the DA-MUSIC algorithm in synthetic narrowband and broadband scenarios as well as with real-world data of DoA estimation from seismic signals.

Submitted to TVT

Wideband Modal Orthogonality: A New Approach for Broadband DOA Estimation 2020-06-12

Wideband direction of arrival (DOA) estimation techniques for sensor arrays have been studied extensively in the literature. Nevertheless, the need for prior information on the number and directions of sources, or a heavy computational load, makes most of these techniques less useful in practice. In this paper, a low-complexity subspace-based framework for DOA estimation of broadband signals, named wideband modal orthogonality (WIMO), is proposed, and two DOA estimators are developed accordingly. First, a closed-form approximation of the spatial-temporal covariance matrix (STCM) in the uniform spectrum case is presented. The eigenvectors of the STCM associated with non-zero eigenvalues are modal components of the wideband source in a given bandwidth and direction. The WIMO idea is to extract these eigenvectors at desired DOAs from the approximated STCM and test their orthogonality to the estimated noise subspace. In the non-uniform spectrum case, the WIMO idea can be applied by approximating the STCM through numerical integration. Fortunately, STCM approximation and modal extraction can be performed offline. WIMO provides DOA estimation without the conventional prerequisites, such as spectral decomposition, a focusing procedure, and a priori information on the number of sources and their DOAs. Several numerical examples are conducted to compare the WIMO performance with the state-of-the-art methods. Simulations demonstrate that the two proposed DOA estimators achieve superior performance in terms of probability of resolution and estimation error along with orders of magnitude runtime speedup.

Broadband Sparse Array Focusing Via Spatial Periodogram Averaging and Correlation Resampling 2019-12-24

This paper proposes two coherent broadband focusing algorithms for spatial correlation estimation using sparse linear arrays. Both algorithms decompose the time-domain array data into disjoint frequency bands through discrete Fourier transform or filter banks to obtain broadband frequency-domain snapshots. The periodogram averaging (AP) algorithm starts in the frequency domain by estimating the broadband spatial periodograms for all bands and then averaging them to reinforce the sources' spatial spectral information. Taking the inverse spatial Fourier transform of the combined spatial periodogram estimates the focused spatial correlations. Alternatively, the spatial correlation resampling (SCR) algorithm directly computes the spatial correlations for each band and then rescales the spatial sampling rate to align at a focused frequency. The resampled spatial correlations from all frequency bands are then averaged to estimate the focused spatial correlations. The spatial correlations estimated from the AP or SCR algorithms populate the diagonals of a Hermitian Toeplitz augmented covariance matrix (ACM). The focused ACM is the input of a new minimum description length (MDL) based criterion, termed MDL-gap, for source enumeration and the standard narrowband MUSIC algorithm for DOA estimation. Numerical simulations show that both the AP and SCR algorithms improve source enumeration and DOA estimation performance over the incoherent subspace focusing algorithm in snapshot-limited scenarios.

11 pages, 8 figures

Broadband DOA estimation using Convolutional neural networks trained with noise signals 2017-12-12

A convolutional neural network (CNN)-based classification method for broadband DOA estimation is proposed, where the phase component of the short-time Fourier transform coefficients of the received microphone signals is directly fed into the CNN and the features required for DOA estimation are learnt during training. Since only the phase component of the input is used, the CNN can be trained with synthesized noise signals, thereby making the preparation of the training data set easier compared to using speech signals. Through experimental evaluation, the ability of the proposed noise-trained CNN framework to generalize to speech sources is demonstrated. In addition, the robustness of the system to noise and small perturbations in microphone positions, as well as its ability to adapt to different acoustic conditions, is investigated using experiments with simulated and real data.

Published in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2017

Convolutional Neural Networks for Passive Monitoring of a Shallow Water Environment using a Single Sensor 2016-12-12

A cost effective approach to remote monitoring of protected areas such as marine reserves and restricted naval waters is to use passive sonar to detect, classify, localize, and track marine vessel activity (including small boats and autonomous underwater vehicles). Cepstral analysis of underwater acoustic data enables the time delay between the direct path arrival and the first multipath arrival to be measured, which in turn enables estimation of the instantaneous range of the source (a small boat). However, this conventional method is limited to ranges where the Lloyd's mirror effect (interference pattern formed between the direct and first multipath arrivals) is discernible. This paper proposes the use of convolutional neural networks (CNNs) for the joint detection and ranging of broadband acoustic noise sources such as marine vessels in conjunction with a data augmentation approach for improving network performance in varied signal-to-noise ratio (SNR) situations. Performance is compared with a conventional passive sonar ranging method for monitoring marine vessel activity using real data from a single hydrophone mounted above the sea floor. It is shown that CNNs operating on cepstrum data are able to detect the presence and estimate the range of transiting vessels at greater distances than the conventional method.

Final draft for IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2017. 5 pages, 4 figures

DOA

Title Date Abstract Comment
Spatial Angular Pseudo-Derivative Searching: A Single Snapshot Super-resolution Sparse DOA Scheme with Potential for Practical Application 2026-02-08

Accurate, high-resolution, and real-time DOA estimation is a cornerstone of environmental perception in automotive radar systems. While sparse signal recovery techniques offer super-resolution and high-precision estimation, their prohibitive computational complexity remains a primary bottleneck for practical deployment. This paper proposes a sparse DOA estimation scheme specifically tailored for the stringent requirements of automotive radar such as limited computational resources, restricted array apertures, and a single snapshot. By introducing the concept of the spatial angular pseudo-derivative and incorporating this property as a constraint into a standard L0-norm minimization problem, we formulate an objective function that more faithfully characterizes the physical properties of the DOA problem. The associated solver, designated as the SAPD search algorithm, naturally transforms the high-dimensional optimization task into an efficient grid-search scheme. The SAPD algorithm circumvents high-order matrix inversions and computationally intensive iterations. We provide an analysis of the computational complexity and convergence properties of the proposed algorithm. Extensive numerical simulations demonstrate that the SAPD method achieves a superior balance of real-time efficiency, high precision, and super-resolution, making it highly suitable for next-generation automotive radar applications.

Beyond $λ/2$: Can Arbitrary EMVS Arrays Achieve Unambiguous NLOS Localization? 2026-02-07

Conventional radar array design mandates interelement spacing not exceeding half a wavelength ($λ/2$) to avoid spatial ambiguity, fundamentally limiting array aperture and angular resolution. This paper addresses the fundamental question: Can arbitrary electromagnetic vector sensor (EMVS) arrays achieve unambiguous reconfigurable intelligent surface (RIS)-aided localization when element spacing exceeds $λ/2$? We provide an affirmative answer by exploiting the multi-component structure of EMVS measurements and developing a synergistic estimation and optimization framework for non-line-of-sight (NLOS) bistatic multiple input multiple output (MIMO) radar. A third-order parallel factor (PARAFAC) model is constructed from EMVS observations, enabling natural separation of spatial, polarimetric, and propagation effects via the trilinear alternating least squares (TALS) algorithm. A novel phase-disambiguation procedure leverages rotational invariance across the six electromagnetic components of EMVSs to resolve $2π$ phase wrapping in arbitrary array geometries, allowing unambiguous joint estimation of two-dimensional (2-D) direction of departure (DOD), two-dimensional direction of arrival (DOA), and polarization parameters with automatic pairing. To support localization in NLOS environments and enhance estimation robustness, a reconfigurable intelligent surface (RIS) is incorporated and its phase shifts are optimized via semidefinite programming (SDP) relaxation to maximize received signal power, improving signal-to-noise ratio (SNR) and further suppressing spatial ambiguities through iterative refinement.

Uncertainty-Weighted Multi-Task CNN for Joint DoA and Rain-Rate Estimation Under Rain-Induced Array Distortions 2026-02-02

We investigate joint direction-of-arrival (DoA) and rain-rate estimation for a uniform linear array operating under rain-induced multiplicative distortions. Building on a wavefront fluctuation model whose spatial correlation is governed by the rain-rate, we derive an angle-dependent covariance formulation and use it to synthesize training data. DoA estimation is cast as a multi-label classification problem on a discretized angular grid, while rain-rate estimation is formulated as a multi-class classification task. We then propose a multi-task deep CNN with a shared feature extractor and two task-specific heads, trained using an uncertainty-weighted objective to automatically balance the two losses. Numerical results in a two-source scenario show that the proposed network achieves lower DoA RMSE than classical baselines and provides accurate rain-rate classification at moderate-to-high SNRs.

Location-Oriented Sound Event Localization and Detection with Spatial Mapping and Regression Localization 2026-01-30

Sound Event Localization and Detection (SELD) combines Sound Event Detection (SED) with estimation of the corresponding Direction Of Arrival (DOA). Recently adopted event-oriented multi-track methods generalize poorly in polyphonic environments because the number of tracks is limited. To enhance the generality in polyphonic environments, we propose Spatial Mapping and Regression Localization for SELD (SMRL-SELD). SMRL-SELD segments the 3D spatial space, mapping it to a 2D plane, and a new regression localization loss is proposed to help the results converge toward the location of the corresponding event. SMRL-SELD is location-oriented, allowing the model to learn event features based on orientation. Thus, the method enables the model to process polyphonic sounds regardless of the number of overlapping events. We conducted experiments on the STARSS23 and STARSS22 datasets, and our proposed SMRL-SELD outperforms the existing SELD methods in overall evaluation and in polyphonic environments.

accepted at ICME 2025

Robust Covariance-Based DoA Estimation under Weather-Induced Distortion 2026-01-27

We investigate robust direction-of-arrival (DoA) estimation for sensor arrays operating in adverse weather conditions, where weather-induced distortions degrade estimation accuracy. Building on a physics-based $S$-matrix model established in prior work, we adopt a statistical characterization of random phase and amplitude distortions caused by multiple scattering in rain. Based on this model, we develop a measurement framework for uniform linear arrays (ULAs) that explicitly incorporates such distortions. To mitigate their impact, we exploit the Hermitian Toeplitz (HT) structure of the covariance matrix to reduce the number of parameters to be estimated. We then apply a generalized least squares (GLS) approach for calibration. Simulation results show that the proposed method effectively suppresses rain-induced distortions, improves DoA estimation accuracy, and enhances radar sensing performance in challenging weather conditions.

HYPERDOA: Robust and Efficient DoA Estimation using Hyperdimensional Computing 2026-01-27

Direction of Arrival (DoA) estimation techniques face a critical trade-off, as classical methods often lack accuracy in challenging, low signal-to-noise ratio (SNR) conditions, while modern deep learning approaches are too energy-intensive and opaque for resource-constrained, safety-critical systems. We introduce HYPERDOA, a novel estimator leveraging Hyperdimensional Computing (HDC). The framework introduces two distinct feature extraction strategies -- Mean Spatial-Lag Autocorrelation and Spatial Smoothing -- for its HDC pipeline, and then reframes DoA estimation as a pattern recognition problem. This approach leverages HDC's inherent robustness to noise and its transparent algebraic operations to bypass the expensive matrix decompositions and "black-box" nature of classical and deep learning methods, respectively. Our evaluation demonstrates that HYPERDOA achieves ~35.39% higher accuracy than state-of-the-art methods in low-SNR, coherent-source scenarios. Crucially, it also consumes ~93% less energy than competing neural baselines on an embedded NVIDIA Jetson Xavier NX platform. This dual advantage in accuracy and efficiency establishes HYPERDOA as a robust and viable solution for mission-critical applications on edge devices.

3 figures, 5 pages. Paper accepted at ICASSP 2026. Authors' version posted for personal use and not for redistribution

LuSeeL: Language-queried Binaural Universal Sound Event Extraction and Localization 2026-01-27

Most universal sound extraction algorithms focus on isolating a target sound event from single-channel audio mixtures. However, the real world is three-dimensional, and binaural audio, which mimics human hearing, can capture richer spatial information, including sound source location. This spatial context is crucial for understanding and modeling complex auditory scenes, as it inherently informs sound detection and extraction. In this work, we propose a language-driven universal sound extraction network that isolates text-described sound events from binaural mixtures by effectively leveraging the spatial cues present in binaural signals. Additionally, we jointly predict the direction of arrival (DoA) of the target sound using spatial features from the extraction network. This dual-task approach exploits complementary location information to improve extraction performance while enabling accurate DoA estimation. Experimental results on the in-the-wild AudioCaps dataset show that our proposed LuSeeL model significantly outperforms single-channel and uni-task baselines.

ICASSP 2026

SoundCompass: Navigating Target Sound Extraction With Effective Directional Clue Integration In Complex Acoustic Scenes 2026-01-27

Recent advances in target sound extraction (TSE) utilize directional clues derived from direction of arrival (DoA), which represent an inherent spatial property of sound available in any acoustic scene. However, previous DoA-based methods rely on hand-crafted features or discrete encodings, which lose fine-grained spatial information and limit adaptability. We propose SoundCompass, an effective directional clue integration framework centered on a Spectral Pairwise INteraction (SPIN) module that captures cross-channel spatial correlations in the complex spectrogram domain to preserve full spatial information in multichannel signals. The input feature expressed in terms of spatial correlations is fused with a DoA clue represented as spherical harmonics (SH) encoding. The fusion is carried out across overlapping frequency subbands, inheriting the benefits reported in the previous band-split architectures. We also incorporate the iterative refinement strategy, chain-of-inference (CoI), in the TSE framework, which recursively fuses DoA with sound event activation estimated from the previous inference stage. Experiments demonstrate that SoundCompass, combining SPIN, SH embedding, and CoI, robustly extracts target sources across diverse signal classes and spatial configurations.

5 pages, 4 figures, accepted to ICASSP 2026

Analytic Incremental Learning For Sound Source Localization With Imbalance Rectification 2026-01-26

Sound source localization (SSL) demonstrates remarkable results in controlled settings but struggles in real-world deployment due to dual imbalance challenges: intra-task imbalance arising from long-tailed direction-of-arrival (DoA) distributions, and inter-task imbalance induced by cross-task skews and overlaps. These often lead to catastrophic forgetting, significantly degrading the localization accuracy. To mitigate these issues, we propose a unified framework with two key innovations. Specifically, we design a GCC-PHAT-based data augmentation (GDA) method that leverages peak characteristics to alleviate intra-task distribution skews. We also propose an Analytic dynamic imbalance rectifier (ADIR) with task-adaption regularization, which enables analytic updates that adapt to inter-task dynamics. On the SSLR benchmark, our proposal achieves state-of-the-art (SoTA) results of 89.0% accuracy, 5.3° mean absolute error, and 1.6 backward transfer, demonstrating robustness to evolving imbalances without exemplar storage.

Accepted by ICASSP26

Joint single-shot ToA and DoA estimation for VAA-based BLE ranging with phase ambiguity: A deep learning-based approach 2026-01-21

Conventional direction-of-arrival (DoA) estimation methods rely on multi-antenna arrays, which are costly to implement on size-constrained Bluetooth Low Energy (BLE) devices. Virtual antenna array (VAA) techniques enable DoA estimation with a single antenna, making angle estimation feasible on such devices. However, BLE only provides a single-shot two-way channel frequency response (CFR) with a binary phase ambiguity issue, which hinders the direct application of VAA. To address this challenge, we propose a unified model that combines VAA with BLE two-way CFR, and introduce a neural network based phase recovery framework that employs row / column predictors with a voting mechanism to resolve the ambiguity. The recovered one-way CFR then enables super resolution algorithms such as MUSIC for joint time of arrival (ToA) and DoA estimation. Simulation results demonstrate that the proposed method achieves superior performance under non-uniform VAAs, with mean square errors approaching the Cramer Rao bound at SNR $\geq$ 5 dB.

Direction-of-Arrival and Noise Covariance Matrix joint estimation for beamforming 2026-01-20

We propose a joint estimation method for the Direction-of-Arrival (DoA) and the Noise Covariance Matrix (NCM) tailored for beamforming applications. Building upon an existing NCM framework, our approach simplifies the estimation procedure by deriving a quasi-linear solution instead of the traditional exhaustive search. Additionally, we introduce a novel DoA estimation technique that operates across all frequency bins, improving robustness in reverberant environments. Simulation results demonstrate that our method outperforms classical techniques, such as MUSIC, in mid- to high-angle scenarios, achieving lower angular errors and superior signal enhancement through beamforming. The proposed framework was also compared against other signal enhancement techniques and showed better noise rejection and interference cancellation. These improvements are validated using both theoretical and empirical performance metrics.

6G OFDM Communications with High Mobility Transceivers and Scatterers via Angle-Domain Processing and Deep Learning 2026-01-19

High-mobility communications, which are crucial for next-generation wireless systems, cause the orthogonal frequency division multiplexing (OFDM) waveform to suffer from strong intercarrier interference (ICI) due to the Doppler effect. In this work, we propose a novel receiver architecture for OFDM that leverages the angular domain to separate multipaths. A block-type pilot is sent to estimate the directions of arrival (DoAs), propagation delays, and channel gains of the multipaths. Subsequently, a decision-directed (DD) approach is employed to estimate and iteratively refine the Doppler shifts. Two different approaches are investigated to provide initial Doppler estimates: an error vector magnitude (EVM)-based method and a deep learning (DL)-based method. Simulation results reveal that the DL-based approach allows for constant bit error rate (BER) performance up to the maximum 6G speed of 1000 km/h.

Accepted for presentation at IEEE International Conference on Communications (ICC) 2026

Joint DOA and Non-circular Phase Estimation of Non-circular Signals for Antenna Arrays: Block Sparse Bayesian Learning Method 2026-01-14

This letter proposes a block sparse Bayesian learning (BSBL) algorithm for direction-of-arrival (DOA) estimation of non-circular (NC) signals, which is suitable for arbitrary unknown NC phases. The block sparse NC signal representation model is constructed through a permutation strategy, capturing the available intra-block structure information to enhance recovery performance. After that, we create the sparse probability model and derive the cost function under the BSBL framework. Finally, the fast marginal likelihood maximum (FMLM) algorithm is introduced, enabling rapid signal recovery through the addition and removal of basis functions. Simulation results demonstrate the effectiveness and superior performance of our proposed method.

Directional Selective Fixed-Filter Active Noise Control Based on a Convolutional Neural Network in Reverberant Environments 2026-01-11

Selective fixed-filter active noise control (SFANC) is a novel approach capable of mitigating noise with varying frequency characteristics. It offers faster response and greater computational efficiency compared to traditional adaptive algorithms. However, spatial factors, particularly the influence of the noise source location, are often overlooked. Some existing studies have explored the impact of the direction-of-arrival (DoA) of the noise source on ANC performance, but they are mostly limited to free-field conditions and do not consider the more complex indoor reverberant environments. To address this gap, this paper proposes a learning-based directional SFANC method that incorporates the DoA of the noise source in reverberant environments. In this framework, multiple reference signals are processed by a convolutional neural network (CNN) to estimate the azimuth and elevation angles of the noise source, as well as to identify the most appropriate control filter for effective noise cancellation. Compared to traditional adaptive algorithms, the proposed approach achieves superior noise reduction with shorter response times, even in the presence of reverberations.

TransDOA: Calibrating Array Imperfections via Transformer-based Transfer Learning 2026-01-09

In practical scenarios, processes such as sensor design, manufacturing, and installation will introduce certain errors. Furthermore, mutual interference occurs when the sensors receive signals. These defects in array systems are referred to as array imperfections, which can significantly degrade the performance of Direction of Arrival (DOA) estimation. In this study, we propose a deep-learning based transfer learning approach, which effectively mitigates the degradation of deep-learning based DOA estimation performance caused by array imperfections. In the proposed approach, we highlight three major contributions. First, we propose a Vision Transformer (ViT) based method for DOA estimation, which achieves excellent performance in scenarios with low signal-to-noise ratios (SNR) and limited snapshots. Second, we introduce a transfer learning framework that extends deep learning models from ideal simulation scenarios to complex real-world scenarios with array imperfections. By leveraging prior knowledge from ideal simulation data, the proposed transfer learning framework significantly improves deep learning-based DOA estimation performance in the presence of array imperfections, without the need for extensive real-world data. Finally, we incorporate visualization and evaluation metrics to assess the performance of DOA estimation algorithms, which allow for a more thorough evaluation of algorithms and further validate the proposed method. Our code can be accessed at https://github.com/zzb-nice/DOA_est_Master.

Sensing for Free: Learn to Localize More Sources than Antennas without Pilots 2026-01-08

Integrated sensing and communication (ISAC) represents a key paradigm for future wireless networks. However, existing approaches require waveform modifications, dedicated pilots, or overhead that complicates standards integration. We propose sensing for free - performing multi-source localization without pilots by reusing uplink data symbols, making sensing occur during transmission and directly compatible with 3GPP 5G NR and 6G specifications. With ever-increasing devices in dense 6G networks, this approach is particularly compelling when combined with sparse arrays, which can localize more sources than uniform arrays via an enlarged virtual array. Existing pilot-free multi-source localization algorithms first reconstruct an extended covariance matrix and apply subspace methods, incurring cubic complexity and limited to second-order statistics. Performance degrades under non-Gaussian data symbols and few snapshots, and higher-order statistics remain unexploited. We address these challenges with an attention-only transformer that directly processes raw signal snapshots for grid-less end-to-end direction-of-arrival (DOA) estimation. The model efficiently captures higher-order statistics while being permutation-invariant and adaptive to varying snapshot counts. Our algorithm greatly outperforms state-of-the-art AI-based benchmarks with over 30x reduction in parameters and runtime, and enjoys excellent generalization under practical mismatches. Applied to multi-user MIMO beam training, our algorithm can localize uplink DOAs of multiple users during data transmission. Through angular reciprocity, estimated uplink DOAs prune downlink beam sweeping candidates and improve throughput via sensing-assisted beam management. This work shows how reusing existing data transmission for sensing can enhance both multi-source localization and beam management in 3GPP efforts towards 6G.

17 pages, 14 figures, 1 table. This paper was accepted by the IEEE Journal on Selected Areas in Communications (JSAC) on Jan. 5, 2026

The ECME Algorithm Using Factor Analysis for DOA Estimation in Nonuniform Noise 2026-01-03

Factor analysis (FA) plays a critical role in psychometrics, econometrics, and statistics. Recently, maximum likelihood FA (MLFA) has been applied to direction of arrival (DOA) estimation in unknown nonuniform noise, and a variety of iterative approaches have been developed. In particular, the Factor Analysis for Anisotropic Noise (FAAN) method proposed by Stoica and Babu has excellent convergence properties. In this article, the Expectation/Conditional Maximization Either (ECME) algorithm, an extension of the expectation-maximization algorithm, is redesigned for MLFA by introducing new complete data, so that it uses two explicit formulas to sequentially update the parameter estimates at each iteration and has excellent convergence properties. Theoretical analysis shows that the ECME algorithm has almost the same computational complexity per iteration as the FAAN method. However, numerical results show that the ECME algorithm converges faster and more stably, and reaches the global optimum more easily. Importantly, MLFA is not the best choice for subspace-based DOA estimation in unknown nonuniform noise.

Compressive Toeplitz Covariance Estimation From Few-Bit Quantized Measurements With Applications to DOA Estimation 2025-12-27

This paper addresses the problem of estimating the Hermitian Toeplitz covariance matrix under practical hardware constraints of sparse observations and coarse quantization. Within the triangular-dithered quantization framework, we propose an estimator called Toeplitz-projected sample covariance matrix (Q-TSCM) to compensate for the quantization-induced bias, together with its finite-bit counterpart termed the $2k$-bit Toeplitz-projected sample covariance matrix ($2k$-TSCM), obtained by truncating the pre-quantization observations. Under the complex Gaussian assumption, we derive non-asymptotic error bounds for the estimators that reveal a quadratic dependence on the quantization level and capture the effect of sparse sampling patterns through the so-called coverage coefficient. To further improve performance, we propose the quantized sparse and parametric approach (Q-SPA) based on a covariance-fitting criterion, which additionally enforces positive semidefiniteness at the cost of solving a semidefinite program. Numerical experiments corroborate our theoretical findings and demonstrate the effectiveness of the proposed estimators when applied to direction-of-arrival estimation.
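
For readers unfamiliar with the Hermitian Toeplitz structure that several entries in this list exploit, the following is a minimal sketch of plain Toeplitz projection by diagonal averaging. It is not the Q-TSCM or $2k$-TSCM estimator from the paper above (there is no quantization, dithering, or sparse sampling here), and the function name `toeplitz_project` is a hypothetical name chosen for illustration.

```python
import numpy as np

def toeplitz_project(R):
    """Project a square matrix onto Hermitian Toeplitz matrices (illustrative helper).

    Each diagonal is replaced by its average, which is the closest
    Hermitian Toeplitz matrix in the Frobenius norm.
    """
    R = np.asarray(R, dtype=complex)
    n = R.shape[0]
    r = np.empty(n, dtype=complex)
    for k in range(n):
        upper = np.diagonal(R, offset=k)    # entries R[i, i+k]
        lower = np.diagonal(R, offset=-k)   # entries R[i+k, i]
        # Average diagonal k with the conjugate of diagonal -k
        r[k] = 0.5 * (upper.mean() + np.conj(lower).mean())
    r[0] = r[0].real                        # main diagonal must be real
    idx = np.arange(n)
    lag = idx[None, :] - idx[:, None]       # lag[i, j] = j - i
    # Upper triangle uses r[lag], lower triangle uses its conjugate
    return np.where(lag >= 0, r[np.abs(lag)], np.conj(r[np.abs(lag)]))

# Usage: sample covariance of a 4-sensor array from 50 random snapshots
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 50)) + 1j * rng.standard_normal((4, 50))
R = X @ X.conj().T / 50
T = toeplitz_project(R)
```

The projected matrix `T` has only `n` free complex values instead of `n * n`, which is the parameter reduction that structured covariance methods rely on for uniform linear arrays.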