ValueError: array must not contain infs or NaNs in seaborn.histplot(... kde=True) #3817

Open
petsuter opened this issue Jan 21, 2025 · 5 comments

Comments

@petsuter

>>> import seaborn
>>> seaborn.__version__
'0.13.2'
>>> seaborn.histplot(x=range(5), weights=[0,0,3,0,0], kde=False)
<Axes: ylabel='Count'>
>>> seaborn.histplot(x=range(5), weights=[0,0,3,0,0], kde=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python311\Lib\site-packages\seaborn\distributions.py", line 1416, in histplot
    p.plot_univariate_histogram(
  File "C:\Python311\Lib\site-packages\seaborn\distributions.py", line 447, in plot_univariate_histogram
    densities = self._compute_univariate_density(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\seaborn\distributions.py", line 345, in _compute_univariate_density
    density, support = estimator(observations, weights=weights)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\seaborn\_statistics.py", line 193, in __call__
    return self._eval_univariate(x1, weights)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\seaborn\_statistics.py", line 154, in _eval_univariate
    kde = self._fit(x, weights)
          ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\seaborn\_statistics.py", line 143, in _fit
    kde = gaussian_kde(fit_data, **fit_kws)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\scipy\stats\_kde.py", line 226, in __init__
    self.set_bandwidth(bw_method=bw_method)
  File "C:\Python311\Lib\site-packages\scipy\stats\_kde.py", line 574, in set_bandwidth
    self._compute_covariance()
  File "C:\Python311\Lib\site-packages\scipy\stats\_kde.py", line 586, in _compute_covariance
    self._data_cho_cov = linalg.cholesky(self._data_covariance,
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\scipy\linalg\_decomp_cholesky.py", line 101, in cholesky
    c, lower = _cholesky(a, lower=lower, overwrite_a=overwrite_a, clean=True,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\scipy\linalg\_decomp_cholesky.py", line 18, in _cholesky
    a1 = asarray_chkfinite(a) if check_finite else asarray(a)
         ^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\numpy\lib\function_base.py", line 630, in asarray_chkfinite
    raise ValueError(
ValueError: array must not contain infs or NaNs
@mwaskom
Owner

mwaskom commented Jan 21, 2025

This is happening because your data are effectively singular, with only one nonzero weight.
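To illustrate why a single nonzero weight is a problem, here is a minimal NumPy-only sketch. It assumes (this is not confirmed in the thread) that the KDE normalises the weights and then takes an unbiased weighted covariance of the data, which is the matrix that the Cholesky step in the traceback rejects. With one nonzero weight the effective sample size is 1, so the unbiased estimate has zero degrees of freedom and comes out non-finite.

```python
import warnings

import numpy as np

x = np.arange(5, dtype=float)
w = np.array([0, 0, 3, 0, 0], dtype=float)
w = w / w.sum()  # normalised weights (assumed to mirror what the KDE does)

# Effective sample size for normalised weights: 1 / sum(w**2).
neff = 1.0 / np.sum(w ** 2)

with warnings.catch_warnings(), np.errstate(all="ignore"):
    # NumPy warns about "degrees of freedom <= 0"; silence it for the demo.
    warnings.simplefilter("ignore")
    cov = np.cov(x, aweights=w)  # unbiased weighted variance of x

print("effective sample size:", neff)
print("weighted variance is finite:", bool(np.isfinite(cov)))
```

A non-finite variance here would be consistent with the `asarray_chkfinite` failure at the bottom of the traceback.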

@petsuter
Author

petsuter commented Jan 21, 2025

Yes, but the data is what it is.... Is it unreasonable to expect seaborn to handle this (i.e. any data)?
Could seaborn log a warning and skip the kde in this case?

Should I use this:

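import seaborn

def safe_kde_histplot(x, w):
    # Try the KDE overlay first; fall back to a plain histogram if the fit fails.
    try:
        return seaborn.histplot(x=x, weights=w, kde=True)
    except Exception:  # unlike a bare except, lets KeyboardInterrupt propagate
        return seaborn.histplot(x=x, weights=w, kde=False)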

Or what is the recommendation?
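One alternative to catching the exception is to check the data up front. The sketch below is only an assumption about what counts as "enough data" for a KDE: it requires at least two distinct `x` values with positive weight. The helper name `has_kde_support` is hypothetical and not part of seaborn, and the check may not catch every near-singular input.

```python
import numpy as np


def has_kde_support(x, weights=None):
    """Hypothetical helper: True if >= 2 distinct x values carry positive weight."""
    x = np.asarray(x, dtype=float)
    mask = np.isfinite(x)
    if weights is not None:
        w = np.asarray(weights, dtype=float)
        # Ignore observations whose weight is zero, negative, or non-finite.
        mask &= np.isfinite(w) & (w > 0)
    return np.unique(x[mask]).size >= 2


print(has_kde_support(range(5), [0, 0, 3, 0, 0]))  # False: one weighted point
print(has_kde_support(range(5), [0, 1, 3, 0, 0]))  # True: two weighted points
```

It could then be passed straight through, e.g. `seaborn.histplot(x=x, weights=w, kde=has_kde_support(x, w))`, so the KDE is only requested when the check passes.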

@mwaskom
Owner

mwaskom commented Jan 21, 2025

It's possible it could be handled here too; not sure. Annoying that scipy doesn't raise a consistent error that we could handle.

@petsuter
Author

Interesting.

Sounds like the intention is already to handle such cases, but it's not so easy to do reliably.
https://github.com/scipy/scipy/blob/v1.15.1/scipy/stats/_kde.py suggests maybe handling ValueError and LinAlgError would be sufficient?

The check seems to work as intended when scipy is not installed.

@mwaskom
Owner

mwaskom commented Jan 21, 2025

Yeah, there's a long history of singular data being passed into kdeplot being very annoying 😁
