Bug: All JWST downloads are failing #360

Merged (4 commits) on Dec 17, 2024


troyraen
Contributor

@troyraen troyraen commented Dec 5, 2024

The JWST_get_spec function is failing to download any files. This is the cell that was throwing the FileNotFoundError mentioned in #336. Prior to this PR, it would hit the first error and then exit without processing the full sample. Currently, this PR just checks for failed downloads, prints a message, and then continues so that all targets in the sample are processed. Running this shows that every file it tries to download fails. So far, I don't know why, and thus don't know what to do about it.

To do:

  • Issue is at least partially fixed with no changes needed here, but we still need to make sure we can actually retrieve results. (So far, the archive reports "no results found" for our targets.)
  • Update notebook runtime

@troyraen added the bug and use case: spectroscopy labels on Dec 5, 2024
Comment on lines 144 to 148
# [FIXME] Every one of these downloads is failing. What to do?
download_results = download_results[download_results["Status"] != "ERROR"]
if len(download_results) == 0:
    print(f"ALL DOWNLOADS FAILED for {stab['label']}")
    continue
Contributor Author

Before line 145 runs, download_results looks like this for the sample labeled 'COSMOS1':

[Image: Screenshot 2024-12-04 at 11 37 09 PM]

For copy/paste convenience:

Local Path: ./data/JWST/mastDownload/JWST/jw02565-o301_s13239_nirspec_clear-prism/jw02565-o301_s13239_nirspec_clear-prism_x1d.fits
Status: ERROR
Message: HTTPError: 404 Client Error: Not Found for url: https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02565-o301_s13239_nirspec_clear-prism_x1d.fits
URL: https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02565-o301_s13239_nirspec_clear-prism_x1d.fits

I'm guessing that the other failure messages are similar but I haven't checked them yet.
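
The skip-and-continue check in the snippet above can be sketched in isolation. This is a minimal sketch, not the notebook's actual code: `split_manifest`, the sample labels, and the file names are hypothetical, and the only assumption about the manifest is that each row exposes "Status" and "Local Path" fields like the row shown above.

```python
def split_manifest(rows):
    """Partition manifest rows into (succeeded, failed) by their "Status" field."""
    succeeded, failed = [], []
    for row in rows:
        # Treat any row whose Status is "ERROR" as a failed download.
        (failed if row["Status"] == "ERROR" else succeeded).append(row)
    return succeeded, failed


# Hypothetical manifests for two samples; labels and paths are placeholders.
manifests = {
    "sample_a": [{"Local Path": "a_x1d.fits", "Status": "ERROR"}],
    "sample_b": [
        {"Local Path": "b1_x1d.fits", "Status": "COMPLETE"},
        {"Local Path": "b2_x1d.fits", "Status": "ERROR"},
    ],
}

usable = {}
for label, rows in manifests.items():
    succeeded, failed = split_manifest(rows)
    if not succeeded:
        print(f"ALL DOWNLOADS FAILED for {label}")
        continue  # move on to the next sample instead of raising
    usable[label] = [row["Local Path"] for row in succeeded]
```

With this pattern a sample whose downloads all fail is reported and skipped, while the remaining samples are still processed.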

@troyraen
Contributor Author

troyraen commented Dec 5, 2024

@bsipocz @jkrick any thoughts here? My sense is that the issue is upstream somewhere either in astroquery.mast or MAST itself but I don't even know where to start looking.

@bsipocz
Member

bsipocz commented Dec 5, 2024

cc @snbianco - do you have any insight into this? Is this a server-side issue, or should we file a bug report upstream with astroquery?

@troyraen
Contributor Author

troyraen commented Dec 5, 2024

>>> import astroquery
>>> astroquery.__version__
'0.4.8.dev9474'

@troyraen
Contributor Author

troyraen commented Dec 5, 2024

Code to reproduce:

import astropy.units as u
from astropy.coordinates import SkyCoord
from astroquery.mast import Observations

search_coords = SkyCoord(150.091, 2.2745833, unit=u.deg)
search_radius_arcsec = 0.5

query_results = Observations.query_criteria(
    coordinates=search_coords, radius=search_radius_arcsec * u.arcsec,
    dataproduct_type=["spectrum"], obs_collection=["JWST"], intentType="science",
    calib_level=[3, 4], instrument_name=['NIRSPEC/MSA', 'NIRSPEC/SLIT'],
    dataRights=['PUBLIC'])
data_products_list = Observations.get_product_list(query_results)
data_products_list_filter = Observations.filter_products(
    data_products_list, productType=["SCIENCE"], extension="fits",
    calib_level=[3, 4], productSubGroupDescription=["X1D"], dataRights=['PUBLIC'])
download_results = Observations.download_products(data_products_list_filter)
download_results

@snbianco

snbianco commented Dec 6, 2024

cc @dr-rodriguez

Thank you for bringing this to our attention! I can reproduce the issue and did some investigation in the UI portals. I can find this dataset in the Portal UI, and I can see this particular file in the download manager. When I try to select the files in the directory jw02565-o301_s13239_nirspec_clear-prism to download, it completes, but the download folder is empty and I get the same error message in the manifest file. When I try to "Download Data Products" using the Actions column in the search results, I get two different JWST directories that do not include jw02565-o301_s13239_nirspec_clear-prism. I cannot find this dataset at all using the MAST Search UI.

Because the portals have a similar issue with downloading this product, I'm assuming that the problem is server side. I'll file a ticket with someone at MAST to look at this more closely.

@dr-rodriguez

I did some brief investigation and some files no longer exist on disk. This includes jw02565-o301_s13239_nirspec_clear-prism. This is a result of JWST moving to a longer source name and doing some reprocessing, which can cause some source extractions to no longer exist.
We'll be aiming to do some cleanup over the next few days to fix this and some other problematic records.

@bsipocz
Member

bsipocz commented Dec 6, 2024

While this is clearly an upstream problem, I wonder if we could be more robust in astroquery and not report the local file path but instead raise some kind of error if the download is unsuccessful? What do you think @snbianco ?

@snbianco

snbianco commented Dec 6, 2024

I'm not sure it should raise an error, since that would stop other products that can be found from being downloaded. Maybe we could have astroquery log a warning for any files that fail to download? To avoid confusion, we could also leave the Local Path column in the manifest empty for any downloads that fail.
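
The suggestion of warning on failures and leaving the Local Path empty could look roughly like the following. This is only a sketch of the idea and not astroquery's current or planned behavior: `flag_failed_downloads`, the logger name, and the example URLs are hypothetical, and the rows are plain dictionaries that reuse the manifest column names shown earlier in this thread.

```python
import logging

logger = logging.getLogger("download_manifest_sketch")  # hypothetical logger name


def flag_failed_downloads(rows):
    """Log a warning for each failed row and blank its "Local Path".

    Rows are modified in place. Returns the number of failed rows.
    """
    n_failed = 0
    for row in rows:
        if row["Status"] == "ERROR":
            logger.warning("Download failed for %s: %s", row["URL"], row["Message"])
            row["Local Path"] = ""  # no file exists, so do not advertise a path
            n_failed += 1
    return n_failed


# Hypothetical manifest rows; URLs and file names are placeholders.
manifest = [
    {"Local Path": "ok_x1d.fits", "Status": "COMPLETE",
     "Message": "", "URL": "https://example.invalid/ok"},
    {"Local Path": "missing_x1d.fits", "Status": "ERROR",
     "Message": "HTTPError: 404", "URL": "https://example.invalid/missing"},
]
n_failed = flag_failed_downloads(manifest)
```

Because nothing is raised, products that can be found are unaffected, and callers can still decide what to do based on the returned count.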

@troyraen
Contributor Author

> We'll be aiming to do some cleanup over the next few days to fix this and some other problematic records.

Thanks, the behavior is better now. With the coords from above, query_results = Observations.query_criteria(...) now raises a NoResultsWarning and if I ignore it and keep going then Observations.get_product_list(query_results) raises an InvalidQueryError -- as expected in case of no results. After 4b70220, I think we're handling this case appropriately.

I still need to make sure we can get results back for some coords. So far, no luck.
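
One way to treat the "no results" case as an ordinary empty outcome, rather than continuing into a call that raises, is sketched below. It is an illustration under stated assumptions, not the code from this PR: `query_or_none` and the stand-in query functions are hypothetical, and the warning class is passed in as a parameter so the sketch runs without astroquery or network access.

```python
import warnings


def query_or_none(query_fn, no_results_warning):
    """Call query_fn() and return its result, or None if nothing was found.

    "Nothing found" means a warning of type no_results_warning was emitted
    or the returned collection is empty.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        results = query_fn()
    if any(issubclass(w.category, no_results_warning) for w in caught):
        return None
    if results is None or len(results) == 0:
        return None
    return results


# Stand-ins so the sketch is self-contained.
class FakeNoResultsWarning(UserWarning):
    pass


def empty_query():
    warnings.warn("Query returned no results.", FakeNoResultsWarning)
    return []


def good_query():
    return [{"obsid": "232881512"}]


empty = query_or_none(empty_query, FakeNoResultsWarning)
found = query_or_none(good_query, FakeNoResultsWarning)
```

In real use, one would presumably pass the NoResultsWarning class mentioned above together with a function wrapping the Observations.query_criteria(...) call, and skip Observations.get_product_list whenever the helper returns None. Note that other warnings emitted inside the block are recorded and not shown again.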

@troyraen
Contributor Author

@dr-rodriguez is the cleanup still in progress? I am unable to retrieve spectra for any target, so I'm trying to determine whether there is still a server-side issue or if we need to look closer at our code. (For example, I noticed that I can get results for TRAPPIST-1 by removing our restriction to 1D spectra -- productSubGroupDescription=["X1D"].)

@snbianco

@troyraen @dr-rodriguez

It looks like there's something wrong with the calibration levels listed for the observations. When I remove that criterion, I get 27 results that list calib_level as -1. Since -1 doesn't have a listed meaning, I'm guessing it indicates missing data or an error somewhere. Still, temporarily removing that argument might help you find the observations you're looking for.
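
A quick way to spot unexpected calibration levels after dropping the calib_level argument is to tally the values that come back. The following is a minimal sketch with made-up rows: `summarize_calib_levels` is a hypothetical helper, and the data are plain dictionaries standing in for the query results table.

```python
from collections import Counter


def summarize_calib_levels(rows):
    """Count how many rows report each calib_level value."""
    return Counter(row["calib_level"] for row in rows)


# Hypothetical query results; real rows would come from the archive query.
rows = [
    {"obsid": "1", "calib_level": -1},
    {"obsid": "2", "calib_level": -1},
    {"obsid": "3", "calib_level": 3},
]

counts = summarize_calib_levels(rows)
wanted_levels = {3, 4}  # the levels the original filter asked for
n_unexpected = sum(n for level, n in counts.items() if level not in wanted_levels)
print(dict(counts), n_unexpected)
```

A nonzero `n_unexpected` suggests that a calib_level filter would silently discard rows, which matches the symptom described here.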

@troyraen
Contributor Author

Thanks, we'll try it.

@troyraen force-pushed the raen/bug/spectra_generator-mast_functions branch from 47bf680 to 6ac9aa5 on December 12, 2024 00:22
@troyraen
Contributor Author

A web-based search showed that calibration-level 2 data is available for at least one of our targets, so I loosened the filter here to include it. Now we're running into a new error. Here is an example that starts the same as above but with different coords.

import astropy.units as u
import numpy as np
from astropy.coordinates import SkyCoord
from astroquery.mast import Observations

search_coords = SkyCoord(150.1024475, 2.2815559, unit=u.deg)  # COSMOS2
search_radius_arcsec = 0.5

query_results = Observations.query_criteria(
            coordinates=search_coords, radius=search_radius_arcsec * u.arcsec,
            dataproduct_type=["spectrum"], obs_collection=["JWST"], intentType="science",
            calib_level=[2, 3, 4], instrument_name=['NIRSPEC/MSA', 'NIRSPEC/SLIT'],
            dataRights=['PUBLIC'])
data_products_list = Observations.get_product_list(query_results)

# Our code expects that every row (jj) in data_products_list
# corresponds to a row in query_results.
# So the next line should return at least one index for any `jj`:
jj = 0  # any valid row index of data_products_list
np.where(query_results["obsid"] == data_products_list["obsID"][jj])

# But it doesn't -- There are some values in data_products_list["obsID"]
# that are not in query_results["obsid"]:
sorted(set(query_results["obsid"]))
# Output:
# ['232881512', '232882983']
sorted(set(data_products_list["obsID"]))
# Output:
# ['107463356', '107463357', '107463358', '107463359', '107463360',
#     '107463361', '107463362', '107463363', '107463364', '107463365',
#     '107463366', '107463367', '107463368', '107463372', '140187486',
#     '140187487', '140187488', '140187489', '140187490', '140187491',
#     '140187492', '140187493', '140195110', '140195221', '140195347',
#     '140195410', '140195478', '140195509', '232881512', '232882983']

@snbianco Is it expected for Observations.get_product_list to return results for more "obsID" values than were requested (as "obsid")? Sorry to keep asking; I'm new to this code and confused about what I'm seeing.

@snbianco

@troyraen

I'm happy to help! Yes, what you're seeing is expected. It's possible for products associated with an observation to belong to a different product group than the observation. For your purposes, you can try using the 'parent_obsid' column in the product table; that should return only the obsids that are present in the observations table.
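
The 'parent_obsid' suggestion can be illustrated by grouping product rows by that column instead of by "obsID". This is a sketch with hypothetical data: `group_products_by_parent` is not part of astroquery, the pairing of obsID values with parents below is invented for illustration, and plain dictionaries stand in for the product table rows.

```python
from collections import defaultdict


def group_products_by_parent(product_rows):
    """Map each parent_obsid to the list of product rows that belong to it."""
    grouped = defaultdict(list)
    for row in product_rows:
        grouped[row["parent_obsid"]].append(row)
    return dict(grouped)


# Hypothetical rows: some obsID values differ from the queried obsids,
# but parent_obsid points back to an observation that was queried.
query_obsids = {"232881512", "232882983"}
product_rows = [
    {"obsID": "107463356", "parent_obsid": "232881512", "productFilename": "a.fits"},
    {"obsID": "232881512", "parent_obsid": "232881512", "productFilename": "b.fits"},
    {"obsID": "140187486", "parent_obsid": "232882983", "productFilename": "c.fits"},
]

grouped = group_products_by_parent(product_rows)
# Every key should now be one of the queried observations.
unmatched = set(grouped) - query_obsids
```

If `unmatched` is empty, every product can be traced back to a row in the observations table, which is the property the lookup on "obsID" could not guarantee.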

@troyraen
Contributor Author

Thank you @snbianco. We are getting some results back now.

@afaisst @bsipocz, this is ready for review. I made the following changes:

@troyraen marked this pull request as ready for review on December 17, 2024 07:32
@troyraen requested review from bsipocz and afaisst on December 17, 2024 07:32
Member

@bsipocz bsipocz left a comment

Thanks!

@bsipocz
Member

bsipocz commented Dec 17, 2024

OK, let's go ahead with this and see if the rendering still likes the notebook. Thank you Troy!

@bsipocz merged commit bf3d5dc into main on Dec 17, 2024
3 checks passed
@bsipocz deleted the raen/bug/spectra_generator-mast_functions branch on December 17, 2024 21:30

4 participants