Open Universe File Conversion Notebook #46

Draft
wants to merge 4 commits into
base: main
@@ -0,0 +1,358 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"top\"></a>\n",
"# Open Universe FITS to ASDF File Converter\n",
"***\n",
"## Learning Goal\n",
"By the end of this tutorial, you will:\n",
"\n",
"- Understand how to use the provided file converter to convert the FITS format of Open Universe data into ASDF files\n",
"- Have a better understanding of the benefits to using ASDF files versus FITS files"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Introduction\n",
"\n",
"This should probably describe the Troxel data as a whole.\n",
"\n",
"This should definitely mention that most of the data has been converted, and provide the URI to those files.\n",
"\n",
"This should also mention that the converter is provided for transparency and in case new files are posted that users want to convert, or as an example of how to convert FITS to ASDF for other data sets.\n",
"\n",
"The converter does not currently work for coadd images because their WCS information is stored in a different format.\n",
"\n",
"The workflow for this notebook consists of:\n",
"* [Converting Data](#Converting-Data)\n",
"    * [Using the Packaged Converter](#Using-the-Packaged-Converter)\n",
"* [Exploring the Converted ASDF File](#Exploring-the-Converted-ASDF-File)\n",
"* [Additional Resources](#Additional-Resources)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Imports\n",
"We have prepared an Open Universe FITS to ASDF Converter (OUFAC) and provided it in the `oufac.py` file. This module primarily contains a `FitsToAsdf` class that performs the conversion. We will also use the `asdf` package to explore the newly converted data after we convert a file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import s3fs\n",
"from oufac import FitsToAsdf\n",
"import asdf\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib.colors import LogNorm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Open Universe catalogs are stored as Parquet files, and reading them requires the `pyarrow` package, which is not installed on the science platform by default. We therefore install it manually with pip:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install pyarrow"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Converting Data\n",
"\n",
"The first step in converting the data is to decide which file to convert. Here we specify the S3 bucket containing the simulated data, choose a band, a HEALPix index, and a sensor number, and use them to construct the URI of a single detector image file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s3bucket = 's3://nasa-irsa-simulations/openuniverse2024/roman/preview/RomanWAS/images/simple_model'\n",
"band = 'F184'\n",
"hpix = '9111'\n",
"sensor = 2\n",
"fits_filename = f'Roman_WAS_simple_model_{band}_{hpix}_{sensor}.fits.gz'\n",
"s3fpath = s3bucket+f'/{band}/{hpix}/{fits_filename}'"
]
},
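{
"cell_type": "markdown",
"metadata": {},
"source": [
"The path construction above can also be wrapped in a small helper so that a different band, HEALPix index, or sensor is easy to swap in. The following is a minimal sketch and not part of `oufac`: the name `build_image_uri` is hypothetical, and it assumes that every image follows the same `{band}/{hpix}/Roman_WAS_simple_model_{band}_{hpix}_{sensor}.fits.gz` layout used in the previous cell.\n",
"\n",
"```python\n",
"# Hypothetical helper (not part of oufac); assumes the bucket layout used above.\n",
"S3_BUCKET = 's3://nasa-irsa-simulations/openuniverse2024/roman/preview/RomanWAS/images/simple_model'\n",
"\n",
"\n",
"def build_image_uri(band, hpix, sensor, bucket=S3_BUCKET):\n",
"    # Build the file name first, then prefix it with bucket/band/hpix.\n",
"    filename = f'Roman_WAS_simple_model_{band}_{hpix}_{sensor}.fits.gz'\n",
"    return f'{bucket}/{band}/{hpix}/{filename}'\n",
"\n",
"\n",
"uri = build_image_uri('F184', '9111', 2)\n",
"print(uri)\n",
"```\n",
"\n",
"Calling it with the values above should reproduce `s3fpath`; whether a given combination of band, index, and sensor actually exists in the bucket still has to be checked."
]
},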
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have constructed the URI, it is good practice to confirm that the file exists. We create an anonymous `s3fs.S3FileSystem` and use its `ls` method to list everything at that path."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fs = s3fs.S3FileSystem(anon=True)\n",
"fs.ls(s3fpath)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the call returns a list containing a single path string, we are ready to proceed with the conversion. If it returns an empty list or raises an error, please confirm that the file URI is correct."
]
},
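{
"cell_type": "markdown",
"metadata": {},
"source": [
"The check described above can be made explicit in code. This is a minimal sketch under stated assumptions: `confirm_single_file` is a hypothetical helper, it accepts any object with an `ls(path)` method, and it treats both an empty listing and a `FileNotFoundError` as a missing file. The `_FakeFS` class is only a stand-in so that the example runs without network access.\n",
"\n",
"```python\n",
"# Hypothetical helper; works with any filesystem object exposing ls(path).\n",
"def confirm_single_file(fs, path):\n",
"    # Treat a raised FileNotFoundError the same as an empty listing.\n",
"    try:\n",
"        listing = fs.ls(path)\n",
"    except FileNotFoundError:\n",
"        listing = []\n",
"    if len(listing) != 1:\n",
"        raise FileNotFoundError(f'expected exactly one file at {path}, found {len(listing)}')\n",
"    return listing[0]\n",
"\n",
"\n",
"class _FakeFS:\n",
"    # Offline stand-in for a real filesystem object; only implements ls().\n",
"    def __init__(self, paths):\n",
"        self.paths = set(paths)\n",
"\n",
"    def ls(self, path):\n",
"        if path not in self.paths:\n",
"            raise FileNotFoundError(path)\n",
"        return [path]\n",
"\n",
"\n",
"demo_fs = _FakeFS(['bucket/F184/9111/image.fits.gz'])\n",
"found = confirm_single_file(demo_fs, 'bucket/F184/9111/image.fits.gz')\n",
"print(found)\n",
"```\n",
"\n",
"In this notebook the equivalent call would be `confirm_single_file(fs, s3fpath)`, using the `fs` and `s3fpath` objects defined above."
]
},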
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Using the Packaged Converter\n",
"\n",
"Now that we have confirmed our URI points to the file we want, we are ready to use the packaged converter. \n",
"\n",
"**NOTE:** the following cell will take roughly 3.5 minutes to run. While it is running, please read on to learn more about the converter."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"asdf_filename = f'Roman_WAS_simple_model_{band}_{hpix}_{sensor}.asdf'\n",
"fa = FitsToAsdf(s3fpath)\n",
"fa.write(asdf_filename)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"While the converter is running, we should expect two types of warnings:\n",
"\n",
"The first warns that the Simple Imaging Polynomial (SIP) fit does not meet the maximum residual at the current degree. It is issued while the converter is gathering the sources from the given catalogs and does not affect the simulated image at all. It does slightly reduce the accuracy of the RA and Dec of the catalog sources within the detector, but the sources are still easy to match to the bright sources in the image. The resulting warning should resemble:\n",
"```\n",
" Maximum specified SIP approximation error: 5\n",
" - SIP degree: 1. Maximum residual: 0.0091165\n",
" * Maximum residual, double sampled grid: 0.0091165\n",
" * Final SIP degree: 1. Maximum residual: 0.0091165\n",
"\n",
"```\n",
"\n",
"The second type appears while the file is being saved. In addition to any ASDF versioning warnings about mismatched schemas, you should expect `ErfaWarning` messages about a \"dubious year\", which ERFA issues when a date is too far in the past or future for its UTC handling to be trusted. The resulting warnings should resemble:\n",
"```\n",
" /opt/conda/envs/roman-cal/lib/python3.11/site-packages/erfa/core.py:133: ErfaWarning: ERFA function \"dtf2d\" yielded 1 of \"dubious year (Note 6)\"\n",
" warn(f'ERFA function \"{func_name}\" yielded {wmsg}', ErfaWarning)\n",
" /opt/conda/envs/roman-cal/lib/python3.11/site-packages/erfa/core.py:133: ErfaWarning: ERFA function \"d2dtf\" yielded 1 of \"dubious year (Note 5)\"\n",
" warn(f'ERFA function \"{func_name}\" yielded {wmsg}', ErfaWarning)\n",
"```\n",
"\n",
"Now that we know which warnings to expect, we can explain some added functionality of the converter."
]
},
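{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the expected ERFA messages clutter the output, they can be filtered with the standard-library `warnings` module. This is a minimal sketch and not something the converter does itself: `call_without_dubious_year_warnings` is a hypothetical wrapper, `_noisy` is a stand-in for the conversion call, and the filter assumes the message text contains `dubious year` as in the output quoted above.\n",
"\n",
"```python\n",
"import warnings\n",
"\n",
"\n",
"def call_without_dubious_year_warnings(func, *args, **kwargs):\n",
"    # The filter only applies inside the with-block; previous filters are restored on exit.\n",
"    with warnings.catch_warnings():\n",
"        warnings.filterwarnings('ignore', message=r'.*dubious year.*')\n",
"        return func(*args, **kwargs)\n",
"\n",
"\n",
"def _noisy():\n",
"    # Stand-in for the conversion call; emits a message similar to the ERFA one.\n",
"    warnings.warn('ERFA function dtf2d yielded 1 of dubious year (Note 6)')\n",
"    return 'done'\n",
"\n",
"\n",
"result = call_without_dubious_year_warnings(_noisy)\n",
"print(result)\n",
"```\n",
"\n",
"With the objects from this notebook, the call would look like `call_without_dubious_year_warnings(fa.write, asdf_filename)`. Any warning whose message does not match the pattern is still shown."
]
},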
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploring the Converted ASDF File"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have created the ASDF file, we can open it using `asdf.open()`. The `.info()` method then prints out the overall structure of the ASDF file making it easy to see the contents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"converted_file = asdf.open(asdf_filename)\n",
"converted_file.info(max_rows=40)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The scientifically useful information is stored within the `roman` dictionary which contains all the source catalogs in `catalogs`, all the metadata within `meta`, as well as the image data (`data`), image error (`err`), and data quality (`dq`). The `wcs` key contains a `gwcs.WCS` object that allows for easy access to the WCS information of the simulated data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"type(converted_file['roman']['wcs'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Additionally, each catalog is stored as an `astropy.table.Table`, which makes it easy to query. The cell below previews the galaxy source catalog. We truncate the table to its first 10 rows because the full tables can contain more than 10,000 rows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"converted_file['roman']['catalogs']['galaxies'][:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lastly, we plot the simulated image using `matplotlib` and a logarithmic color normalization to make the sources easier to view."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.imshow(converted_file['roman']['data'], norm=LogNorm())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.imshow(converted_file['roman']['data'][2500:3500, 750:1750], norm=LogNorm())"
]
},
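{
"cell_type": "markdown",
"metadata": {},
"source": [
"A logarithmic normalization cannot map zero or negative pixel values, and its default limits are set by the extreme pixels in the image. One option is to choose the limits from percentiles of the positive pixels. The following is a minimal sketch on synthetic data: `log_display_limits` and its default percentiles are hypothetical choices, not values taken from the converter or the data.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def log_display_limits(image, lower=1.0, upper=99.9):\n",
"    # Use only positive, finite pixels, since a log scale cannot represent the rest.\n",
"    values = np.asarray(image, dtype=float)\n",
"    positive = values[np.isfinite(values) & (values > 0)]\n",
"    vmin, vmax = np.percentile(positive, [lower, upper])\n",
"    return float(vmin), float(vmax)\n",
"\n",
"\n",
"# Synthetic stand-in for a detector image: noisy background plus one bright source.\n",
"rng = np.random.default_rng(0)\n",
"image = rng.normal(loc=100.0, scale=5.0, size=(200, 200))\n",
"image[90:110, 90:110] += 5000.0\n",
"vmin, vmax = log_display_limits(image)\n",
"print(vmin, vmax)\n",
"```\n",
"\n",
"The limits can then be passed to the plot as `plt.imshow(converted_file['roman']['data'], norm=LogNorm(vmin=vmin, vmax=vmax))`. The function assumes the image contains at least one positive pixel."
]
},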
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional Resources\n",
"\n",
"Point to:\n",
"- Troxel paper\n",
"- Open Universe documentation\n",
"- Open Universe notebooks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Citations\n",
"Provide your reader with guidelines on how to cite open source software and other resources in their own published work.\n",
"\n",
"```\n",
"If you use `astropy` or `lightkurve` for published research, please cite the\n",
"authors. Follow these links for more information about citing `astropy` and\n",
"`lightkurve`:\n",
"\n",
"* [Citing `astropy`](https://www.astropy.org/acknowledging.html)\n",
"* [Citing `lightkurve`](http://docs.lightkurve.org/about/citing.html)\n",
"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## About this Notebook\n",
"\n",
"**Author(s):** Javier Sanchez, Will C. Schultz <br>\n",
"**Keyword(s):** Tutorial, FITS, ASDF, Open-Universe <br>\n",
"**Last Updated:** Sep 2024 <br>\n",
"**Next Review:** Sep 2024\n",
"***\n",
"[Top of Page](#top)\n",
"<img style=\"float: right;\" src=\"https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png\" alt=\"Space Telescope Logo\" width=\"200px\"/> "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}