- Clone the repo
git clone https://github.com/alercebroker/ztf_dr_downloader.git
- Install the package on your system
pip install .
You need an AWS instance with access to S3 to run this script. A c5a.large instance with 150 GB of disk worked without problems. As you increase the number of processes, you will need more disk space.
The fields of the data release are distributed among the processes. For each field, a process (sketched below):
- Verifies that the field file is already in S3; if it is, it skips to the next field, otherwise the file is downloaded.
- Uploads the file to S3.
- Deletes the file from disk.
- Continues with the next file.
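For reference, the per-field loop looks roughly like the following Python sketch. This is not the package's actual implementation: boto3 credentials are assumed to be configured, and the function and variable names (process_field, field_url, etc.) are illustrative.
# Hypothetical sketch of the per-field loop described above (not the package's actual code).
import os
import urllib.request

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def process_field(field_url: str, bucket: str, key: str) -> None:
    # 1. If the field file is already in S3, skip it and continue with the next one.
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return
    except ClientError:
        pass  # not in S3 yet, download it

    # 2. Download the field file from the data release.
    local_path = os.path.basename(key)
    urllib.request.urlretrieve(field_url, local_path)

    # 3. Upload it to S3, then 4. delete it from disk.
    s3.upload_file(local_path, bucket, key)
    os.remove(local_path)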
To run the code, follow the instructions below:
- Locate the data release that you need to download, e.g.: https://irsa.ipac.caltech.edu/data/ZTF/lc_dr5/
- Locate the checksum file, e.g.: https://irsa.ipac.caltech.edu/data/ZTF/lc_dr5/checksums.md5
- Execute for 5 parallel processes (arg -n):
dr download-data-release https://irsa.ipac.caltech.edu/data/ZTF/lc_dr5/ \
https://irsa.ipac.caltech.edu/data/ZTF/lc_dr5/checksums.md5 \
s3://<your-bucket>/<data-release>/<etc> \
-n 5
- Wait calmly because there is a lot of data!
If you want to get the Data Release metadata without the light curves (to do a crossmatch or another operation), you can get it with the following command (you must have the data stored somewhere, e.g. S3):
dr get-objects <your-bucket> <data-release>
You can also obtain features of the light curves of the Data Release (based on code from Sánchez-Sáez et al. 2020). This can be very expensive, but it can be executed on several machines at the same time with slurm (work in progress).
Run for only one field:
dr get-features <input-file> <output-file>
Or compute them in your own code:
import pandas as pd
from ztf_dr.extractors import DataReleaseExtractor

input_file = "path_to_pallet_town"    # parquet file with light curves from the data release
output_file = "path_to_drink_a_beer"  # where the computed features will be written

extractor = DataReleaseExtractor()
zone = pd.read_parquet(input_file)           # load the light curves of one field/zone
features = extractor.compute_features(zone)  # compute the features per object
features.to_parquet(output_file)             # save the features
If you have access to a slurm cluster, run this command in your terminal:
sbatch --array [0-499]%500 compute_features.slurm <s3-bucket-raw-data> <s3-bucket-output-data>
This code distributes 500 jobs across the whole cluster, so all the files of the data release are split among these jobs. It computes the features only for objects that meet the following conditions (illustrated in the sketch after this list):
- light curve points with catflags = 0 and magerr < 1
- ndets > 20 in fid 1 and 2, ndets > 5 in fid 3
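For illustration, the selection criteria above correspond roughly to the pandas sketch below. It assumes a long-format table with one row per detection and the column names objectid, fid, catflags and magerr; the actual data release files and extractor code may be organized differently.
# Hypothetical sketch of the selection criteria above (column names and layout are assumed).
import pandas as pd

def select_objects(detections: pd.DataFrame) -> pd.DataFrame:
    # Keep only good-quality points: catflags = 0 and magerr < 1.
    good = detections[(detections["catflags"] == 0) & (detections["magerr"] < 1)]
    # Count the remaining detections of each (single-band) object.
    ndets = good.groupby("objectid").size().rename("ndets")
    info = good[["objectid", "fid"]].drop_duplicates().set_index("objectid").join(ndets)
    # Require ndets > 20 in fid 1 and 2, ndets > 5 in fid 3.
    min_dets = info["fid"].map({1: 20, 2: 20, 3: 5})
    keep = info.index[info["ndets"] > min_dets]
    return good[good["objectid"].isin(keep)]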
Integration with ZTF DR API
Each time you want to update the data in the API database (when ZTF launches a new data release), you must follow this procedure:
- Launch a machine and install MongoDB; the disk should be similar in size to the total data release (e.g. DR5 weighs ~3.5 TB and the machine's disk is 3.4 TB).
- On the same instance, run:
dr load_mongo <host> <db-name> <collection-name> <input-s3-bucket> --n-cores <number-of-processes> --batch-size <objects-per-batch>
NOTE: If you want to drop all elements in the database, use the -d flag in the command.
The code filters the objects, converts each light curve to binary, and inserts into the database a document with the following structure:
{
    "_id" : NumberLong(1550215200000003),
    "filterid" : 2,
    "fieldid" : 1550,
    "rcid" : 57,
    "objra" : 35.6168823242188,
    "objdec" : 19.0562019348145,
    "nepochs" : 25,
    "hmjd" : { "$binary" : "avJjR2D1Y0eF/GNHawNkR3AJZEdsCmRHWzdkR0k5ZEdHOmRHTT1kR0w/ZEdNQGRHTUNkR01EZEdWRWRHZFFkRzRYZEcwWWRHYHJlR2ByZUdpc2VHaXNlR3Z7ZUdffGVHX3xlRw==", "$type" : "00" },
    "mag" : { "$binary" : "U/6EQc5ahUGrcYVBcG+FQfKUhUHSZYVBJliFQQBphUEcNIVB1AiFQRAqhUHOOoVBJA6FQUFahUEiMoVB9a2FQQ4UhUHWNoVBw0yFQQyYhUF2jIVBHCGFQWhchUHiQIVBWlyFQQ==", "$type" : "00" },
    "magerr" : { "$binary" : "U65nPEjEazyCz2w8TbVsPMdybjybRGw8dKVrPMppbDzXB2o8UCJoPA6WaTwEVGo8Ql1oPN69azxi8Wk8LKFvPBWfaDzWJmo82SFrPAiYbjwwDW48ODFpPOTWazx0mWo8O9ZrPA==", "$type" : "00" },
    "loc" : {
        "type" : "Point",
        "coordinates" : [
            -144.383117675781,
            19.0562019348145
        ]
    }
}
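The binary fields above are consistent with the per-epoch arrays (hmjd, mag, magerr) being stored as packed little-endian float32 bytes, and loc with a GeoJSON point whose longitude is objra - 180 so it falls in the [-180, 180] range MongoDB expects. The following sketch shows that assumed encoding; it is not the exact code of the loader.
# Hypothetical sketch of the document encoding shown above (the float32 packing and
# the objra - 180 shift are assumptions inferred from the example document).
import numpy as np
from bson.binary import Binary

def build_document(objectid, filterid, fieldid, rcid, objra, objdec, hmjd, mag, magerr):
    return {
        "_id": int(objectid),
        "filterid": int(filterid),
        "fieldid": int(fieldid),
        "rcid": int(rcid),
        "objra": float(objra),
        "objdec": float(objdec),
        "nepochs": len(hmjd),
        # Per-epoch arrays packed as raw float32 bytes.
        "hmjd": Binary(np.asarray(hmjd, dtype=np.float32).tobytes()),
        "mag": Binary(np.asarray(mag, dtype=np.float32).tobytes()),
        "magerr": Binary(np.asarray(magerr, dtype=np.float32).tobytes()),
        # GeoJSON point used for the spatial index; longitude shifted into [-180, 180].
        "loc": {"type": "Point", "coordinates": [float(objra) - 180.0, float(objdec)]},
    }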
Finally, the script creates a spatial index on the loc field, plus other indexes on nepochs, filterid, fieldid, among others.
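In pymongo, that index creation could look like the sketch below (the exact list of secondary indexes is an assumption based on the description above):
# Hypothetical sketch of the index creation described above.
import pymongo

client = pymongo.MongoClient("<host>")
collection = client["<db-name>"]["<collection-name>"]

# Spatial (2dsphere) index on the GeoJSON "loc" field, used for cone searches.
collection.create_index([("loc", pymongo.GEOSPHERE)])
# Secondary indexes on frequently queried fields.
collection.create_index("nepochs")
collection.create_index("filterid")
collection.create_index("fieldid")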
After that, in the AWS console:
- Go to Lambda.
- Click on ztd-dr-api.
- Go to Configuration, then click on Environment variables.
- Replace the old credentials with the new credentials.
NOTE: If you made any changes to ztf_dr_api, you must update the lambda function.
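If you prefer to update the environment variables from code instead of the console, a boto3 call like the following can do it; the variable names here are placeholders, adapt them to your deployment.
# Hypothetical sketch: update the Lambda environment variables with boto3.
# The variable names below are placeholders, not the API's actual configuration keys.
import boto3

lam = boto3.client("lambda")
lam.update_function_configuration(
    FunctionName="ztd-dr-api",
    Environment={
        "Variables": {
            "MONGO_HOST": "<new-host>",
            "MONGO_USER": "<new-user>",
            "MONGO_PASSWORD": "<new-password>",
        }
    },
)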
- In the data folder we save some data about the data releases (since DR5).
- In data/DR<X>_by_field.csv we save the total size and the number of files per field.