Merge pull request #68 from golmschenk/split_public_and_internal_structure

Split public and internal structure
golmschenk authored May 17, 2024
2 parents b173b0f + 117d200 commit 70b5dec
Showing 51 changed files with 769 additions and 735 deletions.
4 changes: 4 additions & 0 deletions docs/source/conf.py
@@ -26,6 +26,10 @@
templates_path = ["_templates"]
exclude_patterns = []
source_suffix = [".rst", ".md"]
autodoc_class_signature = 'separated'
autodoc_default_options = {
'special-members': None,
}

# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
21 changes: 19 additions & 2 deletions docs/source/reference_index.md
@@ -1,6 +1,23 @@
# Reference

```{eval-rst}
.. autoclass:: qusi.data.LightCurve
:members: new
.. autoclass:: qusi.data.LightCurveCollection
:members: new
.. autoclass:: qusi.data.LightCurveDataset
:members: new
.. autoclass:: qusi.data.LightCurveObservationCollection
:members: new
.. autoclass:: qusi.data.FiniteStandardLightCurveDataset
:members: new
.. autoclass:: qusi.data.FiniteStandardLightCurveObservationDataset
:members: new
.. autoclass:: qusi.model.Hadryss
:members: new
.. autofunction:: qusi.session.get_device
.. autofunction:: qusi.session.infer_session
.. autofunction:: qusi.session.train_session
.. autoclass:: qusi.session.TrainHyperparameterConfiguration
:members: new
```
@@ -21,7 +21,7 @@ def get_positive_train_paths():

This function says to create a `Path` object for a directory at `data/spoc_transit_experiment/train/positives`. Then, it obtains all the files ending with the `.fits` extension. It puts them in a list and returns that list. In particular, `qusi` expects a function that takes no input parameters and outputs a list of `Path`s.
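As a concrete sketch, a function matching this description might look like the following (the directory layout is taken from the tutorial; the body is a minimal sketch rather than a verbatim copy of the example file):

```python
from pathlib import Path


def get_positive_train_paths():
    # Gather every `.fits` file in the positive training directory and
    # return them as a list of `Path` objects, as `qusi` expects.
    return list(Path('data/spoc_transit_experiment/train/positives').glob('*.fits'))
```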

In our example code, we've split the data based on whether it's train, validation, or test data, and based on whether it's positive or negative data. We provide a function for each of the 6 permutations of this, each nearly identical to the one above. You can see the above function and the 5 other similar functions near the top of `scripts/transit_dataset.py`.

`qusi` is flexible in how the paths are provided, and this construction of having a separate function for each type of data is certainly not the only way of approaching this. Depending on your task, another option might serve better. In another tutorial, we will explore a few example alternatives. However, to better understand those alternatives, it's first useful to see the rest of this dataset construction.
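To make that flexibility concrete, here is one hedged alternative: a single parameterized helper replacing the six near-identical functions. The helper name and the exact directory pattern are assumptions for illustration, not part of the example code:

```python
from pathlib import Path


def get_paths(split: str, label: str):
    # Hypothetical helper: one function covering all six split/label
    # combinations (train/validation/test x positive/negative).
    return list(Path(f'data/spoc_transit_experiment/{split}/{label}s').glob('*.fits'))


def get_positive_train_paths():
    # The zero-argument functions `qusi` expects become thin wrappers.
    return get_paths('train', 'positive')
```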

@@ -35,7 +35,7 @@ def load_times_and_fluxes_from_path(path):
return light_curve.times, light_curve.fluxes
```

This uses a built-in class in `qusi` designed for loading light curves from TESS mission FITS files. The important thing, however, is that your function takes a single `Path` object as input and returns two values: a NumPy array of the times and a NumPy array of the fluxes of your light curve. These `Path` objects will be ones returned by the functions in the previous section, but you can write any code you need to get from a `Path` to the two arrays that represent times and fluxes. For example, if your file is a simple CSV file, it would be easy to use Pandas to load the CSV file and extract the time column and the flux column as two arrays which are then returned at the end of the function. You will see the above function in `scripts/transit_dataset.py`.
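For instance, the CSV case described above might be sketched as follows (the function name and the column names `time` and `flux` are assumptions about a hypothetical file format, not part of the example code):

```python
from pathlib import Path

import pandas as pd


def load_times_and_fluxes_from_csv_path(path: Path):
    # Hypothetical CSV loader: reads the file with Pandas and returns the
    # time and flux columns as two NumPy arrays.
    data_frame = pd.read_csv(path)
    return data_frame['time'].to_numpy(), data_frame['flux'].to_numpy()
```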

## Creating a function to provide a label for the data

@@ -49,44 +49,34 @@ def negative_label_function(path):
return 0
```

Note that `qusi` expects the label functions to take in a `Path` object as input, even if we don't end up using it, because this allows for more flexible configurations. For example, in a different situation, the data might not be split into positive and negative directories; instead, the label data might be contained within the user's data file itself. The label can also be something other than 0 and 1: it is whatever the NN is attempting to predict for the input light curve. For our binary classification case, though, 0 and 1 are what we want to use. Once again, you can see these functions in `scripts/transit_dataset.py`.
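Both label functions from the example are small enough to reproduce in full; this sketch matches the behavior described above:

```python
from pathlib import Path


def positive_label_function(path: Path) -> int:
    # Every light curve from the positive directories contains a transit.
    return 1


def negative_label_function(path: Path) -> int:
    # Every light curve from the negative directories contains no transit.
    return 0
```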

## Creating a light curve collection

Now we're going to join the various functions we've just defined into `LightCurveObservationCollection`s. For the case of positive train light curves, this looks like:

```python
positive_train_light_curve_collection = LightCurveObservationCollection.new(
    get_paths_function=get_positive_train_paths,
    load_times_and_fluxes_from_path_function=load_times_and_fluxes_from_path,
    load_label_from_path_function=positive_label_function)
```

This defines a collection of labeled light curves where `qusi` knows how to obtain the paths, how to load the times and fluxes of the light curves, and how to load the labels. The `LightCurveObservationCollection.new(...)` method takes in the three pieces we just built. Note that you pass in the functions themselves, not the output of the functions: for the `get_paths_function` parameter, we pass `get_positive_train_paths`, not `get_positive_train_paths()` (notice the missing parentheses). `qusi` will call these functions internally. However, unlike the rest of the code in this tutorial, the above bit of code does not appear by itself in `scripts/transit_dataset.py`. This is because `qusi` doesn't use this collection by itself; it uses it as part of a dataset. We will explain why there's this extra layer in a moment.
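The distinction between passing a function and passing its result can be illustrated without `qusi` at all; this stand-in (not the real `LightCurveObservationCollection`) simply stores the function and calls it only when paths are needed:

```python
def make_collection(get_paths_function):
    # Stand-in for illustration: the collection keeps a reference to the
    # function itself and defers calling it until paths are required.
    return {'get_paths_function': get_paths_function}


def get_positive_train_paths():
    return ['positive_0.fits', 'positive_1.fits']


# Pass the function, not its result: no parentheses after the name.
collection = make_collection(get_paths_function=get_positive_train_paths)

# Only at use time does the stored function get invoked.
paths = collection['get_paths_function']()
```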

## Creating a dataset

Finally, we build the dataset `qusi` uses to train the network. First, we'll take a look and then unpack it:

```python
def get_transit_train_dataset():
    positive_train_light_curve_collection = LightCurveObservationCollection.new(
        get_paths_function=get_positive_train_paths,
        load_times_and_fluxes_from_path_function=load_times_and_fluxes_from_path,
        load_label_from_path_function=positive_label_function)
    negative_train_light_curve_collection = LightCurveObservationCollection.new(
        get_paths_function=get_negative_train_paths,
        load_times_and_fluxes_from_path_function=load_times_and_fluxes_from_path,
        load_label_from_path_function=negative_label_function)
    train_light_curve_dataset = LightCurveDataset.new(
        standard_light_curve_collections=[positive_train_light_curve_collection,
                                          negative_train_light_curve_collection])
    return train_light_curve_dataset
```

This is the function which generates the training dataset we called in the {doc}`/tutorials/basic_transit_identification_with_prebuilt_components` tutorial. The parts of this function are as follows. First, we create the `positive_train_light_curve_collection`. This is exactly what we just saw in the previous section. Next, we create a `negative_train_light_curve_collection`. This is almost identical to its positive counterpart, except now we pass `get_negative_train_paths` and `negative_label_function` instead of the positive versions. Then there is the `train_light_curve_dataset = LightCurveDataset.new(` line. This creates a `qusi` dataset built from these two collections. The reason the collections are kept separate is that `LightCurveDataset` has several mechanisms working under the hood. Notably for this case, `LightCurveDataset` will balance the two light curve collections. We know of far more light curves without planet transits than with them; in the real world case, it's at least thousands of times more. But for a NN, it's usually useful during the training process to show equal amounts of positives and negatives. `LightCurveDataset` will do this for us. You may have also noticed that we passed these collections in as the `standard_light_curve_collections` parameter. `LightCurveDataset` also allows for passing different types of collections. Notably, collections can be passed such that light curves from one collection will be injected into another. This is useful for injecting synthetic signals into real telescope data. However, we'll save the injection options for another tutorial.

You can see the above `get_transit_train_dataset` dataset creation function in the `scripts/transit_dataset.py` file. The only parts of that file we haven't yet looked at in detail are the `get_transit_validation_dataset` and `get_transit_finite_test_dataset` functions. However, these are nearly identical to the above `get_transit_train_dataset`, except using the validation and test path obtaining functions instead of the train ones.

## Adjusting this for your own binary classification task
