Currently, the pipeline that reformats the raw PLAsTiCC data from .csv into the layout that astronet expects is slow and uses many gigabytes of intermediate memory due to limitations with pandas (see layout example below).
As a result, reproducing and testing parts of the data reformatting pipeline can only be done on the cluster, and even there it takes a long time.
This issue is a placeholder for investigating this further and for looking at alternatives to pandas as the data manipulation tool -- it would still be useful to keep pandas as the final DataFrame component for its interoperability with other libraries.
However, by leveraging newer standards, in particular Apache Arrow, the cost of processing the raw data end to end could be reduced dramatically.
The front-runner is polars, which has been shown to be well suited to this kind of workload.