
pytorch-forecasting and dsipts v2 API design #39

Open · wants to merge 10 commits into base: main

Conversation

@fkiraly (Contributor) commented Jan 31, 2025

An enhancement proposal and design document towards v2 of pytorch-forecasting and dsipts.

More context here: sktime/pytorch-forecasting#1736

@felipeangelimvieira (Contributor) commented:

(bringing the discussion from discord to here)
Really interesting, Franz... I’ll post a more detailed comment here once I’ve organized my thoughts. The first question that comes to mind is: could we condense the information in the output of __getitem__ of BaseTSDataset into fewer objects?

@felipeangelimvieira (Contributor) commented:

About static_features: not all deep-learning TS models have static features. Example: N-BEATS. At the same time, I believe this is something really relevant to TFT, for example.

Given that, should this be positioned at "model" level instead?

@fkiraly (Contributor, Author) commented Feb 3, 2025

The first question that comes to mind is: could we condense the information in the output of __getitem__ of BaseTSDataset into fewer objects?

Yes, that is also what @thawn said, let's try to have as few tensors as possible.

My answer: perhaps - but it is hard to actually write it down.

Some comments:

  • a lot of the fields are metadata, about properties of variables (future-known or not, categorical/numerical, etc.)
  • we need at least 2 tensors if we have only past data, and 3 if we have future data, simply given the native "shapes" of static, future and past data. Tensors can only be rectangular, and it is not possible to fit all that data into fewer than 3 tensors because of the shapes (see the sketch below).
  • separating the index is a conscious choice, otherwise you need an additional metadata field. And it leaves you open to a future modification where you encode a RangeIndex or period index more efficiently.
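
To illustrate the shape argument, here is a minimal sketch (illustrative shapes only, not the proposed API) of why a single sample needs at least three tensors once future data is present:

import torch

n_timepoints = 24          # past/present time points
n_timepoints_future = 6    # future time points (known covariates only)
n_features = 5
n_future_features = 3
n_static_features = 2

past = torch.zeros(n_timepoints, n_features)                   # indexed by (time, variable)
future = torch.zeros(n_timepoints_future, n_future_features)   # different time length
static = torch.zeros(n_static_features)                        # no time dimension at all

# the three blocks are not rectangular with respect to each other, so they cannot
# be concatenated into a single tensor without padding or repeating values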

@fkiraly (Contributor, Author) commented Feb 3, 2025

About static_features: not all deep-learning TS models have static features. Example: N-BEATS. At the same time, I believe this is something really relevant to TFT, for example.

Given that, should this be positioned at "model" level instead?

I think not, since it is a "data" concern:

  • The information on which variable is static is a data property, and the values of the static variables are data (not a model part or modelling choice)
  • the "minimal representation" of static variables involves a tensor of smaller shape (n_instances x n_static_variables) - 2D and not 3D (instance, variable, time)
  • the corresponding "model" concern is "handles static variables", which is a capability-type property. For a unified interface, all models need to be able to take a dataset with static variables; the ones without the capability will just ignore them.

@felipeangelimvieira (Contributor) commented:

What I'm thinking about is that static variables are a specific kind of past- and future-known exogenous variables that are constant over time. In this sense, should we really separate them? I feel they are essentially the same at a "meta" level. The difference is how the model wants to use them.

One example: we could use Fourier features as static variables in TFT. I'm not sure there is anything that prevents us from using them that way, since these features are known for every time point in the past and future.

@felipeangelimvieira (Contributor) commented:

More thoughts:

  • Couldn't __getitem__ return just the relevant data, and metadata be stored in attributes self.y_cols_, self.x_cols_ of the dataset?
  • Should we have a BaseTorchTransformer that defines an interface for transformations (such as normalization of past values in a batch) at torch level? (replacing current encoders)

@fkiraly (Contributor, Author) commented Feb 4, 2025

What I'm thinking about is that static variables are a specific kind of past- and future-known exogenous variables that are constant over time. In this sense, should we really separate them? I feel they are essentially the same at a "meta" level. The difference is how the model wants to use them.

To me, this is not correct on the conceptual level.

The formal model I typically adopt is that all data points are indexed. Time-varying (or "dynamic") variables change over time, and observations (entries, values, etc.) are indexed by (instance index, variable name, time index), whereas static variables are only indexed by (instance index, variable name). If you adopt the usual model of hierarchical data, where a combination of indices indexes an observation, it becomes clear that static variables live at a different "level" of the indexing hierarchy.

Mapping static variables onto a column at the lower level (plus time index) makes them constant as a side effect of the representation, rather than as a conceptual feature.
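
As a small illustration of the indexing argument, with made-up data:

import pandas as pd

# dynamic observations: indexed by (instance, time) x variable
dynamic = pd.DataFrame(
    {"sales": [10, 12, 9, 11], "price": [1.0, 1.1, 0.9, 1.0]},
    index=pd.MultiIndex.from_product(
        [["store_A", "store_B"], pd.period_range("2025-01", periods=2, freq="M")],
        names=["instance", "time"],
    ),
)

# static observations: indexed by instance x variable, one level higher
static = pd.DataFrame(
    {"region": ["north", "south"], "size_sqm": [120, 340]},
    index=pd.Index(["store_A", "store_B"], name="instance"),
)

# broadcasting static onto the dynamic frame makes it constant over time as a
# side effect of the representation, not as a property of the data model
flat = dynamic.join(static, on="instance")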

@fkiraly (Contributor, Author) commented Feb 4, 2025

  • Couldn't __getitem__ return just the relevant data, and metadata be stored in attributes self.y_cols_, self.x_cols_ of the dataset?

There are two ideas here, I believe: (a) nesting metadata, and (b) returning metadata not by __getitem__ but another API point.

I am very open to (a) and think this is perhaps a good idea. Regarding (b), could you explain how the retrieval of the metadata would work? The __getitem__ return seems to be the single intended interface point if you go through the entire lightning workflow, e.g., DataLoader does not assume anything except __getitem__ (if I understand correctly). If you disagree, I would appreciate an explanation of how the information in the suggested attributes would reach the model, if it needs to.

Should we have a BaseTorchTransformer that defines an interface for transformations (such as normalization of past values in a batch) at torch level? (replacing current encoders)

Yes, that would be a great idea - do you want to draw up a design? And, surely this exists elsewhere, too?
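
To make the idea concrete, a rough sketch of what such an interface could look like (all names hypothetical, not a design commitment):

from abc import ABC, abstractmethod
import torch

class BaseTorchTransformer(ABC):
    """Batch-level, invertible transformation applied to tensors."""

    @abstractmethod
    def fit(self, x: torch.Tensor) -> "BaseTorchTransformer":
        """Learn transformation parameters from a batch, e.g., mean and scale."""

    @abstractmethod
    def transform(self, x: torch.Tensor) -> torch.Tensor:
        """Apply the transformation."""

    @abstractmethod
    def inverse_transform(self, x: torch.Tensor) -> torch.Tensor:
        """Undo the transformation, e.g., to rescale predictions."""

class BatchStandardizer(BaseTorchTransformer):
    """Example: per-sample standardization of past values over the time axis."""

    def fit(self, x):
        self.mean_ = x.mean(dim=-2, keepdim=True)
        self.std_ = x.std(dim=-2, keepdim=True).clamp_min(1e-8)
        return self

    def transform(self, x):
        return (x - self.mean_) / self.std_

    def inverse_transform(self, x):
        return x * self.std_ + self.mean_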

@felipeangelimvieira (Contributor) commented Feb 4, 2025

The formal model I typically adopt is that all data points are indexed. Time-varying (or "dynamic") variables change over time, and observations (entries, values, etc.) are indexed by (instance index, variable name, time index), whereas static variables are only indexed by (instance index, variable name). If you adopt the usual model of hierarchical data, where a combination of indices indexes an observation, it becomes clear that static variables live at a different "level" of the indexing hierarchy.

I understand your point that, since they can be indexed by a different number of indices while keeping the same information, they are and can be classified as different things. Nevertheless, I believe the challenge here is projecting this higher-dimensional space of concepts (e.g., classifying variables by instance, variable name, time index, discrete or continuous, etc.) into a lower-dimensional space that still makes the task (implementing different NN-based forecasters) feasible.

In this regard, the separation of data into 1. target variable, 2. features known only in the past, 3. features known in the past and future is possibly the lowest-dimensional representation, and it can be further subdivided into other categories/concepts, such as past/future-known features that are static and do not change over time (we can imagine it as a tree).

Mapping static variables onto a column at the lower level (plus time index) makes them constant as a side effect of the representation, rather than as a conceptual feature.

I am thinking in the opposite direction: static variables are features, they belong to this "feature" concept: relevant data other than the target variable used as input to predict the target variable. They are a special subset of features that do not change over time, but this does not place them outside the set of features. In particular, being part of this subset allows them to be represented with fewer index levels, as you mentioned.

Some models, such as DeepAR, may still want to use "static variables" as exogenous features (I believe this is the case, from inspecting the implementation and the property self.reals inherited from the parent BaseModelWithCovariates). So the differentiation of static and non-static is actually model-dependent. TFT, on the other hand, uses static variables in a particular way.

@felipeangelimvieira (Contributor) commented Feb 4, 2025

I am very open to (a) and think this is perhaps a good idea. Regarding (b), could you explain how the retrieval of the metadata would work? The __getitem__ return seems to be the single intended interface point if you go through the entire lightning workflow, e.g., DataLoader does not assume anything except __getitem__ (if I understand correctly). If you disagree, I would appreciate an explanation of how the information in the suggested attributes would reach the model, if it needs to.

Good point... If this is a direction we would like to go, I believe there are possibly two solutions:

  • Create a custom dataloader
  • Create a custom Tensor (inheriting from pytorch's tensor) that keeps the metadata of the features.

@fkiraly (Contributor, Author) commented Feb 5, 2025

Some models, such as DeepAR, may still want to use "static variables" as exogenous features (I believe this is the case, from inspecting the implementation and the property self.reals inherited from the parent BaseModelWithCovariates). So the differentiation of static and non-static is actually model-dependent. TFT, on the other hand, uses static variables in a particular way.

But this is not an intrinsic property of the static variables; it is how the model chooses to use them. So it is not a data property but a modelling choice, and specifically a "reduction" choice where we reformat the data (like in the reduction forecaster, just a different way of making the data "rectangular").

Therefore, following the domain-driven design philosophy, the choice of reformatting the static variables falls to the model rather than the data layer.

Create a custom dataloader

I wanted to do that too, but was voted down and convinced by good arguments in the Jan 20 meeting, reference: sktime/pytorch-forecasting#1736
(open the dropdown and search for "dataloader" for details)

The key argument for me was that the current vignettes in all packages reviewed rely on the default DataLoader, and it seems unclear how a custom one would interact with the model and lightning layers. Naive approaches seem to lead to coupling throughout.

Create a custom Tensor (inheriting from pytorch's tensor) that keeps the metadata of the features.

I think that is a hypothetical possibility but a bad idea. We would be rewriting layers outside the scope of the package.

I still have nightmares from our attempts to rewrite DataFrame for sktime.
https://github.com/sktime/data-container

@prockenschaub did great work, and I think our designs would have been successful in the end - it just turned out we were getting pulled into the scope of rewriting pandas from scratch, with more resources required than we had (focusing on the core package and all).

The lesson from that is, I suppose, if you are building a house and suddenly find yourself wanting to tear down your neighbour's and then the entire street, there is maybe something architectural to rethink.

@phoeenniixx commented Feb 7, 2025

Hi everyone! I have an idea for the D1 layer. I'm not sure if this will work, but I thought we could reduce the metadata by using dicts.
I would really appreciate comments.

D1 __getitem__ returns:

* t: tensor of shape (n_timepoints)
  Time index for each time point in the past or present. Aligned with ``y``, and ``x`` not ending in ``f``.
* y: tensor of shape (n_timepoints, n_targets)
  Target values for each time point. Rows are time points, aligned with ``t``.
* x: tensor of shape (n_timepoints, n_features)
  Features for each time point. Rows are time points, aligned with ``t``.
* group: tensor of shape (n_groups)
  Group identifiers for time series instances.
* st: tensor of shape (n_static_features)
  Static features.
* col_map: dict { 'y': list[str], 'x': list[str], 'st': list[str] }
  Names of columns for y, x, and static features.
* st_col_type: dict[str, int]
  Names of static features mapped to type (0=categorical, 1=numerical).
  Example: { 'region': 0, 'temp': 1 }
* y_types: list of int (length = n_targets)
  Type of each target variable: 0 (categorical), 1 (numerical).
* x_types: list of int (length = n_features)
  Type of each feature:  
  - 0 = categorical unknown  
  - 1 = categorical known  
  - 2 = numerical unknown  
  - 3 = numerical known
Optionally, the following str-keyed entries can be included:
    * t_f: tensor of shape (n_timepoints_future)
      Time index for each time point in the future.
      Aligned with ``x_f``.
    * x_f: tensor of shape (n_timepoints_future, n_features)
      Known features for each time point in the future.
      Rows are time points, aligned with ``t_f``.
    * weights: tensor of shape (n_timepoints), only if weight is not None
    * weight_f: tensor of shape (n_timepoints_future), only if weight is not None

In this way we have a little less metadata.
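
For illustration, a single sample under this proposal could look like the following (column names, types and values are made up):

import torch

sample = {
    "t": torch.arange(4),                # (n_timepoints,)
    "y": torch.rand(4, 1),               # (n_timepoints, n_targets)
    "x": torch.rand(4, 3),               # (n_timepoints, n_features)
    "group": torch.tensor([0]),          # (n_groups,)
    "st": torch.tensor([1.0, 20.5]),     # (n_static_features,)
    "col_map": {"y": ["demand"], "x": ["price", "promo", "dow"], "st": ["region", "temp"]},
    "st_col_type": {"region": 0, "temp": 1},   # 0=categorical, 1=numerical
    "y_types": [1],                            # demand is numerical
    "x_types": [3, 1, 1],                      # price: numerical known, promo/dow: categorical known
    # optional future entries
    "t_f": torch.arange(4, 6),           # (n_timepoints_future,)
    "x_f": torch.rand(2, 3),             # (n_timepoints_future, n_features)
}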

@phoeenniixx commented Feb 7, 2025

There are two ideas here, I believe: (a) nesting metadata, and (b) returning metadata not by __getitem__ but another API point.

As we are using just Lightning modules (please correct me if I am wrong), we can get the metadata in the setup function, which is part of the module: https://lightning.ai/docs/pytorch/LTS/common/lightning_module.html#setup

(Although not sure how much of it is feasible😅)

example:

from lightning import LightningModule


class MyModel(LightningModule):
    def setup(self, stage=None):
        # Pull the metadata from the datamodule's train dataset (D1 layer)
        dataset = self.trainer.datamodule.train_dataset
        self.col_map = dataset.col_map
        self.st_col_type = dataset.st_col_type
        self.y_types = dataset.y_types
        self.x_types = dataset.x_types

    def forward(self, x):
        # the metadata is now available when building/applying the network
        pass

    def training_step(self, batch, batch_idx):
        # training logic
        pass

[image omitted; source: https://www.assemblyai.com/blog/pytorch-lightning-for-dummies/]

More examples: https://lightning.ai/docs/pytorch/stable/data/datamodule.html#what-is-a-datamodule

Or we can do this:

from torch.utils.data import Dataset


class DecoderEncoderData(Dataset):
    def __init__(self, tsd: PandasTSDataSet, **params):
        self.tsd = tsd  # Store dataset reference

        # Access metadata from dataset (D1)
        self.col_map = tsd.col_map
        self.st_col_type = tsd.st_col_type
        self.y_types = tsd.y_types
        self.x_types = tsd.x_types

    def __len__(self):
        # needed by the default map-style DataLoader
        return len(self.tsd)

    def __getitem__(self, idx):
        sample = self.tsd[idx]
        return sample

@phoeenniixx commented:

A doubt: are we using pytorch-lightning or lightning? This is something that confuses me every time I work on ptf: both are similar yet different, and although they have similar names and functions, they can't be used interchangeably, as even their trainers are different (although both have setup support).
I am just asking this for future reference :)

@felipeangelimvieira (Contributor) commented Feb 7, 2025

I think that is a hypothetical possibility but a bad idea. We would be rewriting layers outside the scope of the package.

What I meant is not using this tensor in every part of the code, but just having a class that encapsulates the input data tensor together with this metadata. This is something documented in PyTorch's documentation.
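
For instance, a minimal sketch of that idea (hypothetical class name, not a worked-out design):

import torch

class MetaTensor(torch.Tensor):
    """Tensor subclass that can carry feature metadata alongside the values."""

    @classmethod
    def wrap(cls, data, metadata=None):
        t = torch.as_tensor(data).as_subclass(cls)
        t.metadata = dict(metadata or {})
        return t

x = MetaTensor.wrap(torch.rand(8, 24, 5), {"x_cols": ["price", "promo", "dow"]})
print(x.metadata["x_cols"])  # the metadata travels with the tensor handed to the model
# caveat: metadata is not automatically propagated through tensor operations;
# preserving it would require implementing __torch_function__, which adds complexity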

But this is not an intrinsic property of the static variables; it is how the model chooses to use them. So it is not a data property but a modelling choice, and specifically a "reduction" choice where we reformat the data (like in the reduction forecaster, just a different way of making the data "rectangular").

Therefore, following the domain-driven design philosophy, the choice of reformatting the static variables falls to the model rather than the data layer.

I would argue that "static" features are not a real phenomenon; they are a separation made in the original paper. Nothing impedes the usage of other past and future known features in the same way that "static" features are used.

One example of a use case that might encounter difficulties when separating static and non-static variables in the data layer is as follows: What if I want to use non-constant features (such as Fourier terms, which are known for both the past and future) in the same way that static variables are used in TFT? This approach could be interesting if we want the gating mechanisms to depend on seasonality. For example, holidays have different effects depending on the day of the week, and using sine and cosine functions as static features could be helpful in this scenario.

@felipeangelimvieira (Contributor) commented:

A doubt: are we using pytorch-lightning or lightning? This is something that confuses me every time I work on ptf: both are similar yet different, and although they have similar names and functions, they can't be used interchangeably, as even their trainers are different (although both have setup support).
I am just asking this for future reference :)

Hi @phoeenniixx! I believe it is lightning, based on the contents of pyproject.toml.

@phoeenniixx commented:

Thanks for clarifying this! @felipeangelimvieira

@fkiraly (Contributor, Author) commented Feb 8, 2025

@phoeenniixx, thanks a lot for your useful thoughts!

A doubt: are we using pytorch-lightning or lightning?

I think the latter is a rebrand of the former; the package and the project have been renamed.

we can get the metadata in setup function

Interesting - how would the entire workflow look if we do this?

Though, using setup seems to deviate slightly from its intended specification; the docs refer to data-processing-related setup rather than metadata handling.

I thought to reduce the metadata by using dicts.
I would really appreciate the comments

That's a very interesting idea!

Following up on this, I would actually put even more in single dicts, and use strings (like in pandas) for types:

D1 __getitem__ returns:

* t: tensor of shape (n_timepoints)
  Time index for each time point in the past or present. Aligned with ``y``, and ``x`` not ending in ``f``.
* y: tensor of shape (n_timepoints, n_targets)
  Target values for each time point. Rows are time points, aligned with ``t``.
* x: tensor of shape (n_timepoints, n_features)
  Features for each time point. Rows are time points, aligned with ``t``.
* group: tensor of shape (n_groups)
  Group identifiers for time series instances.
* st: tensor of shape (n_static_features)
  Static features.
* cols: dict { 'y': list[str], 'x': list[str], 'st': list[str] }
  Names of columns for y, x, and static features. List elements are in same order as column dimensions.
  Columns not appearing are assumed to be named x0, x1, etc, y0, y1, etc, st0, st1, etc.
* col_type: dict[str, str]
  Maps column names to data types "F" (numerical) and "C" (categorical).
  Column names not occurring are assumed "F".
* col_known: dict[str, str]
  Maps column names to "K" (future known) or "U" (future unknown).
  Column names not occurring are assumed "K".
  
Optionally, the following str-keyed entries can be included:
    * t_f: tensor of shape (n_timepoints_future)
      Time index for each time point in the future.
      Aligned with ``x_f``.
    * x_f: tensor of shape (n_timepoints_future, n_features)
      Known features for each time point in the future.
      Rows are time points, aligned with ``t_f``.
    * weights: tensor of shape (n_timepoints), only if weight is not None
    * weight_f: tensor of shape (n_timepoints_future), only if weight is not None
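
For illustration, with made-up column names, the three metadata dicts could look like:

cols = {"y": ["demand"], "x": ["price", "promo", "dow"], "st": ["region"]}
col_type = {"promo": "C", "dow": "C", "region": "C"}   # omitted columns default to "F" (numerical)
col_known = {"demand": "U"}                            # omitted columns default to "K" (future known)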

@phoeenniixx commented:

Thanks for the comments, @fkiraly!

how would the entire workflow look if we do this?

Well, I have not thought this through, but we can do something like this:

from lightning import LightningDataModule, LightningModule
from torch.utils.data import DataLoader


# Layer D1
class PandasTSDataSet:
    def __init__(self, df, y_cols, x_cols, st_cols, y_types, x_types, st_types):
        self.df = df
        # collect all metadata in one place instead of returning it per item
        self.metadata = {
            'y_cols': y_cols, 'x_cols': x_cols, 'st_cols': st_cols,
            'y_types': y_types, 'x_types': x_types, 'st_types': st_types,
        }

    def __getitem__(self, idx):
        # Returns only tensors, no metadata (placeholders, built from self.df)
        return {
            't': tensor(...),
            'y': tensor(...),
            'x': tensor(...),
            'group': tensor(...),
            'st': tensor(...),
            't_f': tensor(...),
            'x_f': tensor(...),
        }

    def get_metadata(self):
        return self.metadata


# Layer D2
class DecoderEncoderDataModule(LightningDataModule):
    def __init__(self, d1_dataset, batch_size=32, num_workers=4):
        super().__init__()
        # initialize other params
        self.d1_dataset = d1_dataset
        self.batch_size = batch_size
        self.num_workers = num_workers
        self.metadata = None  # Will be set in setup()

    def setup(self, stage=None):
        # Get metadata from D1 layer during setup
        self.metadata = self.d1_dataset.get_metadata()

        if stage == 'fit' or stage is None:
            # Any train-specific setup
            pass

        if stage == 'test' or stage is None:
            # Any test-specific setup
            pass

    def train_dataloader(self):
        return DataLoader(
            self.d1_dataset,
            batch_size=self.batch_size,
            num_workers=self.num_workers,
        )


# Layer M
class DecoderEncoderModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.metadata = None

    def setup(self, stage=None):
        # Get metadata from datamodule during setup
        self.metadata = self.trainer.datamodule.metadata
        # Initialize the layer-T network using the metadata

    def forward(self, x):
        # forward logic
        pass
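
To make the intended flow concrete, wiring these sketches together with the standard Lightning loop could look roughly like this (assuming the placeholder bodies above are filled in; data and column names are made up):

import pandas as pd
from lightning import Trainer

# toy long-format data
df = pd.DataFrame({
    "group": ["A"] * 8,
    "time": range(8),
    "demand": [float(i) for i in range(8)],
    "price": [1.0] * 8,
    "region": ["north"] * 8,
})

d1 = PandasTSDataSet(
    df,
    y_cols=["demand"], x_cols=["price"], st_cols=["region"],
    y_types=["F"], x_types=["F"], st_types=["C"],
)
datamodule = DecoderEncoderDataModule(d1, batch_size=4)
model = DecoderEncoderModel()

trainer = Trainer(max_epochs=1)
trainer.fit(model, datamodule=datamodule)  # the setup() hooks propagate the metadata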

Though, using setup seems to deviate slightly from its intended specification; the docs refer to data-processing-related setup rather than metadata handling.

Well yes, but I found this was the closest place where data and related fields are handled in the model. Or, as I mentioned earlier, we can just pass the metadata directly to other layers by doing something similar to this.

Here we can even define a function like get_metadata() that returns everything that's necessary, which reduces the risk of missing any metadata field. Like:

class DecoderEncoderData(Dataset):
    def __init__(self, tsd: PandasTSDataSet, **params):
        self.tsd = tsd  # Store dataset reference

        # Access metadata from dataset (D1)
        self.metadata = tsd.get_metadata()

    def __getitem__(self, idx):
        sample = self.tsd[idx]  
        # other required additions to ``sample``
        return sample

In this way, we don't deviate from the setup documentation, we still pass the metadata to the other layers, and we don't have to return the metadata in __getitem__.

Following up on this, I would actually put even more in single dicts, and use strings (like in pandas) for types:

Yeah! We can do that as well. It didn't strike me at the time. Thanks!

@fkiraly (Contributor, Author) commented Feb 8, 2025

@phoeenniixx, brilliant! Great thinking!

In retrospect - looking at your post, it looks like with layer D2 we were trying to reinvent the LightningDataModule interface!

Nice! I will need some time to digest, but I think we are getting close now.

FYI @agobbifbk, @felipeangelimvieira

@phoeenniixx commented:

Thanks! @fkiraly

In retrospect - looking at your post, it looks like with layer D2 we were trying to reinvent the LightningDataModule interface!

Well, I didn't find any other way in the lightning module that can handle this operation, so we needed to do something :)
