[ENH] rework `TimeSeriesDataSet` using `LightningDataModule` - experimental #1766

Comments
Should we use the "dict" implementation for metadata, or keep it separate as it is now?
For D1, I think that for the start we can assume everything is float and future-known; otherwise, I estimate there would be a lot of boring boilerplate in handling the different column types etc. The reason is that we should get to an end-to-end design quickly and see how it looks and how/whether it works, because we might modify or even abandon it; the work on the boilerplate would then be lost. Whereas, if this proves to be the way to go, it is still easy to add it on top.
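To make the "dict" option above concrete, here is a minimal sketch of what such a metadata dict could look like under the simplifying assumption that every column is float and future-known. All keys, codes, and column names here are illustrative assumptions, not an agreed API:

```python
# Hypothetical metadata dict for a D1-layer dataset. The key names
# ("cols", "col_type", "col_known") and codes ("F" = float,
# "K" = future-known) are placeholders for illustration only.
columns = ["target", "feat_a", "feat_b"]

metadata = {
    "cols": {
        "y": ["target"],             # target column(s)
        "x": ["feat_a", "feat_b"],   # feature columns
        "st": [],                    # static columns (none here)
    },
    # per the simplifying assumption: everything float...
    "col_type": {c: "F" for c in columns},
    # ...and everything future-known
    "col_known": {c: "K" for c in columns},
}

print(metadata["col_type"]["feat_a"])  # "F"
```

Because the whole description lives in one dict, a downstream layer (D2) can query types and knownness through a single object instead of separate attributes.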
A few problems I found with this approach, and proposed solutions:
```python
class TimeSeriesDataset(Dataset):
    def __init__(self, datamodule: "DecoderEncoderDataModule"):
        self.datamodule = datamodule
        self.tsd = datamodule.tsd  # preprocessed TimeSeries data

    def __len__(self):
        return len(self.tsd)

    def __getitem__(self, idx):
        # fetch raw sample from TimeSeries
        batch = self.tsd[idx]
        # apply all transformations inside the datamodule
        transformed_batch = self.datamodule.transformation(batch)
        return transformed_batch
```
```python
class DecoderEncoderData(Dataset):
    def __init__(self, tsd: PandasTSDataSet, **params):
        self.tsd = tsd  # store dataset reference
        # access metadata from dataset (D1)
        self.metadata = tsd.get_metadata()

    def __getitem__(self, idx):
        sample = self.tsd[idx]
        # other required additions to ``sample``
        return sample
```
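To see the first proposal end to end, here is a framework-free sketch of how `TimeSeriesDataset` would delegate transformations to its datamodule. The `DecoderEncoderDataModule` stand-in, its `transformation` hook, and the toy `tsd` data are all hypothetical; in the real design the module would be a `LightningDataModule` and `tsd` a preprocessed `TimeSeries` object:

```python
# Framework-free sketch: the dataset is a thin view over preprocessed
# data and applies the datamodule-owned transforms on access.
class TimeSeriesDataset:
    def __init__(self, datamodule):
        self.datamodule = datamodule
        self.tsd = datamodule.tsd  # preprocessed TimeSeries data

    def __len__(self):
        return len(self.tsd)

    def __getitem__(self, idx):
        batch = self.tsd[idx]                         # raw sample
        return self.datamodule.transformation(batch)  # module-owned transform


class DecoderEncoderDataModule:
    """Stand-in for the Lightning datamodule holding data and transforms."""

    def __init__(self, tsd):
        self.tsd = tsd

    def transformation(self, batch):
        # placeholder for scaling/encoding; here a trivial normalisation
        return [x / 10.0 for x in batch]


dm = DecoderEncoderDataModule(tsd=[[10.0, 20.0], [30.0, 40.0]])
ds = TimeSeriesDataset(dm)
print(ds[0])  # [1.0, 2.0]
```

The design choice illustrated: the dataset stores no transforms of its own, so changing preprocessing only touches the datamodule.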
Could you double check how the […] I feel the […]
In this tutorial, the dataloaders in the datamodule get only the dataset classes, not the datamodule class itself. A clearer example:

```python
import os

from pytorch_lightning import LightningDataModule
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
from torchvision.datasets import MNIST


class MNISTDataModule(LightningDataModule):
    def __init__(self, batch_size=64):
        super().__init__()
        self.batch_size = batch_size

    def prepare_data(self):
        # download only
        MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor())
        MNIST(os.getcwd(), train=False, download=True, transform=transforms.ToTensor())

    def setup(self, stage):
        # transform
        transform = transforms.Compose([transforms.ToTensor()])
        mnist_train = MNIST(os.getcwd(), train=True, download=False, transform=transform)
        mnist_test = MNIST(os.getcwd(), train=False, download=False, transform=transform)

        # train/val split
        mnist_train, mnist_val = random_split(mnist_train, [55000, 5000])

        # assign to use in dataloaders
        self.train_dataset = mnist_train
        self.val_dataset = mnist_val
        self.test_dataset = mnist_test

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=self.batch_size)

    def val_dataloader(self):
        return DataLoader(self.val_dataset, batch_size=self.batch_size)

    def test_dataloader(self):
        return DataLoader(self.test_dataset, batch_size=self.batch_size)
```

Source: https://pytorch-lightning.readthedocs.io/en/0.10.0/introduction_guide.html#the-engineering

Here you can see that they pass the datasets, not the module itself. DataModules are useful while training and testing, because you then just pass the model and the module, and everything is handled behind the curtain:

```python
dm = MNISTDataModule()
model = LitMNIST()
trainer = Trainer(tpu_cores=8)
trainer.fit(model, dm)
```

See the [Source] link for an illustration of the difference between using and not using Lightning data modules.
Umbrella issue for pytorch-forecasting 2.0 design: #1736

In sktime/enhancement-proposals#39, @phoeenniixx suggested a `LightningDataModule` based design for the end state of dsipts and pytorch-forecasting 2.0.

As a work item, this implies a rework of the `TimeSeriesDataSet` using `LightningDataModule`, covering layers D1 and D2 (referencing the EP), with the D1 layer based on a refined design following #1757 (but simpler).

@phoeenniixx agreed to give this a go as part of an experimental PR.