Commit aa45e7e

Merge pull request #80 from giotto-ai/refactor_before_master
Refactor before master
2 parents 7f0383b + e4cc41f commit aa45e7e

48 files changed, with 1227 additions and 962 deletions.

doc/conf.py

Lines changed: 0 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -19,7 +19,6 @@
1919

2020
project = "giotto-time"
2121
copyright = "2019, L2F"
22-
author = "Benjamin Russell, Stefano Savarè, Alessio Baccelli"
2322

2423
# The full version, including alpha/beta/rc tags
2524
from giottotime import __version__

doc/images/gar.png

197 KB

doc/images/no_trend.png

78.4 KB

doc/images/trend.png

58 KB

doc/images/trimmer.png

34.1 KB

doc/reference/feature_creation.rst

Lines changed: 1 addition & 2 deletions
@@ -12,7 +12,7 @@
1212
:toctree: generated/
1313
:template: class.rst
1414

15-
feature_creation.FeaturesCreation
15+
feature_creation.FeatureCreation
1616
feature_creation.ShiftFeature
1717
feature_creation.MovingAverageFeature
1818
feature_creation.ConstantFeature
@@ -43,4 +43,3 @@
4343
feature_creation.tda_features.AvgLifeTimeFeature
4444
feature_creation.tda_features.BettiCurvesFeature
4545
feature_creation.tda_features.NumberOfRelevantHolesFeature
46-
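
As a rough illustration of what shift and moving-average features like the ones listed in this reference page produce, the sketch below builds a feature matrix and multi-step targets with plain pandas. It is not giotto-time's implementation: the helper names `make_features` and `make_targets`, the column names, and the way the horizon is turned into target columns are all assumptions made for this example.

```python
import pandas as pd


def make_features(series: pd.Series, shifts, window: int) -> pd.DataFrame:
    """Hypothetical helper: lagged copies of the series plus a moving average."""
    X = pd.DataFrame({f"shift_{s}": series.shift(s) for s in shifts})
    X[f"moving_average_{window}"] = series.rolling(window).mean()
    return X


def make_targets(series: pd.Series, horizon: int) -> pd.DataFrame:
    """Hypothetical helper: one target column per step ahead, up to `horizon`."""
    return pd.DataFrame(
        {f"y_{k}": series.shift(-k) for k in range(1, horizon + 1)}
    )


series = pd.Series(range(10), dtype=float)
X = make_features(series, shifts=[1, 2], window=5)
y = make_targets(series, horizon=4)

# Shifting and rolling create NaNs at both ends; keep only the complete rows.
valid = X.notna().all(axis=1) & y.notna().all(axis=1)
X_valid, y_valid = X[valid], y[valid]
```

Only the rows where every feature and every target is defined survive, which is why the usable sample is shorter than the input series.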

doc/reference/index.rst

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ API Reference
77
======================================
88

99
.. toctree::
10-
:maxdepth: 3
10+
:maxdepth: 2
1111
:hidden:
1212

1313
causality_tests

doc/release_notes/index.rst

Lines changed: 104 additions & 37 deletions
@@ -11,14 +11,14 @@ Overview
1111
compared to traditional time series libraries are the following:
1212

1313
- feature creation, model selection, model assessment and prediction pipeline for time series models.
14-
- plug-and-play availability of any scikit-learn-compatible regression or classification model for forecasting.
15-
- minimization of standard custom loss functions for time series (SMAPE, max error, etc..)
16-
- easy-to-use scikit-learn-familiar API.
14+
- plug-and-play availability of any scikit-learn-compatible (i.e., in the fit-transform framework) regression or classification models for forecasting.
15+
- minimization of standard and custom loss functions for time series (SMAPE, max error, etc.).
16+
- easy-to-use scikit-learn-familiar and pandas-familiar API.
1717

18-
Additionally we provide standard causality tests with a scikit-learn-like interface.
18+
Additionally, we provide causality tests with a scikit-learn-like transformer interface.
1919

2020

21-
Input-Output specifications
21+
Input-Output Specifications
2222
~~~~~~~~~~~~~~~~~~~~~~~~~~~
2323

2424
**Input:** `pd.Series`, `pd.DataFrame` (single column), `np.array`, `list`
@@ -28,31 +28,6 @@ Input-Output specifications
2828
**Additional input parameters:** the user can pass a list of features and a scikit-learn
2929
compatible model to giotto-time.
3030

31-
Example of Usage
32-
~~~~~~~~~~~~~~~~
33-
34-
.. code-block:: python
35-
36-
from giottotime.feature_creation import FeaturesCreation
37-
from giottotime.feature_creation.index_independent_features import ShiftFeature, MovingAverageFeature
38-
from giottotime.model_selection.train_test_splitter import TrainTestSplitter
39-
from giottotime.regressors import LinearRegressor
40-
from giottotime.models.time_series_models import GAR
41-
42-
time_series = get_time_series()
43-
44-
features_creation = FeaturesCreation(
45-
horizon=4,
46-
features = [ShiftFeature(1), ShiftFeature(2), MovingAverageFeature(5)]
47-
)
48-
train_test_splitter = TrainTestSplitter()
49-
time_series_model = GAR(base_model=LinearRegressor())
50-
51-
X, y = features_creation.transform(time_series)
52-
X_train, y_train, X_test, y_test = train_test_splitter.transform(X, y)
53-
54-
time_series_model.fit(X_train, y_train)
55-
predictions = time_series_model.predict(X_test)
5631

5732
Time Series Preparation
5833
~~~~~~~~~~~~~~~~~~~~~~~~
@@ -79,28 +54,108 @@ We support the following features:
7954
- `ExogenousFeature`
8055
- `CustomFeature`
8156

82-
The features have a scikit-learn-like interface.
57+
These features all have a scikit-learn-like interface and behave as transformers.
8358

8459
The class FeatureCreation wraps a list of features together and returns the X and y
8560
matrices from a time series given as input.
8661

8762
Time Series Model
8863
~~~~~~~~~~~~~~~~~
89-
We provide the `GAR` class (Generalize Auto Regressive).
64+
Giotto-time provides the `GAR` class (Generalized Auto Regressive model).
9065
It operates in a similar way to the standard AR, but with an arbitrary number of
91-
features and with an arbitrary regression model.
66+
features and with an arbitrary underlying regression model.
67+
68+
.. image:: ../../../../images/gar.png
69+
:width: 60%
70+
:align: center
71+
72+
.. code-block:: python
73+
74+
from giottotime.feature_creation import FeatureCreation
75+
from giottotime.feature_creation.index_independent_features import ShiftFeature, MovingAverageFeature
76+
from giottotime.model_selection.train_test_splitter import TrainTestSplitter
77+
from giottotime.regressors import LinearRegressor
78+
from giottotime.models.time_series_models import GAR
79+
80+
time_series = get_time_series()
81+
82+
features_creation = FeatureCreation(
83+
horizon=4,
84+
features = [ShiftFeature(1), ShiftFeature(2), MovingAverageFeature(5)]
85+
)
86+
train_test_splitter = TrainTestSplitter()
87+
time_series_model = GAR(base_model=LinearRegressor())
88+
89+
X, y = features_creation.transform(time_series)
90+
X_train, y_train, X_test, y_test = train_test_splitter.transform(X, y)
91+
92+
time_series_model.fit(X_train, y_train)
93+
predictions = time_series_model.predict(X_test)
9294
9395
Time Series Trend Model
9496
~~~~~~~~~~~~~~~~~~~~~~~
95-
We provide three main classes to analyze and remove trends from time series:
96-
- `FunctionTrend`
97-
- `ExponentialTrend`
98-
- `PolynomialTrend`
97+
We provide classes to analyze and remove trends from time series in order to create trend-stationary time series.
98+
99+
Specifically, giotto-time includes the `ExponentialTrend` and `PolynomialTrend` model classes and detrending transformers.
100+
101+
Example of Usage
102+
~~~~~~~~~~~~~~~~
103+
104+
.. code-block:: python
105+
106+
import numpy as np
107+
import pandas as pd
108+
109+
import matplotlib.pyplot as plt
110+
111+
from giottotime.models.regressors.linear_regressor import LinearRegressor
112+
from giottotime.loss_functions.loss_functions import max_error, smape
113+
114+
from giottotime.models.trend_models.polynomial_trend import PolynomialTrend
115+
116+
from math import pi
117+
118+
d = pd.read_csv('trend.csv', index_col=0, parse_dates=True)
119+
tm = PolynomialTrend(order=3)
120+
121+
tm.fit(d)
122+
123+
d.plot(figsize=(10, 10))
124+
plt.show()
125+
126+
detrended = tm.transform(d)
127+
128+
detrended.plot(figsize=(10, 10))
129+
plt.show()
130+
131+
Before the detrending transformer is applied, a clear quadratic trend is present in the data:
132+
133+
.. image:: ../../../../images/trend.png
134+
:width: 60%
135+
:align: center
136+
137+
After fitting and applying the detrending transformer, the transformed data is 'trend stationary':
138+
139+
.. image:: ../../../../images/no_trend.png
140+
:width: 60%
141+
:align: center
142+
143+
For additional information on trend stationarity, see:
144+
`Wikipedia - Trend stationarity <https://en.wikipedia.org/wiki/Trend_stationary>`_.
145+
99146

100147
Model Selection and Cross Validation
101148
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
149+
- `trim_feature_nans`
150+
151+
.. image:: ../../../../images/trimmer.png
152+
:width: 60%
153+
:align: center
154+
102155
- `TrainTestSplitter`
103156

157+
158+
104159
Custom Regressors
105160
~~~~~~~~~~~~~~~~~
106161

@@ -110,6 +165,18 @@ Causality Tests
110165
~~~~~~~~~~~~~~~
111166
We provide two tests: `ShiftedLinearCoefficient` and `ShiftedPearsonCorrelation`.
112167

168+
.. code-block:: python
169+
170+
import numpy as np
171+
import pandas as pd
172+
173+
import matplotlib.pyplot as plt
174+
175+
from giottotime.causality_tests import ShiftedPearsonCorrelation
176+
177+
#TODO
178+
179+
113180
Release 0.2.0 (to be discussed)
114181
-------------------------------
115182
To be discussed.
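
The detrending step described in the release notes above can be approximated with NumPy alone. The following is a minimal sketch, assuming an ordinary least-squares polynomial fit via `numpy.polyfit`; `PolynomialTrend` may fit its coefficients differently (for example against a custom loss), and the helper names `fit_polynomial_trend` and `remove_trend` as well as the synthetic series are hypothetical.

```python
import numpy as np
import pandas as pd


def fit_polynomial_trend(series: pd.Series, order: int) -> np.ndarray:
    """Hypothetical helper: least-squares polynomial coefficients over 0..n-1."""
    t = np.arange(len(series))
    return np.polyfit(t, series.to_numpy(), deg=order)


def remove_trend(series: pd.Series, coeffs: np.ndarray) -> pd.Series:
    """Hypothetical helper: subtract the fitted polynomial from the series."""
    t = np.arange(len(series))
    return series - np.polyval(coeffs, t)


# Synthetic data: a quadratic trend plus a bounded oscillation.
index = pd.date_range("2019-01-01", periods=100, freq="D")
t = np.arange(100)
series = pd.Series(0.05 * t**2 + np.sin(t), index=index)

coeffs = fit_polynomial_trend(series, order=2)
detrended = remove_trend(series, coeffs)
```

After the subtraction, refitting a polynomial of the same order to `detrended` yields coefficients that are numerically zero, which is one simple way to check that the trend has been removed.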

giottotime/base/constants.py

Lines changed: 0 additions & 5 deletions
This file was deleted.
Lines changed: 8 additions & 0 deletions
@@ -1,2 +1,10 @@
1+
from .base import CausalityTest
12
from .shifted_linear_coefficient import ShiftedLinearCoefficient
23
from .shifted_pearson_correlation import ShiftedPearsonCorrelation
4+
5+
6+
__all__ = [
7+
"CausalityTest",
8+
"ShiftedLinearCoefficient",
9+
"ShiftedPearsonCorrelation",
10+
]
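
To make the idea behind the exported tests concrete, here is a small sketch of a shifted Pearson correlation search written with pandas only. It is an illustration under stated assumptions, not the code of `ShiftedPearsonCorrelation`: the function name `best_shift`, the range of shifts that is searched, and the tie-breaking behaviour are all hypothetical.

```python
import numpy as np
import pandas as pd


def best_shift(data: pd.DataFrame, x: str, y: str, max_shift: int = 10):
    """Hypothetical helper: the shift of column x that best correlates with y.

    Returns a (correlation, shift) pair. Shifts 1..max_shift are tried,
    which is an assumption of this sketch.
    """
    best_corr, best = -np.inf, 0
    for shift in range(1, max_shift + 1):
        # Series.corr ignores the NaN pairs introduced by shifting.
        corr = data[x].shift(shift).corr(data[y])
        if corr > best_corr:
            best_corr, best = corr, shift
    return best_corr, best


# Synthetic example: y is x delayed by three steps.
rng = np.random.default_rng(0)
x_values = pd.Series(rng.normal(size=200))
data = pd.DataFrame({"x": x_values, "y": x_values.shift(3)})

corr, shift = best_shift(data, x="x", y="y", max_shift=10)
```

Because `y` is an exact delayed copy of `x`, the search recovers the three-step delay with a correlation of essentially one.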

giottotime/causality_tests/shifted_linear_coefficient.py

Lines changed: 31 additions & 31 deletions
@@ -3,9 +3,9 @@
33
import numpy as np
44
import pandas as pd
55
from sklearn.linear_model import LinearRegression
6+
from sklearn.utils.validation import check_is_fitted
67

78
from giottotime.causality_tests.base import CausalityTest
8-
from giottotime.models.utils import check_is_fitted
99

1010

1111
class ShiftedLinearCoefficient(CausalityTest):
@@ -14,33 +14,34 @@ class ShiftedLinearCoefficient(CausalityTest):
1414
1515
Parameters
1616
----------
17-
max_shift : ``int``, optional, (default=``10``).
17+
max_shift : int, optional, default: ``10``
18+
The maximum number of shifts to check for.
1819
19-
target_col : ``str``, optional, (default='y').
20-
The column to use as the a reference (i.e., the columns which is not shifted).
20+
target_col : str, optional, default: ``'y'``
21+
The column to use as the reference (i.e., the column which is not
22+
shifted).
2123
22-
dropna : ``bool``, optional, (default=False).
24+
dropna : bool, optional, default: ``False``
2325
Determines if the NaN values created by shifting are retained or dropped.
2426
2527
"""
2628

2729
def __init__(
2830
self, max_shift: int = 10, target_col: str = "y", dropna: bool = False
2931
):
30-
self._max_shift = max_shift
31-
self._target_col = target_col
32-
self._dropna = dropna
32+
self.max_shift = max_shift
33+
self.target_col = target_col
34+
self.dropna = dropna
3335

3436
def fit(self, data: pd.DataFrame) -> "ShiftedLinearCoefficient":
35-
"""Create the dataframe of shifts of each time series which maximize
36-
the shifted linear fit coefficients.
37+
"""Create the dataframe of shifts of each time series which maximize the shifted
38+
linear fit coefficients.
3739
3840
Parameters
3941
----------
40-
data : ``pd.DataFrame``, required.
41-
The time-series on which to compute the shifted linear fit coefficients.
42-
43-
max_shift : ``int``, optional, (default=10).
42+
data : pd.DataFrame, shape (n_samples, n_time_series), required.
43+
The DataFrame containing the time-series on which to compute the shifted
44+
linear fit coefficients.
4445
4546
Returns
4647
-------
@@ -53,7 +54,7 @@ def fit(self, data: pd.DataFrame) -> "ShiftedLinearCoefficient":
5354
)
5455

5556
for x, y in product(data.columns, repeat=2):
56-
res = self._get_max_coeff_shift(data, self._max_shift, x=x, y=y)
57+
res = self._get_max_coeff_shift(data, self.max_shift, x=x, y=y)
5758

5859
best_shift = res[1]
5960
max_corr = res[0]
@@ -77,34 +78,33 @@ def fit(self, data: pd.DataFrame) -> "ShiftedLinearCoefficient":
7778
return self
7879

7980
def transform(self, data: pd.DataFrame) -> pd.DataFrame:
80-
"""Shifts each input timeseries but the amount which maximizes
81-
shifted linear fit coefficients with the selected 'y' colums.
81+
"""Shifts each input timeseries by the amount which maximizes shifted linear
82+
fit coefficients with the selected 'y' columns.
8283
8384
Parameters
8485
----------
85-
data : ``pd.DataFrame``, required.
86-
The time-series on which to perform the transformation.
86+
data : pd.DataFrame, shape (n_samples, n_time_series), required.
87+
The DataFrame containing the time series on which to perform the
88+
transformation.
8789
8890
Returns
8991
-------
90-
shifted_data : ``pd.DataFrame``
91-
The dataframe (Pivot table) of the shifts which maximize the shifted linear
92+
data_t : pd.DataFrame, shape (n_samples, n_time_series)
93+
The DataFrame (Pivot table) of the shifts which maximize the shifted linear
9294
fit coefficients between each timeseries. The shift is indicated in rows.
9395
9496
"""
95-
check_is_fitted(self)
96-
shifted_data = data.copy()
97+
check_is_fitted(self, ["best_shifts_", "max_corrs_"])
98+
data_t = data.copy()
9799

98-
for col in shifted_data:
99-
if col != self._target_col:
100-
shifted_data[col] = shifted_data[col].shift(
101-
self.best_shifts_[col][self._target_col]
102-
)
100+
for col in data_t:
101+
if col != self.target_col:
102+
data_t[col] = data_t[col].shift(self.best_shifts_[col][self.target_col])
103103

104-
if self._dropna:
105-
shifted_data = shifted_data.dropna()
104+
if self.dropna:
105+
data_t = data_t.dropna()
106106

107-
return shifted_data
107+
return data_t
108108

109109
def _get_max_coeff_shift(
110110
self, data: pd.DataFrame, max_shift: int, x: str = "x", y: str = "y"
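
The move from `self._max_shift` to `self.max_shift` and to `sklearn.utils.validation.check_is_fitted` in this file matches the usual scikit-learn estimator conventions: constructor arguments are stored under their own names, and fitted state is stored in attributes with a trailing underscore. The sketch below shows why this matters, using a toy transformer; `ShiftColumns` and its attributes are hypothetical and not part of giotto-time.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


class ShiftColumns(BaseEstimator, TransformerMixin):
    """Hypothetical transformer used only to illustrate the conventions."""

    def __init__(self, shift: int = 1, dropna: bool = False):
        # Same names as the constructor arguments, so that get_params()
        # and clone() can read them back.
        self.shift = shift
        self.dropna = dropna

    def fit(self, X: pd.DataFrame, y=None):
        # State learned during fit gets a trailing underscore.
        self.columns_ = list(X.columns)
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # Raises NotFittedError if fit() has not set columns_ yet.
        check_is_fitted(self, ["columns_"])
        X_t = X[self.columns_].shift(self.shift)
        return X_t.dropna() if self.dropna else X_t


est = ShiftColumns(shift=2, dropna=True)
params = est.get_params()
copy = clone(est)

data = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [4.0, 3.0, 2.0, 1.0]})
try:
    est.transform(data)
    raised = False
except NotFittedError:
    raised = True

result = est.fit(data).transform(data)
```

With public parameter names, the estimator can be cloned and inspected through `get_params()`, which is what tools such as grid search and pipelines rely on.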
