In this section we track the contents and information of each MeetUp session.
Speaker: Sebastian Martinez
Abstract: Synthetic control is a method for estimating causal effects (evaluate the effect of an intervention) in comparative case studies when there is only one treatment unit. The method chooses a set of weights for a group of corresponding units that produces an optimally estimated counterfactual to the unit that received the treatment. This unit is referred to as the “synthetic unit” and can be used to outline what would have happened to the treated unit had the treatment never occurred.
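A minimal sketch of the core weight-finding step described above, using NumPy and SciPy on made-up donor data (all names and numbers here are illustrative, not from the talk):

```python
import numpy as np
from scipy.optimize import minimize

# Made-up example: pre-treatment outcomes for 1 treated unit and 4 donor units.
rng = np.random.default_rng(42)
donors = rng.normal(size=(30, 4))  # 30 pre-treatment periods, 4 donor units
treated = donors @ np.array([0.5, 0.3, 0.2, 0.0]) + rng.normal(scale=0.1, size=30)

def pre_treatment_loss(w):
    """Squared error between the treated unit and the weighted donor pool."""
    return np.sum((treated - donors @ w) ** 2)

# Synthetic control constraints: weights are non-negative and sum to one.
n = donors.shape[1]
result = minimize(
    pre_treatment_loss,
    x0=np.full(n, 1 / n),
    bounds=[(0, 1)] * n,
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}],
    method="SLSQP",
)
weights = result.x  # these weights define the "synthetic unit"
```

Extending the weighted donor average into the post-treatment period gives the counterfactual against which the treated unit is compared.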
11.08.2020 Exploring Facebook Prophet (Book Club)
Moderator: Aaron Pickering
Description: We want to discuss the capabilities and limitations of Facebook's forecasting package Prophet. We will work through a concrete example: in the file data/sample_data_1.csv you can find (dummy) sales data (2018-01-01 to 2021-06-30). In addition, there is a feature media which could be used as an external predictor. The objective is to create a time series forecasting model using Facebook Prophet. The forecasting window is 30 days. Here are some hints/steps for the challenge:
- Do an EDA on the sample data.
- Is there any clear seasonality or trend?
- Is the feature media useful for prediction?
- How do you evaluate model performance?
- Can you estimate model uncertainty (credible intervals)?
Try to generate predictions for the whole month of July 2021. We will provide the "true" values during the meetup so that we can all test our models. Again, this is not a competition, but rather a concrete use case to apply what we learn about Prophet.
Please bring questions and suggestions to make the best out of this session!
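A minimal starting sketch for the challenge, assuming the column names ds (date) and y (sales) that Prophet expects (the rename from the raw CSV columns is an assumption):

```python
import pandas as pd
from prophet import Prophet  # in older versions: from fbprophet import Prophet

# Load the dummy sales data; the raw column names are assumptions.
df = pd.read_csv("data/sample_data_1.csv")
df = df.rename(columns={"date": "ds", "sales": "y"})  # hypothetical raw names

model = Prophet(yearly_seasonality=True, weekly_seasonality=True)
model.add_regressor("media")  # use the external predictor
model.fit(df)

# 30-day forecasting window; future values of `media` must be supplied,
# e.g. a planned media spend (here naively carried forward).
future = model.make_future_dataframe(periods=30)
future["media"] = df["media"].reindex(future.index).ffill()
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```

The yhat_lower/yhat_upper columns give Prophet's uncertainty intervals, which speak to the credible-intervals question above.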
Speaker: Markus Löning (PhD Student @UCL)
Abstract: We present sktime, a new unified toolbox for machine learning with time series in Python. We provide state-of-the-art time series algorithms and scikit-learn compatible tools for model composition. The goal of sktime is to make the time series analysis ecosystem more usable and interoperable. In this talk, you'll learn about different time series learning tasks, how algorithms for one task can be used to help solve another one, sktime's key design ideas and our plans for the future.
Resources:
- MeetUp Code https://github.com/mloning/intro-to-sktime-berlin-tsa-meetup-2020
- GitHub https://github.com/alan-turing-institute/sktime
- Paper sktime: A Unified Interface for Machine Learning with Time Series
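As a quick taste of sktime's scikit-learn-like interface, a minimal forecasting sketch (the dataset and forecaster choice are illustrative, not from the talk):

```python
from sktime.datasets import load_airline
from sktime.forecasting.theta import ThetaForecaster

# Univariate forecasting with a scikit-learn-style fit/predict interface.
y = load_airline()                   # monthly airline passengers series
forecaster = ThetaForecaster(sp=12)  # sp: seasonal periodicity (12 months)
forecaster.fit(y)
y_pred = forecaster.predict(fh=[1, 2, 3])  # fh: forecasting horizon steps ahead
print(y_pred)
```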
Moderator: Korbinian Kuusisto (Co-Founder @ Kineo.ai)
Abstract: Probabilistic forecasting, i.e. estimating the probability distribution of a time series’ future given its past, is a key enabler for optimizing business processes. In retail businesses, for example, forecasting demand is crucial for having the right inventory available at the right time at the right place.
The prevalent forecasting methods in use today have been developed in the setting of forecasting individual or small groups of time series. In this approach, model parameters for each given time series are independently estimated from past observations. The model is typically manually selected to account for different factors, such as autocorrelation structure, trend, seasonality, and other explanatory variables.
However, especially in the demand forecasting domain, one is often faced with highly erratic, intermittent or bursty data which violate core assumptions of many classical techniques, such as Gaussian errors, stationarity, or homoscedasticity of the time series.
Amazon's DeepAR is a forecasting method based on auto-regressive recurrent networks (LSTMs), which learns a global model from the historical data of all time series in the data set. In their paper, the authors demonstrate how applying deep learning techniques to forecasting can overcome many of the challenges faced by widely used classical approaches.
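A minimal sketch of training DeepAR via the GluonTS package (the toy dataset and hyperparameters are illustrative assumptions, not from the talk):

```python
import numpy as np
from gluonts.dataset.common import ListDataset
from gluonts.model.deepar import DeepAREstimator
from gluonts.mx.trainer import Trainer

# A toy dataset of 10 related daily series; DeepAR learns one global model.
series = [np.sin(np.arange(200) / 7.0) + np.random.randn(200) * 0.1
          for _ in range(10)]
train_ds = ListDataset(
    [{"target": s, "start": "2020-01-01"} for s in series], freq="D"
)

estimator = DeepAREstimator(
    freq="D", prediction_length=30, trainer=Trainer(epochs=5)
)
predictor = estimator.train(training_data=train_ds)

# Probabilistic forecasts: sample paths, from which quantiles can be read off.
forecast = next(iter(predictor.predict(train_ds)))
print(forecast.quantile(0.1), forecast.quantile(0.9))
```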
Resources:
- DeepAR: Probabilistic forecasting with autoregressive recurrent networks
- Understanding LSTM Networks
- Entity Embeddings of Categorical Variables
Remarks/Questions:
- Do you have a guideline when DeepAR would NOT be useful/appropriate?
One big assumption that holds for all deep learning models, DeepAR included, is that the distribution of the data does not change in the future; if it does, DeepAR is not appropriate. Another issue is that if your problem has covariates which are not available in the time slots you are predicting for, then this method will not work. Also, this is a univariate method, so if your problem is multivariate it will not be appropriate either. Finally, due to the LSTM, the sequence length might be an issue, e.g. if you want to predict daily values over 2 years: LSTMs suffer from forgetting on long sequences, and then it is also not appropriate. In addition, data points must be regularly spaced in time. (Kashif Rasul)
Speaker: Dr. Kashif Rasul (Research Scientist @Zalando SE)
Abstract: I will present an overview of deep learning based probabilistic forecasting methods and then show how we can extend it to do multivariate probabilistic forecasting in an efficient manner by using Normalizing Flows. I will cover all the background material needed to understand these concepts as well so don't worry if you are new to them.
Resources:
Moderator: BTSA Organizers
Description: We will have a hands-on session on exploratory data analysis (EDA) for time series data. EDA depends, of course, on the data and the objective of the study. We will give some hints on how to get started; these are not set in stone, but rather guiding principles (a minimal code sketch follows the list below).
- Missing values and data frequency (notebook).
- Stationarity and correlation analysis (notebook).
- Seasonality, decomposition, and outlier detection (notebook).
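A minimal sketch of these EDA steps with pandas and statsmodels, assuming a univariate series y in a DataFrame indexed by date (file and column names are illustrative):

```python
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller

# Hypothetical file and column names, for illustration only.
df = pd.read_csv("data.csv", parse_dates=["date"], index_col="date")

# Missing values and data frequency.
print(df["y"].isna().sum(), pd.infer_freq(df.index))

# Stationarity: augmented Dickey-Fuller test (small p-value -> reject unit root).
adf_stat, p_value, *_ = adfuller(df["y"].dropna())
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")

# Correlation structure and seasonal decomposition.
print(df["y"].autocorr(lag=1))
decomposition = seasonal_decompose(df["y"].dropna(), period=7)  # weekly seasonality assumed
decomposition.plot()
```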
Speaker: Dr. Francesca Lazzeri
Resources:
- GitHub repository: https://github.com/FrancescaLazzeri/AutoML-for-Timeseries-Forecasting-Berlin-MeetUp
- Article on classical vs. deep learning methods for time series forecasting.
Speaker: Angus Dempster
Resources:
- Paper Links:
- https://arxiv.org/abs/1910.13051 (ROCKET)
- https://arxiv.org/abs/2012.08791 (MINIROCKET)
- GitHub repositories:
Speaker: Ritwika Mukherjee
Abstract: Self-supervised methods for learning embedded feature spaces have increased in popularity over the last couple of years. These techniques allow for efficient representation learning despite complex high-dimensional input spaces. In this session, we will explore the 'wav2vec' model developed by Facebook research and its applications in audio signal processing. The model has implications for speech, vastly reducing the need for annotated labels, and can be transferred to other time-series data.
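A minimal sketch of extracting wav2vec 2.0 embeddings with the Hugging Face transformers package (the checkpoint choice and the dummy input are illustrative):

```python
import torch
from transformers import Wav2Vec2Model, Wav2Vec2Processor

# Pretrained self-supervised model; no labels were needed to learn these features.
name = "facebook/wav2vec2-base-960h"
processor = Wav2Vec2Processor.from_pretrained(name)
model = Wav2Vec2Model.from_pretrained(name)

# One second of dummy audio at the 16 kHz rate the model expects.
audio = torch.randn(16000).numpy()
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    embeddings = model(**inputs).last_hidden_state  # (batch, frames, hidden dim)
print(embeddings.shape)
```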
Resources:
15.06.2021 Introduction to CausalImpact
Speaker: Munji Choi
Resources:
Speaker: Thomas Bierhance
Abstract: Although Kaggle contests are not directly comparable to a real-life task for a forecasting practitioner, the contests are closer to reality than many of the standard data sets found in the scientific literature. They are therefore a goldmine of practice-relevant ideas. Thomas explains the lessons learned from various Kaggle competitions, what needs to be considered when using them in practice, and what loose ends he believes still exist.
Resources:
Abstract: In this session we will explore three classical time series forecasting methods:
- Exponential Smoothing (Aaron Pickering)
- ARIMA Models (Sebastian Martinez)
- State Space Models (Juan Orduz)
We will have a 20-minute session for each method, focusing on explaining the main idea behind it through examples. No prior knowledge is required.
Recommended reference: Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos: https://otexts.com/fpp3/ Chapters 8 and 9.
You can find the notebooks of the session here. In addition, here is a summary article on Introduction to Exponential Smoothing.
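A minimal sketch of two of these methods with statsmodels, on an illustrative series (the data and model orders are assumptions):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Illustrative monthly series with trend and yearly seasonality.
idx = pd.date_range("2015-01-01", periods=84, freq="MS")
y = pd.Series(np.arange(84) + 10 * np.sin(np.arange(84) * 2 * np.pi / 12), index=idx)

# Exponential smoothing (Holt-Winters) with additive trend and seasonality.
hw = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12).fit()
print(hw.forecast(12))

# ARIMA(1, 1, 1): autoregressive + differencing + moving-average components.
arima = ARIMA(y, order=(1, 1, 1)).fit()
print(arima.forecast(12))
```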
Speaker: Oleksandr Shchur
References:
- MeetUp Presentation
- Neural Temporal Point Processes: A Review
- Blog post from Oleksandr Shchur:
- Python packages related to TPP:
Speakers: Dr. Fiammetta Menchetti and Eugenio Palmieri
Abstract: In this talk we will provide an overview of C-ARIMA, an approach based on ARIMA models that can be used to make causal statements under the potential outcomes framework. After a brief description of the methodology, we will have a practical session on a real data set where we will illustrate the use of the CausalArima R package, see FMenchetti/CausalArima.
Speaker: Dr. Tom Kealy, Senior Data Scientist at HelloFresh
Abstract: Bootstrapping is a common non-parametric technique which uses resampling with replacement to estimate the distribution of a test statistic. Fatally for time series, however, the consistency of standard bootstrap analyses relies on the exchangeability of data points, an assumption time series data violates. This talk will introduce methods for bootstrapping time series data, with a particular focus on the Maximum Entropy Bootstrap family of algorithms for estimating distributions of test statistics for time series data.
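The Maximum Entropy Bootstrap itself is fairly involved; as a simpler illustration of resampling that respects temporal dependence, here is a minimal moving-block bootstrap sketch in NumPy (block length and statistic are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))  # an autocorrelated toy series

def moving_block_bootstrap(series, block_len=20, n_boot=1000, stat=np.mean):
    """Resample contiguous blocks (preserving short-range dependence) and
    recompute the statistic on each pseudo-series."""
    n = len(series)
    n_blocks = int(np.ceil(n / block_len))
    starts = np.arange(n - block_len + 1)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        chosen = rng.choice(starts, size=n_blocks)
        pseudo = np.concatenate([series[s:s + block_len] for s in chosen])[:n]
        stats[b] = stat(pseudo)
    return stats

dist = moving_block_bootstrap(y)
print(np.percentile(dist, [2.5, 97.5]))  # bootstrap interval for the mean
```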
Speaker: Flavio Morelli
Abstract: Graph Neural Networks (GNNs) make it possible to apply deep learning methods to graph data. By taking into account the geometric structure of the graph, it is possible to improve model performance compared to more classical methods of graph analysis. Some examples of data that have a natural graph structure include knowledge graphs, chemical compounds, social and telecommunication networks. However, classical GNNs can only be applied to static graphs. This talk will introduce temporal GNNs, which can be applied to graphs changing through time. We will discuss the different flavors of temporal GNNs and the specific challenges that arise when dealing with dynamic graphs.
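A minimal sketch of one common temporal-GNN flavor for discrete-time dynamic graphs: a graph convolution per snapshot whose node embeddings feed a GRU across time (plain PyTorch; the architecture and sizes are illustrative, not from the talk):

```python
import torch
import torch.nn as nn

class SimpleTemporalGNN(nn.Module):
    """Per-snapshot graph convolution (adj @ X @ W) followed by a GRU over time."""
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.gcn = nn.Linear(in_dim, hidden_dim)
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)

    def forward(self, snapshots, h):
        # snapshots: list of (adj, x) pairs, one per time step;
        # adj: (N, N) normalized adjacency, x: (N, in_dim) node features.
        for adj, x in snapshots:
            msg = torch.relu(self.gcn(adj @ x))  # neighborhood aggregation
            h = self.gru(msg, h)                 # temporal update per node
        return h

N, F, H, T = 5, 3, 8, 4
snapshots = [(torch.eye(N), torch.randn(N, F)) for _ in range(T)]  # toy graphs
model = SimpleTemporalGNN(F, H)
h = model(snapshots, torch.zeros(N, H))
print(h.shape)  # (N, H): one embedding per node after T snapshots
```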
References:
05.04.2022 Modern Time Series Analysis with STUMPY
Speaker: Dr. Sean Law
Abstract: STUMPY is a powerful and scalable Python library that efficiently computes something called the matrix profile which can be used for a variety of time series data mining tasks. In this talk, we'll cover all of the background necessary for understanding how to leverage matrix profiles to automatically find patterns/anomalies in your time series data.
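A minimal sketch of computing a matrix profile with STUMPY (the window size and toy series are illustrative):

```python
import numpy as np
import stumpy

rng = np.random.default_rng(1)
ts = np.sin(np.arange(500) / 10.0) + rng.normal(scale=0.1, size=500)

m = 50                    # subsequence window length
mp = stumpy.stump(ts, m)  # column 0: matrix profile, column 1: neighbor index

profile = mp[:, 0].astype(float)
motif_idx = int(np.argmin(profile))    # best-matching pattern (motif)
anomaly_idx = int(np.argmax(profile))  # most unusual subsequence (discord)
print(motif_idx, anomaly_idx)
```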
Speaker: Dr. Jethro Browell
Abstract: Energy systems are evolving rapidly as they decarbonize, consequences of which include an increasing dependence on weather and new consumer (and producer) behaviours. As a result, all actors in the energy sector are more reliant than ever on short-term forecasts, from the National Grid to me and (maybe) you. Furthermore, in order to operate as economically as possible and maintain high standards of reliability, forecast uncertainty must be quantified and managed. This seminar will introduce energy forecasting, highlight statistical challenges in this area, and present some recent solutions including forecasting extreme quantiles and modelling time-varying covariance structures.
Speaker: Federico Garza (Co-founder: https://github.com/Nixtla/nixtla)
Abstract: Open source time series forecasting with the Nixtla library.
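A minimal sketch using Nixtla's statsforecast library, assuming a recent version of its interface (the long-format unique_id/ds/y layout follows the library's convention; the model choice is illustrative):

```python
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

# statsforecast expects long-format data: one row per (series id, timestamp).
df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2020-01-01", periods=100, freq="D"),
    "y": range(100),
})

sf = StatsForecast(models=[AutoARIMA()], freq="D")
forecast = sf.forecast(df=df, h=30)  # 30-step-ahead forecast
print(forecast.head())
```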
Speaker: Work Life Balance
Speaker: Aaron Pickering (Co-founder http://seenly.io)
Abstract: TBD
Speaker: Javier Ramirez (https://es.linkedin.com/in/ramirez)
Abstract: Time series forecasting and modelling is fun, but capturing time-series data and dealing with large-volume datasets is not so much. In this talk we will introduce QuestDB, an Apache 2.0 licensed database specialised in time series. We will see how QuestDB can help with ingesting, interactively querying, downsampling, or augmenting your data. Since QuestDB is PostgreSQL-compatible, we can interact with it easily from Python, with or without Pandas. We will run a demo using QuestDB and skforecast to illustrate how QuestDB can be integrated into your data science workflows.
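A minimal sketch of querying QuestDB from Python over its PostgreSQL wire protocol (a local instance with default credentials on port 8812 is assumed; the table name is hypothetical):

```python
import pandas as pd
import psycopg2

# QuestDB speaks the PostgreSQL wire protocol on port 8812 by default.
conn = psycopg2.connect(
    host="localhost", port=8812, user="admin", password="quest", dbname="qdb"
)

# SAMPLE BY is QuestDB's SQL extension for downsampling time series.
query = """
    SELECT ts, avg(value) AS avg_value
    FROM sensor_data           -- hypothetical table
    SAMPLE BY 1h
"""
df = pd.read_sql(query, conn)  # straight into a DataFrame for modelling
conn.close()
print(df.head())
```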
Speaker: Xuesong Wang (https://xuesongwang.github.io/)
Abstract: TBD
Speaker: Nathaniel Forde
Abstract: We'll discuss the nature of lagged impact between multiple time series and sketch some details about how we can implement and use VAR models and their hierarchical variants to quantify the nature of these relationships.
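A minimal VAR sketch with statsmodels, on toy data where one series lags the other (series names and lag order are illustrative):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Two toy series where y2 lags y1, the kind of lagged impact a VAR can pick up.
rng = np.random.default_rng(7)
e = rng.normal(size=(200, 2))
y1 = np.zeros(200)
y2 = np.zeros(200)
for t in range(1, 200):
    y1[t] = 0.6 * y1[t - 1] + e[t, 0]
    y2[t] = 0.4 * y2[t - 1] + 0.5 * y1[t - 1] + e[t, 1]
df = pd.DataFrame({"y1": y1, "y2": y2})

model = VAR(df)
results = model.fit(maxlags=5, ic="aic")  # lag order chosen by AIC
print(results.summary())
forecast = results.forecast(df.values[-results.k_ar:], steps=10)
```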