[R] About the test set of XGBoost for Time Series Forecasting


I have questions about using XGBoost for time series forecasting. I am referring to these articles:

Multi-step time series forecasting with XGBoost | Towards Data Science

XGBoost for Multi-Step Univariate Time Series Forecasting with MultiOutputRegressor | XGBoosting

How I Trained a Time-Series Model with XGBoost and Lag Features

I understand that they use a sliding-window approach to create rows $(t_1, t_2, \dots, t_n, t_{n+1}, t_{n+2}, \dots, t_{n+m})$, where the first $n$ values are used as feature variables and the last $m$ values as target variables. They then feed these rows into XGBoost to learn the relationship between the features and the targets.
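For concreteness, the sliding-window construction described above can be sketched as follows. This is a minimal numpy-only illustration, not code from any of the linked articles; the function name `make_supervised` and the toy series are my own choices.

```python
import numpy as np

def make_supervised(series, n_lags, horizon):
    """Slide a window over `series`: each row holds `n_lags` feature
    values followed by `horizon` target values."""
    X, y = [], []
    for i in range(len(series) - n_lags - horizon + 1):
        X.append(series[i:i + n_lags])                     # t_1 .. t_n
        y.append(series[i + n_lags:i + n_lags + horizon])  # t_{n+1} .. t_{n+m}
    return np.array(X), np.array(y)

series = np.arange(10.0)  # toy series 0, 1, ..., 9
X, y = make_supervised(series, n_lags=3, horizon=2)
print(X.shape, y.shape)   # (6, 3) (6, 2)
print(X[0], y[0])         # [0. 1. 2.] [3. 4.]
```

Each row of `X` paired with the corresponding row of `y` is one training example for a multi-output regressor.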

My problem is: it appears that during the testing phase, they use the actual feature values. For example, when predicting the first $m$ future points, we still have the $n$ actual points before them as features. However, when predicting the $(m+1)$-th point, we no longer have the actual value for the first of the $n$ features.

But in the above articles, they seem to simply assume that the actual $n$ feature values are available at all times during testing.

And for the paper "Do We Really Need Deep Learning Models for Time Series Forecasting?", regarding Table 1:

I think $h$ refers to the number of regressors they are using. So, for the first row, they can forecast 24 points using the existing training data. But how can they forecast a further $\tau$ points beyond the 20th point?

So, I want to clarify:

  1. Do the methods in the above articles suffer from data leakage? Or is it safe to assume that we know the real $n$ feature values when we are forecasting the $m$ new data points?
  2. My current idea is that, when using XGBoost for time series forecasting, we can either:
  • Feed the predicted values back in as the $n$ features for forecasting the next $m$ points, or
  • Train $L$ independent regressors to forecast the $L$ future points in one batch.
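The two strategies in point 2 (recursive feedback vs. direct multi-horizon models) can be sketched side by side. To keep the example self-contained I use an ordinary-least-squares fit as a stand-in for an XGBoost regressor; all function names here are my own, and the toy series is linear so both strategies recover the exact continuation.

```python
import numpy as np

def fit_linear(X, y):
    # Stand-in for an XGBoost regressor: least squares with a bias column.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_linear(w, x):
    return np.append(x, 1.0) @ w

def recursive_forecast(w, last_window, steps):
    """Strategy 1: one single-step model; each prediction is fed back
    into the lag window to forecast further ahead."""
    window = list(last_window)
    preds = []
    for _ in range(steps):
        p = predict_linear(w, np.array(window))
        preds.append(p)
        window = window[1:] + [p]  # drop oldest lag, append prediction
    return preds

def direct_forecast(models, last_window):
    """Strategy 2: L independent models; model k predicts the point
    k steps ahead directly from the same lag window."""
    return [predict_linear(w, np.array(last_window)) for w in models]

series = np.arange(20.0)
n_lags, L = 3, 4
n_rows = len(series) - n_lags - L + 1
X = np.array([series[i:i + n_lags] for i in range(n_rows)])

# Single-step model for the recursive strategy.
w1 = fit_linear(X, np.array([series[i + n_lags] for i in range(n_rows)]))
# One model per horizon step for the direct strategy.
models = [fit_linear(X, np.array([series[i + n_lags + k] for i in range(n_rows)]))
          for k in range(L)]

last = series[-n_lags:]  # [17. 18. 19.]
print(recursive_forecast(w1, last, L))  # approx [20, 21, 22, 23]
print(direct_forecast(models, last))    # approx [20, 21, 22, 23]
```

On real data the two strategies differ: the recursive approach compounds prediction error through the fed-back lags, while the direct approach trains $L$ separate models and avoids feedback at the cost of more training.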
