Paper Review
A pretrained time series model that handles univariate, multivariate, and covariate forecasting in a zero-shot way using group attention and synthetic data, delivering state-of-the-art results across major benchmarks!
Chronos-2 is a pretrained time series forecasting model that works without task-specific training and supports univariate, multivariate, and covariate-based forecasting in a zero-shot manner. It introduces a group attention mechanism that enables in-context learning by efficiently sharing information across related series — whether they are multiple variables, covariates, or grouped time series.
Chronos-2 was trained on synthetic datasets designed to mimic diverse multivariate relationships, and it achieves state-of-the-art results on major benchmarks, showing particularly large gains on multivariate and covariate-informed tasks. It demonstrates strong practical performance in real-world domains like energy and retail, proving to be a general-purpose, inference-only forecasting model ready for direct deployment.
For background, read my paper review of the original Chronos: Learning the Language of Time Series.
The approach
Scaling and Tokenization
- The model builds its input from historical targets and covariates. Historical values are concatenated into vectors combining targets and covariates, while future covariate values are included with missing placeholders for unknown targets. Categorical covariates are converted into numeric form using target or ordinal encoding.
 - All inputs are standardized and then transformed with an inverse hyperbolic sine (asinh) function to stabilize variance and reduce outlier effects (see the sketch after this list).
 - Each variable is processed independently, with two meta features added: a normalized time index to encode temporal position and a binary mask to indicate observed or missing values and known future covariates.
 - Inputs and meta features are divided into non-overlapping patches of fixed length, padded if necessary. Each patch (including time index and mask) is embedded via a residual network into a transformer-compatible hidden space. A special REG token separates context and future patches and acts as an attention sink.
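
To make this concrete, here is a minimal sketch of the scaling and patching steps, assuming an illustrative function name and patch length; it is a simplification of the description above, not the authors' implementation.

```python
import numpy as np

def scale_and_patch(series, patch_len=16):
    """Standardize, apply asinh, add meta features, and cut into patches
    (illustrative sketch, not the paper's code)."""
    x = np.asarray(series, dtype=np.float64)

    # Standardize with statistics of the observed context (ignoring NaNs),
    # then compress extreme values with the inverse hyperbolic sine.
    mean, std = np.nanmean(x), np.nanstd(x) + 1e-8
    z = np.arcsinh((x - mean) / std)

    # Meta features: a normalized time index and an observed/missing mask.
    t = np.arange(len(z)) / max(len(z) - 1, 1)
    mask = (~np.isnan(z)).astype(np.float64)
    z = np.nan_to_num(z)  # missing values become placeholders; the mask flags them

    # Pad to a multiple of the patch length and split into non-overlapping patches.
    pad = (-len(z)) % patch_len
    def to_patches(a):
        return np.pad(a, (0, pad)).reshape(-1, patch_len)

    return to_patches(z), to_patches(t), to_patches(mask), (mean, std)
```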
 
The architecture
Chronos-2 is an encoder-only transformer based on the T5 architecture. It uses standard self-attention along the temporal axis with rotary position embeddings (RoPE).
A key innovation is the group attention layer, which enables in-context learning by allowing the model to share information among related time series within the same group. Groups can represent single series, related series with shared metadata, multiple variates of a multivariate system, or targets with covariates. The layer restricts attention to within-group interactions and omits positional embeddings since series in a group have no fixed order.
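
Here is a rough sketch of how such a group attention step could restrict attention to series that share a group id; the function name, identity projections, and shapes are assumptions made for illustration, not the released architecture code.

```python
import torch
import torch.nn.functional as F

def group_attention(h, group_ids, num_heads=4):
    """h: (num_series, d_model), one row per series at a given patch position.
    Attention is allowed only between rows with the same group id, and no
    positional information is added because series within a group are unordered."""
    n, d = h.shape
    # A real layer would use learned projections; identity projections keep this short.
    q = k = v = h.view(n, num_heads, d // num_heads).transpose(0, 1)   # (H, n, d_h)

    scores = q @ k.transpose(-2, -1) / (d // num_heads) ** 0.5         # (H, n, n)
    same_group = group_ids.unsqueeze(0) == group_ids.unsqueeze(1)      # (n, n)
    scores = scores.masked_fill(~same_group, float("-inf"))

    out = F.softmax(scores, dim=-1) @ v                                # (H, n, d_h)
    return out.transpose(0, 1).reshape(n, d)

# Example: series 0 and 1 form one multivariate group, series 2 is on its own.
h = torch.randn(3, 8)
print(group_attention(h, torch.tensor([0, 0, 1])).shape)  # torch.Size([3, 8])
```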
After alternating time and group attention layers, future patch embeddings of the target variables pass through a residual block to produce multi-step quantile forecasts. The model predicts 21 quantiles (from 0.01 to 0.99), providing a detailed estimate of the predictive distribution and improving performance on rare-event and risk-aware forecasting tasks.
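
To give a sense of what such a head might look like, here is a hypothetical residual block that maps each future patch embedding to per-step quantiles; layer sizes and names are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class QuantileHead(nn.Module):
    """Maps a future patch embedding to patch_len steps x num_quantiles levels
    (illustrative sketch of a residual output block)."""
    def __init__(self, d_model=256, patch_len=16, num_quantiles=21):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                  nn.Linear(d_model, d_model))
        self.out = nn.Linear(d_model, patch_len * num_quantiles)
        self.patch_len, self.num_quantiles = patch_len, num_quantiles

    def forward(self, h):                        # h: (batch, n_future_patches, d_model)
        h = h + self.proj(h)                     # residual block
        q = self.out(h)                          # (batch, n_future_patches, patch_len * Q)
        return q.view(*h.shape[:-1], self.patch_len, self.num_quantiles)
```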
Training
Chronos-2 is trained on mixed batches combining different forecasting setups: univariate, multivariate, and multivariate with covariates. Each task is defined by the number of targets and covariates and assigned a group ID, which helps the model identify the forecasting configuration.
Training uses a quantile regression loss that compares predicted and true values across multiple quantiles, computed only on target dimensions. The number of forecasted patches per batch is randomized.
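
The per-quantile objective is the standard pinball loss; a generic sketch (written to match the description, not taken from the authors' code):

```python
import torch

def quantile_loss(pred, target, quantile_levels):
    """Pinball (quantile regression) loss averaged over the predicted levels.
    pred: (..., Q), target: (...), quantile_levels: (Q,) with values in (0, 1)."""
    diff = target.unsqueeze(-1) - pred                                  # (..., Q)
    return torch.maximum(quantile_levels * diff, (quantile_levels - 1) * diff).mean()

# Tiny example with three quantile levels.
levels = torch.tensor([0.1, 0.5, 0.9])
print(quantile_loss(torch.tensor([[0.8, 1.0, 1.3]]), torch.tensor([1.1]), levels))
```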
Training is done in two stages: first with shorter contexts (up to 2048 steps) and fewer output patches, then with longer contexts (up to 8192) and more patches. This helps the model learn both short- and long-term dependencies, resulting in accurate long-horizon forecasting.
Inference
During inference, Chronos-2 maps the normalized quantile predictions back to the original scale by inverting the asinh transform (applying sinh) and then de-standardizing with the stored mean and standard deviation.
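
Concretely, undoing the preprocessing from the tokenization step means applying sinh and then de-standardizing; a minimal sketch with an assumed function name:

```python
import numpy as np

def unscale(pred_quantiles, mean, std):
    # Undo asinh with sinh, then restore the original location and scale.
    return np.sinh(pred_quantiles) * std + mean

# Round trip: forward transform (standardize + asinh), then its inverse.
mean, std = 10.0, 4.0
z = np.arcsinh((17.0 - mean) / std)
assert np.isclose(unscale(z, mean, std), 17.0)
```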
Different grouping setups determine the type of forecasting task (a toy illustration follows the list):
- In univariate forecasting, each series has its own group ID for independent predictions.
 - In multivariate forecasting, variates of the same series share a group ID, allowing shared dynamics.
 - In forecasting with covariates, targets and covariates share a group ID, with known future covariate values provided as input, while their predicted values are ignored.
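
As a toy illustration of these three setups, here is one way group ids could be assigned (hypothetical series names; the real pipeline handles this internally):

```python
# 1) Univariate: each series gets its own group id -> independent forecasts.
univariate_groups   = {"sales_store_a": 0, "sales_store_b": 1}

# 2) Multivariate: all variates of one system share a group id.
multivariate_groups = {"load_zone_1": 0, "load_zone_2": 0, "load_zone_3": 0}

# 3) Covariates: the target and its covariates share a group id; known future
#    covariate values are fed as inputs, and their "forecasts" are discarded.
covariate_groups    = {"demand": 0, "price": 0, "is_holiday": 0}
```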
 
Training Data
Chronos-2’s performance depends heavily on its training data, which combines real and synthetic time series to cover both univariate and multivariate forecasting.
For univariate data, the authors use selected datasets from Chronos and GIFT-Eval, augmented with synthetic series from two generators: TSI, which combines random trend, seasonal, and irregular components, and TCM, which generates data from random temporal causal graphs.
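
A rough, simplified sketch of what a TSI-style generator could look like, based only on the one-line description above (the actual generator is more elaborate):

```python
import numpy as np

def tsi_like_series(length=512, seed=0):
    """Random linear trend + random seasonal component + irregular noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(length)
    trend = rng.normal(0, 0.01) * t
    period = rng.integers(8, 64)
    season = rng.uniform(0.5, 2.0) * np.sin(2 * np.pi * t / period + rng.uniform(0, 2 * np.pi))
    noise = rng.normal(0, 0.2, size=length)
    return trend + season + noise
```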
For multivariate and covariate-informed tasks, all data is synthetic. Dependencies between univariate series are created using multivariatizers, which impose relationships to produce realistic multivariate dynamics (a rough sketch follows the list).
- Cotemporaneous multivariatizers introduce same-time correlations across series.
 - Sequential multivariatizers create time-based dependencies like lead–lag effects and cointegration.
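
A toy sketch of the two multivariatizer families, based on the descriptions above rather than the paper's code:

```python
import numpy as np

def cotemporaneous_mix(series_list, seed=0):
    """Mix independent series with a random matrix so the resulting variates
    are correlated at the same time step (cotemporaneous dependency)."""
    rng = np.random.default_rng(seed)
    X = np.stack(series_list)                        # (num_series, length)
    W = rng.normal(size=(len(series_list), len(series_list)))
    return W @ X

def lead_lag(series, lag=3, weight=0.7):
    """Add a second variate that follows the first with a fixed delay,
    creating a simple lead-lag (sequential) dependency."""
    follower = weight * np.roll(series, lag)
    follower[:lag] = series[0]                       # crude fill for the wrapped values
    return np.stack([series, follower])
```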
 
Experiments
Chronos-2 was evaluated on three major benchmarks — fev-bench, GIFT-Eval, and Chronos Benchmark II — and achieved the best overall performance across all of them.
- On fev-bench (100 diverse real-world forecasting tasks, many of which include covariates), Chronos-2 achieved the highest win rate and skill score, outperforming all baselines by a statistically significant margin.
 - On GIFT-Eval (which focuses on high-frequency and long-horizon forecasting), it again outperformed prior models on both weighted quantile loss and mean absolute scaled error, even on data unseen during training.
 - On Chronos Benchmark II (which includes shorter time series), Chronos-2 maintained top performance on both probabilistic and point forecasting metrics.
 
Ablation Studies
- A smaller 28M-parameter version achieves nearly the same accuracy as the base model — within 1% on GIFT-Eval — while running about twice as fast, making it suitable for low-resource or latency-sensitive applications.
 - A version trained solely on synthetic data performs close to the full model on Chronos Benchmark II and GIFT-Eval, and reasonably well on fev-bench, confirming that synthetic data alone can support strong pretraining.
 - Finally, extending the training context from 2048 to 8192 steps through post-training improves performance, especially on datasets with long-term seasonal patterns.
 
Learn more in the full paper: Chronos-2: From Univariate to Universal Forecasting.
