[R] PKBoost: Gradient boosting that stays accurate under data drift (2% degradation vs XGBoost’s 32%)


I've been working on a gradient boosting implementation that handles two problems I kept running into with XGBoost/LightGBM in production:

  1. Performance collapse on extreme imbalance (under 1% positive class)
  2. Silent degradation when data drifts (sensor drift, behavior changes, etc.)

Key Results

Imbalanced data (Credit Card Fraud – 0.2% positives):

– PKBoost: 87.8% PR-AUC

– LightGBM: 79.3% PR-AUC

– XGBoost: 74.5% PR-AUC

Under realistic drift (gradual covariate shift):

– PKBoost: 86.2% PR-AUC (−2.0% degradation)

– XGBoost: 50.8% PR-AUC (−31.8% degradation)

– LightGBM: 45.6% PR-AUC (−42.5% degradation)
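For context, "gradual covariate shift" here means the feature distribution slowly moves away from the one the model was trained on. A minimal sketch of how such a shift can be simulated, in case anyone wants to set up a similar test (illustrative only, not the exact benchmark code in the repo):

```rust
// Illustrative sketch of simulating gradual covariate shift (not the exact
// benchmark code in the repo): each test sample gets a slowly growing offset
// and scale change, so later samples come from an increasingly shifted
// feature distribution while the labels stay untouched.
fn apply_gradual_drift(features: &mut [Vec<f64>], max_shift: f64, max_scale: f64) {
    let n = features.len() as f64;
    for (i, row) in features.iter_mut().enumerate() {
        // Drift strength grows linearly from 0 (first sample) to ~1 (last sample).
        let t = i as f64 / n;
        for x in row.iter_mut() {
            *x = *x * (1.0 + t * max_scale) + t * max_shift;
        }
    }
}
```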

What's Different

The main innovation is using Shannon entropy in the split criterion alongside gradients. Each split maximizes:

Gain = GradientGain + λ·InformationGain

where λ adapts based on class imbalance. This explicitly optimizes for information gain on the minority class instead of just minimizing loss.
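In simplified form, the blended criterion looks roughly like this (an illustrative sketch of the formula above, not PKBoost's exact internals; `reg_lambda` is the usual leaf-weight regularizer and `lambda_weight` stands in for the imbalance-adaptive λ):

```rust
// Illustrative sketch of the split criterion above, not the exact repo code.

/// Standard second-order (XGBoost-style) gain contribution of one node.
fn gradient_gain(sum_grad: f64, sum_hess: f64, reg_lambda: f64) -> f64 {
    (sum_grad * sum_grad) / (sum_hess + reg_lambda)
}

/// Shannon entropy (in bits) of a binary label distribution.
fn entropy(pos: f64, neg: f64) -> f64 {
    let total = pos + neg;
    if total == 0.0 {
        return 0.0;
    }
    let mut h = 0.0;
    for count in [pos, neg] {
        if count > 0.0 {
            let p = count / total;
            h -= p * p.log2();
        }
    }
    h
}

/// Node statistics: gradient sum, hessian sum, positive count, negative count.
struct NodeStats {
    grad: f64,
    hess: f64,
    pos: f64,
    neg: f64,
}

/// Gain = GradientGain + λ · InformationGain
fn split_gain(
    parent: &NodeStats,
    left: &NodeStats,
    right: &NodeStats,
    reg_lambda: f64,
    lambda_weight: f64, // adapts with class imbalance
) -> f64 {
    let grad_gain = gradient_gain(left.grad, left.hess, reg_lambda)
        + gradient_gain(right.grad, right.hess, reg_lambda)
        - gradient_gain(parent.grad, parent.hess, reg_lambda);

    let n_left = left.pos + left.neg;
    let n_right = right.pos + right.neg;
    let n = parent.pos + parent.neg;
    let info_gain = entropy(parent.pos, parent.neg)
        - (n_left / n) * entropy(left.pos, left.neg)
        - (n_right / n) * entropy(right.pos, right.neg);

    grad_gain + lambda_weight * info_gain
}
```

The entropy term rewards splits that isolate minority-class examples even when their contribution to the summed gradients is tiny.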

Combined with:

– Quantile-based binning (robust to scale shifts; sketched after this list)

– Conservative regularization (prevents overfitting to majority)

– PR-AUC early stopping (focuses on minority performance)

The architecture is inherently more robust to drift without needing online adaptation.
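To make the binning item concrete, here is a rough sketch of equal-frequency (quantile) binning. Because bin edges follow the ranks of the training values rather than the raw feature scale, the binning degrades more gracefully when feature scales drift (again illustrative, not the exact repo code):

```rust
// Illustrative sketch of quantile (equal-frequency) binning; not the exact
// code used in the repo. Edges are taken at evenly spaced ranks of the
// training values, so boundaries track the data distribution, not the scale.
fn quantile_bin_edges(mut values: Vec<f64>, n_bins: usize) -> Vec<f64> {
    values.sort_by(|a, b| a.partial_cmp(b).unwrap());
    (1..n_bins)
        .map(|i| values[(i * values.len() / n_bins).min(values.len() - 1)])
        .collect()
}

/// Map a raw feature value to its bin index using the precomputed edges.
fn bin_index(x: f64, edges: &[f64]) -> usize {
    edges.iter().take_while(|&&e| x > e).count()
}
```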

Trade-offs

The good:

– Auto-tunes for your data (no hyperparameter search needed)

– Works out-of-the-box on extreme imbalance

– Comparable inference speed to XGBoost

The honest:

– ~2-4x slower training (45s vs 12s on 170K samples)

– Slightly behind on balanced data (use XGBoost there)

– Built in Rust, so less Python ecosystem integration

Why I'm Sharing

This started as a learning project (built from scratch in Rust), but the drift resilience results surprised me. I haven't seen many papers addressing this – most focus on online learning or explicit drift detection.

Looking for feedback on:

– Have others seen similar robustness from conservative regularization?

– Are there existing techniques that achieve this without retraining?

– Would this be useful for production systems, or is 2-4x slower training a dealbreaker?

Links

– GitHub: https://github.com/Pushp-Kharat1/pkboost

– Benchmarks include: Credit Card Fraud, Pima Diabetes, Breast Cancer, Ionosphere

– MIT licensed, ~4000 lines of Rust

Happy to answer questions about the implementation or share more detailed results. Also open to PRs if anyone wants to extend it (multi-class support would be great).

Edit: Built this on a 4-core Ryzen 3 laptop with 8GB RAM, so the benchmarks should be reproducible on any hardware.

Edit: The Python library is now available. For usage and further details, see the Python folder in the GitHub repo, or comment here if you run into questions or issues.
