Fast and Simple: Ranker fine-tuning + Embeddings + Classifier
Orders of Magnitude Faster and Less than 4% from the Top
These are a couple of quick notes and random thoughts on our approach to Kaggle's Jigsaw - Agile Community Rules Classification competition.
TL;DR
- Jigsaw – Agile Community Rules Classification task: Create a binary classifier that predicts whether a Reddit comment broke a specific rule. The dataset comes from a large collection of moderated comments, with a range of subreddit norms, tones, and community expectations. https://www.kaggle.com/competitions/jigsaw-agile-community-rules .
- We use a ranking model for feature extraction (embeddings) and then train a binary classifier to predict whether a comment violates a rule on a given subreddit.
- We use a 2-phase approach: (i) fine-tune a ranker; (ii) use the model to extract embeddings and train a classifier.
- Our approach is orders of magnitude faster than LLM-based solutions: it completes fine-tuning, classifier training, and inference in a fraction of the compute time of LLM-based approaches, and yet achieves a competitive 0.89437 (column-averaged) AUC, less than 3.76% below the winning solution (0.92930).
- For a production setting, a solution like ours could be more attractive since it is easier to set up and cost-effective; a GPU is not a hard requirement, given that `SentenceTransformer` models are quite efficient and can run on (parallel) CPU cores with a fraction of the memory footprint of LLMs.
Fine-tuning a SentenceTransformer for ranking
- We fine-tune a `SentenceTransformer` model as a ranker. As base model we use multilingual-e5-base.
- We fine-tune the model using a ranking approach: we define a query as the concatenation of the subreddit and rule, e.g., `query = f"r/{subrs_train[i]}. {rules_train[i]}."`
- For each query, the positive and negative examples correspond to the comments violating or not violating the rule for the given subreddit.
- We use a ranking loss, namely `MultipleNegativesRankingLoss`.
- Here is a notebook as an example of the fine-tuning, using `ndcg@10` as the validation ranking metric. A minimal sketch of this phase follows the list.
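
The sketch below illustrates this fine-tuning phase under stated assumptions: `subrs_train`, `rules_train`, `pos_comments`, and `neg_comments` are hypothetical, index-aligned lists, the validation dicts are placeholders, and the hyperparameters are illustrative rather than the ones from our actual runs.

```python
# Minimal fine-tuning sketch (assumed variable names, illustrative settings).
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("intfloat/multilingual-e5-base")

# (query, positive, hard negative) triplets; MultipleNegativesRankingLoss
# additionally treats the other in-batch positives as negatives.
train_examples = [
    InputExample(texts=[f"r/{sub}. {rule}.", pos, neg])
    for sub, rule, pos, neg in zip(
        subrs_train, rules_train, pos_comments, neg_comments
    )
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=64)
train_loss = losses.MultipleNegativesRankingLoss(model)

# Validation ranking metrics (ndcg@10, mrr@10) over held-out queries;
# val_queries / val_corpus are hypothetical id -> text dicts, and
# val_relevant maps a query id to the set of ids of violating comments.
evaluator = InformationRetrievalEvaluator(
    queries=val_queries,
    corpus=val_corpus,
    relevant_docs=val_relevant,
    ndcg_at_k=[10],
    mrr_at_k=[10],
)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    evaluator=evaluator,
    epochs=1,
    warmup_steps=100,
    output_path="e5-base-jigsaw-ranker",
)
```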
Using the model and training a classifier
- For the competition, we fine-tuned the ranking model using `ndcg@10`, `mrr@10`, and `map`.
- We use these models to extract embeddings for the concatenation of subreddit, rule, and comment text.
- As an additional feature we use the similarity between the subreddit-and-rule concatenation embedding and the comment embedding. The rationale for this extra feature is that the model was fine-tuned for ranking, so the query-to-comment similarity carries the learned relevance signal.
- As classifier we used an ensemble. In initial experiments, Extremely Randomized Trees was the fastest and best performer. For the final ensemble, besides the `ExtraTreesClassifier`, we use `HistGradientBoostingClassifier`, `LGBMClassifier`, `RandomForestClassifier`, and a linear `LogisticRegression` model. We experimented with different weights but settled on equal-weighted voting for the final prediction (see the sketch after this list).
- The complete code of our final submission can be found in this notebook:
2025-09-11-jigsaw-laila
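
Below is a minimal sketch of the feature extraction and ensemble, assuming the fine-tuned ranker from the previous step was saved as `e5-base-jigsaw-ranker`; `subrs_train`, `rules_train`, `comments_train`, the label array `y_train`, and the analogous test-set lists are hypothetical names, and all classifier hyperparameters are illustrative.

```python
# Feature extraction + equal-weight soft-voting ensemble (a sketch,
# not the exact final-submission code).
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.ensemble import (
    ExtraTreesClassifier,
    HistGradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from lightgbm import LGBMClassifier

model = SentenceTransformer("e5-base-jigsaw-ranker")

def build_features(subreddits, rules, comments):
    """Embed 'subreddit + rule + comment' and append the cosine similarity
    between the (subreddit + rule) query embedding and the comment embedding."""
    queries = [f"r/{s}. {r}." for s, r in zip(subreddits, rules)]
    full = [f"{q} {c}" for q, c in zip(queries, comments)]
    emb_full = model.encode(full, normalize_embeddings=True)
    emb_q = model.encode(queries, normalize_embeddings=True)
    emb_c = model.encode(list(comments), normalize_embeddings=True)
    sim = np.sum(emb_q * emb_c, axis=1, keepdims=True)  # cosine similarity
    return np.hstack([emb_full, sim])

X_train = build_features(subrs_train, rules_train, comments_train)

ensemble = VotingClassifier(
    estimators=[
        ("et", ExtraTreesClassifier(n_estimators=500, n_jobs=-1)),
        ("hgb", HistGradientBoostingClassifier()),
        ("lgbm", LGBMClassifier()),
        ("rf", RandomForestClassifier(n_estimators=500, n_jobs=-1)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",  # equal weights by default
)
ensemble.fit(X_train, y_train)

X_test = build_features(subrs_test, rules_test, comments_test)
proba = ensemble.predict_proba(X_test)[:, 1]  # probability of a violation
```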
Final (random) thoughts
- It is very interesting to observe the evolution over the years of text classification Kaggle competitions, and in particular the ones organized by Jigsaw. The winning solutions of this one in particular are dominated by the use of open-source LLMs. We did explore this avenue, but the compute resources and iteration time for experimentation were a blocker for us: we simply did not have the time budget to allocate to our Kaggle hobby :)
- It is indeed very appealing to give the machine a classification task and let it answer: no need to do much preprocessing, no need to understand how ML classifiers work. This is extremely powerful. Of course fine-tuning is needed, and open-source models such as Qwen and others allow for this. Tools such as unsloth make this process feasible even with constrained computational resources.
- The compute power provided by Kaggle is OK, but given the time invested in these code competitions it is still limited when bigger models are used. Higher-end GPUs with more memory on the platform would be a great feature, given the expertise and valuable time contributed by the competitors.
- For us this competition was a great excuse to explore state-of-the-art open-source LLMs, fine-tuning techniques (e.g., using unsloth), and how more pragmatic approaches like ours can yield results that may be more practical to deploy and maintain.
- The Kaggle community is great; however, a large number of leaderboard entries come from forked notebooks with minimal or no edits or improvements. One suggestion for the Kaggle platform would be to distill or cluster such entries to help identify the original contributions.
Cheers!
—
Changelog
2025-12-08 16:54:55 UTC: added task overview to TL;DR