I’ve tested every major prompting technique. Here’s what delivers results vs. what burns tokens

By skyforbes, Nov 22, 2025

As a researcher following the evolution of AI, I have seen that sound prompting techniques produce better outcomes. My focus is AI in general and large language models in particular. Five years ago, the field emphasized data science, CNNs, and transformers, and prompting was still obscure. Now it is an essential part of context engineering, used to refine and control LLMs and agents.

I have experimented with diverse prompting styles to sharpen LLM responses, and I am still at it. For me, three techniques stand out:

Chain-of-Thought (CoT): I add phrases like "Let's think step by step" to the prompt. This approach boosts accuracy on complex math problems threefold. It excels on multi-step challenges, including at firms like Google DeepMind. The trade-off is token cost, which rises three to five times.
Self-Consistency: This method samples multiple reasoning paths and takes a majority vote over their final answers. It cuts errors in production systems by sampling five to ten outputs at a temperature of 0.7. It delivers 97.3% accuracy on MATH-500 using DeepSeek R1 models. It proves valuable for precision-critical tasks, despite the higher compute demands.
ReAct: This method interleaves reasoning with actions in think-act-observe cycles, which anchors responses to external data sources. It achieves up to 30% higher accuracy on sequential question-answering benchmarks. Success relies on robust API integrations, as seen in tools at companies like IBM.
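
To make the first two techniques concrete, here is a minimal Python sketch of a chain-of-thought prompt combined with self-consistency voting. It is an illustration under stated assumptions, not a reference implementation: `sample_fn` is a hypothetical stand-in for whatever LLM client you use, and the "Answer: ..." output convention is assumed so the final answer can be parsed.

```python
import re
from collections import Counter
from typing import Callable, Optional

COT_SUFFIX = "Let's think step by step."


def build_cot_prompt(question: str) -> str:
    """Append the chain-of-thought trigger phrase to the question."""
    return f"{question}\n{COT_SUFFIX}"


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker (an assumed format)."""
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None


def self_consistency(
    question: str,
    sample_fn: Callable[[str, float], str],  # hypothetical LLM call
    n_samples: int = 5,
    temperature: float = 0.7,
) -> Optional[str]:
    """Sample several reasoning paths and majority-vote the final answers."""
    prompt = build_cot_prompt(question)
    answers = []
    for _ in range(n_samples):
        answer = extract_answer(sample_fn(prompt, temperature))
        if answer is not None:  # skip completions with no parseable answer
            answers.append(answer)
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Canned completions stand in for a real model so the sketch runs offline.
_canned = iter([
    "17 + 25 = 42. Answer: 42",
    "17 + 25 = 41. Answer: 41",
    "Adding the two gives 42. Answer: 42",
])


def canned_sampler(prompt: str, temperature: float) -> str:
    return next(_canned)


print(self_consistency("What is 17 + 25?", canned_sampler, n_samples=3))  # prints 42
```

Each extra sample is another full completion, which is where the added compute for self-consistency comes from.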

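The think-act-observe cycle behind ReAct can be sketched the same way. This is a toy loop under assumptions: `model_fn` is a hypothetical stand-in for an LLM call, the `Action: tool[input]` and `Final Answer:` line formats are conventions chosen for this example, and the `add` tool is made up for the demo.

```python
import re
from typing import Callable, Dict, Optional

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.+)")


def react_loop(
    question: str,
    model_fn: Callable[[str], str],          # hypothetical LLM call
    tools: Dict[str, Callable[[str], str]],  # tool name -> callable
    max_steps: int = 5,
) -> Optional[str]:
    """Alternate model steps and tool calls until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model_fn(transcript)          # think (and maybe pick an action)
        transcript += step + "\n"
        final = FINAL_RE.search(step)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(step)
        if action is None:
            observation = "No valid action found."
        else:
            name, arg = action.groups()
            tool = tools.get(name)
            observation = tool(arg) if tool else f"Unknown tool: {name}"
        transcript += f"Observation: {observation}\n"  # observe
    return None  # step budget exhausted without a final answer


def add_tool(arg: str) -> str:
    """Toy tool: sum a comma-separated list of integers."""
    return str(sum(int(x) for x in arg.split(",")))


def scripted_model(transcript: str) -> str:
    """Scripted stand-in for a model: call the tool once, then answer."""
    observations = re.findall(r"Observation:\s*(.+)", transcript)
    if not observations:
        return "Thought: I should compute the sum with a tool.\nAction: add[17, 25]"
    return f"Thought: The tool returned {observations[-1]}.\nFinal Answer: {observations[-1]}"


print(react_loop("What is 17 + 25?", scripted_model, {"add": add_tool}))  # prints 42
```

In a real system the tools would wrap external APIs, which is why the quality of those integrations matters so much for ReAct.
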
With the 2025 model launches, comparing these methods has become more compelling.

OpenAI introduced the gpt-oss-120b open-weight model in August. xAI followed by open-sourcing Grok 2.5 weights shortly after. I am eager to experiment and build workflows that run one of these new open models locally, and maybe create a UI around it as well.

I am also digging into evaluation approaches, including accuracy scoring, cost breakdowns, and latency-focused scorecards.
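
As a rough idea of what such a scorecard could look like, the Python sketch below aggregates accuracy, cost, and latency per technique. Everything in it is an assumption for illustration: the `RunRecord` fields, the per-1,000-token prices, and the sample numbers are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class RunRecord:
    """One evaluated model call (hypothetical schema)."""
    technique: str
    correct: bool
    prompt_tokens: int
    completion_tokens: int
    latency_s: float


def scorecard(
    records: List[RunRecord],
    price_in_per_1k: float,   # assumed price per 1,000 prompt tokens
    price_out_per_1k: float,  # assumed price per 1,000 completion tokens
) -> Dict[str, Dict[str, float]]:
    """Group runs by technique and report accuracy, cost per run, and latency."""
    by_technique: Dict[str, List[RunRecord]] = {}
    for record in records:
        by_technique.setdefault(record.technique, []).append(record)

    card: Dict[str, Dict[str, float]] = {}
    for name, runs in by_technique.items():
        total_cost = sum(
            r.prompt_tokens / 1000 * price_in_per_1k
            + r.completion_tokens / 1000 * price_out_per_1k
            for r in runs
        )
        card[name] = {
            "accuracy": mean(1.0 if r.correct else 0.0 for r in runs),
            "cost_per_run": total_cost / len(runs),
            "mean_latency_s": mean(r.latency_s for r in runs),
        }
    return card


# Placeholder data, only to show the shape of the output.
demo_records = [
    RunRecord("baseline", True, 100, 50, 0.5),
    RunRecord("baseline", False, 100, 50, 0.7),
    RunRecord("cot", True, 120, 400, 2.0),
    RunRecord("cot", True, 120, 380, 2.2),
]
for technique, metrics in scorecard(demo_records, 0.5, 1.5).items():
    print(technique, metrics)
```

Keeping the three metrics side by side makes the trade-off visible: a technique that raises accuracy may still lose once its token cost and latency are counted.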

What are your thoughts on prompting techniques and how to evaluate them? Have you experimented with open-source releases locally?
