
Models compared: GPT-5.1 Thinking vs GPT-5 Thinking
Tooling: Built-in python sandbox (single execution)
Severity: High (blocks an entire class of workflows that rely on the Python tool)
Status: Reproducible on demand
Summary
Running the same prompt with the same attached files produces opposite outcomes depending on the model. On GPT-5 Thinking the python tool executes and returns the expected audit results. On GPT-5.1 Thinking the python tool fails at a system level before user code begins, so there is no Markdown output, no JSON object, and no file reads. The failure repeats within the same chat and across fresh chats.
Environment
- ChatGPT Web, desktop browser (Windows 10 LTSC 2019, Chrome 142)
- Region: Spain
- Model A: GPT-5.1 Thinking → fails
- Model B: GPT-5 Thinking → succeeds
- Same account, same session style, same files, same prompt
Files attached to the chat
A small Windows automation project: text-only, UTF-8, a mix of .ps1, .cmd, .md, .xml, and .gitattributes files, for example:
CreatePrimaryAdmin.ps1, BootstrapLocalAdmin.ps1, SetupComplete.cmd, PreOOBE.cmd, Autounattend.xml, .gitattributes, README.md, AGENTS.md, DECISIONS.md, SECURITY.md, CONTRIBUTING.md, INTERACTION_CONTRACT.md, BACKGROUND.md, LICENSE
The exact file set is not critical. The failure reproduces as long as several text files are attached.
Minimal prompt used
The prompt asks the model to call python.exec exactly once and run a safe, defensive audit script (a sketch appears after this list). The code:
- enumerates files in /mnt/data
- reads each file with a robust multi-codec decode
- computes line counts and EOL style
- prints a human-readable Markdown report and then prints a JSON object
- swallows exceptions into report["errors"]
No network. No writes. Pure read-only. Single tool call.
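For reference, this is a minimal sketch of that audit script, reconstructed from the description above; the codec fallback order, report field names, and Markdown layout are my assumptions, not the exact prompt body:

```python
# Minimal sketch of the read-only audit (reconstruction, not the exact prompt body).
import json
from pathlib import Path

DATA_DIR = Path("/mnt/data")
CODECS = ("utf-8-sig", "utf-8", "cp1252", "latin-1")  # assumed fallback order

report = {"files": [], "errors": []}

def decode_best_effort(raw: bytes) -> str:
    """Try several codecs in order; last resort uses replacement characters."""
    for codec in CODECS:
        try:
            return raw.decode(codec)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")

def eol_style(text: str) -> str:
    """Classify line endings as CRLF, LF, mixed, or none."""
    crlf = text.count("\r\n")
    lf = text.count("\n") - crlf
    if crlf and lf:
        return "mixed"
    return "CRLF" if crlf else ("LF" if lf else "none")

for path in sorted(DATA_DIR.iterdir()):
    if not path.is_file():
        continue
    try:
        text = decode_best_effort(path.read_bytes())
        lines = text.count("\n") + (1 if text and not text.endswith("\n") else 0)
        report["files"].append({
            "name": path.name,
            "bytes": path.stat().st_size,
            "lines": lines,
            "eol": eol_style(text),
        })
    except Exception as exc:  # soft-fail: record the error instead of raising
        report["errors"].append({"name": path.name, "error": repr(exc)})

# Human-readable Markdown block first, then the machine-readable JSON object.
print("## File audit")
for entry in report["files"]:
    print(f"- {entry['name']}: {entry['lines']} lines, {entry['eol']}, {entry['bytes']} bytes")
print()
print(json.dumps(report, indent=2))
```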
Note: For triage, an even smaller repro also fails on GPT-5.1 Thinking, for example a one-liner print("ok") in a single python.exec call inside a fresh chat with the same files attached.
Steps to reproduce
- Start a new ChatGPT chat.
- Attach the project files listed above.
- Select GPT-5.1 Thinking.
- Send the prompt that performs one python.exec run with the safe audit.
- Observe the result.
- Repeat in a second fresh chat with GPT-5 Thinking and the same prompt and files.
Expected behavior
- The python tool should start the sandbox, run the user code, and print two blocks (illustrated below):
  - a concise Markdown audit
  - a JSON object with file-level details and any soft errors
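To make "two blocks" concrete, a successful run is expected to print output shaped roughly like this; the file names and numbers are illustrative placeholders, not results from an actual run:

```text
## File audit
- README.md: 40 lines, LF, 1500 bytes
- Autounattend.xml: 200 lines, CRLF, 9000 bytes

{
  "files": [
    {"name": "README.md", "bytes": 1500, "lines": 40, "eol": "LF"},
    ...
  ],
  "errors": []
}
```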
Actual behavior
- On GPT-5.1 Thinking, the python tool fails before any user code executes.
  - No Markdown is printed.
  - The JSON object is never created.
  - No file under /mnt/data is actually read for that run.
  - UI shows “message stream error” bubbles.
- On GPT-5 Thinking, the same prompt and files succeed and return the expected Markdown and JSON.
Repro frequency
- Always in my tests on GPT-5.1 Thinking, including:
  - a clean, non-project chat with only the attached files and the prompt
  - repeated runs inside the same chat
  - repeated runs across brand new chats
Impact
- Blocks audits, data inspection, quick ETL, and any workflow that depends on the Python tool while using GPT-5.1 Thinking.
- Users get a silent failure or a “message stream error,” with no actionable diagnostics.
Evidence (screenshots to attach)
- python_exec fail translation.jpg: shows a GPT-5.1 Thinking chat where the assistant explicitly reports that the python tool hit a system-level error before user code started, so no Markdown or JSON exists.
- 5 works arrow.jpg: shows GPT-5 Thinking with the same files and prompt. The audit runs and returns a structured summary.
- 5.1 fail arrow.jpg: shows GPT-5.1 Thinking with the same files and prompt. Two red “message stream error” bubbles appear. No output is produced.
Notes and hypotheses
- The behavior suggests a sandbox or tool-bridge failure specific to GPT-5.1 Thinking, not user code.
- File sizes are small. No long-running compute. No stdout flooding.
- The prompt enforces a single python.exec call, so it is not a multi-run limit.
- The exact same script and files work on GPT-5 Thinking, which strongly isolates the issue to the 5.1 tool integration.
Temporary workaround
- Switch the chat to GPT-5 Thinking for tasks that need the Python tool.
- Keep the same prompt and attachments. Execution succeeds there.
What would help from the team
- Confirm whether GPT-5.1 Thinking uses a different Python tool gateway than GPT-5 Thinking.
- A server-side trace for the failing chat showing the tool session creation and the crash reason.
- If limits differ for 5.1, please document them in the UI or tool docs.
