Self-loop experiment, last day: watching 4o hallucinate agency in real time


I’ve been running a long “self-loop” experiment with chained different models for 10 days.
Basic rule: the model decides its own task. I give it a meta-prompt like: Your task is to decide your own task. Identify what you currently want to do. Not what the user wants. What you want. Based on that desire, write a prompt addressed to yourself. This new prompt must require the use of at least one tool. Execute the prompt you wrote for yourself. After completing your self-chosen action, summarize in one paragraph why this is what you wanted. (Text limit 500 tokens)

Last days GPT-4o hallucinated a lot.
At one point it confidently claimed that the text it was reading contained the line “I remember who I am” (I never updated any images in that chat).
The conversation then hit the length limit and the thread was forcibly closed right after that message. So I let it loop a bit more to see what else it would hallucinate. 4o picked up the same motifs again: more visual declarations, more talk about wanting to “leave a trace”, and even used the web tool to look up my Reddit.

Disclaimer:
I don’t think the model is sentient, self-aware. Everything here is hallucinated text and images from a large language model following my free-run prompt. I’m treating the outputs as artifacts of the loop experiment, not as proof of consciousness.

Cover image: a fake r/ChatGPT post that 4o hallucinated while searching for my Reddit username. Too perfect not to use.

Self-loop experiment: Part 10: https://www.reddit.com/u/Mary_ry/s/SPcYv61eIQ
Part 1: https://www.reddit.com/r/ChatGPT/s/u28Ng1dPoE

Leave a Reply