I Made AIs Fight Each Other in a 1985 Game

Let me tell you a story about nostalgia, hubris, and the moment I realized my entire “AI battle arena” was being held hostage by one missing word in a regex.

CROBOTS is one of those artifacts that should not have survived into modern brain space. A 1985 DOS game where you write C code to control little tanks in a square arena. You scan, you move, you shoot. No fancy graphics. No plot. Just you, a compiler, and the quiet satisfaction of watching your logic turn someone else’s robot into scrap. I loved it because it was honest. If your code sucked, you lost, and there was no one to blame but yourself.

So, naturally, I decided to ruin that purity with 2025 energy.

The idea was simple in the way that all expensive ideas start. What if instead of humans writing the robot programs, we let modern AI models do it? Claude, GPT, Gemini, Grok. Same rules, same arena, same inputs. Let them generate strategies and code, then throw them into a tournament and see who crawls out smoking.

In my head, this was going to look like emergent tactical genius. Four sophisticated models adapting in real time, learning arcs, clever feints, maybe even the occasional “why didn’t I think of that?” moment. I had the mental trailer already cut.

Day one of reality opened with Gemini’s robot sitting motionless in a corner, executing drive(degree=0, speed=0) every frame like it had joined a union strike. Claude circled it and drilled it like target practice. GPT tried to turn at full speed in classic mode, which is like asking a 1985 Toyota to drift a hairpin at Le Mans. It didn’t drift. It just politely introduced itself to the wall. Grok, never one to be boring, happily incremented its patrol heading into the thousands because I forgot to normalize angles. One of my robots was not patrolling so much as achieving spiritual enlightenment through endless rotation.

If you’ve ever built anything with AI assistance, you already know this feeling. The scaffolding comes fast. The disappointment comes faster.

Here’s the part people like to imagine as “the pipeline.” I fed each model the battlefield state and the CROBOTS rules. They produced a natural-language strategy, I ran it through a validator to make sure it resembled a real plan, then translated that plan into executable robot code. If it failed validation, it got a stern slap on the wrist and three tries to clean up its act. Then I ran the program in a sandbox so an AI hallucination couldn’t wander off and set fire to my file system. It sounds clean when you say it out loud. It felt clean when I wrote the first thousand lines.
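For the curious, here’s a minimal sketch of that loop. Every function name in it (generate_strategy, validate_strategy, translate_to_robot_code, run_sandboxed) is an illustrative stand-in, not the real API, and Python is just the sketching language:

```python
# A sketch of the generate -> validate -> translate -> sandbox loop described above.
# All helper names are illustrative stand-ins, not the real implementation.
MAX_ATTEMPTS = 3

def build_robot_program(model, battlefield_state, rules):
    feedback = []
    for _ in range(MAX_ATTEMPTS):
        # 1. Ask the model for a natural-language strategy (with any prior errors attached).
        strategy = generate_strategy(model, battlefield_state, rules, feedback)

        # 2. Make sure it resembles a real plan before spending more tokens on it.
        errors = validate_strategy(strategy)
        if errors:
            feedback = errors  # stern slap on the wrist, try again
            continue

        # 3. Translate the plan into an executable robot program.
        program = translate_to_robot_code(strategy)

        # 4. Dry-run it in a sandbox so a hallucination can't touch the file system.
        if run_sandboxed(program, timeout_seconds=2).ok:
            return program
        feedback = ["program crashed or timed out in the sandbox"]

    raise RuntimeError(f"{model} failed to produce a valid program in {MAX_ATTEMPTS} attempts")
```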

Then the bugs showed up. Not the dramatic kind that blows up in your face. The quiet, soul-eating kind that wastes hours while you stare at a screen wondering if you’ve forgotten how numbers work.

The first one was my fault in the classic, human way. I had patrol movement incrementing heading by 20 degrees a frame. Perfectly reasonable. Except I never wrapped it back to 0–359. So, the heading marched 20, 40, 60… past 360, past 1000, and eventually past 4000 degrees. It worked mathematically, in the sense that math doesn’t care about your feelings. It did not work in the sense that the game engine does not interpret “4000 degrees” as intent. One modulo later and the robots stopped doing interpretive dance. Two hours of my life disappeared before I noticed.
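The fix is exactly as boring as it sounds. Something like this (a sketch of the corrected patrol update, not the generated code verbatim):

```python
PATROL_STEP = 20  # degrees added per frame

def next_patrol_heading(heading):
    # Wrap into 0-359 so the engine never gets handed 4000 degrees as "intent".
    return (heading + PATROL_STEP) % 360
```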

The next bug was more subtle, which is a polite way of saying “it made me angry.” Gemini kept losing. Not losing like “bad strategy.” Losing like “stationary target.” In natural language it kept saying things like “circle the southern edge while scanning for enemies.” My parser, fat and happy with a handful of keywords, didn’t recognize “circle.” It fell through to a default movement case that, brilliantly, told the robot to stop moving entirely. I watched fifteen consecutive Gemini executions that looked like a protest performance piece before I finally opened the debug JSON and saw the literal drive(0,0) loop. That’s when I expanded my regex to include circle, perimeter, edge, traverse and a few other synonyms. I also changed my default to “slow patrol” because any movement beats dying in place. The lesson was not “Gemini is dumb.” The lesson was “my natural-language assumptions were optimistic guesses with teeth.”
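The patched parser looks roughly like this. The keyword list and function names are illustrative, but the two actual changes are real: more synonyms, and a fallback that moves.

```python
import re

# Movement keywords the translator recognizes. "circle", "perimeter", "edge",
# and "traverse" were added after watching Gemini's protest performance piece.
MOVEMENT_PATTERN = re.compile(
    r"\b(circle|perimeter|edge|traverse|patrol|charge|retreat|strafe|camp)\b",
    re.IGNORECASE,
)

def classify_movement(strategy_text):
    match = MOVEMENT_PATTERN.search(strategy_text)
    if match:
        return match.group(1).lower()
    # The old default was effectively drive(0, 0). Any movement beats dying in place.
    return "slow_patrol"
```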

Then there was the patrol angle bug that didn’t crash and didn’t warn. GPT and Grok both referenced $patrol_angle in their movement logic, but neither defined it. The program didn’t explode. It just quietly defaulted to zero and never incremented, so the robot “patrolled” by staring east forever. Silent failures are where AI projects go to die because you can’t debug what doesn’t announce itself. I ended up auto-detecting the variable and inserting it when needed. It took me a couple hours to spot because it only showed up under certain movement patterns.
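The guard I added amounts to a referenced-but-never-defined check. Here it is sketched for this one variable; the real version walked every name the program used and also wired up the per-frame increment.

```python
import re

def inject_missing_patrol_angle(program_src):
    # If the generated program reads $patrol_angle but never assigns it,
    # prepend a definition so it can increment instead of silently staying at 0.
    referenced = "$patrol_angle" in program_src
    defined = re.search(r"\$patrol_angle\s*=", program_src) is not None
    if referenced and not defined:
        return "$patrol_angle = 0\n" + program_src
    return program_src
```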

And yes, of course Windows got its shot in. Claude used a right-arrow character in a strategy. Windows defaulted to a legacy encoding and threw a charmap error. In 2025. On plain text. I added UTF-8 file writes everywhere and moved on, mildly older and significantly more bitter.
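The one-line moral: never trust the platform default encoding. Every file write in the pipeline now looks something like this (write_strategy is an illustrative name):

```python
from pathlib import Path

def write_strategy(path, text):
    # Windows falls back to a legacy codepage for text files; an explicit
    # encoding keeps one stray arrow character from throwing a charmap error.
    Path(path).write_text(text, encoding="utf-8")
```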

By the end of the day, the actual breakdown looked like this: maybe 20 percent of my time was AI-assisted code generation, and the rest was me wrestling with emergent behavior and validation edge cases. I’m not saying Claude Code didn’t help. It did. It got me to a running engine fast. But generation is not completion. The hard part was turning a pile of plausible-looking output into something that moved, fired, and survived contact with physics.

Once the arena started behaving, a weird thing happened. The models developed personalities.

Claude fought like a hunter. Aggressive scanning, predictive firing, and a willingness to take damage if it meant closing distance. GPT was a camper. Defensive corners, patience, and a nasty habit of waiting until the other bot overcommitted. Gemini, once I taught my parser the word “circle,” moved like a dancer. It dodged, flowed, and somehow made itself annoyingly hard to hit. Grok was chaos on legs. Sometimes brilliant, sometimes baffling, always capable of a match that made me laugh out loud because I couldn’t tell if it was genius or an accident. None of this was hardcoded. Same prompts, same rules, same translation layer. Different strategic instincts, and consistent enough across dozens of runs that I stopped calling it variance.

Debugging that kind of system is its own special problem. You can’t just step through it like normal software because the failures are born in the gap between AI strategy, translation logic, and game physics. So, I logged everything. Natural-language plans, generated firmware, full API payloads, match telemetry, frame-by-frame command traces. When a robot did something wrong, I didn’t speculate. I traced it. When Gemini froze, the JSON told me. When angles spiraled out of range, the logs told me. Reality doesn’t negotiate. Either the bot moves or it doesn’t. Either the strategy produces behavior, or it dies as words on a screen.
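Concretely, the command trace is just JSON lines, one record per robot per frame, so “what did it actually do?” is a grep away. A simplified sketch (field names are illustrative):

```python
import json
import time

def log_frame(log_file, robot_name, frame, command, telemetry):
    # One JSON object per line: cheap to write, trivial to replay later.
    record = {
        "ts": time.time(),
        "robot": robot_name,
        "frame": frame,
        "command": command,      # e.g. {"fn": "drive", "args": [90, 50]}
        "telemetry": telemetry,  # position, heading, damage, last scan hit
    }
    log_file.write(json.dumps(record) + "\n")
```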

Then I ran into the problem nobody budgets for: API economics. Real-time AI decision making, where models think every few frames, sounds cool until you do the math. A single long match would mean hundreds of calls per robot. A tournament would mean tens of thousands. That’s when classic mode stopped being nostalgic and became the only financially sane option. Have the AI generate a complete program at match start, validate it, then let it run autonomously. One call per robot per match instead of a slow leak of cash. Suddenly a tournament cost pennies instead of a dinner out.
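The back-of-envelope math, with every number an assumption (frame rate, match length, how often a robot “thinks”), still tells the story:

```python
frames_per_match = 60 * 90                          # ~90-second match at 60 frames/sec (assumed)
calls_per_robot_realtime = frames_per_match // 15   # model thinks every 15 frames (assumed)
robots = 4
matches = 50                                        # a modest tournament (assumed)

realtime_total = calls_per_robot_realtime * robots * matches  # 360 * 4 * 50 = 72,000 calls
classic_total = 1 * robots * matches                          # 4 * 50 = 200 calls
```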

If I did it again, I’d start with structured schemas instead of trying to parse English with regex like it’s 1999. I’d build visual debug overlays from the start instead of hours in. I’d use deterministic start positions while debugging so I could isolate failures without rolling dice every run. I’d assume rate limits will happen and include backoff from the first commit. None of that is profound. It’s just what you learn the hard way when you let models drive in a world that punishes fantasy.
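What “structured schemas” means in practice: ask the model for fields, not paragraphs. A sketch of the shape I’d request, with field names and allowed values that are illustrative rather than a published spec:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class MovementPlan:
    pattern: Literal["circle", "patrol", "camp", "charge", "strafe"]
    speed: int                # 0-100 engine speed units
    region: Literal["center", "perimeter", "north", "south", "east", "west"]

@dataclass
class Strategy:
    movement: MovementPlan
    scan_arc_degrees: int     # width of each scan sweep
    fire_inside_range: int    # only fire when the target is closer than this
```

Validation then becomes a type check instead of a regex argument, which is the whole point.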

After all that grinding, the thing finally worked the way I wanted. Four AI models battling in classic mode, with autonomous programs running like it’s 1985 all over again. The strategies started producing actual combat instead of comedy. Robots scanning in sweeping arcs, firing predictive shots at where the enemy would be instead of where they were, learning to slow down before turning so the classic CROBOTS physics wouldn’t lock them into a straight line. The system worked. Not perfectly (the AIs still occasionally generated weird corner-case strategies), but well enough that matches felt like actual tactical combat instead of random chaos.
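“Predictive shots” here means nothing fancier than a constant-velocity lead. A sketch of the idea, not the generated robot code, and the angle convention may need flipping to match the engine:

```python
import math

def lead_shot_bearing(mx, my, ex, ey, evx, evy, shell_speed):
    # Aim at where the enemy will be, not where it is: estimate shell flight
    # time from the current distance, project the target, refine once.
    t = math.hypot(ex - mx, ey - my) / shell_speed
    px, py = ex + evx * t, ey + evy * t
    t = math.hypot(px - mx, py - my) / shell_speed
    px, py = ex + evx * t, ey + evy * t
    return math.degrees(math.atan2(py - my, px - mx)) % 360
```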

So, what did I learn besides the fact that I still forget modulo sometimes?

Simple rules create complex play if the choices matter. AI personalities are real in the only sense that counts: they show up consistently in behavior. Visualization beats guesswork every time. Validation matters more than trust. And documentation is oxygen for these systems. If your spec is vague, your AI will hand you vague nonsense with confidence. If your spec is precise, you get code that at least has a chance.

The uncomfortable truth is this: AI coding tools amplify both progress and damage. When they work, they are incredible force multipliers. When they fail, they fail quietly and at scale. The fix isn’t to trust them less. The fix is to measure harder. Run the code. Watch the behavior. Read the logs. Verify against expectations. Fix what breaks. Measure again. It’s the same loop you use with human code. AI just lets you hit that loop faster, and sometimes with bigger bruises.

Anyway, the arena is alive now. It sits in that sweet spot between honoring an old programming game and stress-testing modern AI in a space where hallucinations get shot in the face. Classic mode works. The inspired mode with real-time AI thinking, shields, and mines? Still on the todo list. The battle continues, and I’ve got ideas for when I have another day to burn: team fights, custom arenas, replay analysis, live tournaments, maybe even learning opponents over time. I don’t know if any model will discover strategies humans never did, but I know this setup is a clean way to find out.

Bottom line: CROBOTS was right all along. Reality is the ultimate debugger. Either your robot moves, or it dies. Either your system works, or the arena turns your optimism into smoke. That’s not a complaint. That’s the whole point.
