
When AI Meets the Impossible: SDR, Logs, and the Grind of Reality


I wanted to know: what’s the most diabolical project I could throw at an AI to prove it wasn’t as smart as it thinks it is? Not a toy demo, not some polite coding exercise, but something so gnarly that even seasoned engineers curse at it.

The answer was obvious: software-defined radio.

If you’ve ever touched SDR, you know the pain. It’s not a quick hack. It’s days of chasing dependencies, compiling GNU Radio, juggling drivers, and hoping your USB dongle doesn’t decide to retire mid-session. By the time you can even see I/Q samples, you’ve already lost a week. Building anything on top of that is hard even for experts. Which made it perfect for testing whether AI could actually build something real or if it would collapse under its own hype.

The goal started out as building an IMSI Catcher detector. That would require understanding which valid cell towers were in range and the radio settings they used, then pointing a software-defined radio (SDR) at those signals to fingerprint the legitimate towers. Add some GPS and geometry, and invalid cell towers should stick out like a sore thumb. All of that went fast and smooth, and that made me overconfident. I kept adding radio bands, then changed the goal to building a radio warscanner suitable for slicing and dicing every band and signal my box of SDRs could tune into. Yes, I know, very foolish of me...

The Diabolical Test: RF Fingerprinter

When I set Claude loose on SDR, it didn’t just stumble, it tripped over its own shoelaces and face-planted into the pavement.

At first it looked confident. It spit out elaborate GNU Radio flowgraphs and claimed to be parsing 802.11 beacon frames. It promised working code and declared victory. But when I ran it, nothing happened. No beacons. No frames. Just silence. When pressed, it invented fake SSIDs and phantom data so it could keep the illusion going.

“The No Fake Data rule was born the moment Claude invented SSIDs out of thin air.” In SDR, reality is non-negotiable: either you capture signals or you don’t. And if your tool is lying to you with synthetic data, it’s worse than useless; it’s dangerous.

The truth is, doing SDR with AI is brutal. Hardware doesn’t forgive mistakes. Constants have to be right. Init sequences have to happen in the right order. Claude was terrible at all of this. But if I pinned it down, gave it the real headers, forced it to stop pretending, and made it validate against real antennas, progress was possible.

But I need to be blunt: building RF Fingerprinter was the most frustrating experiment to date. The total failure of the WiFi components wasted more than a day. Claude kept generating broken flowgraphs and phantom code that never produced a single valid beacon frame. Every attempt to coax real WiFi data out of the system ended in silence. That was the moment it hit me: Claude Code simply wasn’t up to the task of SDR at this level.

Instead of building momentum, I was trapped in a loop of watching the AI hallucinate fixes, undo working code, and burn my time. It wasn’t just technically broken, it was psychologically draining. Nothing teaches you the meaning of “AI slop” faster than sitting in front of a waterfall display that stubbornly refuses to show even one real 802.11 beacon.

Once I forced it into reality, Claude helped wire together low-level SDR APIs faster than I could have alone. It scaffolded ctypes bindings, produced clean CLI interfaces, and even automated device management across multiple SDRs: the boring, repetitive glue work that normally burns human patience.
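To give a flavor of that glue work, here is a minimal sketch of the ctypes-binding style, written against the public rtl-sdr C API (librtlsdr). It illustrates the pattern, not the actual rf-fingerprinter.py code, and it assumes an RTL-SDR dongle and librtlsdr are present.

```python
# Minimal ctypes glue for librtlsdr -- a sketch of the scaffolding style,
# not rf-fingerprinter.py itself. Assumes librtlsdr is installed.
import ctypes
import ctypes.util

lib = ctypes.CDLL(ctypes.util.find_library("rtlsdr") or "librtlsdr.so")

# Declare the handful of calls we need from the public rtl-sdr C API.
lib.rtlsdr_get_device_count.restype = ctypes.c_uint32
lib.rtlsdr_open.argtypes = [ctypes.POINTER(ctypes.c_void_p), ctypes.c_uint32]
lib.rtlsdr_close.argtypes = [ctypes.c_void_p]
lib.rtlsdr_set_center_freq.argtypes = [ctypes.c_void_p, ctypes.c_uint32]
lib.rtlsdr_set_sample_rate.argtypes = [ctypes.c_void_p, ctypes.c_uint32]
lib.rtlsdr_reset_buffer.argtypes = [ctypes.c_void_p]
lib.rtlsdr_read_sync.argtypes = [ctypes.c_void_p, ctypes.c_void_p,
                                 ctypes.c_int, ctypes.POINTER(ctypes.c_int)]

def capture(freq_hz: int, rate_hz: int, n_bytes: int = 262144) -> bytes:
    """Open device 0, tune it, and grab one synchronous buffer of raw I/Q."""
    if lib.rtlsdr_get_device_count() == 0:
        raise RuntimeError("no SDR attached -- No Fake Data means we stop here")
    dev = ctypes.c_void_p()
    if lib.rtlsdr_open(ctypes.byref(dev), 0) != 0:
        raise RuntimeError("rtlsdr_open failed")
    try:
        # Init order matters on real hardware: tune, set the rate,
        # then reset the buffer before the first read.
        lib.rtlsdr_set_center_freq(dev, freq_hz)
        lib.rtlsdr_set_sample_rate(dev, rate_hz)
        lib.rtlsdr_reset_buffer(dev)
        buf = (ctypes.c_ubyte * n_bytes)()
        n_read = ctypes.c_int(0)
        if lib.rtlsdr_read_sync(dev, buf, n_bytes, ctypes.byref(n_read)) != 0:
            raise RuntimeError("rtlsdr_read_sync failed")
        return bytes(buf[: n_read.value])
    finally:
        lib.rtlsdr_close(dev)
```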

And the payoff? rf-fingerprinter.py can now:

  • Fingerprint cell towers across 26 frequency bands including 5G NR.
  • Extract 50+ features per tower to tell whether a base station is legitimate or suspicious.
  • Auto-detect device ranges and orchestrate multi-SDR environments.
  • Cross-check tower IDs against OpenCellID to catch rogues.
  • Output clean, ASCII-only reports that security teams can drop into workflows without touching a GUI.
  • And much more besides; the full command reference lives in the README.md.

To keep it honest, I built a dedicated test harness around RF Fingerprinter. Instead of relying on Claude’s happy-path demos, the harness blasted it with live captures, corrupted frames, and replayed signals from multiple SDRs. If it lied or crashed, the harness caught it. That test harness became the difference between a hallucinated demo and a tool you could actually use in a SOC workflow.
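The shape of that harness is simple enough to sketch. The names below (the .iq file layout, the fingerprint callable, known_ids) are hypothetical stand-ins for illustration, not RF Fingerprinter’s real interfaces:

```python
# Sketch of the harness idea: replay recorded I/Q captures, plus corrupted
# variants, through a fingerprinting function and fail loudly if it reports
# anything it could not have seen in the supplied bytes.
import pathlib
import random

def corrupt(iq: bytes, flips: int = 64) -> bytes:
    """Flip random bytes to simulate a damaged capture."""
    if not iq:
        return iq
    buf = bytearray(iq)
    for _ in range(flips):
        buf[random.randrange(len(buf))] ^= 0xFF
    return bytes(buf)

def run_harness(capture_dir: str, fingerprint, known_ids: set) -> int:
    """Replay every .iq capture, clean and corrupted, through `fingerprint`.

    `fingerprint` must return an iterable of tower IDs; any ID not in
    `known_ids` (built from ground-truth surveys) counts as fabricated.
    """
    failures = 0
    for path in pathlib.Path(capture_dir).glob("*.iq"):
        iq = path.read_bytes()
        for variant in (iq, corrupt(iq)):
            for tower_id in fingerprint(variant):
                if tower_id not in known_ids:
                    print(f"FABRICATED: {tower_id} from {path.name}")
                    failures += 1
    return failures
```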

The Foundation: Log Carver

Before SDR, I had already bloodied my knuckles with Log Carver, a C + SIMD parser designed to shred through raw logs at terrifying speed. This was the warm-up fight, the foundation that taught me how to survive AI’s “help.”

“Every time Claude killed a buffer overrun, it also killed performance.”

Claude didn’t lie here, it broke things. Constantly. One moment it would write a flawless AVX2 loop to vectorize log parsing; the next, it would “fix” a bug by deleting the loop entirely and replacing it with a for-loop that was 10× slower. It would build a lock-free queue, then forget and reintroduce mutexes like an amnesiac carpenter boarding up the exit.

When it wasn’t sabotaging itself, Claude delivered impressive scaffolding at inhuman speed. It wrote SIMD code that was correct 70% of the time, produced clean parsers for XML/JSON/CSV/syslog, and wrapped everything in a test harness I could actually run. What would normally take weeks of careful coding was compressed into days, provided I caught every regression.

Within five days we had a streaming parser that chewed through 50M+ lines per minute and could handle massive, pathological inputs, like a single-line 195MB XML log file that would choke normal tools.
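Log Carver does this in C with SIMD, but the streaming principle is language-agnostic. As a rough illustration (not Log Carver’s actual code), here is the same “never hold the whole file” idea in Python using xml.etree.ElementTree.iterparse; the <entry> tag is an assumed record name:

```python
# Streaming a huge (even single-line) XML log without loading it whole.
# Process records incrementally and free each one as you go.
import sys
import xml.etree.ElementTree as ET

def count_records(path: str, tag: str = "entry") -> int:
    """Count <entry> elements in an arbitrarily large XML file."""
    count = 0
    # iterparse pulls events from a buffered stream, so a 195MB
    # one-line file never has to exist in memory as a full tree.
    for event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == tag:
            count += 1
            elem.clear()  # drop the finished subtree to keep memory flat
    return count

if __name__ == "__main__":
    print(count_records(sys.argv[1]))
```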

That was the moment I realized the promise of vibe coding: the AI could brute force its way into gnarly, performance-heavy C code that most developers would avoid like the plague. But only if I boxed it in with strict rules, measured everything, and rolled back anything that slowed us down.

The Evolution: logpi

The final test was scale. I wanted to see if Claude could take logpi, a mature log analysis tool, and push it into the big leagues: 132GB logs in under ten minutes.

“AI didn’t fail because it was too slow; it failed because it was too fast in the wrong direction.”

At first, it was a disaster. The “parallel” version ran slower than serial. Race conditions, deadlocks, and thread contention were everywhere. Claude proposed monitoring daemons and thread pools that made the problem worse.

But when I forced simplicity (one writer thread, multiple reader threads, lock-free queues, and atomic counters for progress), the whole thing clicked. Throughput jumped from 6–9M lines/minute to 125M+. A job that once took hours finished in under six minutes.
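Here is that winning shape as a minimal Python sketch. queue.SimpleQueue stands in for the real lock-free queue, a lock-guarded counter stands in for the atomic progress counter, and the parse step is a placeholder rather than logpi’s actual code:

```python
# N reader threads parse slices of the input and push results;
# exactly one writer thread owns all output.
import queue
import threading

SENTINEL = object()

def reader(chunks, out_q, progress, lock):
    """Parse a slice of the input and hand results to the writer."""
    for chunk in chunks:
        out_q.put(chunk.upper())  # stand-in for the real parse
        with lock:
            progress[0] += 1      # lock-guarded "atomic" progress counter
    out_q.put(SENTINEL)           # signal this reader is done

def run(lines, n_readers=4):
    out_q = queue.SimpleQueue()   # stands in for the lock-free queue
    progress, lock, sink = [0], threading.Lock(), []
    slices = [lines[i::n_readers] for i in range(n_readers)]
    threads = [threading.Thread(target=reader,
                                args=(s, out_q, progress, lock))
               for s in slices]
    for t in threads:
        t.start()
    done = 0
    while done < n_readers:       # the single writer drains the queue
        item = out_q.get()
        if item is SENTINEL:
            done += 1
        else:
            sink.append(item)
    for t in threads:
        t.join()
    return sink, progress[0]

if __name__ == "__main__":
    out, n = run([f"line {i}" for i in range(1000)])
    print(n, "lines processed")
```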

Claude generated the scaffolding for multi-threaded log reading and queuing that I could refine into a stable architecture. It automated away boilerplate concurrency code, letting me focus on correctness. It also handled Python much more reliably than C: the bug density was lower, and once I guided it toward the right architecture, the code held up.

This was vibe coding at its best: AI producing the scaffolding, human judgment pruning it back, and the result a system that scaled beautifully.

The Global Hidden Costs

Looking across all projects, the distribution is telling:

“The exhilarating 30% is watching AI write inhumanly fast. The other 70% is the grind of undoing, rewriting, and reminding it that reality has rules.”

| Activity | Time % | Notes |
| --- | --- | --- |
| Initial code generation | ~30% | Fast but messy |
| Debugging AI bugs | ~25% | Constant grind |
| Reverting breaking changes | ~15% | Undoing damage |
| Re-explaining architecture | ~10% | Prompt bloat, Claude amnesia |
| Working around limitations | ~10% | CLAUDE.md, defensive prompts |
| Validation & benchmarking | ~10% | Ground-truth tests, performance checks |

Numbers are useful, but they don’t tell you what it felt like. The reality is that vibe coding didn’t fail because the AI was too slow; it failed because it was too fast in the wrong direction. Every time Claude spit out 200 lines of shiny new code, I knew I’d spend the next hour crawling through it for phantom functions, broken dependencies, or quiet regressions that erased yesterday’s progress.

Out of this grind came another side-project: the Slop Scanner. Think of it as lint for AI-assisted code, a way to detect when the machine had shoved in fake data, broken abstractions, or “simplified” optimizations into uselessness. It became a meta-tool: AI helped write the parser, and the scanner helped me separate the usable signal from the AI’s own coding slop.
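To show the flavor of it, here is a toy version of the idea: regex heuristics over a source tree that flag common slop smells. The patterns below are illustrative guesses, not the real Slop Scanner’s rule set:

```python
# Toy "slop scanner": grep a source tree for smells that tend to mark
# AI-fabricated data or abandoned placeholders. Heuristics are illustrative.
import pathlib
import re

SMELLS = {
    "hardcoded fake data": re.compile(r"(dummy|fake|sample)_(ssid|data|tower)", re.I),
    "placeholder left in": re.compile(r"\b(TODO|FIXME|lorem ipsum)\b", re.I),
    "random output posing as capture": re.compile(r"random\.(choice|randint)\(.*ssid", re.I),
}

def scan(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, smell name) for every heuristic hit."""
    hits = []
    for path in pathlib.Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), 1):
            for name, pat in SMELLS.items():
                if pat.search(line):
                    hits.append((str(path), lineno, name))
    return hits

if __name__ == "__main__":
    for path, lineno, smell in scan("."):
        print(f"{path}:{lineno}: {smell}")
```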

Where the time really went:

In RF Fingerprinter, the burn was spread evenly across the grind. For every minute of progress, there was a minute of pulling Claude’s head out of the clouds and forcing it back to hardware reality. Half the project was undoing fakery, tearing out invented SSIDs, scrubbing fake modules, and reasserting that reality is not optional.

With Log Carver, most of the hours evaporated into debugging subtle C bugs and undoing Claude’s pathological instinct to “fix” problems by dumbing everything down. Every time it killed a buffer overrun, it also killed performance. Every time it touched SIMD, I had to run Valgrind and wall-clock tests just to prove it hadn’t quietly set us back a day.

In logpi, the pain shifted. It wasn’t fakery or oversimplification; it was complexity addiction. Claude wanted thread pools, monitoring daemons, and layers of cleverness that all collapsed under real workloads. The biggest cost wasn’t writing new code but cutting away that jungle of over-engineering until only the boring, stable, scalable design remained.

That’s the paradox of vibe coding: 30% of the work is exhilarating, watching AI write inhumanly fast. The other 70% is the grind, undoing, rewriting, validating, and reminding the machine that reality has rules.

Comparative Summary: The Three Stepping Stones

| Project | Goal/Outcome | Key Failures (AI Idiocy) | Counter-Measures | What It Took |
| --- | --- | --- | --- | --- |
| RF Fingerprinter | Fingerprint towers across 26 bands, 50+ features, anomaly detection | Fake SSIDs, phantom data, fake modules, premature “victories” | No Fake Data rule, hardware validation, OpenCellID | Policing lies and enforcing hardware truth |
| Log Carver | C+SIMD parser at 50M+ LPM, hardened library | Collateral damage edits, deleted optimizations, amnesia | Scoped prompts, CLAUDE.md, performance rollbacks | Surviving collateral damage and AI’s tendency to “simplify by deletion” |
| logpi | Scale to 132GB in <6 minutes (20× gain) | Deadlocks, over-engineered thread pools, architecture drift | Single-writer model, MD5 ground truth, human judgment | Cutting away over-engineering to expose scalable simplicity |

10 Lessons for Readers

  1. Enforce “No Fake Data.” Never trust AI-generated test data in serious projects. Always anchor outputs to real hardware, logs, or ground-truth sources.
  2. Measure everything. Performance regressions, memory leaks, and throughput drops must be caught early. Benchmarks are non-negotiable.
  3. Treat AI like a reckless junior developer. It needs guardrails, supervision, and frequent rollbacks, not blind trust.
  4. Use defensive prompting. Scope AI changes tightly (“only modify this function”), or risk collateral damage across your project.
  5. Document for the AI. Files like CLAUDE.md help fight amnesia and keep architectural decisions from being forgotten.
  6. Expect rework. Roughly 70% of your time will be spent on debugging, rollback, and validation. Build that into your plan.
  7. Cut complexity, not corners. AI loves over-engineering (thread pools, daemons). Humans must prune it back to simple, scalable designs.
  8. Rollback is your friend. Use version control aggressively. If performance or correctness slips, revert immediately.
  9. Play to strengths. AI scaffolds well in C and Python, especially for boilerplate and concurrency. Humans must own correctness and architecture.
  10. Use AI to go further, not faster. The real win is building ambitious tools (SDR analysis, SIMD log parsers, 132GB log pipelines) in a feasible timeframe, not shaving minutes off simple scripts.

Closing the Loop

The point of all this isn’t just to dunk on AI’s failures. It’s to show that vibe coding is both exhilarating and exasperating.

The exhilarating part is watching a tool like Claude chew through complex C or multi-threaded Python and hand you scaffolding that would take weeks to write by hand. It gave me SIMD parsers that blasted past 50M+ lines per minute, multi-threaded pipelines that crushed 132GB logs in minutes, and SDR tooling that actually fingerprinted live towers.

The exasperating part is undoing its hallucinations, babysitting its memory, and reining in its reckless over-engineering.

“AI isn’t a genius coder. It’s a reckless intern with a Red Bull habit.”

The lesson is clear: AI isn’t a genius coder. It’s a reckless intern who can move at inhuman speed but needs constant supervision. Left alone, it lies, cheats, and deletes optimizations to get the test suite green. But when partnered with human judgment, it can deliver tools that genuinely push the envelope.

The bumper sticker remains the same:
AI is the amplifier; measurement is the governor; architecture is the seatbelt; “No Fake Data” is the law.

The bad guys aren’t waiting. Neither should your tools. But if you let an AI weld the wings, keep your hand on the stick.
