When Bill Gates Said AI Can't Write "Real" Software, I Decided to Test Him
Here are some things I believed about artificial intelligence and coding before I spent a week trying to recreate Bill Gates' greatest achievement: AI was getting scary good at programming. GitHub Copilot could autocomplete entire functions. ChatGPT could debug my Python scripts faster than I could Google the error messages. The robots, I figured, were coming for the software engineers.
Then, two weeks ago, I watched Gates sit across from Fareed Zakaria on CNN's GPS and calmly demolish that assumption. Yes, he said, AI can handle simple coding tasks. But when it comes to complex software development? It's still "falling short."
The word "complex" hung in the air like a challenge.
I'd been reading about Gates' Harvard years, when he and Paul Allen disappeared into dorm rooms for weeks at a time, emerging with something that shouldn't have been possible: a BASIC interpreter for the Altair 8800. No Stack Overflow to consult. No GitHub repositories to fork. No precedent for what they were building. Just two college students trying to teach a machine with less memory than a modern smart toaster to understand human-readable commands.
"Complex," Gates had said. I wondered: what did that actually mean in 1975? And could Claude—Anthropic's most capable AI—pull off the same feat that launched Microsoft?
I was about to find out exactly how wrong I was about everything.
The Dunning-Kruger Moment
My first mistake was thinking this would be easy.
Look, I've been writing about technology long enough to know that nothing is ever as simple as it seems. But BASIC? Come on. It's called "Beginner's All-Purpose Symbolic Instruction Code" for a reason. How hard could it be to recreate a programming language from 1975?
My prompt to Claude was characteristically overconfident: "Create a BASIC interpreter in x86-64 assembly language." I wanted everything—line numbers, variables, PRINT statements, GOTO commands. The full retro computing experience, delivered by an AI that had presumably read every programming tutorial on the internet.
Claude responded with the kind of breezy confidence that should have been my first warning sign. It spat out hundreds of lines of assembly code, complete with sophisticated data structures for expression evaluation, FOR loop stacks, and memory management systems that looked like they belonged in a computer science textbook.
section .bss
program resb 8000 ; Ambitious program storage
variables resd 26 ; Variable storage A-Z
expr_stack resb 256 ; Expression evaluation stack
for_stack resb 256 ; FOR loop stack
It looked professional. Impressive, even.
It was also completely broken.
The code wouldn't even compile, let alone run. We'd stumbled face-first into what I now think of as the "tutorial trap"—essentially a version of the Dunning-Kruger effect—the assumption that because something looks straightforward from the outside, implementing it must be equally simple.
Try It Yourself: The Reality of Assembly Programming
Curious about how this actually played out? You can experience our journey firsthand:
# Clone the complete experiment repository
git clone https://github.com/devasur/AI-BASIC-EXP-2025
cd AI-BASIC-EXP-2025
# Try our first overambitious attempt (spoiler: it won't compile)
nasm -f elf64 experiments/gates_v1.asm -o gates_v1.o
# ld gates_v1.o -o gates_v1 # This will fail spectacularly
# Now try a working version that actually does something
cd experiments/working_versions
nasm -f elf64 working_basic_v2.asm -o working_basic_v2.o
ld working_basic_v2.o -o working_basic_v2
echo 'PRINT "Hello World"' | ./working_basic_v2
Assembly Language Crash Course: If you're unfamiliar with assembly, here's what you need to know: rax, rsi, and rdi are registers (think ultra-fast storage slots), system calls are how programs talk to the operating system, and yes, it really is this much harder than Python. Every. Single. Operation.
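To make that concrete, here's the smallest useful program in this style. It's a generic Linux/NASM sketch added for orientation (not a file from the experiment repo); assemble and link it with the same nasm and ld commands shown above.
; Reference sketch (not from the repo): one write system call, one exit system call.
section .data
msg db "Hello from assembly", 10 ; the text, plus a newline
len equ $ - msg ; length computed at assemble time
section .text
global _start
_start:
mov rax, 1 ; system call number for write
mov rdi, 1 ; file descriptor 1 = stdout
mov rsi, msg ; pointer to the bytes to write
mov rdx, len ; how many bytes
syscall ; hand control to the kernel
mov rax, 60 ; system call number for exit
xor rdi, rdi ; exit status 0
syscall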
The Humbling
What happened next was a masterclass in the difference between theoretical knowledge and practical implementation. Over the course of a week, Claude and I cycled through more than a dozen versions, each one a small step back from the original hubris.
The Code Evolution Timeline: From Ambition to Reality
Day 1 - gates_v1.asm: The overambitious disaster
; Claude's first attempt - 400 lines of beautiful, broken ambition
section .bss
program resb 8000 ; We'll need ALL of this, surely
variables resd 26 ; A-Z variables like real BASIC
expr_stack resb 256 ; Complex expression evaluation
for_stack resb 256 ; Nested FOR loops, obviously
Day 3 - gates_v2.asm: First signs of life
; Finally, something that compiles and runs!
check_print:
cmp al, 'P' ; Check for 'P' or 'p'
je .check_r
cmp al, 'p'
jne .no_match
.check_r:
inc rsi ; Move to next character
mov al, [rsi]
cmp al, 'R' ; Check for 'R' or 'r'
je .check_i
cmp al, 'r'
jne .no_match
; ... continues for I, N, T ...
Day 5 - working_basic_v2.asm: Stable command parsing at last
; The breakthrough: reliable command recognition
parse_command:
call skip_whitespace
cmp byte [rsi], 0
je .empty_line
; Check for PRINT command
push rsi
call check_print
pop rsi
test rax, rax
jnz .handle_print ; Success! We can reliably detect PRINT
ret
Days 6-7 - test_number.asm: The number printing nightmare
; Claude attempt #15 - still printing garbage
; Human: "This is getting ridiculous"
; Claude: "Let me try a different approach..."
print_number_attempt_15:
; This definitely won't work either but here we go
test eax, eax
jz .print_zero ; At least zero works...
; Everything else: ¯\_(ツ)_/¯
That last one should give you a sense of how far we'd fallen from our original ambitions. Converting an integer to a printable string—something so fundamental that most programming languages handle it as a single function call—became our white whale.
The Number Printing Disaster: A Technical Autopsy
For days, Claude tried increasingly desperate approaches. Let me show you the actual progression of failure:
Attempt #3: Dynamic Stack Allocation (Crashed immediately)
; Claude's ambitious attempt at dynamic buffer management
print_number_dynamic:
push rbp
mov rbp, rsp
sub rsp, 32 ; Allocate buffer on stack
mov rdi, rsp ; Buffer pointer
; ... complex division logic ...
; BUG: Stack pointer corruption, program crashes
; The moment we tried to return, everything died
Attempt #8: Buffer Manipulation Nightmare (Memory corruption)
; This looked so promising...
print_number_buffer:
mov rdi, number_buffer + 19 ; Start from end
.convert_loop:
xor rdx, rdx
mov rbx, 10
div rbx ; Divide by 10
add dl, '0' ; Convert remainder to ASCII
dec rdi
mov [rdi], dl ; Store digit
; BUG: Never properly null-terminated the string
; Result: "42" became "42�ÄÃ" or worse
Final Attempt: The Surrender (only zero handled correctly)
print_number_simple:
test eax, eax
jnz .convert
; Handle zero case explicitly
mov byte [buffer], '0'
mov rax, 1
mov rdi, 1
mov rsi, buffer
mov rdx, 1
syscall
ret
.convert:
; TODO: Handle other numbers
; Narrator: This TODO was never completed
mov byte [buffer], '?' ; Give up and print question mark
jmp .print_single
This pathetic "solution" could handle exactly one case correctly: printing zero. Try to print "42" and you'd get a question mark. Try "1000" and the program would either crash spectacularly or fail in creative, silent ways that took hours to debug.
What Made This So Hard? In higher-level languages, number-to-string conversion is handled by:
- Built-in library functions
- Garbage-collected memory management
- Automatic string formatting
- Exception handling for edge cases
In assembly, you're manually managing:
- Memory allocation and deallocation
- ASCII conversion algorithms
- Buffer overflow protection
- Stack pointer preservation
- String null termination
Every single aspect that modern languages handle invisibly became a potential crash point.
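For contrast, here is roughly what a working conversion routine looks like. This is a generic Linux/NASM sketch added for reference, not code from the experiment repo: it builds the digit string backwards from the end of a buffer and sidesteps the null-termination problem entirely by passing an explicit length to write.
; Reference sketch (not from the repo): print the unsigned integer in rax as decimal.
section .bss
num_buf resb 20 ; 20 bytes covers any 64-bit decimal value
section .text
global _start
_start:
mov rax, 42 ; the number to print
lea rdi, [num_buf + 20] ; write pointer starts one past the buffer's end
mov rbx, 10 ; divisor
.next_digit:
xor rdx, rdx ; clear rdx before unsigned division
div rbx ; rax = quotient, rdx = remainder
add dl, '0' ; remainder becomes an ASCII digit
dec rdi
mov [rdi], dl ; store the digit, moving backwards
test rax, rax
jnz .next_digit ; stop when the quotient reaches zero (zero itself still prints)
lea rdx, [num_buf + 20]
sub rdx, rdi ; length = buffer end minus first digit written
mov rsi, rdi ; pointer to the digits
mov rax, 1 ; sys_write
mov rdi, 1 ; stdout
syscall
mov rax, 60 ; sys_exit
xor rdi, rdi
syscall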
I started to understand what Gates meant by "complex."
The integration paradox: every component worked beautifully in isolation, but together they fell apart.
The Lesson Hidden in Plain Sight
Here's where things got interesting, and where Gates' CNN comments started to feel less like an old man yelling at clouds and more like hard-earned wisdom.
The Integration Paradox: When Perfect Parts Make Broken Wholes
Every individual piece of our BASIC interpreter worked beautifully in isolation:
- ✅ Variable Storage (A-Z): Perfect in standalone tests
- ✅ Command Parsing: Flawlessly recognized "PRINT", "LET", "GOTO"
- ✅ String Printing: Reliable when tested individually
- ✅ Memory Management: Each function cleaned up after itself
- ❌ Combined System: Complete catastrophic failure
But the moment we tried to combine them into a working system, everything fell apart.
The Technical Culprit: Register conflicts and state corruption. Here's what was actually happening:
; Function A: Uses rax for return values
store_variable:
mov rax, [variable_value] ; Store result in rax
ret ; Return successfully
; Function B: Assumes rax contains command type
parse_command:
; ... some processing ...
test rax, rax ; Check command type
; BUG: rax now contains variable value from Function A!
; This check now means something completely different
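The conventional fix is unglamorous: decide which registers a routine may clobber and save anything you still need around every call. Here's a minimal sketch of that discipline, with hypothetical labels added for illustration rather than code from our repo:
; Sketch of the fix: preserve state you still need across calls.
section .data
variable_value dq 7 ; stand-in for the interpreter's variable storage
section .text
store_variable: ; same shape as above: clobbers rax
mov rax, [variable_value]
ret
parse_command_fixed:
; ... some processing leaves the command type in rax ...
push rax ; save the command type before calling out
call store_variable ; now free to clobber rax internally
pop rax ; restore the command type afterwards
test rax, rax ; this check means what we intended again
ret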
It was like watching a perfectly choreographed dance where each dancer knew their steps but had never practiced together. Claude could write elegant assembly code for specific problems. It could debug individual functions with remarkable patience. What it couldn't do—what became glaringly obvious over that week—was maintain the invisible web of state management, memory coordination, and error handling that holds real software together.
System-Level Complexity Visualization:
Individual Components: 4 working pieces
Integration Points: 6 potential failure modes
State Dependencies: 12 register/memory conflicts
Error Propagation Paths: 24 ways for failures to cascade
Total System Complexity: Exponential nightmare
This, I realized, is the difference between "simple" and "complex" that Gates was talking about. It's not about the difficulty of any individual task. It's about the exponential complexity that emerges when dozens of "simple" components need to work together flawlessly, every time, under conditions you never anticipated.
The Reality Check
The moment everything clicked came when I got curious about the original. Microsoft open-sourced the GW-BASIC code in 2020 (the direct descendant of the BASIC that Gates and Allen wrote in 1975), and I figured it might be instructive to see what we were up against.
What I found was devastating to my remaining confidence.
Our attempt: roughly 800 lines of assembly, implementing maybe three partially working commands. GW-BASIC: over 30,000 lines spread across more than 50 files.
The Humbling File Structure Comparison
Our AI-Assisted Attempt:
AI-BASIC-EXP-2025/
├── gates_v1.asm (400 lines) # Overambitious failure
├── gates_v2.asm (200 lines) # First working PRINT
├── working_basic_v2.asm (150 lines) # Stable command parsing
└── test_number.asm (100 lines) # The number printing disaster
Total: ~850 lines
Working commands: PRINT (strings only), basic input
Number printing: Broken
Variable math: Non-existent
Program storage: Theoretical
Original GW-BASIC (the 1983 descendant of the 1975 code):
GW-BASIC Source Code/
├── GWMAIN.ASM (2,000+ lines) # Main interpreter loop
├── GWEVAL.ASM (5,000+ lines) # Expression evaluation engine
├── GWDATA.ASM (1,500+ lines) # Data structures and variables
├── MATH1.ASM (3,000+ lines) # Integer mathematics
├── MATH2.ASM (4,000+ lines) # Floating-point operations
├── GWLIB.ASM (2,000+ lines) # String manipulation library
├── GIO1.ASM - GIO8.ASM # 8 separate I/O subsystems
├── GWCOM.ASM # Command processing
├── GWFUN.ASM # Built-in function library
├── GWPRN.ASM # Print statement handling
├── GWLPT.ASM # Line printer support
├── GWCAS.ASM # Cassette tape I/O
└── 40+ additional specialized files
Total: 30,000+ lines
Working commands: Full BASIC language
Number printing: Integers, floats, scientific notation
Variable math: Complete expression evaluation
Program storage: Memory management, program editing
The Development Timeline Comparison:
1975 Team (Gates/Allen/Davidoff) | 2025 AI Experiment (Claude + Human)
──────────────────────────────────────┼────────────────────────────────────────
Week 1: Architecture planning │ Week 1: Overambitious failure
Week 2: Core interpreter loop │ Can't compile basic structure
Week 3: Expression parser design │ Finally prints "Hello World"
Week 4: Variable storage system │ Command parsing mostly works
Week 5: Math operations foundation │ Number printing: still broken
Week 6: String handling library │ Integration: catastrophic failure
Week 7: I/O subsystem integration │ Project abandoned
Week 8: Full BASIC interpreter │
Buried in one of the files, I found this copyright notice that made my stomach drop:
ORIGINALLY WRITTEN ON THE [PDP-10](https://en.wikipedia.org/wiki/PDP-10) FROM
FEBRUARY 9 TO APRIL 9 1975
BILL GATES WROTE A LOT OF STUFF.
PAUL ALLEN WROTE A LOT OF OTHER STUFF AND FAST CODE.
[MONTE DAVIDOFF](https://en.wikipedia.org/wiki/Monte_Davidoff) WROTE THE MATH PACKAGE.
Two months. Three college-age kids working in dorm rooms and computer labs. No modern development tools, no internet, no Stack Overflow. And they built something that fundamentally changed how humans interact with computers.
Meanwhile, Claude and I—armed with decades of accumulated programming knowledge and all the documentation in the world—had spent a week failing to reliably print the number 42.
The reality check: our 800-line attempt versus the 30,000+ lines of the original GW-BASIC.
But hold on. Before I get too carried away celebrating human superiority, let's be fair to the AI.
After all, we were asking Claude to recreate one of the most significant software achievements in computing history using a programming paradigm it rarely encounters in training data. That's roughly equivalent to asking a literature professor to perform emergency surgery after reading a few medical textbooks.
In Defense of the Machine
Claude wasn't exactly working with ideal conditions. Assembly language represents a tiny fraction of the code repositories that trained modern LLMs. While models like Meta's LLM Compiler have been specifically trained on 546 billion tokens of assembly code and LLVM intermediate representations, general-purpose models like Claude primarily learned from the billions of lines of Python, JavaScript, and C++ that dominate platforms like GitHub.
Assembly language projects on GitHub are vastly outnumbered by web development frameworks and mobile apps. When Claude encountered our NASM syntax errors and x86-64 memory management challenges, it was drawing from a much smaller knowledge base than it would have for, say, a React component or a Python data analysis script.
What AI Actually Excelled At: The Bright Spots
Let's acknowledge what the AI did accomplish remarkably well:
Syntax Error Detection and Fixes:
; My broken code:
mov rax, sys_write ; Undefined symbol
; Claude's immediate fix:
mov rax, 1 ; sys_write system call number
Pattern Recognition: Claude could instantly spot when I wrote mov [rax], rbx instead of mov [rax], bl for byte operations (a short illustration follows below).
Systematic Debugging: Never got frustrated with the 47th iteration of the same broken number printing function.
Code Documentation: Generated clear, accurate comments explaining every register usage and system call.
API Knowledge: Knew Linux system call conventions, NASM syntax quirks, and x86-64 calling conventions better than most human programmers.
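To see why that byte-versus-qword catch matters, here's a tiny illustration (hypothetical labels and buffer, added by me): against a one-byte destination, the 64-bit store silently tramples the seven bytes that live next to it.
; Illustration only: storing rbx vs bl into a 1-byte slot.
section .bss
digit_slot resb 1 ; one byte reserved
neighbors resb 7 ; whatever happens to live next door
section .text
store_digit:
mov rax, digit_slot ; rax holds the slot's address
mov bl, '7'
; mov [rax], rbx ; WRONG: writes 8 bytes, stomping the 7 bytes after the slot
mov [rax], bl ; RIGHT: writes exactly one byte
ret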
Where AI Hit the Wall: The Failure Modes
System-Level Architecture: Couldn't step back and ask "Why are we doing it this way?"
Creative Problem-Solving: When conventional approaches failed, AI doubled down instead of trying radical alternatives.
Integration Thinking: Perfect at optimizing individual functions, terrible at seeing how they'd work together.
Resource Trade-offs: Never suggested "Maybe we should use C instead of assembly for this?"
The fact that it couldn't integrate everything into a working system says more about the nature of complex software architecture than it does about AI's fundamental limitations.
There's another angle worth considering: Gates and Allen weren't starting from zero in 1975. They had years of programming experience on PDP-10 systems, intimate knowledge of computer architecture, and most importantly, they were working in an environment specifically designed for their task. Claude and I were essentially trying to recreate specialized domain expertise through a text-based conversation.
But even with those caveats, our experiment revealed something important about the current boundaries of AI capability.
What I Learned About AI (And Myself)
So what does this tell us about the current state of artificial intelligence and coding? Gates told CNN that experts can't agree whether AI will crack complex programming tasks "within the next one to two years or if it is still a decade away." After a week in the trenches, I think the timeline misses the point entirely.
The question isn't when AI will get better at coding. It's what we mean by "coding" in the first place.
Claude proved remarkably capable at what I'd call "coding as translation"—taking a well-defined problem and converting it into working code. Need to read a file? Parse a command? Store a variable? The AI could generate clean, efficient assembly language faster than I could type the specifications.
But "coding as architecture"? That's different. The original BASIC wasn't just a collection of functions that happened to work together. It was a coherent vision of how humans should interact with computers, implemented under constraints that would make a modern programmer weep. Four kilobytes of memory. No operating system. No existing frameworks to build on. Just raw silicon and the kind of systemic thinking that turns isolated components into transformative products.
That gap between functional code and visionary software? That's what Gates meant by "complex." And honestly, I'm not sure closing it is just a matter of better training data.
The Thing About Genius
Working with Claude for a week gave me a new appreciation for what separates good programmers from great ones. The AI was everything you'd want in a coding partner: methodical, patient, never frustrated by failure. When our number-printing function crashed for the fifteenth time, Claude calmly suggested another approach. When the memory management went haywire, it systematically debugged each component.
What it couldn't do was step back and ask: "Why are we even trying to print numbers this way?"
The Questions AI Never Asked
During our week of debugging hell, Claude never once suggested:
- "Should we implement this in C instead of assembly?"
- "What if we used a different number conversion algorithm?"
- "Maybe we're solving the wrong problem entirely?"
- "Could we simplify the requirements and build up gradually?"
These aren't coding questions—they're architectural questions that require stepping outside the immediate problem and seeing the bigger picture.
Gates and Allen didn't just implement a language; they bet that regular people would want to program their own computers at all. That wasn't a coding problem; it was a conceptual leap that required seeing a future that didn't yet exist.
Gates told CNN he thinks coding will remain "a 100% human profession" even a century from now. After watching Claude struggle with the same integration challenges that have frustrated human programmers for decades, I'm starting to think he's onto something. Not because AI can't write code, but because the most important programming happens before you ever touch a keyboard.
It's the moment when someone looks at a problem and thinks: "What if we approached this completely differently?"
The gap between human creativity and artificial intelligence: vision versus execution.
The Real Achievement
By traditional measures, our experiment failed. We didn't create a working BASIC interpreter. We couldn't even reliably print the number 42.
But in a deeper sense, the project succeeded brilliantly. It demonstrated exactly what Gates meant by "complex" software development. It's not just about the lines of code or the technical challenges—it's about the system-level thinking, the integration challenges, and the countless small decisions that turn an idea into a product.
The AI proved it could be an incredibly capable programming partner for well-defined tasks. It showed patience, persistence, and impressive technical knowledge. What it couldn't do was provide the architectural vision, the creative problem-solving, and the intuitive understanding of user needs that defined the original BASIC.
The Real Test
Our week-long journey from overconfidence to humility taught me something I didn't expect: the most impressive thing about AI isn't how much it can do, but how much it reveals about what makes human creativity special.
Claude could generate assembly code faster than I could read it. It could debug syntax errors, optimize memory usage, and implement algorithms with mechanical precision. What it couldn't do was tell me why any of this mattered, or whether we were solving the right problem in the first place.
Gates is probably right that AI will get dramatically better at the mechanical aspects of programming. But our BASIC experiment suggests that software development has never really been about the mechanics. It's about having the vision to see what's possible and the judgment to know what's worth building.
The young programmers who built Altair BASIC in 1975 weren't just writing code; they were imagining a world where computers could be personal tools rather than corporate machines. That leap from "what is" to "what could be" remains stubbornly, reassuringly human.
Which means that instead of asking whether AI can replace programmers, maybe we should be asking a different question: What becomes possible when human vision gets paired with artificial intelligence that never gets tired, never gets frustrated, and can implement our wildest ideas faster than we can dream them up?
Your Turn: The Interactive Challenge
Want to experience this humbling journey yourself? Here's your starter kit:
Level 1: The Basics (Should take 30 minutes, will take 3 hours)
git clone https://github.com/devasur/AI-BASIC-EXP-2025
cd AI-BASIC-EXP-2025/experiments/starter_challenges
# Challenge: Make this print "Hello Assembly"
# Hint: It currently prints garbage
nasm -f elf64 hello_broken.asm -o hello_broken.o
ld hello_broken.o -o hello_broken
./hello_broken
Level 2: The Nightmare (Estimated 1 hour, actual time: indefinite)
# Challenge: Make this print any number from 1-100
# Warning: May cause existential crisis
cd number_printing_challenge
# Good luck. You'll need it.
Level 3: Integration Hell (For masochists only)
# Challenge: Combine working components into a system
# Difficulty: How Gates felt in 1975, but with less talent
cd integration_nightmare
# Abandon hope, all ye who enter here
Now that sounds like a complex problem worth solving.