Building JebBot: A Chess AI That Plays Like Me

Everyone's obsessed with living forever. I just want my friends to lose to a bot playing the Danish Gambit poorly after I'm gone.

The Idea

There's something deeply appealing about the idea of a digital clone. Not the sci-fi version where a copy of your consciousness wakes up confused in a server somewhere, but something simpler: a bot that captures how you do one specific thing.

For me, that thing is playing chess badly.

I've played thousands of games on chess.com since 2016. Blitz mostly, always going for the most fun line rather than the most correct one. I love gambits, hate draws, and have developed what can only be described as "a style" over the years.

The question that led to this project: could I train a neural network to recognize that style? Not to play good chess—there's Stockfish for that—but to play my chess?

Three Goals

This wasn't just about the end result. I had three things I wanted to accomplish:

Build a digital chess-playing clone of myself. Something that captures my patterns, my opening preferences, my tendency to play moves that are slightly worse but more interesting.
Use AI coding tools for a real project. I've been using Claude, Cursor, and Claude Code for little things, but I wanted to push them on something substantial. Something with actual ML training, data pipelines, and debugging weird tensor errors at 2am.
Actually understand how neural networks work. Not just conceptually—I've watched the 3Blue1Brown videos like everyone else—but viscerally. I wanted to write the forward pass, debug the loss function, watch the gradients flow. The whole thing.

The Data Pipeline

Downloading 8,000 Games

Chess.com has a nice API. Point it at a username, get a list of monthly game archives. Point it at an archive, get PGN files. I wrote a simple client that pulls down everything and stores it as JSON.

Final count: 7,956 games spanning from 2016 to 2025. That's about a decade of me clicking pieces around a board.

Extracting Positions

Not every game is useful. I filtered out:

Abandoned games
Timeouts in clearly won positions
The opening moves (first 5) since those are just memorized theory

What remained: 219,663 positions where I made an actual decision. About 31 meaningful moves per game on average.

From my games I learned some things about myself:

97% blitz player (no patience for long thinks)
51% win rate as White, 48% as Black
Low draw rate at 4% (I play for decisive results)
Top openings: Modern Defense, Danish Gambit, Philidor Defense

Encoding Positions

Neural networks eat tensors. A chess position becomes a 12x8x8 array:

12 channels (one for each piece type × 2 colors)
8×8 board (obviously)
Each channel is binary: 1 where that piece sits, 0 elsewhere

Moves become indices 0-4095. There are 64 squares a piece can move from, 64 squares it can move to. 64×64 = 4096. Simple.

The Model Architecture

Here's the key insight that makes this work: I'm not teaching the model to play chess. I'm teaching it to recognize my moves.

The model is a binary classifier. Given a position and a candidate move, it outputs a probability: "Is this something Jeb would play?"

INPUT:
├── Position: 12×8×8 tensor (one channel per piece type)
└── Move: index 0-4095 → embedded to 64-dim vector

ARCHITECTURE:
├── 3 convolutional layers (12→64→128→128 channels)
├── Flatten to 8,192 features
├── Concatenate with 64-dim move embedding
├── 3 fully connected layers (8,256→256→64→1)
└── Sigmoid output (probability 0-1)

OUTPUT: "Is this a Jeb move?" (0.0 to 1.0)

Total parameters: ~540,000. Tiny by modern standards. GPT-4 has over a trillion. My model could fit on a floppy disk if anyone still had those.

The Training Trick: Negative Examples

Here's where I got stuck for a while. My first training run hit 100% accuracy instantly. Something was wrong.

The problem: I was only showing the model moves I actually played. It learned that any move passed to it was probably a Jeb move, because that's all it ever saw. The model wasn't learning my style; it was learning to always say "yes."

The fix: negative examples. For every position, I generate two training examples:

Positive: The move I actually played (target = 1.0)
Negative: A move I didn't play (target = 0.0)

But not random moves—that would be too easy. I use Stockfish set to 1500 Elo (roughly my level) to find the top 6 "reasonable" moves, then pick one I didn't play. This creates the perfect contrast: "Here's a good move, but it's not what Jeb chose."

Now the model has to learn what makes my moves mine, not just what makes them legal.

The Training Loop

With the data pipeline solid, training was straightforward:

80/10/10 split for train/validation/test
Binary Cross-Entropy loss (standard for classification)
Adam optimizer with learning rate 1e-4
Early stopping with 10 epochs patience
Dropout (0.3) to prevent overfitting

I trained on Apple Silicon using MPS acceleration. Had to add some memory management—clearing the cache every 1000 batches—but it worked well enough.

The Visualization That Made It Click

One of my favorite parts of this project was building a real-time training visualizer. Every 100 batches, the trainer sends 6 random positions to a local web server that displays:

The board position
The move being evaluated
The model's confidence
Whether it was actually my move

Watching the model learn in real-time was mesmerizing. Early on, it would confidently say "that's a Jeb move!" about Stockfish's top choice. By epoch 20, it started recognizing my weird preferences. The confidence bars would swing—high on my aggressive queen moves, low on the boring defensive retreats I'd never play.

Green border for correct predictions. Red for wrong. The flash animations made it feel alive.

Results

After training, the model achieves ~60.5% accuracy distinguishing my moves from reasonable alternatives.

Is that good? Consider:

Random guessing would get 50% (it's binary classification with balanced classes)
100% would mean it perfectly predicts every move I make, which is impossible because humans are inconsistent
60% means it's captured something about my style

The interesting examples tell the story:

High-Confidence Correct (>90%): The model recognizes my signature moves. Aggressive queen sorties. The Danish Gambit continuation. Pushing pawns when I should be developing.

High-Confidence Wrong (>85%): Moves the model thinks I'd love, but I actually didn't play. Often these are exactly the kind of move I'd play in a different position. The model has learned my patterns but overgeneralizes.

Missed Jeb Moves (<15%): Moves I played that the model thought were too boring for me. Usually defensive moves I made reluctantly.

Playing Against JebBot

The trained model doesn't play chess by itself—it scores moves. To actually play, I built an engine that combines:

1. Opening Book (First ~5 Moves)

Opening play is mostly memorized. Instead of letting the neural network struggle with theory, I built a weighted random book based on my actual opening preferences:

As White: Always 1.e4, then probabilistic responses (Danish Gambit 30%, Italian 30%, etc.)
As Black: Against 1.e4, 45% Caro-Kann, 45% Sicilian, 10% Scandinavian

2. Endgame Detection

When either side has ≤12 material points (roughly a rook or less), JebBot hands off to pure Stockfish. Endgames are tactical—my "style" doesn't help when it's K+R vs K. Let the engine calculate.

3. Style-Quality Balancing

For everything in between, the engine uses a clever selection algorithm:

Get top 5 moves from Stockfish (ranked by quality)
Score each with the neural network (0-1 confidence)
Find the "safe pick": first Stockfish move with >50% Jeb score
If another move scores 15%+ higher, play that instead
Fallback: if nothing is >50%, play highest Jeb score anyway

This balances two competing goals:

Don't make moves so bad they're obviously wrong
Let personality emerge when the choices are close

The result feels remarkably like playing against... me. It favors the same openings. It makes the same slightly-dubious piece sacrifices. It has the same blind spots.

The Tech Stack

python-chess: Board representation and move validation
stockfish: Candidate move generation and endgame play
PyTorch: Neural network training
wandb (optional): Training logging and visualization
Flask: Simple API server for the game UI
chessboard.js: Interactive chess board in the browser

Everything runs locally. No cloud needed for inference.

What I Learned

About Neural Networks

They're not magic. They're function approximators that find patterns in data. The tricky part is giving them data that captures what you actually care about. Negative examples matter more than I expected. Balanced datasets matter more than I expected.

About AI Coding Tools

Claude Code wrote probably 70% of this codebase. Not by magic—I had to be very specific about what I wanted, catch its mistakes, and iterate. But it was genuinely useful in a way that surprised me. The debugging help alone was worth it. When my MPS memory kept exploding, Claude suggested the cache-clearing pattern that fixed it.

About My Chess

I play far more gambits than I realized. My opening repertoire is narrower than I thought. And apparently I have a tell: the model learned that I almost never retreat my queen in the first 15 moves. Even when I should.

What's Next

JebBot is functional but not deployed. The next steps would be:

Host the API somewhere (Railway, Fly.io)
Build a nicer frontend
Maybe add personality to the move explanations ("Ah yes, the classic Jeb attack")

But honestly? The project accomplished what I wanted. I have a digital chess clone. I understand neural networks much better. And I pushed AI coding tools to build something real.

Now I just need to figure out who to bequeath my chess bot to in my will.

Technical Appendix

Model Size Breakdown

Component	Parameters
Conv1 (12→64)	6,976
Conv2 (64→128)	73,856
Conv3 (128→128)	147,584
Move Embedding	262,144
FC1 (8256→256)	2,113,792
FC2 (256→64)	16,448
FC3 (64→1)	65
Total	~2.6M

Data Statistics

Total games: 7,956
Total positions: 219,663
Dataset size (with negatives): 439,326 examples
Training file: 48MB
Trained model: 31MB

Training Configuration

Epochs: 50 (with early stopping)
Batch size: 64
Learning rate: 1e-4
Weight decay: 0.0001
Dropout: 0.3
Early stopping patience: 10 epochs

Repository Structure

jebbot/
├── jebbot/
│   ├── data/          # Download, parse, encode
│   ├── model/         # StyleSelector network
│   ├── play/          # Engine, openings, server
│   ├── training/      # Training utilities
│   └── visualization/ # Real-time training display
├── scripts/           # CLI entry points
└── data/
    ├── raw/           # Downloaded games
    ├── processed/     # Training positions
    └── models/        # Trained checkpoints

Inspired by Peter Whidden's "AI Learns Pokemon" video, but using supervised learning instead of reinforcement learning. Much simpler, much faster, and perfectly suited for behavioral cloning.