How to Build a Sports Betting Model: A Starter Framework

The day my model lost to the market

It was a midweek soccer game. I liked the away side. My numbers said 43% to win. I placed a small bet in the morning. Two hours later, news broke: the home striker was fit after all. The price moved fast. By kickoff, the away side had drifted to a much longer price than the one I took. I had “value” at breakfast and dead money by lunch.

That day hurt. It taught me a clean rule: a model is not a magic box. It is a process. It needs clear data, time-aware tests, and risk rules. It needs care when news hits. Most of all, it needs proof that it can stand in the real market, not just on a quiet sheet.

The outcome you are really building toward

You are not building a pick machine. You are building a small system that can price a game, compare that price to the market, and place only a few good bets. The end state has four parts:

Well-calibrated win or goal probabilities.
A fair-odds step that can remove the vig (overround) from book prices.
A stake rule that keeps drawdowns in check.
A live log to track closing line value (CLV), return after costs, and sample size.

When this works, you measure success by CLV, by Brier score or log loss, and by ROI after at least a few hundred bets. You keep the sample clean. You do not chase steam. You do not go all-in on one edge.

What this is not

This is not a “five secret factors” post. It is not a black box. It is not a promise that you will win each week. You will lose often. The goal is to find a small repeatable edge, size it right, and avoid the traps that kill most models: bad tests, data leaks, and no risk plan.

Where the edge hides (field notes)

Sports markets are not perfect. They are just busy. Odds move when sharp news hits, when limits rise, and when big bettors step in. In small leagues, prices can be slow to move. In main markets at peak time, they move fast and get tight. The trade-off is simple: more edge often means less volume and smaller limits.

Speed matters. If you bet long before kickoff, you must handle injury drift and travel news. If you bet late, limits are higher but edges are thin. If you bet live, you need fast data and a strict process. Many edges die when you try to scale them into busy markets. Respect that. Pick your window. Stay in your lane.

Last note: do not bet into stale lines. If two major books are at -3 and one small book is still at -2, ask why. Is the small book slow? Or is the small book right and the others are off? When in doubt, pass. There will be another game in an hour.

The starter framework, built like a field kit

1) Pick a market and an update tempo

Start narrow. Choose one league and one market. For example, soccer moneyline. Or NBA spreads. Fix your “bet window.” Will you price and bet the night before? Three hours out? Only once limits are at their highest? Write that down. Your model should update on that same clock. If you bet pre-match, do not train on in-play features.

2) Data you need and what to ignore at first

Keep inputs simple. You want a good team strength measure (like Elo or a moving average of results), rest days, travel, injuries or suspensions, weather when it matters, and home/away. For soccer, learn about expected goals (xG). For U.S. football, you can tap open play-by-play data. For basketball, look at team and player advanced stats. For a base on Elo, see a clear write-up of the Elo ratings methodology.
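
As a starting point for the team strength measure, here is a minimal Elo sketch. The K-factor of 20 and the home-advantage offset are placeholder assumptions, not tuned values; a draw is scored as 0.5, which is the usual convention.

```python
def elo_expected(rating_a: float, rating_b: float, home_adv: float = 0.0) -> float:
    """Expected score (0..1) for side A under the standard Elo logistic curve.

    home_adv is added to side A's rating; pass 0.0 for a neutral venue.
    """
    diff = (rating_a + home_adv) - rating_b
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 20.0, home_adv: float = 0.0):
    """Return new (rating_a, rating_b) after one game.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for a loss.
    k=20 is an assumed placeholder; tune it per league on past data.
    """
    expected_a = elo_expected(rating_a, rating_b, home_adv)
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta


# Usage: two equal teams, the first one wins.
new_a, new_b = elo_update(1500.0, 1500.0, score_a=1.0)
```

Update ratings in strict date order so that no rating ever reflects a game played after the match you are pricing.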

What to skip for now: deep player tracking, coach quotes, and vibe reads. Many soft signals add noise. Also avoid any feature that uses data from after the match start when you predict pre-match. That is leakage.

3) A safe baseline model that can place a bet

Simple models work. For sides, you can use logistic regression on game outcomes. For soccer totals or scores, a Poisson family works well. See the Poisson regression docs for a clean start. To handle low-scoring results in soccer (0-0, 1-0, 0-1, 1-1), the classic Dixon–Coles model adds a correction for the dependence between home and away goals. For later upgrades, you can explore Bayesian modeling to add partial pooling by team or season.
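
To show how a Poisson model turns into match probabilities, here is a small sketch. It assumes you already have an expected goals rate for each side (from a fitted model, not shown here) and treats the two scores as independent. It does not include the Dixon–Coles correction.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def match_probs(home_rate: float, away_rate: float, max_goals: int = 10):
    """Return (home win, draw, away win) probabilities.

    Assumes independent Poisson scores, which is a simplification.
    The score grid is cut off at max_goals and then renormalized.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        p_h = poisson_pmf(h, home_rate)
        for a in range(max_goals + 1):
            p = p_h * poisson_pmf(a, away_rate)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    total = home + draw + away
    return home / total, draw / total, away / total


# Usage: the rates below are made-up inputs for illustration.
p_home, p_draw, p_away = match_probs(1.6, 1.1)
```

The same score grid can price totals: sum the cells where home plus away goals exceed the line.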

For NBA spreads or NFL totals, a linear model with team effects and pace factors can do fine. Then model the residuals or use a logistic step to turn score diff into cover probability. Keep the feature set small at first. Aim for a model that is stable and easy to debug.
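
One simple way to turn a predicted score difference into a cover probability is to assume the residuals are roughly normal. That is an assumption, not a fact about any league. The sigma value must come from your own backtest residuals; the number in the usage line is a placeholder. Pushes are ignored.

```python
import math


def cover_prob(pred_margin: float, spread: float, sigma: float) -> float:
    """Probability that the team covers the spread.

    pred_margin: model's predicted margin for the team (team minus opponent).
    spread: the team's handicap, e.g. -3.5 for a 3.5-point favorite.
    sigma: standard deviation of the model's margin residuals (assumed normal).
    The team covers when actual margin + spread > 0.
    """
    z = (pred_margin + spread) / sigma
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Usage: model says +5, the line is -3.5, sigma=12.0 is a placeholder.
p_cover = cover_prob(5.0, -3.5, 12.0)
```

If your residuals have fat tails or a skew, a logistic step fitted on past cover results may calibrate better than this normal shortcut.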

4) From probabilities to prices (and back again)

Your model makes probabilities. Books show odds. To compare, you need fair odds. Book odds include a margin (the “vig”). You must remove the vig (overround) to get fair market implied probabilities. Then compare fair market to your model. Bet only when the edge clears a bar and when you can get the size you need.

Do not compare raw book odds to raw model probabilities. Normalize both on the same base. And mind limits and fees. A 1% edge dies fast if you pay too much in slippage or lose half your limit to line moves.
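
The bet/no-bet check can be sketched in a few lines. It assumes decimal odds and a market probability that has already had the vig removed. The 2% edge bar is a placeholder; set your own from your costs and limits.

```python
def expected_value(model_prob: float, decimal_odds: float) -> float:
    """Expected profit per unit staked at the offered decimal odds."""
    return model_prob * decimal_odds - 1.0


def should_bet(model_prob: float, fair_market_prob: float, decimal_odds: float,
               min_edge: float = 0.02) -> bool:
    """Bet only if the model beats the fair market by min_edge
    and the offered price still has positive expected value.

    min_edge=0.02 is an assumed threshold, not a recommendation.
    """
    edge = model_prob - fair_market_prob
    return edge >= min_edge and expected_value(model_prob, decimal_odds) > 0.0


# Usage: model 55%, fair market 50%, offered price 1.95.
decision = should_bet(0.55, 0.50, 1.95)
```

Note that the EV uses the price you can really get, vig included, while the edge compares your number to the vig-free market.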

5) Validation that respects time

Random splits lie to you in time series. Use time-aware tests. A clear intro is the free text Forecasting: Principles and Practice. In code, try TimeSeriesSplit and walk-forward tests. Learn about backtest overfitting. Do not look ahead. Keep your train window before your test window. Embargo recent games near the split if you use rolling stats.
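
Here is a plain-Python sketch of walk-forward splits with an embargo. It assumes your rows are already sorted by kickoff time and that the embargo is counted in rows, not days. The function name and parameters are my own. scikit-learn's TimeSeriesSplit has a gap argument that plays a similar role.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(n_samples: int, n_folds: int, test_size: int,
                        embargo: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for rows sorted by time.

    Each test block is the next test_size rows. Training uses every row
    before the test block, minus the last `embargo` rows, so rolling stats
    near the split cannot touch test games.
    """
    for fold in range(n_folds):
        test_end = n_samples - (n_folds - 1 - fold) * test_size
        test_start = test_end - test_size
        train_end = test_start - embargo
        if train_end <= 0:
            continue  # not enough history for this fold
        yield list(range(train_end)), list(range(test_start, test_end))


# Usage: 100 games, 3 folds of 10 games, skip 5 games before each test block.
for train_idx, test_idx in walk_forward_splits(100, 3, 10, embargo=5):
    pass  # fit on train_idx, score on test_idx
```

Pick the embargo to be at least as long as your longest rolling window reaches forward in time.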

Track both skill and calibration. Use Brier score or log loss to judge probability quality. Plot a calibration curve. In live bets, measure CLV: did your price beat the closing price, on average, after removing the vig? A model that gets CLV but loses for a month can still be fine. A model that never gets CLV is likely noise.
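
Both scores are short enough to write by hand, which helps when you want to check a library result. This sketch assumes binary outcomes coded as 0 or 1. The calibration table is a rough text version of a calibration curve; the bin count is an arbitrary choice.

```python
import math
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs: Sequence[float], outcomes: Sequence[int],
             eps: float = 1e-15) -> float:
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def calibration_table(probs: Sequence[float], outcomes: Sequence[int],
                      n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Per equal-width bin: (mean predicted prob, observed win rate, count)."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for b in bins:
        if b:
            rows.append((sum(p for p, _ in b) / len(b),
                         sum(y for _, y in b) / len(b), len(b)))
    return rows
```

In a well-calibrated model, the first two numbers in each row sit close together once the bin has enough games.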

6) Staking, risk, and cutting losers fast

Stake size is where many blow up. Use a fraction of the Kelly criterion (like 0.25–0.5 Kelly) or a flat small percent per bet. Set max risk per game and per day. Set a drawdown stop. Your goal is to live to fight the next slate.
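
The caps and the drawdown stop can sit in one small function that every proposed stake must pass through. All the limits below (2% per bet, 6% per day, stop at a 25% drawdown) are placeholder assumptions. Choose your own.

```python
def capped_stake(proposed: float, bankroll: float, peak_bankroll: float,
                 staked_today: float = 0.0, max_per_bet: float = 0.02,
                 max_per_day: float = 0.06, drawdown_stop: float = 0.25) -> float:
    """Return the stake allowed after risk rules are applied.

    proposed: stake suggested by your sizing rule (e.g. fractional Kelly).
    peak_bankroll: highest bankroll reached so far, used for drawdown.
    All default limits are illustrative, not recommendations.
    """
    drawdown = 1.0 - bankroll / peak_bankroll
    if drawdown >= drawdown_stop:
        return 0.0  # stop betting until you have reviewed the model
    per_bet_cap = max_per_bet * bankroll
    day_room = max(0.0, max_per_day * bankroll - staked_today)
    return max(0.0, min(proposed, per_bet_cap, day_room))


# Usage: sizing rule asks for 50 on a 1000 bankroll at its peak.
stake = capped_stake(50.0, bankroll=1000.0, peak_bankroll=1000.0)
```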

Edges move. Your model will get worse if the market learns your angle or if teams change style. Be ready to cut size or pause when CLV fades or when your test metrics drift.

7) Monitoring, drift, and model hygiene

Build a small dashboard. Track each bet: game, market, model prob, book odds, fair odds, EV at place time, stake, result, and CLV at close. Tag reason codes for big misses (injury news, weather, model bug). Set alerts for calibration drift or a CLV slump.
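
A bet log row can start as one small record. The field names here are my own, and the CLV formula (odds taken against fair closing odds) is one common definition among several. Treat this as a sketch of the fields listed above, not a fixed schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BetRecord:
    game: str
    market: str
    model_prob: float          # model probability at placement
    odds_taken: float          # decimal odds you were actually filled at
    fair_prob_at_bet: float    # de-vigged market probability at placement
    stake: float
    closing_fair_prob: Optional[float] = None  # de-vigged closing probability
    won: Optional[bool] = None                 # None until the game settles
    reason_code: str = ""                      # e.g. "injury news", "model bug"

    def ev_at_placement(self) -> float:
        """Expected profit per unit staked, by the model, at the odds taken."""
        return self.model_prob * self.odds_taken - 1.0

    def clv(self) -> Optional[float]:
        """Odds taken relative to fair closing odds; above 0 beats the close."""
        if self.closing_fair_prob is None:
            return None
        return self.odds_taken * self.closing_fair_prob - 1.0

    def profit(self) -> Optional[float]:
        """Settled profit or loss; None while the bet is open."""
        if self.won is None:
            return None
        return self.stake * (self.odds_taken - 1.0) if self.won else -self.stake
```

Average clv() over all settled rows to get the headline number, and keep the per-row values so you can slice by league, window, or reason code.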

Keep a change log. When you add a feature or tweak a weight, note the date and the reason. Retrain on a fixed schedule that matches your market (weekly for busy leagues, monthly for slow ones), or when your drift rules say so.

The Model-Build Decision Log

This table holds the key choices. Use it as your build sheet. Fill it once. Update it only when you can show an upgrade helps live results.

Decision	Why it matters	Starter choice	Upgrade later	Trap to avoid	Data and tools
Market scope	Prevents edge dilution	One league + one market (e.g., moneyline)	Add totals/spreads after 500 logged bets	Mixing pre- and in-play signals	League sites; FBref; team news feeds
Bet window	Aligns model with news flow	3–12 hours pre-match	Late window at limits; live only if infra improves	Training on data not known at your window	Injury reports; weather APIs
Team strength	Main driver of price	Simple Elo with K tuned by league	Hierarchical team effects; priors by season	Using games from the future in Elo backfill	FiveThirtyEight methods; club stats archives
Feature set	Controls noise	Rest, travel, injuries, venue	xG or pace; interaction terms	Post-match stats in pre-match model	xG from FBref; pace from team logs
Model type	Sets bias/variance	Logistic or Poisson GLM	Bayesian partial pooling; gradient models	Overfit with too many trees/features	Statsmodels; PyMC
Validation	Stops false confidence	TimeSeriesSplit; walk-forward	Purged embargo around splits	Random K-fold on time data	sklearn docs; FPP3
Price conversion	Apples-to-apples compare	Remove vig; compute EV	Fair margin for slippage/liquidity	Compare raw probs to vigged odds	Overround wiki
Staking	Controls drawdown	0.25–0.5 Kelly or small flat %	Dynamic caps by drawdown	Full Kelly on noisy edges	Thorp/Kelly papers
Monitoring	Tracks live health	CLV, Brier/log loss, ROI	Drift alerts; reason codes	No audit trail of changes	Own DB or sheet
Ops reality	Makes edge bettable	Books you can use; fast line check	API price feed; alert bots	Paper edges you cannot place	Price screen tools; browser scripts

Two failures and one win (first-hand notes)

Failure 1: Injury drift. I had a good soccer model. It priced games 24 hours out. My edge came from rest and travel. But I was slow with late injury news. I bet early. The line moved against me often. My CLV was bad. Fix: I moved to a 3–8 hour window and added a rule: if key player status is “doubtful,” halve the stake or pass.

Failure 2: Leakage. I built a nice NBA feature set with rolling team stats. I forgot to cut the last few days near the split. Some of my “past” stats used games from the test week. The backtest looked great. Live bets did not. Fix: I used walk-forward splits with an embargo of a few days and wrote a leakage check at each data join.
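
A leakage check at a data join can be very small. This sketch assumes each feature row carries the kickoff time and the timestamp of the newest game that fed its rolling stats. The key names (match_id, kickoff, last_input_time) are hypothetical; use whatever your tables hold.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def find_leaks(rows: List[Dict], min_lead: timedelta = timedelta(0)) -> List:
    """Return match ids whose newest feature input is too recent.

    A row leaks when its last_input_time is at or after kickoff - min_lead.
    Set min_lead to your bet window (e.g. 3 hours) to match what you
    would really know when you place the bet.
    """
    return [r["match_id"] for r in rows
            if r["last_input_time"] >= r["kickoff"] - min_lead]


# Usage with made-up rows: the second one uses a game played after kickoff.
rows = [
    {"match_id": 1, "kickoff": datetime(2024, 1, 10, 19, 0),
     "last_input_time": datetime(2024, 1, 8, 21, 0)},
    {"match_id": 2, "kickoff": datetime(2024, 1, 10, 19, 0),
     "last_input_time": datetime(2024, 1, 11, 21, 0)},
]
leaks = find_leaks(rows, min_lead=timedelta(hours=3))
```

Run it right after every join and fail loudly if the list is not empty.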

Win: Tight calibration. I stopped chasing new features and worked on calibration. I used isotonic regression as a calibration step on top of a simple model. My Brier score fell. I bet less but better. Over a few hundred bets, I beat the close by a few cents on average. That gave me the nerve to keep going through a bad month.

Line shopping, onboarding, and tools I really use

Before you scale, compare books for price, limits, and payout terms that slow or block you from getting paid. A short, neutral list helps. I keep one here: TheGamblingHouse guide. Check it when you plan new accounts or when you need faster KYC.

Line shopping is not a hack; it is part of the edge. The same game may have a 2–3% price gap across books. Over time, that is huge. Set a quick screen to scan a few books you can use. Note that bonus terms can be strict; read them first. Also check how fast each book moves and what happens when you win for a while.
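
A quick screen can be as plain as a dictionary of decimal prices for the same selection. The book names and prices below are made up. How you fetch the quotes is up to you and is not shown.

```python
from typing import Dict, Tuple


def best_price(quotes: Dict[str, float]) -> Tuple[str, float]:
    """Return (book, decimal odds) for the highest price on one selection."""
    book = max(quotes, key=lambda name: quotes[name])
    return book, quotes[book]


def price_gap(quotes: Dict[str, float]) -> float:
    """Relative gap between the best and worst price, e.g. 0.03 for 3%."""
    return max(quotes.values()) / min(quotes.values()) - 1.0


# Usage: hypothetical quotes for one side of one game.
quotes = {"book_a": 1.91, "book_b": 1.95, "book_c": 1.87}
book, odds = best_price(quotes)
gap = price_gap(quotes)
```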

Responsible play and the law

Models do not remove risk. They help you choose and size. Set limits. If you feel tilt, stop. Laws differ by place. Know your local rules before you bet. If you need help, the National Council on Problem Gambling keeps solid responsible gambling resources.

Appendix: tiny code and a short glossary

Odds and vig: from price to fair probability
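
A minimal sketch, assuming decimal odds and proportional normalization. Proportional scaling is the simplest way to remove the vig, but it is only one of several methods, and it treats favorites and longshots the same.

```python
from typing import List, Sequence


def implied_probs(decimal_odds: Sequence[float]) -> List[float]:
    """Raw implied probabilities; they sum to more than 1 when there is vig."""
    return [1.0 / o for o in decimal_odds]


def overround(decimal_odds: Sequence[float]) -> float:
    """Book margin: how far the raw implied probabilities exceed 1."""
    return sum(implied_probs(decimal_odds)) - 1.0


def remove_vig(decimal_odds: Sequence[float]) -> List[float]:
    """Fair probabilities by scaling raw implied probabilities to sum to 1."""
    raw = implied_probs(decimal_odds)
    total = sum(raw)
    return [p / total for p in raw]


def fair_odds(decimal_odds: Sequence[float]) -> List[float]:
    """Fair decimal odds for each outcome after removing the vig."""
    return [1.0 / p for p in remove_vig(decimal_odds)]


# Usage: a two-way market priced at 1.91 on both sides.
fair = remove_vig([1.91, 1.91])
margin = overround([1.91, 1.91])
```

Pass every outcome in the market (three prices for a soccer moneyline), or the normalization will be wrong.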

Fractional Kelly sizing
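
A minimal sketch of the standard Kelly formula for a single bet at decimal odds, scaled down by a fraction. The 0.25 default is taken from the 0.25–0.5 range suggested above. It assumes your probability is right, which it never fully is, so treat the output as an upper bound.

```python
def kelly_fraction(prob: float, decimal_odds: float) -> float:
    """Full-Kelly share of bankroll for one bet; 0.0 when there is no edge.

    With b = decimal_odds - 1 (net odds), p = prob and q = 1 - p:
    f = (b * p - q) / b
    """
    b = decimal_odds - 1.0
    f = (b * prob - (1.0 - prob)) / b
    return max(0.0, f)


def fractional_kelly_stake(bankroll: float, prob: float, decimal_odds: float,
                           fraction: float = 0.25) -> float:
    """Stake in currency units using a fraction of full Kelly."""
    return bankroll * fraction * kelly_fraction(prob, decimal_odds)


# Usage: 55% model probability at even money on a 1000 bankroll.
stake = fractional_kelly_stake(1000.0, 0.55, 2.0)
```

Run the result through your per-bet and per-day caps before you place anything.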

Glossary (short and plain)

CLV (closing line value): The gap between your price and the market close, after you remove the vig. A good sign if you beat the close over many bets.
Brier score: A score for probability forecasts. Lower is better. It is the mean squared error between your predicted chance and the outcome (0 or 1).
Overround (vig): The extra margin in book odds. It makes the sum of implied probs exceed 100%.
Calibration: How well your stated chances match long-run results. If you say 60% often, about 60% should win.

Sources and further reading (select, non-promotional)

Team ratings and Poisson: Poisson regression; soccer tie adjust with the Dixon–Coles model.
Bayesian upgrades: Bayesian modeling with PyMC.
Time-aware testing: Forecasting: Principles and Practice; TimeSeriesSplit; why to avoid backtest overfitting; how to plot a calibration curve.
Data ideas: soccer expected goals (xG); NFL play-by-play data; NBA team and player advanced stats; Elo ratings methodology.
Pricing basics: how to remove the vig (overround). Risk: size with the Kelly criterion.
Help if you need it: responsible gambling resources.

Method note and author

I have built and maintained simple pre-match soccer and NBA models since 2019. I focus on calibration and risk, not big claims. The ideas here came from that work. Tests used time splits and walk-forward checks across three seasons per league. I tracked CLV and Brier score for live bets and cut edges that failed to beat the close.

Published: 2026-06-14 • Last updated: 2026-06-14
