Sports data sources & literacy

Where the numbers come from, how they are collected, and the pitfalls of treating any single dataset as ground truth. A practical guide to free and paid sports data sources, definitional gotchas, and survivorship traps.

Sources: 20+ · Sports: All majors · Reading: 9 min · Level: Beginner

Data literacy is leverage

Knowing how a statistic is defined - and how it was measured - is more valuable than the statistic itself. A 'tackle' in one tracking system isn't a tackle in another. 'Hurries' counted by Pro Football Focus follow different rules than those counted by Next Gen Stats. Two reputable sources can post different numbers for the same event, and both can be internally consistent.

Before you bet on a stat, find out exactly what it measures, how it is collected, and what its sample size is. A surprising amount of betting media reports stats without ever defining them. That is a tell that the analysis is repeating someone else's number rather than working from the underlying data.
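To make the definition problem concrete, here is a minimal sketch in Python. The event fields and both "hurry" rules are invented for illustration; they are not the actual rules used by Pro Football Focus, Next Gen Stats, or any other provider. The point is only that two internally consistent definitions, applied to the same plays, produce different totals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassRushEvent:
    """One pass-rush event. Every field here is hypothetical."""
    play_id: int
    qb_moved_off_spot: bool       # a human charter's judgment call
    defender_distance_yd: float   # tracking distance to the QB at release


def hurry_charted(event: PassRushEvent) -> bool:
    """Made-up rule A: a charter saw the QB forced off his spot."""
    return event.qb_moved_off_spot


def hurry_tracked(event: PassRushEvent, max_distance_yd: float = 2.0) -> bool:
    """Made-up rule B: a defender was within a distance threshold at release."""
    return event.defender_distance_yd <= max_distance_yd


# Five invented plays, scored under both rules.
events = [
    PassRushEvent(1, True, 1.5),
    PassRushEvent(2, True, 3.0),
    PassRushEvent(3, False, 1.0),
    PassRushEvent(4, False, 4.0),
    PassRushEvent(5, True, 2.5),
]

charted = sum(hurry_charted(e) for e in events)
tracked = sum(hurry_tracked(e) for e in events)
print(f"rule A hurries: {charted}, rule B hurries: {tracked}")
```

On these five plays rule A counts 3 hurries and rule B counts 2, and the two rules agree on only one play. Neither count is wrong; they answer different questions.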

Free public data sources by sport

Almost everything a recreational analytical bettor needs is available free. Paid feeds matter for systematic modeling, not for one-off bet analysis.

  • NBA - NBA.com/stats (official), Basketball Reference, Cleaning the Glass (subscription tier), PBP Stats.
  • NFL - Pro Football Reference, ESPN Stats & Info, nflfastR (R package, raw play-by-play), Sumer Sports, RBSDM.
  • MLB - FanGraphs, Baseball Reference, Baseball Savant (official Statcast), Brooks Baseball.
  • NHL - Natural Stat Trick, MoneyPuck, Hockey Reference, evolving-hockey (subscription).
  • Soccer - FBref (originally powered by StatsBomb; switched to Opta data in 2022), Understat (xG), Transfermarkt, WhoScored.

Paid feeds - when they actually matter

Paid data services (Sportradar, StatsBomb, Second Spectrum, Hudl, Pinnacle's data API) matter primarily for three use cases: building a real systematic model that needs raw, unaggregated data; tracking line movement at second-level granularity; or accessing player-tracking data not exposed publicly.

For everything else - single-game research, prop bet evaluation, team strength assessment - public sources are sufficient. Paid does not equal sharper; in many cases public sites republish data licensed from the same paid feeds and present it more clearly.

Common data pitfalls

Even well-sourced data can mislead if you don't account for these common pitfalls. Every one of them has produced confidently wrong betting takes in published media.

  • Survivorship bias - only winning bettors, models, or systems get studied; the losing ones drop out of the record.
  • Definition drift - the same stat changes definition across seasons (PFF grade revisions, NHL shot-attempt rules).
  • Sample size - a 60-game 'trend' is usually variance dressed as a pattern.
  • Park / venue effects - Coors Field, Fenway, weather-affected NFL stadiums all distort raw stats.
  • Regime change - pre-rule-change data may not apply to the current game (NFL pass interference rules, NBA defensive rules).
  • Selection bias - looking at only games where a stat 'mattered' inflates its apparent importance.
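The sample-size pitfall above can be checked with a few lines of arithmetic. The sketch below uses a normal approximation to the binomial and assumes independent games and a 50% baseline; both are simplifying assumptions (real break-even rates are higher once the vig is included), and the function names are illustrative rather than taken from any library.

```python
import math


def record_z_score(wins: int, games: int, baseline: float = 0.5) -> float:
    """Standard errors between an observed win rate and the baseline rate."""
    if games <= 0:
        raise ValueError("games must be positive")
    standard_error = math.sqrt(baseline * (1 - baseline) / games)
    return (wins / games - baseline) / standard_error


def margin_of_error(games: int, baseline: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% half-width around the baseline rate for this sample size."""
    if games <= 0:
        raise ValueError("games must be positive")
    return z * math.sqrt(baseline * (1 - baseline) / games)


# A 36-24 record looks like a strong 60% trend over 60 games.
z = record_z_score(36, 60)
moe = margin_of_error(60)
print(f"z = {z:.2f}, 95% margin = +/-{moe:.1%}")  # z = 1.55, 95% margin = +/-12.7%
```

A 60% record over 60 games sits about 1.55 standard errors from a coin flip, inside the usual 1.96 threshold. Over 60 games, anything between roughly 37% and 63% is consistent with pure chance under these assumptions, and the bar is higher still if the trend was picked out of many candidate trends after the fact.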

Building a personal data workflow

A working bettor maintains a small set of canonical sources per sport, refreshed on the same cadence, and triangulates between them. When two reputable sources disagree, investigate; do not assume one is right. Build your own bet log alongside the data so you can correlate decisions with sources later.
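One way to sketch this workflow in Python is shown below. The `BetLogEntry` fields, the 5% tolerance, and the source names and values are all assumptions chosen for illustration, not a standard schema or real data.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class BetLogEntry:
    """One row of a personal bet log (hypothetical schema)."""
    placed_on: date
    market: str
    stake: float
    decimal_odds: float
    sources: list[str] = field(default_factory=list)  # data sources consulted
    result: str = "pending"                           # e.g. "win", "loss", "push"


def find_disagreements(
    source_values: dict[str, float], rel_tolerance: float = 0.05
) -> list[tuple[str, str, float]]:
    """Return source pairs whose values differ by more than rel_tolerance.

    The gap is measured relative to the larger of the two magnitudes.
    """
    names = sorted(source_values)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            a, b = source_values[first], source_values[second]
            scale = max(abs(a), abs(b))
            if scale == 0:
                continue  # both values are zero, so there is nothing to compare
            gap = abs(a - b) / scale
            if gap > rel_tolerance:
                flagged.append((first, second, gap))
    return flagged


# Invented xG-per-game figures for one team from three unnamed sources.
team_xg = {"source_a": 1.80, "source_b": 1.74, "source_c": 2.10}
flags = find_disagreements(team_xg)
for first, second, gap in flags:
    print(f"investigate {first} vs {second}: {gap:.0%} apart")

# Log the bet together with the sources that informed it.
entry = BetLogEntry(date(2024, 1, 7), "team total over 1.5", 10.0, 1.91,
                    sources=sorted(team_xg))
```

Here `source_c` is flagged against both of the others, which is a prompt to go read its definition, not a verdict that it is the one that is wrong. Recording `sources` on every log entry is what makes it possible to correlate results with data sources later.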

Frequently asked questions

Where can I get free sports betting data? Pro Football Reference, Basketball Reference, FanGraphs, Baseball Savant, Natural Stat Trick, FBref, and Understat cover the major sports for free.

What is Statcast in MLB? MLB's tracking system that measures exit velocity, launch angle, sprint speed, pitch movement, and other physical data on every event. Available free at baseballsavant.mlb.com.

What is nflfastR? An open-source R package that provides NFL play-by-play data going back to 1999, including EPA, win probability, and a host of derived metrics. The de facto standard for public NFL analytics.

Do I need paid data to bet profitably? Generally no. Public data is sufficient for the vast majority of analytical betting workflows. Paid feeds matter mainly for systematic modeling.

21+ · Play responsibly

If gambling is causing harm, call 1-800-GAMBLER (free, confidential, 24/7) or visit ncpgambling.org.