How it works
The data, model, and assumptions behind these predictions.
Data source
Team strength is measured using World Football Elo Ratings (eloratings.net), a widely used rating system that updates after every international match. Elo ratings account for match importance, goal margin, and home advantage.
The ratings snapshot used here is from March 6, 2026. This is a static snapshot that does not update as new matches are played. Ratings are mapped to a 1400 to 2200 scale for internal use, with the strongest teams near the top.
Simulation engine
Predictions are generated by running 10,000 full tournament simulations (Monte Carlo method). Each simulation plays out all 104 matches from group stage through the Final, making random draws weighted by team strength.
In each simulation:
- All 12 groups are simulated simultaneously. Each group match uses Poisson goal simulation, where goal counts are drawn randomly from a distribution calibrated to produce ~2.5 goals per match, with the split between teams determined by their Elo difference.
- Groups are resolved using FIFA tiebreaking rules: points, goal difference, goals scored, head-to-head record, then FIFA ranking.
- The 8 best third-place teams are determined inside each simulation run by comparing actual simulated points, goal difference, and goals scored across all 12 groups, matching FIFA's defined criteria. The 8 advancing third-place teams are then assigned to specific bracket slots using backtracking to guarantee a valid matching (per FIFA Regulations Annex C, which defines 495 possible assignment combinations).
- Knockout matches use the same Poisson goal model. If the match is tied after 90 minutes, it goes to extra time and penalties (modeled as a compression toward 50/50).
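The per-match goal step above can be sketched as follows. The ~2.5-goal calibration target and the Elo-weighted split come from the description; the function names and the use of Knuth's sampling algorithm are illustrative choices, not necessarily the engine's exact implementation:

```python
import math
import random

def poisson_draw(lam):
    """Sample a Poisson-distributed goal count (Knuth's algorithm)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= random.random()
        if p <= threshold:
            return k - 1

def simulate_match(elo_a, elo_b, total_goals=2.5):
    """Split a ~2.5-goal expected total between two teams by Elo difference,
    then draw each side's score independently from a Poisson distribution."""
    exp_a = 1 / (1 + 10 ** ((elo_b - elo_a) / 400))  # Elo expected score for A
    return poisson_draw(total_goals * exp_a), poisson_draw(total_goals * (1 - exp_a))
```

With equal ratings each side averages 1.25 goals; a 200-point favorite averages roughly 1.9 to the underdog's 0.6.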
The final probabilities you see (e.g., “Spain: 12.3% to win the tournament”) are simply the fraction of simulations where that outcome occurred. 10,000 iterations give roughly ±1 percentage point of precision for common outcomes, and wider margins for rare ones.
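The quoted ±1-point precision follows from the binomial standard error of a simulated fraction; a quick sketch (the function name is ours):

```python
import math

def mc_margin(p_hat, n=10_000, z=1.96):
    """95% margin of error for a probability estimated from n Monte Carlo runs."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

mc_margin(0.123)  # a 12.3% estimate carries roughly +/-0.6 points
mc_margin(0.5)    # worst case, at 50%: roughly +/-1 point
```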
Match probability model
The probability of one team beating another starts with the standard Elo logistic formula: a team's expected score is 1 / (1 + 10^((Elo_B - Elo_A) / 400)). Two teams with equal ratings each have a 50% expected score.
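In code, with a worked example for a 200-point gap (values rounded):

```python
def elo_expected(elo_a, elo_b):
    """Expected score for team A under the standard Elo logistic formula."""
    return 1 / (1 + 10 ** ((elo_b - elo_a) / 400))

elo_expected(1900, 1900)  # equal teams: 0.5
elo_expected(2000, 1800)  # 200-point favorite: ~0.76
```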
For group stage matches, draws are possible. The draw probability uses a Gaussian decay model where equal teams draw about 22% of the time, dropping to ~5% when the Elo gap is large. Raw Poisson simulation naturally overproduces draws (~27% for equal teams), so we apply Dixon-Coles style draw deflation: 20% of simulated draws are rejected and resampled. This brings the predicted draw rate to ~20%, matching the historical World Cup average from 2018 and 2022 (19.8% across 96 group matches).
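A sketch of both pieces. The 22%/5% endpoints and the 20% rejection rate are from the text; the Gaussian width (here 400 Elo points) and the function names are illustrative assumptions:

```python
import math
import random

def draw_probability(elo_gap, base=0.22, floor=0.05, width=400.0):
    """Gaussian decay: ~22% draws for equal teams, falling toward ~5%."""
    return floor + (base - floor) * math.exp(-(elo_gap / width) ** 2)

def deflated_sample(simulate, reject_p=0.20):
    """Dixon-Coles style deflation: reject and resample 20% of drawn scorelines."""
    while True:
        home, away = simulate()
        if home != away or random.random() >= reject_p:
            return home, away
```

`deflated_sample` takes any zero-argument score simulator, so the same deflation step can wrap whatever goal model produced the raw scoreline.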
For knockout matches, the result is compressed toward 50/50 by a factor of 0.85 to account for the randomness added by extra time and penalty shootouts.
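The compression step is a single linear pull toward 0.5 (a sketch; the 0.85 factor is from the text):

```python
def compress(p_win, factor=0.85):
    """Shrink a knockout win probability toward 50/50 for ET/shootout noise."""
    return 0.5 + factor * (p_win - 0.5)

compress(0.76)  # a 76% favorite becomes ~72.1% once shootout randomness is priced in
```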
Host nation advantage
The three host nations (United States, Mexico, Canada) receive a +80 Elo point boost, equivalent to about 0.6 goals per match of home advantage. This is applied uniformly. The model does not distinguish between matches played in the US, Mexico, or Canada. In reality, Mexico likely has a stronger home advantage at Estadio Azteca than Canada does at BMO Field, but modeling this would require match-level venue data we don't have.
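The "+80 Elo is about 0.6 goals" equivalence can be checked directly under the ~2.5-goal Poisson split described earlier:

```python
exp_host = 1 / (1 + 10 ** (-80 / 400))  # ~0.613 expected score from +80 Elo
lam_host = 2.5 * exp_host               # ~1.53 expected goals for the host
lam_away = 2.5 * (1 - exp_host)         # ~0.97 expected goals for the visitor
# lam_host - lam_away ~ 0.57, i.e. roughly the 0.6 goals per match quoted above
```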
Backtesting
We tested the model against the last two World Cups using historical Elo ratings from before each tournament. The metric is Brier score, a standard measure of probabilistic prediction accuracy where 0 is perfect and higher is worse.
| Stage | Brier score | Naive baseline | Skill |
|---|---|---|---|
| Group stage (96 matches) | 0.5704 | 0.6667 | 14.4% |
| Knockout (30 matches) | 0.2181 | 0.2500 | 12.8% |
The “naive baseline” is what you'd get by predicting every outcome as equally likely (33% win/draw/loss for group matches, 50/50 for knockouts). The model beats this baseline by 14.4% in the group stage and 12.8% in knockout rounds.
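Brier scoring here is multi-class for three-way group results and single-probability for knockouts; a sketch showing how the table's baselines and skill figures fall out:

```python
def brier(probs, outcome):
    """Multi-class Brier score: summed squared error over the outcome classes."""
    return sum((p - (1.0 if c == outcome else 0.0)) ** 2 for c, p in probs.items())

naive_group = brier({"win": 1/3, "draw": 1/3, "loss": 1/3}, "win")  # 0.6667
naive_ko = (0.5 - 1.0) ** 2                                         # 0.25
skill_group = 1 - 0.5704 / naive_group                              # ~0.144
skill_ko = 1 - 0.2181 / naive_ko                                    # ~0.128
```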
This is a modest but real improvement. World Cup football is inherently unpredictable. Even the best models struggle to do much better than this over 30 knockout matches.
Calibration
After backtesting, we applied a logit power scaling calibration layer: calibrated = sigmoid(α × logit(raw)). The optimal α fitted to the 2018 + 2022 data (126 matches) is 0.96, essentially the identity function. The model is already well-calibrated; aggressive rescaling would overfit to two tournaments.
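The calibration map in code (α = 0.96 from the fit described above):

```python
import math

def calibrate(p_raw, alpha=0.96):
    """Logit power scaling: sigmoid(alpha * logit(p)). alpha < 1 shrinks toward 0.5,
    alpha = 1 is the identity."""
    logit = math.log(p_raw / (1 - p_raw))
    return 1 / (1 + math.exp(-alpha * logit))

calibrate(0.5)   # exactly 0.5: the midpoint is always a fixed point
calibrate(0.90)  # ~0.892: confident predictions are softened very slightly
```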
This module exists as a clean integration point. After the 2026 World Cup adds ~80 more matches, we'll refit α with three tournaments of data (190+ matches), which should give enough signal to detect any real miscalibration.
Path threshold
The “Paths to the Final” view shows every championship route that occurred in at least 50 of 10,000 simulations (0.5% probability). Below this threshold, the confidence interval around the estimate is wider than the estimate itself. The number is noise, not signal.
If a team has no paths above this threshold, we show their single most frequent path with an explicit disclaimer that the probability is too low to be statistically meaningful. Every team gets at least one path.
Playoff teams
Six World Cup spots are still decided by playoffs (four UEFA paths and two intercontinental tournaments). Since the playoff winners are unknown, we simulate the playoffs inside each tournament iteration using the same Elo-based model. This means the bracket naturally reflects uncertainty about who qualifies. You'll see different teams filling each playoff slot across simulations.
Known limitations
- Static ratings snapshot. The Elo ratings are frozen as of March 2026. They do not account for injuries, form changes, managerial changes, or any matches played after that date.
- No style or matchup effects. The model treats every team as a single strength number. It doesn't know that some teams match up poorly against specific styles of play.
- Uniform host advantage. All three host nations get the same +80 Elo boost regardless of which country hosts their matches. In practice, altitude in Mexico City, crowd composition, and travel distance all vary.
- 126-match calibration sample. Two World Cups is a small dataset. The calibration layer is intentionally conservative (α = 0.96) to avoid overfitting, but it means any real miscalibration pattern won't be corrected until more data is available.
- Playoff teams use projected winners. The simulation resolves playoffs probabilistically, but the “most likely team” shown in the bracket display is just the favorite, not the certain qualifier.
- Third-place advancement is approximate in display mode. The Monte Carlo engine resolves third-place advancement empirically inside each simulation. The bracket display uses a backtracking algorithm to find the strongest feasible assignment, which may differ slightly from the simulation's most frequent outcome.