The model
Independent variables
- Difference in ELO scores (taken from eloratings.net/)
- Dummy variable to identify hosts
- Dummy variable to identify teams from the same confederation as the hosts (usually nations from the same continent)
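As a rough illustration, the three variables above could be encoded for one team in a fixture as follows. This is a minimal sketch, not the original code: the `Team` class, the `match_features` function, the team names and the ratings are all hypothetical, and it assumes the two dummies are mutually exclusive (a host is not also flagged as a same-confederation team). How the opponent's host or confederation status enters the model is not stated, so the sketch encodes one team's perspective only.

```python
from dataclasses import dataclass


@dataclass
class Team:
    name: str
    elo: float
    confederation: str


def match_features(team, opponent, host):
    """Build the three independent variables from `team`'s perspective."""
    is_host = team.name == host.name
    # Assumption: the confederation dummy is set only for non-host teams.
    same_confed = (not is_host) and team.confederation == host.confederation
    return {
        "elo_diff": team.elo - opponent.elo,
        "host": 1 if is_host else 0,
        "same_confederation": 1 if same_confed else 0,
    }


# Illustrative teams and ratings only; these are not real ELO scores.
host = Team("Hostland", 1700.0, "UEFA")
neighbour = Team("Neighbouria", 1850.0, "UEFA")
visitor = Team("Farawayia", 1800.0, "AFC")

print(match_features(host, visitor, host))
print(match_features(neighbour, visitor, host))
```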
Model generation
Two logistic regressions were run, using the same variables but different training sets. One used all group games from the last 3 World Cup finals (referred to subsequently as the Group Model) and the other used all knockout games (Knockout Model). For knockout games where extra time was played, the result after extra time was taken. Otherwise, the result after 90 minutes was used.
Comparison of models
The variable weights differ between the Group Model and the Knockout Model.
Group Model
Knockout Model
Home nation advantage is strong in the group stage. In the knockout stage, a home nation still has an advantage, but a non-host team from the same confederation as the hosts has a larger one.
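To see how differing weights turn into different predictions for the same fixture, the sketch below scores one game under two weight sets. It rests on several assumptions: the source says only that logistic regressions were used, so the multinomial (softmax) form with a draw as the reference outcome is a guess, the function name `outcome_probabilities` is hypothetical, and every weight shown is made up to mirror the qualitative pattern described above. None of these numbers are the fitted values.

```python
import math

FEATURES = ("elo_diff", "host", "same_confederation")


def outcome_probabilities(weights, features):
    """Softmax over win/draw/loss, with 'draw' as the reference outcome."""
    scores = {"draw": 0.0}
    for outcome in ("win", "loss"):
        w = weights[outcome]
        scores[outcome] = w["intercept"] + sum(w[f] * features[f] for f in FEATURES)
    # Subtract the largest score before exponentiating, for numerical stability.
    top = max(scores.values())
    exps = {outcome: math.exp(s - top) for outcome, s in scores.items()}
    total = sum(exps.values())
    return {outcome: e / total for outcome, e in exps.items()}


# Made-up weights for illustration only (NOT the fitted values).
group_weights = {
    "win": {"intercept": 0.2, "elo_diff": 0.005, "host": 0.8, "same_confederation": 0.2},
    "loss": {"intercept": 0.2, "elo_diff": -0.005, "host": -0.8, "same_confederation": -0.2},
}
knockout_weights = {
    "win": {"intercept": 0.5, "elo_diff": 0.005, "host": 0.3, "same_confederation": 0.6},
    "loss": {"intercept": 0.5, "elo_diff": -0.005, "host": -0.3, "same_confederation": -0.6},
}

# A non-host team from the hosts' confederation, rated 50 points above its opponent.
fixture = {"elo_diff": 50.0, "host": 0, "same_confederation": 1}
for label, weights in (("group", group_weights), ("knockout", knockout_weights)):
    print(label, outcome_probabilities(weights, fixture))
```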
Simulations
A Monte Carlo simulation is run with 2,500 trials as described below.
Group stages
The Group Model is used to calculate probabilities for each outcome (either team winning or a draw) for each group game in the 2018 World Cup and simulate a result. This is done chronologically. After each game is simulated, the ELO score for each team is updated. Once all group games have been simulated, group tables are calculated. Since the model only generates a result, not goals scored or winning margin, ties are broken randomly.
Knockout stages
These tables allow the knockout fixtures to be generated. Results for these are generated according to probabilities derived using the Knockout Model. Any draws are settled randomly (note that a draw here means a game still level at the end of extra time, i.e. one decided by penalties). Again, ELO scores for each team are updated following each game.
Results
Following 2,500 simulations, probabilities can be generated for each stage of progress for each team. These are shown below, with the table sorted by winning probability, then probability of reaching the final, etc. Blanks indicate an outcome that did not occur in any of the simulations.
No comment is offered on these results.
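The tallying step behind such a table can be sketched as below. This is an illustration under stated assumptions rather than the original code: the stage names, the `stage_probabilities` and `sort_table` functions, and the input format (one dict per trial mapping each team that left its group to the furthest stage it reached) are all hypothetical. Stages a team never reached are omitted, which corresponds to the blanks in the table.

```python
from collections import Counter

# Assumed stage labels, ordered from earliest to latest.
STAGES = ["round_of_16", "quarter_final", "semi_final", "final", "winner"]


def stage_probabilities(trials):
    """Turn per-trial outcomes into per-team probabilities of reaching each stage."""
    if not trials:
        raise ValueError("at least one trial is required")
    counts = {stage: Counter() for stage in STAGES}
    for trial in trials:
        for team, furthest in trial.items():
            # Reaching a stage implies having reached every earlier stage too.
            for stage in STAGES[: STAGES.index(furthest) + 1]:
                counts[stage][team] += 1
    teams = {team for trial in trials for team in trial}
    n = len(trials)
    return {
        team: {s: counts[s][team] / n for s in STAGES if counts[s][team] > 0}
        for team in teams
    }


def sort_table(table):
    """Order teams by winning probability, then final, then semi-final, and so on."""
    def key(team):
        return tuple(table[team].get(stage, 0.0) for stage in reversed(STAGES))
    return sorted(table, key=key, reverse=True)


# Four toy trials with placeholder teams; a real run would supply 2,500 trials.
trials = [
    {"A": "winner", "B": "final", "C": "round_of_16"},
    {"A": "final", "B": "winner"},
    {"A": "winner", "B": "semi_final"},
    {"A": "winner", "B": "round_of_16"},
]
table = stage_probabilities(trials)
for team in sort_table(table):
    print(team, table[team])
```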
Known shortcomings of the method
No single number, such as ELO rating, can ever truly capture the nature of a team. It is blind to player form or fitness, tactics and any particularly favourable or unfavourable positional matchups between two teams.
Breaking ties randomly, particularly in the group stage, is probably not optimal. It would be reasonable to expect that in a tie, the better team would hold the superior goal difference even though the points tallies are the same. For knockout matches this is less of a factor, as only penalty shootouts are determined randomly. Although there is certainly an element of skill in a penalty shootout, the outcome is heavily subject to randomness.
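For reference, the random tie-breaking discussed above can be sketched in a few lines. The function name `rank_group`, the team labels and the points totals are hypothetical; the only behaviour taken from the method is that teams level on points are ordered at random.

```python
import random


def rank_group(points, rng=random):
    """Rank teams by points (highest first), breaking ties randomly."""
    # The second key element is a random draw, so teams level on points
    # end up in a random relative order.
    return sorted(points, key=lambda team: (points[team], rng.random()), reverse=True)


# Toy group: B and C are level on points, so their order varies between runs.
group_points = {"A": 9, "B": 4, "C": 4, "D": 0}
print(rank_group(group_points))
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.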