Introduction
MOVAM was introduced in this post. Adjusting data to take context into account ought to add value to that data; in this case, the data are as simple as possible in sport: margins of victory in Premier League matches. The context is the opponent and venue of each game played by each team. The performances of teams which have played an 'easier' set of fixtures are adjusted downwards, and vice versa.
The benefits of the MOVAM are:
- Unbiased – Only match results are used to calculate the value; there are no arbitrary weightings.
- Meaningful – The value of the MOVAM metric has a direct interpretation: it is the average margin of victory relative to an average team. Thus, a team with a MOVAM of 1 has performed, on average, 1 goal better than a team with a MOVAM of 0.
This post comprises 3 stages:
- The calculation of home and away MOVAM values for each Premier League team – a retrospective metric which assesses the performance level achieved so far this season (PAST).
- The estimation of a predicted full-season MOVAM measure, based on the historic distribution of full-season MOVAMs – an estimate of the actual level of each team (PRESENT).
- A Monte Carlo forecast of the final Premier League table, showing each team's likelihood of finishing in each position (FUTURE).
Calculation of MOVAM
Method
The calculation of MOVAM is explained in more detail in this post. The iterative procedure used may be described simply as follows:
- Initial estimated values for the Home and Away MOVAMs of each team are set [for the first iterative step, the initial assumption is that each team has a MOVAM of 0].
- For each game, the margin for each team is goals scored - goals conceded.
- This margin is adjusted by adding the opponent's MOVAM for the venue at which the opponent played (its Away MOVAM for a home game, its Home MOVAM for an away game).
- Taking the mean of these adjusted margins for all of a team's Home games gives the next estimate of the team's Home MOVAM and likewise for Away games [in fact, a damped iterative process is used to ensure convergence – the next estimate is a weighted average of the previous estimate and this mean value].
- This process is repeated, with the output from one step forming the initial estimate for the next step, until the MOVAM values have converged.
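The iterative procedure above can be sketched in Python as follows. This is a minimal illustration, not the code used for this post: the input format (a list of (home team, away team, home goals, away goals) tuples), the function name compute_movam, the damping weight of 0.5 and the convergence tolerance are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def compute_movam(matches, damping=0.5, tol=1e-9, max_iter=10_000):
    """Estimate Home and Away MOVAM by damped fixed-point iteration.

    matches -- list of (home_team, away_team, home_goals, away_goals)
    damping -- weight on the new mean at each step (assumed value)
    Returns two dicts, (home_movam, away_movam), keyed by team name.
    """
    teams = {team for match in matches for team in match[:2]}
    home = {t: 0.0 for t in teams}  # first iteration: every MOVAM is 0
    away = {t: 0.0 for t in teams}

    def damped(old, samples):
        # Weighted average of the previous estimate and the mean adjusted margin;
        # a team with no games at this venue keeps its previous value.
        return old if not samples else (1 - damping) * old + damping * mean(samples)

    for _ in range(max_iter):
        home_adj, away_adj = defaultdict(list), defaultdict(list)
        for h, a, hg, ag in matches:
            margin = hg - ag  # goals scored minus goals conceded, from the home side
            # Add the opponent's MOVAM for the venue the opponent played at.
            home_adj[h].append(margin + away[a])
            away_adj[a].append(-margin + home[h])
        new_home = {t: damped(home[t], home_adj[t]) for t in teams}
        new_away = {t: damped(away[t], away_adj[t]) for t in teams}
        change = max(
            max(abs(new_home[t] - home[t]) for t in teams),
            max(abs(new_away[t] - away[t]) for t in teams),
        )
        home, away = new_home, new_away
        if change < tol:  # stop once the estimates have converged
            break
    return home, away
```

For example, compute_movam([("A", "B", 2, 0), ("B", "A", 1, 1)]) gives a Home MOVAM of 1.0 for A and an Away MOVAM of -1.0 for B. Adding the same constant to every Home and Away MOVAM leaves the fixed-point equations unchanged, so it is the starting point of zero that keeps the values centred on an average team.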
Results
Shown below is the MOVAM for each team. Overall MOVAM is a weighted average of the home and away MOVAM, based on the venues of the games played.
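Written out, and assuming the weights are simply the numbers of home and away games each team has played so far (the weighting is described only as "based on the venues of the games played"), the overall value would be, with n_H and n_A the numbers of home and away games played:

```latex
\mathrm{MOVAM}_{\text{overall}} = \frac{n_H \, \mathrm{MOVAM}_{\text{home}} + n_A \, \mathrm{MOVAM}_{\text{away}}}{n_H + n_A}
```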
Estimate of full-season MOVAM
Method
The MOVAM calculated above is a retrospective metric based on only a sample of observations, since not all games have yet been played. However, methods from hypothesis testing and conditional probability allow the nature of the whole distribution (as opposed to the sample) to be estimated.
A systematic process is adopted, cycling through possible values for the full-season MOVAM (FSMOVAM). For each possible value, the probability that it is the true FSMOVAM, given the MOVAM observed thus far, is calculated using Bayes' theorem:
P(True FSMOVAM = Guessed FSMOVAM | Sample of FSMOVAM = Observed MOVAM) = P(True FSMOVAM = Guessed FSMOVAM) * P(Sample of FSMOVAM = Observed MOVAM | True FSMOVAM = Guessed FSMOVAM) / P(Sample of FSMOVAM = Observed MOVAM).
The value of Guessed FSMOVAM which maximises this probability is considered the “best guess” for FSMOVAM. The calculation is simplified by noting that, for any given MOVAM observation, the denominator is the same for all possible values of Guessed FSMOVAM. Therefore, only two probabilities must be calculated and multiplied together:
- P(True FSMOVAM = Guessed FSMOVAM) – For this, historic values of FSMOVAM were collected for all Premier League team-seasons since 1995-96, the first season with the current 20 teams and 38 games per season. A normal distribution was fitted to these observations and used to calculate the probability that a team would have an FSMOVAM equal to the current guessed value.
- P(Sample of FSMOVAM = Observed MOVAM | True FSMOVAM = Guessed FSMOVAM) – Hypothesis testing methods described here, based on a calculation of the standard error, were used to calculate this probability.
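The search for the best guess can be sketched as a grid search over candidate values. This is a hedged illustration, not the post's own code: it assumes the standard error takes the usual form (standard deviation of the team's adjusted margins divided by the square root of games played), treats both probabilities as normal densities, and the name estimate_fsmovam, the grid spacing and the range of guesses are hypothetical choices.

```python
import math
from statistics import NormalDist


def estimate_fsmovam(observed, sample_sd, n_games, historic_fsmovams, step=0.01, span=4.0):
    """Return the Guessed FSMOVAM that maximises prior x likelihood.

    observed          -- MOVAM observed so far this season
    sample_sd         -- standard deviation of the team's adjusted margins so far
    n_games           -- number of games played so far
    historic_fsmovams -- full-season MOVAMs from past seasons (the prior sample)
    step, span        -- grid spacing and half-width of the range of guesses
    """
    # P(True FSMOVAM = guess): normal distribution fitted to historic values.
    prior = NormalDist.from_samples(historic_fsmovams)
    # Standard error of a mean of n_games observations (assumed form).
    se = sample_sd / math.sqrt(n_games)

    def unnormalised_posterior(guess):
        # P(observed | guess): observed MOVAM treated as Normal(guess, se).
        likelihood = NormalDist(guess, se).pdf(observed)
        # The denominator P(observed) is common to every guess, so it is omitted.
        return prior.pdf(guess) * likelihood

    n_steps = round(2 * span / step)
    guesses = [-span + i * step for i in range(n_steps + 1)]
    return max(guesses, key=unnormalised_posterior)
```

The result always lies between the historic mean and the observed MOVAM: the fewer games played (the larger the standard error), the further the estimate is pulled back towards the historic mean.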
Forecast of the final Premier League table
Method
Using the FSMOVAM estimated above and the standard deviation of performances observed thus far, a Monte Carlo process simulates the margin of every remaining game of the season, 10,000 times over. For each simulation, the final league table is recorded, giving the probability of each team finishing in each league position.
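A minimal sketch of such a simulation is given below. Several details are assumptions rather than the post's stated method: separate Home and Away FSMOVAM estimates per team, an expected home margin equal to the home side's Home FSMOVAM minus the away side's Away FSMOVAM (mirroring the adjustment used in the MOVAM calculation), a single standard deviation for all games, and simulated margins rounded to whole goals to decide win, draw or loss. The function name simulate_table is hypothetical.

```python
import random
from collections import Counter, defaultdict


def simulate_table(fixtures, home_rating, away_rating, sd, points, goal_diff,
                   n_sims=10_000, seed=None):
    """Monte Carlo forecast of final league positions.

    fixtures          -- remaining (home_team, away_team) pairs
    home_rating       -- estimated Home FSMOVAM per team (assumed input)
    away_rating       -- estimated Away FSMOVAM per team (assumed input)
    sd                -- standard deviation of match margins (assumed single value)
    points, goal_diff -- current totals per team
    Returns {team: {position: probability}}.
    """
    rng = random.Random(seed)
    finishes = defaultdict(Counter)
    for _ in range(n_sims):
        pts, gd = dict(points), dict(goal_diff)
        for h, a in fixtures:
            expected = home_rating[h] - away_rating[a]  # expected home margin
            margin = round(rng.gauss(expected, sd))  # whole-goal margin (assumption)
            gd[h] += margin
            gd[a] -= margin
            if margin > 0:
                pts[h] += 3
            elif margin < 0:
                pts[a] += 3
            else:
                pts[h] += 1
                pts[a] += 1
        # Rank by points, then goal difference; exact ties keep the input order.
        table = sorted(pts, key=lambda t: (pts[t], gd[t]), reverse=True)
        for position, team in enumerate(table, start=1):
            finishes[team][position] += 1
    return {team: {pos: c / n_sims for pos, c in counts.items()}
            for team, counts in finishes.items()}
```

Passing a fixed seed makes a run reproducible, which is useful when comparing forecasts before and after a round of fixtures.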
Results
Shown below are the full probabilities for each team/position combination. Percentage values are only shown when they round to at least 1%.
Pertinent information may be extracted from this table and presented more simply; for example, the probabilities of a top-4 finish, ensuring qualification for the Champions League (teams with a probability lower than 5% are not shown):
And the probability of relegation (teams with a probability lower than 5% are not shown):
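Both summaries are sums over ranges of positions in the full probability table, filtered at the 5% display threshold. A short sketch, assuming the table is held as a {team: {position: probability}} mapping (the function name summarise is hypothetical):

```python
def summarise(position_probs, positions, threshold=0.05):
    """Probability of finishing in any of `positions`, for teams at or above `threshold`.

    position_probs -- {team: {position: probability}}
    positions      -- finishing positions to combine, e.g. range(1, 5) for the top 4
    """
    totals = {
        team: sum(probs.get(pos, 0.0) for pos in positions)
        for team, probs in position_probs.items()
    }
    # Drop teams below the display threshold and list the rest, most likely first.
    kept = [(team, p) for team, p in totals.items() if p >= threshold]
    return dict(sorted(kept, key=lambda item: item[1], reverse=True))
```

For a 20-team league, summarise(probs, range(1, 5)) gives the top-4 probabilities and summarise(probs, range(18, 21)) the relegation probabilities for the bottom three places.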
Shortcomings of Method
Due to its simplicity, there are a number of factors not considered by this method which limit the power of its results:
- Improvement/regression throughout the season – while the PRESENT part of the computation attempts to account for a team's regression to its own mean during the season, any progressive improvement or decline is not accounted for. The estimated full-season MOVAM is based on an average of the season so far. A team which begins the season slowly but improves, such as Liverpool in 2013-14 (according to most observers, the best side in the Premier League during the second half of that season), is not credited with its current level of performance.
- Only results are used in the calculations – while this forms part of the model's attractive simplicity, a result does not tell the full story of a given match. As the sample size increases, the effect of luck should diminish, but the sample never exceeds 19 home games and 19 away games, so some nuance is likely to be lost.
- Player information – this model does not consider “news” concerning player availability. In the PAST calculation, the absence of a key player may partly mitigate a defeat; in the FUTURE calculation, knowledge of upcoming suspensions, current injuries or recent transfers in or out would add value to forecasts.
- Changes in coaching – it is clear that changes in coaching can lead to a large upswing or downswing in a team's performances and results. This model cannot take such changes into account.
Provided its shortcomings are kept in mind, the model developed here can provide a useful baseline for ranking teams and forecasting season outcomes. Updated forecasts will be tweeted frequently and posted here occasionally.