The UEFA Euro 2016 takes place in France from June 10 to July 10. I have made a statistical model to predict the scores of the games and would like to share it here.
The model I applied is a modified version of the Dixon and Coles model published in 1997. The underlying idea is that the number of goals scored by each team can reasonably be assumed to follow a Poisson distribution. The algorithm calculates ‘attack’ and ‘defense’ parameters for each team based on its previous results. These are then used to find the expected number of goals scored by each team in a match, which in turn gives the most probable score of the game.
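To make the last step concrete, here is a minimal Python sketch of how a most probable score can be read off from two expected-goal values. It is an illustration, not the fitted model: it treats the two goal counts as independent Poisson variables, leaves out the low-score adjustment of the original Dixon and Coles paper, and the function names and the example values 1.8 and 0.9 are made up.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals when the expected number is `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def score_grid(lam, mu, max_goals=10):
    """grid[x][y] = P(home scores x and away scores y), assuming independence."""
    return [
        [poisson_pmf(x, lam) * poisson_pmf(y, mu) for y in range(max_goals + 1)]
        for x in range(max_goals + 1)
    ]


def most_probable_score(lam, mu, max_goals=10):
    """Return the (home_goals, away_goals) pair with the highest probability."""
    grid = score_grid(lam, mu, max_goals)
    cells = ((x, y) for x in range(max_goals + 1) for y in range(max_goals + 1))
    return max(cells, key=lambda s: grid[s[0]][s[1]])


# Hypothetical expected goals, not values taken from the fitted model.
print(most_probable_score(1.8, 0.9))  # (1, 0)
```

Even the most probable score usually has a fairly low probability on its own, which is why exact scores are hard to get right.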
The data
I trained my model on matches from the Euro finals (2004, 2008, 2012), the Euro qualifiers (2004, 2008, 2012, 2016) and the 2014 FIFA World Cup held in Brazil. The only data I had for each game was the final score and the date it was played. More advanced models use in-game statistics such as the number of touches in the opponent's half or shot accuracy, but that data is hard to find, even more so for international fixtures. It can be sourced from Opta, but they charge a lot of money for it.
How the model works
At a high level, I calculate attack and defense parameters for each team and use them to find the expected number of goals scored by the two teams (lambda and mu), based only on previous results. The attack and defense parameters are estimated by maximum likelihood.
# home_attack and home_defence are the attack and defense parameters
# respectively of the home team. Similarly for the away team as well
lambda <- home_attack + away_defence + home_advantage
mu <- away_attack + home_defence
- The model includes a time parameter that gives more importance to recent games and less to older ones. The weight of a game decreases exponentially with the number of months between that game and June 2016. This means that the Euro 2016 qualifiers and the 2014 FIFA World Cup finals carry much more weight than Euro 2004.
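As a rough illustration of this kind of weighting, the Python sketch below computes an exponentially decreasing weight from the number of months before June 2016. The decay rate of 0.02 per month and the helper names are assumptions chosen for the example. The post does not state the value actually used.

```python
import math


def months_before(year, month, ref_year=2016, ref_month=6):
    """Whole months between a match date and the reference month (June 2016)."""
    return (ref_year - year) * 12 + (ref_month - month)


def time_weight(months_ago, decay=0.02):
    """Exponentially decreasing weight; `decay` is an illustrative value only."""
    return math.exp(-decay * months_ago)


# Approximate dates of some of the competitions in the data set.
for name, year, month in [
    ("Euro 2004", 2004, 6),
    ("World Cup 2014", 2014, 6),
    ("Euro 2016 qualifier", 2015, 10),
]:
    m = months_before(year, month)
    print(f"{name}: {m} months ago, weight {time_weight(m):.3f}")
```

A larger decay rate makes the model forget old tournaments faster. A rate of zero would weight every match equally.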
The model then maximizes the following function to estimate the attack and defense parameters. It is evaluated over all the matches in the data mentioned above.
# lambda and mu are the expected number of goals scored in the game.
# poisson(x, lambda) = (exp(-lambda) * lambda^x) / factorial(x)
# phi is a parameter to take into consideration the time factor.
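The function itself is not reproduced in the text. A form consistent with the comments above is the time-weighted Poisson log-likelihood used in Dixon and Coles style models: each match contributes the log-probability of its observed score, multiplied by its time weight phi. The Python sketch below shows that form under stated assumptions. The tuple layout, the decay value, and passing lambda and mu in directly (rather than computing them from team parameters) are simplifications for illustration.

```python
import math


def poisson_log_pmf(k, rate):
    """log of exp(-rate) * rate^k / k!, computed in a numerically stable way."""
    return -rate + k * math.log(rate) - math.lgamma(k + 1)


def weighted_log_likelihood(matches, decay=0.02):
    """Sum of time-weighted log-probabilities of the observed scores.

    `matches` is an iterable of (home_goals, away_goals, lam, mu, months_ago),
    where lam and mu are the expected goals for the home and away team.
    """
    total = 0.0
    for home_goals, away_goals, lam, mu, months_ago in matches:
        phi = math.exp(-decay * months_ago)  # assumed exponential time weight
        total += phi * (
            poisson_log_pmf(home_goals, lam) + poisson_log_pmf(away_goals, mu)
        )
    return total


# Two hypothetical matches: a recent 2-1 and a 0-0 from ten years earlier.
example = [(2, 1, 1.6, 1.1, 3), (0, 0, 1.2, 1.0, 120)]
print(weighted_log_likelihood(example))
```

In an actual fit, lam and mu would be recomputed from the attack, defense and home advantage parameters on every evaluation, and a numerical optimizer would search for the parameter values that maximize this sum.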
The time weighting was incredibly important, as it helped account for Belgium, a team that has improved immensely in the past few years. The fact that they did not qualify for any of the three Euros I considered (2004, 2008 and 2012) made it difficult to make predictions for them. It also took Spain’s brief downfall after the 2014 World Cup into account.
- My model also includes a home advantage parameter for France and for Belgium: for France because it is hosting the tournament, and for Belgium to compensate for the lack of data from previous competitions and their stellar rise in the past few years.
- The model also includes the FIFA/Coca-Cola World Ranking as of June 2016 as a parameter. In addition, it considers the goals scored and conceded by each team in its last 10 games.
- Finally, I predict the most likely score of a game using the attack and defense parameters, the FIFA rankings, home advantage, and the goals scored and conceded in the last 10 games. Here is how the predictions look for the group stages.

Update: Added outcome probabilities too.
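One way to obtain win, draw and loss probabilities from two expected-goal values is to sum the probabilities of all scorelines that lead to each result. The Python sketch below does this under the assumption of independent Poisson goal counts. The function names and the example values are hypothetical, and the extra inputs mentioned above (rankings, recent form) are not modelled here.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals when the expected number is `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def outcome_probabilities(lam, mu, max_goals=10):
    """Return (P(home win), P(draw), P(away win)) for expected goals lam and mu.

    Scorelines above `max_goals` per team are ignored, so the three values sum
    to slightly less than 1.
    """
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = poisson_pmf(x, lam) * poisson_pmf(y, mu)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    return home, draw, away


# Hypothetical expected goals, not values taken from the fitted model.
home, draw, away = outcome_probabilities(1.8, 0.9)
print(f"home win {home:.2f}, draw {draw:.2f}, away win {away:.2f}")
```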
Now, obviously, predicting the correct score is much more difficult than predicting the result (win, draw or loss). I predicted the most likely outcome for each game and, based on that, played out the knockout stage fixtures as well. Have a look below.

So the model predicts Spain to win the tournament. That makes sense based on the data, since they won the previous two tournaments, but their current dip in form is a worry and Germany is an incredibly tough team to beat. Still, I believe the four predicted semi-finalists have a high chance of actually getting that far. Let’s hope for an exciting tournament and decent accuracy of the predictions!









