Thread starter: oliyiyi

Building a statistical model for predicting UEFA Euro 2016 results


Posted on 2016-06-28 12:17:32


The UEFA Euro 2016 takes place in France from June 10 to July 10. I have made a statistical model to predict the scores of the games and would like to share it here.

The model I applied is a modified version of the Dixon and Coles model published in 1996. The underlying idea is that the number of goals scored by each team can reasonably be assumed to follow a Poisson distribution. The algorithm calculates 'attack' and 'defense' parameters for each team based on its previous results. These are then used to find the expected number of goals each team will score in a match, and those expected goals are in turn used to find the most probable score of the game.
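As a minimal sketch of that last step, assuming the two teams' goals are independent Poisson variables (the published Dixon and Coles model also adjusts the probabilities of low scores, which is omitted here), the most probable score is the largest cell in a grid of joint score probabilities. The function name `most_likely_score()`, the cap of 10 goals per team and the example expected goals are illustrative choices, not the author's code.

```r
# Hypothetical helper (not the author's code): most probable scoreline
# given expected goals, assuming independent Poisson goals for each team.
most_likely_score <- function(lambda, mu, max_goals = 10) {
  goals <- 0:max_goals
  # probs[i, j] = P(team 1 scores goals[i]) * P(team 2 scores goals[j])
  probs <- outer(dpois(goals, lambda), dpois(goals, mu))
  best <- which(probs == max(probs), arr.ind = TRUE)[1, ]
  c(team1 = goals[best[[1]]], team2 = goals[best[[2]]])
}

# Example with made-up expected goals: lambda = 1.6, mu = 0.9
most_likely_score(1.6, 0.9)
```

Under independence the most likely score is simply the mode of each team's Poisson distribution; building the full grid only starts to matter once a dependence term such as the Dixon and Coles low-score correction is added.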

The data

I trained my model on matches from Euro 2004, 2008 and 2012, the qualifiers for Euro 2004, 2008, 2012 and 2016, and the 2014 FIFA World Cup held in Brazil. The only data I had was the final score of each game and the date it was played. More advanced models use in-game statistics such as the number of touches in the opponent's half and shot accuracy, but such data is hard to find, even more so for international fixtures. It can be sourced from Opta, but they charge a lot of money for it.

How the model works

At a high level, I calculate attack and defense parameters for each team from previous results alone and use them to compute the expected number of goals for each of the two teams (lambda and mu). The attack and defense parameters are estimated by maximum likelihood.

```r
# home_attack and home_defence are the attack and defense parameters
# of the home team; away_attack and away_defence are those of the away team.
lambda <- home_attack + away_defence + home_advantage  # expected home goals
mu <- away_attack + home_defence                       # expected away goals
```
  • The model includes a time parameter that gives more importance to recent games and less to older ones. The weight of a game decreases exponentially with the number of months between that game and June 2016, so the Euro 2016 qualifiers and the 2014 FIFA World Cup finals count for much more than Euro 2004.
    The model therefore maximizes the following function to estimate the attack and defense parameters, summed over all the matches in the data described above:
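```r
# Contribution of one match to the function being maximized:
# y1 and y2 are the actual numbers of goals scored by the two teams,
# lambda and mu are the expected numbers of goals in the game,
# months is the number of months between the game and June 2016,
# phi is a parameter that controls the weight of the time factor.
# poisson(x, lambda) = exp(-lambda) * lambda^x / factorial(x), i.e. dpois() in R.
exp(-phi * months) * (dpois(y1, lambda, log = TRUE) + dpois(y2, mu, log = TRUE))
```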

This step was incredibly important, as it helped account for Belgium, a team that has improved immensely in the past few years. Because they did not qualify for any of the three Euros I considered (2004, 2008 and 2012), it was difficult to make predictions for them. The time weighting also took Spain's brief decline after the 2014 World Cup into account.
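To make the fitting step concrete, here is a minimal sketch of how a time-weighted Poisson likelihood like the one above could be maximized in base R with `optim()`. Several choices are my assumptions rather than details from the post: the `fit_ratings()` name and the column names of `matches`, the log-linear form `lambda = exp(attack + defence + home)` (equivalent to the multiplicative form in Dixon and Coles, and it keeps expected goals positive), the default `phi = 0.02`, fixing the first team's attack at 0 for identifiability, and the three made-up teams used only to show the call.

```r
# Hypothetical sketch (not the author's code): fit attack/defence ratings by
# maximising a time-weighted Poisson log-likelihood with base R's optim().
# Assumed columns: team1, team2 (names), y1, y2 (goals), months (age of match).
fit_ratings <- function(matches, phi = 0.02) {
  teams <- sort(unique(c(matches$team1, matches$team2)))
  n <- length(teams)
  i1 <- match(matches$team1, teams)
  i2 <- match(matches$team2, teams)
  w <- exp(-phi * matches$months)  # weight decays exponentially with age

  # par = (attack of teams 2..n, defence of teams 1..n, home advantage);
  # the first team's attack is fixed at 0 so the parameters are identifiable.
  unpack <- function(par) {
    list(attack  = c(0, par[seq_len(n - 1)]),
         defence = par[n - 1 + seq_len(n)],
         home    = par[2 * n])
  }

  neg_loglik <- function(par) {
    p <- unpack(par)
    # Log-linear form keeps lambda and mu positive; home applies to team1.
    lambda <- exp(p$attack[i1] + p$defence[i2] + p$home)
    mu     <- exp(p$attack[i2] + p$defence[i1])
    -sum(w * (dpois(matches$y1, lambda, log = TRUE) +
              dpois(matches$y2, mu, log = TRUE)))
  }

  fit <- optim(rep(0, 2 * n), neg_loglik, method = "BFGS")
  p <- unpack(fit$par)
  names(p$attack) <- teams
  names(p$defence) <- teams
  c(p, convergence = fit$convergence)
}

# Made-up scores for three hypothetical teams, purely to show the call.
toy <- data.frame(
  team1  = c("A", "B", "C", "A", "B", "C"),
  team2  = c("B", "C", "A", "C", "A", "B"),
  y1     = c(2, 1, 0, 3, 1, 2),
  y2     = c(1, 1, 2, 1, 2, 0),
  months = c(30, 24, 18, 12, 6, 1),
  stringsAsFactors = FALSE
)
ratings <- fit_ratings(toy)
ratings$attack
```

Because the weights only rescale each match's log-likelihood, setting `phi = 0` reduces this to an ordinary, unweighted Poisson fit; larger values of `phi` let recent results dominate.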

  • My model also includes a home advantage parameter for France and for Belgium: for France because it is hosting the tournament, and for Belgium to compensate for the lack of data from previous competitions and for their rapid rise in the past few years.
  • The model also includes the FIFA/Coca-Cola World Ranking as of June 2016 as a parameter. In addition, it considers the goals scored and conceded by each team in its last 10 games.
  • Finally, I predict the most likely score of each game using the attack and defense parameters, the FIFA rankings, home advantage, and the goals scored and conceded in the last 10 games. Here is how the predictions look for the group stage.



Update: Added outcome probabilities too.
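Outcome probabilities can come from the same grid of score probabilities: cells where team 1 scores more are wins for team 1, the diagonal holds the draws, and the remaining cells are wins for team 2. This is again a sketch under my own assumptions (independent Poisson goals, scores capped at 10 goals per team, a hypothetical function name `outcome_probs()`), not the code behind the published predictions.

```r
# Hypothetical helper (not the author's code): win/draw/loss probabilities
# from expected goals, assuming independent Poisson goals per team.
outcome_probs <- function(lambda, mu, max_goals = 10) {
  goals <- 0:max_goals
  # Rows index team 1's goals, columns index team 2's goals.
  probs <- outer(dpois(goals, lambda), dpois(goals, mu))
  p <- c(team1_win = sum(probs[lower.tri(probs)]),  # row > column
         draw      = sum(diag(probs)),
         team2_win = sum(probs[upper.tri(probs)]))  # column > row
  p / sum(p)  # renormalise the small mass lost by capping at max_goals
}

outcome_probs(1.6, 0.9)
```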

Obviously, predicting the correct score is much more difficult than predicting the result (win, draw or loss). I predicted the most likely outcome for each game and, based on that, played out the knockout-stage fixtures as well. Have a look below.




So the model predicts Spain to win the tournament. That makes sense given the data, since they won the previous two European Championships (2008 and 2012), but their recent dip in form is a worry, and Germany is an incredibly tough team to beat. Still, I believe the four predicted semi-finalists have a good chance of actually reaching that stage. Let's hope for an exciting tournament and decent accuracy of the predictions!



Reply from h2h2, posted on 2016-06-28 14:33:22:
Thanks