How I created an expected points table for the Indian Super League using Python
In my previous article, I covered how I scraped stats such as xG and open-play xG from fotmob.com. I’ll use that data here to build an expected points table.
Expected points (xP, or xPoints) indicate how many points a team could have been expected to earn from a match, given the chances both sides created. The value ranges from 0 to 3. To get close to the full 3 xPoints, a team would have to deny the opposition even a single shot (an opposition xG of 0) while being almost certain to score itself. Such matches are very rare, so even dominant teams usually end up with an xP of around 2.1 to 2.7.
How to calculate expected points
Expected points are calculated from expected goals (xG), a metric that quantifies the quality of a shot as the probability that it results in a goal. For instance, a penalty has an xG of roughly 0.75, meaning it is scored about 75% of the time.
To calculate expected points, you need every shot taken in the match and its xG value. The match is then simulated many times, say 10,000, and the result of each simulation is recorded from the goals scored. Goals are decided shot by shot, by comparing each shot’s xG against a random number between 0 and 1. A team’s expected points are its average points across all simulations: 3 for each simulated win, 1 for each draw and 0 for each loss.
Suppose a shot has an xG of 0.24. A random number is generated, which will be below 0.24 about 24% of the time and above it about 76% of the time. If the random number is less than the xG value, the shot counts as a goal; otherwise it does not. Over many simulations the shot is therefore scored about 24% of the time, which is exactly what its xG says.
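The simulation described above can be sketched in a few lines. This is a minimal illustration, not the exact procedure any particular site uses; the function name simulate_xpoints and the shot lists are made up for the example and are not real match data.

```python
import random


def simulate_xpoints(home_shots, away_shots, n_sims=10_000, seed=None):
    """Estimate (home_xP, away_xP) from per-shot xG values by simulation."""
    rng = random.Random(seed)
    home_pts = away_pts = 0
    for _ in range(n_sims):
        # A shot becomes a goal when the random draw falls below its xG.
        home_goals = sum(rng.random() < xg for xg in home_shots)
        away_goals = sum(rng.random() < xg for xg in away_shots)
        if home_goals > away_goals:
            home_pts += 3
        elif home_goals < away_goals:
            away_pts += 3
        else:
            home_pts += 1
            away_pts += 1
    # Average points per simulated match
    return home_pts / n_sims, away_pts / n_sims


# Hypothetical shot lists (xG per shot) for the two teams
home_xp, away_xp = simulate_xpoints([0.75, 0.24, 0.10], [0.05, 0.08], seed=42)
print(round(home_xp, 2), round(away_xp, 2))
```

Because a draw hands out only 2 points in total, the two values add up to somewhere between 2 and 3 rather than exactly 3.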
Although simulation is the norm, expected points can also be approximated from each team’s aggregated xG. Since I did not have the xG value of each shot taken, I used the table below for my algorithm.
Calculating expected points with this method is quite simple and the results are fairly accurate. Let’s dive into the code.
The scraped data holds the stats in a per-match format. These are the columns of the data frame:
 #   Column               Non-Null Count  Dtype
 1   match_id             110 non-null    int64
 2   home_team            110 non-null    object
 3   away_team            110 non-null    object
 4   home_team_score      110 non-null    float64
 5   away_team_score      110 non-null    float64
 6   home_xG              110 non-null    float64
 7   away_xG              110 non-null    float64
 8   home_shots           110 non-null    float64
 9   away_shots           110 non-null    float64
 10  home_xG_first_half   110 non-null    float64
 11  away_xG_first_half   110 non-null    float64
 12  home_xG_second_half  110 non-null    float64
 13  away_xG_second_half  110 non-null    float64
 14  home_xG_Open_Play    110 non-null    float64
 15  away_xG_Open_Play    110 non-null    float64
 16  home_xG_Set_Play     110 non-null    float64
 17  away_xG_Set_Play     110 non-null    float64
 18  home_xGOT            110 non-null    float64
 19  away_xGOT            110 non-null    float64
1. Creating an xG difference column:
df['xG_differential'] = df['home_xG'] - df['away_xG']
2. Allotting expected points based on xG difference:
for idx, row in df.iterrows():
    diff = row['xG_differential']
    # Checked from the largest difference down, so each elif only needs a lower bound.
    # A difference exactly on a threshold falls into the lower bucket.
    if diff > 1.5:
        df.loc[idx, 'home_xP'], df.loc[idx, 'away_xP'] = 2.7, 0.3
    elif diff > 1.0:
        df.loc[idx, 'home_xP'], df.loc[idx, 'away_xP'] = 2.3, 0.7
    elif diff > 0.5:
        df.loc[idx, 'home_xP'], df.loc[idx, 'away_xP'] = 2.0, 1.0
    elif diff > 0:
        df.loc[idx, 'home_xP'], df.loc[idx, 'away_xP'] = 1.5, 1.5
    elif diff > -0.5:
        df.loc[idx, 'home_xP'], df.loc[idx, 'away_xP'] = 0.7, 2.3
    elif diff > -1.0:
        df.loc[idx, 'home_xP'], df.loc[idx, 'away_xP'] = 0.5, 2.5
    elif diff > -1.5:
        df.loc[idx, 'home_xP'], df.loc[idx, 'away_xP'] = 0.3, 2.7
    else:  # diff <= -1.5
        df.loc[idx, 'home_xP'], df.loc[idx, 'away_xP'] = 0.1, 2.9
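The same lookup can also be written as a small function instead of a chain of comparisons. This is a minimal sketch that assumes the thresholds and point values from the loop above, and assumes that a difference exactly on a threshold belongs to the lower bucket. The names EDGES, XP_BY_BUCKET and xp_from_xg_diff are made up for illustration.

```python
from bisect import bisect_left

# Bucket boundaries for the xG difference (home minus away), ascending
EDGES = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]

# (home_xP, away_xP) for each bucket, copied from the loop above
XP_BY_BUCKET = [
    (0.1, 2.9),  # diff <= -1.5
    (0.3, 2.7),  # -1.5 < diff <= -1.0
    (0.5, 2.5),  # -1.0 < diff <= -0.5
    (0.7, 2.3),  # -0.5 < diff <= 0
    (1.5, 1.5),  # 0 < diff <= 0.5
    (2.0, 1.0),  # 0.5 < diff <= 1.0
    (2.3, 0.7),  # 1.0 < diff <= 1.5
    (2.7, 0.3),  # diff > 1.5
]


def xp_from_xg_diff(diff):
    """Return (home_xP, away_xP) for a given xG difference."""
    # bisect_left counts the boundaries strictly below diff,
    # which is the index of the bucket diff falls into.
    return XP_BY_BUCKET[bisect_left(EDGES, diff)]


print(xp_from_xg_diff(1.2))
print(xp_from_xg_diff(-2.0))
```

Applied to the data frame, something like df['home_xP'], df['away_xP'] = zip(*df['xG_differential'].map(xp_from_xg_diff)) should fill both columns in one pass.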
3. Splitting the dataframe — Home and Away matches:
df_home = df.groupby('home_team')
df_away = df.groupby('away_team')
4. Aggregating the expected points:
# fill_value=0 keeps a team that appears only as home or only as away
team_xPoints = df_home['home_xP'].sum().add(df_away['away_xP'].sum(), fill_value=0)
# name the index and the values explicitly, then convert to a df
team_xPoints = team_xPoints.rename_axis('team').reset_index(name='xPoints')
team_xPoints = team_xPoints.sort_values(['team'])
team_xPoints
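To set the expected points next to the real table, the actual points can be worked out from the scores in the same data frame. The sketch below assumes the column names listed earlier and uses a made-up three-match sample (teams A, B and C) rather than the real ISL data. The column names home_pts, away_pts, points and points_minus_xP are introduced only for this example.

```python
import pandas as pd

# Hypothetical three-match sample using the article's column names
df = pd.DataFrame({
    'home_team': ['A', 'B', 'C'],
    'away_team': ['B', 'C', 'A'],
    'home_team_score': [2.0, 1.0, 0.0],
    'away_team_score': [0.0, 1.0, 3.0],
    'home_xP': [2.7, 1.5, 0.3],
    'away_xP': [0.3, 1.5, 2.7],
})

# Actual points: 3 for a win, 1 for a draw, 0 for a loss
goal_diff = df['home_team_score'] - df['away_team_score']
df['home_pts'] = goal_diff.map(lambda d: 3 if d > 0 else 1 if d == 0 else 0)
df['away_pts'] = goal_diff.map(lambda d: 3 if d < 0 else 1 if d == 0 else 0)

# Stack the home and away rows into one long frame, then total per team
home = df[['home_team', 'home_pts', 'home_xP']].rename(
    columns={'home_team': 'team', 'home_pts': 'points', 'home_xP': 'xPoints'})
away = df[['away_team', 'away_pts', 'away_xP']].rename(
    columns={'away_team': 'team', 'away_pts': 'points', 'away_xP': 'xPoints'})
table = pd.concat([home, away]).groupby('team', as_index=False).sum()

# Positive values mean a team collected more points than its chances suggested
table['points_minus_xP'] = table['points'] - table['xPoints']
table = table.sort_values('xPoints', ascending=False).reset_index(drop=True)
print(table)
```

Sorting by xPoints gives the expected table, and the points_minus_xP column shows which teams finished above or below what their chances suggested.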
Conclusion
And that’s how I calculated the expected points for Indian Super League teams. Jamshedpur FC were deserving winners of the ISL shield since they had the highest expected points and the highest actual points.
Using this table you can judge how well a team actually performed. For instance, FC Goa finished 9th in the actual table despite creating more chances than the opposition on various occasions. Sometimes you do need a little bit of luck to get through.
Thank you for reading. Do check out my other articles on football analytics; any suggestions or comments would be appreciated.