Since Khiry Shelton returned from injury he has played just over 200 minutes, with Sporting Kansas City scoring nine goals while allowing only one. This has prompted a resurgence in the eternal Khiry Shelton vs Diego Rubio debate, to which this article will add a team-based statistical perspective. Looking at individual statistics, it is not hard to see that Rubio consistently beats out Shelton in goals per game, assists per game, expected goals per game and virtually any other offensive statistic you care to choose. Shelton’s case, however, never rested on his individual goal scoring prowess, but rather on the effect he has on overall team performance.
Therefore this comparison will examine performance only at the team level, with an emphasis on expected goal difference. While focusing on goals scored might seem more reasonable, since the question is which striker leads to the best offensive performance, there are two reasons for choosing goal difference over raw goals.
The first is that pushing forward to create more chances increases the team’s vulnerability to counter attacks. I don’t think it is a coincidence that SKC has gotten four goals from their fullbacks (two each from Jimmy Medranda and Graham Zusi) and that all four came in the first eight games of the season when SKC’s defense was uncharacteristically poor.
The second reason is that some have argued Shelton’s defensive contribution is what justifies his claim to a spot in the starting lineup, so it makes sense to take a look at defensive performance under each striker. Note that while Shelton and Rubio are the two in the limelight, I also threw in the stats for Daniel Salloi and Krisztian Nemeth for comparison. But first, let's get one thing out of the way:
It is an undeniable fact that Sporting KC has a better record when Croizet is the starting striker.
The above statement is technically true: SKC is a perfect 1-0-0 when Yohan Croizet starts at striker. Obviously no one believes that he is the best choice at striker, but it helps illustrate an important point: sample size matters. Soccer is a chaotic sport, which is why it is notoriously difficult to analyze statistically. While it might be tempting to think that with a full season in the books sample sizes are sufficiently large to make high confidence assessments, the fact of the matter is that no one player has gotten more than fifteen starts at center forward, and that is not a particularly large number, statistically speaking.
Beyond that obstacle is the fact that goal scoring is a low probability event and therefore one susceptible to very high variance. Lowering this variance is precisely why statistical measures like shots per game and expected goals per game were introduced. That is why I am skeptical of claims like the one Andy Edwards made on Twitter:
#SportingKC record when Shelton starts at CF:
8W-4D-2L (+14 GD)
Shelton, Russell, Salloi start together:
8W-3D-1L (+16 GD)
Andy Edwards (@AndyEdMLS), October 21, 2018
Expected goal difference tells a different story. Using 538’s shot-based xG model, we have the following averages: Shelton: +0.29 xG/game, Rubio: +0.76 xG/game, Salloi: +0.44 xG/game, Nemeth: +0.90 xG/game. Of course, there are several problems with these statistics. The first is the small sample size issue alluded to earlier. This is particularly apparent with Nemeth’s numbers, which are clearly padded by the San Jose game, which Sporting won 5-1. Another is that these averages do not take into account the fact that Shelton and Salloi have more starts on the road than at home, while the reverse is true for Rubio. There isn’t anything to be done about the sample size, but we can address the home/away skew. I’ve also included the actual average goal difference for reference, though to be clear I think the xG numbers are more reliable.
Average xG Difference, 538 shot-based model
Average Goal Difference
Note again that Nemeth’s away numbers are heavily inflated due to the 5-1 win over San Jose, and with only four games as a starter I’d honestly ignore his numbers anyway. Salloi also has a very small sample size, with only two home games as striker, against Minnesota United and the Houston Dynamo, neither of which has a particularly competent defense.
What is really interesting is that while many have suggested that Shelton’s physicality is more advantageous in a combative away game, Shelton actually has a comparative advantage in home games while Rubio has the edge in away games. However, it does make a certain amount of sense. Shelton’s strength is his ability to pull defenders away from the goal and thus create space for the wingers and midfielders. This could be more useful in home games, where opposing teams tend to sit deeper and pack the box and SKC relies on deeper lying players like Zusi and Ilie to be playmakers.
Conversely, on the road Sporting finds it harder to push into the final third, and is aided by Rubio’s superior passing in and around the box. Rubio also draws fouls at a higher rate, which helps relieve the added pressure Sporting faces on the road while creating additional scoring opportunities via set pieces. This stylistic difference is highlighted by 538’s non-shot based model, which evaluates a team’s attack based on things like touches in and around their opponent’s penalty area. I’ve also included SKC’s points per game as an additional point of comparison.
Average xG Difference, 538 non shot-based model
Points Per Game
The non-shot model isn’t nearly as high on Shelton at home as the shot based model, but I think that largely reflects stylistic differences. Note again that Rubio is ahead in every category on the road.
The final model I want to highlight is American Soccer Analysis’ team xG model. This model attempts to quantify a team’s offensive performance more accurately than a pure shot based model. One key difference is that this model gives less weight to penalty kicks, since these are viewed as mostly random occurrences. Shots from rebound scenarios are likewise given a lower weight since they only occur if the initial shot misses.
Average xG Difference, ASA Team Model
This model again puts Rubio ahead of Shelton, even at home, but it is close. Away from home, Rubio leads by a large margin. (Yet again, Nemeth’s start at the 5-1 demolition of San Jose is making him look better than he would otherwise.)
So Rubio on the road and Shelton at home, right?
Well, there is a problem. Remember the small sample size warning for comparing the performances of Rubio and Shelton overall? Well, it goes double when you cut the samples in half in order to account for home/away differences. And even the biggest differences still contain a large degree of uncertainty: using bootstrapping (an actual statistical technique which is ever so slightly more advanced than its name suggests) to test the hypothesis that Shelton has a better goal difference than Rubio at home gives an estimated p-value of 0.13.
Now this does not exactly mean that there is only a 13% chance that the hypothesis of Rubio being as good as Shelton at home is true, but it is the closest we can get to it. This same technique gives a value of 0.55 when testing Shelton at home vs Rubio at home using the ASA team expected goal model, and a value of 0.18 when testing Rubio on the road against Shelton on the road using the ASA team expected goal model.
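For readers curious how a bootstrap test of this kind works, here is a minimal sketch in Python. It is not necessarily the exact procedure behind the numbers above; it assumes one common variant, in which both groups are resampled with replacement from the pooled games to mimic a world where the two strikers are equally good. The function name bootstrap_p_value and the per-game goal differences are made up for illustration and are not SKC’s actual results.

```python
import random
from statistics import mean


def bootstrap_p_value(sample_a, sample_b, n_resamples=10_000, seed=0):
    """Estimate a one-sided p-value for mean(sample_a) > mean(sample_b).

    Assumed null hypothesis: both samples come from the same distribution,
    so each group is redrawn (with replacement) from the pooled games.
    """
    rng = random.Random(seed)
    observed = mean(sample_a) - mean(sample_b)
    pooled = list(sample_a) + list(sample_b)
    hits = 0
    for _ in range(n_resamples):
        resampled_a = rng.choices(pooled, k=len(sample_a))
        resampled_b = rng.choices(pooled, k=len(sample_b))
        # Count how often chance alone produces a gap at least this large.
        if mean(resampled_a) - mean(resampled_b) >= observed:
            hits += 1
    return hits / n_resamples


# Placeholder per-game goal differences, invented for illustration only.
# These are NOT Sporting KC's actual results.
striker_a_home = [2, 1, 3, 0, 1, 2, 4]
striker_b_home = [1, 0, 2, -1, 1, 0, 1]

p = bootstrap_p_value(striker_a_home, striker_b_home)
print(f"estimated p-value: {p:.2f}")
```

The smaller the returned value, the harder it is to explain the observed gap as luck. With only a handful of games per striker, even a sizable gap in the averages can come back with a value well above 0.05.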
Now none of these estimated p-values meet the statistician’s dream of being less than 0.05, but in the real world you don’t need 95% confidence to make an informed choice, and I think that in this situation most people wouldn’t be too hesitant to go with the one which has an 80+ percent confidence level. But where you fall in the Rubio vs Shelton debate will depend on which model you choose. If you look only at the raw goal difference, the stats suggest Shelton is better than Rubio at home and the two are equal on the road. If you look at the underlying chance creation numbers, the stats suggest Rubio is better than Shelton on the road and equal to Shelton at home.
Decision making under uncertainty is difficult.
There is no known way to distinguish between an improbable sequence of observations and a flaw within the model, so statistics speaks not the language of absolute certainty. True believers in the finality of the scoreline have reason to argue that Shelton should start every game. Hard core stat geeks have full cause to argue that Rubio should start every game. But there is also a middle way. For the statistically agnostic who wouldn’t take a wager on one model over the other, minimizing risk produces a third strategy: at home, one model suggests Shelton is better than Rubio and the other says that the two are equally effective. So start Shelton. On the road, one model says the two are essentially equal, while the other says that Rubio is better. So start Rubio.
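For those who like to see the logic spelled out, the middle way above can be sketched as a simple dominance check: pick the striker that at least one model favors and no model rates as worse. The verdicts below only encode the summary given in this article (raw goal difference vs. underlying expected goals); the function name pick_starter and the +1/0/-1 encoding are assumptions made for this sketch.

```python
# Verdicts as summarized in the article: +1 means Shelton is favored,
# -1 means Rubio is favored, 0 means the model calls the two equal.
VERDICTS = {
    "home": {"raw_goal_difference": 1, "expected_goals": 0},
    "away": {"raw_goal_difference": 0, "expected_goals": -1},
}


def pick_starter(model_verdicts):
    """Return the striker that no model rates as worse, if one exists."""
    values = list(model_verdicts.values())
    if all(v >= 0 for v in values) and any(v > 0 for v in values):
        return "Shelton"
    if all(v <= 0 for v in values) and any(v < 0 for v in values):
        return "Rubio"
    return "no clear pick"


for venue, verdicts in VERDICTS.items():
    print(venue, "->", pick_starter(verdicts))
```

If the two models ever disagreed outright at a venue, this rule would return "no clear pick", and you would be back to choosing which model to trust.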