There is nothing wrong with TrueSkill.
In case you’re wondering how TrueSkill actually calculates rank, visit here for the general discussion and here for the details. Assuming correct input information, it is a very robust method of determining player skill.
The problem with TrueSkill in Halo 4 is that the game does not use it appropriately.
The purpose of TrueSkill is - when presented with the ranks of the teams - to predict which team will win. If the ranks differ by a lot, the team with the higher rank is supposed to win. As the difference gets smaller, the uncertainty in the outcome increases. The purpose of the matchmaking system is to find teams close enough in rank that TrueSkill cannot predict the victor. This should yield close matches.
There are two ways H4 uses TrueSkill - individual and team. Each has a unique set of problems in the H4 implementation, along with some common problems. First, the common problems:
COMMON
- H4 does not initially estimate rank properly.
TrueSkill is an iterative estimation algorithm - what the statistics world calls "expectation maximization" (EM). It belongs to a special class of EM algorithms in which both the available information and the parameter being estimated (skill) vary with time.
For EM algorithms to work efficiently, the initial estimate of the parameter needs to be as close to the expected value as possible. In H4, all ranks start at 1. However, the median rank after a playlist has been around a while is definitely not 1. Maybe it’s 10 - or 15 - or 20 - I do not have that information. But 343 does. They have the TrueSkill information for all players. All that is needed is to remove players with little playing time (say 20 games) in a particular list, and calculate the median TrueSkill of the remainder.
That median TrueSkill is not likely to vary tremendously from list to list, and if it does, then the median of the lists can be easily calculated as well. That should be the initial rank estimate for every player. About half the players will be worse than that, and about half will be better. Worse players will rank down almost immediately, while better players will likewise rank up. This assures better matches right off the bat compared to H4 starting everyone at 1.
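The median-based initialization described above is simple enough to sketch. This is a toy illustration, not 343's data model: player records, the 20-game cutoff, and the skill numbers are all assumptions made up for the example.

```python
# Sketch: pick a better initial rank estimate than "everyone starts at 1".
# Assumes each player record is a (games_played, trueskill_mu) pair for one
# playlist; the actual data shapes at 343 are unknown.
from statistics import median

MIN_GAMES = 20  # drop players with too little playing time in the list

def initial_estimate(players):
    """Median skill of experienced players; every new player starts here."""
    experienced = [mu for games, mu in players if games >= MIN_GAMES]
    return median(experienced)

# Toy roster: one low-playtime player is excluded, the rest set the median.
players = [(5, 3.0), (40, 12.0), (55, 18.0), (120, 25.0), (30, 15.0)]
print(initial_estimate(players))  # → 16.5 (median of 12, 15, 18, 25)
```

About half of new players will be above this starting point and half below, which is exactly the property the argument above relies on.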
The reason it is so much more efficient has both a statistical (not going to discuss) and qualitative explanation. The qualitative explanation is that, if everyone starts at 1, every time a new player enters the list, he must play against the worst existing players. So the guy who is legitimately a CSR 11 now loses badly to a CSR 1 (who really has CSR 50 skill). And he loses again - and again - and again . . . until the CSR 1 has accumulated enough data to move him past the CSR 11 guy (who now sits at CSR 5) and into better matches.
But if everyone starts at the median, the CSR 11 guy never plays that CSR 1 guy, because the CSR 1 guy starts at the median, and continues winning and ranking up until he reaches his 50. And a player who legitimately is a CSR 2 when he enters the list gets destroyed, downranks, and quickly begins being put into matches against other CSR 2s, rather than against CSR 40s and 50s who just haven’t accumulated enough data.
This, incidentally, is likely the reason lists like Proving Grounds and TTD rapidly become high-skill players only. It takes so long for everyone to rank up that the low-skill guys never stop being crushed. And since that isn’t fun, they stop populating the list.
- H4 uses the wrong adjustable parameter.
There is an adjustable parameter (k) listed at the end of the second link above. This parameter essentially determines how rapidly a player is allowed to rank up or down, by changing the uncertainty estimate for the player's skill. For the rotational lists, a small value is used; for the regular playlists, a large value. The larger k is, the more accurate the long-term estimate will be, but short-term estimation suffers. Based on how the rotational lists shake out rank-wise, the additional accuracy offered by the higher value of k in the regular lists is entirely unnecessary. H4 simply uses too high a value. A value somewhere in between the regular-list and rotational-list settings would be more than suitable (and, if the TrueSkill algorithm allows assigning k on a per-player basis, it could always start out small for new players and increase gradually as they accumulate matches in that list).
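The per-player idea floated above - k starting small so new players move quickly, then ramping toward a steady value - could look something like this. The constants and the linear ramp are illustrative assumptions, not H4's actual values or necessarily what the TrueSkill implementation supports.

```python
# Sketch of a per-player k schedule: small (fast rank movement) for new
# players, ramping linearly to a steady value as matches accumulate.
# All three constants are made up for illustration.
K_NEW, K_STEADY, RAMP_GAMES = 0.5, 4.0, 50

def k_for(games_played):
    """k grows from K_NEW at 0 games to K_STEADY at RAMP_GAMES games."""
    frac = min(games_played / RAMP_GAMES, 1.0)
    return K_NEW + frac * (K_STEADY - K_NEW)

print(k_for(0))    # → 0.5  (brand-new player ranks quickly)
print(k_for(25))   # → 2.25 (halfway through the ramp)
print(k_for(100))  # → 4.0  (veteran: stable long-term estimate)
```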
INDIVIDUAL
- H4 gives information unrelated to the ability to win to TrueSkill.
If TrueSkill is meant to be a predictor of victory (and it is), only information that is indicative of being able to win or lose should be included. Unfortunately, H4 does not do this.
Consider a 4v4 Slayer match: Every time you kill somebody, you move your team 1/60th (or 1/50th) of the way to victory. Every time you die due to the other team, you allow them the same movement to victory. If you get an assist, you helped move your team to victory by some amount less than 1/60th but more than 0 . . . so the middle of that is likely a reasonable estimate. If you snapshot someone, or get a comeback kill, or blow up a hog with only a driver, you move exactly the same distance toward victory as you do 4-shotting someone.
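The arithmetic in that paragraph can be written down directly. Assuming a Slayer match to 60 kills, and taking the suggested midpoint (half a kill) as the assist estimate:

```python
# Win-contribution arithmetic from the paragraph above: each kill moves your
# team 1/60 of the way to victory, each death moves the other team the same,
# and an assist is worth something in (0, 1/60) - the midpoint, 1/120, is
# the estimate suggested above.
SCORE_TO_WIN = 60
KILL_VALUE = 1 / SCORE_TO_WIN
ASSIST_VALUE = KILL_VALUE / 2  # midpoint between 0 and a full kill

def win_contribution(kills, assists, deaths):
    """Net fraction of a victory this player's K/A/D moved the team."""
    return (kills - deaths) * KILL_VALUE + assists * ASSIST_VALUE

# Example: 15 kills, 4 assists, 10 deaths → 5/60 + 2/60 = 7/60 of a win.
print(win_contribution(15, 4, 10))
```

Note that a snapshot, comeback kill, or hog destruction all contribute exactly one KILL_VALUE here - the model has no reason to treat them differently.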
However, H4 gives bonus points for certain activities in the players’ scores. When it feeds the information to TrueSkill, it feeds those scores - not just K/A/D information. This is what “breaks” CSR. Activities unnecessary for victory are used to determine ranks. So if you rank people using activities that don’t help them win, is it any surprise that TrueSkill does not do a good job of predicting victory? It may do a good job at predicting individual player scores - but not victory. It actually doesn’t even do a good job with the scores, because most of the bonus points are predicated on getting the kill in the first place - so if it cannot accurately predict kills (winning), it also will not be able to accurately predict scores.
It’s fine for H4 to include those points for its own internal scoreboard, but it’s not fine to include them for ranking purposes. What gets processed through the TrueSkill algorithm for Slayer should be only K/A/D. The exact formula could be debated, but simply using score = K - D would be a huge improvement over what is done now.
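The contrast between the two inputs can be made concrete. The point values below are invented for illustration (the real H4 scoring table isn't given here); the argument only needs two players with identical K/D but different medal play.

```python
# Contrast: a bonus-laden scoreboard value vs. the suggested ranking input.
# Point values are illustrative, not the actual H4 scoring table.

def scoreboard_score(kills, assists, deaths, bonus_points):
    # H4-style: medal/bonus points unrelated to winning leak into the total.
    return 10 * kills + 5 * assists + bonus_points

def ranking_input(kills, assists, deaths):
    # Suggested: feed TrueSkill only win-related information, score = K - D.
    return kills - deaths

# Two players with identical contribution to victory (15 kills, 10 deaths),
# one of whom chased medals for 120 bonus points.
print(scoreboard_score(15, 0, 10, 120))  # → 270
print(scoreboard_score(15, 0, 10, 0))    # → 150
print(ranking_input(15, 0, 10))          # → 5 for both players
```

Fed the first pair of numbers, TrueSkill would rate the medal-chaser far higher despite identical contribution to victory; fed the second, it rates them identically, which is the behavior a victory predictor should have.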
CONTINUED BELOW →