TrueSkill ranks are broken (and how to fix them)

I’m normally average at Halo. However, today, my friend and I were playing team doubles, and for 4 matches, we totally face rolled our opposition. Not because we were good, but because our enemies were bad. And then CSR decides “derp derp u iz gud” and decides to match us with players high above our actual skill.

We were the ones that got face rolled. It was not fun at all.

Skill ranks need to work like this: When you win or lose a match, the game needs to cross-reference everyone’s performance in the past x matches, then adjust your skill rank. Say my friend and I beat a CSR 3 and 4 no contest and my friend and I are CSR 1. The game thinks:
- Player A (me) went 16 kills and 4 deaths.
- Player B (my friend) went 14 kills and 3 deaths.
- Player C went 5 kills and 12 deaths.
- Player D went 3 kills and 18 deaths.

It would then compile data and adjust skill ranks. As we won with very little competition, the game keeps my party at a CSR of 1. If we keep beating enemies with little effort, we will stay at CSR 1 until we have a game where it is closer and our skills are pushed to the limit.

Honestly, the skill ranks that Halo 3 and 4 use really don’t show my true skill.

> I’m normally average at Halo. However, today, my friend and I were playing team doubles, and for 4 matches, we totally face rolled our opposition. Not because we were good, but because our enemies were bad. And then CSR decides “derp derp u iz gud” and decides to match us with players high above our actual skill.
>
> We were the ones that got face rolled. It was not fun at all.
>
> Skill ranks need to work like this: When you win or lose a match, the game needs to cross-reference everyone’s performance in the past x matches, then adjust your skill rank. Say my friend and I beat a CSR 3 and 4 no contest and friend and I are CSR 1. The game think:
> -Player A(me) went 16 kills and 4 deaths.
> -Player B(my friend) went 14 kills and 3 deaths.
> -Player C went 5 kills and 12 deaths.
> -Player D went 3 kills and 18 deaths.
>
> It would then compile data and adjust skill ranks. As we won with very little competition, the game keeps my party at a CSR of 1. If we keep beating enemies with little effort, we will stay at CSR 1 until we have a game where it is closer and our skills are pushed to the limit.
>
> Honestly, the skill ranks that Halo 3 and 4 uses really doesn’t show my true skill.

Halo 3’s actually shows quite a bit when you give it a sizable number of games to read from.

Halo 4’s suffers from lack of population, as does Halo 3’s in its current state. However, back in the day Halo 3 was pretty good at giving you an idea of what your opponent was capable of.

Please stop blaming the population. It really does have nothing to do with it. Halo 4’s ranking system prefers to match you with anyone, and I mean anyone, as fast as possible.

Halo 3 actually matches you with people within your skill range.

> Halo 4’s ranking system prefers to match you with anyone, and I mean anyone, as fast as possible.

It’s not really the ranking system per se, but how the matchmaking system prioritizes speed over everything. This has been an issue since Reach.

> It would then compile data and adjust skill ranks. As we won with very little competition, the game keeps my party at a CSR of 1. If we keep beating enemies with little effort, we will stay at CSR 1 until we have a game where it is closer and our skills are pushed to the limit.

So, how would anyone rank up from day 1?

Beating another team means that you’re better than them and thus you should rank up in order to get a better match up. Keeping you at rank 1 doesn’t accomplish anything other than giving you more uneven matches and keeping your rank low for a much longer time.

Even if you jump in after a month or so provided enough players have ranked up, you’re still going to spend more time playing uneven matches than even matches because it’ll be rare for an even match with a low rank if you’re very good.

> Please stop blaming the population. It really does have nothing to do with it. Halo 4’s ranking system prefers to match you with anyone, and I mean anyone, as fast as possible.
>
> Halo 3 actually matches you with people within your skill range.

Actually, population has everything to do with how good your matchups are. Why? Because a smaller population means a smaller player pool to choose from, and thus far fewer players within your skill range.

Either you sacrifice waiting time for good matchups, waiting 5 minutes to find a good connection and good players.

Or you sacrifice matchup quality for faster search times.

> > It would then compile data and adjust skill ranks. As we won with very little competition, the game keeps my party at a CSR of 1. If we keep beating enemies with little effort, we will stay at CSR 1 until we have a game where it is closer and our skills are pushed to the limit.
>
> So, how would anyone rank up from day 1?
>
> Beating another team means that you’re better than them and thus you should rank up in order to get a better match up. Keeping you at rank 1 doesn’t accomplish anything else than giving you more uneven matches and keeping your rank low for a much longer time.
>
> Even if you jump in after a month or so provided enough players have ranked up, you’re still going to spend more time playing uneven matches than even matches because it’ll be rare for an even match with a low rank if you’re very good.
>
>
>
> > Please stop blaming the population. It really does have nothing to do with it. Halo 4’s ranking system prefers to match you with anyone, and I mean anyone, as fast as possible.
> >
> > Halo 3 actually matches you with people within your skill range.
>
>
> <mark>Actually, population has everything to do with how good matchups you get. Why? Because with a smaller population there’s a smaller player pool to choose from, and thus in your skill range a much much smaller amount of players.</mark>
>
> Either you sacrifice waiting time for good match ups where you wait for 5 minutes finding a good connection and good players.
>
> Or you sacrifice the match up for faster match up times.

Obviously, but even at Halo 4’s population level one could still easily have found close games within a reasonable time with a stricter ranking system.

Players in the Grandmaster league in StarCraft 2 only have a pool of 200 players to draw opponents from.

It’s just that 343 never wanted you to find very close games to begin with …

> > Halo 4’s ranking system prefers to match you with anyone, and I mean anyone, as fast as possible.
>
> It’s not really the ranking system per se, but how the matchmaking system prioritizes speed over everything. This has been an issue since Reach.

The ranking system compromises the matchmaking system. What does it matter whether you use the word matchmaking system, or ranking system?

> Obviously, but even at Halo 4’s population level one could still easily have found close games within a reasonable time with a stricter ranking system.
>
> People within the Rank GrandMaster in Starcraft 2 only has a pool of 200 players to draw people from.
>
> It’s just that 343 never wanted you to find very close games to begin with …

Perhaps or perhaps not.

However, the GrandMaster rank and its games are a little different, as the “playlist” is limited to what I’d say are the 200 best players in that division/league, as opposed to a flowing number of random players. I’d imagine the skill range among the top 200 is a lot smaller than what appears in Halo 4.

> It would then compile data and adjust skill ranks. As we won with very little competition, the game keeps my party at a CSR of 1.

That makes no sense. If you won against CSR 3 and 4 (especially “no contest”), you are obviously better than CSR 3 and 4. The only logical thing to do would be to rank you up so you can play against better players.

Just like if you happen to get to CSR 20 and then consistently lose horribly to CSR 15s, you are obviously not better than CSR 15.

> The ranking system compromises the matchmaking system. What does it matter whether you use the word matchmaking system, or ranking system?

They’re not the same. The ranking system only assigns a value to players to represent their skill relative to other players. The matchmaking system decides how close the skill ranks must be, how long it will search before broadening the scope of ranks, how much weight is put into matching based on DLC, connection, etc.

Else answer me this: how is DLC content taken into account in ranks? Does it make the skill rank higher or lower?
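To make the split concrete, here is a minimal sketch of the matchmaking side only: a search that starts strict and widens the allowed rank gap the longer you wait. Every name and number in it (`allowed_rank_gap`, the 30-second step, the starting gap of 2) is made up for illustration and is not how H4 or any real matchmaker is known to work. Note that the ranking algorithm never appears in it; the ranks are just inputs.

```python
def allowed_rank_gap(seconds_waiting, base_gap=2, widen_every=30, step=2, max_gap=50):
    """Largest rank difference the search will accept after waiting this long.

    All defaults are hypothetical tuning values, not real H4 settings.
    """
    widenings = seconds_waiting // widen_every
    return min(max_gap, base_gap + step * widenings)


def can_match(rank_a, rank_b, seconds_waiting):
    """True if two players are close enough in rank for the current search window."""
    return abs(rank_a - rank_b) <= allowed_rank_gap(seconds_waiting)


# A CSR 10 and a CSR 15 are rejected at first, then accepted after a minute.
print(can_match(10, 15, seconds_waiting=0))
print(can_match(10, 15, seconds_waiting=60))
```

A strict matchmaker keeps `step` small or `widen_every` large; a speed-first one does the opposite. Either way the skill numbers themselves are untouched.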

There is nothing wrong with TrueSkill.

In case you’re wondering how TrueSkill actually calculates rank, visit here for the general discussion and here for the details. Assuming correct input information, it is a very robust method of determining player skill.

The problem with TrueSkill in Halo 4 is that the game does not use it appropriately.

The purpose of TrueSkill is - when presented with the ranks of the teams - to predict which team will win. If the ranks differ by a lot, the team with the higher rank is supposed to win. As the difference gets smaller, the uncertainty in the outcome increases. The purpose of the matchmaking system is to get teams that are close enough in rank that TrueSkill cannot predict the victor. This should yield close matches.
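As a rough illustration of that prediction step, here is a sketch of the win probability for two sides under a Gaussian skill model of the kind TrueSkill uses. The function name is mine, the default `beta` of 25/6 is the commonly published TrueSkill default rather than anything known about H4, and a real implementation handles teams and draws with more machinery than this.

```python
import math


def win_probability(mu_a, sigma_a, mu_b, sigma_b, beta=25.0 / 6.0):
    """Chance that side A beats side B, given each side's skill mean and uncertainty.

    beta is the assumed per-game performance noise (illustrative default).
    """
    delta = mu_a - mu_b
    spread = math.sqrt(2.0 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    # Standard normal CDF evaluated at delta / spread, written with erf.
    return 0.5 * (1.0 + math.erf(delta / (spread * math.sqrt(2.0))))


# Identical ratings: the model cannot pick a winner.
print(win_probability(25.0, 8.3, 25.0, 8.3))
# A large, well-established gap: the higher-rated side is the clear favourite.
print(win_probability(35.0, 2.0, 25.0, 2.0))
```

A matchmaker aiming for close games would look for pairings where this value sits near 0.5.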

There are 2 ways H4 uses TrueSkill - individual and team. Each way has a unique set of problems in the H4 implementation, along with some common problems. First, the common problems:

COMMON

  1. H4 does not initially estimate rank properly.

TrueSkill is an iterative estimation algorithm. It is similar in spirit to what the statistics world calls “expectation maximization” (strictly speaking, TrueSkill performs approximate Bayesian inference, but the point about starting estimates carries over). It’s a special case, where both the information available and the parameter being measured (skill) vary with time.

For EM algorithms to work efficiently, the initial estimate of the parameter needs to be as close to the expected value as possible. In H4, all ranks start at 1. However, the median rank after a playlist has been around a while is definitely not 1. Maybe it’s 10 - or 15 - or 20 - I do not have that information. But 343 does. They have the TrueSkill information for all players. All that is needed is to remove players with little playing time (say, fewer than 20 games) in a particular list, and calculate the median TrueSkill of the remainder.

That median TrueSkill is not likely to vary tremendously from list to list, and if it does, then the median of the lists can be easily calculated as well. That should be the initial rank estimate for every player. About half the players will be worse than that, and about half will be better. Worse players will rank down almost immediately, while better players will likewise rank up. This assures better matches right off the bat compared to H4 starting everyone at 1.
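The calculation described above is small enough to sketch. The data layout (a list of `(rank, games_played)` pairs), the function name, and the sample numbers are all invented for the example; only the 20-game cutoff and the use of the median come from the post.

```python
from statistics import median


def initial_rank(players, min_games=20):
    """Starting rank for new players: median rank of established players.

    players is a list of (rank, games_played) pairs for one playlist.
    """
    established = [rank for rank, games in players if games >= min_games]
    if not established:
        return 1  # no usable data yet, so fall back to starting at 1
    return median(established)


# Made-up playlist data: two players have too few games and are ignored.
playlist = [(1, 3), (12, 40), (18, 55), (25, 120), (2, 5), (31, 200), (9, 30)]
print(initial_rank(playlist))  # 18
```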

The reason it is so much more efficient has both a statistical (not going to discuss) and qualitative explanation. The qualitative explanation is that, if everyone starts at 1, every time a new player enters the list, he must play against the worst existing players. So the guy who is legitimately a CSR 11 now loses badly to a CSR 1 (who really has CSR 50 skill). And he loses again - and again - and again . . . until the CSR 1 has accumulated enough data to move him past the CSR 11 guy (who now sits at CSR 5) and into better matches.

But if everyone starts at the median, the CSR 11 guy never plays that CSR 1 guy, because the CSR 1 guy starts at the median, and continues winning and ranking up until he reaches his 50. And a player who legitimately is a CSR 2 when he enters the list gets destroyed, downranks, and quickly begins being put into matches against other CSR 2s, rather than against CSR 40s and 50s who just haven’t accumulated enough data.

This, incidentally, is likely the reason lists like Proving Grounds and TTD rapidly become high-skill players only. It takes so long for everyone to rank up that the low-skill guys never stop being crushed. And since that isn’t fun, they stop populating the list.

  2. H4 uses the wrong adjustable parameter.

There is an adjustable parameter (k) listed at the end of the second link above. This parameter essentially determines how rapidly a player is allowed to rank up or down by changing the uncertainty estimate for the player’s skill. For the rotational lists, a small value is used. For the regular playlists, a large value is used. The larger the value, the more accurate the long-term value will be, but short-term estimation suffers. Based on how the rotational lists shake out rank-wise, the additional accuracy offered by the higher value of k in the regular lists is entirely unnecessary. H4 simply uses too high a value. A value somewhere in between the regular list and rotational list would be more than suitable (and, if the TrueSkill algorithm allows assigning k on a per-player basis, it could always start out small for new players and increase gradually as they accumulate matches in that list).

INDIVIDUAL

  1. H4 gives information unrelated to the ability to win to TrueSkill.

If TrueSkill is meant to be a predictor of victory (and it is), only information that is indicative of being able to win or lose should be included. Unfortunately, H4 does not do this.

Consider a 4v4 Slayer match: Every time you kill somebody, you move your team 1/60th (or 1/50th) of the way to victory. Every time you die due to the other team, you allow them the same movement to victory. If you get an assist, you helped move your team to victory by some amount less than 1/60th but more than 0 . . . so the middle of that is likely a reasonable estimate. If you snapshot someone, or get a comeback kill, or blow up a hog with only a driver, you move exactly the same distance toward victory as you do 4-shotting someone.

However, H4 gives bonus points for certain activities in the players’ scores. When it feeds the information to TrueSkill, it feeds those scores - not just K/A/D information. This is what “breaks” CSR. Activities unnecessary for victory are used to determine ranks. So if you rank people using activities that don’t help them win, is it any surprise that TrueSkill does not do a good job of predicting victory? It may do a good job at predicting individual player scores - but not victory. It actually doesn’t even do a good job with the scores, because most of the bonus points are predicated on getting the kill in the first place - so if it cannot accurately predict kills (winning), it also will not be able to accurately predict scores.

It’s fine for H4 to include those points for its own internal scoreboard, but it’s not fine to include them for ranking purposes. What gets processed through the TrueSkill algorithm for Slayer should be only K/A/D. The exact formula could be debated, but simply using score = K - D would be a huge improvement over what is done now.


  2. H4 gives no win bonus for individual lists.

It may seem counterintuitive that a win bonus would be necessary for individual lists, but it is absolutely required. The reason is not what most might think, however.

Remember that whole assist thing? Somewhere between 1/60th and zero contribution? You could (and do) have teams that rack up lots of assists, but still manage to lose because those assists come from 2-on-1 situations where the assisting player should not have been required. The win bonus is how you make sure that the assist line correctly estimates the contribution to victory. For an individual list, the bonus is easy: Take the score margin (winning team K - losing team K), divide it by the number of assists the winning team had, cap the assist (so that an assist cannot be worth more than a kill), and give each player on the winning team with an assist that bonus. The losing team gets no assist bonus.
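Here is one way the assist bonus above could be worked out, combined with the simple K - D score from earlier. This is only a sketch: the function names are mine, and I am reading "that bonus" as a per-assist value (so a player with three assists gets three times the value), which is an assumption about the wording rather than a settled rule.

```python
def assist_value(winner_kills, loser_kills, winner_assists, cap=1.0):
    """Worth of one winning-team assist: score margin spread over the assists.

    Capped so that an assist is never worth more than a kill (cap=1.0).
    """
    if winner_assists == 0:
        return 0.0
    return min(cap, (winner_kills - loser_kills) / winner_assists)


def slayer_score(kills, deaths, assists, won, per_assist):
    """Score fed to the ranking: K - D, plus the assist bonus for winners only."""
    base = kills - deaths
    return base + (assists * per_assist if won else 0.0)


# Hypothetical 60-45 match in which the winning team had 30 assists.
value = assist_value(winner_kills=60, loser_kills=45, winner_assists=30)
print(value)                                                             # 0.5
print(slayer_score(16, 4, 6, won=True, per_assist=value))               # 15.0
print(slayer_score(5, 12, 4, won=False, per_assist=value))              # -7.0
```

A narrow win with piles of assists makes each assist nearly worthless, which is exactly the 2-on-1 case described above.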

Scoreboard:

I understand why 343 would not want to make the individual scoreboard show just the values above. That would be highly discouraging for new players to consistently score negative points. But there’s no reason the scoreboard couldn’t show the score just as it is now (comeback kill bonuses and all), but add a column you can see if you page over that shows the TrueSkill scores as well. That would allow those players who want to keep track of it the ability to do so, without needlessly discouraging the poorer players.

TEAM

  1. H4 has no way of distinguishing players in team rankings.

Team rankings only are statistically valid for predicting outcomes if both of the following conditions are met:

(a) Every player that plays with a team plays with exactly the same team every match; and,
(b) Every player that plays as a lone wolf plays with a random set of players every match.

These conditions are not met in Halo, and arguably cannot be met outside of tournament play. Thus, for regular playlists, team rankings are not statistically valid.

However, there is a compromise between team and individual that will serve the same purpose, but also be more accurate. That compromise is to use individual TrueSkill rankings, but set the winning bonus high enough that it is at least equal to the maximum possible individual score. For example:

TrueSkill score = Individual contribution + win bonus, where individual contribution is capped at whatever the win bonus is.

This does two things. First, it ensures that no matter how much you slay, if you don’t win, your TrueSkill score cannot exceed even the worst player on the winning team. Hence, winning is everything. But it also gives the ability to distinguish between players on the winning and losing teams, such that their ranks can be updated more accurately. This will make playlists that are supposed to have team rankings update far more rapidly than currently happens, ensuring better matches more quickly. It also helps minimize kill farming, as kill farming now has only marginal rank benefits despite using individual CSR. Also note that the entire winning team has a score that exceeds every losing player, so the odds of ranking down despite winning are identical to the odds of doing so using team CSR (the discussion on win/loss in the TrueSkill links explains how winning by too small a margin when it should have been a blowout can result in downranking, and also why that is appropriate).

The other thing using individual allows you to do is reward time-to-victory. Let us change the formula to:

TrueSkill score = Individual contribution + win bonus + time-to-victory bonus, where all three are capped at the same value

The incentive to kill farm is almost entirely eliminated. Winning quickly achieves at least the same (and very possibly, a great deal more) rank benefit as kill farming. Why kill farm when you can rank up faster simply by winning faster? The benefits to this approach should not require explicit statement - it should be obvious to all.
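The two formulas above can be sketched together. The cap of 100 is an arbitrary illustrative value, the function name is invented, and how the time-to-victory bonus itself would be computed is left open, so here it is simply passed in.

```python
def team_list_score(contribution, won, win_bonus=100.0, time_bonus=0.0):
    """Individual contribution + win bonus + time-to-victory bonus.

    Each of the three parts is limited to the range 0..win_bonus; losers get
    only their capped contribution.
    """
    capped = max(0.0, min(contribution, win_bonus))
    if not won:
        return capped
    return capped + win_bonus + max(0.0, min(time_bonus, win_bonus))


# The best possible loser never outscores the worst possible winner.
print(team_list_score(500.0, won=False))                    # 100.0
print(team_list_score(0.0, won=True))                       # 100.0
# A quick win is worth as much as a maxed-out individual score.
print(team_list_score(20.0, won=True, time_bonus=100.0))    # 220.0
print(team_list_score(100.0, won=True, time_bonus=0.0))     # 200.0
```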

As far as individual contribution goes, if this is to have meaning, the individual score must reflect activities that assist victory. Achieving objectives is obviously a contribution. Getting comeback kills is obviously not. Slaying has some contribution . . . but how do you define it? A simple answer is that each kill results in the other team being down a player for the respawn time - which makes it easier to achieve objectives - and so the points for each kill could be awarded based on enemy time lost divided by match length. Or a K/A/D formula, where individual score = multiplier * (K + (0.5 * A) - D), capped at a maximum and no lower than zero, where the multiplier relates to the time lost by the other team.
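A quick sketch of that K/A/D formula, with the multiplier taken from respawn time over match length. The `scale` factor, the function names, and the example numbers (10-second respawn, 10-minute match) are assumptions picked for illustration.

```python
def respawn_multiplier(respawn_seconds, match_seconds, scale=100.0):
    """Points per kill, based on enemy time lost as a fraction of the match."""
    return scale * respawn_seconds / match_seconds


def individual_contribution(kills, assists, deaths, multiplier, cap):
    """multiplier * (K + 0.5 * A - D), no lower than zero and no higher than cap."""
    raw = multiplier * (kills + 0.5 * assists - deaths)
    return max(0.0, min(raw, cap))


per_kill = respawn_multiplier(respawn_seconds=10, match_seconds=600)
# 16 kills, 6 assists, 4 deaths: 15 net kills' worth of enemy time lost.
print(individual_contribution(16, 6, 4, per_kill, cap=100.0))
# A negative K/A/D floors at zero instead of going negative.
print(individual_contribution(3, 2, 18, per_kill, cap=100.0))
```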

TL; DR:

In short, there is nothing wrong with TrueSkill. It is a powerful statistical tool for ranking players . . . but it only works if:

  1. The correct initial rank estimate is used (playlist median)
  2. The correct uncertainty parameter k is used (H4’s value is too high)
  3. The correct score information is fed to it (that portion of score that actually contributes to victory)

Postscript:

As a follow-on to the above, if you are interested in a means to increase the effective matching population to get better matches even as the overall population decreases, I would recommend this thread, which deals with the matchmaking implementation of whatever ranking system you choose to use.

Discussion about TrueSkill in the Halo 5 forum is moot. Halo 5 will be using Smart Match. New algorithm, new matchmaking.

> Discussion about TrueSkill in the Halo 5 forum is moot. Halo 5 will be using Smart Match. New algorithm, new matchmaking.

Smart Match uses an unspecified skill algorithm + reputation + language, and can do it in the background. Five bucks says the skill calculation for Smart Match is TrueSkill.

And even if it’s not, all of the above still applies. Whatever algorithm you use, it is only as good as the information fed into it.

> > Discussion about TrueSkill in the Halo 5 forum is moot. Halo 5 will be using Smart Match. New algorithm, new matchmaking.
>
> Smart Match uses an unspecified skill algorithm + <mark>reputation</mark> + language, and can do it in the background. Five bucks says the skill calculation for Smart Match is TrueSkill.
>
> And even if it’s not, all of the above still applies. Whatever algorithm you use, it is only as good as the information fed into it.

I wonder what made them use a system they originally abandoned.

How well will it actually work when all the little Johnnys and Jennys out there who are sore losers, and that’s a lot, will start giving bad rep to the winning team for losing?

I mean, my Unsporting rep was at one point 60% and I never intentionally did anything that would hinder my team, like team killing, blocking a path and so on. All that came from Halo 3. At the same time, while using no mic, I got bad rep in the communication area.

> How well will it actually work when all the little Johnnys and Jennys out there who are sore losers, and that’s a lot, will start giving bad rep to the winning team for losing?

I’m going to guess that it was exactly that issue that prevented them from incorporating rep into the 360 matching. How they intend to take care of that . . . I don’t know.

Here is a link from Major Nelson about Smart Match. It was posted about a year ago but I don’t think anything has changed.

Here’s one of the highlights concerning matchmaking:

> We all have better things to do than wait for people to show up to play a game. It would be great if I could start up the flight sim game, see if anyone is online to play, put in my play request and then switch to something else while I wait for people to show up. That is what Smart Match on Xbox One allows people to do. It makes it easy for a title to create a match request and then “untether” me so I don’t need wait in the title while the match search is processing. I can switch to reading a quick social blog or watch a viral video and when the match is ready Xbox One tells me to pull me back into the title to play.

And here’s a bit about reputation:

> All of the feedback from player’s online flow into the reputation service to evaluate a players online social reputation. The more hours you play online without causing others to have a horrible time the better your reputation will be, similar to the more hours your drive without an accident the better your driving record and insurance rates will be. Most players will have good reputations and be seen as “green” good players you’d enjoy playing with. Even those good players might receive a few player feedback reports each month and that is OK. Xbox Live is looking to identify players that are repeatedly disruptive on Xbox Live. We’ll identify those players with a lower reputation score and in the worse cases they will earn the “avoid me” reputation. Looking at someone’s gamer card you’ll be able to quickly see their reputation.

I suggest reading the entire article before passing judgement.

This is what Microsoft is promoting.

From what I can tell, Microsoft has chucked TrueSkill entirely and has written (or is at least using) a completely new algorithm. I have no evidence to support that, but that’s the vibe I’m getting.

> > The more hours you play online without causing others to have a horrible time the better your reputation will be, similar to the more hours your drive without an accident the better your driving record and insurance rates will be.

That’s exactly how the Xbox 360’s rep system works though: over time, your rep increases to five stars and the only thing that can bring it down is poor player reviews. That’s why you can have a 100% avoid rate and still have a five-star rep.

> > > The more hours you play online without causing others to have a horrible time the better your reputation will be, similar to the more hours your drive without an accident the better your driving record and insurance rates will be.
>
> That’s exactly how the Xbox 360’s rep system works though: over time, your rep increases to five stars and the only thing that can bring it down is poor player reviews. That’s why you can have a 100% avoid rate and still have a five-star rep.

Again, I suggest reading the entire article as it has an entire section devoted to reputation.

Apparently Smart Match is not just relying on player input.

> Here is a link from Major Nelson about Smart Match. It was posted about a year ago but I don’t think anything has changed.
>
> Here’s one of the highlights concerning matchmaking:
>
>
> > We all have better things to do than wait for people to show up to play a game. It would be great if I could start up the flight sim game, see if anyone is online to play, put in my play request and then switch to something else while I wait for people to show up. That is what Smart Match on Xbox One allows people to do. It makes it easy for a title to create a match request and then “untether” me so I don’t need wait in the title while the match search is processing. I can switch to reading a quick social blog or watch a viral video and when the match is ready Xbox One tells me to pull me back into the title to play.
>
> And here’s a bit about reputation:
>
>
> > All of the feedback from player’s online flow into the reputation service to evaluate a players online social reputation. The more hours you play online without causing others to have a horrible time the better your reputation will be, similar to the more hours your drive without an accident the better your driving record and insurance rates will be. Most players will have good reputations and be seen as “green” good players you’d enjoy playing with. Even those good players might receive a few player feedback reports each month and that is OK. Xbox Live is looking to identify players that are repeatedly disruptive on Xbox Live. We’ll identify those players with a lower reputation score and in the worse cases they will earn the “avoid me” reputation. Looking at someone’s gamer card you’ll be able to quickly see their reputation.
>
> I suggest reading the entire article before passing judgement.
>
> This is what Microsoft is promoting.
>
> From what I can tell, Microsoft has chucked TrueSkill entirely and has written (or is at least using) a completely new algorithm. I have no evidence to support that, but that’s the vibe I’m getting.

Isn’t the first quote related to a function of the Xbox One itself? Not Smart Match on its own.

Much like alt-tabbing out of a game on a PC while I wait for the game to find a match for me.

> Again, I suggest reading the entire article as it has an entire section devoted to reputation.
>
> Apparently Smart Match is not just relying on player input.

No, smart match isn’t just relying on player input, but the reputation relies on player input.

Also, the language aspect is yet another thing that can potentially be screwed up a lot. 343i screwed up that part in Halo 4; then again, 343i isn’t Microsoft and isn’t designing XBL. However, the fun thing about it is that I live in Finland and am a Fennoswede, part of a Finnish minority who have Swedish as their main language.

Luckily I could set my location to Finland and my language to Swedish. However, I never saw any descriptions in Swedish for anything on XBL, only in Finnish. Even better, Halo 4 had all text in Finnish. Not even changing my language to English helped, so I now live in the UK on my Xbox, but the time zone is still Helsinki. I still have DLC, Arcade and game descriptions in Finnish.

Hopefully it’ll be a very loose setting, because frankly, Halo isn’t big in Finland and I’d guess the player pool will be very small for me to play with if it’s a strict setting.

> I suggest reading the entire article before passing judgement.
>
> This is what Microsoft is promoting.
>
> From what I can tell, Microsoft has chucked TrueSkill entirely and has written (or is at least using) a completely new algorithm. I have no evidence to support that, but that’s the vibe I’m getting.

Thanks, but I did read the entire article . . . along with others . . . and had already done so before you posted your links.

I doubt Microsoft has chucked TrueSkill for the skill measurement. As I stated before, TrueSkill is actually a robust calculation. Nothing in that article - or other articles - gives me any reason to believe that TrueSkill has been abandoned. Instead, they all give me the impression that the SmartMatch advancement is to systematically incorporate parameters other than skill alone when deciding what makes a “good” match. It is that higher-level algorithm that is being called the “advanced matching algorithm”. In support of that, TrueSkill was never entitled a “matching algorithm” by Microsoft. It was always referred to as a skill calculation that games could subsequently utilize for the purpose of making matches based on skill. Just like H4’s matching algorithm uses TrueSkill as a data input, I suspect that Xbone’s matching algorithm uses TrueSkill as a data input.

Furthermore, even if they did chuck TrueSkill and invent an entirely new algorithm, it must work on the same statistical principles. There are only so many ways to skin the cat mathematically. What they will have must be an iterative estimation algorithm. It must be able to predict victory given a team pairing and an allowed uncertainty. It must allow for time-varying skill.

Any algorithm subject to the above constraints has the same requirements for the input data as what I described above. Call it TrueSkill or MegaSkill or SuperAdvancedWamodyneXboneMegaSmartSkill, the data requirements remain exactly identical. So all comments about what needs to get fed to the system apply equally to all systems.