After realizing I’ve been making a false assumption (details here), I’ve put more thought into the problems with H4 matching and ranking. I’ve also spent a lot of time reading and re-reading about TrueSkill and the H2 ranking / matching system. I believe there is a better way than either of those alone.
First, it’s important to list what we want a ranking system to do:
…1. Reward behavior that contributes to a team win.
…2. Punish behavior that contributes to a team loss.
…3. Accurately reflect some combination of individual skill and ability to play as part of a team.
…4. Update based on recent performance.
…5. Be easy for players to understand.
Second, it’s important to list what we want a matching system to do:
…1. Accurately estimate a player’s skill level.
…2. Place players of nearly equal skill into matches with each other.
…3. Match parties in such a way that each team has an equitable mix of parties and lone wolves.
These are not the same goals. They are similar - but not the same. So if you attempt to use the exact same number to accomplish both ranking and matching, you will always run into problems.
Let’s say you want to punish a habitual quitter via rank reduction. In all previous systems, doing so will result in that person getting easier matches. It doesn’t punish that player - it punishes all those who are forced to play against someone whose actual skill level is not properly reflected by rank.
Or let’s say that you have a player to whom K/D and win percentage are more important than the rank number. That player can easily partner up with some low-level players and kill farm in CTF while allowing his team to lose. He takes only one loss (and gets a boost to his K/D), but because the loss is to a much lower-ranked team, he can drop quite a few ranks. That gives him easier matches while he ranks up again, and the overall effect is to artificially inflate his win percentage.
There are other difficulties as well that do not involve intentional manipulation of rank. In the TrueSkill* system, ranks grow very stable over time. While this is good for matching purposes (since a player’s skill changes only gradually), it’s not ideal for ranks. We want ranks to change based on recent games to provide a constant incentive for winning every game. The H2 system definitely provides that feature - but sacrifices matching stability (which allows players relatively simple avenues to manipulate ranks and stats).
In short, there is no single system that can provide both optimal matching and optimal ranking.
H2 for Ranking
The H2 ranking system is quite beautiful, in my opinion. It is simple and easy to understand, so players will always know why they gained or lost rank. It provides a clear incentive for winning (as you cannot rank up unless your team wins). It constantly updates based on recent performance. Because it is a points-based system, it makes it easy to punish undesirable behavior or even to let rank decay with time. These things make it a very elegant system for ranking. They also make it unstable and a poor estimate of true individual skill, which is an undesirable trait for a matching system.
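To make the "points-based" idea concrete, here is a rough sketch of what such a rank ledger could look like. This is not the actual H2 formula (I don't have its real numbers); every constant and name below is made up purely for illustration. The point is only that wins, losses, quit penalties, and decay are all simple additions to or subtractions from one visible number.

```python
from dataclasses import dataclass

# All constants are hypothetical illustration values, NOT Halo 2's real numbers.
WIN_POINTS = 100          # gained only when the player's team wins
LOSS_POINTS = -100        # lost when the player's team loses
QUIT_PENALTY = -150       # extra deduction for quitting a game
DECAY_PER_IDLE_WEEK = 10  # points removed per week of inactivity
POINTS_PER_RANK = 500     # points needed to climb one rank


@dataclass
class RankLedger:
    """Visible rank for one player, tracked as a simple point total."""
    points: int = 0

    def record_game(self, won: bool, quit_early: bool = False) -> None:
        # The only way to gain points is a team win.
        delta = WIN_POINTS if won else LOSS_POINTS
        if quit_early:
            delta += QUIT_PENALTY
        self.points = max(0, self.points + delta)

    def decay(self, idle_weeks: int) -> None:
        # Rank slowly drains while the player is inactive.
        self.points = max(0, self.points - DECAY_PER_IDLE_WEEK * idle_weeks)

    @property
    def rank(self) -> int:
        return 1 + self.points // POINTS_PER_RANK


ledger = RankLedger()
for _ in range(5):
    ledger.record_game(won=True)               # five wins: 500 points, rank 2
ledger.record_game(won=False, quit_early=True)  # loss plus quit penalty
ledger.decay(idle_weeks=3)
```

Because every change is a plain point adjustment, a player can always trace exactly why the rank moved, which is the property that matters for a ranking system.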
TrueSkill for Matching
The TrueSkill ranking system is mathematically robust - given the right inputs. It has the ability (if used properly) to distinguish individual skill levels even among members of the same team. Because it converges to a better and better (and, hence, stable) estimate of a player’s skill over time, it provides an excellent means of matching players based on their actual estimated skill. The characteristics that make it good for matching, however, render it less desirable than H2 for ranking. The problems with using TrueSkill in H3/Reach/H4 were not issues with TrueSkill itself, but rather with the information that those games fed TrueSkill - a decision driven by the desire to also use TrueSkill as a ranking system.
If TrueSkill is used for matching only, it can be optimized for finding individual skill. Statistically, using individual performance (rather than team performance) provides not just more information, but also more accurate information about the true skill level of a player. Individual performance has obvious issues when used for rankings - but not for matching - so long as some measure of ability to win is included. To accomplish that, each player on the winning team should be given a win bonus prior to ordering the players by points. Points should only be awarded for activities that contribute to a win - like kills in Slayer and flag caps in CTF. Snapshot/comeback kill/assassination bonuses should never be applied. Going to individual performance and eliminating the extraneous scoring will greatly improve the matching algorithm over both H3 and H4.
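A minimal sketch of the "win bonus, then order by points" step might look like the following. The function name, the size of the bonus, and the tie-break by player name are all my own assumptions for illustration. The resulting order is what would be handed to the TrueSkill update as the finishing order; that update itself is not shown here.

```python
from typing import Dict, List, Set

WIN_BONUS = 10  # hypothetical value; the right size would need tuning per gametype


def placement_order(points: Dict[str, int],
                    winners: Set[str],
                    win_bonus: int = WIN_BONUS) -> List[str]:
    """Order all players in a game by win-contributing points plus a win bonus.

    `points` should count only actions that contribute to a win
    (kills in Slayer, flag caps in CTF), with no style bonuses.
    """
    adjusted = {
        player: score + (win_bonus if player in winners else 0)
        for player, score in points.items()
    }
    # Highest adjusted score first; ties broken by name only to keep the
    # output deterministic (an assumption, not part of the proposal).
    return sorted(adjusted, key=lambda player: (-adjusted[player], player))


# A and B are on the winning team; C outscored everyone but lost.
game_points = {"A": 12, "B": 8, "C": 15, "D": 3}
order = placement_order(game_points, winners={"A", "B"})
# Adjusted scores: A=22, B=18, C=15, D=3
```

In this example the top scorer on the losing team (C) still places ahead of a weak teammate (D), but behind both winners, so individual performance counts without ever making a loss look better than a win at similar scores.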
Benefits
Combining the two systems as described above provides the following benefits:
…1. A rank system that rewards winning, is easy to understand, allows punishing undesirable behavior, and updates based on recent events.
…2. A matching system that tracks individual skill such that:
…a. Players that sometimes play in parties and sometimes play solo can be accurately matched in both cases.
…b. Deranking no longer has any statistical benefit (as deranking =/= getting matched against worse opponents, since TrueSkill is more stable than the ranks).
…c. Rank punishments no longer benefit the punished player by providing easier matches (since rank and skill estimation are decoupled).
…d. Kill farming and other means of not playing the objective result in continually harder matches with no corresponding benefit to rank.
…e. Matching efficiency increases because every player starts at the median (rather than biased low, as in H2).
…f. Avoids the need for complex scoring formulas (like score = K + A - 0.5 * D) because kamikaze play no longer has any rank benefit.
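The decoupling behind points b through d can be sketched in a few lines. Everything here is hypothetical (the field names, the penalty size, the matching rule of "closest hidden skill"), but it shows the key property: a rank punishment touches only the visible number, so the set of opponents a player draws does not change.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Player:
    name: str
    rank_points: int  # visible rank (H2-style points), shown to players
    skill: float      # hidden skill estimate (TrueSkill-style), used only for matching


def penalize_quit(player: Player, penalty: int = 150) -> None:
    """Punish via the visible rank only; the hidden skill estimate is untouched."""
    player.rank_points = max(0, player.rank_points - penalty)


def closest_opponents(player: Player, pool: List[Player], count: int) -> List[Player]:
    """Pick the `count` players whose hidden skill is nearest to `player`'s."""
    others = [p for p in pool if p is not player]
    return sorted(others, key=lambda p: abs(p.skill - player.skill))[:count]


quitter = Player("quitter", rank_points=900, skill=32.0)
pool = [
    quitter,
    Player("low_rank", rank_points=200, skill=12.0),
    Player("peer_1", rank_points=850, skill=31.0),
    Player("peer_2", rank_points=700, skill=35.0),
]

before = [p.name for p in closest_opponents(quitter, pool, count=2)]
penalize_quit(quitter)
after = [p.name for p in closest_opponents(quitter, pool, count=2)]
# The rank dropped, but the opponents are the same peers as before.
```

The same logic applies to deliberate deranking: throwing games lowers the visible rank quickly, while the hidden estimate (which is what matching reads) moves much more slowly.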
Regardless of whether you agree or disagree, hopefully this makes sense.
TL;DR: Use the H2 system for ranks. Use individual TrueSkill (adding win bonuses and removing extraneous scoring) for matching.
*Note: H3 used TrueSkill. So for those of you who pine for H3, you probably ought to know that the H3 system is exactly the same as the H4 system for team-ranked lists (objectives, proving grounds). There is zero difference between the two.
