TrueSkill2 and Objective Play

Objective Play, and why it (aka. non-kill stats) shouldn’t be (and isn’t) considered in your Rank

Notes:

  • This is for Arena only. I don’t really want to get into all the variables for Warzone; that makes things really messy.
  • When I refer to stat padding, I really hope I don’t have to explain the specific scenario that lets each stat be padded. But I will explain, if prompted.
  • Later on, I will discuss Non-Objective Play kills (Slayer) and Objective Play kills (non-Slayer), with the understanding that the Objective of Slayer is to obtain Kills.

Let’s Begin

Let’s make a list of all the stats that we have available for us to consider.

Capture the Flag

  • Flags Captured, Flag Returns, Flag Grabs, Flag Carrier Kills
    Captures: A team can capture at most 3 flags in a game. That’s a terrible statistical sample size.
    Capture Assists: Similar to captures, but allows for stat padding.
    Returns: Horrible way to judge skill. (And why/how did you even get to the point where someone from the other team was able to touch your flag?) Also, allows for stat padding.
    Grabs: Similar to Returns, and also allows for stat padding. However, I can see arguments for when a Grab is actually a “skillful action”, compared to just a “pad-stat action”, but this really depends on many, many specific match-by-match scenarios, which would take way too much computing power and/or time to calculate an appropriate weight for.
    Flag Carrier Kills: Subset of Kills, and ultimately not a unique indicator of skill when compared with the Kills stat.

Assault

  • Goals, Goal Offenses, Goal Defenses, Ball Clears, Ball Held Duration
    Goals and Goal Assists (I don’t even know if that’s an actual thing or not): Similar to CTF above.
    Goal Offenses/Defenses: Subset of Kills, and prone to stat padding.
    Ball Clears: Similar to Flag Grabs above, and prone to stat padding.
    Ball Held Duration: Unhelpful Camping (aka stat padding).

Oddball

  • Score (aka, Ball Held Time), Oddball Kills, Carrier Kills, Carrier Protections
    Oddball Kills: Subset of Kills, but arguably a little more skillful than a regular kill, given the severe restrictions currently placed on Carriers.
    Carrier Kills: Similar to Flag Carrier Kills.
    Carrier Protections: See next point.
    Ball Held Time: An argument similar to the one for Flag Grabs could be made here, but it also runs into the same computational requirements as Flag Grabs.

Also, Ball Held Time is a bit of a loose cannon. Put together a team of Slayers/Killers, give the ball to the weakest link, and have the slayers defend the ball carrier. That doesn’t reflect the ball holder’s personal skill at winning an Oddball match (unless they only ever play with those same 3 players).

Strongholds

  • Strongholds Captured (including Assists), Secures, Defense, Total Control
    This is an interesting one, and I had to think quite a bit about it, but if I’m completely honest, none of these stats really identify personal skill; rather, they reflect the aggregate effect of Team Skill.

All Modes

  • Kills (including Power Weapon Kills), Deaths (including Suicides), Assists (note: KDA is just a calculation of these 3 variables, so I’m not considering it a true “variable”)
  • Headshots (including Perfect Kills), Accuracy, Damage
  • Protections
  • Power Weapon Grabs
  • Damage Dealt
    Damage Dealt: Not a good indicator of personal skill. For one, if you get a Perfect kill on everyone, 100% of the time, your damage dealt won’t be very high. The flip side (which I have found I tend to do) is taking potshots at the enemy, regardless of whether I, or anyone else, can clean up the kill. That leads to high damage dealt, but it isn’t necessarily converted into an Assist or a Kill.
    Accuracy: This probably overlaps with Damage Dealt, if only because of the potshot argument I made. Also, having a high accuracy doesn’t necessarily mean you’re more skilled. It could just mean you only shoot people with an AR at point-blank range, where your shots won’t miss. Of course, at that point, you’re probably dead or near death, so there is that downside.
    Headshots: Probably a combined subset of Damage Dealt, Accuracy, Kills, and Assists.
    Power Weapon Grabs: There’s a video from Halo Reach, where someone gets a Sunburst medal because teammates bring a bunch of Spartan Lasers from spawn. I think that’s the ultimate argument of why Grabs shouldn’t be considered.
    Power Weapon Kills: Subset of Kills. Similar arguments to that of Flag Grabs could be made for (most) power weapons. However, using Rockets doesn’t actually require a whole lot of skill to convert into a kill. A Sniper, on the other hand, requires a fair bit more skill.
    Protections: I’m not even sure how to approach this one, other than it might be a subset of Assists? Could also be prone to stat padding.

Now on to the meat:

Kills
I think we can all agree that Kills are important, regardless of game mode (Slayer, CTF, etc.).
The whole point (Objective, if you will) of Slayer is to get kills. But what about the other game modes?

  • Getting slays at the right points in CTF matches allows for a great flag pull/capture;
  • Getting slays at the right points in Strongholds matches allows you to take and hold total control, or prevent a base from being overrun;
  • Defending the ball carrier by slaying his attackers;
  • Etc.
    So this is why Kills are important. But once again, the Objective of Slayer is to get Kills; the Objective of CTF is to capture Flags; the Objective of Strongholds is to get Points by holding bases. Etc…
    While these Objective Kills are important and contribute to the overall success of the Primary Objective, they are not weighted as heavily as Non-Objective Kills are in Slayer.

Deaths
Statistically the opposite of Kills, but just as important, for the same reasons.

Assists
Oh, where do we begin… How much should an Assist contribute to personal skill? In terms of KDA, an Assist is worth one-third as much as a Kill. But is that actually true?
I can come up with a number of arguments both for and against counting Assists, but the best argument I can come up with for why they should not count is this:

Microsoft Research, with the help of millions of matches of data from Halo 5, determined that Assists were not an accurate indicator of skill.
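The one-third weighting mentioned under Assists can be written out as a tiny function. This is a sketch assuming KDA = Kills + Assists/3 − Deaths, which matches the “one-third as much as a Kill” description above; the function name `kda` is just for illustration. It shows that a kill-heavy stat line and an assist-heavy one can land on exactly the same number, which is the ambiguity the Assists question is really about.

```python
from fractions import Fraction


def kda(kills: int, deaths: int, assists: int) -> Fraction:
    """KDA, assuming an assist counts as one-third of a kill (assumed formula)."""
    # Fraction keeps the one-third weighting exact instead of rounding it.
    return kills + Fraction(assists, 3) - deaths


# Two hypothetical stat lines that the formula cannot tell apart.
slayer = kda(kills=15, deaths=10, assists=0)    # 15 + 0 - 10
support = kda(kills=10, deaths=10, assists=15)  # 10 + 5 - 10
print(slayer, support)  # 5 5
```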

Thanks for reading!

Not exactly seeing what your point is. Kills are obviously important in any game match, Slayer and otherwise. I don’t disagree that only K/D should determine your overall skill. However, when playing an objective-based game, if you only focus on kills, your team will lose and you will lose CSR. For example, say your game is CTF, but no one on your team grabs the flag. If someone doesn’t attempt the grab, you will never get the capture. The best you can do is play your way to a tie, and a tie is not a Win.

So is skill required to grab the flag? Consider that most teams will have at least one member defending the flag, and you must get past that person. Usually, you have to kill them. If the other team has mics, as soon as you get near the flag, all the other members will be alerted that they may be needed. Even if they don’t have mics, the announcer will let them know as soon as the flag is grabbed. Then, if even one of them lands a single shot on the carrier, the carrier is marked. (This is why many players drop the flag and pick it up again, but not everyone knows that can even be done, especially in lower ranks.) Then you have to cross the map slowly, with no Sprint, back to your base. And once at your base, you have to have your own flag at home before you can cap. Most likely, once you grab the enemy team’s flag, they will grab yours to prevent the cap. Now your team has to go find your flag while you hold theirs and don’t die. This requires either hiding or killing, and oftentimes both.

Having a team that is good at killing helps, a lot. But if they don’t pay attention and are not focused on killing the right people (the enemy carrier and anyone getting too close to your carrier), you will lose, even if your team gets more kills. (I have played one insane CTF game in which one player on my team got something like 50 kills, but the other team was able to keep one player alive long enough to reset their flag, keeping us from capturing.) And yes, I believe he was on a Smurf account, but that is a different post.
Then there is the situation where you grab the flag and get killed (it happens), and another team member needs to grab that flag. This requires that your team knows where you were running the flag and gets to it before the return, and it could also require killing one or more enemy Spartans.

As for Oddball, heck, as soon as you pick up the ball, the announcer lets the enemy know your team has the ball, and you are marked the entire time you hold it. All the while, everyone knows the holder has reduced killing abilities. The ball holder’s team must defend the ball holder, and the enemy has to fight through to get that ball carrier kill. Consider also that getting the ball likely costs you a couple of teammates, who will probably respawn on the other side of the map, and the other team will likely respawn before they do, so there is a delay in getting backup. I don’t really consider it ‘hiding’ when you have a literal target on your head. Are there strategies to deal with these things? Sure. There are also strategies to counter your strategies. And once again, just killing and not picking up the ball or defending the carrier will not get you the win.

In both cases, straight Slayer is considerably easier. See Spartan. Kill Spartan. Don’t die. One person can carry an entire team in Slayer. In team arena games, you need a team. Do any one of the little things matter? Not much, really. It doesn’t take much skill to grab an unguarded flag. But you have to grab it as a first step to capturing it.
I do think that a stealth capture should give the other teammates more credit. They were killing the enemy and keeping them away from the carrier. The skill is more on the teammates in that case.

Which is more important in Oddball: holding the ball (the objective) or killing the enemy (without which the carrier can’t hold the ball)? When you want a steak dinner (the Win), what is more important, having a cow (objective) or having a butcher (slaying)? You kinda need both.

All of the items you have listed are certainly indicators for factors that contribute to team or individual skill based on gametype. Perhaps a ratio or metric that you are trying to consider is Effectiveness - Flag Pulls / Flag Captures - the lower the number the more effective.

Headshot ratio as against overall kills (what if you only use auto weapons?).

Overall the only thing that matters is the W in this case though - cause and effect.
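Both ratios suggested in this reply are easy to compute from per-match stats. A minimal sketch, where the function names are hypothetical and “Headshots” is assumed to count headshot kills; as the reply itself points out, the headshot ratio penalizes players who mostly use automatic weapons.

```python
def flag_effectiveness(flag_pulls: int, flag_captures: int) -> float:
    """Flag pulls per capture; lower means more effective.

    Returns infinity when nothing was captured, since no pull paid off.
    """
    if flag_captures == 0:
        return float("inf")
    return flag_pulls / flag_captures


def headshot_ratio(headshots: int, kills: int) -> float:
    """Share of kills that were headshots (0.0 when there are no kills)."""
    return headshots / kills if kills else 0.0


print(flag_effectiveness(flag_pulls=6, flag_captures=2))  # 3.0
print(headshot_ratio(headshots=9, kills=12))              # 0.75
```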

Thank you both for your posts so far.

> 2535424828953691;2:
> Not exactly seeing what your point is.

The point:
TrueSkill (the old system) only considers W/L.
TrueSkill2 considers K/D, DNF, Party Size and Playlist Experience, in addition to the W/L.

A lot of people were left wondering why other Objective stats were not considered in the system’s estimation of skill, and this post attempts to provide explanations of the reasons why those stats were not part of the TS2 model.
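To make “only considers W/L” concrete, here is a minimal sketch of the classic two-player TrueSkill update (no draws, no skill-drift term). It assumes the commonly used defaults μ = 25, σ = 25/3 and β = 25/6; Halo 5’s actual parameters and its handling of team games are not described in this thread. The only inputs are the two ratings and who won: kills, deaths and objective stats never enter, and each rating is just two numbers.

```python
import math
from dataclasses import dataclass

BETA = 25.0 / 6.0  # per-match performance noise (assumed default, not Halo 5's value)


@dataclass
class Rating:
    mu: float = 25.0         # estimated skill
    sigma: float = 25.0 / 3  # uncertainty about that estimate


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def update_1v1(winner: Rating, loser: Rating) -> tuple[Rating, Rating]:
    """Classic TrueSkill win/loss update for two players (no draws, no drift).

    Only the two ratings and the match outcome are used. Not numerically
    hardened: for extreme upsets _cdf(t) can underflow to zero.
    """
    c = math.sqrt(2 * BETA**2 + winner.sigma**2 + loser.sigma**2)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)  # how far to move the means
    w = v * (v + t)        # how much to shrink the variances (0 < w < 1)
    new_winner = Rating(
        mu=winner.mu + (winner.sigma**2 / c) * v,
        sigma=winner.sigma * math.sqrt(1 - (winner.sigma**2 / c**2) * w),
    )
    new_loser = Rating(
        mu=loser.mu - (loser.sigma**2 / c) * v,
        sigma=loser.sigma * math.sqrt(1 - (loser.sigma**2 / c**2) * w),
    )
    return new_winner, new_loser


# Two brand-new players; A beats B.
a, b = update_1v1(Rating(), Rating())
print(f"A: mu={a.mu:.2f} sigma={a.sigma:.2f}  B: mu={b.mu:.2f} sigma={b.sigma:.2f}")
```

With two new players, the winner’s μ rises and the loser’s falls by the same amount (their σ values are equal), and both σ values shrink because the system is now a little more certain. An upset win moves μ further than an expected one.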

Regarding your statements on Kills in Objective modes:

> While these Objective Kills are important and contribute to the overall success of the Primary Objective, they are not weighted as heavily as Non-Objective Kills are in Slayer.

This is to say, Kills are arguably important, but having high kill stats doesn’t necessarily mean that you win the game. For example, there have been many CTF games where the other team has more kills than my team did, but we still won because we were just overall better at Capping/Defending, scoring critical kills, etc.

> 2533274805696963;3:
> Effectiveness - Flag Pulls / Flag Captures - the lower the number the more effective.
>
> Headshot ratio as against overall kills (what if you only use auto weapons?).

Maybe you are right, maybe a good metric could be the ratio of flags capped to flags grabbed. But I don’t have access to the raw data, so I can’t really comment on that. Or maybe it was deemed too difficult to implement this kind of metric into a blanket “one-size-fits-all” system, without creating a new “CTF-only TrueSkill2”?
Although, there are times when you grab the flag for the sole purpose of giving your team just a few more precious seconds to get that Enemy Carrier Kill/Goal Stand/Flag Reset.

I think your “auto weapons” argument is a great argument for why to not include (any ratio of) headshots as an indicator of skill. But then again, there was talk about the AR leveling out in “skillful use” by highly skilled players, which is one of the reasons why they tuned it in the recent weapons tuning update.

> 2535469658606456;1:
> Microsoft Research, with the help of millions of matches of data from Halo 5, determined that Assists were not an accurate indicator of skill.

Just to clarify a little. I don’t think MSR has personally done a ton of research into assists, etc., but I could be wrong.

I know 343’s own data science team has looked at this, and I’ve looked closely at it with other data science teams in other games.

So it is possible that MSR will find it more relevant than the rest of us thought (they tend to be smarter than us) at which point we could add it.

> 2533274839818445;5:
> > 2535469658606456;1:
> > Microsoft Research, with the help of millions of matches of data from Halo 5, determined that Assists were not an accurate indicator of skill.
>
> Just to clarify a little. I don’t think MSR has personally done a ton of research into assists, etc., but I could be wrong.
>
> I know 343’s own data science team has looked at this, and I’ve looked closely at it with other data science teams in other games.
>
> So it is possible that MSR will find it more relevant than the rest of us thought (they tend to be smarter than us) at which point we could add it.

Fair enough. I will amend my statement above.

> 2533274839818445;5:
> > 2535469658606456;1:
> > Microsoft Research, with the help of millions of matches of data from Halo 5, determined that Assists were not an accurate indicator of skill.
>
> Just to clarify a little. I don’t think MSR has personally done a ton of research into assists, etc., but I could be wrong.
>
> I know 343’s own data science team has looked at this, and I’ve looked closely at it with other data science teams in other games.
>
> So it is possible that MSR will find it more relevant than the rest of us thought (they tend to be smarter than us) at which point we could add it.

Actually, section 8 of the white paper specifically states:

> In TrueSkill2, the goal is to correlate kill/death counts with the existing player skill variable. In game modes where the objective is to score the most kills, then we expect this correlation to be high. In game modes where the objective is to capture territory or simply stay alive as long as possible, we expect this correlation to be low. Even in modes where the objective is to score kills, there may be teamwork effects where players can help their team win without scoring kills themselves. **We ultimately want player skill to reflect a player’s ability to win, not their ability to score kills.**
>
> Our solution to this problem is to predict individual statistics from the performance variables. As explained in section 2, the performance variables determine the winner of the match. Therefore the model cannot choose skill ratings that only predict individual statistics. They must predict the match winner first, and secondarily predict the individual statistics. We expect that the most benefit from including individual statistics will come in team games. In such games, the players on a team can have different performance values, as long as their total performance is consistent with the match result.

> To determine how the individual statistics depend on performance, we partitioned players by skill, the average skill of their teammates, and the average skill of their opponents. We found that the statistics were linear in all three, with teammates having a very small effect. Therefore to keep the model simple, we dropped the dependence on teammates.
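The finding quoted above (statistics linear in a player’s own skill and in the opponents’ average skill, with teammates dropped) can be sketched as a simple linear predictor. The coefficients `W_SELF`, `W_OPPONENTS` and `BIAS` are hypothetical values chosen for illustration, not numbers from the white paper, and the step where performance must first explain the match winner is not modeled here.

```python
from statistics import fmean

# Hypothetical coefficients, for illustration only (not from the white paper).
W_SELF = 0.8        # better own performance -> more expected kills
W_OPPONENTS = -0.6  # stronger opponents -> fewer expected kills
BIAS = 10.0


def expected_kills(own_performance: float, opponent_performances: list[float]) -> float:
    """Linear in own performance and in the *average* opponent performance.

    Teammates are deliberately left out, mirroring the simplification
    described in the quote above.
    """
    return W_SELF * own_performance + W_OPPONENTS * fmean(opponent_performances) + BIAS


# An average player against an average lobby vs. a stronger player in the same lobby.
print(expected_kills(25.0, [25.0, 25.0, 25.0, 25.0]))
print(expected_kills(30.0, [25.0, 25.0, 25.0, 25.0]))
```

Because `W_SELF` is positive, the prediction can only go up as the player’s own performance goes up, which is the kind of constraint the next quote insists on.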

However, even the paper says:

> The linear model here could be replaced with a more flexible model, if the data warranted it. However, it is important that the model is constrained to be monotonic. A monotonic relationship incentivizes players to maximize their kill count and minimize their death count. A non-monotonic relationship would give players misaligned incentives, such as stopping when they reach a certain number of kills.

Because I didn’t know what monotonic meant:

> monotonic: (of a function or quantity) varying in such a way that it either never decreases, or never increases.
> i.e. a function or quantity is called monotonic if and only if it is either entirely non-increasing, or entirely non-decreasing.
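The definition translates directly into a check on a sequence of values. A small sketch (the names `is_monotonic`, `linear_credit` and `peaked_credit` are hypothetical): a credit rule that grows steadily with kills passes, while one that peaks at 20 kills and then falls, the “stop at a certain number of kills” incentive the paper warns about, fails.

```python
from typing import Sequence


def is_monotonic(values: Sequence[float]) -> bool:
    """True if the sequence never decreases or never increases (ties allowed)."""
    pairs = list(zip(values, values[1:]))
    non_decreasing = all(a <= b for a, b in pairs)
    non_increasing = all(a >= b for a, b in pairs)
    return non_decreasing or non_increasing


def linear_credit(kills: int) -> float:
    """Hypothetical rule: every kill adds the same amount of credit."""
    return 0.5 * kills


def peaked_credit(kills: int) -> float:
    """Hypothetical misaligned rule: credit drops after 20 kills."""
    return kills if kills <= 20 else 40 - kills


kill_counts = range(0, 41)
print(is_monotonic([linear_credit(k) for k in kill_counts]))  # True
print(is_monotonic([peaked_credit(k) for k in kill_counts]))  # False
```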

Of course, all this was determined based on:

> To eliminate variations due to skill, we only considered matches where all players had a near-average skill rating.
> To eliminate variations due to game mode, we only considered matches in the Warzone mode.
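The two selection rules quoted above amount to a filter over match records. A minimal sketch, where `Match`, `near_average_warzone` and the `tolerance` threshold are hypothetical (the paper’s exact definition of “near-average” is not given here); a Slayer version of the analysis would only need a different mode check.

```python
from dataclasses import dataclass


@dataclass
class Match:
    mode: str                   # e.g. "Warzone", "Slayer"
    player_skills: list[float]  # skill rating of every player in the match


def near_average_warzone(matches: list[Match], average_skill: float, tolerance: float) -> list[Match]:
    """Keep only Warzone matches in which *every* player is within `tolerance` of the average."""
    return [
        m for m in matches
        if m.mode == "Warzone"
        and all(abs(s - average_skill) <= tolerance for s in m.player_skills)
    ]


matches = [
    Match("Warzone", [24.0, 25.5, 26.0]),  # kept
    Match("Warzone", [24.0, 35.0, 26.0]),  # dropped: one outlier player
    Match("Slayer", [25.0, 25.0, 25.0]),   # dropped: wrong mode
]
print(len(near_average_warzone(matches, average_skill=25.0, tolerance=2.0)))  # 1
```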

I’d be curious to see what kind of results they would have received for “determin[ing] how […] individual statistics depend on performance” if they had used Slayer matches instead.

I disagree with some things, mostly nit-picking because some things you say are a little annoying to me.

> 2535469658606456;1:
> The whole point (Objective, if you will) of Slayer is to get kills. But what about the other game modes?

The objective of Slayer is not simply your slaying power, you probably already know that, rather, it is to outslay the enemy opponents while minimising deaths. You can have 20-30 kills on the team but if you’ve died 25+ times you’ve still given them half of their quota or more.

I’m not bothered to read through the post again, but I get your point.

Personally, I believe that objectives should be considered in a way that their weighting doesn’t make a large impact.
The whole idea is to make sure that people actually play the objective rather than treat every match as slayer.

I also think that every 3 assists should be considered equivalent to a kill (3-12 shots in total for 3 assists), KDA already does this anyway and support-role players often have high assists and mid-range kills, although it’s kind of hard to play support if your teammates are running around like headless chickens.
I guess you could say that assists are a measure of how well you and your teammates can work together.

> 2533274888753908;8:
> I disagree with some things, mostly nit-picking because some things you say are a little annoying to me.
>
>
>
>
> > 2535469658606456;1:
> > The whole point (Objective, if you will) of Slayer is to get kills. But what about the other game modes?
>
> The objective of Slayer is not simply your slaying power, you probably already know that, rather, it is to outslay the enemy opponents while minimising deaths. You can have 20-30 kills on the team but if you’ve died 25+ times you’ve still given them half of their quota or more.

Yes, that’s quite nit-picky, but I don’t care/mind. I suppose I super over-simplified it, so look at it a different way, which is the way it really should have been said: “The objective is to reach 50 kills before the enemy team.” The obvious conclusion should be that to prevent the enemy team from reaching 50 kills, your team, by definition, should be minimizing its deaths (1 kill for one team = 1 death for the opposite team).

> 2533274888753908;8:
> I’m not bothered to read through the post again, but I get your point.
>
> Personally, I believe that objectives should be considered in a way that their weighting doesn’t make a large impact.
> The whole idea is to make sure that people actually play the objective rather than treat every match as slayer.

It actually does. It’s called winning, aka. capturing 3 flags, securing and holding strongholds, etc. Kills and deaths are not considered as heavily in those modes.

> 2533274888753908;8:
> I also think that every 3 assists should be considered equivalent to a kill (3-12 shots in total for 3 assists), KDA already does this anyway and support-role players often have high assists and mid-range kills, although it’s kind of hard to play support if your teammates are running around like headless chickens.
> I guess you could say that assists are a measure of how well you and your teammates can work together.

I wholeheartedly agree with you on this point, but how can that teamwork effect be quantified from a team statistic into an individual statistic? But you also bring up a strong counterpoint: does a high number of assists when running around like a chicken and forcing your teammates to rescue you actually translate into being highly skilled? I don’t think so. Neither does having 0 assists necessarily mean that you aren’t skilled.

Great response, by the way.

> 2535469658606456;1:
> > 2533274888753908;8:
> > I’m not bothered to read through the post again, but I get your point.
> >
> > Personally, I believe that objectives should be considered in a way that their weighting doesn’t make a large impact.
> > The whole idea is to make sure that people actually play the objective rather than treat every match as slayer.
>
> It actually does. It’s called winning, aka. capturing 3 flags, securing and holding strongholds, etc. Kills and deaths are not considered as heavily in those modes.

It might just be me but I often only see my teammates playing slayer roles (appreciate that I don’t have to carry though) in objective games. It seems, although an objective is used to win, everyone I play with seems to play OBJ gametypes (yeah, I’m getting a little lazy) as if it’s slayer. Clearly this comes from my own personal experience.

> 2535469658606456;1:
> > 2533274888753908;8:
> > I also think that every 3 assists should be considered equivalent to a kill (3-12 shots in total for 3 assists), KDA already does this anyway and support-role players often have high assists and mid-range kills, although it’s kind of hard to play support if your teammates are running around like headless chickens.
> > I guess you could say that assists are a measure of how well you and your teammates can work together.
>
> I wholeheartedly agree with you on this point, but how can that teamwork effect be quantified from a team statistic into an individual statistic? But you also bring up a strong counterpoint: does a high number of assists when running around like a chicken and forcing your teammates to rescue you actually translate into being highly skilled? I don’t think so. Neither does having 0 assists necessarily mean that you aren’t skilled.

Both are valid. Although both are clearly rhetorical, I can still take the effort to respond.

Assists are a sort of grey area; they are the middle ground between a death and a kill. If you are deliberately making a crappy situation for yourself because you thought you could come out alive, then that’s a problem. There is a problem with every system; it’s kind of impossible to make a perfect system. Having very few assists rarely happens, but those with low assists often have mid-range to high kills, unless they’re actually that bad at the game, obviously.

> 2533274888753908;10:
> Both are valid. Although both are clearly rhetorical, I can still take the effort to respond.
>
> Assists are a sort of grey area; they are the middle ground between a death and a kill. If you are deliberately making a crappy situation for yourself because you thought you could come out alive, then that’s a problem. There is a problem with every system; it’s kind of impossible to make a perfect system. Having very few assists rarely happens, but those with low assists often have mid-range to high kills, unless they’re actually that bad at the game, obviously.

Yes, it was kind of rhetorical, and once again over-exaggerated. But only to show that it is rather difficult to apply a one-size-fits-all system that can account for the varied types of assists that have a potential to happen. You hit the nail on the head: it is indeed a very grey area.


(edit, to avoid double-posting)

One thing I want to mention:

The TrueSkill2 white paper specifically mentions that one of the requirements/priorities game designers/developers had was a low computational cost:

> Skill updates are done on servers hosted by the game studio, and skill ratings are stored in a database hosted by the game studio. This means that skill representations should be small and updates should be cheap.

Maybe if this requirement was removed and many more variables added into the system, we could track things like assists, damage dealt, weapons used, player positioning, etc… Maybe in a decade (hopefully machine learning technology advances that quickly for such a complex dataset) we will have a system like that. I’m sure we could have one now, but how long would it take from putting data into the system to getting an answer back out? Weeks? Days? Hours? I really don’t know…