Halo 5: designing by data

This is a pretty long post about properly using data to inform design decisions. From what I can tell, the metrics being used to determine the ‘competitiveness’ of maps and the quality of weapon tunings aren’t being chosen properly. I think this is why we’ve landed on the poorly received weapon tunings, and why Overgrowth is considered a competitive map.

How 343 measures maps:

343 believes that a map is ‘competitive’ if (1) Red and Blue have equal win rates and (2) higher skilled teams win more often than lower skilled ones. Neither of these has anything to do with how competitive a map is.
Their first metric actually measures fairness. Having equal win rates just means neither side has an inherent advantage. Competitive maps should be fair, but fair maps aren’t necessarily competitive. For perspective: a coin-flipping contest would be competitive by this metric.
Their second metric actually measures the influence of skilled elements. Higher skilled teams should always have an advantage because they are better at the game. Seeing that this advantage typically converts into wins isn’t a sign of a highly competitive map. It’s just a sign that skill can usually overcome whatever non-competitive issues are present.
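The coin-flip point above is easy to demonstrate with a quick simulation (a minimal sketch; the numbers are invented for illustration, not 343’s data):

```python
import random

random.seed(0)

# A 'game' decided by pure luck: skill plays no part at all.
def coin_flip_match():
    return "red" if random.random() < 0.5 else "blue"

results = [coin_flip_match() for _ in range(100_000)]
red_rate = results.count("red") / len(results)

# Red and Blue win rates come out ~equal, so the 'fairness' metric
# passes even though the game involves zero skill.
print(f"Red win rate: {red_rate:.3f}")
```

The fairness check is satisfied, yet no one would call this competitive, which is exactly why fairness alone can’t be the measurement.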

How they should be measuring maps:

They should be less interested in whether or not the expected team wins, and more interested in what skills need to be displayed in order to win.
If grenade proficiency is flat regardless of skill level, then the map is probably rewarding grenade spam instead of skillful placement.
If grenade and melee damage dominate the damage dealt, then those lesser-skilled elements are overshadowing aim skill.
If many engagements aren’t ending in death, then there might be sight line issues.
They need to examine heat maps to ensure that there aren’t impregnable positions or overly dominant strats.
They need to look at survival rates off spawn to see if spawns are too predictable for a team to recover.
Basically, there are a million different things that they need to measure to see if their maps are competitive. And if their maps were truly competitive, they wouldn’t even have to check the measurements they’ve been testing because those would automatically work themselves out.
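As a toy example of the grenade check described above (all numbers are hypothetical, just to show the shape of the test):

```python
# Hypothetical share of total damage done with grenades, per skill tier.
# All numbers are made up for illustration.
grenade_dmg_share = {
    "Bronze": 0.31, "Gold": 0.30, "Diamond": 0.29, "Champion": 0.30,
}

values = list(grenade_dmg_share.values())
spread = max(values) - min(values)

# A profile this flat means grenade output doesn't track skill at all,
# which is the 'spam, not placement' warning sign described above.
flat = spread < 0.05
print(f"spread across tiers: {spread:.2f}, flat profile: {flat}")
```

If grenade output rose meaningfully with rank, the spread would be wide and the flag wouldn’t trip; the threshold itself is a judgment call the designers would have to tune.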

They’ve got similar issues when gauging the sandbox. For example, the BR:

Menke mentioned that the current BR isn’t less competitive than its previous version because (1) it has similar spread to the BR in previous Halos, and those games were considered competitive, and (2) better players win the majority of their engagements with lesser-skilled players.
The first point is nonsense because in previous Halos, extreme measures were taken by TOs to mitigate the RNG Bungie added to precision weapons.
The second point runs into the same issue as their map metrics. The question isn’t whether or not skilled players will usually win. It’s whether or not skillful execution is THE determining factor. Skill can help mitigate RNG, but it can’t eliminate its influence. Therefore the outcome between similarly skilled players has a high chance of being decided by luck rather than execution, and the outcomes would still be 50/50, which would throw off the interpretation of the data.
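A quick simulation shows how RNG drags even a better player’s outcomes toward 50/50 (a sketch with made-up numbers, not H5’s actual spread values):

```python
import random

random.seed(1)

def duel(skill_edge, luck_sd, trials=50_000):
    """Win rate of the slightly better player when each duel outcome
    is skill plus noise. luck_sd models weapon spread / RNG."""
    wins = 0
    for _ in range(trials):
        a = skill_edge + random.gauss(0, luck_sd)  # better player
        b = random.gauss(0, luck_sd)               # opponent
        wins += a > b
    return wins / trials

for sd in (0.1, 0.5, 2.0):
    print(f"luck_sd={sd}: better player wins {duel(0.1, sd):.2f}")
# As the RNG term grows, the better player's win rate slides toward 0.50,
# so a near-50/50 result can't distinguish 'evenly skilled' from 'lucky'.
```

This is the interpretation problem: between similarly skilled players, a 50/50 outcome is exactly what both a skill-driven game and a luck-driven game would produce.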

And regarding the magnum:

It’s been said that the magnum isn’t fit for competitive because, on paper, it now outclasses all the other rifles, as seen in the TTK and range charts they published.
The problem is, the charts only detail how effectively these weapons CAN be used, not how effectively they ARE used in practice.
Their comparisons need to include means and medians for kill times and engagement ranges, in order to gauge how players are ACTUALLY using these weapons and to understand how skill affects usage. It would also be useful to break this down by skill division. Since the magnum isn’t included in the test, not only is the claim invalid, but they aren’t gathering the info needed to get the sandbox properly balanced around the magnum’s inclusion.
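The kind of breakdown being asked for might look something like this (hypothetical engagement logs; the division names and numbers are placeholders):

```python
from statistics import mean, median

# Hypothetical engagement logs: (skill division, kill time in s, range in m).
engagements = [
    ("Diamond", 1.4, 18), ("Diamond", 1.6, 22), ("Diamond", 1.3, 15),
    ("Gold", 2.1, 12), ("Gold", 2.4, 10), ("Gold", 1.9, 14),
]

by_division = {}
for division, kill_time, eng_range in engagements:
    entry = by_division.setdefault(division, {"times": [], "ranges": []})
    entry["times"].append(kill_time)
    entry["ranges"].append(eng_range)

# Mean AND median per division: the gap between them hints at how
# skewed actual usage is versus the on-paper optimum.
for division, entry in sorted(by_division.items()):
    print(f"{division}: mean TTK {mean(entry['times']):.2f}s, "
          f"median TTK {median(entry['times']):.2f}s, "
          f"median range {median(entry['ranges'])}m")
```

Comparing these observed numbers against the published chart values would show how far real usage sits from a weapon’s theoretical ceiling, per skill tier.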

I know a lot of people are totally against analytical development, but there’s literally nothing wrong with the concept. The problem arises when the incorrect data is used or data is used incorrectly, which is what I believe is happening here.

Thoughts?

What I think you could have done, instead of making two whole threads concerning this, was go into the matchmaking thread and ask ZaedynFel directly if these other items are things they are considering.

I seriously doubt that with the maps and the weapons they are only looking at who wins over who. Yes, I know it has been mentioned by Josh, but I think that is more about him trying to reassure us that things are not as dire as is being reported.

> 2533274863544717;2:
> What I think you could have done, instead of making two whole threads concerning this, was go into the matchmaking thread and ask ZaedynFel directly if these other items are things they are considering.
>
> I seriously doubt that with the maps and the weapons they are only looking at who wins over who. Yes, I know it has been mentioned by Josh, but I think that is more about him trying to reassure us that things are not as dire as is being reported.

Why should it go in the matchmaking thread when it’s about gameplay design, not matchmaking? I actually wanted this in the H5 thread.

I actually did ask that question directly to ZaedynFel, but after responding to a few queries, he dipped. That said, this is the metric that he provided. Beyond that, I’m interested in how others feel about it.

You’re Mhunterjr from Beyond?

> 2727626560040591;4:
> You’re Mhunterjr from Beyond?

Yup

> 2676692992818466;3:
> > 2533274863544717;2:
> > What I think you could have done, instead of making two whole threads concerning this, was go into the matchmaking thread and ask ZaedynFel directly if these other items are things they are considering.
> >
> > I seriously doubt that with the maps and the weapons they are only looking at who wins over who. Yes, I know it has been mentioned by Josh, but I think that is more about him trying to reassure us that things are not as dire as is being reported.
>
> Why should it go in the matchmaking thread when it’s about gameplay design, not matchmaking? I actually wanted this in the H5 thread.
>
> I actually did ask that question directly to ZaedynFel, but after responding to a few queries, he dipped. That said, this is the metric that he provided. Beyond that, I’m interested in how others feel about it.

Huh, okay, I thought since you had made a whole new thread about it that maybe you didn’t post it there. And I suggested that since Josh usually posts in that thread.

I don’t see a lot of people against analytical development. I would like to see what you’re proposing, but there must be some reason why Josh didn’t respond any further. Maybe the win/loss in terms of skill is all they’re looking at right now. It would be nice if they went a little further than that, but who knows.

I agree with the OP.

Numbers work to a certain extent, and very important information can be gathered from them. For example, a few months ago Josh Menke posted about how the AR loses its relevance in a competitive field after ~Diamond rank.
Competitive players had been asking for the removal of Autos for over two years and it took Josh and some data for 343 to finally give up and remove these weapons from competitive.
Numbers just proved what the players were saying.

  • Logic first, numbers second
    Now, Josh is saying that he is measuring the ‘competitiveness’ of these new settings. Starting with the numbers won’t work: the ‘better’ team will always have an advantage over the opponent regardless of the settings. By these ‘standards’ we could have Needler starts, and they can ‘measure’ the competitiveness all they want; it doesn’t mean they are arriving at the most ‘competitive’ settings.

So, if we start with the numbers and leave logic last, we end up with a Rocket Launcher on Regret, Overgrowth as a possible HCS map, a BR with ghost bullets, and Splinter grenades back in HCS.

  • Why are they trying to reintroduce what doesn’t work for HCS?
  • Why do they want their entire sandbox to be featured in the most competitive playlist instead of giving up things that just don’t play well in this environment?
  • Why are Pro players (and 9 pages of angry players and countless Twitter interactions) being dismissed? And then they say they are listening to players.

At this point 343 should realize that they need a 3rd party to work on the competitive settings, someone that isn’t biased toward their own sandbox, just like MLG did for past Halos.

They alienated all competitive players in Halo 4 by doing exactly what they are doing right now.

> 2676692992818466;1:
> 343 believes that a map is ‘competitive’ if (1) Red and Blue have equal win rates and (2) higher skilled teams win more often than lower skilled ones. Neither of these has anything to do with how competitive a map is.

This is entirely a semantic issue. One could take (1) and (2) as the definition of a “competitive map”. The real question here is not whether this is a good way to measure competitiveness, but what “competitive” means. It seems to me that the word “competitive” is simply being used here as a placeholder word to describe these two properties of a map. And really, there’s nothing inherently wrong with that. You can try to argue that by generalizing these definitions, a coin flip is competitive, and you can consider that absurd. But there’s no inherent reason a competitive game needs to be deep. It’s completely fine to only require a competitive game to be fair, and to have the better team have a higher win probability. You may prefer a more restrictive definition, but as I said, it’s just semantics.

Also, assuming you’re basing (2) on this quote from Menke:

> We see if known higher-skilled teams win more often than random (50/50) and then rank all the maps on that.

you’re missing the crucial part that they’re not only measuring whether a higher skilled team wins more often than random, but also how big that difference is. Which actually is a fairly sensible measure of how much skill influences the match, since the more randomness the match contains, the closer you expect it to be to that 50/50 figure.
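For what it’s worth, the size of that deviation from 50/50 is straightforward to quantify. A minimal sketch using a normal-approximation confidence interval (the win counts here are invented):

```python
from math import sqrt

def skill_edge(wins, games, z=1.96):
    """Observed high-skill win rate plus a normal-approximation
    95% confidence interval around it. How far the interval sits
    above 0.50 is the measurable size of the skill effect."""
    p = wins / games
    half = z * sqrt(p * (1 - p) / games)
    return p, (p - half, p + half)

p, (lo, hi) = skill_edge(wins=1150, games=2000)
print(f"win rate {p:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
# An interval clear of 0.50 says skill matters on this map;
# its distance from 0.50 says how much.
```

Ranking maps by this distance is, presumably, roughly what Menke describes; the more randomness a map injects, the closer its interval crowds 0.50.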

> 2676692992818466;1:
> They should be less interested in whether or not the expected team wins, and more interested in what skills need to be displayed in order to win.
> If grenade proficiency is flat regardless of skill level, then the map is probably rewarding grenade spam instead of skillful placement.
> If grenade and melee damage dominate the damage dealt, then those lesser-skilled elements are overshadowing aim skill.
> If many engagements aren’t ending in death, then there might be sight line issues.
> They need to examine heat maps to ensure that there aren’t impregnable positions or overly dominant strats.
> They need to look at survival rates off spawn to see if spawns are too predictable for a team to recover.
> Basically, there are a million different things that they need to measure to see if their maps are competitive. And if their maps were truly competitive, they wouldn’t even have to check the measurements they’ve been testing because those would automatically work themselves out.

And who says they aren’t? I may be missing some context, but unless they’ve explicitly stated, or at least heavily implied, that the properties (1) and (2) are the only metrics they track, I see no reason to take this as an indication that they aren’t tracking any other metrics.

> 2676692992818466;1:
> Their comparisons need to include means and medians for kill times and engagement ranges, in order to gauge how players are ACTUALLY using these weapons and to understand how skill affects usage. It would also be useful to break this down by skill division. Since the magnum isn’t included in the test, not only is the claim invalid, but they aren’t gathering the info needed to get the sandbox properly balanced around the magnum’s inclusion.

Again, how do we know that such data isn’t being used? If Menke tells us that they use statistics A and B to make design decisions, or that statistics A and B indicate something, it doesn’t mean that statistics C, D, and E aren’t being taken into consideration, at least in regard to some other aspect.

> 2676692992818466;1:
> I know a lot of people are totally against analytical development, but there’s literally nothing wrong with the concept. The problem arises when the incorrect data is used or data is used incorrectly, which is what I believe is happening here.

Is this really the case? That some people shun the use of analytics to make design decisions?

I mean, aside from the fact that I think you were a bit too hasty to jump to conclusions about their use of data, I totally agree with the sentiment that if you have the ability to collect data, that data should totally be used as an aid to certain design decisions.

But ultimately, all data needs to be interpreted by someone. The data never tells you what design decisions you should make. The data is just a form of feedback, and it’s up to the designer what kind of feedback they want to see, i.e., what is indicative of good gameplay.

> 2533274825830455;8:
> > 2676692992818466;1:
> >

To be honest, they can have metrics A through Z and it wouldn’t make a difference.
Something isn’t right from a competitive standpoint when they place rockets on Regret or give Overgrowth a chance at being an HCS map. Regardless of the numbers they get from that.

They can collect all the data in the world but if they are not even testing relevant things why bother?

> 2533274825830455;8:
> > 2676692992818466;1:
> > 343 believes that a map is ‘competitive’ if (1) Red and Blue have equal win rates and (2) higher skilled teams win more often than lower skilled ones. Neither of these has anything to do with how competitive a map is.
>
> This is entirely a semantic issue. One could take (1) and (2) as the definition of a “competitive map”. The real question here is not whether this is a good way to measure competitiveness, but what “competitive” means. It seems to me that the word “competitive” is simply being used here as a placeholder word to describe these two properties of a map. And really, there’s nothing inherently wrong with that. You can try to argue that by generalizing these definitions, a coin flip is competitive, and you can consider that absurd. But there’s no inherent reason a competitive game needs to be deep. It’s completely fine to only require a competitive game to be fair, and to have the better team have a higher win probability. You may prefer a more restrictive definition, but as I said, it’s just semantics.
>
> Also, assuming you’re basing (2) on this quote from Menke:
>
>
> > We see if known higher-skilled teams win more often than random (50/50) and then rank all the maps on that.
>
> you’re missing the crucial part that they’re not only measuring whether a higher skilled team wins more often than random, but also how big that difference is. Which actually is a fairly sensible measure of how much skill influences the match, since the more randomness the match contains, the closer you expect it to be to that 50/50 figure.
>
>
>
>
> > 2676692992818466;1:
> > They should be less interested in whether or not the expected team wins, and more interested in what skills need to be displayed in order to win.
> > If grenade proficiency is flat regardless of skill level, then the map is probably rewarding grenade spam instead of skillful placement.
> > If grenade and melee damage dominate the damage dealt, then those lesser-skilled elements are overshadowing aim skill.
> > If many engagements aren’t ending in death, then there might be sight line issues.
> > They need to examine heat maps to ensure that there aren’t impregnable positions or overly dominant strats.
> > They need to look at survival rates off spawn to see if spawns are too predictable for a team to recover.
> > Basically, there are a million different things that they need to measure to see if their maps are competitive. And if their maps were truly competitive, they wouldn’t even have to check the measurements they’ve been testing because those would automatically work themselves out.
>
> And who says they aren’t? I may be missing some context, but unless they’ve explicitly stated, or at least heavily implied, that the properties (1) and (2) are the only metrics they track, I see no reason to take this as an indication that they aren’t tracking any other metrics.
>
>
>
>
> > 2676692992818466;1:
> > Their comparisons need to include means and medians for kill times and engagement ranges, in order to gauge how players are ACTUALLY using these weapons and to understand how skill affects usage. It would also be useful to break this down by skill division. Since the magnum isn’t included in the test, not only is the claim invalid, but they aren’t gathering the info needed to get the sandbox properly balanced around the magnum’s inclusion.
>
> Again, how do we know that such data isn’t being used? If Menke tells us that they use statistics A and B to make design decisions, or that statistics A and B indicate something, it doesn’t mean that statistics C, D, And E aren’t being taken into consideration, at least in regards to some other aspect.
>
>
>
>
> > 2676692992818466;1:
> > I know a lot of people are totally against analytical development, but there’s literally nothing wrong with the concept. The problem arises when the incorrect data is used or data is used incorrectly, which is what I believe is happening here.
>
> Is this really the case? That some people shun the use of analytics to make design decisions?
>
> I mean, aside from the fact that I think you were a bit too hasty to jump to conclusions about their use of data, I totally agree with the sentiment that if you have the ability to collect data, that data should totally be used as an aid to certain design decisions.
>
> But ultimately, all data needs to be interpreted by someone. The data never tells you what design decisions you should make. The data is just a form of feedback, and it’s up to the designer what kind of feedback they want to see, i.e., what is indicative of good gameplay.

I disagree that we have a semantics issue here. It seems that there is agreement that ‘competitiveness’ is a measure of the impact of skillful execution.

I said nothing of the concept of depth. I said that measuring ‘competitiveness’ by the accuracy of the expected outcome is a failure to test for the desired metric. To conflate “fairness” with “competitiveness” is inherently flawed.

To your next point, randomness isn’t the only factor that can negatively impact competitiveness. There’s mechanical skill gap, and there are intangibles (i.e., spawn systems, weapon placements, etc.). Simply making a game fair does not necessarily address those.

Performance/score differentials ARE NOT a good measure of the influence of skill on a match. A team CAN win by a lot because they demonstrated much more skill than their opponent. Or they could have demonstrated a little bit more skill at crucial times. A 3-0 CTF match could be a slaughter, or it could have been the result of the winning team getting 3 returns at the last second.

No one’s saying they aren’t considering other data, but Menke specifically said they were measuring competitiveness, and then explained how they measure that. But the measurement he described doesn’t fit the definition being used.

> 2535405256565919;9:
> > 2533274825830455;8:
> > > 2676692992818466;1:
> > >
>
> To be honest, they can have metrics A through Z and it wouldn’t make a difference.
> Something isn’t right from a competitive standpoint when they place rockets on Regret or give Overgrowth a chance at being an HCS map. Regardless of the numbers they get from that.
>
> They can collect all the data in the world but if they are not even testing relevant things why bother?

Nailed it.

It’s not really “jumping to conclusions” when 343 has repeatedly demonstrated absolutely ludicrous ideas.
You don’t need to crunch numbers to know if 1-shot-kill speed boost/spartan charge combo is competitive.
You don’t need to test Riptide to know it’s a bad map; you can literally just look at a schematic of the layout.

Knowing what data is and is not worth gathering is a prerequisite to good statistical analysis.
There are just some questions that only require a very basic understanding of the material, combined with a bit of common sense, to answer.
The fact that 343 continues to insist on “testing” maps that have no sight lines, for potential use in competitive play, already proves that there’s a serious flaw in their workflow.

@OP
One issue I see with looking at damage is that the best players are going to specifically choose to avoid the nooby tactics, even if the map rewards them for it. And the better player is probably going to win either way, even if they are handicapping themselves.

Something I think might be more useful is looking at the variety of engagements, things like:

  • Does the “angle” of shooting tend to vary, or stay the same all game (e.g. compare vertical/horizontal sight line averages)?
  • Do good players and bad players tend to run the same routes, or does the map have a big difference in how skilled and new players traverse it?
  • Can team shooting be accomplished from a variety of positions and ranges, or are teammates essentially just “standing next to each other”?

It’s the movement & sight lines that end up creating these nade-fest situations in the first place, so a more direct measure might be more accurate.
I’d love to see some statistical number crunching on Halo 3’s heat maps, for example: comparing the variety in movement patterns between good & bad players on proven competitive maps like Heretic, versus a hallway map like Orbital.
The issue, of course, is whether 343 is even set up for those kinds of metrics.
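The route-variety idea could be made concrete with something like Shannon entropy over observed route choices (a sketch with made-up route logs; the route names are placeholders):

```python
from collections import Counter
from math import log2

def route_entropy(routes):
    """Shannon entropy (in bits) of observed route choices; higher
    means players spread themselves across more of the map."""
    counts = Counter(routes)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

# Hypothetical route logs for two skill tiers on the same map.
high_tier = ["top_mid", "snipe_tower", "bottom_mid", "flank", "top_mid", "flank"]
low_tier = ["top_mid", "top_mid", "top_mid", "top_mid", "bottom_mid", "top_mid"]

print(f"high-skill route entropy: {route_entropy(high_tier):.2f} bits")
print(f"low-skill route entropy:  {route_entropy(low_tier):.2f} bits")
```

A map where entropy stays high across skill tiers offers genuine route choice; a hallway map would pin everyone, good and bad alike, near zero.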

But all of this is just a really complicated round-about way of a much simpler solution: hire competent map designers.
It’s clear that the people behind H5’s DLC maps have no idea what they’re doing.
You’ve got to learn to walk before you run, so there’s really not much use in all this high-level number crunching when 343 still has to finish basic training.

> 2676692992818466;10:
> I disagree that we have a semantics issue here. It seems that there is agreement that ‘competitiveness’ is a measure of the impact of skillful execution.

Reading more into it: regarding (1), in this comment Menke says “Looking at stats that measure competitiveness . . . , blue/red balance . . . , and quit rates”. He doesn’t seem to be implying that blue/red balance is one of the competitiveness stats (or if he is, that’s some weird phrasing). So perhaps it’s not a semantics issue, but a communication error.

> 2676692992818466;10:
> Performance/score differentials ARE NOT a good measure of the influence of skill on a match. A team CAN win by a lot because they demonstrated much more skill than their opponent. Or they could have demonstrated a little bit more skill at crucial times. A 3-0 CTF match could be a slaughter, or it could have been the result of the winning team getting 3 returns at the last second.

But it’s not score differentials they’re looking at. They’re looking at the deviation of a team’s win probability from 50%. It’s not about demonstrating a large performance difference in one match, but demonstrating a large performance difference consistently over a large number of matches. And isn’t the ability to consistently demonstrate a large performance difference what skill is about?