This is mainly aimed at ZaedynFel, and was fueled by a lot of coffee.
> 2533274839818445;170:
> > 2533274841661584;169:
> > > 2533274839818445;168:
> > > At the end of the day, Slayer is more a KPM race than anything else. Players that carefully maintain their KDA, but don’t pump out kills consistently lose matches against better opponents, whereas players with worse KDAs but higher KPMs also consistently win against better opponents. They may go negative sometimes, but on average it pays off.
> > >
> > > This proves itself repeatedly in the data: KPM > DPM >> KDA over and over again, regardless of whether it’s intuitive or not, it’s just the way it is across millions and millions and millions of matches. It’s not even subtle in the data.
Is this conceptually valid?
Obviously, it’s counter-intuitive that a proxy outcome (kills) is a better predictor of winning than the true metric (kills/deaths), when the definition of winning a Slayer game is for a team (which we can consider the sum of its individuals) to have kills>deaths. I’ve read the paper and am not a total layman; Trueskill2 is obviously driven by wide swaths of data, which is fantastic, but that also brings the caveat you have when researching with any ‘big data’: the possibility of capturing trends and associations that are not meaningful to what we’re trying to measure, or that are not being interpreted appropriately. Side note: the paper does not appear to discuss the discarding of death rate in the model. Is there additional documentation regarding that decision lying around?
Anyway, if the data demonstrates that KPM is superior to KDA for predicting who wins games (and is therefore used to assess player skill at the individual level), to what extent did the model explore the possibility that KPM drags in team performance more than KDA does? For example, a higher KPM is likely correlated with teammates that assist more (or control power weapons and powerups better, or control the map more, both lurking variables that I imagine are hard to capture), and could therefore serve as a proxy for a team that works well together. Conversely, there are far fewer ‘protector’ medals than assists, meaning a good team is marginally more likely to jack up kill numbers for one another than to prevent deaths, so good teams are correlated with KPM more strongly than with KDA. As a result, KPM as a partially team-driven (essentially, confounded) variable could be a stronger win-predictor than KDA as an individually-driven variable. In fact, anyone who’s played Slayer knows that a team of four that plays with teamwork wins more often against a team of solo queues, even if the solo team has higher individual CSR/MMR (I would be very interested to see if that could be confirmed in data).
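To make the confounding worry concrete, here is a toy simulation I threw together. Everything in it is an assumption of mine: the `teamwork` variable, the weights, the noise levels, and the function names are all made up, and none of it comes from the Trueskill2 paper or from real match data. In this sketch both stats carry exactly the same amount of individual skill, but the KPM-like stat also picks up a shared team component. That alone is enough to make it the stronger win predictor.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def simulate(n_teams=20000, seed=0):
    """Toy model (all numbers hypothetical): returns (r_kpm, r_kda),
    the correlation of each per-player stat with that player's team winning."""
    rng = random.Random(seed)
    kpm, kda, won = [], [], []
    for _ in range(n_teams):
        teamwork = rng.gauss(0, 1)                    # lurking team-level variable
        skills = [rng.gauss(0, 1) for _ in range(4)]  # individual skill, 4 players
        strength = sum(skills) / 4 + teamwork
        # The opponent is abstracted away as noise around zero.
        win = 1.0 if strength + rng.gauss(0, 1) > 0 else 0.0
        for s in skills:
            kpm.append(s + teamwork + rng.gauss(0, 1))  # skill + team effect + noise
            kda.append(s + rng.gauss(0, 1))             # skill + noise only
            won.append(win)
    return pearson(kpm, won), pearson(kda, won)


if __name__ == "__main__":
    r_kpm, r_kda = simulate()
    print(f"corr(KPM-like stat, win) = {r_kpm:.3f}")
    print(f"corr(KDA-like stat, win) = {r_kda:.3f}")
```

The point is not that the real system works this way, only that "KPM predicts wins better than KDA" and "KPM measures individual skill better than KDA" are different claims, and the first can hold without the second.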
You note that “Players that carefully maintain their KDA, but don’t pump out kills consistently lose matches against better opponents, whereas players with worse KDAs but higher KPMs also consistently win against better opponents. They may go negative sometimes, but on average it pays off.” This strikes me as potentially problematic. Regardless of how things converge over a sample size of millions of games, I would contend that players should be ‘rewarded’ with increased MMR based on their actual performance, not on a proxy that may approximate skill and win percentage over the long run. The logic is “This person had an objectively bad game (high kills, higher deaths; by definition this is a bad game because the objective of Slayer is to have kills>deaths), but we’re betting that over the course of their career they’ll end up doing better.” It’s like if you surveyed thousands of basketball games and determined that frequent shooters, regardless of whether they actually score on those shots, typically do well over their careers, and the NBA then determined who won a game (or the skill of a player) based on number of shots, regardless of which player or team actually got the points in a given match. Proxies might predict well in huge sample sizes, but are they really appropriate for grading individual instances? Especially if doing so is unintuitive to users (players)?
This is all subject to discussion. The architects of Trueskill2 are a lot smarter than me. I’m just curious and trying to think through reasons that the system confuses the hell out of everyone who plays the game.