spartaninsights.com - Halo Hackathon - Furiousn00b

<mark>Entry</mark>

The intention was that once a match was discovered, a player could use the new Halo Waypoint functionality to bookmark the film, then jump in-game to create a montage. Imagine if you could look up the exact moment of every Airborne Snapshot! medal or every Ground Pound kill to help create new and interesting montages like the following (one of my favourites): www.youtube.com/watch?v=yXsKZgLNMbw

  • *Population*: This feature is often requested, and I believe it offers an interesting insight into the overall health of Halo. The Population feature lets a user see how many unique players played, and how many unique matches were played, on every day of a season. Keep in mind that for the BETA these charts only show the population for the April 2016 Team Arena playlist. They don’t include any other ranked/unranked playlists, or Warzone.

  • *CSR Distribution*: This is one of the more competitive insights. Having a CSR is one thing, but being able to place yourself on a curve against every other player and claim bragging rights over your friends is another. This feature includes every ranked player that qualified for a CSR in the April 2016 Team Arena playlist.

  • *Game & Map Rotation*: Ever wondered if a map or game type is more or less likely to occur than another? This feature lets you answer that question and back up any discussion with facts. See every match played in the April 2016 Team Arena playlist, and see which game types and maps occurred the most. You can drill down into each game type for additional insight.

  • *Bravo vs. World*: This was a fun addition. I’ve created a leaderboard for B is for Bravo’s Kill/Death Ratio. This isn’t an Arena-wide, or even playlist-wide, statistic; this leaderboard shows the Kill/Death Ratio against every player that Bravo crossed paths with in the April 2016 Team Arena playlist.

  • *Metadata Spy*: This was built to solve a very small, but very specific, problem I’ve seen appear every so often. A number of people periodically search the API by hand for nuggets of information. This seemed like an ideal candidate for automation, so I built the Metadata Spy to capture any and all API Metadata additions and changes, and to maintain a full audit history for each item. For example, the Team Arena playlist was recently renamed: we can see that the name changed from ‘Team Arena’ to ‘Team Arena Summer Preview’ and then back again to ‘Team Arena’.

Maybe the Metadata Spy will give us some insights into what’s coming, although I have noticed 343 being sneakier with their data updates :slight_smile:
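The Metadata Spy’s change tracking boils down to diffing successive metadata snapshots and appending every difference to an audit log. Here’s a minimal Python sketch of that idea (illustrative only — the real site is built in C# on HaloSharp, and the `id`/`name` fields stand in for the actual Halo API schema):

```python
audit = {}    # item id -> list of (field, old value, new value) records
current = {}  # item id -> latest known snapshot of that item

def ingest(snapshot):
    """Compare a fresh metadata pull against the last one and log any changes."""
    for item in snapshot:
        item_id = item["id"]
        old = current.get(item_id)
        if old is None:
            # Brand-new item: record its addition.
            audit.setdefault(item_id, []).append(("added", None, item["name"]))
        else:
            # Existing item: record every field whose value changed.
            for field, value in item.items():
                if old.get(field) != value:
                    audit.setdefault(item_id, []).append((field, old.get(field), value))
        current[item_id] = item

# The Team Arena rename example from above, as three successive pulls:
ingest([{"id": "ta", "name": "Team Arena"}])
ingest([{"id": "ta", "name": "Team Arena Summer Preview"}])
ingest([{"id": "ta", "name": "Team Arena"}])
# audit["ta"] now holds the addition plus both renames.
```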

<mark>Final Words</mark>

Best of luck to everyone in the competition; some of the entries are looking fantastic!

<mark>Technical and Design Issues</mark>
The Data Importer
Initially I wanted to import statistics and events for every Arena match played by every player (cue laughter from 343’s API/Data Teams). This was beyond impossible, especially for the Hackathon schedule.

I reduced the scope way down. Just 1 (ranked) playlist in 1 (completed) season. The April season was chosen as it was simply the latest completed season before the hackathon closing date. Team Arena was a harder decision and was ultimately chosen because of the variety of Game Types and Maps (Assault, CTF, Strongholds, and Slayer).

At the end of the import I managed to discover roughly 500,000 unique players that played roughly 900,000 unique matches. To throw around an even larger number: these 500,000 players have played a combined total of 152,114,127 (non-unique) matches.

In order to import such a huge number of games, I needed to nail the speed of the data importer. It takes roughly 6 seconds to request 200 matches with HaloSharp. If I could process and commit all those matches in under 4 seconds, I could achieve the maximum throughput of 200 games every 10 seconds (the current production rate limit). I got it down to 5 seconds, for a total of 11 seconds per batch. That let me import just over 60,000 matches an hour, which meant I could request all the data, process it, and save it to my local database in under a day.
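As a back-of-the-envelope check of those numbers (a sketch only; the real importer is C# on HaloSharp, and these constants come straight from the figures above):

```python
# Importer throughput maths for the batch cycle described above.
BATCH_SIZE = 200        # matches returned per API request
REQUEST_SECONDS = 6     # observed time to fetch one batch with HaloSharp
PROCESS_SECONDS = 5     # time to process and commit one batch locally
CYCLE_SECONDS = REQUEST_SECONDS + PROCESS_SECONDS  # 11 seconds per batch

matches_per_hour = BATCH_SIZE * 3600 // CYCLE_SECONDS  # 65,454 -> "just over 60,000"

UNIQUE_MATCHES = 900_000  # rough unique-match count for the playlist/season
hours_needed = UNIQUE_MATCHES / matches_per_hour       # ~13.8 hours -> under a day
```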

It turns out that import speed wasn’t the issue; this many matches required an unexpectedly large amount of storage.

Storage: Azure SQL Database
This was easily the biggest problem, and the key issue preventing me from retrieving and storing additional seasons and playlists. I initially imported and committed all the data to a local SQL database on my development machine, with the intention of uploading the database to Azure closer to the closing date.

On the 24th of May I created a standard blob storage account and began the upload of a 12GB BACPAC (compressed SQL data) file to Azure. I managed to complete this task in just under an hour and thought the hardest part was over (For an Australian, having to upload anything is always scary).

I then created the fastest Azure SQL Database I could reasonably afford (S3 Standard - $191.05 AUD/month). The assumption was that the import would be completed in a short period of time, after which I could downsize the instance and save some money. I pressed ‘import’ and waited … the restore process took 125 hours! Five days and change to import 12 GB of data, rebuild indexes, and get the database up and running. I thought for sure I was going to run over the end date.

The final database size is <mark>126.29 GB</mark>.

After this was complete, I got a chance to test the performance. It’s not quite where I would like it; some queries are data-intensive and probably indicate that SQL, or Azure SQL, isn’t adequate for the task. This is probably the first thing I’ll revisit after the hackathon.

Gamertag Changes, Carnage Report Consistency, and Halo Waypoint
May 18th, 2016 was not a good day for me or the project. With Microsoft’s announcement of plans to release nearly a million unused Gamertags, I was nervous. A not-insignificant number of the players I had yet to import were likely to change their Gamertag.

This is an issue for me because when a user enters their current Gamertag into spartaninsights.com they expect to get their full game history, including matches that were played on their old Gamertag. It’s even worse if a user’s matches are spread across multiple Gamertags.

An (unexpected) issue for me was that when I requested a match, I expected to get the match as it was played. Currently, if a player changes their Gamertag, their new Gamertag is returned even for old matches. Two requests for the same match might return two different results (depending on whether a player has changed their Gamertag in between).

Another issue was that old Gamertags could no longer link to Carnage Reports on Halo Waypoint. The Halo Waypoint Carnage Report links that I use require both the Match Id AND a player’s current Gamertag. If a player has changed their Gamertag and you try to link to a Carnage Report with their old tag, you either get a page error or a redirect to your own player stats (bummer).

I eventually found a workaround that requires me to query Xbox Live for a player’s XUID and restructure my data around the XUID being the unique identifier (rather than the Gamertag). I can then periodically update Gamertags based on their XUID. My job would have been a lot easier if the API returned a player’s XUID, but it’s currently not an option.
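The effect of keying on XUID can be shown with a tiny Python sketch (illustrative only — the real site stores this in SQL tables, and the XUID/Gamertag values here are made up). Because match history hangs off the stable XUID, a Gamertag change just refreshes a display name instead of splitting a player’s history:

```python
# Player records keyed by XUID (stable), with the Gamertag as mutable metadata.
players = {}  # xuid -> {"gamertag": str, "matches": set of match ids}

def record_match(xuid, gamertag, match_id):
    """Store a match under the XUID; refresh the Gamertag to whatever was returned."""
    player = players.setdefault(xuid, {"gamertag": gamertag, "matches": set()})
    player["gamertag"] = gamertag
    player["matches"].add(match_id)

# A player plays two matches, changes Gamertag, then plays a third:
record_match("xuid-1", "OldTag", "m1")
record_match("xuid-1", "OldTag", "m2")
record_match("xuid-1", "NewTag", "m3")
# One player, one unbroken history, current display name "NewTag".
```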

Great job dude, you should get some bonus points for HaloSharp too!

I actually started off trying to achieve something similar. I collected the complete game history for 4 players and my database was already over 750MB, so I decided to put that aside for now and focus on the match viewer I submitted.

Funnily enough, I had some very similar ideas; I actually sent my mates graphs of the map/gametype weightings and told them which players they had killed/been killed by the most.

You can do some amazing queries with access to such a large amount of data; I have a whole bunch of ideas in a Google doc. I even had a very similar idea for finding games where you’d killed Bravo!

I also thought it would be amusing to show players the number of times:

  • Sniped by Snip3down
  • Snapshotted by Royal-2
  • Naded by Naded
  • Assassinated by Ninja
  • Assassinated Bravo
  • Ground Pounded by Jimbo
  • Halo CE Pistoled by Ogre 2
  • Killed/Assist by RoyBox

I also thought a “Goose Finder” would be funny and looked at all the games that me and my friends goosed.

I’d be interested in hearing about the technical issues you mentioned above.

Well done and good luck in the competition.

Wow, this is very well done and very well presented! Great work!!

> 2732987988174916;3:
> Great job dude, you should get some bonus points for HaloSharp too!
>
> I actually started off trying to achieve something similar. I collected the complete game history for 4 players and my database was over 750MB, I decided to put that aside for now and focus on the match viewer I submitted.
>
> Funnily enough I had some very similar ideas, I actually sent my mates graphs of the map/gametypes weightings and told them which players they had killed/been killed by the most.
>
> You can do some amazing queries with access to such a large amount of data I have a whole bunch of ideas in a google doc. I even had a very similar Idea for finding games where you’d killed Bravo!
>
> I also thought it would be amusing to show players the number of times:
>
> - Sniped by Snip3down
> - Snapshotted by Royal-2
> - Naded by Naded
> - Assassinated by Ninja
> - Assassinated Bravo
> - Ground Pounded by Jimbo
> - Halo CE Pistoled by Ogre 2
> - Killed/Assist by RoyBox
> I also thought a “Goose Finder” would be funny and looked at all the games that me and my friends goosed.
>
> I’d be interested in hearing about the technical issues you mentioned above.
>
> Well done and good luck in the competition.

The data import and storage was easily the biggest issue. I didn’t get a chance to build a data cube or denormalise the data in any way, so I have a giant 120GB database running on Azure just crushing queries. There’s nearly a billion rows across 90-something tables, haha.

All that and I’m still limited to just Team Arena for the April season. It’s an insane amount of data. I’ll probably do a data drop after the competition and release the database for others to interrogate. I’m probably going to have to pivot to something else or find someone to fund my Azure habits :slight_smile:

When I get a bit of free time, maybe this weekend, I’ll do a more in depth write up of the technical issues.

I really like the goose finder, and the other queries too. I’d be keen to hear the other ideas you had in your spreadsheet (maybe give it a few days after the deadline!).

> 2533274809451838;4:
> Wow, this is very well done and very well presented! Great work!!

Appreciate it, thank you.

I could expect no less from the guy who keeps a tight update cycle on HaloSharp. You do amazing work man. This in particular is inspiring. Knowing how much data you are actually storing is also mindblowing. I couldn’t imagine taking on the Arena data set, which is why I’m sticking with my custom games domain… lol.

But seriously, a lot of these features are truly inspiring. The Montage assistant is genius and I hope that I can do that myself, or get another developer to do that, for custom maps with CAPI. Being able to know where great clips are for a Map trailer is genius. I had also planned on using the Bookmark Film feature with CAPI, since it allows map creators to access films on their map that they were never part of.

You do great work man. Keep it up.

> 2533274847905488;6:
> I could expect no less from the guy who keeps a tight update cycle on HaloSharp. You do amazing work man. This in particular is inspiring. Knowing how much data you are actually storing is also mindblowing. I couldn’t imagine taking on the Arena data set which is why I’m sticking with my custom games domain… lol.
>
> But seriously a lot of these features are truly inspiring. The Montage assistant is genius and I hope that I can do that myself or get another developer to do that for custom maps with CAPI. Being able to know where great clips are for a Map trailer, is genius. I had also planned on using the Bookmark Film feature with CAPI since it allows map creators to access films that they were never part of on their map.
>
> You do great work man. Keep it up.

Thanks for the kind words. It means a lot.

I was totally blown away by the amount of data, given more time I’d probably have considered a better way of storing it. Some of the code I wrote really puts the hack in hackathon :slight_smile:

I’m really interested to see what you come up with and how you keep it performant. I’m not even sure how many custom matches are out there; it’s exciting stuff.

Speaking of HaloSharp, a new content release usually means API updates. Halo means I never sleep.

I’ve updated the second post in this thread with some discussion about some of the technical and design issues I ran into while building spartaninsights.com.

Can’t wait for the results to come out and start iterating on version 2!

This thread is a fascinating read. I was attempting something similar for the hackathon, but it crushed my AWS free tier after the first 100 gamertags or so… Most of it was due to inefficient writes eating the memory, but even after rewrites I was having problems. I’ve had more luck on my local machine, which is fairly beefy, but it’s still a ton of data.

I was thinking of keeping the datasets manageable by packaging up the data by month with the SQL Server data dump utility, then loading those files as needed into the free tier for processing. I’m planning on exploring the SQL bulk load feature this weekend to see if it’s possible. However, the 125-hour load time seems prohibitive lol…

What do you think of using a local machine to do the number crunching, storing the results in a staging table, then hosting a subset of the results for your web app? The website would have to be more static since you can’t sift data on the fly, but as a trade-off, there’d be lower hosting costs.

Really interested in whatever solution you come up with, as it’ll probably inform my own :slight_smile:

> 2533274813500699;9:
> This thread is a fascinating read. I was attempting something similar for the hackathon, but it crushed my AWS free tier after the first 100 gamertags or so… Most of it was due to inefficient writes eating the memory, but even after rewrites I was having problems. I’ve had more luck on my local machine, which is fairly beefy, but it’s still a ton of data.
>
> I was thinking of keeping the datasets manageable by packaging up the data by month with the SQL Server data dump utility, then loading those files as needed into the free tier for processing. Planning on exploring the SQL Bulk load feature this weekend to see if it’s possible. However, the 125 hour load time seems prohibitive lol…
>
> What do you think of using a local machine to do the number crunching, store the results in a staging table, then hosting a subset of results for your web app? The website would have to be more static since you can’t sift data on the fly, but as a trade off, there’d be less expensive hosting costs.
>
> Really interested in whatever solution you come up with, as it’ll probably inform my own :slight_smile:

Appreciate the comment.

This is pretty much how future versions are going to look for me. Having that amount of storage in Azure SQL is prohibitively expensive and I can’t afford the appropriate performance tiers.

Going forward, I’ll have to change how I store the data. The decision, similar to yours, is between keeping it local (i.e. a server at my house) or online.

Most likely, I’ll move my data from Azure SQL to a self-managed Azure VM. The idea is that I’ll build a low-performance, high-storage SQL VM. I’ll have periodic tasks that import data from the API, then export a subset to some other high-performance, low-storage store (probably Azure Table Storage).

This pretty much guarantees that I’ll have to drop some of the more dynamic functionality, but I think there are still a lot of valuable insights to be gained from aggregate queries.
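The "crunch the big store offline, publish only the summaries" plan can be sketched in a few lines (an illustrative Python toy — the real pipeline would run SQL aggregates on the VM and write the summary rows to Azure Table Storage):

```python
# Toy version of the planned pipeline: heavy aggregation happens offline
# against the full match store; only the small summary table gets hosted.
matches = [  # stand-in rows; the real store holds ~900,000 matches
    {"map": "Coliseum", "gametype": "Slayer"},
    {"map": "Coliseum", "gametype": "CTF"},
    {"map": "Truth", "gametype": "Slayer"},
]

def aggregate_rotation(rows):
    """Count matches per (map, gametype) pair - a Game & Map Rotation summary."""
    counts = {}
    for row in rows:
        key = (row["map"], row["gametype"])
        counts[key] = counts.get(key, 0) + 1
    return counts

summary = aggregate_rotation(matches)
# `summary` stays tiny and cheap to serve, no matter how big `matches` grows.
```

The trade-off is exactly the one described above: the site serves precomputed aggregates rather than arbitrary on-the-fly queries.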

Now that I don’t have the time crunch from the hackathon, I’m working on doing it right!