Unfortunately none of you thought to contact me about using a definite stimhack league tag (like [BGG-L05] for the BGG leagues), so I can’t provide a filter to crunch those data. Nevertheless, this should give you some interesting data to play with.

First observation: the H&P Criminal IDs are stored as Criminal | subtitle, not Criminal | name. So Iain is Criminal | Retired Spook, not Criminal | Iain Stirling. Same for Ken and Silhouette.

Pruning unreleased IDs only removes 1,254 of 306,310 games - not bad. There aren’t a ton of people playing Fisk / Collective / Mind-Mapping.

I decided to orphan the old code repository in the course of rolling my code into a package that any R user can install with devtools::install_github(). After parsing the latest data, I pushed my latest code to a new repository at github.com/AjarKeen/netrunner.

So if you’re an R user, you can use devtools to install my package like so:

devtools::install_github("AjarKeen/netrunner")

The package has standard documentation, with the caveat that download.octgn() shows up in the documentation but isn’t actually user-accessible (because it doesn’t work). You need to download the data and put it in your working directory yourself.

After that, you can read the data file, prune it in various ways, rate the players using Glicko, and do winrate and matchup calculations.

If I get rigorous enough with package development, I may eventually submit it to CRAN so you can just use install.packages(), but I think I’d need to rewrite all of my dplyr code to use standard evaluation.

Out of curiosity (and boredom), I took a look at the data as well. Unless I made a mistake:

All results are for players who have at least 20 games played. Skewness and excess kurtosis were both very close to 0, which is consistent with an approximately normal distribution.

Win %

Mean - 45.5
Median - 45.8
Std - 15.0
Q1 - 35.1
Q3 - 55.9
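For anyone who wants to double check numbers like these, here is a minimal sketch of the same kind of summary in plain Python. The function name `summarize` and the toy list of win percentages are made up for illustration; the real figures above came from the full OCTGN data set, and the quartile method used by whatever tool produced them may differ slightly from the one `statistics.quantiles` uses by default.

```python
import statistics

def summarize(win_pcts):
    """Descriptive statistics for a list of per-player win percentages."""
    n = len(win_pcts)
    mean = statistics.fmean(win_pcts)
    q1, _, q3 = statistics.quantiles(win_pcts, n=4)  # default "exclusive" method
    # Central moments (population form) for skewness and excess kurtosis.
    m2 = sum((x - mean) ** 2 for x in win_pcts) / n
    m3 = sum((x - mean) ** 3 for x in win_pcts) / n
    m4 = sum((x - mean) ** 4 for x in win_pcts) / n
    return {
        "mean": mean,
        "median": statistics.median(win_pcts),
        "std": statistics.stdev(win_pcts),   # sample standard deviation
        "q1": q1,
        "q3": q3,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
    }

# Hypothetical win percentages, NOT taken from the real data.
toy = [30.0, 40.0, 45.0, 50.0, 60.0]
summary = summarize(toy)
print(summary)
```

Skewness and excess kurtosis near 0 are only a rough hint of normality, so a histogram is still worth a look.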

If you reduce the population to only those players who have played at least 20 games, the Corp wins 52.4% of games.

If you reduce the population to only those players who have played at least 20 games AND at least one person involved in the game has a win percentage of 60% or greater (roughly 1 sigma above the mean), the Corp wins 50.6% of games.
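The two filters described above could be sketched roughly like this. Everything here is an assumption made for illustration: the game records are hypothetical dicts with `corp`, `runner` and `winner` keys, and I am assuming the 20-game minimum has to hold for both players in a game, which the description does not spell out.

```python
from collections import Counter

def player_stats(games):
    """Count games played and games won per player."""
    played, won = Counter(), Counter()
    for g in games:
        played[g["corp"]] += 1
        played[g["runner"]] += 1
        won[g[g["winner"]]] += 1  # g["winner"] is either "corp" or "runner"
    return played, won

def corp_win_rate(games, min_games=20, min_win_pct=None):
    """Corp win % over games whose players pass the filters (None if no games)."""
    played, won = player_stats(games)
    kept = []
    for g in games:
        players = (g["corp"], g["runner"])
        # Assumption: BOTH players need at least min_games games played.
        if any(played[p] < min_games for p in players):
            continue
        # Optional filter: at least ONE player at or above min_win_pct.
        if min_win_pct is not None and not any(
            100 * won[p] / played[p] >= min_win_pct for p in players
        ):
            continue
        kept.append(g)
    if not kept:
        return None
    return 100 * sum(g["winner"] == "corp" for g in kept) / len(kept)

# Tiny made-up data set, so the thresholds are lowered to match.
toy_games = [
    {"corp": "a", "runner": "b", "winner": "corp"},
    {"corp": "b", "runner": "a", "winner": "runner"},
    {"corp": "c", "runner": "b", "winner": "corp"},
    {"corp": "b", "runner": "c", "winner": "corp"},
]
print(corp_win_rate(toy_games, min_games=1))
print(corp_win_rate(toy_games, min_games=1, min_win_pct=60))
```

Requiring both players to clear the win-percentage bar instead is a one-word change (`any` to `all` in the second filter).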

Criminals and Whizzard are doing better than I expected. I’m really surprised by the Tenma numbers, actually (not just the winrate, but the games played as well).

As far as I can tell, we are looking at win percentages of competitive players against the entire field. What about win percentages for NEH versus runners who also have high win rates? In other words, how does NEH fare in the hands of capable players against experienced runners, rather than against the entire field?

To elaborate, the statistics we got out of our local regional (San Rafael) showed that overall, Corp win rates were above those of Runners, but when you looked at just the top 16 players, Runner win rates were still above those of Corp. That was before Upstalk of course, but the point is, high level players vs high level players is what we are trying to look at here.

Yeah, I thought after I left work that I should have been more clear.

If you reduce the population to only those players who have played at least 20 games, the Corp wins 52.4% of games.

If you reduce the population to only those players who have played at least 20 games AND at least one person involved in the game has a win percentage of 60% or greater (roughly 1 sigma above the mean), the Corp wins 50.6% of games.

I didn’t limit the dataset by date, so it’s all-inclusive. That’s something I will look at tomorrow.

No, my competitive cut data is only games played by the players who made the cut – players whose Glicko rating is more than one standard deviation above average.

I can easily change that threshold to two standard deviations (or some other arbitrary threshold) if you only care about the best of the best, but bear in mind that the sample size will be small. For example, with the one sd cutoff, we keep 1,588 players and 118,466 games. With a two sd cutoff, we keep 257 players and 4,049 games. The two sd cutoff leaves a total of 21 NEH games in the dataset, which is nowhere near enough to generalize from.
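To make the cutoff concrete, here is a minimal sketch of a "more than k standard deviations above average" cut in plain Python. It is not the code from the package: the function name `competitive_cut`, the use of the sample standard deviation, and the ratings themselves are all assumptions made for illustration.

```python
import statistics

def competitive_cut(ratings, n_sd=1.0):
    """Return the set of players rated more than n_sd standard deviations above the mean."""
    values = list(ratings.values())
    threshold = statistics.fmean(values) + n_sd * statistics.stdev(values)
    return {player for player, rating in ratings.items() if rating > threshold}

# Hypothetical ratings, NOT real Glicko output.
toy_ratings = {"p1": 1400, "p2": 1500, "p3": 1500, "p4": 1600, "p5": 1900}
print(competitive_cut(toy_ratings, n_sd=1))  # looser cut
print(competitive_cut(toy_ratings, n_sd=2))  # stricter cut, far fewer players
```

The games kept would then be whichever ones involve the players in that set, which is why the game count drops so sharply when the threshold moves from one sd to two.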

Did you use ratings from the old data (was the number of users the same as last time?), or did you recalculate using only the new data? And can we see it?

I don’t see what more data can tell us. Astroscript aside, NEH’s ability equates to extra clicks. The balance between Runner and Corp revolves around 4 clicks vs 3 clicks + 1 draw… except with NEH it’s 3 clicks + draw + draw if the deck is built correctly. Of course NEH is very good.

All I know is I’m very angry at all the people who voted for Laramy Fisk because you thought you were saving Netrunner… that extra click The Collective gets seems right in line with IDs like NEH and Blue Sun.

I redid the ratings with the new data. Now that all of the code is written, it actually runs very fast – each step takes at most a couple of seconds, and usually much less.

I could post a ratings file, but you’d also need to figure out your player ID from @db0’s original data.

There’s a lot more stuff I want to look at, in addition to redoing / updating my matchup plots. The matchups are still just simple winrate calculations on subsets of games. I’m curious about more complex questions, like whether loyalty to an ID is correlated with rating. Do people who play the same ID for a long time tend to get better?

So I’m continuing to look at the data, mostly for practice/curiosity. If anyone can double check my results, that would be awesome. I constantly check for errors I’ve made and attempt to correct them when I find them.

First, I wanted to check and see if Kingsley was right and that I had made some kind of mistake. He was. I fixed my code and reran. I then also wanted to see if there was a difference between Corp win rates before and after Upstalk. I’m not sure when Upstalk was released, but I think it was in early July. I’m also not sure when cards start getting play on OCTGN, so I set my time periods as “Pre” Upstalk (dates before July 1) and “Post” Upstalk (dates on and after July 1). Players had to have >=20 games played. Frequencies showed that:

Pre July 1
Corp win % for all games: 51.8
Post July 1
Corp win % for all games: 57.1
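The Pre/Post split above can be sketched as follows. This is only an illustration: the `(date, corp_won)` tuples are a made-up record format, and the year in the cutoff is an assumption, since only "July 1" is stated.

```python
from datetime import date

def corp_win_pct_by_period(games, cutoff):
    """Corp win % before the cutoff ("pre") and on or after it ("post")."""
    tallies = {"pre": [0, 0], "post": [0, 0]}  # [corp wins, games]
    for played_on, corp_won in games:
        period = "pre" if played_on < cutoff else "post"
        tallies[period][0] += int(corp_won)
        tallies[period][1] += 1
    return {
        period: (100 * wins / total if total else None)
        for period, (wins, total) in tallies.items()
    }

# Hypothetical games; the year 2014 is an assumption.
toy_dated_games = [
    (date(2014, 6, 1), True),
    (date(2014, 6, 15), False),
    (date(2014, 7, 1), True),   # July 1 itself counts as "post"
    (date(2014, 8, 1), True),
]
split = corp_win_pct_by_period(toy_dated_games, cutoff=date(2014, 7, 1))
print(split)
```

Moving the cutoff a few weeks either way is an easy sensitivity check, given the uncertainty about when Upstalk cards actually started seeing play on OCTGN.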

Small jump. Is it significant?

I did a logistic regression analysis to see if there was a difference in corp win rates by time period for all games.

Here are the results. Analysis shows significance, with a point estimate of 0.2166 in favor of the Post period.

I then did it again, this time limiting my dataset to players with >=20 games played and to games where BOTH players had a win % >= 60%. Results also show significance, and the point estimate went up to 0.3688.

Interpretation: basically, what that’s saying is that the difference between the Pre and Post time periods was significant, and was more pronounced when players of “high ability” (win % of at least the mean plus 1 sigma) were playing. The Least Squares Mean estimate is the log odds ratio. My takeaway from this is that something happened during the Post time period to change the rate at which Corp wins games.
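Since log odds ratios are not very intuitive, here is a small worked conversion. It only uses numbers already quoted in this thread (51.8%, 57.1%, 0.2166 and 0.3688); the comparison with the raw percentages is a rough sanity check, not a reproduction of the regression, whose exact inputs are not shown here.

```python
import math

def log_odds(p):
    """Log odds (logit) of a probability p."""
    return math.log(p / (1 - p))

# Corp win rates for all games, as quoted above.
pre, post = 0.518, 0.571

# Log odds ratio implied by the raw percentages: about 0.214,
# close to the reported point estimate of 0.2166.
raw_log_or = log_odds(post) - log_odds(pre)

# Exponentiating a log odds ratio gives the odds ratio.
or_all_games = math.exp(0.2166)     # about 1.24
or_high_ability = math.exp(0.3688)  # about 1.45

print(round(raw_log_or, 3), round(or_all_games, 2), round(or_high_ability, 2))
```

Read this way, the Corp's odds of winning were roughly 24% higher in the Post period across all games, and roughly 45% higher in the games between two high win rate players.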