Testing, testing, 1 2 3

How much testing do you do with decks?

Netrunnerdb seems to be teeming with people that have rolled on a random chart and put a deck together before sharing it with the world, but even those that have tested have frequently run single-digit games. “This deck has gone 6-1 in testing!”
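To put a rough number on how little a 6-1 record says, here is a minimal sketch of a Wilson score interval for a win rate. The function name `wilson_interval` and the 95% level (`z=1.96`) are my own choices for illustration, and it assumes every game is an independent trial with the same underlying win chance, which real testing rarely satisfies.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for an underlying win rate."""
    if games <= 0:
        raise ValueError("need at least one game")
    p_hat = wins / games
    denom = 1 + z**2 / games
    centre = (p_hat + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / games + z**2 / (4 * games**2)) / denom
    return centre - half, centre + half

# The "6-1 in testing" record quoted above.
low, high = wilson_interval(6, 7)
print(f"6-1 is consistent with a true win rate of roughly {low:.0%} to {high:.0%}")
```

Under those assumptions the interval runs from roughly a coin flip to nearly unbeatable, so the record alone cannot separate an average deck from a great one.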

It might just be my background and career choice, but this seems like nothing more than a basic “does the deck compile?” unit test. For me, real testing needs tens of games against multiple archetypes.

Essentially, if you can count the number of times you’ve used the deck on your fingers, it’s not tested yet.

How about you? What constitutes a well-tested deck? How many games do you give something shaky before you admit to yourself it’s not working? What number of consecutive successes convince you that you’ve minted pure gold from base lead?

As one of the segment of playyas that likes to make decks last minute, in the wee hours before a tournament, I’m not suggesting anyone can’t play untested decks at a Store Championship, or isn’t allowed to post it online until they’ve had 100 games minimum… but it is curious to me that a handful of games counts as “tested”.

Can I tack on another related bugbear? The habit of saying something has an x% matchup against something else. I know it’s shorthand and a broad estimate, but I’m extremely doubtful that enough testing (see, I’m bringing it back on topic) has been done to give those percentages any weight whatsoever.
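As a rough illustration of why small samples make matchup percentages shaky, this sketch computes how often a genuinely even matchup would still produce a lopsided record. The helper `prob_at_least_wins` is a hypothetical name, and it assumes independent games with a fixed win chance.

```python
from math import comb

def prob_at_least_wins(wins, games, true_rate=0.5):
    """Chance of seeing `wins` or more wins in `games` games (binomial tail)."""
    return sum(
        comb(games, k) * true_rate**k * (1 - true_rate) ** (games - k)
        for k in range(wins, games + 1)
    )

# How often does a true 50/50 matchup look like a "70% matchup" over 10 games?
chance = prob_at_least_wins(7, 10)
print(f"An even matchup goes 7-3 or better in {chance:.1%} of 10-game sets")
```

With these assumptions an even matchup goes 7-3 or better about 17% of the time, so a "70% matchup" read off ten games is well within the range of luck.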

Yeah, really not enjoying the onslaught of crappy lists that differ by one or two cards, or are otherwise complete trash, and get posted saying 8-0 or 12-2 or 15-1 on jinteki/locally. Hopefully that’ll die down a little after store champs, but maybe there should be an option to downvote lists or something. I’ve generally stopped reading netrunnerdb, but the influx of decklists being posted on Facebook and Reddit is also very irritating… but yeah.

The way I see it, as much testing as you can is a good idea. I tend to make more off-meta decks, so I’ll be trying them out weeks in advance to see if they actually work. Although, I mostly play my testing games on Jinteki.net, which has a bit of a warped meta compared to tournaments.
I think 7 games is a solid start for a deck, enough to put it up on something like netrunnerdb, but you need to say that it’s still a work in progress. After 7 games I think you’d know if the deck actually works, and the rest is just figuring out which cards can stay and go, as well as new additions.

50+ actual games, probably twice as much on goldfishing and on playing vs myself.
Not gonna assert that a deck is good/bad before that.

A full-time career and 2 kids, combined with the sheer number of archetypes (and permutations of archetypes) the card pool offers, make any serious, robust testing just not feasible. Frankly, you need a computer for it, because even if you do have the time to commit to it (it’s a young man’s game), the natural variance of the game means that your sample size has to be huge to reach any reliable conclusion. E.g. is the deck good but I am not good with it? Has natural variance skewed my results? Have I been playing against weaker players or against favourable matchups? There are too many variables to really isolate any particular one.
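For a sense of what "huge" means, here is a back-of-the-envelope sketch using the normal approximation for a proportion. The name `games_needed`, the 95% level and the worst-case 50% win rate are assumptions for illustration, and the formula ignores pilot skill and opponent mix entirely.

```python
import math

def games_needed(margin, win_rate=0.5, z=1.96):
    """Games needed so a ~95% interval is about +/- `margin` wide (normal approximation)."""
    return math.ceil(z**2 * win_rate * (1 - win_rate) / margin**2)

for margin in (0.20, 0.10, 0.05):
    print(f"+/- {margin:.0%}: about {games_needed(margin)} games")
```

Even pinning a win rate down to within ten percentage points takes on the order of a hundred games under these assumptions, and that is for one matchup with one fixed list.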

For my part my “testing” involves playing 10-20 games per week of real play and a hell of a lot of thought/theory crafting outside of that. A lot of decisions are made upon intuition and understanding of the game rather than hard numbers. Group think on the internet is an excellent shortcut, but it also leads to confirmation biases. For example, looking at store champ results, Faust Anarch & NBN FA are “good” - but they also make up 50-70% of the field at some events, so that becomes somewhat inevitable.

I believe that efficacy trumps outright deck power in the majority of cases anyway.

Yup. There’s been plenty of times where I’ve been playing online with a new deck, and have thought ‘is this deck good, or does my opponent just suck?’

I took a really gimmicky deck to a tournament and got trashed because without realising it, I’d built it to specifically counter an archetype I saw online a lot.

When I start a homebrew, I number it as v0.01 and start playtesting. I keep a record of my games in a Google Doc, including notes about particular plays, rulings and ICE/icebreaker matchups that I encounter. Every 20 games or so I “version up” (i.e. to v0.02): I look at the games I played and the difficulties I encountered, and switch some cards out.
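A versioned game log like this can be sketched in a few lines. This is only one possible shape for it: the `GameRecord` fields, the `summarise` helper and the sample entries (including the notes) are made up for illustration and are not taken from anyone's actual records.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class GameRecord:
    version: str      # deck version the game was played with, e.g. "v0.01"
    opponent: str     # archetype faced
    won: bool
    notes: str = ""   # plays, rulings, ICE/icebreaker matchups worth remembering

def summarise(log):
    """Return {version: (wins, games)} so each version can be judged separately."""
    totals = defaultdict(lambda: [0, 0])
    for game in log:
        totals[game.version][0] += int(game.won)
        totals[game.version][1] += 1
    return {version: tuple(counts) for version, counts in totals.items()}

# Hypothetical entries for illustration only.
log = [
    GameRecord("v0.01", "NBN FA", False, "too slow to find breakers"),
    GameRecord("v0.01", "Foodcoats", True),
    GameRecord("v0.02", "NBN FA", True, "swapped in more draw"),
]
summary = summarise(log)
print(summary)
```

Keeping the version on every record means results from before a card swap never get mixed into the numbers for the current list.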

For me:
Proof of concept is 5 games.
Proof of effectiveness is 15 games.
First tentative tiering is 25 games.
More tests for corner casing.

Once I’ve settled on most of the card choices, probably something around 30 games. I might switch 1-2 cards around after that, but that’s when I try to get good at playing the deck, and then decide whether it’s strong enough. Apparently I’m a crappy deckbuilder, because no deck I’ve built has passed my own test.

IDK about the exact numbers, but I think it’s really important to be pointing out these different levels of “tested.” The idea of everyone performing a statistically robust survey for every deck they post is as silly as it is impractical. I think it’s safe to assume that when someone posts a deck and says something like “This deck has gone 6-1 in testing!”, serious players will take that to mean the obvious: proof of concept, or as you say, “does it compile.” I don’t think there’s any need to shovel dirt onto these people for wanting to share something they’re working on and get people’s feedback. This point of testing, where you can safely say “I tried this deck a bit and it didn’t fall apart immediately, so I think we can have a conversation about it beyond ‘this deck sucks and is unplayable’”, seems like a perfectly reasonable point to be posting a deck.

TLDR: I don’t think most (any) people have a binary idea of “this deck is tested or it isn’t tested.” Especially if they’re posting their exact numbers, they’re just letting you know where they’re at with testing it - I don’t think it’d be fair to say “you’ve only played 7 games so please refer to this deck as untested because you know literally nothing about how good it is.”

I agree, and would add that sometimes, 25 games after a design, the meta has changed (new datapack, etc.), invalidating all previous tests.

So descriptions like “the deck was tested for 100 games” are strictly invalid from my point of view when only 25% of the original design is still there: it’s more ID testing than deck testing, and I’d assert that overtesting IS an issue.

Netrunner is still a small enough game that you can figure out how most decks deal with most situations just by looking at them TBH. How games play out still depends a lot on how an individual plays the game/archetype as a whole as well. Someone who likes to poke HQ with spare clicks early on (no matter the deck they’re playing) will have better results against someone who is more prone to keeping agendas in HQ, no matter the deck they’re playing.

It seems fine to me to post a deck after it’s gone 6-1. Like people have said, you know the deck at least functions at this point; now you could keep testing it as is, or try to tap the huge wealth of knowledge that the Netrunner community has gathered. I think the problem OP is describing is when someone posts a deck that’s gone 6-1 and declares it to be written by God’s hand and capable of converting the non-believers.

A bit of hyperbole, but NRDB write-ups tend towards exaggeration. Take it with a grain of salt and, if you care to, leave some constructive feedback. Personally, I would publish a 6-1 deck as a proof-of-concept and humbly ask for input.

Yes, I think you nail it here. This is the real source of people’s annoyance, I think: overconfident write-ups. That is an entirely different issue from “people don’t understand how much testing is enough.” It is very possible (and even common) for people to greatly exaggerate the strength of their list even after dozens and dozens of games’ worth of testing.

My general take on testing is it does require several stages.

  1. Simulated draws in NetrunnerDB: am I getting enough econ/ICE/tutors/whatever to drive my game plan? Basically only enough to check that the econ engine runs, but any deck is hopeless without this basic level.

  2. Testing on Jinteki.net. Casual and Competitive don’t really seem to offer much of a split in terms of challenge level. The nice thing is that seeing all of the jank thrown at it gives a good idea of how broadly the deck works and whether it is teched too hard against my expected meta (e.g. Faust or NEH).

  3. Testing with my brother. I maintain example Tier 1 decks to simulate tournament games against, and he has several of his own. Things like Foodcoats, Butchershop, NEH Fastro, Dumblefork, Faust Noise, and Shaper/Crims we can get to mostly work. We are both pretty good players, so this level of testing can be pretty unforgiving, but it’s where the best decks get made. These are nice to test tweaks and fine-tuning that we may do between tournaments to tech against particular match-ups.

  4. Play at local casual tournaments. Most shops with regular meetups will have a tournament for some leftover prizes/store credit/bragging rights. These can be great for testing new decks, since you will see a mix of tier 1 decks, and some tier 2 and jank decks.

I don’t need to use all 4 steps for every deck. Some will never go to a tournament, so obviously they won’t need 3 and will only casually use 4. Decks I copy from NetrunnerDB can skip pretty quickly to 3 and 4 if I am confident in the game plan.
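The simulated-draw check in step 1 can also be done exactly rather than by repeated sampling. This is a minimal sketch using the hypergeometric distribution; the function name, the 45-card deck and the count of 9 econ cards are assumptions for illustration, and it ignores the mulligan.

```python
from math import comb

def prob_draw_at_least(copies, deck_size, hand_size, wanted=1):
    """Chance that a random hand holds at least `wanted` of `copies` key cards."""
    total = comb(deck_size, hand_size)
    hits = sum(
        comb(copies, k) * comb(deck_size - copies, hand_size - k)
        for k in range(wanted, min(copies, hand_size) + 1)
    )
    return hits / total

# Assumed example: 9 econ cards in a 45-card deck, 5-card opening hand.
p_econ = prob_draw_at_least(copies=9, deck_size=45, hand_size=5)
print(f"At least one econ card in the opening hand: {p_econ:.1%}")
```

With those assumed numbers the opening hand has an econ card about 69% of the time, which is the kind of figure worth knowing before any real games are played.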

As far as NetrunnerDB comments go, they are all bluster. I’ve met some people who wrote up the most bombastic text to sit alongside a decklist, and they were much more humble in person and open to different ideas than their write-up would suggest. Take any write-up with a grain of salt.

My usual testing involves running it against the ‘gauntlet’, which is basically what I would expect to see at a tournament. So if I’m making a corp deck I’m going to run it against Noise and Whizz primarily, and then Maxx and Pitchfork secondary. If I’m making a new runner deck I’m testing it against NBN FA and Foodcoats primarily, and Haarp Kill secondary.

@Sotomatic This is what I call “drunken boxing”: you will lose most of the time and try to learn from there.
It’s my method; it has strengths and weaknesses.
To me, it definitely makes you a better, more combative player, but it’s a slow “bottom-up” method for improving your decks.
Since I started mixing bottom-up and top-down analysis (aka “Stimhack classics”: you need X econ cards / Y multiaccess / Z tutors / T draw cards / U breakers, etc.), I have made way, way better decks these last months.
I use what I learned in 3 years of drunken boxing to test the inherent bluffing part (this works btw, and yes, there are good/bad designs at first glance there too), since bluffing is sometimes the only way to beat T1 decks with a Tlow deck.

I can understand why somebody would be annoyed by this, but doing the Glass Shop write-up was fun, exciting, and a crowd pleaser. If people really didn’t like the cocky/overconfident write-ups, they wouldn’t be popularized on NRDB. I’m not saying that it’s “correct” or anything, it’s just fun for the reader and the writer.

I find testing in NR is very difficult compared to how it was for me for Magic. The sheer number of decisions you make in a game where a wrong call can mean a game loss makes evaluating deck quality very tricky.

It’s also often very hard to know where the “ceiling” of a deck is, i.e. early drafts will usually kind of suck, but how much better will the ideal build be than that early draft?

It does help a lot if you can find a consistent testing partner and also alternate the sides you play for each matchup. Without switching sides you will often be just finding out who is better at playing those decks since piloting is invariably a major factor.
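One crude way to use side-switched results is to average the deck’s win rate over the pilots instead of pooling all the games, so that one stronger player does not dominate the number. The sketch below assumes two testers and made-up records; `side_balanced_win_rate` is a hypothetical helper, not an established method.

```python
def side_balanced_win_rate(results):
    """Average a deck's win rate across pilots.

    `results` maps pilot name -> (wins with the deck, games with the deck).
    """
    rates = [wins / games for wins, games in results.values() if games > 0]
    if not rates:
        raise ValueError("no games recorded")
    return sum(rates) / len(rates)

# Made-up records: the same deck, piloted by each tester against the other.
results = {"me": (6, 8), "partner": (2, 8)}
balanced = side_balanced_win_rate(results)
print(f"Deck win rate averaged over both pilots: {balanced:.0%}")
```

In this made-up case the deck looks like a 75% favourite in one player’s hands but comes out even once both pilots are counted, which points at the pilot rather than the list.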

“Real” quantitative information like a “x% matchup against Y” is essentially impossible to obtain for practical purposes. Especially since changing a single card can be significant when you often see most of each deck. When people say numbers like 75% I read it as more of a “this deck is heavily favoured against Y”.
