Narrow Tier Rosters 2014+ Explained

During the course of 2013 I had to answer a lot of questions about NTBB that I had never thought I'd have to answer - in some ways my house rule project had become a victim of its own success. Most importantly, NTBB was criticized for being based on mere discussion rather than solid proof. That is quite true, since back at the inception of NTBB there was very little data available. Even so, the critique opened my eyes to the fact that by now a lot of data on CRP play exists, and that this data could be used to guide the changes made in NTBB. Thus NTBB2014 contained more changes than I had originally planned. One might almost call it NTBB2.0.

With NTBB 2014 I introduced inferential data as a way to identify/justify which teams to nerf and buff with roster changes, which in turn caused me to roll back changes to some teams while introducing changes to other teams. I'll present the NTBB2014 (and 2015) changes and the thoughts behind them first. Afterwards I'll get to the more technical stuff.
As always - enjoy.

Part 1: NTBB 2014 - The roster changes
Given the tier percentages defined by the BBRC, but interpreted in the way outlined below, only Amazons, Undead and Wood Elfs have overperformed, while Orcs, Chaos, Gobbos, Halflings and Ogres have underperformed for their tier. When you add in the NTBB goal of raising the bottom of all tiers, then Slann and High Elfs have also underperformed, as has the tier 2 Vampire team. That's a total of 11 teams.
With that as my starting point, I'll explain the roster changes used in NTBB, and the adjustments made between NTBB2013 and NTBB2014/2015.

Top Performers:
1) With no evidence that they overperformed, the nerfs to the Dwarf and Orc rosters have been revoked, much as I liked them. (And I really did - I think Orc blitzers for 90K was a very fair price, and Dwarven Slayers with Juggernaut rather than Block made them a lot more interesting, not to mention in-character suicidal. But so be it.)

2) Wood Elfs have overperformed in NAF tournaments. This has more than anything to do with Wardancers picking Strip Ball before anyone can counter properly. Being elfs with a super hard big guy just compounds the problem, but I won't remove a position entirely, and because the Treeman statline is shared by the struggling Halfling team, a nerf to the Treeman is out of the question. In the end I decided to trade one of the Wardancers' most useful skills for the generally less potent Fend - meaning that Wardancers won't develop quite as quickly into cage breakers. Sure, they may go with Wrestle as first pick, to some extent recovering what was lost. But being a Wardancer who is often on the ground is very risky.

3) The Amazon team has massively overperformed in the short term in the Box stats. While this may to some extent be due to their ability to trim their roster (aka min-max) in short term play, it is worth noting that they have also overperformed in the longer term. NTBB2013 tried out some rather extensive changes to the Amazon roster - in retrospect moving beyond the more humble ambitions of NTBB. Many of these changes were based on the assumption that the Amazons perform poorly in the longer term - but the data certainly does not support this assumption. Furthermore, in the (admittedly limited) playtest of NTBB2013, the Amazons have done quite well, so there is no real indication that the 2013 version was a nerf at all.

Thus, the NTBB2013 version has been scrapped, and the Amazon team reverts to the original roster, with a simple +10K price increase on the linewomen. This will make the team less useful for min-maxing, while at the same time making the Throwers, Catchers and Blitzers (comparatively) more appealing.

4) The Undead team has overperformed in both the Box stats and the NAF stats. NTBB2014 maintains the nerf from NTBB2013, where rookie Mummies have Grab rather than Mighty Blow. Using Mummies to dish out pain to opponents that haven't yet been able to deepen their bench is certainly one source of early Undead power. Another is the power of early Blodge Ghouls - but as those are shared by the Necro team their stats could not be easily altered.

Underachievers:
In NTBB2014 I made the decision to slightly raise the bottom of tier 1 (from 45% to 46½%). This meant that a few teams who have performed on the very edge of the BBRC's tier 1 get a marginal buff in NT.
5) Slann retain the 10K discount on the Blitzers already instated in NTBB2013.

6) High Elfs have been weak right from the outset and in early development. In an attempt to help out starting High Elfs in a way that gets less potent as the team develops, the famous High Elf Thrower gets Accurate for free. The buff is on the Thrower because he is a signature player of the team, but also because as Throwers develop, their skill options aren't super thrilling, meaning that starting with a skill that would otherwise have been a prime pick gives diminishing returns with every new skill the player acquires.

7) The Chaos Team used to start out weak and finish strong. So weak in fact, that they have underperformed in NAF tournaments. Even though the CPOMB nerf will reduce the power of developed Chaos, I'm still wary of giving them a short term buff that will also be valuable in the long term. In the end, I put Leader on the Minotaur, turning him into a Minotaur Lord. The result is a great bargain for starting Chaos - but as most Chaos teams develop into high TV, the Minotaur becomes a liability and is often cut from the roster.

8) The rookie Vampire Team has not performed below the BBRC's standard for Tier 2, but it has underachieved compared to the NTBB definition of Tier 2. Therefore NTBB2014 keeps the buff to the Vampire Team finalized in NTBB2012: The Thralls get Thick Skull for free in order to make the team less likely to implode. In buffing the thralls I rejected a classic 6338 statline because it can only be priced at 50K, and that higher cost would make the buff quite ambiguous. Besides, the human species seem to be 6337 + skill/AV+, so it fits that template perfectly.

Gobbos, Snotlings and Halflings
Since it is the ambition of NTBB to improve the performance of the tier 2 and tier 3 teams, the stunty teams have been getting buffs from the get go. In NTBB2014 the 'Right Stuff voids Tackle on blocks' rule from Galak's original list has been reinstated. It was originally excluded from playtest of earlier versions of NTBB because some key playtesters disliked the sound of it - and at that time I really needed playtesters. But with the stats presented below, it is clear that the tier 3 teams have had a hard time performing even at the very low level that is the bottom of the BBRC's tier 3 (25%).
With a buff to these three teams included in Plasmoids CRP+, they have had to lose some of the other freebies previously granted by NT roster changes - and I used the opportunity to streamline the NT buffs.

9) The Ogre team was the only one which already in NTBB2013 benefitted from the new Right Stuff rule - just applied to Titchy at the time. No further changes were made.

10) The Gobbo team in NTBB2013 had lonerless Trolls, and their Secret Weapons had a +10K price hike but started with Sneaky Git. Some gobbo coaches had remarked that it was a shame that the new Sneaky Git was granted like that, rather than being a development option. At the same time I felt that the +10K/SG change wasn't an unambiguous buff, because of the higher cost on the starting roster. So, I removed the change to the Secret Weapons, and instead further buffed the Trolls. I found 2 skills for them with nice appeal, and which matched a GW fluff description of River Trolls quite nicely.

11) The Halfling team of NTBB2013 had AV7 linemen and 0-4 AG4 Catchers (as a nod to the super agile halfling catchers of 2nd edition). Just as with the gobbos, getting the new Right Stuff meant something else had to go. I removed the AV7 linemen both for balance concerns and in order to make the team change more simple - just one statline tampered with rather than two. In effect this shifts some of the halfling buff away from the starting roster and into the longer term, which lines up fine with the data, which implies that starting halfling teams aren't in such dire straits.

12) The tier 2 Underworld team also enjoys the new Right Stuff bonus, and has as a result moved back to the original 70K reroll cost. This moves the bonus from early performance to later performance, and later development seems to be where the team struggles more anyway. Perhaps Underworld should have had both bonuses, seeing as how they're one of the teams using CPOMB, and hence hurt by the nerf of CPOMB. On the other hand, the CPOMB combo is also currently very deadly to a squishy Underworld team.

13) Ripple effects: Gobbos on an orc team will also benefit from the Right Stuff, which seems OK considering how generally unpopular gobbos are on orc teams. Not to mention that in the stats the Orc team is the first bashy team to collapse, so they could do with a consolation prize. Dwarfs and Chaos Dwarfs will also be getting slightly fewer freebies against the Right Stuff sides, which can only be a good thing.

Part 2: NTBB 2014 - Goals and definitions
In this second half of the document I want to explain why NTBB looks the way it does. As already mentioned, the ambition of NTBB is to make Blood Bowl more closely matched, and hence more varied and exciting. As a set of house rules, I hope that they will appeal to coaches that share this view of BB. But I've been asked to clarify my thinking, and so I shall.

Looking for Evidence
When I was first encouraged to base NTBB more on tangible data and less on discussion, I was told that the data was so plentiful that it wouldn't be a problem. Unfortunately, that is far from the case. Ideally, I'd like for all of the Blood Bowl races to fit within their tier performance brackets in all three major meta-game environments: Resurrection tournament, TV-matched online play and table top league. As it happens, there is no extensive collection of data for tabletop play, so I've been working with Resurrection data and TV-Matched data. My assumption is that at least low to mid TV league play resembles Resurrection play (roughly matching TV1200) and low to mid TV online play.

As you will see below, there are teams that have overperformed or underperformed in the available data pools, and which we can infer with a 95% certainty would continue to do so in the same situation. This is what NTBB uses to decide which teams to tweak. What I cannot do is prove that the team tweaks made work as intended, so the prudent thing to do would be to do nothing. And it is indeed your prerogative to not do anything. But if you feel, like me, that the uncovered anomalies match your personal playing experience, then you're welcome to adopt these house rules in the hope that they even the playing field. It's your call.

Defining Balance
The term 'tiers' has been used in Blood Bowl for a long time to differentiate between the good teams (tier 1), the substandard teams (tier 2) and the horrible teams (tier 3). When the BBRC started work on LRB5, they came up with a clearer definition: As a lifetime average across all teams, coaches and opponents a tier 1 race should have a win percentage between 45 and 55 percent, tier 2 between 35-45% and tier 3 between 25-35%.

I find it absolutely uncontroversial that taken as a measure of balance, the tier 1 definition is lacking. I don’t think this reflects poorly on the BBRC, but rather on the quality of the data that the BBRC had available at the time.
Back then the BBRC defined balance as lifetime performance – indeed they had no TV-specific data – so a team which is super weak in early development and super powerful when fully developed is perfectly fine under that definition. But I disagree.
Similarly, a team being super broken in one meta-game but fine in others, for me constitutes a balance issue.
Also, the BBRC couldn't filter out mirror matches. Mirror matches mask the true performance of any team by pulling their win percentage towards 50. While I can understand the argument that mirror matches are a natural part of a team's performance and should be included, I disregard mirror matches because if a team is indeed broken, then it will be played more, and the more it is played the more mirror matches it will generate, essentially hiding the problem.
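
The masking effect of mirror matches can be put into numbers. The sketch below is only an illustration, not part of the NTBB method: it assumes a win counts 1 point and a draw ½ point (which the half points in the data tables suggest), so that a race always scores exactly 50% across its mirror matches. The function name observed_win_rate is made up for this example. The figures are the Undead row from the NAF data further down.

```python
def observed_win_rate(true_rate: float, mirror_share: float) -> float:
    """Blend a race's non-mirror win rate with the 50% it scores in mirror games.

    true_rate    -- win rate against other races (0..1)
    mirror_share -- fraction of the race's team-games that are mirror matches (0..1)
    """
    if not 0.0 <= mirror_share <= 1.0:
        raise ValueError("mirror_share must be between 0 and 1")
    # Mirror games always split the points evenly, so they contribute 0.5.
    return (1.0 - mirror_share) * true_rate + mirror_share * 0.5


# Undead in the NAF data: 5104 points from 9050 non-mirror games,
# and 1104 mirror team-games out of 10154 games in total.
true_rate = 5104 / 9050
mirror_share = 1104 / 10154
blended = observed_win_rate(true_rate, mirror_share)
print(f"excluding mirrors: {true_rate:.2%}, including mirrors: {blended:.2%}")
```

With these inputs the mirror games drag the Undead figure from roughly 56.4% down to roughly 55.7%, and the pull grows as the share of mirror games grows.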

So I'm using a different tier definition to the BBRC one: The tier percentages represent the performance that the team in question ought to have within any extended TV-bracket (say, 30 to 40 points), excluding mirror matches, (ideally) in any of the 3 major metagames.
On top of that, it is the goal to lift the performance/competitiveness of the tier 2 and 3 teams, and to treat the tier 1.5 teams as tier 1.

The Narrow Tiers
There are several stumbling blocks to getting all teams to comply with that definition - not to mention knowing (or proving) whether I did!
As stated earlier, you should play these rules because you think they make sense - because I can't prove that everything works perfectly.

Nevertheless, these are the tiers that NTBB is aiming for:
Tier 3: 35-45% [Goblins, Halflings, Ogres]
Tier 2: 40-50% [Underworld, Vampire]
Tier 1: 46½-55% [Amazon, Chaos, Chaos Dwarfs, Dark Elf, Dwarf, Elf, High Elf, Human, Khemri, Lizardmen, Necro, Norse, Nurgle, Orc, Pact, Skaven, Slann, Undead, Wood Elf]
- the tiers overlap, unlike the BBRC definitions. This is because, as mentioned, the BBRC definition is for lifetime total, while the NTBB definition is the boundary that the team should not stray outside at any prolonged span.

To summarize, the NTBB tiers are narrower and hence more equal because:
1) The tiers contain fewer teams that stray above or below the tier boundaries.
2) I’ve changed the definition of performance from a lifetime performance to something more limited, meaning that the tiers are narrower even though the percentages look the same.
3) The tiers don't contain teams that have their true performance masked by mirror matches.
4) 1½% has been chipped off the bottom of tier 1, making tier 1 slightly narrower.
5) The total area covered by all the tiers has been narrowed.

The Data: Tournament Play
The tournament data used as the foundation for NTBB2014 was collected from Doubleskull's tournament data site in January 2014. The data is LRB6 only, but could not be sorted into TV brackets. I've only included those teams near the edges of their tier.
The margins shown in the table represent the 95% confidence interval.
I should note that it seems that the NTBB-like practice of granting different bonuses to teams based on team performance is becoming increasingly popular in NAF tournaments, thereby making the resulting stats less and less indicative of team power.
Therefore I won't be tracking NAF stats beyond January 2014.

NAF Data

| Team | Points/Games | Mirror Matches | New Total | Mean (%) | CI (±) | Total (95% range) |
|---|---|---|---|---|---|---|
| Undead | 5656/10154 | 552/1104 | 5104/9050 | 56.40 | 1.02 | 55.38 - 57.42 |
| Wood Elfs | 4948½/8868 | 342/684 | 4606½/8184 | 56.29 | 1.07 | 55.22 - 57.36 |
| Lizardmen | 3676½/6808 | 205/410 | 3471½/6398 | 54.26 | 1.24 | 53.02 - 55.50 |
| Nurgle | 753½/1690 | 11/22 | 742½/1668 | 44.51 | 2.39 | 42.14 - 46.90 |
| Chaos | 1194/2701 | 34/68 | 1160/2633 | 44.06 | 1.90 | 42.16 - 45.96 |
| Halfling | 956½/2741 | 49/98 | 907½/2643 | 34.34 | 1.91 | 32.43 - 36.25 |
| Ogre | 767½/2406 | 45/90 | 722½/2316 | 31.20 | 1.89 | 29.31 - 33.09 |
| Goblin | 1260½/3914 | 137/274 | 1123½/3640 | 30.87 | 1.50 | 29.37 - 32.37 |
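
For readers who want to check the NAF figures, here is a minimal sketch of how a mirror-adjusted mean and its 95% margin could be computed. It assumes the normal (Wald) approximation for a proportion; the article does not say which formula was actually used, so treat that as an assumption. The function name mirror_adjusted_ci is hypothetical.

```python
import math


def mirror_adjusted_ci(points, games, mirror_points, mirror_games, z=1.96):
    """Return (mean %, CI half-width %, low %, high %) with mirror matches removed.

    Assumption: normal (Wald) approximation for a proportion, with z = 1.96
    for a 95% interval. The source does not state its formula.
    """
    adj_points = points - mirror_points
    adj_games = games - mirror_games
    if adj_games <= 0:
        raise ValueError("no non-mirror games left")
    p = adj_points / adj_games
    half_width = z * math.sqrt(p * (1.0 - p) / adj_games)
    return (100 * p, 100 * half_width,
            100 * (p - half_width), 100 * (p + half_width))


# Undead in the NAF data: 5656/10154 overall, of which 552/1104 are mirror matches.
mean, ci, low, high = mirror_adjusted_ci(5656, 10154, 552, 1104)
print(f"{mean:.2f} +/- {ci:.2f} -> {low:.2f} - {high:.2f}")
```

For the Undead numbers this lands on the published 56.40 ± 1.02. Other rows may differ in the second decimal (Lizardmen, for instance), which is why the formula is flagged as an assumption rather than the method used here.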

To summarize:
Both Undead and Wood Elfs can be confidently said to have performed above the boundary of tier 1.
Nurgle and Chaos have performed near the bottom of BBRC-tier 1, possibly underperforming. As NTBB has raised the bottom of tier 1 by 1½%, Chaos has performed below NTBB tier 1, while Nurgle with their substantially fewer games cannot be confidently said to have been outside tier 1.
Finally, the tier 3 teams have definitely performed below the NTBB definition, but within the tier 3 definition of the BBRC (with Halflings near the top of the tier, potentially in the bottom of tier 2).
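
The reasoning in this summary boils down to comparing a 95% interval with the tier boundaries. The following sketch shows one way to express it. The tier percentages are the NTBB targets listed earlier, while the function name classify and the verdict strings are made up for this illustration.

```python
# NTBB target tiers in percent (floor, ceiling), as listed under "The Narrow Tiers".
NTBB_TIERS = {1: (46.5, 55.0), 2: (40.0, 50.0), 3: (35.0, 45.0)}


def classify(ci_low, ci_high, tier):
    """Compare a 95% confidence interval with the boundaries of a tier.

    The verdict labels are hypothetical names chosen for this sketch.
    """
    floor, ceiling = NTBB_TIERS[tier]
    if ci_low > ceiling:
        return "overperformed"      # whole interval above the tier
    if ci_high < floor:
        return "underperformed"     # whole interval below the tier
    if ci_low < floor or ci_high > ceiling:
        return "partly outside"     # interval straddles a boundary
    return "within tier"


# (low, high, tier) taken from the NAF data.
naf_intervals = {
    "Undead": (55.38, 57.42, 1),
    "Nurgle": (42.14, 46.90, 1),
    "Chaos": (42.16, 45.96, 1),
    "Goblin": (29.37, 32.37, 3),
}

for team, (low, high, tier) in naf_intervals.items():
    print(f"{team}: {classify(low, high, tier)}")
```

Under these rules Undead come out as overperforming, Chaos and Goblins as underperforming, and Nurgle as only partly outside tier 1, which matches the wording above.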

The Data: TV-Matched Online Play
The second source of data for NTBB2014 was FUMBBL's notorious Black Box - a TV-Matched Online environment. (Switch from "post-fix" to "pre-fix" on the site in order to see the data.)
The data can be seen in the massive table below. Basically, I went through the data in 100 point TV increments, writing down all performance means. If a mean was outside of the tier boundaries, then I checked means in adjacent brackets also - and if several neighboring means were either too low or too high, then I calculated the statistical inaccuracy with a 95% confidence interval. The bright green bands denote overperformance and the bright red bands denote underperformance. The paler green and red denote tier bands where a team has performed partially outside the tier boundaries - but can not be confidently said to have performed fully outside the tier boundaries.
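
The bracket-by-bracket scan described above can be sketched as a search for runs of neighbouring TV brackets whose means all sit on the same side outside the tier. This is only an illustration: the function name flag_runs and the min_len threshold of two brackets are assumptions (the text only says "several"), and the example assumes that the 15 Khemri values in the table line up in order with TV brackets starting at 0, 900, 1000 and so on up to 2200. Pooling the flagged brackets and computing the confidence interval would additionally need the per-bracket game counts, which are not reproduced in this document.

```python
def flag_runs(labels, means, floor, ceiling, min_len=2):
    """Find runs of adjacent brackets whose means all fall outside the tier.

    Returns a list of (first_label, last_label, direction) tuples, where
    direction is "below" or "above". min_len is an assumed threshold.
    """
    def side(mean):
        if mean < floor:
            return "below"
        if mean > ceiling:
            return "above"
        return None

    runs = []
    start = 0
    while start < len(means):
        direction = side(means[start])
        end = start
        # Extend the run while the next bracket is on the same side.
        while end + 1 < len(means) and side(means[end + 1]) == direction:
            end += 1
        if direction is not None and end - start + 1 >= min_len:
            runs.append((labels[start], labels[end], direction))
        start = end + 1
    return runs


# Assumed bracket labels: lower TV bound of each column.
labels = [0] + list(range(900, 2300, 100))
khemri = [50.0, 52.5, 50.4, 47.6, 49.6, 50.5, 45.9, 47.6,
          46.0, 47.1, 45.9, 42.8, 42.3, 41.0, 43.4]

print(flag_runs(labels, khemri, floor=46.5, ceiling=55.0))
```

With the NTBB tier 1 floor of 46½%, the isolated dips at 1400 and 1600 are ignored and only the stretch from 1800 upwards is flagged as a candidate for a pooled confidence interval.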

Looking at mid- to high-TV performance (say, 1700+) is largely irrelevant for the purpose of NTBB. Not that I don't want to encompass long term play, but Plasmoids CRP+ leads to a different metagame than CRP, and that will really start to show at high TV. (Furthermore, high TV Box stats are nothing like league play: roughly a third of all games are played by Chaos and Nurgle, so win stats in high-TV Box are mostly about being good against Chaos and Nurgle.) Anyway, we can see that the majority of classic bashy teams have crashed performance-wise as TV rises. Some have fallen completely outside tier 1, while others have fallen suspiciously close to the 45% mark. The main exceptions are Chaos and Nurgle, who have done well, but not overly well - possibly because they are driving each other's numbers down. At the same time the (comparatively few) elves and skaven have thrived - with Wood Elfs and Dark Elfs clearly overperforming. Finally, the Gobbo and Ogre teams have performed below the BBRC tier 3 (earning them a CRP+ buff), while the halflings have performed marginally better - but nowhere near the level intended by NTBB.
It is my contention that these high-TV stats reflect some of the meta-game problems that Plasmoids CRP+ rules try to address, but that these problems are further compounded by the nature of the TV-Matched Online metagame.
 

That leaves the low- to mid-range TV stats. I'm estimating this to be TV 1700 or less - but that's a rough estimate. In this range we can see that:
Both Amazon and Undead have performed above tier 1 from TV 0 - 1500
Wood Elfs and Dark Elfs have started to perform suspiciously well around the 1500 mark, though it is not quite clear exactly where their overperformance kicked in.
Vampires have performed below the NTBB-adjusted tier 2 from TV 0 - 1500
Both Slann and High Elfs have performed near the bottom of BBRC tier 1 - and below the NTBB-adjusted tier 1 - for TV 0 - 1500 and 1000 - 1400 respectively.
Finally Gobbos, Halflings and Ogres have performed way below the NTBB-adjusted tier 3.

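Each column is a TV bracket, labelled by its lower bound. A cell showing a range is a 95% confidence interval pooled over several adjacent brackets, with the number of games in parentheses where given and the pooled TV span in square brackets; the remaining cells of that span are left empty.

| Team | 0 | 900 | 1000 | 1100 | 1200 | 1300 | 1400 | 1500 | 1600 | 1700 | 1800 | 1900 | 2000 | 2100 | 2200+ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Chaos | 50.0 | 50.4 | 48.8 | 48.1 | 47.1 | 46.4 | 46.8 | 50.3 | 51.9 | 51.6 | 52.7 | 53.1 | 53.5 | 53.06 - 57.42 [2100+] | |
| Hum. | 40.0 | 49.2 | 51.3 | 49.0 | 48.8 | 49.5 | 49.2 | 47.9 | 48.0 | 46.3 | 46.6 | 45.3 | 47.0 | 47.3 | 45.7 |
| Nurgle | 56.7 | 47.0 | 47.2 | 44.0 | 46.4 | 46.8 | 48.5 | 49.6 | 49.3 | 50.4 | 49.8 | 52.5 | 53.6 | 54.96 - 58.30 [2100+] | |
| Slann | 43.58 - 46.34 (4945) [0-1500] | | | | | | | 47.9 | 47.4 | 46.4 | 47.1 | 45.8 | 39.8 | 43.8 | 41.3 |
| CDwar | 53.66 - 55.14 (17517) [0-1500] | | | | | | | 52.6 | 53.2 | 51.9 | 51.3 | 53.5 | 51.9 | 46.7 | 50.4 |
| Liz | 54.23 - 56.19 (9879) [0-1500] | | | | | | | 53.5 | 51.3 | 51.2 | 53.0 | 52.1 | 53.7 | 44.4 | 50.0 |
| Necro | 50 | 52.4 | 52.1 | 53.0 | 53.27 - 55.47 (10655) [1200-1700] | | | | | 52.6 | 54.0 | 47.7 | 54.0 | 50.3 | 37.3 |
| Amazon | 59.28 - 61.40 (8240) [0-1500] | | | | | | | 54.25 - 57.87 (2888) [1500-2000] | | | | | 54.0 | 59.5 | 44.6 |
| Undead | 55.95 - 57.97 (9248) [0-1500] | | | | | | | 53.2 | 50.5 | 50.9 | 50.1 | 45.0 | 42.7 | 26.52 - 43.02 (128) [2100+] | |
| CPact | 64.6 | 51.1 | 50.4 | 51.2 | 51.6 | 49.1 | 49.6 | 48.1 | 49.3 | 48.2 | 46.0 | 38.35 - 44.97 (851) [1900+] | | | |
| Dwarf | 40.0 | 53.9 | 54.1 | 53.4 | 52.3 | 51.7 | 51.6 | 51.0 | 48.0 | 47.9 | 41.11 - 44.55 (3185) [1800+] | | | | |
| Khemr | 50.0 | 52.5 | 50.4 | 47.6 | 49.6 | 50.5 | 45.9 | 47.6 | 46.0 | 47.1 | 45.9 | 42.8 | 42.3 | 41.0 | 43.4 |
| Norse | 57.9 | 52.8 | 53.1 | 52.0 | 52.0 | 52.6 | 49.5 | 48.1 | 46.6 | 47.2 | 38.30 - 44.65 (955) [1800+] | | | | |
| Orc | 38.9 | 49.7 | 49.5 | 49.4 | 48.0 | 47.5 | 45.7 | 40.83 - 42.83 (9404) [1500+] | | | | | | | |
| DElf | - | 51.1 | 49.8 | 51.3 | 51.9 | 50.3 | 51.4 | 55.82 - 58.44 (5460) [1500+] | | | | | | | |
| Elf | 0.0 | 48.1 | 48.9 | 50.9 | 47.6 | 48.8 | 53.4 | 56.2 | 52.6 | 53.6 | 55.22 - 64.54 (425) [1800+] | | | | |
| Hi Elf | 100.0 | 46.8 | 43.42 - 46.48 (4042) [1000-1400] | | | | 48.2 | 48.5 | 49.4 | 52.8 | 58.3 | 58.0 | 54.7 | 65.4 | 56.1 |
| WElf | 50.0 | 54.1 | 49.8 | 50.6 | 51.0 | 52.1 | 53.0 | 57.71 - 60.93 (3574) [1500+] | | | | | | | |
| Skav | 68.8 | 54.3 | 53.8 | 51.4 | 53.5 | 53.4 | 53.5 | 52.2 | 54.4 | 54.5 | 58.2 | 59.0 | 55.9 | 61.7 | 48.8 |
| T2 UW | 47.5 | 43.6 | 42.7 | 43.9 | 41.5 | 42.9 | 41.5 | 44.6 | 35.0 | 42.8 | 38.4 | 38.9 | 42.1 | 45.7 | 38.9 |
| T2 Va | 68.8 | 35.88 - 39.38 (2946) [900-1300] | | | | 43.2 | 41.6 | 45.1 | 42.3 | 48.3 | 51.21 - 57.21 (1057) [1800+] | | | | |
| T3 Gob | 25.0 | 43.5 | 32.1 | 25.7 | 20.34 - 24.12 (1851) [1200+] | | | | | | | | | | |
| T3 ½s | 40.9 | 30.0 | 31.6 | 32.6 | 29.3 | 27.0 | 28.6 | 15.32 - 25.6 (237) [1500+] | | | | | | | |
| T3 Ogr | 17.9 | 41.5 | 30.7 | 26.0 | 25.9 | 26.8 | 27.0 | 18.99 - 24.01 (1028) [1500+] | | | | | | | |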

Below you can see the specific data for the coloured bands, if you want to check for yourself.

The Box Data

| Team (TV span) | Points/Games | Mean (%) | CI (±) | Total (95% range) |
|---|---|---|---|---|
| Slann (0-1500) | 2223½/4945 | 44.96 | 1.38 | 43.58 - 46.34 |
| CDs (0-1500) | 9528½/17517 | 54.40 | 0.74 | 53.66 - 55.14 |
| Lizardmen (0-1500) | 5454½/9879 | 55.21 | 0.98 | 54.23 - 56.19 |
| Necro (1200-1700) | 5809½/10655 | 54.52 | 0.95 | 53.27 - 55.47 |
| Amazon (0-1500) | 4972/8240 | 60.34 | 1.06 | 59.28 - 61.40 |
| Undead (0-1500) | 5267½/9248 | 56.96 | 1.01 | 55.95 - 57.97 |
| High Elf (1000-1400) | 1817/4042 | 44.95 | 1.53 | 43.42 - 46.48 |
| Vampire (900-1300) | 1108½/2946 | 37.63 | 1.75 | 35.88 - 39.38 |
| Gobbo (1200+) | 411½/1851 | 22.23 | 1.89 | 20.34 - 24.12 |
| Halflings (1500+) | 48½/237 | 20.46 | 5.14 | 15.32 - 25.6 |
| Ogres (1500+) | 221/1028 | 21.50 | 2.51 | 18.99 - 24.01 |
| Dark Elf (1500+) | 3119½/5460 | 57.13 | 1.31 | 55.82 - 58.44 |
| Wood Elf (1500+) | 2120/3574 | 59.32 | 1.61 | 57.71 - 60.93 |
| Amazon (1500-2000) | 1619/2888 | 56.06 | 1.81 | 54.25 - 57.87 |
| Undead (2100+) | 44½/128 | 34.77 | 8.25 | 26.52 - 43.02 |
| Dwarf (1800+) | 1364/3185 | 42.83 | 1.72 | 41.11 - 44.55 |
| Norse (1800+) | 396½/955 | 41.52 | 3.13 | 38.30 - 44.65 |
| Pact (1900+) | 354½/851 | 41.66 | 3.31 | 38.35 - 44.97 |
| Orc (1500+) | 3933½/9404 | 41.83 | 1.00 | 40.83 - 42.83 |
| Elf (1800+) | 254½/425 | 59.88 | 4.66 | 55.22 - 64.54 |
| Vampire (1800+) | 573/1057 | 54.21 | 3.00 | 51.21 - 57.21 |