Why ‘one-trick-player’ Specialists Ruin the Competitive Experience

No subtitle necessary.

 

This is a point that I talk about a lot on my stream and with other highly rated players. It’s such an accepted reality for top players that there is not much to discuss; we all recognize that one-tricking ruins games because it so often ruins our games. I would imagine that the behavior is similarly frustrating to flexible and competitively minded players at any rating. Nonetheless, for many it is a point of contention and this post will regard it as such. I felt that it might be helpful to the discussion to clarify the argument from my perspective. I encourage you to read through and comment below with your thoughts.

‘One-tricks’ is a term that picks out those players in Overwatch who only play one hero. This is an importantly distinct concept from ‘Mains’, which is a term that picks out players who (perhaps significantly) specialize in one hero, but are willing to swap if the game demands it. This article will contend only that the former (one-tricks) are significantly corrosive to the competitive experience in ranked matchmaking.

One-tricking is always a problem because every hero is, to some extent, situational. The Skill Rating system that has been engineered to create competitive matchups is, however, unilateral. By ‘unilateral’, I mean that each and every player has a numerical value that distills their expected game impact and contribution to victory. This number is reflective of both historical W/L and individual statistical performance, but this is of course a topic for another article.

Since the matchmaker seeks to create teams with similar average SR, it crucially relies on the accuracy of its judgement of ‘expected contribution’ to create fair games. If a given player consistently under- or over-performs relative to their expected contribution, the system will adjust their rating in pursuit of equilibrium. This is of course the way the Skill Rating system is supposed to work.

What about in the inconsistent case?

Every player is to some extent inconsistent; we are not machines and our success rate will of course vary from game to game. However, for the vast majority of players this inconsistency will entail a fluctuation above and below an average. This fluctuation is unavoidable, but I would argue that it is not significant enough to pose a tremendous problem for the matchmaker. For some players, though, the inconsistency of their real contribution is much more significant.

For one-tricks, the impact of map selection on their real contribution is massive. Since every hero is at least somewhat better on some maps than on others, one-tricks will see radical differences in their performance on favorable vs. unfavorable map draws. For extremely specialist one-tricks (namely the builders), this RNG can spell victory or defeat before the assemble screen is over. Even the coin-flip to start on Attack or Defend can be significant; Torb/Sym players can do especially well in the first defense and, in many cases, sap the enemy team’s will to live. Momentum is actually a very important part of the game. An extremely successful first round is much more likely to produce a victory than a calamitous failure followed by a miracle comeback.

The matchmaker, in its current state, is thus unable to accurately predict the contribution to victory from these players. This significantly reduces the likelihood that matches which include one-tricks will be competitive. If you are placed on a team with a one-trick on a map which favors their hero, their impact will likely be much larger than the expectation of the system and vice versa for an unfavorable map. Perhaps one-trick players accept this randomness and can enjoy the game whilst reducing a significant portion of their games to a coin flip, but for the rest of the players who are matched with them it is a deeply joyless experience.
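As a rough illustration of the argument above, here is a minimal sketch in Python. It is not Blizzard's algorithm, which is not public. It assumes an Elo-style logistic curve over average team SR, and it assumes a hypothetical one-trick whose real contribution swings by 600 SR depending on the map. The function name, the 400-point scale, and the swing size are all invented for the example.

```python
def win_probability(team_a, team_b, scale=400.0):
    """Elo-style logistic chance that team A beats team B.

    Assumption: the real matchmaker is not public; this curve and the
    400-point scale are stand-ins chosen only for illustration.
    """
    avg_a = sum(team_a) / len(team_a)
    avg_b = sum(team_b) / len(team_b)
    return 1.0 / (1.0 + 10 ** ((avg_b - avg_a) / scale))


# Two teams the matchmaker sees as identical: six players listed at 4000 SR.
opponents = [4000] * 6
listed_team = [4000] * 6

# Hypothetical: one player is a one-trick whose real contribution is
# 600 SR higher on a favorable map and 600 SR lower on an unfavorable one.
MAP_SWING = 600
real_team_good_map = [4000 + MAP_SWING] + [4000] * 5
real_team_bad_map = [4000 - MAP_SWING] + [4000] * 5

predicted = win_probability(listed_team, opponents)
on_good_map = win_probability(real_team_good_map, opponents)
on_bad_map = win_probability(real_team_bad_map, opponents)

print(f"matchmaker's prediction: {predicted:.2f}")
print(f"favorable map draw:      {on_good_map:.2f}")
print(f"unfavorable map draw:    {on_bad_map:.2f}")
```

Under these assumptions the matchmaker predicts an even game every time, while the real odds sit at about 64% on the favorable map and about 36% on the unfavorable one. The exact numbers mean nothing; the point is that the gap between prediction and reality is decided by the map draw rather than by anything the other eleven players do.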

The natural further consequence of this is that players in a match affected by one-tricking have their Skill Rating distorted as ‘underdog’ teams with favorable map selection are unduly rewarded and ‘expected victors’ are unduly punished. These players then take their distorted Skill Rating into the next game, wherein they are slightly less likely to perform at the expectation of the matchmaker. Game quality declines marginally, even for games devoid of one-trick players.

The more specialist the character, the more significant the distortion. The more significant the distortion, the worse competitive matchmaking gets. Some would contend that players who main ‘off-meta’ characters are unfairly loathed compared to those who are extremely specialized in a character like Tracer, Zenyatta, or Winston. This is where the distinction between specialists and generalists becomes very important. Although even these characters are to some extent map dependent, describing them as ‘generalists’ is accurate because they are only marginally affected by map and side RNG. There is still a distortion effect, but the impact of matchmaking/map RNG on a generalist is far less significant.

Succinctly, this is why I think that one-tricking specialists ruins the competitive experience: when the true randomness of map selection becomes a crucial determinant of victory or defeat, I lose interest. I don’t play Overwatch because I want to watch particle effects while I flip a coin. I thought we figured this out in season 1.

 

Let me know your thoughts in the comment section below or on Twitter (@jake_overwatch).

 

 

The Path to Pro

Sustaining the Overwatch League

By: Izzy “Noukky” Müller

Note from Jake: I’ve been planning for some time to bring contributing authors to this blog, and Noukky is the perfect one to start that trend. I hope you’ll enjoy this very important piece on the sustainability of the Overwatch competitive scene; I think she has done great work.

“When solving problems, dig at the roots instead of just hacking at the leaves”

– Anthony J. D’Angelo

Let’s talk about the term ‘grassroots’. If we think of roots we think of a foundation that gathers resources to let something grow. In esports this foundation is often called the developmental tier, including not only tier 2 but also the amateur scene. One thing you might ask yourself is, ‘what makes the developmental tier so important?’ Though this article is focused on Overwatch, we will look generally at why the developmental tournament landscape looks so grim right now and explore ideas of what will render it fertile once again.

How did we get to this grim state? Blizzard can’t be reasonably held responsible for specific tournaments cancelled, leagues delayed, or players dropped. Nonetheless, the announcement of the Overwatch League absolutely had a critical impact on the entire Overwatch esports industry.

Unlike other publishers that let the roots of their esports scenes grow naturally over a long amount of time, Blizzard rushed forward announcing the Overwatch League’s development, not even one year after launch. Blizzard was always vocal about Overwatch becoming a successful esport, but what does the OWL mean for other tournaments and their organizers?


Instead of being able to grow naturally, build up brand recognition, and plan for long-term goals, these organizers have to worry about the possibility of upcoming restrictions and monitoring in a post-OWL world. Blizzard tried to set guidelines for presentation of their product, which is not necessarily a bad thing. However, those circumstances made investing into the scene rather unattractive for tournament brands as well as for many esports orgs. This all snowballed into a big pile of uncertainty about the sustainability and stability that a naturally grown circuit would bring. The grassroots scene has been left wondering: “How do I get return on investment?” and “Will my work get taken away? Will there even be a place for me once OWL rolls around?”. Riot did a similar thing after they launched their League of Legends Championship Series (LCS), leaving the organizers that built up their scene for the first 3 years and establishing their own system. The big difference here is the time factor. Whereas the League of Legends circuit had time to grow naturally, Overwatch was in a much less mature state when its definitive league was announced.

In a more optimal setting, third party organizers would be able to develop a sustainable tournament circuit by themselves. Bigger events like Gamescom, Dreamhack, and IEM act as anchor points that give smaller organizers the opportunity to fill in the gaps with weekly, monthly and seasonal events. One of the most important things to point out here is: there needs to be a path for aspiring players.

The path for an aspiring player looks something like this: you get into a team, you scrim 2-6 hours a day, and you try to compete. The first thing you need to look for is regular competition. As a newcomer it’s really important to have weeklies so you can get tournament experience under your belt. If you get matched against stronger teams, it’s not even about skill improvement, but about learning how to play under pressure and not lose your cool on stream, even if it’s only in front of a few hundred people. These events don’t need to be numerous, but it’s crucial that they are consistent. National events and leagues are a part of this early development spectrum too. As time progresses, these smaller competitions suffer diminishing returns because you increase your level of play in a steady environment. For every player that improves beyond this level of competition, though, another will be quick to take his/her place.

Towards the top, your progress will demand bigger gaps between competitions because scrim time gets more valuable to work on strategy and mechanics. At this point you have graduated to “Mid-Tier” competition. These competitions are still not able to provide a sustainable income for you, but are equally important, as they are the top of the developmental tier. Teams and players start to gain recognition at this point, and it is a big stepping stone on the path to tier 2 competition (I define ‘tier 2’ as the point of self-sustainability). A mix of monthly events, seasonal tournaments, and online qualifiers for bigger events should comprise this tier. They give teams the chance to prove themselves against even harder competition to get on the radar of esports organizations. From this stage onwards, if the team progresses steadily, they count as a good investment for esports orgs. Successful teams, even at this level, aid brand recognition via tournament performance, social media, and on streams as well. There are hardly any tournaments not streamed in this stage. A smooth transition into the tier 2 and pro scenes should be within reach from this stage onwards.


With Overwatch League coming around, a big shuffle will happen and 12 teams with 6-12 players will rise to the highest level of competition. This will leave a large number of lost players that are looking to continue their careers. I am fairly confident that things will fall into place once this shuffle is over. The first thing I would really love to see is Contenders expanding as a regular competition for the semi-professional scene. This alone will not be sufficient; we need the third-party organizers to fill in the gaps between seasons with events too. Due to the uncertainty about the future of the tier 2 scene resulting from the OWL announcement, Blizzard needs to work proactively with these third-party organizers to create a self-sustainable circuit so players will have a clear career path once again. Blizzard could either plan seasons around Dreamhack or similar organizers or preemptively line up tournament organizers for the offseason of Contenders to increase the chance for the scene to thrive. Another big factor is promotion and exposure. Using social media and the game client to promote this tier of events is essential to player and tournament sustainability. If Blizzard intends to run the tier 2 scene alone, the seasons need to be close together with a bigger prize pool than they have right now. In addition to that, an offseason tournament like they do with HGC in Heroes of the Storm will be needed.

With the upcoming team shuffle and the start of Overwatch League, one way to help the scene gain long-term viewership is to create incentives for career fans. The way Blizzard introduced this to Heroes of the Storm on Twitch should be, in some form, replicated in Overwatch. Allowing people to cheer with bits for their favorite teams would be a nice incentive to engage the viewer base that also aids with sustainability.

Starting with the idea of Open Division, what I would like to see is a relegation tournament between the best teams of Open Division and the lowest-ranked of Contenders. This not only creates competition, but also ensures that the players’ work will not be lost in the long run. At this point, we are still not sure about the real path to pro that was advertised because communication in general about Blizzard’s plan for sub-OWL tournaments is rather slim towards the public. If Blizzard has a cohesive plan to solve for sustainability in the tier 2 scene, failing to communicate it ahead of time might do just as much damage as having no plan at all. Tournament organizers will need time to adapt to Blizzard’s vision no matter what it looks like.

Being on a Blizzard broadcast channel or qualifying for a desired tournament are dreams that players should be able to reach with daily hard work. With support from Blizzard, organizations will also realize that this path of competition can be a sustainable part of their business. Fresh talents that work their way up from the amateur scene to the top of Contenders and get the chance to try out for the big Overwatch League teams are potentially good investments. Without certainty that the path exists, organizations will have a hard time making the original investment in these potential pros. Without the grassroots, there can be no springtime bloom.

On Learning & Esports:

Constructing an individual learning model adapted to competitive gaming

September 29th, 2017

 

“These young guys are playing checkers. I’m out there playing chess.”

-Kobe Bryant

 

In modern Western society, education is a government responsibility: a standard to be maintained. The institutional realization of No Child Left Behind policies in the United States has propagated the norm, for better or worse, that learning is a process of development to conformity. Perhaps this norm is efficient in that it accomplishes ultra-broad acquisition of a certain educational attainment (imo, we as a society can absolutely do better in education). This model of learning, however, is fundamentally based on the notion of memorization via repetition. These are natural foundations for education when the desired outcome of the learning process is a score of ≥70% on a multiple choice test. Such a goal in the context of Esports is, of course, laughable. Different goals demand different processes, and as such the learning model of standardized education fails in Esports.

Another common model for learning has been built in traditional sports. This model is founded in rigorous competition and so mirrors the motivation and connection that many feel in competitive Esports. The standard established by top traditional sports professionals is laudable and should inspire anyone who dreams of the very hard work that is fundamental to success in any competitive arena. At the highest levels of traditional sports, peak physical condition is the expectation. The work will be hard, and generally speaking the harder the better. The conditioning realities inherent to traditional sports should be instructive for aspiring Esports professionals, but building a model of learning adapted to Esports will demand that its unique characteristics be taken into account.

My contention is that success in Esports demands not necessarily that the work be hard, but rather that the work be mindful. The work is a prerequisite, yet there will be no automated reward dispensed as ‘Total Time Played’ ticks over ten thousand hours. In my experience, I have found that real improvement is achieved not in the 16th hour of a marathon practice session, but rather in the subtleties of a key moment dissected. My model of learning is something more akin to that of craftsmanship than those of traditional sports or education. Its contents are as follows:

 

  1. Perfect play is possible.
  2. Perfect play is never realized.
  3. Improvement lies in a Sisyphean will to thrive under the contradiction.

I use ‘Perfect play’ here to indicate the optimal strategy for victory from the perspective of an individual. It is important to make a distinction here between optimal individual decisions and an optimal team strategy. The latter is infinitely more complex and is my current pursuit. Maybe in a decade or two I’ll feel qualified to discuss it.

When a player loses a fight in which a teammate (or 5) made obvious and significant mistakes, he or she is liable to experience anger and perhaps a sense of dispossession of control over the outcome of the game. The ‘perfect player’ finds such an experience alien. They experience only two moments: that of past failure and that of future success. The perfect player understands that every death is an inflection point; each offers a valuable moment of reflection in which one might uncover some personal imperfection. In Overwatch, there will rarely be a death that was impossible for one’s teammates to prevent. The perfect player, however, does not expect that their teammates will perform perfectly (or even exceptionally). The perfect player creates layers of redundancy; their play is carefully structured so that every ally has the easiest possible time in the contexts in which their roles intersect. If a teammate stumbles, the perfect player reacts instantly to bring the battle back under their control. Asking as little as possible from their teammates while delivering more than the team would ever expect are hallmarks of perfect play. Future unforeseen, reactionary missteps: these are the mistakes that the perfect player eliminates. The perfect player recognizes that every millisecond on a movement key is a decision, every flick of the mouse is a choice. The perfect player is, ironically, open always to the possibility of their own imperfection.

Perhaps item 1 is false. It might be the case that a system with 11 other irrational decision makers has no ‘optimal strategy’ in the game-theoretic sense, yet I doubt that there isn’t something very close for practical purposes. The realization of the possible is not vital, though. We are each and every one of us so radically far from perfect play that, lo and behold, gaming is actually fun. For some, though, the possibilities in our mind’s eye demand that we seek to empirically prove item 1 to ourselves and to the world. I think very few successful players lack the humility to see that they can always do better and accomplish more. Even the best in the world must recognize that such status is inherently ephemeral.

Regardless of whether or not item 1 is false in theory, we can be certain that the perfect player does not exist in our world. The crucial insight necessary to achieving competitive success is that the perfect player does not need to exist in order to inspire us. For every player on the planet there is another who, in at least one aspect of their game, is in some way superior. The perfect player can exist as an amalgam of all of those real players who best us in some absurdly specific category. It is only when we become as Sisyphus, giving the entirety of our will and dedication to a task that cannot ever be completed (the pursuit of superlative greatness) that our potential is unlocked. The player who has relaxed a relentless cross-examination of their own play is the same who has experienced a plateau in skill. When we accept our decisions, our movements, or our plays as ‘good enough’ and choose instead to cast aspersions on mistakes of others that we see on our spectating screens, we undermine our own goals and refuse our own potential.

I am not the perfect player, in case that wasn’t obvious. Neither am I the perfect learner. I fail to implement this rigorous self-critical framework constantly. The process of actively seeking out personal failure is both extremely effective and very challenging to maintain. If my responsibility for a loss could be valued at even one tenth of one percent, the entirety of my conscious focus ought to be on that exact fraction of the game. The reality is that we can only be responsible for our own improvement. This is the essential idea that I think is perhaps valuable to the improvement of others. Internalization thereof is much more difficult than understanding, however. On that front, perhaps traditional sports have figured something out about this ‘hard work’ stuff.

 

 

P.S. I can’t speak to a ‘formula’ for creating Esports greatness. I can only speak to my own experience of learning and adaptation that has borne fruit. Relentless pursuit of improvement via deep self-analysis and critique is probably not for everyone, just as the vast majority of people who seek success in Esports will fail. At the most charitable, this process is but one meager component in a long recipe that includes a healthy dose of luck. I wish any Esports hopefuls the best in that latter category.

Women in Esports

In a break from my typical design-analysis subject matter, I’d like to use this post as a platform to spark a discussion about the status and role of women in Esports. If you, the reader, feel any animosity toward this subject as a topic of open-minded discussion, you might wish to take a moment to ask yourself why that is.

In traditional sports, the need for a gender separation is obvious. The different effects of testosterone vs. estrogen on muscle development are pronounced and undeniable. In competitions of strength and speed, then, it is clear that women should not be expected to be directly competitive with similarly talented and dedicated male opponents. In competitive gaming, this biological disparity falls away. Winning a competitive match in gaming has nothing to do with strength and usually quite little to do with speed of the type enhanced by muscle development. The freedom to employ any control scheme/sensitivity further suggests that testosterone is not a direct biological advantage in the way that it is in traditional sports. All this, of course, only raises the question of why it is that women are so rare in professional play.

Last year, the first and only cisgender female professional Overwatch player (as far as I know) was signed to South Korean Esports organization UW Artisan. Even if there are more female Overwatch pros of whom I am not aware, they are certainly few and far between. Geguri is a highly talented flex-tank player, and her impressive aiming ability and overall skill secured her a spot on a salaried professional team. This should come as no surprise; strong players are signed to professional teams all the time. And yet, Geguri is of course special for the simple fact that she was and continues to be such a rarity as a woman in Esports. Her story (which I will return to later in the article) mirrors that of a very small number of other trans- and cisgender women who have advanced to the highest competitive levels in their respective games. This article seeks to embark on an even-handed analysis of why it has taken so long for women to breach the highest echelons of competitive gaming. These reasonings are neither complete nor necessarily empirically verifiable, but hopefully they pique readers’ interest in investigating further in their own experiences.

Demographics:

The most plausible candidate to explain the lack of female representation in Esports is demographic differences. While the market for games in general has shifted dramatically in the last decade to near-parity across genders, this does not hold true across all subdivisions. First Person Shooter (FPS) games are one of the most significantly skewed genres; young male target marketing demographics have been the norm for decades and have shaped the expectations of the industry. Overwatch is, in my view, the first FPS game that has truly sought to expand its playerbase beyond the expected demographic of the genre. The vast majority of FPS games are semi-realistic war simulations in which the protagonist is exclusively male. Beyond this simple fact, the marketing for these titles very clearly targets the young and male demographic beginning with advertising placement and continuing all the way through design appeal.

I don’t contend that the marketing for games in the Call of Duty or Battlefield franchises is somehow nefarious; rather, I suggest that the industry has artificially reinforced the expectation that FPS games will always draw the attention and interest of male gamers. Since Overwatch is the first FPS title with player-base demographics that represent anything close to gender parity, one might expect to see more women rising to the top of the competitive scene. However, presently the overwhelming majority of professional Overwatch players are male. Another fact about the overwhelming majority of professional Overwatch players is that nearly all of us played a previous FPS title at a professional or at least intensely competitive level. I think, though, that gender parity in professional play is going to take some time given the past-professional experience that seems to be nearly a prerequisite to playing professional Overwatch at this moment in the game’s lifespan. If Overwatch is your first competitive FPS title, you are very unlikely to go pro relative to those with past experience. In other words: almost every pro player was already a pro in another game prior to Overwatch’s release. Even though more women are playing FPS games now with the release of Overwatch, it is the demographics of previous FPS titles that are being represented in the set of current professional players. The lack of female pros in those previous titles can be explained to a significant degree by the radical disparity between male and female player bases.

Cultural Norms & Bias:

It can be challenging to write about sexism in online (particularly gaming) contexts. Virtually everyone in Esports (myself included) would love for the sport to embody a perfect meritocracy wherein the best of the best are rewarded for their talent and dedication in precise measure. The reality, however, is much less rosy. Starting in the experience of everyday players and criss-crossing the path to professional stardom lies a significant and undeniably gendered bias. Some friends of mine have reported that they choose not to communicate with teammates in Ranked Matchmaking for fear of being ostracized or harassed. Others choose to employ a voice changer or imply to teammates that they are in fact teenage males. It is the rare woman who defies these norms and fearlessly communicates strategy in voice chat (the only practical method of strategic communication). Sadly, it seems clear from the reactions when they do speak out that many players do not like listening to what their female teammates have to say. This is not universal, but it doesn’t have to be to create an expectation of toxicity.

Here the story of Geguri’s rise to professional signing is a powerful example. Although very talented players are regularly accused of cheating prior to (and sometimes even after) offline validation, few receive the level of attention that Geguri experienced. Multiple South Korean professional players spoke out about the ‘fact’ that Geguri employed artificial assistance to play as well as she does. One even suggested that he would quit his professional career were she exonerated. It was only when Geguri played on a live stream with cameras showing her hand movement on mouse and keyboard that the accusations abated. I imagine the latter accuser felt quite sheepish at this moment. Compared alongside the experience of a player like Dafran, any illusion of parity fades away. Despite popping up into the scene without any warning as perhaps the most mechanically talented competitor to ever play the game, no one publicly accused Dafran of cheating. There were some rumors as is to be expected from such a formidable talent, yet I couldn’t find a single professional player making anything close to the kind of accusations that Geguri was receiving. This comparison is imperfect because it exists across regions (Dafran is a North American player and Geguri plays from South Korea), yet it should nonetheless push an open-minded reader to ask themselves if Esports really is such a perfect meritocracy. Many South Korean (male) pros display impressive talents in online play without attracting such confident accusers.

While I would refrain from defending the idea that it’s impossible for women to play games like Overwatch at a high competitive level under these conditions, it is clear to all who see with eyes unclouded that a gendered disparity exists. While I don’t think that communication or teamwork are in fact necessary to reach something like the 95th or 99th percentile, I can’t help but think this norm of invalidating female players discourages many from trying to go pro. If the vast majority of your experience of online play included teammates nakedly disrespecting your abilities and understanding, a professional competitive career would hardly seem like the next logical step. Going pro is virtually never a happenstance moment of luck; rather, it is most often the result of a consciously set goal and a tireless dedication to improvement and growth. If some percentage of women are disincentivized from setting such a goal by these cultural norms, then the norms are at least partially to blame for the gender gap.

Personally, my experience has suggested that these biases exist (and perhaps even become more intense) all the way up through the highest echelons of professional play. In a crucial way, though, these biases are connected to demographics as well as external society. As the player base for FPS titles becomes more and more evenly spread across genders and as society continues to make progress on accepting women as full equals, I hope that these biases will melt away into history.

Structural:

I believe that it’s important here to bring into the light the underpinnings of these biases. Many people still believe that, for all the societal progress of the last century, men are fundamentally more capable or more intelligent than women. For these people, the lack of female representation at the highest levels of competition is only a warrant for their position. Rather than examining the wider picture, they find it much easier to reject the potential for progress. I sense that some of these people are threatened by equality and the emasculation that it potentially represents.

I can’t help but also believe that positive change in this area will benefit not only women but all participants in and fans of competitive gaming. If it is the case that some of these structural elements have discouraged talented women from pursuing professional careers, then it is also the case that the level of competition is not where it could potentially be. When Bill Gates spoke at a summit in Saudi Arabia on modernizing the Middle East for economic growth and business development, he was asked what would hold Saudi Arabia back from its goal of being a top 10 technology leader by 2010. “Well, if you’re not fully utilizing half the talent in the country,” Gates said, “you’re not going to get too close to the Top 10.”

Let us not be the Saudi Arabia of professional competition. It is the responsibility of everyone who loves Esports to create the meritocracy that is so patently within reach. Banishing the basement-dweller, women-hater stereotypes associated with competitive gaming is also a powerful rebuke to those who reject the core potential of Esports. This can be as simple as not harassing or mocking female teammates in online play. Being a ‘white knight’, however, can be just as repressive and quite cringe-y too. The solution is quite simple: treat female players the same way you would treat anyone else. Dear reader, I’m confident that you can do it.

P.S. Discussion here on the blog site itself is moderated by me; please be respectful with whatever opinions you wish to express.

Something to Strive For

Or: Rewards for dedication are severely lacking in Overwatch

Rancor is in the air. Many are calling Season 5 of Ranked Matchmaking in Overwatch the worst since S1’s disastrous coin flips. Due to the total lack of transparency from Blizzard, it is unclear if the Matchmaking algorithm has in fact changed (and led to genuinely worse matchmaking) or if tensions are simply reaching a boiling point. Either way, the potential for disruption in the market for competitive Overwatch matchmaking has never been greater.

The reasons are obvious: one-tricking remains a behavior officially accepted by Blizzard, the punishment system feels toothless, and the Skill Rating algorithm is embarrassingly manipulable. On a good day, one in ten Ranked games feels competitive and interesting.

Even if these glaring failures are rectified, the prioritization of queue time minimization has left striving for the top of the ladder feeling deeply unrewarding. Overwatch, from a fundamental game-design perspective, is the eSport with the greatest demand for constant coordination. Games like CS:GO, Dota 2, and League of Legends reward coordinated executions and smart team play, but Overwatch demands them constantly. True 1v1s are incredibly rare and virtually every fight is decided with crucial contributions from many players. As individual SR presses past 4300, however, wins and losses are decided by carry play and team coordination goes out the window. When a player rated 4600-4700 solo-queues into a game, it is virtually impossible that his/her teammates will be able and willing to keep up. Although queue times stay relatively fast with this system, it feels as if the matchmaker asks only the question of which team will more effectively stymie the efforts of their one or two carry-players. There’s no value in a brief queue time if the majority of matches are poor quality.

As a player at this skill range, these sorts of games are incredibly frustrating. Although Ranked Matchmaking will never perfectly simulate an organized competitive environment, its power to shine the spotlight on new talent (as in other eSports titles) is directly correlated to the degree of similarity it can achieve. One of the most compelling parts of eSports is its accessibility. There is a sort of egalitarian charm to the idea that anyone can make a name online and earn a chance to be rewarded for their dedication and skill. Overwatch is failing terribly in this respect.

A competitor to Ranked Matchmaking (similar to the offerings of Faceit or ESEA in other games) may be the best path forward. There is tremendous demand for a more meaningful proxy to true competitive Overwatch, both from established professional players and from those who wish for a legitimate arena in which to display their potential. Something as simple as a captain’s draft system or a classical Elo measurement would yield a product far superior to what Blizzard has produced.

Beyond prizes and external motivations, I know that I would personally pay for a subscription just to guarantee a consistently serious and competitive mindset among my teammates. Ranked in its present state is a remarkably poor environment in which to practice the most important skill of Overwatch: team play. I had hoped Blizzard would act faster, but the deterioration of the past few seasons makes one thing strikingly clear: Blizzard’s game development priorities seem to put Quickplay on par with Top 500. For the organic growth of the eSport in the long term, the need for something to strive for is greater than ever.

 

 

P.S. My apologies for the delay between articles. I’m taking college courses online now in order to finish my degree (on top of World Cup practice), so my time is a bit more constrained than usual. As always, let me know what you think in the comments and on twitter at @jake_overwatch

The Perfect Meta-game

And how to achieve it.

It’s time to go there. This piece will be structured with relatively simple contentions, the defense of which aims to construct a coherent set of guidelines for how to balance Overwatch most efficiently and effectively.

For those with a bit less eSports savvy: in Overwatch, the ‘meta-game’ consists of the sum of expectations about which team compositions are strong in certain situations.

Firstly, the standard:

Claim 1: The degree of freedom that a meta-game instantiates is the best available standard by which to evaluate its quality.

I contend that the ideal meta-game consists of the maximum number of competitively viable team compositions and styles of play. There is no objective way to measure what makes a game fun, but I would argue that novelty is a close proxy and a goal internally worthy of pursuit. Novelty is best measured through variety, counterplay, and creative potential; in other words, the degree of freedom that a meta-game instantiates.

We can compare this standard against the emotive responses of the playerbase to help evaluate its quality as a metric for meta-game quality. The infamous Quad-Tank meta (under which my team, then Bird Noises, made its name) was near universally despised. What my team discovered in this patch was that there was no need to run any other composition in any circumstance; the only counter to Quad-Tank was to play Quad-Tank more aggressively than your opponent. This would be an example of a meta-game with an extremely low degree of freedom; only one composition and one play style is viable. Here my standard would concur with community sentiment of the time; a meta-game with fewer choices is less fun.

Comparing this meta to the post-Dva-&-Ana-nerfs meta, nearly everyone would agree that the ‘quality’ of the meta-game went up relative to Quad-Tank. Out of the three plausible off-tanks in this meta-game, different teams chose different pairs out of the set of Dva/Zarya/Roadhog. Some teams (e.g. Selfless) bucked even the name of the meta-game and chose instead to play a 2-2-2 style with a very high degree of success. While virtually the rest of the world continued to play Rein-centric compositions, Rogue impressed everyone paying any attention with a Triple-DPS dive comp that took the competitive scene by storm, proving its viability with undeniably dominant results. Again, my standard matches the sentiment of the community in declaring this meta much more fun than the one preceding it.

If asked to evaluate the present Counter-Dive meta, most would call it a regression from what was previously achieved (although perhaps better than metas like Quad-Tank). Once more my standard concurs with this sentiment, since the spectrum of viable compositions and play styles has grayed drastically over the past few patches. Presently, Dva/Winston/Tracer/Lucio are approaching perma-run status with a few exceptions on exceptionally enclosed or flank-less map locations. The choice between Zen/Ana and Soldier/Genji (or Pharah+Mercy) with the occasional and situational Sombra flex is essentially all that is available to competitive teams. Apex results seem to show that even Rogue’s unmatched mastery of the Triple-DPS play style was insufficient to overcome the dominance of the 2-2-2 meta. Those stubborn teams that have stuck to Rein-centric compositions have been consistently trampled underfoot by one very angry scientist.

From these instances, I conclude that what makes a meta-game good or bad is the degree to which teams can convert their unique individual styles and ideas about the game into genuinely competitive strategies. Fostering creativity as a means to victory is a powerful way to elevate Overwatch above the aim-duels that are lent such primacy in mirror matches. As a side note, I believe that diminishing the importance of these extremely mechanical aim-duels and elevating the importance of team-composition makes Overwatch vastly more entertaining and watchable from a spectator’s perspective. The narrative of one team outsmarting the other is much more compelling in my eyes than that of the more skilled players dismantling their weaker counterparts.

The immediate next question to ask once one accepts this standard is ‘how does one best achieve the maximum degree of freedom in a meta-game?’. This question is slightly more complex, yet no less answerable:

Claim 2: At their core, Overwatch’s meta-games and overall balance are about team composition.

Winning or losing a game of Overwatch depends entirely on a team’s ability to successfully attack and defend various objectives within a roughly given timeframe. As tempting as it is to consider a hero’s balance in a vacuum, such a hero-centric approach to balancing is doomed to failure.

It seems quite plausible that the vast availability of statistics regarding hero play in Ranked Matchmaking has tempted the OW dev-team to think of each hero as an island. When a hero seems to be winning or losing a little too often it seems a prime candidate for a nerf or a buff, respectively. This logic misses what was in front of our eyes the whole time, that one hero choice is only strong or weak relative to other options and the team composition that surrounds and opposes it. Heroes don’t win games, compositions do.

Consider Genji. In Triple-Tank his role is essentially to farm Dragonblade as quickly as possible to participate in combo play with his primary enablers: Lucio, Ana, Zarya, Rein, etc. In dive compositions, however, Genji acts as the secondary initiator alongside Winston and Tracer. Dive seeks to enable the Genji to maximize dash resets while the primacy of Dragonblade is significantly reduced relative to Triple-Tank Genji play. The shift in team composition fundamentally alters the role of the Genji player as his primary ‘partner heroes’ become fellow damage-dealers rather than defensive enablers. This is a crucial distinction to recognize. Hypothetically, were Genji oppressively strong, composition-defining, and thus demanding of a nerf it would be very important to change him in the right way so as to properly affect the meta-monopolizing composition without fully eroding his general viability.

Dva can benefit from a similar analysis, sans hypotheticals. After her originally massive buff was toned down, she didn’t feel oppressively overpowered in tank compositions. Her mobility wasn’t so incredibly useful in slower compositions, yet it felt like she had a good place in countering spam-centric opposing team comps and enabling more aggressive DPS choices in Triple-Tank (like Genji). Without any changes directly to Dva, the massive buffs to Winston, Lucio, and Zenyatta combined with Rein & Roadhog nerfs have left her feeling oppressively strong. The Zenyatta buffs and the Lucio rework established a much more cohesive backline than had ever existed in Rein-less compositions. Dva fit the niche of peeling for this backline perfectly while also soft-countering Discord Orb and often preventing the all-important Dash-resets of Genji comps. This instance reveals that hero balance cannot be examined in a vacuum, even with statistical evaluation; Dva shifted from ‘viable-yet-unpopular’ to ‘must-have’ without a single direct change to her kit.

Herein lies the biggest problem to successfully balancing Overwatch. The above paragraphs are significantly less true if we are considering Ranked Matchmaking rather than organized competitive eSports. In Ranked, the near total lack of coordination greatly diminishes the importance of full compositions and lends much more credence to claims that a hero is strong or weak in a vacuum. Without fixing Ranked play (see my earlier blog posts on the subject) I can’t imagine a solution to this dilemma, except to plead with all my heart that Blizzard prioritize balance for those who dedicate their dreams, careers, and lives to Overwatch.

Playing eSports doesn’t make you better or more valuable than a casual player, but I believe that that kind of dedication is deserving of the respect and priority of the dev team. If a character is a bit too strong in low-skill public games, some casual players will have an infinitesimally more difficult Ranked experience. If the Overwatch eSports meta becomes stagnant and/or unenjoyable to watch, careers and lives are potentially ruined. The best of the best will find success regardless, but it is the scale of the eSports scene upon which those on the margins of top play depend. Furthermore, I would argue that balancing for eSports will ultimately benefit the whole playerbase, although that’s a topic for another article.

The world could always use more heroes.

Claim 3: Presently, the game is more defined by choice of Main Tank than by any other role. Choosing Winston or Rein will dictate more strategy than almost any other role selection.

With the heroes presently available in Overwatch, the degrees of freedom available in terms of composition and strategy selection are almost entirely dictated by Main Tank selection. When a team selects Winston, more than half of the heroes in the selection screen might as well be blacked out for how weak and non-viable they are in aggressive dive compositions. Reinhardt hero selection acts in a similar way, except that he fully ‘blacks out’ fewer heroes and rather simply demands that a significant portion of his teammates’ heroes are devoted primarily to his defense (a role for which there are only a few meaningful choices).

Under this situation, then, ensuring the viability of both Rein-centric and Winston-centric compositions (as close to a 50/50 as possible) is what will result in the most variable and creatively adaptable meta-game. In the short term, this is the only solution to stagnant meta-games that prevent individual and team flavors from expressing themselves in team-composition choice.

Ideally though, I’d like to see heroes that either add a third option to the Rein/Winston dichotomy or allow the game to potentially be played in a way that isn’t so fundamentally tank-centric (although this may simply be a reality for Overwatch in the medium term). I’m looking at you, Doomfist…

 

If you read this far, don’t hesitate to give me feedback in the comments or on twitter at @jake_overwatch. This article was pretty intensely theoretical, so if you made it all the way through I appreciate your dedication.

I’d also like to thank Wojtek for his instrumental assistance in refining this piece and also for inspiring its focus.

 

Elegy for a Swine

or: Constructive Feedback on Roadhog’s Balancing

Roadhog was the first character in Overwatch that I really wanted to master. Coming from TF2 as an avid MGE (My Gaming Edge is a popular 1v1 practice mod in TF2) Soldier player, discovering the balance between the Rocket Launcher and the Shotgun lent incredible depth to a character of such apparent simplicity and formed the foundation of my passion for competitive gaming. Diving into the nuances of Roadhog, I felt just the same as I had in the early stages of mastering MGE’s Soldier duels: perfect play was so clearly possible and yet always tantalizingly out of reach.

Missing or landing a hook almost never felt like a game of chance, and a perfect understanding of range was a must-have to do battle while Hook was on cooldown. The character rewarded skill with unique carry-potential and punished mistakes and poor positioning with significant contributions to enemy ultimate tempo. Then came the hook 2.0 update.

It only took a few minutes of playing the new Roadhog for me to recognize the gravity of what was lost. Impressive hooks were often broken by odd geometry or falling opponents; only occasionally did this new mechanic yield the sense that your target had truly outplayed the hook. On the receiving end, I felt the same. Once in a while I truly intended to sidestep a hook and broke it with my cover, but more often than not my response was to say a quick prayer to RNGesus.

The mechanical change to the way the pull itself occurred was also rather shockingly bad. Characters hooked off of high ground or from height were not brought straight to the Road player, but rather in a diagonal trajectory that put the two players on level ground. As someone who practiced with the original hook mechanics for hundreds of hours, this change was both annoying and counter-intuitive while having no discernible impact on the balance of the character. Some heroes were originally quite difficult to consistently one-shot combo as Roadhog, most notably Ana. Prior to these changes, only a very small minority of players could truly achieve a very reliable maximum damage combo. The change to pull consistency was perhaps well intended yet in my view only achieved a ‘dumbing down’ of the character’s fundamental mechanics.

I don’t contend that no nerf was deserved, but rather that changes to Roadhog have been poorly designed. Roadhog as he was on release was most certainly overpowered. His one-shot potential was simply too high and counterplay options were sharply limited by his low cooldowns. The hook was also apparently designed for the lowest common denominator of players with a hitbox nearly the size of a payload cart. Despite these problems, at the end of the day the Pig was a hell of a lot of fun to play because the hook was a hell of a lot of fun to use.

The initial changes proved insufficient to properly balance the hero, so the devs turned next to a 33% increase to Hook’s cooldown combined with buffs to the spread of the Scrap Gun and a decrease in pull distance to compensate. The intent was apparently to make him less reliant on his role-defining cooldown and remake him in the style of a classic DPS character. Philosophical problems with this kind of change aside, this patch led to a significant decrease in Roadhog’s vulnerability during Hook cooldown and a significant increase to his ability to drop enemies into death-pits.

In the most recent balance patch, our beloved swine was finally driven into competitive irrelevancy with Ranked win rates approaching 40% and a near total lack of playtime in professional play. The developers had the following to say: “The Scrap Gun changes reduce the power of his hook combo and alternate fire burst damage potential while still keeping his DPS roughly the same.” (For those unaware, the recent patch decreased Hog’s damage by 33% while increasing rate of fire by 30% and clip size by 25%).

The notion that Roadhog could still realistically output the same amount of DPS as pre-patch is pretty laughable. As soon as I saw these patch notes on the PTR I knew Roadhog was destined for the garbage bin if they went live (and live they went). In a game with healing as cheap and effective as it is in Overwatch, burst damage is vastly stronger than damage dealt over time. Just because Roadhog has about the same ability to break a Rein shield (under perfect conditions) as before doesn’t mean that his meaningful DPS will be anywhere close to what it was. Furthermore, dealing the same DPS requires landing more shots than before and thus exposing oneself more than before.
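To make the arithmetic behind that claim concrete, here is a rough sketch using only the three percentages quoted from the patch above. The function names are hypothetical, and the model is deliberately crude: it assumes every pellet lands and ignores reload time, spread, falloff, and critical hits, all of which matter in a real fight.

```python
# Multipliers taken from the patch figures quoted above.
DAMAGE_MULT = 1.0 - 0.33      # per-shot damage decreased by 33%
FIRE_RATE_MULT = 1.0 + 0.30   # rate of fire increased by 30%
CLIP_SIZE_MULT = 1.0 + 0.25   # clip size increased by 25%


def relative_sustained_dps(damage_mult: float, fire_rate_mult: float) -> float:
    """Sustained DPS relative to pre-patch (simplified: no reloads, no misses)."""
    return damage_mult * fire_rate_mult


def relative_damage_per_clip(damage_mult: float, clip_size_mult: float) -> float:
    """Total damage in one full clip relative to pre-patch."""
    return damage_mult * clip_size_mult


sustained = relative_sustained_dps(DAMAGE_MULT, FIRE_RATE_MULT)
per_clip = relative_damage_per_clip(DAMAGE_MULT, CLIP_SIZE_MULT)
print(f"single shot: {DAMAGE_MULT:.2f}x, sustained: {sustained:.2f}x, per clip: {per_clip:.2f}x")
```

Under these assumptions, sustained output lands at roughly 87% of the old value, while any single shot (the part of the hook combo that actually secures a kill before healing arrives) deals only 67%. "Roughly the same" is defensible for shield-breaking and little else.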

If the developers had written that ‘Roadhog was much too strong and that these changes were intended to bring him in line’ I would disagree with their assessment but agree with the means by which they responded to it. What is really upsetting is that, from the above developer comment I interpret that they didn’t see these changes as a significant nerf at all. Perhaps with the decrease to his critical hitbox size they potentially even saw this update as a buff. The reality, however, is that these changes are perhaps the most significant nerfing any character has received throughout Overwatch except perhaps all of the Ana nerfs combined into one patch. Worse yet, this brutal blow from the nerf-bat was delivered to a character that was already fading out of competitive viability. What that says about the developers’ understanding of their own game is up to the reader to decide.

I am happy and sad at the same time with these changes. Happy because Roadhog stopped being fun for me with the first iteration of Hook 2.0 and now he is so competitively irrelevant that I’ll never need to touch him again barring radical re-balancing. Sad because Roadhog was the character that first made me want to become great at Overwatch.

All along, the changes Roadhog needed were so simple. Were I balancing Overwatch, the next balance patch would do the following.

  1. Reset Roadhog to exactly as he was on release
  2. Hook cooldown to 9 seconds
  3. Hook hitbox size decreased by 33%
  4. Take a Breather healing down to 250 from 300 (or 1-2 second increase in cool down)

Being deleted in one shot isn’t very fun. For newer players it is probably quite frustrating since they don’t understand the game well enough to really engage with counterplay options. These changes will make successfully landing hooks much harder, remove the original ability of the hook to pull players who were completely out of line of sight, and increase the size of the vulnerability window that Roadhog creates when he uses Chain Hook. Reverting the spread changes will push the character back into his original role of a space-denying tank and defender of back lines and further open up counterplay options to reward players who successfully bait out a Hook. If Blizzard remains really resistant to reverting the hook-break mechanic, the hook should instantly stop the motion of its target from the moment of connection through the completed pull so that skillful and ‘legit’ hooks are at least more rarely broken by gravity or odd map geometry.

 

P.S. This essay was more in a narrative structure than my previous pieces because I felt that it served the point I was trying to make better. Let me know what you thought in the comments; more pieces and potentially more new styles coming soon.

In Defense of Purist Skill Rating

Weds. Jun 21st:

Intro:

This essay will defend a vastly simpler implementation of Skill Rating adjustment than currently exists in Overwatch’s Ranked Matchmaking. I will suggest that removing all influencers of Skill Rating besides winning & losing (adjusted to game difficulty) will result in a number of improvements to the Ranked Matchmaking experience, especially with an eye towards the OWL and the eSports possibilities for Overwatch in general.

Incentives & Behavior:

Most game theoretic models begin with a simple assumption termed ‘rational self interest’, or the idea that individuals will take the course of action which most benefits themselves. This assumption is imperfect, as humans have been repeatedly shown to exhibit altruistic and pay-to-punish behavior patterns in empirical studies. However, broadly speaking, the notion that people will act in service of their own goals is a plausible one. It is especially so in an online context that lacks face-to-face empathic accountability.

Beginning from rational self interest, then, we can understand and predict the behavior patterns of players in Overwatch by examining the incentive structures that they face. Furthermore, alterations to these incentive structures have the power to dramatically change the decisions players make and even the mindset with which individuals approach the game.

The most clear and impactful incentive that Overwatch players (or at least those that choose to play Ranked Matchmaking) face is Skill Rating (hereinafter ‘SR’). Rising through the ranks feels satisfying and validating, placing in a top division can be a status symbol, and a high top-500 placement might even land you tryouts to play professionally. Naturally, then, many players are highly incentivized to seek to maximize their SR.

Skill Rating Maximization:

SR maximization will always be an incentivized behavior pattern. People want to be highly skilled, but more than that they want to appear to be highly skilled. This distinction seems small but is in fact very important. Crucially then, the key motivation for many (especially for the vast majority of players who will never compete in an eSports context) is to reach the highest SR that they can. This should be juxtaposed against the incentive to become the best player one can be: seeking to have the maximum impact upon a given team’s win probability (i.e. the eSports motivation).

Ideally then, the SR system should be set up such that ‘SR maximization behavior’ guides players to make the sort of decisions that positively impact the community and create the best gameplay environment possible. In my judgement, such an ideal system would align the SR maximization behavior with the eSports motivation, especially with an eye towards the Overwatch League. The current system fails to accomplish this alignment.

One Trick Players (OTPs):

While ‘one-tricking’ is not a behavior that I think should be actively discouraged or disallowed, I contend that it’s also a behavior that shouldn’t be specifically incentivized. In my view, the ideal system would be entirely neutral towards OTPs.

Consider a hypothetical Mercy OTP (anecdotally the most commonly one-tricked hero, although I don’t have data that support this) who has reached a very high SR with essentially no other heroes played.

The current SR system rewards players who are playing at a high skill percentile compared to other players on that hero. This comparison is drawn not within one game instance, but rather across the entire dataset of all Ranked Matchmaking time played on that hero. What this means for our hypothetical Mercy OTP is that, so long as he/she plays better than other Mercy players, lost games will net a smaller SR drop and won games will net a larger SR gain. This impact is so significant that winning vs. losing is in fact a secondary concern to the ‘Mercy percentile’ our OTP is playing at.
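A quick way to see why the percentile adjustment matters so much is to compute the win rate a player needs just to hold their SR steady. The sketch below is illustrative only: Blizzard's actual formula is not public, the function name is hypothetical, and the SR values are invented for the example.

```python
def break_even_win_rate(sr_gain_per_win: float, sr_loss_per_loss: float) -> float:
    """Win rate p at which expected SR change is zero.

    Solves p * gain - (1 - p) * loss = 0 for p.
    Both arguments are positive magnitudes (hypothetical values, not Blizzard's).
    """
    if sr_gain_per_win <= 0 or sr_loss_per_loss <= 0:
        raise ValueError("gain and loss must be positive magnitudes")
    return sr_loss_per_loss / (sr_gain_per_win + sr_loss_per_loss)


# A player with no percentile bonus: symmetric gains and losses.
neutral = break_even_win_rate(24, 24)

# A player whose hero percentile inflates wins and cushions losses.
favored = break_even_win_rate(28, 20)

print(f"neutral: {neutral:.1%}, favored: {favored:.1%}")
```

With these made-up numbers, the favored player holds rank while winning well under half of their games, and climbs at a 50% win rate. Any asymmetry of this kind, applied over hundreds of games, can outweigh the win/loss record itself.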

We’ll get back to our hypothetical OTP in a moment, but now let’s take a step back to examine the bigger picture. The current SR system is crucially problematic for many reasons, but I’ll focus on two: (1) statistical judgements of skill are weak (for some heroes more than others) and (2) it leads different players to have different incentive structures.

(1) Statistical Judgements of Skill Are Weak:

The strength of this proposition is such that I’ll use the best counterexample as my own starting point: McCree. He is a hero with extremely low utility, extremely low survivability, and extremely high damage potential. A player with high accuracy, high damage per minute, and few deaths per minute is very likely to be a higher impact player than someone with weaker statistics. Such a player is minimizing McCree’s weaknesses (i.e. avoiding death) while playing to his strengths (high damage output). It is very likely that such a player is contributing more to an average game than a player with worse statistics. Even for McCree, though, these statistics are imperfect. Is a given player’s damage relevant? How often is he/she spamming enemy heroes without any plausible follow up (i.e. feeding ultimate charge to enemy supports)? A player who hits a few precise shots to pick a key player at a key moment (e.g. a support at the beginning of the fight or a DPS who is preparing to ult) is inarguably much more impactful to securing wins than one who merely sits in the back making poor focus decisions, yet the latter player would be statistically superior by the previously stated standards.

We can apply this same analysis to quite a few heroes, revealing that statistical judgements of skill become weaker and weaker as we move from the most mechanically demanding heroes in the roster to those with very little ‘traditional FPS skill’ requirements. Even a hero such as Roadhog demands a deeper statistical evaluation to really get at skill. One must weigh damage per minute and survivability against damage taken, as a great Roadhog knows how to minimize his exposure and with it the rate at which he feeds the enemy team ultimate. There is no magic formula to successfully achieve such a balancing act. How can one statistically capture the impact of a Whole Hog that prevents a Dragonblade and a Primal Rage from destroying one’s backline (while doing very little damage and earning no kills)? In a game as complex and decision-rich as Overwatch, I don’t see a way that these judgements can be made accurately and reliably by a predetermined formula.

The ultimate example of how useless statistical measurements of skill are–and how bad percentile-based SR adjustment can be–is of course my favorite foil Mercy. The impact of virtually every aspect of Mercy’s kit is poorly captured by statistical measurements. Hitting a 5 player Resurrection that is responded to by a 6 player Earth Shatter or Graviton Surge is in fact game losing. The statistics show a high ‘resurrected players per ultimate cast’ while the reality in game is that the enemy team just farmed MULTIPLE new ultimates. The entire HP pool of your composition just went into the enemy team’s ultimate bank TWO TIMES OVER. I can’t really overstate how bad it is to make a poor decision about using Resurrection. In these cases, not only would it have been better to save one’s own ultimate, but also it would have been better to disconnect from the server and let your team play 5v6 because at least then you would have had a chance to swing Ultimate tempo. Even if there is no immediate Ult-response to a big Resurrection, if your team fails to win the fight the situation is the same: massive Ultimate tempo swing to the opposing team. Very often, the most impactful Resurrections are instant casts to revive one key player that just died (because the opposing team has often expended cooldowns and cannot kill them again). Thus, playing to maximize the statistical measurements of Resurrection (i.e. waiting for a big Res) is in fact seriously detrimental to the success of the team.

Resurrection is furthermore a relatively weak support ultimate because it requires your teammates’ deaths instead of preventing them as all of the others do (once again Symmetra is not a support). Thus a very smart Mercy player actually chooses not to heal in many scenarios so that her support partner can get his/her ultimate faster. Heals per minute is therefore a fickle statistic whose maximization does not reliably communicate skillful or intelligent play.

Low deaths per minute and high damage boosted are the only statistical measurements of Mercy play that I see as actually meaningful, as these statistics communicate intelligent play and impact maximization. Solo kills with the pistol are also probably quite meaningful, but of course a Mercy player who seeks these out at poor times would be called a thrower. It’s not that Mercy is a ‘no-skill hero’, the key problem is that skillful Mercy play is almost never communicated by impressive stats. Even these statistics I mention as impactful fail to even come close to telling the whole story of player skill and game impact.

(2) Failure to Align Incentives:

Not only are OTPs highly incentivized by the current SR system to continue one-tricking and to play for statistical maximization over wins and losses, but these incentives are also crucially opposed to the incentive structure that flexible players face. A flex player knows that he/she won’t be playing at the far right tail of his/her heroes’ skill distributions because his/her mastery of the game is spread across many heroes and many situations. The flex player seeks to achieve a high SR by playing the perfect hero imperfectly while the OTP seeks to achieve a high SR by playing the imperfect hero perfectly. While I don’t think that either of these strategies is deserving of punishment, I think that it’s important that the system not prioritize one over the other at any echelon of SR.

In the current system, the flexible player must maintain a higher win percentage (abstracting away from game difficulty) to reach the same SR as the OTP. This is deeply problematic in my eyes, as I see hero swapping as a fundamental part of the game. If an OTP doesn’t wish to engage with hero swapping as a part of gameplay, that’s fine, but their SR should reflect that choice. The same goes for players who don’t wish to engage with communication as a fundamental part of the game: you don’t have to talk, but if you lose games because of it then that is on you and ought to be reflected in your Skill Rating. A truly great player has the knowledge, intelligence, and decisiveness to pick the right hero for the right situation, filling in the gaps of his/her team composition while at the same time countering opposing composition decisions. Not every player has to aspire to be the greatest player of all time, but in my view the entire purpose of having a Skill Rating system to begin with is to measure and validate that very pursuit of greatness.

Suggestion:

Incentive alignment is a goal well worth pursuing. When all players have the same goals, the potential for toxicity is greatly diminished (though certainly not eliminated). I personally find it quite frustrating to queue into Ranked Matchmaking with the goal of winning games, only to find other players do not share the same incentives. At the very top of the Skill Rating system, one should find other players that want to win games, not those that wish to engage in roleplay. This isn’t to say that OTPs can’t be good or impactful to winning games; my argument is rather that OTPs should be judged by their wins and losses rather than by the extent to which they engage in one-tricking. The current system punishes adaptation and experimentation vastly more than it needs to.

There is only one way to guarantee that every player has the same incentive: strip away all of the hidden formulas and percentile adjustments. Only when each player has only one incentive–to win–will incentive alignment truly come about. The only thing that should impact the SR consequences of a win or a loss is the relative skill of each team. Win a hard game and you should clearly be rewarded more than for winning an easy game, vice versa for losses.
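As a rough illustration of what a pure win-loss adjustment could look like, here is a minimal sketch using a standard Elo-style update. This is not Blizzard’s formula: the function names, the K-factor of 24, and the 400-point scale are all assumptions chosen purely for illustration. The only inputs are the two teams’ average SR and the result of the match.

```python
def expected_win_probability(team_sr: float, opponent_sr: float, scale: float = 400.0) -> float:
    """Elo-style logistic estimate of the chance that a team beats its opponent.

    `scale` is an assumed constant (400 is the conventional Elo value).
    """
    return 1.0 / (1.0 + 10.0 ** ((opponent_sr - team_sr) / scale))


def sr_change(team_sr: float, opponent_sr: float, won: bool, k: float = 24.0) -> float:
    """SR delta applied to every player on the team.

    Depends only on the result and the relative strength of the two teams;
    no hero choice or individual statistics enter the calculation.
    `k` is a hypothetical K-factor controlling the maximum swing per game.
    """
    expected = expected_win_probability(team_sr, opponent_sr)
    actual = 1.0 if won else 0.0
    return k * (actual - expected)


if __name__ == "__main__":
    for opponent in (2800, 3000, 3200):
        win = sr_change(3000, opponent, won=True)
        loss = sr_change(3000, opponent, won=False)
        print(f"vs {opponent}: win {win:+.1f}, loss {loss:+.1f}")
```

Under these assumptions, a 3000 SR team gains more for beating a 3200 SR team than for beating a 2800 SR team, and loses less for losing to the stronger one, which is the "hard game versus easy game" behavior described above.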

The meaningfulness of Skill Rating is especially important as it is the only clearly available measurement of player skill outside of actual eSports experience. With the Overwatch League on the horizon, the time is now to restructure the system such that the very best rise to the top and have a fair shot at becoming professionals. Right now, the only way to scout talent is to do it on an individual, observational basis. Look at Dota 2: you will see fresh talent rising out of Ranked Matchmaking and being given a shot at a professional career simply for reaching the very top of the ladder. That’s because their MMR system answers exactly one question: ‘how good are you at winning difficult games?’

If I worked at Blizzard, I’d be demanding a HARD Skill Rating reset at the end of this season and an entirely purified win-loss SR adjustment regime going forward. If Blizzard really wants the best of the best to get their chance at fame and fortune in eSports, then there really is only one way.

Counterarguments:

The existence of percentile SR adjustment is primarily, in my understanding, to combat smurfing (or the purchasing of new accounts to play at a lower level than one’s true skill). Want to get serious about smurfing, Blizzard? IP & MAC check new accounts and tag them for evaluation while adding a report option for suspected smurfs to cross reference: if you can statistically target and punish throwers then there is no reason you can’t statistically target and adjust smurf accounts. It’s fine if statistical adjustments are used in exceptional and targeted cases, just get rid of them as the default for the entire player base.

“But I wanna one trick!” Go right ahead. No one can (or should) stop you. But if you lose games because of it, don’t expect special treatment. OTPs don’t deserve punishment, but they certainly don’t deserve specific rewards over players who choose to engage with hero-swapping as a fundamental and crucially necessary mechanic in Overwatch. This is especially the case as Blizzard is beginning to employ SR as a way to qualify for tournaments (see: OW Open) and they seem to be considering it as a potential scouting mechanic for new talent once the scene is more established.

To Blizzard: fix it now, or condemn the eSports potential of Overwatch in the long run.

 

EDIT: An earlier version of this article referenced Contenders as an example of an SR-gated tournament. This is inaccurate, as Contenders was never SR restricted. Rather, it is the Overwatch Open for which Blizzard is requiring a certain SR.

The Fundamentals of Balance

Wed. June 14th:

Intro

This essay will focus on the balancing philosophy expressed in Overwatch. It will analyze the achievability of the Overwatch team’s apparent desire to achieve balance in all skill brackets simultaneously, and make a few suggestions to this end.

A Philosophical Problem

Before one evaluates the success and failure of specific Overwatch patches, one must establish a clear value–a metric by which to gauge the degree to which a change makes the game better or worse.

Jeff Kaplan, in an AMA three months ago, described the Blizzard approach as a ‘triangle’. “I feel like there are 3 key factors that guide us: The players, statistics and… us… our own feelings as players.” He continued on to add that “Internally, we have a ‘competitive’ playtest that’s helpful to get good feedback from Diamond+ players who work here […] None of this is perfect… but we try hard to listen to feedback and keep the game balanced.”

Ultimately, the system Jeff describes here (also confirmed by other Dev posts on the battle.net forums) is one that seeks to achieve relative balance throughout the skill spectrum. All three points of his triangle betray this reality: player feedback, developer intuition, and even statistics to some extent abstract away from player skill. Keeping a sharp eye on professional pickrates would be importantly revealing, but at the very least it isn’t clear that this is happening. The notion of balance-for-all seems nice enough prima facie, but further analysis reveals a considerable challenge to successfully implementing this broad balance goal.

This fundamental challenge is skill curve differential. Different heroes in Overwatch have remarkably different rates of return on skill growth investment; this is to say that they have significantly distinct skill curves. I use ‘skill curve’ here to mean the rate at which performance (i.e. game impact) increases with constant skill growth.

To illustrate the skill curve differential problem, consider two heroes: Genji and Junkrat. Now consider two players corresponding to each of these heroes: one of each in the 10th percentile of skill (worse than 90% of players) and one of each in the 90th percentile of skill (worse than just 10% of players). The 90th percentile Junkrat is certainly more impactful than the 10th percentile Junkrat player, but the gulf between the 90th and 10th percentile Genji players is vastly larger. The 10th percentile Genji player is a glorified rock-slinger. Unable to consistently leverage dash resets or find high value reflects, he or she has far less game impact than the 10th percentile Junkrat player. When we reach the right tail of the skill distribution, however, exactly the opposite situation holds. Against strong opponents, Junkrat lacks high-level outplay options and is ultimately left to punish misplays or exploit weak links in the opposing team. At the very highest levels of professional play, this is why he is essentially unplayable outside a very small niche. For our high level Genji player, it is a different story. The design of the character yields exponential gains to game impact resultant from skill growth: as accuracy, speed, and aggression increase so do mobility and longevity in a positive feedback cycle.

Every character has a skill curve of some slope, that is, there is no hero which can honestly be said to require ‘no skill’. However, one can see the skill curve differential problem even embedded in core hero statistics. Ana and Mercy are both powerful single target healers (comparing two different heroes will always be comparing apples to oranges, but hopefully this example is nonetheless illustrative).

Heal Rates: (source: owinfinity.com)

ANA: 75 healed per shot * 1.3 shots per second * (% accuracy) = Effective Heals per second

MERCY: 60 = Effective Heals per second

These Effective Heals per second values equalize when the Ana player’s accuracy reaches ~61.5% (60 / (75 * 1.3) ≈ 0.615). That is to say that an Ana player with accuracy lower than that value will heal less per second than a Mercy and an Ana with higher accuracy will heal more per second. The point of equalization isn’t particularly important, but the fact that Ana is able to do her central job as a healer (healing) faster and more reliably the higher her accuracy gets reveals that her skill curve is steeper than that of Mercy. This cuts both ways; at the far left tail of the skill distribution (where % accuracy values are generally much lower) Mercy outputs more heals per second than Ana. Mercy has important decisions to make in order to maximize her survivability, but Ana’s self defense options are no less complex or skill dense (in fact they, like her healing rate, are significantly more responsive to skill increases).
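The break-even arithmetic can be checked with a short sketch. The constants are the ones quoted above (75 healed per shot, 1.3 shots per second, and a flat 60 per second for Mercy); the function and variable names are just illustrative.

```python
ANA_HEAL_PER_SHOT = 75.0
ANA_SHOTS_PER_SECOND = 1.3
MERCY_HEAL_PER_SECOND = 60.0


def ana_effective_hps(accuracy: float) -> float:
    """Effective heals per second for Ana at a given accuracy (0.0 to 1.0)."""
    return ANA_HEAL_PER_SHOT * ANA_SHOTS_PER_SECOND * accuracy


# Accuracy at which Ana's output matches Mercy's constant healing rate.
break_even_accuracy = MERCY_HEAL_PER_SECOND / (ANA_HEAL_PER_SHOT * ANA_SHOTS_PER_SECOND)

if __name__ == "__main__":
    for accuracy in (0.40, break_even_accuracy, 0.80):
        print(
            f"accuracy {accuracy:.1%}: "
            f"Ana {ana_effective_hps(accuracy):.1f} HPS vs "
            f"Mercy {MERCY_HEAL_PER_SECOND:.1f} HPS"
        )
```

Ana’s output scales linearly with accuracy while Mercy’s is flat, so the two lines cross exactly once, at roughly 61.5% accuracy.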

Ana and Mercy, Genji and Junkrat: contrasting these pairs reveals the central difficulty of simultaneously satisfying players across the entire skill distribution. Professional players lament that Junkrat is meme-trash-tier in organized competitive play while he simultaneously reigns as the uncontested King of Brawl Winnin’ and The Silver Division. Ana’s winrate, meanwhile, steadily climbs with skill tier from a tragic 38.9% in Bronze to a respectable 51.9% in Grandmaster.

Nowhere are the consequences of the skill curve differentials more apparent than when comparing Ranked Matchmaking (of any level) to organized professional play (hereinafter ‘eSports’). Mercy, statistically speaking, performs well (above 50% winrate) all the way up to the top few percentiles of Ranked Matchmaking with a remarkably high pick rate. In professional play, she goes virtually untouched outside of the Pharah + Mercy combo.

This difference is a consequence of an added challenge of balancing popular online games that are also eSports. True coordination (in composition choice and game style) radically changes the way the game is played. Because Resurrect is fundamentally reactive, high level teams will often simply not allow an unsupported Mercy to garner value from her ultimate. She will be hunted by flanking DPS while the rest of the team intentionally staggers kills or saves ultimates to reduce the effectiveness of any Resurrect the Mercy is able to cast. As someone who has spent every season of Ranked Matchmaking at the very highest level of play, I can attest that these sorts of plays are rarely if ever made regardless of the skill level of players on either team. I contend that an important reason Ana is so weak in low-tier play is that she demands coordinated protection to fully leverage her abilities (coordination that is virtually nonexistent at low level play). Likewise, Mercy is incredibly punishing of undirected or uncoordinated play. Fail to hunt her down at the proper time or forget to save a key ultimate to counterplay Resurrect and a teamfight is quickly lost.

So what can we do? How can Overwatch feel fresh and full of optionality in an eSports context while also remaining balanced and enjoyable to play for those further to the left on the skill distribution?

Moving Forward

Skill curve differential isn’t going anywhere, and in my opinion it shouldn’t. Blizzard intelligently marketed Overwatch much more widely than the traditional first person shooter target audience. This wasn’t just a marketing strategy though; the game design purposely features heroes, for instance Mercy, that aren’t so demanding of traditional arena shooter skills and rather allow positioning and decision making to determine game impact. In the long run, I think that this is a good thing. Purity is the enemy of innovation while community stagnancy is in direct opposition to promotion to a wider audience (something absolutely critical to achieving a public perception of legitimacy for eSports and even gaming as a hobby).

The only important question that remains is how to rise to the challenge of balancing for diverse skill tiers simultaneously. The approach that I’d like to see taken more often is the differentiation of mechanical changes and statistical changes.

Sometimes a number gets into the game that is simply broken. Bastion’s 35% value for his Ironclad passive springs to mind as a classic example of “utterly fucking busted”. Sometimes a character just doesn’t have the stats to compare favorably against his/her/its closest substitutes; pre-buff Soldier 76 is a good example. I don’t have date-accurate statistics for the strength of these heroes across skill tiers, but I contend that pre-buff Soldier 76 was probably too weak at every point on the skill distribution and pre-nerf Bastion was probably too strong at every point on the skill distribution. For these kinds of across-the-board balance issues, statistical adjustments are warranted as they will have similar impacts on players of all skill levels.

These are the easy variety of balance problems. For the more complex varieties, a mechanical change in isolation or a combination of mechanical & statistical changes is necessary.

A strong example of a very good combination buff is the recent (live) patch to Hanzo. Hanzo felt a little too weak across the board, but at a high level aggressive compositions came to render him nearly obsolete. The 10% charge time buff to Hanzo is significant, but I would argue that even more impactful for high level players is the ability to hold a charged arrow while wall-climbing and to spawn your Dragonstrike early if the arrow collides with a wall. These changes make the space of options for Hanzo players significantly wider and enable much more aggressive and independent play. However, this kind of freedom doesn’t aid those who aren’t ready to use it. The change in totality made Hanzo players of all skill levels slightly stronger but had a significantly greater impact on expert players who can most creatively leverage the new mechanics. Widening the space of options doesn’t make a big difference to players who weren’t already pushing the boundaries of how a hero can be effectively played.

We can use this same mechanical vs statistical differentiation to better examine the past Genji nerf that removed his ability to triple jump in one continuous airtime via wall climbing. For low tier players who weren’t even aware of this possibility, the change had virtually zero impact. For high tier players who were exploiting it as often as possible to maximize mobility and survivability, the change had real consequences to Genji’s overall strength and playability. My honest assessment of the Overwatch development team is that they never thought about these differential effects and instead saw the triple jump as just an unintended bug to be patched out. The ledge-dash-super-jump mechanic was probably thought of similarly, and patching it out only really affected the few hundred (I doubt it was really this many) players who could hit it reliably enough to implement it as part of their play style. The important lesson here is that these pure-mechanical patches had radically different impacts on players of different skill levels.

These two examples provide a powerful blueprint for the formulation of balance adjustments that demand different impacts upon different skill tiers:

If a hero is in a good place for low-skill players but too weak for high-skill players: widen the option space by loosening mechanical restrictions and let creativity and talent shine through as increased game impact by high-skill players.

If a hero is popular and strong in the hands of high-skill players but a bit weak when used by newer players, combine a statistical buff with a restriction of option space. Make the hero more narrowly defined and yet more powerful within that narrow role. This variety of change must be done most carefully, though, as elite players will always seek to exploit any statistical buffs to their maximum potential even if it requires playing the hero in a radically different way (see the most recent attempt at nerfing Lucio).
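The blueprint above, together with the earlier across-the-board case, can be summarized as a small decision rule. This is only a sketch: the three-level strength labels and the function name are hypothetical simplifications for illustration, not anything Blizzard is known to use.

```python
from enum import Enum


class Strength(Enum):
    """Coarse, assumed rating of a hero's strength within one skill bracket."""
    WEAK = -1
    FINE = 0
    STRONG = 1


def suggest_change(low_skill: Strength, high_skill: Strength) -> str:
    """Map a hero's state at both ends of the skill distribution to a change type."""
    if low_skill == high_skill:
        if low_skill is Strength.FINE:
            return "no change"
        # Too weak or too strong everywhere: a numbers change affects all tiers alike.
        return "statistical adjustment"
    if low_skill is Strength.FINE and high_skill is Strength.WEAK:
        # Loosen mechanical restrictions so expert players gain the most.
        return "mechanical change: widen the option space"
    if low_skill is Strength.WEAK and high_skill is Strength.STRONG:
        # Buff the numbers but narrow the role so experts cannot over-exploit it.
        return "statistical buff combined with a narrower option space"
    return "case not covered by the blueprint"


if __name__ == "__main__":
    print(suggest_change(Strength.WEAK, Strength.WEAK))
    print(suggest_change(Strength.FINE, Strength.WEAK))
    print(suggest_change(Strength.WEAK, Strength.STRONG))
```

Combinations that the blueprint does not address are deliberately left as an explicit "not covered" result rather than guessed at.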

That’s the theory, but here are my resultant suggestions for real balance changes. Feel free to leave feedback on the article as a whole or just these ideas! My twitter is @jake_overwatch 🙂

Suggestions:

Bastion is a worse choice than Soldier 76 virtually 100% of the time in high level play, but has a comfortable niche at median and below skill. Remove the self-stun upon Tank Transformation and when returning to Recon mode to allow for more aggressive initiations and the option to use Tank Form as effective counterplay in a fight. Also remove or adjust the Self-Repair animations that block the crosshair (that shit’s just annoying, yo). Average players will play just like they always have, but those smart enough to leverage these adjustments into a much more aggressive style will reap the rewards.

Widowmaker has felt incredibly map-dependent across the skill spectrum even after her charge-time buff. Decrease Hookshot cooldown (I suggest by 2-3 seconds) to increase mobility and escape options versus the dive composition that has come to define the meta. It is very dangerous to buff this hero with pure DPS, but giving her a slightly less narrow role might help her pick and win rates with skilled players.

Junkrat is an effective spammer that applies a ton of pressure to slow team compositions. His ultimate is reasonably effective against newer players but rarely finds sufficient utility in high level play to justify what is very often a suicide play. Give the Rip Tire a new ability (activated with whatever key is bound to Ability 1) that allows it to hop into a drift (yes I do mean cart-racer style) with a short cooldown. This will give stronger players options to bait out counterplays and reasonably juke players with moderate aiming skill while being difficult to abuse by those lower-tier players that don’t have a precise understanding of which counterplays they need to bait and which enemies they need to juke.

(maybe I want to roleplay Junkenstein)

Quality over Quantity: Revisited

Sun. June 11th

 

I read the comment thread on /r/competitiveoverwatch and thought I should make this update to discuss two really crucial lines of argument that I noticed throughout the reddit thread and in people’s responses to my twitter.

Criticism 1: Jake, your system is idealistic. Players who refuse to play healers or tanks in game will still check those boxes just to get faster queue times. This will leave their teammates in the lurch and ultimately ruin the system.

Criticism 2: Jake, your system would encourage one-tricking and diminish the rewards accrued to versatile players.

In response to C1, I argue that the motivation not to abuse the system is inherent to its design. Let’s say I’m a Bastion-only one-trick. I won’t play anything besides Bastion for any reason. I could check the healer & tank checkboxes–clearly justified in this instance because Bastion is both :)–but I won’t because I want the system to work. No matter what role you want to play, you have a better chance of a competitive and fun ranked match if your team has a more balanced composition.

For those who worry that true griefers/trolls (those trying to lose from the outset of the match) might abuse the system to maximize trolling potential, I would argue that the added impact is relatively small. If I were to pick Roadhog and self-heal in front of the enemy team every time I spawn, my team isn’t going to win. It really doesn’t matter what our composition is or who is willing to flex to what. I’ve actually done this exact thing to ensure that a hacker on my team loses. Even an aimbot isn’t enough to 5v6 a team that gets fed 900 hp of ult-charge per Roadhog spawn. Overwatch is a team game; one person aggressively trying to lose is more than enough to achieve that goal in the current system. Personally, I have not seen many players truly griefing in this extreme sense. Most ‘griefing’ comes from people being tilted about team composition or teammate performance in my experience.

C2 is a bit more tricky, though I would argue my system is nonetheless well designed to encourage versatility. Anyone who has played Overwatch for a significant amount of time can recognize that one healer and one tank is not the strongest team foundation in nearly any scenario. Even if your team already has one of each guaranteed by the matchmaker, there is still tremendous room to increase the strength of your composition by adding a second healer or more tanks. Versatile players can still accrue value from their diverse abilities under the role-queue system I defend.

Regarding the encouragement of one-tricking, I think of the system as a response to the prevalence of one-tricking rather than a cause of it. In the status quo, I already see a very high incidence of one-tricking a hero or, even more commonly, a role. It is the rare player who plays DPS, Tanks, and Healers all at an equivalent level. The vast majority of people, in my experience, have a significant majority of their playtime spent in one role (if not one hero). If the current system is punishing one-tricking relative to the system I propose, then it’s really doing a terrible job.

There is a deeper philosophical question here, though. Is one-tricking an acceptable way to play Ranked Matchmaking in Overwatch? Should it be discouraged? I would argue that, regardless of the answers to these questions, it cannot be stopped without a tremendous cost to the creativity that makes hero-driven shooters so fun. One-tricking happens in every game with character or weapon selection: it’s human nature to have preferences and some people really love to maximize their skill in a very narrow category rather than experience all the possibilities the game has to offer. If you can only play one hero, you don’t have much hope of going pro (except Lucio, but maybe Blizz will figure out how not to buff that hero someday). In my view, that’s OK. Not everyone aspires to play professionally; people come to the game for really different reasons, even at the far right tail of the skill distribution.

The best way to design the system, in my view, requires accepting that there are many different types of players with many different motivations. Fighting to change people is a losing battle. Why not build a system that offers fun matches whether you want to one-trick or flex every role as your team needs it?

 

 

P.S. Shoutout to /r/competitiveoverwatch for the great feedback and response! I’ll be back next week with another article, although I’m not sure exactly which topic to pick just yet. Tweet me some suggestions! (@jake_overwatch)

P.P.S. Some people suggested a DPS check box in addition to my suggestion. My main resistance to this suggestion is that Blizzard has done a really poor job with the hero classifications in the DPS role. Hanzo is, at least at a high level, unpickable on defense but sometimes viable when attacking. Many of the defense characters are like this: due to their one-dimensionality they are easily counterpicked and so are poor choices when actually defending. On offense though, they can exploit weaknesses in defending teams locked into their composition. Clearly every team does not need one Offense character and one Defense character in the same way that every team does need one Healer and one Tank. In my view, the problems with hero classification in the DPS role need to be solved before a system could be implemented that would specifically indict the failures of this existing hero classification system.