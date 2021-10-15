



Image: Blizzard

Diablo 2: Resurrected has launched and the hell is all too real, because the D2 servers immediately went to the seventh circle. Over the last week, players have faced constant login issues and outages. And by the sound of things, it must be making some poor server engineers absolutely hate life.

First: whenever a developer posts a blog of more than 2,000 words, you know the shit has really hit the fan. The post is a lengthy rundown of all the issues the recently revived Diablo 2 has faced. The problems aren't caused by any single thing, but by a combination of the game's popularity, its architecture, and even the fact that players are far more efficient at smashing through Diablo in 2021.

The first big issue the team outlined is how players' characters and data are stored. If you've played Activision or Blizzard multiplayer games over the last few decades, you'll know that players are usually logged in to a set of servers as close to their location as possible. This isn't a single server, but a cluster of servers that serves the entire region.

Each of these regions has its own regional database that stores the data of the characters playing on its servers. This is necessary because too many people are playing Diablo 2 to continuously upload everyone's data to one central point.

Most in-game actions are performed against this regional database, because it is faster, and characters are locked there to maintain the integrity of individual character records. Blizzard says the global database also has a backup in case the main one fails.

Blizzard keeps a single record (with a backup) of your fancy level 88 barbarians, necromancers and so on, because these regional databases send information back to the central database on a regular basis. All of this sounds fine until the central database is overloaded and the entire system needs a nap, much like the engineers working on it.

On Saturday morning Pacific time, a sudden and significant surge in traffic caused a global outage. This was a new threshold that the servers had never experienced, not even at launch, Blizzard explained.

This was exacerbated by an update released the day before, which was intended to improve performance around game creation. The combination of these two factors overloaded the global database and caused it to time out. We decided to roll back the Friday update we had previously deployed, hoping this would ease the load on the servers leading into Sunday while also giving us room to investigate the root cause further.

But on Sunday, it became clear that what we had done on Saturday wasn't enough. Traffic increased even further, and another outage occurred. Our game servers observed the disconnection from the database and immediately, repeatedly attempted to reconnect. In other words, the database was too busy handling a continuous stream of connection attempts from the game servers and never had time to catch up on the work we had completed. During this time, we also found that we could improve the configuration of the database event logging, which is needed to restore a healthy state after a database failure, so we completed those changes and continued analyzing the root cause.

That's not a recipe for a fun weekend. It also explains why players had so many problems with their progress. They would pick a character, start a game and play for a while, but after an outage the regional server was unable to communicate with the central database. As a result, Diablo 2's ground-truth source couldn't be told that they had acquired new gear and XP, and frustrated players lost some of their progress.

The problem only got worse from there. The Diablo 2 servers came back online, but because they did so while most players were online, they quickly fell over again as hundreds of thousands of Diablo 2 instances launched at once.

And if the weekend was bad, what followed on Monday and Tuesday was no better:

As a result, we switched between the global databases on Monday, October 11th. This caused another outage, because the backup database mistakenly continued to run its backup process. That meant it spent most of its time trying to copy from the other database when it should have been servicing requests from the servers. During this time, we discovered more problems and made further improvements. We found a deprecated but taxing query that could be eliminated from the database entirely, and we optimized the eligibility checks performed when players join a game, further reducing the load. More performance improvements are in testing as we speak. We also believe we have fixed the database reconnection storms we were seeing, because they didn't occur on Tuesday.
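Blizzard's post doesn't say how the reconnection storm was fixed. One common way to keep many clients from hammering a recovering database at the same moment is exponential backoff with random jitter. The sketch below is a minimal illustration of that general technique, not Blizzard's implementation; the function names, the base delay and the cap are all assumptions made for the example.

```python
import random
import time


def backoff_delays(attempts, base=0.5, cap=30.0, rng=None):
    """Return one randomized delay (in seconds) per reconnect attempt.

    The upper bound doubles with each attempt and is capped, and the
    actual delay is drawn uniformly below it ("full jitter") so that
    many servers do not retry in lockstep. All values are hypothetical.
    """
    rng = rng or random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** n)) for n in range(attempts)]


def connect_with_backoff(connect, attempts=6, sleep=time.sleep, rng=None):
    """Call connect() until it succeeds, waiting longer after each failure.

    `connect` is any callable that raises ConnectionError on failure.
    """
    last_error = None
    for delay in backoff_delays(attempts, rng=rng):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            sleep(delay)  # back off instead of retrying immediately
    raise ConnectionError("database still unreachable") from last_error
```

Spreading retries out like this gives an overloaded database quiet periods in which to catch up, which is exactly what an immediate, repeated reconnect denies it.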

This is the point where I keep hearing my brother's advice in my head: never go into networking.

As it turned out, even that wasn't enough for Diablo 2. After nearly a week of login problems and crashes, the game hit a new record for concurrent players on Wednesday, Australian time. According to Blizzard, there were hundreds of thousands of players in a single region alone, which could be either a lot or fairly standard, depending on how Blizzard's servers define a region. (For example, a few hundred thousand would be very impressive for Australia. For a region like the whole United States, not so much, but if the region is only a small part of the United States, it probably would be. The blog post doesn't specify.)

Screenshot: Kotaku Australia / Blizzard Entertainment

According to the developers, one of the biggest issues behind all of this is how the original Diablo 2 handles core parts of player behaviour. Vicarious Visions updated the original D2 code as much as possible, but much of the project was about preserving the code that worked.

This was fine, until it failed to scale.

Diablo 2 has a particular way of pulling data from the central database to make sure players can do what they want to do. Want to join a game? That's a call back to the central database. Want to switch characters? That's another check with central command, making sure you get the character you asked for, with all the gear you worked for, right where you left it.

According to the team, Diablo 2 was designed to be centralized. The downside is that you can't offload some of the load to a local server, because only one instance of this particular service can run at a time.

Importantly, this service is a singleton, which means we can only run one instance of it to ensure that all players always see the correct, up-to-date list of games. We have optimized this service in various ways to conform to more modern technology, but as mentioned earlier, many of our issues stem from game creation.

For now, there are various short-term fixes and a roadmap for rewriting Diablo 2's architecture so it can scale to meet modern demand. For example, the service that simply provides the list of games to players is being split out into its own service.

The developers will also introduce a login queue, à la World of Warcraft, to prevent the servers from being overloaded when hundreds of thousands of game instances launch at once.

To deal with this, we are working on a login queue, much like the one you may have experienced in World of Warcraft. This keeps the population at the level that is safe at the time, so we can monitor where the system is straining and address it before it brings the game down altogether. Each time we fix a strain, we can increase the population cap. This login queue has already been partially implemented on the backend (at the moment it looks like a failed authentication in the client) and should be fully deployed on PC in the next few days, with console to follow.
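The idea of a login queue with a population cap that is raised over time can be sketched in a few lines. This is only an illustration of the concept, assuming a simple first-come, first-served queue; the class and method names are hypothetical and nothing here reflects how Blizzard's backend actually works.

```python
from collections import deque


class LoginQueue:
    """Admit players up to a population cap and queue everyone else."""

    def __init__(self, population_cap):
        self.population_cap = population_cap
        self.online = set()
        self.waiting = deque()

    def request_login(self, player):
        """Return 0 if the player is admitted, else their queue position."""
        if player in self.online:
            return 0
        if player in self.waiting:
            return self.waiting.index(player) + 1
        # Only skip the queue if there is room and nobody is already waiting.
        if len(self.online) < self.population_cap and not self.waiting:
            self.online.add(player)
            return 0
        self.waiting.append(player)
        return len(self.waiting)

    def logout(self, player):
        """Free a slot and let the next waiting player in."""
        self.online.discard(player)
        self._admit_waiting()

    def raise_cap(self, new_cap):
        """Raise the cap, for example after a bottleneck has been fixed."""
        self.population_cap = max(self.population_cap, new_cap)
        self._admit_waiting()

    def _admit_waiting(self):
        while self.waiting and len(self.online) < self.population_cap:
            self.online.add(self.waiting.popleft())
```

With a cap of two, a third player gets queue position 1 and is admitted as soon as someone logs out or the cap is raised.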

Players are also being rate limited, but only when games are created, closed and recreated within a short span of time. That mainly happens when players are farming areas like Shenk & Eldritch or Pindleskin. When this happens, the error message says there is a problem communicating with the game servers. That doesn't mean the game servers are down in this particular instance, only that you have been rate limited to temporarily reduce the load on the database in the interest of keeping the game running, Blizzard advised.
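A rate limit on game creation of this kind could look something like the sliding-window sketch below. The class name, the limit and the window length are assumptions chosen for illustration; the blog post does not give Blizzard's actual thresholds or mechanism.

```python
from collections import defaultdict, deque


class GameCreationLimiter:
    """Allow each player at most `max_games` new games per time window."""

    def __init__(self, max_games, window_seconds):
        self.max_games = max_games
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # player -> recent creation times

    def allow(self, player, now):
        """Return True if the player may create a game at time `now`."""
        recent = self._recent[player]
        # Forget creations that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_games:
            return False  # the client would see a communication error here
        recent.append(now)
        return True
```

A player running quick boss farms hits the limit after a few rapid game creations, while someone playing a single long game is never affected.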

To be honest, it all sounds like an absolute nightmare. I feel for the engineers, who look to be facing months of retroactive fixes. There's a school of internet thought that says Blizzard should have seen this coming and planned for it. But this is also fundamentally part of the risk you take with a remaster. These games were written in a different time, one without the popularity and the accessibility of information and multiplayer services we have today, and it's hard to know whether much of their old infrastructure will scale the way you expect. Sometimes it holds up right to the point where everything collapses in a fiery heap.

This article was originally published in Kotaku Australia.

