The Comedy of Errors That Caused the Dangerous Microsoft Azure Breach

Microsoft said the corporate account of one of its engineers was hacked by a highly skilled threat actor that acquired a signing key used to hack dozens of Azure and Exchange accounts belonging to high-profile users.

The disclosure solves two mysteries at the center of an announcement Microsoft made in July. The company said that hackers tracked as Storm-0558 had been inside its corporate network for more than a month and had gained access to Azure and Exchange accounts, several of which were later identified as belonging to the US Departments of State and Commerce. Storm-0558 pulled off the feat by obtaining an expired Microsoft account consumer signing key and using it to forge tokens for Microsoft’s supposedly fortified Azure AD cloud service.

The July disclosure left two of the most important questions unanswered. Specifically, how was a credential as sensitive as the consumer signing key stolen from Microsoft’s network, and how could it sign tokens for Azure, which is built on an entirely different infrastructure?

On Wednesday, Microsoft finally solved the riddles. The corporate account of one of its engineers had been hacked. Storm-0558 then used the access to steal the key. Such keys, Microsoft said, are entrusted only to employees who have undergone a background check and then only when they are using dedicated workstations protected by multi-factor authentication using hardware token devices. To safeguard this dedicated environment, email, conferencing, web research, and other collaboration tools aren’t allowed because they provide the most common vectors for successful malware and phishing attacks. Further, this environment is segregated from the rest of Microsoft’s network, where workers have access to email and other types of tools.

Those safeguards broke down in April 2021, more than two years before Storm-0558 gained access to Microsoft’s network. When a workstation in the dedicated production environment crashed, Windows performed a standard “crash dump,” in which all data stored in memory is written to disk so engineers can later diagnose the cause. The crash dump was later moved into Microsoft’s debugging environment. The hack of a Microsoft engineer’s corporate account allowed Storm-0558 to access the crash dump and, with it, the expired consumer signing key.

Normally, crash dumps strip out signing keys and similarly sensitive data. In this case, however, a previously unknown vulnerability known as a “race condition” prevented that mechanism from working properly.

Members of the Microsoft Security Response Center wrote:

Our investigation found that a consumer signing system crash in April of 2021 resulted in a snapshot of the crashed process (“crash dump”). The crash dumps, which redact sensitive information, should not include the signing key. In this case, a race condition allowed the key to be present in the crash dump (this issue has been corrected). The key material’s presence in the crash dump was not detected by our systems (this issue has been corrected).

We found that this crash dump, believed at the time not to contain key material, was subsequently moved from the isolated production network into our debugging environment on the internet connected corporate network. This is consistent with our standard debugging processes. Our credential scanning methods did not detect its presence (this issue has been corrected).

After April 2021, when the key was leaked to the corporate environment in the crash dump, the Storm-0558 actor was able to successfully compromise a Microsoft engineer’s corporate account. This account had access to the debugging environment containing the crash dump which incorrectly contained the key. Due to log retention policies, we don’t have logs with specific evidence of this exfiltration by this actor, but this was the most probable mechanism by which the actor acquired the key.
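The credential scanning step mentioned in the statement can be illustrated with a minimal sketch. Microsoft has not described how its scanner works, so everything here is an assumption: the function name `find_key_material` is hypothetical, and the check only looks for PEM-style private key headers, whereas key material sitting in process memory would often be in a raw binary form that a pattern like this would miss.

```python
import re

# Assumption: key material appears in the dump with a PEM-style header.
# Real in-memory keys are often raw bytes that this pattern would not catch.
KEY_MARKERS = re.compile(rb"-----BEGIN (?:RSA |EC |ENCRYPTED )?PRIVATE KEY-----")


def find_key_material(dump: bytes) -> list[int]:
    """Return the byte offsets of anything that looks like a private key header."""
    return [match.start() for match in KEY_MARKERS.finditer(dump)]


# Hypothetical dump contents, used only for illustration.
clean_dump = b"\x00" * 64 + b"stack frames"
leaky_dump = clean_dump + b"-----BEGIN RSA PRIVATE KEY-----\nMIIB..."

# A dump would be cleared to leave the isolated network only if no hits are found.
safe_to_move = not find_key_material(leaky_dump)
```

A scan like this is only as good as its patterns, which is consistent with Microsoft’s account that the key went undetected both when the dump was created and when it was moved.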

Addressing the second mystery, the post explained how an expired signing key for a consumer account was used to forge tokens for sensitive enterprise offerings. In 2018, Microsoft introduced a new framework that worked with consumer and enterprise cloud apps. Human errors prevented a programming interface designed to cryptographically validate which environment a key should be used for from working properly.

The post continued:

To meet growing customer demand to support applications which work with both consumer and enterprise applications, Microsoft introduced a common key metadata publishing endpoint in September 2018. As part of this converged offering, Microsoft updated documentation to clarify the requirements for key scope validation—which key to use for enterprise accounts, and which to use for consumer accounts.

As part of a pre-existing library of documentation and helper APIs, Microsoft provided an API to help validate the signatures cryptographically but did not update these libraries to perform this scope validation automatically (this issue has been corrected). The mail systems were updated to use the common metadata endpoint in 2022. Developers in the mail system incorrectly assumed libraries performed complete validation and did not add the required issuer/scope validation. Thus, the mail system would accept a request for enterprise email using a security token signed with the consumer key (this issue has been corrected using the updated libraries).
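The gap between checking a signature and checking a key’s scope can be shown with a short sketch. This is not Microsoft’s code or API: the key registry, the key IDs, and the function names are all hypothetical, and an HMAC stands in for the asymmetric signatures a real identity system would use.

```python
import hashlib
import hmac
import json

# Hypothetical key registry. Each key records the environment it is valid for.
KEYS = {
    "consumer-key-1": {"secret": b"consumer-secret", "scope": "consumer"},
    "enterprise-key-1": {"secret": b"enterprise-secret", "scope": "enterprise"},
}


def sign_token(claims: dict, kid: str) -> dict:
    """Sign claims with the named key (HMAC used as a stand-in for RSA)."""
    payload = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(KEYS[kid]["secret"], payload, hashlib.sha256).hexdigest()
    return {"kid": kid, "payload": payload, "sig": sig}


def signature_is_valid(token: dict) -> bool:
    """Cryptographic check only: was the token signed by a key we know?"""
    key = KEYS.get(token["kid"])
    if key is None:
        return False
    expected = hmac.new(key["secret"], token["payload"], hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token["sig"])


def accept_signature_only(token: dict) -> bool:
    """The flawed pattern: trust any token whose signature verifies."""
    return signature_is_valid(token)


def accept_with_scope_check(token: dict, required_scope: str) -> bool:
    """The corrected pattern: the signing key must also belong to this environment."""
    if not signature_is_valid(token):
        return False
    return KEYS[token["kid"]]["scope"] == required_scope


# A token for enterprise mail, signed with the consumer key.
forged = sign_token({"sub": "user@example.com", "aud": "enterprise-mail"}, "consumer-key-1")
legit = sign_token({"sub": "user@example.com", "aud": "enterprise-mail"}, "enterprise-key-1")
```

In this sketch, `accept_signature_only(forged)` returns `True` because the signature is genuine, while `accept_with_scope_check(forged, "enterprise")` returns `False` because the consumer key is out of scope for enterprise mail.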

In an email, a Microsoft representative said the engineer’s account was compromised using “token-stealing malware” but didn’t elaborate on how it got installed, if other corporate accounts were hacked by the same threat actor, when Microsoft learned of the compromise, and when the company drove out the intruders.

In addition to those questions are these: Wasn’t a key as sensitive as the one acquired by Storm-0558 stored in an HSM (hardware security module)? These are dedicated devices that store important information and are designed to prevent the form of key acquisition that Microsoft disclosed.

In the email, the representative said the company’s “identity systems manage keys using a combination of HSM and software protections due to unique scale and resiliency requirements of the cloud environment.” This still didn’t explain how the attackers managed to extract the key from a device specifically designed to prevent theft.

As noted earlier, Microsoft has steadfastly resisted using the word vulnerability in describing the flaws Storm-0558 exploited to pull off the breach. Instead, the company used the word “issue.” Asked to explain what Microsoft’s definition of “issue” is and how it differs from the company’s definition of “vulnerability,” the representative said: “Vulnerability is a specific term, and we would use the term vulnerability if it was appropriate. ‘Issue’ in the blog refers to things such as misconfiguration, operator errors, or unintended byproducts of other actions.”

Will Dorman, senior principal analyst at security intelligence firm Analygence, said in an online interview:

There were a few aspects that could be considered vulnerabilities:

—Race condition led to key material in a crash dump—Sounds like a vulnerability

—"mail system would accept a request for enterprise email using a security token signed with the consumer key"—definitely a vulnerability

—"able to successfully compromise a Microsoft engineer’s corporate account"—Just part of living in this world, I suppose.

Microsoft has said that roughly 25 organizations had one or more of their accounts breached in the campaign, which began on May 15 and lasted until June 16. Microsoft wasn’t aware of the mass hack until a customer tipped it off.

Microsoft has described Storm-0558 as “a China-based threat actor with activities and methods consistent with espionage objectives.” The group targets a wide range of entities. They include: US and European diplomatic, economic, and legislative governing bodies, individuals connected to Taiwan and Uyghur geopolitical interests, media companies, think tanks, and telecommunications equipment and service providers.

“Storm-0558 operates with a high degree of technical tradecraft and operational security,” Microsoft wrote in July. “The actors are keenly aware of the target’s environment, logging policies, authentication requirements, policies, and procedures. Storm-0558’s tooling and reconnaissance activity suggests the actor is technically adept, well resourced, and has an in-depth understanding of many authentication techniques and applications.”

Critics calling out Microsoft for what they say is negligence connected to the breach have included a US senator and an industry CEO. They have criticized both the practices leading up to the hack and what they have said is a lack of transparency following it. Wednesday’s update is a step in the right direction, but the company still has more work to do.