On Thursday evening, Microsoft identified a global outage affecting users of various Microsoft 365 applications and services. Impacted users experienced login issues, unavailable Azure-hosted virtual machines, and endless loading screens in Microsoft 365 services, to name a few of the issues.

To make matters worse, around the same time CrowdStrike released an update that began blue screening devices. Microsoft stated that the outage was unrelated to the CrowdStrike blue screens. As of this posting, Microsoft has resolved the issue and continues to monitor MO821132.

Affected Services

– Microsoft Defender

– Microsoft Defender for Endpoint

– Microsoft Defender Experts

– Microsoft Intune

– Microsoft OneNote

– OneDrive for Business

– SharePoint Online

– Windows 365

– Viva Engage

– Microsoft Purview

– Microsoft Fabric

– Power BI

– Microsoft Teams

– Microsoft 365 admin center

 

Microsoft acknowledged the issue at 7:13 PM EDT, stating they would provide an update within 30 minutes. That was more than an hour after Exoprise CloudReady sensors first detected the outage, around 6:00 PM EDT. With that early warning, customers were able to communicate the outage to their users, which prevented a flood of tickets.

MO821132 Outage Detected by Teams Sensor
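The gap between sensor detection and Microsoft's acknowledgement can be worked out directly from the two timestamps. The snippet below is a minimal sketch: the variable names are illustrative, the fixed UTC-4 offset is an assumption used to represent EDT, and the 6:00 PM detection time is approximate ("around 6:00 PM"), so the result is an estimate.

```python
from datetime import datetime, timedelta, timezone

# Assumption: EDT modeled as a fixed UTC-4 offset for this one-day comparison.
EDT = timezone(timedelta(hours=-4), "EDT")

# Approximate time the sensors first reported errors ("around 6:00 PM").
first_sensor_errors = datetime(2024, 7, 18, 18, 0, tzinfo=EDT)
# Time Microsoft acknowledged the incident (MO821132).
microsoft_ack = datetime(2024, 7, 18, 19, 13, tzinfo=EDT)

lead_time = microsoft_ack - first_sensor_errors
lead_minutes = int(lead_time.total_seconds() // 60)
print(f"Detection lead time: about {lead_minutes} minutes")
```

With these inputs the lead time comes out to roughly 73 minutes, consistent with "over an hour".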

Microsoft Ticket MO821132

The Microsoft ticket is available here: Users may be unable to access various Microsoft 365 apps and Services

Incident Start Date and Time

Thursday, July 18, 2024, 7:13 PM EDT

Incident End Date and Time

Thursday, July 18, 2024, 9:55 PM EDT

Scope of impact

This issue may have impacted any user attempting to use various Microsoft 365 apps and services.

Exoprise Analysis

Starting around 6:00 PM EDT, CloudReady sensors began reporting errors across multiple Microsoft 365 services. The specific errors varied by service, but the outage was detected on every affected service.

At 7:36 PM EDT, Microsoft provided an update stating they had begun investigating the cause of the issue and were rerouting affected traffic away from the impacted infrastructure. The failover helped stabilize service for users while Microsoft continued to investigate the root cause. Even with the failover, CloudReady sensors continued to report errors into the next day while Microsoft worked on a resolution.

Outage Error Widget

On July 19th, 2024, at 11:29 AM EDT, Microsoft updated MO821132 to state that the issue had been resolved, and also announced the resolution on Twitter.

Root Cause Analysis

During the outage, the CloudReady sensors detected errors for the impacted services. Since multiple services were affected, the errors varied by sensor type. The screenshotted errors, from a Teams sensor, show that the failures were not consistent: each failed run errored at a different step in the sensor's execution.

MO821132 Outage Sensor Errors

This was also the case for all of the affected sensors. Although the errors were spread out over time, the impact on the user experience was noticeable.
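To illustrate what "erroring at different steps" looks like in a multi-step synthetic check, here is a minimal sketch. It is not the CloudReady implementation: the step names (`sign_in`, `load_chat`, `send_message`), the helper functions, and the simulated failures are all hypothetical, chosen only to show how recording the first failing step per run reveals an inconsistent failure pattern.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple

# A step is a (name, action) pair; the action raises an exception on failure.
Step = Tuple[str, Callable[[], None]]


def run_check(steps: Sequence[Step]) -> Optional[str]:
    """Run steps in order and return the name of the first one that fails, or None."""
    for name, action in steps:
        try:
            action()
        except Exception:
            return name
    return None


def tally_failures(runs: Sequence[Sequence[Step]]) -> Counter:
    """Count how often each step was the first to fail across a series of runs."""
    counts: Counter = Counter()
    for steps in runs:
        failed = run_check(steps)
        if failed is not None:
            counts[failed] += 1
    return counts


def ok() -> None:
    """Hypothetical step that succeeds."""


def fail() -> None:
    """Hypothetical step that fails, standing in for a timeout during an outage."""
    raise TimeoutError("simulated timeout")


# Four simulated runs: three fail at different steps, one succeeds.
runs = [
    [("sign_in", ok), ("load_chat", fail), ("send_message", ok)],
    [("sign_in", fail), ("load_chat", ok), ("send_message", ok)],
    [("sign_in", ok), ("load_chat", ok), ("send_message", ok)],
    [("sign_in", ok), ("load_chat", ok), ("send_message", fail)],
]

failure_counts = tally_failures(runs)
print(dict(failure_counts))
```

When failures are spread across steps like this rather than concentrated in one, it points to intermittent backend connectivity rather than a single broken feature, which matches the pattern described above.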

Preliminary Root Cause From Microsoft

Preliminary root cause: A configuration change in a portion of our Azure backend workloads, caused interruption between storage and compute resources which resulted in connectivity failures that affected downstream Microsoft 365 services dependent on these connections.

Jul 19, 2024, 11:29 AM EDT

After an extended period of monitoring, in addition to internal validations and customer confirmations, we have declared the incident resolved. We understand the impact this incident has had on our users and greatly appreciate your organization’s patience and feedback while we worked to resolve this high priority issue in its entirety.

Jul 19, 2024, 10:20 AM EDT

We’ve completed applying our mitigation actions and our telemetry is now indicating all previously impacted Microsoft 365 apps or services are operating normally. We’re now entering a period of monitoring to ensure all previously impacted apps, services, and scenarios have fully recovered.

Jul 19, 2024, 9:48 AM EDT

We’re continuing to resolve the residual impact and we’re monitoring the Microsoft 365 apps and services while they fully recover. Customers should experience incremental recovery as we recover from the remaining impact. This quick update is designed to give the latest information on this issue.

Jul 19, 2024, 8:29 AM EDT

We’re continuing to apply mitigation actions to provide relief from the residual impact affecting the remaining impacted Microsoft 365 apps and services. Our telemetry is indicating that the remaining impacted scenarios are progressing towards a full recovery and we’re closely monitoring to ensure this progress continues.

Jul 19, 2024, 7:52 AM EDT

We’re continuing to apply additional mitigations to fix the residual impact affecting the remaining impacted Microsoft 365 apps and services. Our telemetry indicates that we’re progressing towards full recovery and we’re continuing to monitor. This quick update is designed to give the latest information on this issue.

Jul 19, 2024, 6:24 AM EDT

The underlying cause of the issue has been fixed and several Microsoft 365 apps and services have been restored to full functionality. Residual impact is still affecting some Microsoft 365 apps and services, and Microsoft 365 engineering are continuing to conduct additional mitigation actions to provide relief. We’re continuing to observe an increase in functionality and availability for the remaining impacted scenarios and we’re monitoring this closely to ensure we’re progressing towards full recovery. Microsoft is continuing to treat this event with the highest possible priority.

Jul 19, 2024, 5:37 AM EDT

We’re conducting additional mitigation actions to remediate impact for the remaining affected Microsoft 365 apps and services. Our telemetry indicates functionality and availability are continuing to improve across multiple scenarios, and we’re monitoring this closely to ensure we’re progressing towards full recovery. This quick update is designed to give the latest information on this issue.

Jul 19, 2024, 4:28 AM EDT

We’re continuing to see an improvement in service availability across multiple Microsoft 365 apps and services. We’re closely monitoring our telemetry data to ensure this upward trend continues as our mitigation actions continue to progress.

Jul 18, 2024, 8:21 PM EDT

We remain focused on redirecting the impacted traffic to healthy systems as we investigate the root cause. Your organization may experience relief as our mitigation efforts progress. We understand the impact that this issue may have on your organization and we’re continuing to treat this event with the highest priority.

Jul 18, 2024, 7:58 PM EDT

We’re continuing to work on rerouting the impacted traffic to alternate systems to alleviate impact in a more expedient fashion. In parallel, our investigation into the underlying cause of the problem is ongoing. This quick update is designed to give the latest information on this issue.

Jul 18, 2024, 7:36 PM EDT

We’re rerouting affected traffic out of the impacted infrastructure while we continue to investigate the cause of the issue.

Jul 18, 2024, 7:13 PM EDT

We’re investigating reports of issues in which users are unable to access various Microsoft 365 apps and services. We’re looking into the cause of this incident and the best mitigation pathway forward. We will provide an update within 30 minutes.

 

Simon Dion is a Success Engineer dedicated to making sure new and existing customers are properly monitoring their SaaS applications and user experience, and are getting the most out of the product.
