Earlier today, Microsoft Teams suffered a global outage that caused multiple issues for users. Users attempting to log in were presented with an “oops” page, while users who were already logged in were missing messages, had trouble loading messages in channels and chats, and could not view or download media (images, video, audio, etc.).

Exoprise CloudReady sensors (proactive monitoring) first detected the outage in North America shortly before 11 AM EST. Customers were notified by proactive alerts and were able to communicate the degradation to their users, preventing an influx of tickets. Many customers switched to alternative means of communication, such as Zoom or Slack, to stay productive.

Microsoft Ticket TM710344

The Microsoft ticket can be found here: TM710344, “Some users may experience multiple issues with their Microsoft Teams”.

Live Heatmap View, Worldwide Teams Outage, Sample Exoprise Customer Screen

Latest Updates

The post-incident report has finally been published. It was unavailable for some time and became available at approximately 4:30 PM EST.

Incident Start Date and Time

Friday, January 26, 2024, at 2:55 PM UTC

Incident End Date and Time

Saturday, January 27, 2024, at 1:30 AM UTC
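
The report gives the incident window in UTC, while the rest of this post uses EST. A minimal Python sketch of the conversion, assuming a fixed UTC-5 offset for EST (daylight saving time is not in effect in late January):

```python
from datetime import datetime, timedelta, timezone

# EST is UTC-5; a fixed offset is sufficient for a late-January window.
EST = timezone(timedelta(hours=-5), "EST")

# Incident window as stated in Microsoft's post-incident report (UTC).
start_utc = datetime(2024, 1, 26, 14, 55, tzinfo=timezone.utc)
end_utc = datetime(2024, 1, 27, 1, 30, tzinfo=timezone.utc)

# Convert both endpoints to EST and compute the total duration.
start_est = start_utc.astimezone(EST)
end_est = end_utc.astimezone(EST)
duration = end_utc - start_utc

print(start_est.strftime("%A, %B %d, %Y at %I:%M %p %Z"))
print(end_est.strftime("%A, %B %d, %Y at %I:%M %p %Z"))
print(duration)
```

Converted, the reported window runs from 9:55 AM to 8:30 PM EST on Friday, January 26, a span of 10 hours 35 minutes.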

Root Cause

“Starting on January 24th, a regularly scheduled update was deployed into the Europe, Middle East, and Africa (EMEA) region without issue. As the update had been validated via safe change and deployment was successful in EMEA, the change was propagated to the Americas and APAC.

The rotation of capacity for the update in the Americas and APAC, combined with the introduction of dynamic scale capacity in EMEA to handle peak traffic, introduced a previously unseen issue where the requests of Teams to the Azure Key Vault service were throttled. The throttling was not the result of insufficient capacity of Azure Key Vault, but rather that the Teams service was calling a single Azure Key Vault instance supported by multiple servers, which had a throttle limit implemented at the resource level.

As more capacity was attempted to be brought into rotation for America’s peak traffic, further load was placed on Azure Key Vault resulting in the Teams service attempting recovery through redistribution of traffic. This further exacerbated the backlog of requests and introduced a secondary throttle from the Entra directory service, as the components were attempting to authenticate. Like Azure Key Vault, Entra has significant capacity but has throttling limits based on service identity making the authentication requests (in this case the Teams service).

This condition resulted in two issues for users: (1) server capacity was not able to be brought into service, degrading the chat service due to insufficient capacity and (2) users attempting to log into Teams were not able to do so.”
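
Both throttles in the report were scoped to a shared key (a single Azure Key Vault resource, and the Teams service identity in Entra) rather than to raw capacity. The minimal Python sketch below models that with a hypothetical fixed-window limiter; the key name, limits, and retry behavior are illustrative assumptions, not how Azure Key Vault or Entra actually implement throttling. It shows that once demand from instances sharing one key exceeds the per-key limit, adding instances does not raise admitted throughput, and re-queuing throttled requests makes the backlog grow every window.

```python
from collections import defaultdict

# Hypothetical key shared by every caller (one resource / one service identity).
SHARED_KEY = "teams-service"


class FixedWindowLimiter:
    """Hypothetical limiter: at most `limit` requests per key per window."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, key: str, window: int) -> bool:
        slot = (key, window)
        if self.counts[slot] >= self.limit:
            return False  # throttled: the quota is per key, not per server
        self.counts[slot] += 1
        return True


def simulate(instances: int, demand_per_instance: int, limit: int, windows: int):
    """Return (admitted, backlog) per window when throttled calls are retried."""
    limiter = FixedWindowLimiter(limit)
    backlog = 0
    history = []
    for window in range(windows):
        # New requests from every instance plus everything throttled earlier.
        pending = instances * demand_per_instance + backlog
        admitted = sum(
            1 for _ in range(pending) if limiter.allow(SHARED_KEY, window)
        )
        backlog = pending - admitted
        history.append((admitted, backlog))
    return history


for instances in (2, 4):
    print(instances, "instances:", simulate(instances, 40, 100, 3))
```

With a limit of 100 per window, two instances sending 40 requests each stay under the quota with no backlog, while four instances are capped at 100 admitted requests and the backlog grows by 60 per window.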

Teams TM710344 Outage, Availability Report

Exoprise Analysis

Starting around 10:30 AM EST, Exoprise sensors began reporting errors signing in to Teams and retrieving messages and media content. Exoprise runs a series of sensors for its own tenant across a global infrastructure of private sites and public message sensors; because we have employees working in multiple metro regions, we want full coverage. Here’s an example of message sensor output over Friday, January 26th:

Teams Errors TM710344

Despite Microsoft attempting recovery as early as 1 PM EST, outages, slow access, and a poor Teams digital experience persisted throughout the day. Poor performance and sporadic outages continued into the following day: Exoprise saw errors throughout Saturday, January 27th, while Microsoft continued to recover its infrastructure.

Root Cause Analysis

Early in the outage, errors from Exoprise sensors coalesced around a common message from the Teams infrastructure:

errorCode=appLoadOrchestrationPlan_StartConversationStoreOrchestrationStep&errorMessage=App%20loading%20failed:%20app%20appLoadOrchestrationPlan_StartConversationStoreOrchestrationStep:%20StartConversationStoreOrchestrationStep-WebClientDataLayerClientHandler
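
The error is reported as a URL-encoded query string. A minimal Python sketch that decodes it with the standard library's `urllib.parse.parse_qs` makes the two fields readable; the field meanings are inferred from their names, not from any Microsoft documentation.

```python
from urllib.parse import parse_qs

# Raw error string reported by Exoprise sensors, split only for readability.
raw = (
    "errorCode=appLoadOrchestrationPlan_StartConversationStoreOrchestrationStep"
    "&errorMessage=App%20loading%20failed:%20app%20"
    "appLoadOrchestrationPlan_StartConversationStoreOrchestrationStep:%20"
    "StartConversationStoreOrchestrationStep-WebClientDataLayerClientHandler"
)

# parse_qs splits on '&' and percent-decodes each value (%20 becomes a space).
# It maps each key to a list because query strings may repeat a key.
fields = {key: values[0] for key, values in parse_qs(raw).items()}

for key, value in fields.items():
    print(f"{key}: {value}")
```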

This message essentially translates to a Teams application loading error. Approximately an hour and a half later, Microsoft began to reveal the underlying cause:

Our review of service telemetry indicates a portion of database infrastructure that facilitates multiple APIs is experiencing a networking issue, resulting in impact.

This is reflected in the underlying error messages from the application. Later in the day, Microsoft detailed its struggle to get the application back online:

Our failover operation did not provide the anticipated relief to end users in North and South America regions, and we are now working to optimize traffic patterns as part of the mitigation effort.

Even 19 hours later, Exoprise was still observing outages as Microsoft struggled to route traffic around the issue.

Preliminary Root Cause From Microsoft

Preliminary root cause: While initial indications suggested a networking issue within the Microsoft Teams service infrastructure, we conducted further analysis and identified a combination of factors which lead to the incident. The Post Incident Review will provide additional details and the preliminary PIR will be published by Monday, January 29, 2024 at 9:00 PM EST

January 26, 2024 at 5:58 PM EST

Title: Some users may experience multiple issues with their Microsoft Teams

User Impact: Users may experience multiple issues with their Microsoft Teams.

More Info: Affected scenarios include, but aren’t limited to:

– Users performing a cold boot may not able to log into teams and will see an “oops” page

– Users logging in or unlocking their devices after some time may see missing messages

– Users may fail to load messages in channels and chats

– Users are unable to view or download their media (images, videos, audio, call recordings, code snippets)

– Some messages may experience delays being sent

– Call Recordings might take longer to appear in user’s OneDrive for Business and SharePoint Online

– Users may be unable to load previous Copilot history, or new history is not written

– Bots may be unable to download attachments

– Sending and receiving read receipt notifications may be delayed

– Anonymous users may be unable to join meetings

– Teams connectors for Power Automate/Power Apps may experiencing errors

Current Status: We’re continuing to apply mitigations across the affected infrastructure and our telemetry is showing additional improvement in the user experience, though many customers are still affected by this issue. We’re also working to apply fixes to address individual affected Teams features in parallel while our broader remediation strategy is ongoing. We’re evaluating any and all additional workstreams that will allow us to reduce the impact to those customers that are still affected. We understand the impact an issue like this can have on your organization, and we appreciate your partnership and patience as we work to remediate this issue.

Scope of impact: This issue can potentially impact any Microsoft Teams user in the scenarios outlined in the More info section.

January 26, 2024 at 5:02 PM EST

Title: Some users may experience multiple issues with their Microsoft Teams

User Impact: Users may experience multiple issues with their Microsoft Teams.

More Info: Affected scenarios include, but aren’t limited to:

– Users performing a cold boot may not able to log into teams and will see an “oops” page

– Users logging in or unlocking their devices after some time may see missing messages

– Users may fail to load messages in channels and chats

– Users are unable to view or download their media (images, videos, audio, call recordings, code snippets)

– Some messages may experience delays being sent

– Call Recordings might take longer to appear in user’s OneDrive for Business and SharePoint Online

– Users may be unable to load previous Copilot history, or new history is not written

– Bots may be unable to download attachments

– Sending and receiving read receipt notifications may be delayed

– Anonymous users may be unable to join meetings

– Teams connectors for Power Automate/Power Apps may experiencing errors

Current Status: Our network and backend service optimization efforts are ongoing and we’re monitoring internal telemetry to confirm that our efforts are effectively reducing the impact to customers. We’re continuing to scale up our services and identify new workstreams intended to remediate this problem more rapidly. Although many customers remain impacted by this issue, we’re seeing a reduction in errors and an increase in availability. We understand the impact an issue like this can have on your organization, and we appreciate your partnership and patience as we work to remediate this issue.

Scope of impact: This issue can potentially impact any Microsoft Teams user in the scenarios outlined in the More info section.

January 26, 2024 at 4:00 PM EST

Title: Some users may experience multiple issues with their Microsoft Teams

User Impact: Users may experience multiple issues with their Microsoft Teams.

More Info: Affected scenarios include, but aren’t limited to:

– Users performing a cold boot may not able to log into teams and will see an “oops” page

– Users logging in or unlocking their devices after some time may see missing messages

– Users may fail to load messages in channels and chats

– Users are unable to view or download their media (images, videos, audio, call recordings, code snippets)

– Some messages may experience delays being sent

– Call Recordings might take longer to appear in user’s OneDrive for Business and SharePoint Online

– Users may be unable to load previous Copilot history, or new history is not written

– Bots may be unable to download attachments

– Sending and receiving read receipt notifications may be delayed

– Anonymous users may be unable to join meetings

– Teams connectors for Power Automate/Power Apps may experiencing errors

Current Status: Our failover operation did not provide the anticipated relief to end users in North and South America regions, and we are now working to optimize traffic patterns as part of the mitigation effort. We’re applying configuration changes across the affected network infrastructure to reduce customer impact as quickly as possible.

January 26, 2024 at 3:02 PM EST

Title: Some users may experience multiple issues with their Microsoft Teams

User Impact: Users may experience multiple issues with their Microsoft Teams.

More Info: Affected scenarios include, but aren’t limited to:

– Users performing a cold boot may not able to log into teams and will see an “oops” page

– Users logging in or unlocking their devices after some time may see missing messages

– Users may fail to load messages in channels and chats

– Users are unable to view or download their media (images, videos, audio, call recordings, code snippets)

– Some messages may experience delays being sent

– Call Recordings might take longer to appear in user’s OneDrive for Business and SharePoint Online

– Users may be unable to load previous Copilot history, or new history is not written

– Bots may be unable to download attachments

– Sending and receiving read receipt notifications may be delayed

– Anonymous users may be unable to join meetings

– Teams connectors for Power Automate/Power Apps may experiencing errors

Current Status: Our failover operation did not provide the anticipated relief to end users in North and South America regions, and we are now working to optimize traffic patterns as part of the mitigation effort. We’re applying configuration changes across the affected network infrastructure to reduce customer impact as quickly as possible.

Live Heatmap View, Worldwide Teams Outage

January 26, 2024 at 1:16 PM EST

Title: Some users may experience multiple issues with their Microsoft Teams

User Impact: Users may experience multiple issues with their Microsoft Teams.

More Info: Affected scenarios include, but aren’t limited to:

– Users performing a cold boot may not able to log into teams and will see an “oops” page

– Users logging in or unlocking their devices after some time may see missing messages

– Users may fail to load messages in channels and chats

– Users are unable to view or download their media (images, videos, audio, call recordings, code snippets)

– Some messages may experience delays being sent

– Call Recordings might take longer to appear in user’s OneDrive for Business and SharePoint Online

– Users may be unable to load previous Copilot history, or new history is not written

– Bots may be unable to download attachments

– Sending and receiving read receipt notifications may be delayed

– Anonymous users may be unable to join meetings

– Teams connectors for Power Automate/Power Apps may experiencing errors

Current Status: Our failover operation did not provide the anticipated relief to end users in North and South America regions, and we are now working to optimize traffic patterns as part of the mitigation effort. We’re applying configuration changes across the affected network infrastructure to reduce customer impact as quickly as possible.

January 26, 2024 at 12:57 PM EST – Quick Update

As part of the failover process, we’re currently scaling up the alternate infrastructure so that all the traffic can be distributed to the alternate infrastructure.

This quick update is designed to give the latest information on this issue.

January 26, 2024 at 12:13 PM EST

Title: Some users may experience multiple issues with their Microsoft Teams

User Impact: Users may experience multiple issues with their Microsoft Teams.

More Info: Affected scenarios include, but aren’t limited to:

– Users performing a cold boot may not able to log into teams and will see an “oops” page

– Users logging in or unlocking their devices after some time may see missing messages

– Users may fail to load messages in channels and chats

– Users are unable to view or download their media (images, videos, audio, call recordings, code snippets)

– Some messages may experience delays being sent

– Call Recordings might take longer to appear in user’s OneDrive for Business and SharePoint Online

– Users may be unable to load previous Copilot history, or new history is not written

– Bots may be unable to download attachments

– Sending and receiving read receipt notifications may be delayed

– Anonymous users may be unable to join meetings

– Teams connectors for Power Automate/Power Apps may experiencing errors

Current Status: Our failover operation did not provide the anticipated relief to end users in North and South America regions, and we are now working to optimize traffic patterns as part of the mitigation effort. We’re applying configuration changes across the affected network infrastructure to reduce customer impact as quickly as possible.

January 26, 2024 at 11:55 AM EST – Quick Update

Whilst we’re investigating the underlying cause of the issue, we’re also reviewing the related monitoring alerts to determine all the different impacted features, and will update the ‘More info’ section to reflect these.

This quick update is designed to give the latest information on this issue.

January 26, 2024 at 11:28 AM EST

Title: Some users may experience multiple issues with their Microsoft Teams

User Impact: Users may experience multiple issues with their Microsoft Teams.

More Info: Affected scenarios include, but aren’t limited to:

– Users performing a cold boot may not able to log into teams and will see a “oops” page.

– Users logging in or unlocking their devices after some time may see missing messages.

Current Status: Our review of service telemetry indicates a portion of database infrastructure that facilitates multiple APIs is experiencing a networking issue, resulting in impact. We’re continuing our investigation to isolate the underlying cause of the networking issue and develop remediation actions.

January 26, 2024 at 10:42 AM EST

Title: Some users may experience multiple issues with their Microsoft Teams

User Impact: Users may experience multiple issues with their Microsoft Teams.

More Info: Affected scenarios include, but aren’t limited to:

– Users performing a cold boot may not able to log into teams and will see a “oops” page.

– Users logging in or unlocking their devices after some time may see missing messages.

Current Status: Our review of service telemetry indicates a portion of database infrastructure that facilitates multiple APIs is experiencing a networking issue, resulting in impact. We’re continuing our investigation to isolate the underlying cause of the networking issue and develop remediation actions.

Team Exoprise represents multiple people in the engineering, sales, and marketing departments here at Exoprise. It takes a village.