The Cloudflare Outage: A Security Learning Opportunity
An intermittent outage at Cloudflare on Tuesday briefly knocked many of the Internet's top destinations offline. Some affected Cloudflare customers were able to pivot away from the platform temporarily so that visitors could still access their websites. But security experts say doing so may have also triggered an impromptu network penetration test for organizations that have come to rely on Cloudflare to block many types of abusive and malicious traffic.
Outage Details
The outage began around 6:30 a.m. EST (11:30 UTC) on Nov. 18 and affected a significant number of high-profile websites. Cloudflare's status page acknowledged an 'internal service degradation' during this time, and services were eventually restored.
Security Implications
Aaron Turner, a faculty member at IANS Research, suggests that the outage gives Cloudflare customers a reason to review their web application firewall (WAF) logs. He notes that Cloudflare's WAF blocks many types of attacks, so the period when it was not in the traffic path may expose vulnerabilities it had been masking.
Identifying Security Gaps
Turner explains that some companies may have been overly reliant on Cloudflare for security features such as SQL injection protection and bot blocking. For customers that routed around Cloudflare during the outage, these protections were bypassed, which may have exposed weaknesses in their own internal security practices.
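One way to act on that advice is to pull the origin server's own access logs for the period when traffic bypassed Cloudflare and look for requests the WAF would normally have stopped. The following is a minimal sketch, not a real detection rule set. The log format, the `suspicious_requests` helper, the sample entries, the year, and the window end time are all assumptions made for illustration; only the 11:30 UTC start time comes from the reporting above.

```python
import re
from datetime import datetime, timezone
from urllib.parse import unquote_plus

# Start time is taken from the article. The year and the end time are
# placeholder assumptions: replace them with the window in which your
# own traffic actually bypassed the WAF.
WINDOW_START = datetime(2025, 11, 18, 11, 30, tzinfo=timezone.utc)
WINDOW_END = datetime(2025, 11, 18, 17, 30, tzinfo=timezone.utc)

# Deliberately crude signatures for illustration only.
SQLI_PATTERN = re.compile(
    r"(union\s+select|or\s+1\s*=\s*1|sleep\s*\()", re.IGNORECASE
)


def suspicious_requests(entries, start=WINDOW_START, end=WINDOW_END):
    """Return entries inside [start, end) whose query looks like SQL injection."""
    hits = []
    for entry in entries:
        ts = datetime.fromisoformat(entry["time"])
        query = unquote_plus(entry["query"])  # decode %20, + and similar
        if start <= ts < end and SQLI_PATTERN.search(query):
            hits.append(entry)
    return hits


# Hypothetical log entries; the IPs are from documentation-reserved ranges.
sample_log = [
    {"time": "2025-11-18T11:45:00+00:00", "ip": "203.0.113.7",
     "query": "id=1+UNION+SELECT+password+FROM+users"},
    {"time": "2025-11-18T12:10:00+00:00", "ip": "198.51.100.4",
     "query": "page=2"},
    {"time": "2025-11-18T09:00:00+00:00", "ip": "203.0.113.7",
     "query": "id=1 OR 1=1"},
]

for hit in suspicious_requests(sample_log):
    print(hit["time"], hit["ip"], hit["query"])
```

Anything this kind of filter surfaces is a request that reached the application without Cloudflare in front of it, which makes it a useful starting list for checking whether the application itself handled the input safely.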
Threat Analysis
Nicole Scott, senior product marketing manager at Replica Cyber, refers to the incident as a 'free tabletop exercise.' She emphasizes the importance of examining both external traffic and internal behavior during such events.
Questions for Reflection
- What was turned off or bypassed (WAF, bot protections, geo blocks), and for how long?
- What emergency DNS or routing changes were made, and who approved them?
- Did people shift work to personal devices, home Wi-Fi, or unsanctioned SaaS providers?
- Did anyone stand up new services, tunnels, or vendor accounts 'just for now'?
- Is there a plan to unwind those changes, or are they now permanent workarounds?
- For the next incident, what’s the intentional fallback plan, instead of decentralized improvisation?
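The questions about emergency DNS changes and whether they were ever unwound can be partly answered by comparing DNS record snapshots taken before and after the incident. The sketch below assumes such snapshots already exist as simple dictionaries; the `diff_records` helper, the record names, and the values are hypothetical and stand in for whatever export a DNS provider offers.

```python
def diff_records(before, after):
    """Compare two snapshots shaped as {(name, type): set_of_values}.

    Returns only the keys whose values differ, with the values that
    were removed and added for each.
    """
    changes = {}
    for key in sorted(before.keys() | after.keys()):
        old = before.get(key, set())
        new = after.get(key, set())
        if old != new:
            changes[key] = {"removed": old - new, "added": new - old}
    return changes


# Hypothetical snapshots: the site was repointed directly at an origin IP
# and a temporary tunnel record was added during the incident.
before_incident = {
    ("www.example.com", "CNAME"): {"proxy.cdn-vendor.example"},
    ("example.com", "MX"): {"mail.example.com"},
}
after_incident = {
    ("www.example.com", "A"): {"192.0.2.10"},
    ("example.com", "MX"): {"mail.example.com"},
    ("tmp-tunnel.example.com", "CNAME"): {"tunnel.vendor.example"},
}

for (name, rtype), change in diff_records(before_incident, after_incident).items():
    print(name, rtype,
          "removed:", sorted(change["removed"]),
          "added:", sorted(change["added"]))
```

Each entry in the result is a change that someone should be able to explain, and any record that still differs after the incident is a candidate for the 'permanent workaround' the last questions warn about.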
Postmortem Analysis
In a postmortem published Tuesday evening, Cloudflare CEO Matthew Prince explained that the outage was caused by an unintended change to the permissions of one of the company's database systems. As a result, a feature file grew larger than expected and was propagated across Cloudflare's network.
Impact and Lessons Learned
Roughly 20 percent of websites use Cloudflare's services, so the outage highlights the risk of relying on a single cloud provider. Security experts advise organizations to diversify their infrastructure by spreading WAF and DDoS protection across multiple zones and using multi-vendor DNS.
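As a rough illustration of the multi-vendor DNS advice, the sketch below checks whether a domain's nameservers appear to belong to more than one provider. It works on a list of nameserver hostnames that has already been looked up. The `ns_providers` and `is_multi_vendor` helpers and the hostnames are hypothetical, and grouping by the last two labels of each hostname is a simplifying assumption that breaks for suffixes such as `co.uk`.

```python
def ns_providers(nameservers):
    """Group nameserver hostnames by a naive provider key.

    The key is the last two DNS labels (for example "vendor-a.example"),
    which is only an approximation of the registered domain.
    """
    providers = set()
    for ns in nameservers:
        labels = ns.rstrip(".").lower().split(".")
        providers.add(".".join(labels[-2:]))
    return providers


def is_multi_vendor(nameservers):
    """True if the nameservers map to at least two distinct provider keys."""
    return len(ns_providers(nameservers)) >= 2


# Hypothetical nameserver sets for two domains.
single_vendor = ["ns1.vendor-a.example.", "ns2.vendor-a.example."]
split_vendor = ["ns1.vendor-a.example.", "ns1.vendor-b.example."]

print(is_multi_vendor(single_vendor))
print(is_multi_vendor(split_vendor))
```

A check like this only confirms that delegation is split across providers. It says nothing about whether both providers serve identical, up-to-date records, which would need to be verified separately.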