This Week In Security: Cloudflare Wasn’t DNS, BADAUDIO, And Not A Vuln

You may have noticed that large pieces of the Internet were down on Tuesday. It was a problem at Cloudflare, and for once, it wasn’t DNS. This time it was database management, combined with a safety limit that failed unsafe when exceeded.

Cloudflare’s blog post on the matter has the gritty details. It started with an update to how Cloudflare’s ClickHouse distributed database was responding to queries. A query of system columns was previously only returning data from the default database. As a part of related work, that system was changed so that this query now returned all the databases the given user had access to. In retrospect it seems obvious that this could cause problems, but it wasn’t predicted to cause problems. The result was that a database query to look up bot-management features returned the same features multiple times.

That featurelist is used to feed the Cloudflare bot classification system. That system uses some AI smarts, and runs in the core proxy system. There are actually two versions of the core proxy, and they behaved a bit differently when the featurelist exceeded the 200 item limit. When the older version failed, it classified all traffic as a bot. The real trouble was the newer Rust code. That version of the core proxy threw an error in response, leading to 5XX HTTP errors, and the Internet-wide fallout. Continue reading “This Week In Security: Cloudflare Wasn’t DNS, BADAUDIO, And Not A Vuln”