
Scalable Request Handling: An Odyssey (Part 1)


BuzzFeed’s architecture is fairly flat and boring (which is a good thing): it indicates that we’ve chosen technology, as well as architectural patterns, that are tried and tested.

This post is part one of a three-part series reviewing the section of BuzzFeed’s architecture responsible for handling user requests: what it looked like before, what it looks like now, and why the change matters.

We’ll begin by reviewing how content is requested, the performance implications of our architecture, and how a CDN can help with those issues, before looking at how we evolved our architecture into something more maintainable and accessible to our engineers.

So strap yourself in, grab a ☕ and let’s get started…

One of the primary functions of any website is the ability to serve content based on a given user request. The complexity of request handling varies significantly depending on the type of website users are interacting with and the service architecture surrounding that website.

Let’s begin by inspecting an HTTP request for a BuzzFeed resource and understanding the various headers that are sent back to the client once the request is made.

Inspecting an HTTP request

In order to make an HTTP request, we need a tool that speaks the HTTP protocol. Below is an example using curl to request the BuzzFeed home page:

curl --head https://www.buzzfeed.com/

An HTTP request is made up of a ‘resource’ you wish to reach (a file or a web page, for example) as well as some ‘headers’ that provide additional context for the request being made. When the server handling the request sends back a response, the content is exposed via a ‘body’ field, while a separate set of response ‘headers’ provides additional information to the client that made the request.

If you’re a web developer, then there’s a good chance you’ll find yourself inspecting HTTP headers a lot during your day-to-day work (i.e. HTTP headers are great for debugging).

There are browser-based tools for modifying HTTP request headers (as well as viewing the response headers), but I generally find working with curl in a terminal to be the most efficient approach. For example, if I wanted to view only the response headers that contain cache information, I would execute the following command:

curl --silent --head https://www.buzzfeed.com/ | sort | grep -i 'cache'

This command makes a silent HEAD request (i.e. no progress bar or errors are shown), sorts the output, and filters out every header except those with the word ‘cache’ in either the name or the value.

The output from such a command could look something like the following:

Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
X-Cache-Hits: 1
X-Cache: HIT
X-Served-By: cache-lcy19251-LCY

Below is a snippet of the HTTP request headers we use when making a request for the BuzzFeed home page via curl (these are generated by the curl tool after it has parsed our given command):

GET / HTTP/1.1
Host: www.buzzfeed.com
Accept: */*
User-Agent: curl/7.51.0

First we state that, for this request, we wish to GET a resource located at the path / using a specific version of the HTTP protocol (version 1.1). We also specify that the Host server we wish to connect with should be www.buzzfeed.com.

We then state that we’ll Accept any type of response, and that it doesn’t matter what that response is (it could be plain text or HTML, or anything for that matter). Finally, we add some additional information about ourselves (User-Agent) and in this case we are indicating to the server handling this request that we are using the curl tool.
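If we want to override those defaults, curl lets us set or replace request headers directly. For example (a purely illustrative command, the header values here are arbitrary), we could announce a different Accept preference and User-Agent like so:

curl --head -H 'Accept: text/html' -A 'my-debug-client/1.0' https://www.buzzfeed.com/

This is handy when debugging how a server (or a cache) varies its response based on the headers it receives.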

A snippet of the HTTP response headers, from a BuzzFeed server, might look something like:

HTTP/1.1 200 OK
Age: 55
Cache-Control: no-cache, no-store, must-revalidate
Connection: keep-alive
Content-Length: 305559
Content-Type: text/html; charset=utf-8
Pragma: no-cache

There’s a lot more information contained in the real response headers, but I’ve omitted it for brevity. One important thing to note in this snippet is that the first line tells us the server we communicated with was also using HTTP version 1.1, and that the ‘status’ of our request was a successful one (i.e. 200 OK).

The purpose of these response headers is to help clients (whether curl, a web browser, or something else) know how to handle not only this request, but any future requests for the same resource. For example, if the request was made by a web browser, it could inspect the Cache-Control header to determine whether it is allowed to cache a copy of the response for a set period of time, thus reducing the number of requests that need to be made to the origin server. For a full list of the possible HTTP response headers, see this reference.
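For illustration only (these are not BuzzFeed’s actual headers), a response that a browser is allowed to keep for five minutes might include a header like this:

HTTP/1.1 200 OK
Cache-Control: public, max-age=300
Content-Type: text/html; charset=utf-8

The max-age value is expressed in seconds, so the browser can serve its local copy for up to 300 seconds before re-validating with the origin.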

Interestingly, you’ll notice from the earlier response headers that both the Cache-Control and Pragma headers indicated that the origin server does not want its content to be cached by the client (the values assigned to those headers were: no-cache, no-store, must-revalidate and no-cache). Yet we can still see some ‘custom’ response headers (such as X-Cache) which suggest the content is indeed being cached somewhere. We’ll revisit why this is later on. But for now, let’s move on to considering performance.

Latency performance concerns

BuzzFeed is a global brand, and our users are distributed across the world. When a request is made for our content, we need to ensure that the content is delivered in a timely manner.

Our content is currently served out of the US east region, so what happens if a request is made by a user based in the UK? In this scenario a concern would be the possible latency penalty our users in the UK would have to pay in order to request content from our servers.

To resolve this particular performance concern BuzzFeed utilises a CDN (Content Delivery Network) to help us geographically distribute our content to servers located at key regions across the globe.

When a user makes a request for BuzzFeed content the request is first routed to a POP (Point of Presence) nearest to their location and then within that POP they’ll be directed to a specific server instance (known as an edge node). The CDN is responsible for the replication of our cached content at these POPs.
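You can actually see which POP handled a given request by looking at the X-Served-By header from the earlier output. Fastly’s POP identifiers are based on nearby airport codes (LCY being London City), so a value like the one below indicates the request was served from a London POP:

curl --silent --head https://www.buzzfeed.com/ | grep -i 'x-served-by'
X-Served-By: cache-lcy19251-LCY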

Overloading specific service layers

BuzzFeed has historically relied very heavily upon one particular layer of our infrastructure, our CDN, to handle the majority of our routing requirements.

Our particular CDN provider (Fastly) gives us the ability to programmatically define behavioural logic ‘at the edge’ using a C-like domain-specific language called VCL. This language allows us to inspect and manipulate incoming requests, and to control how they are handled and where they should be directed (to learn more about Fastly’s implementation, read this).

Fastly helps to protect us from heavy traffic patterns and problematic scenarios such as DDoS attacks. Fastly also allows us to easily keep our content fresh by providing an API that lets us purge stale data across all our edge nodes within milliseconds (either by specifying a ‘range’ of items to purge, or by providing a specific identifier ‘key’ which can purge one or more items from our cache).
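To give a rough idea of what that looks like in practice (the service ID and key below are placeholders, not our real configuration), a single URL can be purged by sending an HTTP PURGE request to it, while key-based purging goes through Fastly’s API:

# Purge a single URL from the edge caches
# (a service can be configured to require authentication for this)
curl -X PURGE https://www.buzzfeed.com/some/path

# Purge every object tagged with a given surrogate key
curl -X POST "https://api.fastly.com/service/<service_id>/purge/<surrogate-key>" \
  -H "Fastly-Key: $FASTLY_API_TOKEN"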

Clem H., Director of Engineering, has talked previously about moving to Fastly and how it has improved not only BuzzFeed’s ability to better understand and monitor its cache, but also our ability to improve client latency, cache hit ratio and our ability to scale to meet the growing demands on BuzzFeed’s infrastructure.

Earlier, when looking at the response headers for the BuzzFeed home page, we noticed they suggested the content should not be cached, and yet it was being cached anyway. This makes sense now that we understand the caching is being handled by our CDN rather than the client. By way of a special header our services provide to Fastly, we can indicate exactly how we want our content to be cached by Fastly, as well as when/where it expires or is purged.
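One such mechanism is Fastly’s Surrogate-Control header: the CDN obeys it while browsers obey Cache-Control, so a response can tell the edge to cache content for a day while telling clients not to cache it at all. A hypothetical combination of headers (illustrative values only) might look like:

HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, must-revalidate
Surrogate-Control: max-age=86400
Surrogate-Key: homepage

The Surrogate-Key header shown here is what enables the key-based purging mentioned above: purging the ‘homepage’ key would invalidate every cached object tagged with it.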

This is one of the extra benefits of utilising a CDN such as Fastly. Otherwise, we’d be at the mercy of clients (i.e. web browsers and downstream proxy servers) caching our content, and if we needed to purge that cache in a timely fashion (e.g. for a breaking news story that needs to be seen immediately), we could have a much harder time doing so.

Architecture considerations

So up until mid-2016, what was BuzzFeed’s service architecture? Well, at a very high level, our original architecture looked something like the following (overly simplified) diagram:

Placing a CDN in front of your “origin” servers is fairly standard practice nowadays and can help not only with distributing your content, but also preventing unwanted or dangerous traffic…

CDN: “Don’t worry BuzzFeed, we got your back.”

In BuzzFeed’s case, our origin servers are individual microservices for handling specific parts of our website. For example, our home page is served by one microservice while our article pages are served by a separate microservice, and so on for the various types of content that we provide.

At this point in time, we had bottlenecked a core part of our logic (the CDN logic responsible for interrogating incoming requests and providing an appropriate response) into a single layer. In doing so we had failed to separate specific behaviours, which would end up restricting our ability to scale, as well as our ability to provide consistency and security around future changes to that logic.

In order to improve on what we had, we decided to extract this problematic aspect of our system into a new service layer. We’ll cover the details of why that decision was made, but first we need to understand the thought process involved in creating a new service.

Considerations for a new routing service

Creating a new service isn’t cheap (both in the sense of cost and overhead), so we need to be sure that doing so is the right decision. When deciding upon whether to define a new service, there are a few fundamental questions that need to be answered, such as:

  • What should it do?
  • Why does it need to exist?
  • How will it help us?
  • How will it evolve?

Let’s consider each of these questions in turn, with regards to what ended up becoming BuzzFeed’s “Site Router” service…

What should it do?

Remember, the architecture was originally designed to proxy requests between multiple possible origins like so:

The concept behind the new “Site Router” service was simple, but both effective and essential in supporting our ability to scale:

The service should allow engineers (of all skill levels) to be able to easily and simply create new routing logic, which directs incoming traffic to the relevant origin servers responsible for handling those requests.

It should be a highly scalable solution, as well as being feature rich and secure. As far as being “highly scalable” goes, we want to be sure that if there are issues with our CDN provider (e.g. we lose our cache), then the Site Router is able to handle many thousands of requests per second.

To be “feature rich” means we intend for engineers to have a lot of control over the routing behaviours (these features will be covered in the next installment of this series).

Finally, to be “secure” means we’re able to properly test the routing logic engineers are implementing, in order to ensure we’re not breaking user or system expectations (and moving the routing logic out of VCL means there’s less opportunity for engineers to break our caching behaviours, which are configured at the CDN layer).

Why does it need to exist?

BuzzFeed’s technical architecture is designed and built to support our users at scale.

As with most large scale systems, we lean heavily on our ability to cache content that has been sourced from our origin servers, and to serve stale content in those rare times when a part of our system misbehaves or is unhealthy.

We utilise many different tools, services and applications in order to support our scalability requirements, and one of the key components of the overall system is the CDN layer (which acts as our front door).

So with all this good stuff happening at the CDN layer, what was the problem?

The issue was twofold:

  1. We wanted greater security (I’ll explain more about what this means in the next section)
  2. We wanted to extract the complicated VCL logic from the CDN layer.

The primary reason for wanting to extract the routing code from the CDN was that it was difficult to manage and lacked test coverage around specific logic (the logic itself was hard to validate). Ultimately this meant we had a hard time preventing engineers from making unsafe changes that could negatively affect our caching strategies and/or cause incoming requests to be routed to the wrong origin server.

How will it help us?

We have a lot of VCL logic within our CDN, and if our CDN provider (Fastly) had been using a standard version of the Varnish HTTP accelerator, we could have solved part of the problem (i.e. testing the service and its various behaviours more concretely) by setting up a local instance of Varnish and running our VCL logic through it.

As that wasn’t possible, we instead extracted the routing logic from the VCL and provided a clear and clean configuration model in its place. This helps engineers focus on making their services available quickly, without having to worry about performance best practices or about breaking our existing caching strategies; the configuration model abstracts those issues away.

We would provide greater confidence in the service by defining abstractions that allowed engineers to write simple tests (at the unit, integration and smoke test level) to verify the routing logic is valid and functioning as expected.
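As a trivial illustration of the smoke-test end of that spectrum (not Site Router’s actual test suite), even a curl one-liner can assert that a route responds with the status code we expect:

test "$(curl --silent --output /dev/null --write-out '%{http_code}' https://www.buzzfeed.com/)" = "200" \
  && echo "route OK"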

From the point of view of security: we would be able to ensure all our microservices were private and sitting behind internal load balancers within a protected VPC network, instead of having to be public services just so Fastly could direct traffic to them.

The Site Router would sit inside its own cluster within the same internal network so it could freely communicate with the other containerised services that we have. We would configure Site Router to accept traffic only from Fastly, thus reducing our attack surface.
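One common way to enforce that kind of restriction (not necessarily the exact mechanism we use) is to allow inbound traffic only from Fastly’s published IP ranges, which Fastly exposes via a public endpoint that can feed a firewall or load balancer allow-list:

# Fetch Fastly's current public IP ranges (returned as JSON)
curl --silent https://api.fastly.com/public-ip-list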

By extracting routing this way, we could incorporate custom behaviours and firewalls, or even hook in other third-party SaaS providers.

Ultimately what we ended up with was a high-level architecture looking a little bit like this:

How will it evolve?

The evolution of the service would (and did) happen organically. Our intention was to look into open-sourcing the Site Router code base so that it could easily be dropped into an existing system architecture (although this would likely require some modifications to decouple it from the BuzzFeed platform, it is definitely a possibility).

As for our internal consumers (i.e. other BuzzFeed engineers), we’ve seen incredible uptake of the Site Router service. We’ve gathered and acted on their feedback and implemented many new features beyond what we had originally envisioned. It has been wonderful reaching out and helping multiple teams across the tech organisation realise the power and flexibility of the Site Router, and we expect this to continue.

Conclusion

That’s it for part one of this three part series. To recap, we looked at:

  • What it means to make an HTTP request.
  • How we handle the performance concerns (i.e. latency) inherent in serving requests from users distributed across the globe (BuzzFeed being a global brand).
  • The what, why and architectural design of our routing service layer.
  • Probably most importantly: the thought process and considerations that go into deciding whether a new service is even the right solution.

In part two of this series we’ll look in more detail at the design of the Site Router service layer, including its configuration interface, as well as examples of some of the key features.

Come join in!

By the way, if any of this sounds interesting to you (the challenges, the problem solving, and the tech), then do please get in touch with us. We’re always on the lookout for talented and diverse people, and we’re pro-actively expanding our teams across all locations around the globe.

We’d love to meet you! ❤️

To keep in touch with us here and find out what’s going on at BuzzFeed Tech, be sure to follow us on Twitter @BuzzFeedExp where a member of our Tech team takes over the handle for a week!
