Architecting Trust in Client-Side Analytics, Part One
How BuzzFeed designed a system that expects and handles the introduction of new requirements and the potential for failure.
Last year many of us here at BuzzFeed spent a lot of time and effort overhauling just about every aspect of the analytics tracking in our flagship iOS app. Getting this stuff right is deceptively difficult and it’s something we were able to do only by first getting it wrong.
What follows is my attempt to summarize the central challenges of implementing analytics tracking on a mobile client and how we overcame them. I’ll focus on some general themes that are relevant to any platform, while diving into more technical details about our iOS/Swift-based solution. Part 2 will get into how we unit test this system and show some examples of it in action.

Getting to know the problem
Analytics data about how our apps are being used is extremely important to us at BuzzFeed. We use this data to monitor the health of our owned and operated platforms, to measure the success of new features and content, and to power a rapid learning cycle. On the client side, we are responsible for the integrity of this data from the time a user generates an event until we successfully send that data to our data providers, such as Google Analytics (GA) and Quantcast. To make the problem more complex, we expect both these data providers and the events we’re monitoring to change over time.
Because this data is so important and the demands on the client are so dynamic over time, we’ve been thinking a lot about the trust we have in our data. BuzzFeed’s mobile app users are typically our most engaged users, so any flaws in the analytics tracking logic will have an outsized impact. If one of our apps ships a bug that affects the analytics data we are sending, we will lose trust in all of the data. It can be very difficult, sometimes even impossible, to recover lost data on the backend. So while it is important from a product perspective to determine what to track, it’s up to us as developers to get the how right. If we are to trust it, our analytics system needs to be extremely safe and robust.
The hidden nature of analytics can make spotting these bugs much more difficult than spotting the kinds of user-facing issues your engineers and QA team might be used to looking for. That is why it is important that in addition to unit testing everything, you also provide the tools your team needs to help find the inevitable bugs.
Architecting trust
In building a trustworthy client-side analytics system, we’ll have three central concepts, or models: the observer, the receipt, and the event.
Separating providers
The observer is essential for decoupling the generic events that the app dispatches from the provider-specific details derived from those events. Keeping provider-specific logic in dedicated observers helps us write SOLID code. In the next post I’ll discuss how we take advantage of this to write less fragile and DRYer tests.
Each provider we want to send analytics data to will have its own observer responsible for listening for events, sending data off to the appropriate provider, and sending back receipts. Essentially, it transforms a platform-agnostic event data structure into the data structure that a particular provider needs. The receipt an observer returns is optional because some providers won’t care about a particular event, in which case their observer returns no receipt.
protocol EventObservable {
    associatedtype EventType
    func track(event: EventType) -> ReceiptProtocol?
}
Keep your receipts
This concept of the receipt is central to our system. Anytime we report an analytics event to a provider, we generate a receipt for that event. For example, if we have a sessionStart event that is reported to both GA and Quantcast, we generate two receipts. We can then use those receipts to validate the event in two ways: 1) we can unit test that we’re sending the appropriate data to GA and Quantcast for that sessionStart event, without needing to mock those providers’ APIs, and 2) we can display the receipts in-app for our internal builds so we can manually validate them without having to sniff network traffic.
Your receipt will want to strongly type some things for easier identification and inspection (time the event occurred, provider name, etc.), but most of the provider-specific details (the “screenName” for a GA screen view for instance) will need to get packed into an arbitrary dictionary since the structure of data will vary from provider to provider and event to event. Here’s what our receipts look like:
protocol ReceiptProtocol {
    var eventName: String { get }
    var providerName: String { get }
    var timestamp: Date { get }
    var metadata: [String: Any] { get }
}
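To make the protocol concrete, here is a minimal sketch of a type that could conform to it. The `Receipt` name, its initializer defaults, and the "GoogleAnalytics" provider string are assumptions made for illustration; they are not taken from the app's actual code.

```swift
import Foundation

protocol ReceiptProtocol {
    var eventName: String { get }
    var providerName: String { get }
    var timestamp: Date { get }
    var metadata: [String: Any] { get }
}

// Hypothetical concrete receipt; the type name and defaults are assumptions.
struct Receipt: ReceiptProtocol {
    let eventName: String
    let providerName: String
    let timestamp: Date
    let metadata: [String: Any]

    init(eventName: String,
         providerName: String,
         timestamp: Date = Date(),
         metadata: [String: Any] = [:]) {
        self.eventName = eventName
        self.providerName = providerName
        self.timestamp = timestamp
        self.metadata = metadata
    }
}

// The strongly typed fields identify the event and provider, while the
// provider-specific "screenName" travels in the loosely typed dictionary.
let receipt = Receipt(eventName: "screenView",
                      providerName: "GoogleAnalytics",
                      metadata: ["screenName": "home"])
```

Because the metadata is a plain dictionary, a test or an in-app debug screen can inspect exactly what would have been sent to the provider without touching the provider's SDK.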
Generic Event Data
Some providers require certain fields to be sent along with the event data. The important thing to keep in mind here is that the data requirements vary wildly based on both the event type and the provider.
Because these requirements vary, we use an enum with associated values (rather than a single rawValue type). This gives us the type safety we need, and the compiler flags every observer that does not yet handle a newly added Event case (assuming you never use a default clause). Here’s what that enum might look like:
enum Event {
    case sessionStart
    case screenView(ScreenType)
    case share(URL, String)
}
The ScreenType mentioned above isn’t terribly important, but here’s an example of what it might look like:
enum ScreenType {
    case home
    case settings
}
You’ll notice there’s no “screen name” specified yet. That is the sort of provider-specific detail that the Event doesn’t need to know about, and we’ll leave it to each observer to deal with.
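The compiler check described above depends on every switch over Event being exhaustive. As a small illustration, here is a hypothetical helper (`eventName(for:)` is not part of the original system) that switches over the enum without a default clause. If a new case were added to Event, this function would stop compiling until the new case was handled.

```swift
import Foundation

enum ScreenType {
    case home
    case settings
}

enum Event {
    case sessionStart
    case screenView(ScreenType)
    case share(URL, String)
}

// Hypothetical helper, assumed for illustration only.
// There is deliberately no `default:` clause, so adding a new Event case
// turns this switch into a compile-time error until it is handled.
func eventName(for event: Event) -> String {
    switch event {
    case .sessionStart:
        return "sessionStart"
    case .screenView:
        return "screenView"
    case .share:
        return "share"
    }
}
```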
Putting it all together
Now that we’ve gone over the central concepts of observer, receipt, and event, we can combine them for a simple, testable, and QA-able example. We’ll focus on two events, sessionStart and screenView. For now, we’ll make a single GA observer that only cares about reporting screenView events.
Provider-specific data
GA will typically want a specific string value that represents the name of the screen we’re on, so we need a way to get that from a ScreenType (the associated data type of the screenView event). We want to do that while keeping this data isolated from other provider-specific data we might want for other provider integrations, so we’ll use an extension:
private extension ScreenType {
    var gaScreenName: String {
        switch self {
        case .home:
            return "home"
        case .settings:
            return "settings"
        }
    }
}
It’s important that we did this separately rather than, say, making those strings the rawValue of the ScreenType. If we were to do that, we’d be coupling the reporting GA does to our generic event data, making it more difficult to integrate with other providers. Essentially, “home” is the appropriate screen name for GA, but it might be, for instance, that another provider requires a specific Int to identify the screen being viewed.
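As a sketch of that scenario, the same ScreenType can carry a second, independent mapping for a provider that wants integers. The provider and the `exampleProviderScreenID` property are invented for illustration, and the ID values are arbitrary.

```swift
enum ScreenType {
    case home
    case settings
}

// GA mapping, as in the post. In a real project this extension would be
// private to the file that contains the GA observer.
extension ScreenType {
    var gaScreenName: String {
        switch self {
        case .home:
            return "home"
        case .settings:
            return "settings"
        }
    }
}

// Hypothetical second provider that identifies screens by Int.
// The property name and the values 1 and 2 are assumptions.
extension ScreenType {
    var exampleProviderScreenID: Int {
        switch self {
        case .home:
            return 1
        case .settings:
            return 2
        }
    }
}
```

Neither mapping knows about the other, and ScreenType itself stays free of raw values, so adding or removing a provider never changes the generic event data.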
Creating our observer
Now that we have the data we need to report on screen views, we can build our first observer.
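struct GoogleAnalyticsObserver: EventObservable {
    func track(event: Event) -> ReceiptProtocol? {
        switch event {
        case let .screenView(screenType):
            return trackScreenView(for: screenType.gaScreenName)
        case .sessionStart, .share:
            // Don’t use default: so that when new events are added
            // we are forced to decide what to do with them here.
            // GA does not report these events, so there is no receipt.
            return nil
        }
    }

    private func trackScreenView(for name: String) -> ReceiptProtocol? {
        // Implementation details of this are not important: send the screen
        // view to GA, then return a receipt describing what was sent.
        // Receipt is an assumed concrete type conforming to ReceiptProtocol.
        return Receipt(eventName: "screenView",
                       providerName: "GoogleAnalytics",
                       timestamp: Date(),
                       metadata: ["screenName": name])
    }
}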
Planning for more providers
Because we fully expect to introduce more observers later (for instance a QuantcastObserver for sessionStart events), we’re going to make a single controller that will own all these observers. Callers tell it to track an event, and it is in charge of passing that event down to its observers.
class AnalyticsController<EventType> {
    private var observers = [ObserverThunk<EventType>]()
    private var eventReceipts = [ReceiptProtocol]()

    func track(event: EventType) {
        for observer in observers {
            if let receipt = observer.track(event: event) {
                // Prepend for default reverse chron
                eventReceipts.insert(receipt, at: 0)
            }
        }
    }

    func add<U: EventObservable>(observer: U) where U.EventType == EventType {
        let thunk = ObserverThunk(observer)
        observers.append(thunk)
    }
}
You can see that this is pretty straightforward, other than the use of a thunk to deal with some generics details I won’t go into. Just create an AnalyticsController, add your observers, then send the events you want tracked.
let analyticsController = AnalyticsController<Event>()
analyticsController.add(observer: GoogleAnalyticsObserver())
analyticsController.track(event: .sessionStart)
analyticsController.track(event: .screenView(.home))
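For readers curious about the thunk, one common way to write it is as a type-erasing wrapper that stores an observer's track method in a closure. A protocol with an associated type can't be used directly as an array element type, and the wrapper lets observers of different concrete types live in one array. The following is a sketch under that assumption and not necessarily the implementation used in the app; `DemoEvent`, `DemoReceipt`, and the two demo observers are hypothetical stand-ins.

```swift
import Foundation

protocol ReceiptProtocol {
    var eventName: String { get }
    var providerName: String { get }
    var timestamp: Date { get }
    var metadata: [String: Any] { get }
}

protocol EventObservable {
    associatedtype EventType
    func track(event: EventType) -> ReceiptProtocol?
}

// Assumed shape of the thunk: it erases the concrete observer type and
// keeps only a closure that forwards to the wrapped observer's track method.
struct ObserverThunk<T>: EventObservable {
    private let trackEvent: (T) -> ReceiptProtocol?

    init<U: EventObservable>(_ observer: U) where U.EventType == T {
        trackEvent = { event in observer.track(event: event) }
    }

    func track(event: T) -> ReceiptProtocol? {
        return trackEvent(event)
    }
}

// Hypothetical demo types, used only to exercise the thunk.
struct DemoReceipt: ReceiptProtocol {
    let eventName: String
    let providerName: String
    let timestamp: Date
    let metadata: [String: Any]
}

enum DemoEvent {
    case sessionStart
    case screenView
}

struct ScreenViewObserver: EventObservable {
    func track(event: DemoEvent) -> ReceiptProtocol? {
        switch event {
        case .screenView:
            return DemoReceipt(eventName: "screenView", providerName: "ProviderA",
                               timestamp: Date(), metadata: [:])
        case .sessionStart:
            return nil
        }
    }
}

struct SessionObserver: EventObservable {
    func track(event: DemoEvent) -> ReceiptProtocol? {
        switch event {
        case .sessionStart:
            return DemoReceipt(eventName: "sessionStart", providerName: "ProviderB",
                               timestamp: Date(), metadata: [:])
        case .screenView:
            return nil
        }
    }
}

// Two different observer types stored in one homogeneous array.
let observers: [ObserverThunk<DemoEvent>] = [
    ObserverThunk(ScreenViewObserver()),
    ObserverThunk(SessionObserver()),
]

// Only the observer that cares about sessionStart returns a receipt.
let receipts = observers.compactMap { $0.track(event: .sessionStart) }
```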
One important detail to note is that we kept track of the eventReceipts in a private variable. This will be relevant in the following post when we’ll come back to those receipts and how we use them to help us QA and unit test our system. And then we’ll follow up with some code samples of how this was actually implemented in our app.