
How we built Pooled Plans

Dan here! I’m a product manager on the team at US Mobile.

This blog explores how we solved a complex technical problem in the mobile virtual network operator (MVNO) space: Pooled Plans that let our subscribers buy prepaid data and share it among multiple phones.

🛠 Pooled Plans is in private beta. We’ll launch this product for the general public in early April. Sign up below for updates!

Motivating the problem

End users come to us, a hybrid network operator, because of our flexible, high-value plans and differentiated user experience. Pooled Plans extend that flexibility and value to any use case involving multiple phones: families, groups of friends, roommates, small businesses, and more. Eventually, we’ll support hybrid-device and IoT configurations where shared data is more convenient and easier to manage than large sets of individual plans.

When users sign up with us, we register their devices with a mobile network operator, or MNO. MNOs are the large companies that actually own the cellular network infrastructure in the United States. They give you access to point-and-click software, but we use APIs to interact with their platforms programmatically.

Once a device gets activated and used in the real world, MNOs’ systems produce usage records that encompass atomic data about every act of network usage. That means every time you send a text message, make a phone call, or use 4G LTE or 5G data, those events get recorded by the carrier. As you can imagine, these records are personal and sensitive. Rest assured – internally, we rigorously manage access control so that only permissioned employees can access this data. Further, we cannot in any way read users’ texts, listen to calls, or even know what websites and apps get used.  The data we receive only covers usage volume so we can understand network usage and manage costs internally.

Let’s focus on data sessions. If you scroll through Instagram when you’re not on WiFi, a bit later that day there’s a usage record generated that includes metadata like the session duration, timestamp, and quantity of data downloaded or uploaded. Again, we wouldn’t actually know you were on Instagram!

We decided to simplify our first Pooled Plans offering by giving all lines Unlimited Talk and Text. This meant metering of data immediately became the central technical problem we needed to solve to support the feature.

And, remember – because Pooled Plans are meant for groups of users, it’s not as simple as counting the data usage of a single phone. We needed to design a data metering system that would maintain a streaming total across an arbitrary number of phones, and take action on mandated business logic when that total reaches a certain amount.

Design constraints

We knew there were several constraints and imperatives.

Real-time metering

Most importantly, the metering had to be as real-time as possible. Any phone in a Pooled Plan (and there could be hundreds, or thousands!) could be using data at the same time. We needed to be able to track usage and alert the account owner immediately once their pool was entirely consumed. That would let them take action to top up their Pooled Plan with more prepaid data.

In-house metering

We could have used our MNO partner’s prepaid metering, where individual lines are allotted “buckets” of service (a certain amount of talk, text, and data), after which the line gets its service cut off. However, we wanted our Pooled Plans to consider only the total usage by all member lines. Building on individual buckets in the back end would fail, because a line could be cut off when its own bucket ran out, even while the pool as a whole still had data left.

We decided to use our MNO’s postpaid platform and billing mode because it offered a richer feature set and superior end-user experience. This platform is, in fact, intrinsically unlimited! End users use whatever they want, and then US Mobile is billed by the MNO. Therefore, we needed to develop our own metering. This would let us prevent further data use via programmatic intervention once the end user consumes what they’ve paid for.

Scalability

As a third constraint, we required a scalable architecture that would accommodate tens of millions of calculations a day. Each device can generate numerous records throughout a day, each of which can include multiple rows corresponding to individual data sessions. Our metering computation service therefore would need to scale as hundreds of thousands of devices join our platform.

Reproducibility

We needed to be able to look back and understand what happened in any given pool. This historical transparency is fundamental for IT operations in businesses where there’s accountability for connectivity costs.

Our product vision for Pooled Plans extends to complete historical discoverability and advanced reports about what’s happened in a Pooled Plan, when, and by whom. This is essential for our business customers, who manage fleets in which person-device assignments change over time.

How we ingest usage data

We’ve long had a legacy metering engine. Early in 2020, we rebuilt it from scratch, creating a near-real-time ingestion service to pull usage records from our MNO and store them in the cloud. This major technical advancement enabled in-house metering of cellular usage.

We query our MNO’s servers and download large sets of new record files multiple times each hour. When we receive those new records, we disaggregate the raw files, which contain numerous individual data sessions, into single rows. We append all those individual data session rows into a large persistent store in BigQuery. This dataset functions as a comprehensive historical record of all usage on our network, line by line. With this raw data, we can now build any metering capabilities we want. We can also derive unique insights about cellular usage in America – did you know an average person uses 1.4 GB of LTE data per month on their phone?

There are some interesting chronology caveats along the way. For example, the MNO we work with makes newly produced usage records available to us no more often than every 15 minutes. Yet just because a record is newly exposed to us doesn’t mean the entries in that record correspond only to data sessions that occurred in the last 15 minutes! There might be a delay between when the usage occurred and when it gets registered by the MNO’s recording systems. Thus, we determined that the timestamp at which each data session occurred, not the ingest timestamp at which we finally received the record, would be the operative time parameter for downstream metering.
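As an illustration only, the disaggregation step and the two timestamps could be sketched in Python as follows. The raw record shape, the field names, and the `SessionRow` type are assumptions made for this example; they are not the MNO’s actual file format or the real BigQuery schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class SessionRow:
    phone_number: str
    event_time: datetime   # when the data session occurred (drives metering)
    ingest_time: datetime  # when the record was received (kept for auditing)
    bytes_used: int


def disaggregate(raw_record: dict, ingest_time: datetime) -> List[SessionRow]:
    """Flatten one raw record (hypothetical shape) into one row per data session."""
    return [
        SessionRow(
            phone_number=raw_record["phoneNumber"],
            event_time=datetime.fromisoformat(session["timestamp"]),
            ingest_time=ingest_time,
            bytes_used=session["bytes"],
        )
        for session in raw_record["sessions"]
    ]


# A made-up record containing two data sessions for one line.
raw = {
    "phoneNumber": "555-0100",
    "sessions": [
        {"timestamp": "2021-03-01T08:30:00+00:00", "bytes": 1_000_000},
        {"timestamp": "2021-03-01T08:55:00+00:00", "bytes": 250_000},
    ],
}
rows = disaggregate(raw, ingest_time=datetime(2021, 3, 1, 9, 10, tzinfo=timezone.utc))
```

Both timestamps are kept on every row, but only `event_time` would be consulted downstream; `ingest_time` simply records how late the usage arrived.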

How we prepare the data

💡 So far, a user has accessed mobile data, the network has recorded that data session, and we’ve found out about the record and recorded it in our raw cloud datastore.

But now we need to find out: should we count that data use against one of our subscribers’ Pooled Plans? To decide that, we need to determine whether that device belongs to a pool. Remember, many of our subscribers are on individual phone plans, not pools.

Determining pool membership

Let me expose another tricky chronology problem now. In imagining this product, we felt users needed to be able to add or remove lines from a Pooled Plan at any time. So say this happened:

  1. A data usage event occurred at 8:30 AM.
  2. The device was removed from a pool at 8:50 AM.
  3. We received and processed the data usage at 9:10 AM.

Clearly, at this point we can’t just look up whether the device is currently in a pool! It’s not. But the usage did occur while the line was still in the pool, so we do want to count this usage against the pool’s balance.

But just as clearly, we can’t crudely look up whether the device was ever in this pool, either! If that’s not clear, imagine instead that this happened:

  1. A device was removed from a pool first.
  2. Next, the device used data.
  3. Last, we received and processed that data usage.

It would not be fair—or accurate—to our user to count that data against the pool, because the device was no longer on the pooled plan at that point!

The key insight here is that the data usage can only count against the pool’s balance if it occurred during a period in which the pool-device association was active, not terminated.
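The condition can be written as a small predicate. This is a minimal sketch under two assumptions that the text does not spell out: the association is treated as a half-open interval (usage at the exact removal instant does not count), and the 7:00 AM join time is invented so that the two scenarios above can be replayed.

```python
from datetime import datetime
from typing import Optional


def counts_against_pool(usage_time: datetime,
                        start: datetime,
                        end: Optional[datetime]) -> bool:
    """True if the usage happened while the pool-device association was active."""
    return start <= usage_time and (end is None or usage_time < end)


joined = datetime(2021, 3, 1, 7, 0)    # hypothetical join time
removed = datetime(2021, 3, 1, 8, 50)  # removal time from the first scenario

# Scenario 1: usage at 8:30 AM, removal at 8:50 AM, so the usage counts.
first = counts_against_pool(datetime(2021, 3, 1, 8, 30), joined, removed)

# Scenario 2: usage after the removal, so it does not count.
second = counts_against_pool(datetime(2021, 3, 1, 9, 0), joined, removed)
```

Note that the time at which the record is processed (9:10 AM in the first scenario) never enters the check.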

Designing an apt data structure

Discovering this sequencing condition led us to design a single collection in our database, MongoDB, that would serve as a source of truth, a complete history of pool membership. Having these data is a basic prerequisite for accurate, timely, future-oriented and historical metering. Our business needs these to manage its operational costs just as much as our users need these to understand and manage their usage patterns, forward- or backward-looking.

We opted for an append-only structure. Each document encapsulates an association of a device with a pool, represented by a foreign key.

association : {
    phoneNumber :
    poolId :
    startDate :
    endDate : // Nullable
} 

Whenever a user adds a device to a Pooled Plan, we create a document. Each document always includes a startDate for that association.

But what if a line is removed from a pool? We do not delete this document. That would destroy our historical record! Instead, we write an endDate.

As a consequence of this data structure, adding the same line to the same Pooled Plan again later creates a new document, which does not interfere with the original. By the way, it also means that if the device got added to a different pool, we wouldn’t have any problem metering usage accurately, because we stored the poolId foreign key.
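The add and remove behavior described above can be sketched with an in-memory stand-in for the collection. This is an assumption-laden illustration in Python, not the real MongoDB code: the `MembershipHistory` class and its method names are hypothetical, and only the four fields come from the structure shown earlier.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Association:
    phone_number: str
    pool_id: str
    start_date: datetime
    end_date: Optional[datetime] = None  # None while the line is still in the pool


class MembershipHistory:
    """In-memory stand-in for the collection; documents are never deleted."""

    def __init__(self) -> None:
        self.documents: List[Association] = []

    def add_line(self, phone_number: str, pool_id: str, when: datetime) -> None:
        # Adding always appends a new document, even for a returning line.
        self.documents.append(Association(phone_number, pool_id, when))

    def remove_line(self, phone_number: str, pool_id: str, when: datetime) -> None:
        # Removal closes the open association by writing an end date.
        for doc in self.documents:
            same_pair = (doc.phone_number, doc.pool_id) == (phone_number, pool_id)
            if same_pair and doc.end_date is None:
                doc.end_date = when
                return
        raise LookupError("no active association to close")


history = MembershipHistory()
history.add_line("555-0100", "pool-A", datetime(2021, 3, 1, 7, 0))
history.remove_line("555-0100", "pool-A", datetime(2021, 3, 1, 8, 50))
history.add_line("555-0100", "pool-A", datetime(2021, 3, 2, 12, 0))  # re-added later
```

After these three calls the history holds two documents for the same line and pool: a closed one covering the first stint and an open one for the second.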

How we transform usage data into metering

Schematic of ETL pipeline

💡 So far, we’ve constructed a source of truth that lets us determine if a device was in a Pooled Plan at any given time. We also have a long list of every time a device on our network used data.

Filtering usage records

At this point, we designed an event-driven data enrichment cascade. Once we can validate if a data session should be counted against a pool, we’ll be able to complete the metering pipeline.

Upon receiving a new usage record, we publish a message to a queue, which is consumed by an Enrichment service. That service queries the MongoDB collection described above to see if the device was in any pool at the time the data was used and, if so, determines which pool. We do this for every new record. Conceptually, we can think of this step as “filtering out” all device usage that isn’t relevant to Pooled Plans.
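A rough sketch of that filtering step, in Python, might look like the following. The association documents are held in a plain list here rather than queried from MongoDB, and the `pool_at` and `enrich` helpers, the usage row fields, and the sample data are all hypothetical.

```python
from datetime import datetime
from typing import List, Optional

# Hypothetical association documents, shaped like the structure shown earlier.
ASSOCIATIONS = [
    {"phoneNumber": "555-0100", "poolId": "pool-A",
     "startDate": datetime(2021, 3, 1, 7, 0), "endDate": datetime(2021, 3, 1, 8, 50)},
    {"phoneNumber": "555-0101", "poolId": "pool-A",
     "startDate": datetime(2021, 3, 1, 7, 0), "endDate": None},
]


def pool_at(phone_number: str, event_time: datetime) -> Optional[str]:
    """Return the pool the line belonged to at event_time, or None."""
    for doc in ASSOCIATIONS:
        active = doc["startDate"] <= event_time and (
            doc["endDate"] is None or event_time < doc["endDate"])
        if doc["phoneNumber"] == phone_number and active:
            return doc["poolId"]
    return None


def enrich(usage_rows: List[dict]) -> List[dict]:
    """Tag pooled usage with its pool ID and drop everything else."""
    enriched = []
    for row in usage_rows:
        pool_id = pool_at(row["phoneNumber"], row["eventTime"])
        if pool_id is not None:
            enriched.append({**row, "poolId": pool_id})
    return enriched


usage = [
    # In the pool at 8:30 AM: kept.
    {"phoneNumber": "555-0100", "eventTime": datetime(2021, 3, 1, 8, 30), "bytes": 1_000_000},
    # Same line, but after its removal at 8:50 AM: dropped.
    {"phoneNumber": "555-0100", "eventTime": datetime(2021, 3, 1, 9, 0), "bytes": 400_000},
    # A line on an individual plan: dropped.
    {"phoneNumber": "555-0199", "eventTime": datetime(2021, 3, 1, 8, 30), "bytes": 700_000},
]
pooled = enrich(usage)
```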

This means we now locally have a list of all pools for which an updated balance needs to be calculated.

Recording and aggregating usage

Now, we publish a message to a new topic, to which our Usage Aggregation service listens. That service does a couple of things.

  1. It drops any extraneous columns from the original usage record, retaining only the necessary metadata. It writes to a new BigQuery table, called the Pool Ledger, enriching those records with the unique ID of the pool against which the usage is to be counted. The Pool Ledger is a complete historical record of data use in our Pooled Plans.
  2. It forwards a message to a new queue, containing a list of pools and their aggregated debits. Say there were 14 usage records belonging to a single pool in this batch. We sum them all up into a single negative delta.
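The summing in the second step can be sketched as a simple group-by in Python. The row fields and the `aggregate_debits` name are assumptions for illustration; the real service and message format may differ.

```python
from collections import defaultdict
from typing import Dict, List


def aggregate_debits(enriched_rows: List[dict]) -> Dict[str, int]:
    """Sum usage per pool and express it as one negative delta per pool."""
    totals: Dict[str, int] = defaultdict(int)
    for row in enriched_rows:
        totals[row["poolId"]] += row["bytes"]
    return {pool_id: -used for pool_id, used in totals.items()}


# A made-up batch: two usage records for pool-A and one for pool-B.
batch = [
    {"poolId": "pool-A", "bytes": 1_000_000},
    {"poolId": "pool-A", "bytes": 250_000},
    {"poolId": "pool-B", "bytes": 500_000},
]
deltas = aggregate_debits(batch)
```

Here `deltas` holds one entry per pool, so the next stage receives a single debit for each pool no matter how many usage records were in the batch.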

Completing the pipeline

The last step of the pipeline, the Pool Metering Engine, calculates each pool’s new balance by applying the received negative delta to the pool’s prior balance, which subtracts the batch’s aggregated usage from it.
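As a final sketch, the balance update could look like this in Python, assuming the delta arrives as a negative number of bytes as in the aggregation step. The `apply_deltas` function, the in-memory `balances` dictionary, and the `on_exhausted` callback are hypothetical; the callback stands in for whatever business logic runs when a pool is used up.

```python
from typing import Callable, Dict


def apply_deltas(balances: Dict[str, int],
                 deltas: Dict[str, int],
                 on_exhausted: Callable[[str], None]) -> None:
    """Apply each pool's negative delta and fire a callback when a pool runs out."""
    for pool_id, delta in deltas.items():
        # delta is zero or negative, so adding it reduces the balance.
        balances[pool_id] = balances.get(pool_id, 0) + delta
        if balances[pool_id] <= 0:
            on_exhausted(pool_id)


balances = {"pool-A": 1_000_000, "pool-B": 5_000_000}
exhausted = []  # collects the IDs of pools that ran out in this batch
apply_deltas(balances, {"pool-A": -1_250_000, "pool-B": -500_000}, exhausted.append)
```

In this example pool-A ends the batch below zero and is reported through the callback, while pool-B simply carries a smaller balance forward.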

Thus, the balance we maintain in our database is semi-real-time, fully reproducible and auditable, and calculated using an in-house data pipeline. This meets all of our technical and business requirements, as we defined at the outset.

Closing thoughts

We used a set of tools to build the back-end support for this product, including GCP Pub/Sub topics, Cloud Functions, RabbitMQ, and a couple of our own hosted services on GKE.

In this blog, I’ve focused on the technical design of the data pipeline. There is so much more to say about the user experience we have created and the way we positioned our offering to satisfy diverse persona needs. In the future, I’d love to preview the advanced analytics and machine learning capabilities that are on our roadmap for Pooled Plans.

But more simply, this pipeline supports the fundamental concept of pooled plans, which is that a set of devices draws from the same pool of data, some using more and some less.

I learned a lot from working on this feature with Sajeesh Kumar, our Director of Engineering. I’m so grateful to him for his patience in answering my many questions as we designed this data architecture and molded it to fit the business logic we needed to support.

He came up with all the important creative insights about how to achieve it. One day back in October he came to work and said, “So I’ve been thinking about how we will do this,” and then he laid it out in Miro as we talked. Our final implementation was essentially identical to his initial insight.

I hope you enjoyed reading this exploration of our technical design! US Mobile is currently hiring engineers, product managers, and designers. Please get in touch to join our team!
