The Great Shift - Edge Computing & Fast Data Processing

The cloud is going through a tectonic shift. Put simply, we're entering a new phase in which the main assumptions underlying the cloud's architecture keep it from solving emerging data processing problems. The biggest shift in how we create and use data is happening hidden in plain sight: from treating data as a historical record (seeing the world in the past tense) to treating it as a dynamic set of events happening continuously, right now, anywhere in the world (seeing the world in the present tense).

The cloud is built on the model of centralization - collect data from everywhere and put it in one big pile in one place so that we can do something useful with it. But getting data into one place is hard because moving data takes time and energy (the time cost is often called latency, and the energy cost is referred to as bandwidth).
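To put a rough number on the time cost, here is a back-of-the-envelope sketch. It assumes light travels through optical fiber at about 200,000 km/s (roughly two thirds of its speed in a vacuum) and ignores routing, queuing, and processing delays, so real latencies will be higher. The function names and the example distances are illustrative, not measurements.

```python
# Back-of-the-envelope propagation delay.
# Assumption: light in optical fiber covers roughly 200,000 km/s,
# which is 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over a fiber path of this length."""
    return 2 * distance_km / FIBER_KM_PER_MS


def max_distance_km(round_trip_budget_ms: float) -> float:
    """Farthest a server can be while staying inside a round-trip budget."""
    return round_trip_budget_ms * FIBER_KM_PER_MS / 2


if __name__ == "__main__":
    # Illustrative path lengths, not measured routes.
    examples = [("same metro", 50), ("cross-continent", 4000), ("intercontinental", 10000)]
    for label, km in examples:
        print(f"{label:>16}: at least {min_round_trip_ms(km):.1f} ms round trip")
    print(f"20 ms budget reaches at most {max_distance_km(20):.0f} km")
```

Under these assumptions, a 10,000 km path costs at least 100 ms per round trip before any work is done, while a 20 ms round-trip budget only reaches servers within about 2,000 km. No amount of hardware in a single central location changes that bound.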

Data has time, location, concurrency, and actuation value

Data has time value - it tends to decay in value over time if not acted on. Data has location value when seen through the lens of ownership, privacy, and data regulation. Data has concurrency value in situations where it changes fast because many people are acting on a common piece of data (as in auctions, where the price of an item changes quickly). Finally, data has actuation value - it's only useful if acted on immediately (as in placing a buy or sell order for a stock).

So take data that has a mix of time, location, concurrency, and actuation value, move it to a centralized cloud, and you quickly realize the folly. The data becomes less valuable by the time it's moved and stored, it loses context by being moved, it can't be changed as quickly because all the moving around gets in the way, and finally, by the time you act on it, it might be too late (your stock order doesn't fill, costing you profits). Something has to change or evolve: we need an architecture that makes new assumptions about the real-time world. Fast Data and the centralized cloud are like a race car meeting a wall at peak speed: lots of flying debris and a wreck of twisted metal and rubber.


So how then do we solve this problem of fast data?

Geo-distributed Fast Data processing is the answer; edge computing is its architecture

Much as the cloud is the architecture of Big Data, Fast Data needs its own architecture - an edge architecture that eschews centralization and instead embraces distribution and decentralization. By adopting edge computing, we place data processing closer to the producers and consumers of data, reducing the time and energy otherwise spent moving that data to a centralized location.

This kind of edge architecture is not an extension of what worked in the cloud; it has to be built on the philosophy, physics, and mathematics of geo-distribution and latency. You cannot simply take the data processing primitives of the cloud - eventually consistent object storage (S3 on AWS), block and clustered file storage (EBS, EFS, or even Lustre for that matter), and high-overhead coordination through consensus protocols (looking at you, Paxos and Raft) - put them on the edge, and expect them to work at scale (100s of locations). It simply won't work, because none of these systems were designed to provide high levels of accuracy (database consistency, with ordering, serializability, and linearizability guarantees) at rapid rates of data change across intercontinental distances and a large number of locations.

Will the Edge Native Architecture for Fast Data problems please stand up?

A real edge-native architecture is coordination-free (it eschews consensus, which breaks down under its own weight), provides high levels of accuracy at rapid rates of data change, and lets developers work with data as events and not mere state changes. This edge-native architecture is event-driven, reactive to the real world, and, most importantly, geo-distributed at a scale of 100s of locations worldwide so as to be no more than 20 milliseconds away from most people, devices, and real-world context. The real edge is a brand new architecture, assembled from purpose-built components that solve the problems of dynamic network behavior and accurate data synchronization without the use of consensus.
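To make "synchronization without consensus" concrete, here is a minimal sketch of one well-known coordination-free technique: a grow-only counter CRDT (conflict-free replicated data type). This is a swapped-in illustration. The article does not say which mechanism Macrometa uses, and the class name and replica names below are hypothetical. A grow-only counter is also the simplest possible case; it says nothing about ordering or linearizability.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by element-wise max."""

    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever writes to its own slot, so it can accept
        # local updates without asking any other location for agreement.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of message order or duplicates.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


if __name__ == "__main__":
    # Hypothetical replicas at two edge locations.
    tokyo, paris = GCounter("tokyo"), GCounter("paris")
    tokyo.increment(3)   # updates accepted locally, no round trip
    paris.increment(2)
    tokyo.merge(paris)   # state exchanged later, in any order
    paris.merge(tokyo)
    print(tokyo.value(), paris.value())  # 5 5
```

Each location accepts writes immediately and the replicas still agree once they have exchanged state, which is the property that lets this style of design avoid a consensus round on every update.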

Macrometa - the planet-scale, geo-distributed, Fast Data cloud

Macrometa is a geo-distributed fast data cloud platform that brings real-time data services to this new world of time-, location-, concurrency-, and actuation-sensitive problems. As a serverless edge cloud, it provides the programming primitives for building low-latency, request-response and event-driven applications for fast data. These primitives include:

  • A globally distributed cloud - cross-region and multi-cloud by default (25 locations today across AWS, GCP, telecom, and colocation data centers; 200 locations by end of 2020)
  • Geo-distributed processing - all data is ingested, processed, and actuated in real time at the edge location closest to the producer or consumer of the data (or both)
  • A modern NoSQL streaming database with interfaces to stream and process key-value, document, graph, geolocation, and time series data
  • Strong and adaptive distributed consistency (accuracy) to handle various distributed concurrency scenarios
  • A compute runtime that lets you write event-driven code as pipelines that run with data locality across a set of user-defined locations
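As a rough illustration of what "the closest edge location" means in the list above, the sketch below picks the nearest site to a producer by great-circle distance. This is not Macrometa's API: the location names, coordinates, and function names are hypothetical, and a production system would more likely route on measured network latency than on geography.

```python
import math

# Hypothetical edge sites as (latitude, longitude) in degrees.
EDGE_LOCATIONS = {
    "frankfurt": (50.11, 8.68),
    "singapore": (1.35, 103.82),
    "ashburn": (39.04, -77.49),
    "sao_paulo": (-23.55, -46.63),
}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, using a 6371 km Earth radius."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def closest_edge(lat: float, lon: float) -> str:
    """Name of the edge site geographically nearest to the given point."""
    return min(
        EDGE_LOCATIONS,
        key=lambda name: haversine_km(lat, lon, *EDGE_LOCATIONS[name]),
    )


if __name__ == "__main__":
    # A producer in Paris and one in New York land on different sites.
    print(closest_edge(48.86, 2.35))
    print(closest_edge(40.71, -74.01))
```

The point of the sketch is only that placement is decided per producer or consumer, so two users on different continents are served by different locations instead of one central region.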

Macrometa is first and foremost a platform for developers - it's meant for people who want to solve the type of fast data problems that only edge architecture can solve. In building Macrometa, we have focused on providing the right abstractions to hide the complexity of distributed programming, accuracy, and correctness. You don't need to know distributed databases or concurrent programming to write real-time, event-driven apps and APIs with Macrometa. You simply consume our APIs and let our platform do the orchestration, distribution, and scheduling of your data and code. You can sign up for a free tier developer account and start writing apps for the edge in June.

Final note - Beware of edge washing

The edge is a real opportunity to solve hard problems that the cloud cannot or will not solve. The edge is a hot and exciting space for new ideas, startups, and breakthrough business models. And we can expect that along with all the new interest will inevitably come the great hordes of legacy tool vendors, rebranding their old-school big data, storage, and container products and services as "Now available with exciting new edge capabilities". Folks, we've seen that movie before, when every on-prem system vendor suddenly transformed into cloud-native with a single PowerPoint slide. The bar for edge computing is pretty damn high - geo-distributed, fast data processing with low latency and real-time accuracy across 100s of global locations is not going to happen on the old cloud platforms.