Designing a distributed system takes real effort: it is a complex task that demands teamwork and a lot of good software design.

But before you dig into this monumental task, make sure a distributed system is actually needed to solve your problem. As we will see, a distributed system uses multiple layers to abstract away complexity, and each layer may require adopting a completely new technology, or, if none exists, building your own.

So my advice would be to start simple: solve the actual problem, but leave enough wiggle room in your solution so that it is easier to scale down the road when needed.

What is a distributed system?

I like to view a distributed system as a collection of different modules that work together seamlessly in order to achieve complex tasks. The system also scales well when more resources are added to it, and the addition or removal of resources is easy to manage.

A distributed system must be transparent to its users, i.e. it should present them a simple interface and not expose the system’s intricacies. Imagine having to know all the details of Google’s search engine in order to make a simple query; it would be a nightmare.

Some examples of distributed systems would be: BitTorrent, DNS, RAID, Apache’s Cassandra and Hadoop.

A clear distinction should be made between a distributed system and a decentralized system, even though they may look similar. Distributed systems can in fact be centralized: they can be governed by a single entity, whereas a decentralized system cannot be governed by a single entity’s policies.

The need for a distributed system

As I said earlier, it is important to know whether you need such a system, because you might end up designing and writing a lot of code specific to this kind of system instead of focusing your energy on solving the real problem. Some good questions to start with would be:

  • Do I need data redundancy/replication?
  • Do I need to work with very large workloads?
  • Do I need to scale easily as more resources are added?
  • Do I need to work with many modules as transparently as possible?
  • Do my modules require complex coordination?

If you can answer positively to some of these then you might want to invest some time in a distributed system design.

I need a distributed system

You have decided that you need to build such a system, but you don’t know where to start. It helps to remember that a distributed system is just a collection of layers and modules. In the remaining part of this post we’ll try to uncover some of the most common layers and understand if and where they are needed.

Communication is essential

A distributed system has to communicate across its many modules in order to achieve the task it was designed for. Without communication there would be limited scalability, limited redundancy, and limited coordination.

Naturally we can split this topic into two:

  • message-based communication
  • remote procedure calls

Message-based communication

In a message-based system each module communicates with its peers via messages; think of them as letters. These letters have a well-defined format, and each letter usually indicates an action that the peer at the other end should perform.

This approach is usually quite fast. All you have to do is define a message for a specific action, send it to the other peer, and maybe wait for a response. The main problem is that it is quite verbose: the programmer must explicitly fill in the message’s fields, send it to the peer via a socket (or maybe do some in-process magic), then wait for the response.
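
As a rough illustration, here is a minimal sketch of what that looks like with a hand-rolled message format sent over a plain POSIX socket; the Message layout, the Action codes and the send_message helper are invented for this example, not part of any library.

#include <cstdint>
#include <cstring>
#include <sys/socket.h>

// A hypothetical fixed-format message: an action code plus a small payload.
enum class Action : uint32_t { Query = 1, Insert = 2 };

struct Message {
    Action   action;        // what the peer should do
    uint32_t payload_len;   // how many bytes of payload follow
    char     payload[256];  // the data itself, serialized by us
};

// The programmer fills every field by hand and pushes the bytes down a socket.
bool send_message(int socket_fd, Action action, const char* data, uint32_t len) {
    Message msg{};
    if (len > sizeof msg.payload) return false;
    msg.action = action;
    msg.payload_len = len;              // note: byte order is our problem too
    std::memcpy(msg.payload, data, len);
    return send(socket_fd, &msg, sizeof msg, 0) == static_cast<ssize_t>(sizeof msg);
}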

In the end you have a lot of freedom, but having to deal with these details every time can be time consuming, especially for the end user/programmer.

You can opt for something like ZMQ if you don’t want to bother with details such as endianness, retries in case of error, and support for different transport mediums. Even then you still have to deal with the verbosity.

Remote Procedure Calls (RPC)

We as programmers like to hide complexity and instead offer abstractions that allow us to work with higher-level concepts. RPCs try to do just that: hide all the ugly complexity of message-based communication behind an abstraction that integrates nicely into a programming language.

As an example, let’s consider the following code and suppose that our database lives in a different process or on a cluster of machines.

// Service definition:
class Database {
public:
    Data query(Query query);
    void insert(Data data);
};

// Our code:
Database db;
Query query;                  // build the query
Data data = db.query(query);

// Modify <data>, then write it back
db.insert(data);

In a message-based system we have to worry about all the details, but all of them can be abstracted away. In an RPC system we define an interface for a service, the Database in our example, then we use some sort of transpiler to translate that definition from the Interface Description Language (IDL) into client stubs and server stubs. Those stubs are in turn responsible for all the nitty-gritty details.

Providing such a system has its own difficulties: you again need to think about messages, about marshalling/unmarshalling objects between different programming languages and CPU architectures, about handling concurrent calls, and about automatic stub generation, possibly for multiple languages.

Fortunately for us there are frameworks like gRPC which simplify a lot of this process. You need some infrastructure in place, i.e. the ability to run the protoc compiler, but it shouldn’t be that hard.
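
For instance, a gRPC interface for the Database service above might look roughly like the sketch below; the package, message and field names are placeholders I made up for illustration, not a real schema.

// Hypothetical protobuf IDL for the Database service; protoc plus the gRPC
// plugin would generate the client and server stubs from this definition.
syntax = "proto3";

package example;

service Database {
  rpc Query (QueryRequest) returns (QueryReply);
  rpc Insert (InsertRequest) returns (InsertReply);
}

message QueryRequest { string expression = 1; }
message QueryReply   { bytes data = 1; }
message InsertRequest { bytes data = 1; }
message InsertReply  { bool ok = 1; }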

Threads, Processes, Machines

It is important to define at which scale your solution will work. Ideally the solution should be transparent no matter the granularity level, and by granularity I mean threads, processes, and machines.

But if we design a solution for a product that will always live on a single machine, there is no point in offering support for, let’s say, messages across networks. Knowing the scale at which we want to operate helps us avoid unnecessary development effort; there is no point in working on something that will never be used.

For something heavily multithreaded, queues that allow concurrent access should be enough to pass messages (or pointers to these messages, provided that the message itself will not also be accessed concurrently). RPC is also easy at this level.
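
A minimal sketch of such a queue, built on a mutex and a condition variable from the standard library; the type is illustrative, not a production implementation.

#include <condition_variable>
#include <mutex>
#include <queue>

// A tiny thread-safe queue: producers push messages, consumers block until one arrives.
template <typename T>
class ConcurrentQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        not_empty_.notify_one();
    }

    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::queue<T> items_;
    std::mutex mutex_;
    std::condition_variable not_empty_;
};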

For a process-based system we can look at shared memory to provide the same kind of queues. If you are on Linux, POSIX provides good support for this via its message queues, and Unix domain sockets are also a good fit. RPC becomes a bit harder if we have to do it ourselves, but we still get access to the whole system’s resources, such as storage.
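
For the POSIX message queue route, a hedged sketch of the sending side looks roughly like this; the queue name, sizes and message content are arbitrary, and on older glibc you need to link with -lrt.

#include <cstring>
#include <fcntl.h>
#include <mqueue.h>

// One process creates/opens the queue and sends; another opens it and calls mq_receive.
int main() {
    mq_attr attr{};
    attr.mq_maxmsg  = 10;    // at most 10 queued messages
    attr.mq_msgsize = 256;   // each up to 256 bytes

    mqd_t mq = mq_open("/example_queue", O_CREAT | O_WRONLY, 0644, &attr);
    if (mq == (mqd_t)-1) return 1;

    const char* msg = "insert:42";
    mq_send(mq, msg, std::strlen(msg) + 1, 0);  // priority 0

    mq_close(mq);
    return 0;
}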

When peers are on different machines we lose the shared environment we had in the previous setups. We have to deal with networking and all its issues: different CPU architectures, higher latencies, and perhaps resource caching coming into play.

As you can see the complexity increases the further we go from our basic unit: a thread.

Naming

We can employ two strategies when dealing with naming our resources:

  • a flat naming scheme
  • a hierarchical naming scheme

Flat naming scheme

With this approach there is no hierarchy within the names. Let’s say we use UUIDs to address resources: no hierarchy can be deduced from them, just a random-looking ID that we have to look up in some database, or maybe a Distributed Hash Table (DHT), in order to know where our resource is located.
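
In the simplest case that mapping is just a table from IDs to locations. Here is a toy sketch (the UUIDs and addresses are invented), with the understanding that in a real system the table would itself be distributed, e.g. as a DHT.

#include <optional>
#include <string>
#include <unordered_map>

// Flat names: opaque IDs mapped to locations, nothing to parse or infer from the name.
std::unordered_map<std::string, std::string> directory = {
    {"9f1c2d3e-0000-4a5b-8c6d-123456789abc", "10.0.0.12:7000"},
    {"1a2b3c4d-1111-4e5f-9a0b-abcdef012345", "10.0.0.47:7000"},
};

std::optional<std::string> locate(const std::string& uuid) {
    auto it = directory.find(uuid);
    if (it == directory.end()) return std::nullopt;
    return it->second;
}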

The flat naming scheme is good for machines, as we can design it to be fast from a processing and look-up time perspective, but it is a pain when users have to deal with it directly. Just imagine how fun it would be to access YouTube using just IP addresses, or a file by its inode number … not much fun.

The gist of it: if the information is to be used by machines only, go flat all the way.

Hierarchical naming scheme

This scheme, as the name implies, allows a form of hierarchy to be encoded in the name. In a file system it is easy to access a file, or some website page, by a path. It is ordered and grouped, and the names make sense, so it feels natural to us humans.

Of course we still have to take care of the name-to-resource mapping. For example, DNS makes use of so-called nameservers, which synchronize between themselves in order to resolve a name to its IP address.
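
For example, resolving a hierarchical name into an address is a single call away on POSIX systems; here is a small sketch using getaddrinfo (the hostname is just an example).

#include <arpa/inet.h>
#include <cstdio>
#include <netdb.h>
#include <netinet/in.h>

// Ask the resolver (and ultimately DNS) for the addresses behind a name.
int main() {
    addrinfo hints{}, *results = nullptr;
    hints.ai_family   = AF_INET;      // IPv4 only, for brevity
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo("example.com", "80", &hints, &results) != 0) return 1;

    for (addrinfo* p = results; p != nullptr; p = p->ai_next) {
        char ip[INET_ADDRSTRLEN];
        auto* addr = reinterpret_cast<sockaddr_in*>(p->ai_addr);
        inet_ntop(AF_INET, &addr->sin_addr, ip, sizeof ip);
        std::printf("%s\n", ip);
    }

    freeaddrinfo(results);
    return 0;
}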

Time

Keeping track of time is one of the most important aspects of distributed systems; from cryptography, to expiring tokens, to databases … timekeeping is everywhere.

Architectures that live on only one system can rely on a monotonic clock provided by the operating system to synchronize themselves. But in a system spread across multiple timezones, ensuring that all the clusters are in sync is of utmost importance. Companies like Meta and Google even have their own atomic clocks; that’s how important timekeeping is.

But if you don’t have access to an atomic clock, don’t worry, there are alternatives. Probably the most important one is NTP, the Network Time Protocol: a hierarchical distributed system where reference devices (GPS receivers, atomic clocks, radio clocks) feed their measured time to multiple layers of computers known as strata. Clients can connect via NTP to these pools and receive time measurements to synchronize their own clocks.
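
On a single machine, by contrast, the monotonic clock mentioned earlier is enough for ordering and timeouts. A quick sketch with std::chrono: steady_clock is the monotonic one, while system_clock is the wall clock that NTP adjusts (and which can therefore jump).

#include <chrono>
#include <cstdio>
#include <ctime>
#include <thread>

int main() {
    // steady_clock never goes backwards, so it is safe for timeouts and ordering
    auto start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    auto elapsed = std::chrono::steady_clock::now() - start;

    std::printf("elapsed: %lld ms\n",
        static_cast<long long>(
            std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count()));

    // system_clock is the wall clock: comparable across machines only if synchronized
    auto now = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
    std::printf("wall clock: %s", std::ctime(&now));
    return 0;
}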

Coordination

Election

Distributed systems sometimes must coordinate themselves, and this usually involves a process that is elected to coordinate all the other processes, as in the case of distributed commit. One of the easiest algorithms to understand is the Bully algorithm. Another interesting one is Raft, which is widely used in production.

The idea is that we use election algorithms to let a dynamic system choose a coordinator from a bunch of nodes. Usually there is one coordinator in a cluster and the other nodes act as followers, which the coordinator instructs what to do and when to do it. An election algorithm should perform well even in the presence of failure. To achieve that, we can have the coordinator ping the followers regularly; when a follower stops receiving those pings, it can start a new election.
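
A very rough sketch of that failure-detection part; the timeout value and the node bookkeeping here are invented for illustration, and the election itself is only stubbed out.

#include <chrono>

using Clock = std::chrono::steady_clock;

// Follower-side failure detection: if the coordinator's pings stop arriving,
// assume it is down and kick off a new election.
struct Follower {
    Clock::time_point last_ping = Clock::now();
    std::chrono::milliseconds timeout{500};  // arbitrary threshold for this sketch

    void on_ping_from_coordinator() { last_ping = Clock::now(); }

    bool coordinator_suspected_down() const {
        return Clock::now() - last_ping > timeout;
    }

    void tick() {
        if (coordinator_suspected_down())
            start_election();
    }

    void start_election() {
        // e.g. contact the nodes with higher IDs, as in the Bully algorithm
    }
};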

Security

I’ll keep this short: don’t roll your own crypto software; it’s too easy to mess things up, and even projects like OpenSSL sometimes get it wrong. If your system has to deal with sensitive data, make sure it is encrypted both at rest on the storage medium and while in transit. Make use of certificates to prevent forgery. Stay up to date on vulnerabilities, as the library you’re using might be compromised.

Conclusion

I want to emphasize that I did not touch on all possible topics related to distributed systems; it is simply too large a subject to cover in a few pages. The subjects touched on here are some of the roadblocks I had to think about when dealing with a distributed system. To develop a deeper understanding, I highly recommend browsing through the Distributed Systems book and arXiv’s Distributed, Parallel, and Cluster Computing section.

Cheers!