This chapter discusses the purpose of event streams, how to process them, and some key differences between them and batch processing. Message brokers process event streams, and there are two […]
Category: Designing Data-Intensive Applications
Chapter 10: Batch Processing
This chapter starts by discussing Unix tools and pipes and how they are composed to solve larger problems. Later, the same principles are applied to explain how MapReduce and other […]
Chapter 9: Consistency and Consensus
This chapter, as the title says, covers both consistency and consensus from different angles. The first concept it introduces is linearizability, a consistency model that tries to […]
Chapter 8: The Trouble with Distributed Systems
In distributed systems, things will go wrong, and systems engineers should build mechanisms that prevent a failure in one component from affecting the availability of the whole system. We […]
Chapter 7: Transactions
Many things can go wrong with systems, such as: the software or hardware might fail in the middle of an operation; network failures can cause unexpected cutoffs in the […]
Chapter 6: Partitioning
Partitioning, or sharding, is the technique of splitting data across many database instances. It is mainly intended to reduce the load on single nodes, so the developer must ensure […]
Chapter 5: Replication
As mentioned in the introduction of Part II, the reasons to replicate data are: reducing latency by keeping data close to users; allowing users to continue their work even if […]
Part II: Distributed Data
This part of the book talks about data that lives on multiple nodes. An application usually needs to be distributed because of certain factors, such […]
Chapter 4: Encoding and Evolution
One thing is sure: the application will change over time, and the underlying data schema certainly will too. Changing the data schema, sometimes with an ALTER TABLE, or adding a new […]
Chapter 3: Storage and Retrieval
Databases, at the most fundamental level, should do two things well with your data: store it and retrieve it later. The simplest database possible, considering a key-value one, […]