Wednesday, September 6, 2017

Book: Designing Data-Intensive Applications

 

I read this during a long flight. It is an interesting review of the design decisions behind applications that deal with large (very large!) amounts of data.

Martin Kleppmann, Designing Data-Intensive Applications, 2017.

Contents:

Foundations of Data Systems
Chapter 1 Reliable, Scalable, and Maintainable Applications
    Thinking About Data Systems
    Reliability
    Scalability
    Maintainability
Chapter 2 Data Models and Query Languages
    Relational Model Versus Document Model
    Query Languages for Data
    Graph-Like Data Models
Chapter 3 Storage and Retrieval
    Data Structures That Power Your Database
    Transaction Processing or Analytics?
    Column-Oriented Storage
Chapter 4 Encoding and Evolution
    Formats for Encoding Data
    Modes of Dataflow
Distributed Data
Chapter 5 Replication
    Leaders and Followers
    Problems with Replication Lag
    Multi-Leader Replication
    Leaderless Replication
Chapter 6 Partitioning
    Partitioning and Replication
    Partitioning of Key-Value Data
    Partitioning and Secondary Indexes
    Rebalancing Partitions
    Request Routing
Chapter 7 Transactions
    The Slippery Concept of a Transaction
    Weak Isolation Levels
    Serializability
Chapter 8 The Trouble with Distributed Systems
    Faults and Partial Failures
    Unreliable Networks
    Unreliable Clocks
    Knowledge, Truth, and Lies
Chapter 9 Consistency and Consensus
    Consistency Guarantees
    Linearizability
    Ordering Guarantees
    Distributed Transactions and Consensus
Derived Data
Chapter 10 Batch Processing
    Batch Processing with Unix Tools
    MapReduce and Distributed Filesystems
    Beyond MapReduce
Chapter 11 Stream Processing
    Transmitting Event Streams
    Databases and Streams
    Processing Streams
Chapter 12 The Future of Data Systems
    Data Integration
    Unbundling Databases
    Aiming for Correctness
    Doing the Right Thing