Distributed Systems in One Lesson

    • Welcome  (Free)

      In this segment, I explain what you can expect from this video course.
    • 00:10:41

    • Read Replication  (Free)

      At scale, a single database server will always fall down. We can begin scaling the system by replicating data to subordinate servers that share the load.
    • 00:04:57

    • Sharding

      Read replication helps take load off the master, but what happens when a single master can't accommodate write traffic anymore? We slice up the data and create many masters.
    • 00:08:45

    • Consistent Hashing

      When we know we're building a distributed database, approaches like read replication and sharding begin to seem like hacks. Consistent hashing provides a more elegant solution to the problem of distributed data.
    • 00:19:27

    • CAP Theorem

      A theorem proved only in 2002 explains why we can't have it all when designing distributed data stores.
    • 00:12:08

    • Distributed Transactions

      Single-master databases commonly offer robust transactional capabilities, but these are hard to scale to distributed systems. Let's review ACID transactions and see why they are difficult to distribute.
    • 00:16:54

    • Distributed Computation Introduction

      Once we've distributed our data among many independent computers, we will want to perform computations over that data. In this segment, we review the common approaches to the task.
    • 00:04:31

    • MapReduce

      MapReduce has ruled distributed computation since around 2005. In this segment, we look at it as an abstract computation pattern apart from any specific implementation.
    • 00:12:24

    • Hadoop

      Hadoop has been a tremendously successful implementation of the MapReduce pattern, all but spawning the Big Data industry. This segment is a lightning overview of core Hadoop.
    • 00:22:53

    • Spark

      Spark is a relative newcomer to the world of distributed computing, but the present-day software architect working with Big Data can't afford not to know it. It presents a more flexible programming model and has fewer opinions about storage than Hadoop.
    • 00:18:22

    • Storm

      Hadoop and Spark are great at running computations over stationary data, but what about events? Storm is optimized to perform real-time computations over a stream of event data.
    • 00:16:27

    • Lambda Architecture

      We can't build a single system that meets all possible demands for completeness and latency, so why not build two systems and ask them to process the same event stream? This is the Lambda Architecture.
    • 00:08:35

    • Synchronization

      By definition, distributed systems lack a single clock. Therefore, the concept of "the same time" becomes a serious challenge. We look at several approaches to this problem.
    • 00:04:19

    • Network Time Protocol

      The Network Time Protocol is a distributed time-keeping protocol and set of worldwide infrastructure components. It is a ubiquitous component in desktop and server operating systems alike, and provides "good enough" synchronization for most cases.
    • 00:08:09

    • Vector Clocks

      When "good enough" synchronization isn't good enough, vector clocks offer a formally valid means of establishing the sequence of a series of distributed events.
    • 00:12:26

    • Distributed Consensus: Paxos

      Getting a distributed system to agree on a new proposal is a nontrivial problem, and it is formally impossible to guarantee in an asynchronous system where even one node may fail, which describes almost all practical cases. Nevertheless, the Paxos algorithm serves as the backbone of distributed consensus.
    • 00:16:32

    • Messaging Introduction

      Messaging is a time-honored integration pattern that allows architects to loosely couple events in one subsystem to another subsystem. It helps reduce the impact of subsystem unavailability and of differences in computational cost between producing and consuming messages.
    • 00:06:24

    • Kafka

      Apache Kafka is a messaging system designed from the ground up as a distributed system. This brief overview will show you Kafka's basics, from internal architecture to programming model.
    • 00:16:13

    • ZooKeeper  (Free)

      ZooKeeper is a distributed in-memory database that sits at the heart of many open-source distributed systems. In this segment, we'll look at its data model and explore some common use cases.
    • 00:18:56

    • Wrap-Up

      A review of the material we've covered in the rest of the course.
    • 00:06:12
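
As a small illustration of the idea named in the Vector Clocks segment above, here is a minimal Python sketch of how vector clocks can order distributed events. It is not taken from the course: the function names (tick, merge, happened_before, concurrent) and the node names "register" and "barista" are hypothetical, chosen only to echo the coffee shop theme.

```python
from typing import Dict

# A vector clock maps each node name to the number of events seen from it.
Clock = Dict[str, int]


def tick(clock: Clock, node: str) -> Clock:
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(local: Clock, received: Clock, node: str) -> Clock:
    """On message receipt: take the element-wise maximum, then tick locally."""
    nodes = set(local) | set(received)
    merged = {n: max(local.get(n, 0), received.get(n, 0)) for n in nodes}
    return tick(merged, node)


def happened_before(a: Clock, b: Clock) -> bool:
    """True if a is <= b on every node and strictly less on at least one."""
    nodes = set(a) | set(b)
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    strictly_less = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and strictly_less


def concurrent(a: Clock, b: Clock) -> bool:
    """True if the clocks differ and neither event precedes the other."""
    nodes = set(a) | set(b)
    differs = any(a.get(n, 0) != b.get(n, 0) for n in nodes)
    return differs and not happened_before(a, b) and not happened_before(b, a)


# Hypothetical scenario: the register takes an order and sends it to the
# barista, then records a refund without hearing back from the barista.
order = tick({}, "register")
made = merge({}, order, "barista")
refund = tick(order, "register")
```

In this scenario, happened_before(order, made) holds because the barista received the order before making the drink, while concurrent(made, refund) holds because neither node had seen the other's later event. A wall clock could only guess at that second relationship; the vector clock states it outright.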

  • Publisher: O'Reilly Media
  • Released: June 2015
  • Run time: 4 hours 8 minutes

Simple tasks like running a program or storing and retrieving data become much more complicated when you do them on collections of computers, rather than single machines. Distributed systems have become a key architectural construct, but they affect everything a program would normally do.

Using a series of examples taken from a fictional coffee shop operation, this video course with Tim Berglund helps you explore five key areas of distributed systems: storage, computation, timing, communication, and consensus. You’ll also learn about some distributed programming paradigms.

If you’re an experienced developer looking to sharpen your architectural skills—particularly with regard to big data—this is one course you shouldn’t miss.

  • Dive into the five main problem areas in distributed systems: storage, computation, messaging, timing, and consensus
  • Understand key challenges that emerge in each of these areas as you move from a single-processor to a distributed architecture
  • Discover one or more common open-source products that address each problem area

Tim Berglund is a full-stack generalist and passionate teacher who loves coding, presenting, and working with people. He’s the founder and principal software developer at August Technology Group, a technology consulting firm focused on the JVM. Tim is an international speaker and co-presenter of the bestselling McCullough and Berglund on Mastering Git (O’Reilly).

About the O’Reilly Software Architecture Series

Clearing a path from developer to architect and enriching that path once you arrive. Software architecture is a fast-moving, multidisciplinary subject where entire suites of "best practices" become obsolete practically overnight. No single path or curriculum exists, and different types of architecture—application, integration, enterprise—require different subject emphasis. Whether you’re at the outset of a career as an architect or in the midst of such a career, series editor Neal Ford has curated this collection of tools and guides for aspiring and seasoned architects alike.