Scalable Machine Learning

      • Some Machine Learning Background  (Free)

        We will go through the general concepts behind machine learning, such as fitting a model to data, introduce some basic models like logistic regression and SVMs, and discuss which kinds of algorithms are typically used for "small scale" problems.
      • 00:12:29

      • Algorithms for Large Scale Learning  (Free)

        We will review common approaches to learning in a large-scale setting. We mostly focus on supervised learning and will discuss algorithms like stochastic gradient descent and higher-order methods. This segment should give you a general idea of the different approaches that exist and what their main ideas are.
      • 00:20:09

      • Overview of Hadoop and Current Big Data Systems

        We will cover the basic architecture of Hadoop-based systems and newer data flow based systems, and finish with some comments on real-time and streaming architecture patterns such as the lambda and kappa architectures.
      • 00:14:00

      • How Programming for Data Flow Differs

        We will take a look at how algorithms have to be implemented to work well with data flow systems like Spark, and contrast that with programming in more conventional environments, where one has variables, ordered arrays, and so on. We will focus on tasks like implementing machine learning algorithms, which frequently deal with vector and matrix data.
      • 00:16:11

      • Basic Spark

        In this segment, we will take some first steps in Spark, working with Resilient Distributed Datasets (RDDs). This segment also introduces some Scala concepts for people who are new to the language.
      • 00:19:13

      • Working with Vectors and Matrices in Spark

        We will discuss the options for dealing with vector data in Spark and live-code a simple implementation of least squares regression. Here we will also go beyond the console, set up a simple sbt-based Scala project, and do some coding in the IDE.
      • 00:34:53

      • A Brief Tour of Spark ML

        We will take a look at the machine learning libraries that come with Spark, show their basic structure, and show how to run basic machine learning tasks. This segment will cover both the older mllib and the newer ml part of the library.
      • 00:29:40

      • Approximation is the Key

        We discuss how approximation is important not only in an optimization context but in many other areas as well, for example in compressing feature spaces or in approximate counting.
      • 00:15:34

      • Practical Big Data

        We will discuss aspects that apply to data analysis in general, and to big data in particular, but that are often not covered explicitly when the focus is on methods. In particular, we will cover the role of evaluation, feature extraction, and model selection computing costs.
      • 00:06:56

      • Size vs. Complexity

        We will close the video with some general considerations on the relationship between the amount of available data and the complexity of the learning problem. As it turns out, having a lot of data often doesn't mean that all of it is strictly necessary. We will also discuss what makes data complex.
      • 00:05:06

      • Summary

        We will go through the main lessons from the video one more time, and give some hints on what to look into to get further into the topic.
      • 00:02:53

  • Publisher: O'Reilly Media
  • Released: December 2015
  • Run time: 3 hours 8 minutes

Machine-learning expert Mikio Braun moves budding data scientists into the world of big data with this overview of how to do complex data analysis at scale. You'll learn the general concepts behind machine learning, compare small-scale and large-scale data analysis algorithms, and review the basics of the architectures used in large-scale distributed processing. You'll then explore the use of Spark programming for data flow systems, and the many uses of approximation. Braun also outlines evaluation, feature extraction, and model-selection computing costs in big data analysis. The video closes with a discussion of the relationship between the amount of available data and the complexity of the learning problem.

  • Review machine learning concepts such as fitting a model to data
  • Learn core concepts behind large-scale algorithms like stochastic gradient descent
  • Review the architectures used in Hadoop-based systems and data flow systems
  • Explore resilient distributed dataset structures, vectors, and matrices using Spark
  • Review Spark’s machine learning libraries and how to run basic machine learning tasks
  • Understand the use of approximation in optimization and compressing feature spaces
  • Learn what makes data “complex”

Mikio Braun is a data science researcher, a start-up entrepreneur, and the creator and ongoing developer of jblas, the open source library for fast linear algebra in Java. He holds a Ph.D. in computer science and works at Zalando.