Strata + Hadoop World 2016 - Singapore

      • A smarter ecosystem through big data analytics - Wei Keong Ng (Fusionex)

        Elevating the intelligence of the whole ecosystem of things is imperative to ensure the proliferation of IoT devices doesn't result in disparate, detached machinery. Wei Keong Ng explains how Fusionex intends to achieve this by leveraging robust, intuitive BDA solutions with real-time capabilities.
      • 00:12:38

      • Information at the speed of thought - Prakash Nanduri (Paxata)

        Today, we are able to collect, manage, and query data with very advanced data processing frameworks like Apache Hadoop in both on-premises and cloud deployments, yet turning data into trustworthy information is one of the toughest challenges facing businesses. Prakash Nanduri explains how to deal with the challenge of data swamps of #deplorable data.
      • 00:03:44

      • Real-time intelligence gives Uber the edge - M. C. Srivas (Uber)

        M. C. Srivas covers the technologies underpinning the big data architecture at Uber and explores some of the real-time problems Uber needs to solve to make ride sharing as smooth and ubiquitous as running water, explaining how they are related to real-time big data analytics.
      • 00:17:53

      • Taking personalization personally - Sara Watson (Tow Center for Digital Journalism)

        Most consumer-facing personalization today is rudimentary and coarsely targeted at best, and designers don’t give users cues for how they are meant to interact with and interpret personalized experiences and interfaces. Sara Watson makes the case for personalization signals that give context to personalization and expose levers of control to users.
      • 00:24:00

      • The ACID revolution - Vijay Narayanan (Microsoft)

        Vijay Narayanan explains how rapid advances in algorithms, the cloud, the Internet of Things, and data are driving unimaginable breakthroughs in every human endeavor, across agriculture, healthcare, education, travel, smart nations, and more.
      • 00:12:11

      • The new dynamics of big data - Amr Awadallah (Cloudera, Inc.)

        Since their inception, big data solutions have been best known for their ability to master the complexity of the volume, variety, and velocity of data. But as we enter the era of data democratization, there’s a new set of concerns to consider. Amr Awadallah discusses the new dynamics of big data and explains how a renewed approach focused on where, who, and why can lead to cutting-edge solutions.
      • 00:08:50

      • Dealing with device data - Mark Madsen (Third Nature)

        In 2007, a computer game company decided to jump ahead of competitors by capturing and using data created during online gaming, but it wasn't prepared to deal with the data management and process challenges stemming from distributed devices creating data. Mark Madsen shares a case study that explores the oversights, failures, and lessons the company learned along its journey.
      • 00:42:08

      • Act on insight with the IoT - Devin Deen (enterprise IT) and Dnyanesh Prabhu (SKY TV NZ)

        Embedding operational analytics with the IoT enables organizations to act on insights in real time. Devin Deen and Dnyanesh Prabhu walk you through examples from Sky TV and NZ Bus—two businesses that iteratively developed their analytic capabilities integrating the IoT on Hadoop, allowing people and process changes to keep pace with technical enablement.
      • 00:39:59

      • Evolving from RDBMS to NoSQL + SQL - Jim Scott (MapR Technologies, Inc.)

        Everyone is talking about data lakes. The intended use of a data lake is as a central storage facility for performing analytics. But, Jim Scott asks, why have a separate data lake when your entire (or most of your) infrastructure can run directly on top of your storage, minimizing or eliminating the need for data movement, separate processes and clusters, and ETL?
      • 00:32:08

      • What's your data worth? - John Akred (Silicon Valley Data Science)

        The unique properties of data make it difficult to assess its overall value using traditional valuation approaches. John Akred discusses a number of alternative approaches to valuing data within an organization for specific purposes so that you can optimize decisions around its acquisition and management.
      • 00:44:13

      • Case studies of business transformation through big data - John Kreisa (Hortonworks)

        The opportunity to harness data to impact business is ripe, and as a result, every industry, every organization, and every department is going through a huge change, whether they realize it or not. John Kreisa shares use cases from across Asia and Europe of businesses that are successfully leveraging new platform technologies to transform their organizations using data.
      • 00:40:08

      • The fallacy of the subject-matter expert - Chris Neumann (The Engineer & The Designer)

        For decades, business intelligence companies have strived to make their products easier to use in the hope that they could finally reach the mythical subject-matter expert—that wondrous individual who would change the course of the company if only she had access to the data she needed. Drawing on his real-world experience, Chris Neumann asks, "What if the subject-matter expert doesn’t exist?"
      • 00:26:59

      • Fast deep learning at your fingertips - Nir Lotan (Intel)

        Nir Lotan describes a new, free software tool based on existing deep learning frameworks that enables the fast and easy creation of deep learning models and incorporates extensive optimizations that provide high performance on standard CPUs.
      • 00:32:21

      • Web-scale machine learning on Apache Spark - Jason (Jinquan) Dai (Intel)

        Jason Dai shares his experience building web-scale machine learning using Apache Spark—focusing specifically on "war stories" (e.g., in-game purchases, fraud detection, and deep learning)—outlines best practices for scaling these learning algorithms, and discusses trade-offs in designing learning systems for the Spark framework.
      • 00:32:20

      • A survey of time series analysis techniques for sensor data - Rajesh Sampathkumar (The Data Team)

        One challenge in analyzing manufacturing sensor data is formulating an efficient model of the underlying physical system. Rajesh Sampathkumar shares his experience working with sensor data at scale to model a real-world manufacturing subsystem with simple techniques, such as moving average analysis, and advanced ones, like vector autoregression (VAR), applied to the problem of predictive maintenance (a brief illustrative sketch follows this entry).
      • 00:41:46
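
        To make the moving-average idea concrete, here is a minimal sketch in Scala (not code from the session) that flags readings deviating sharply from a simple moving average, one naive starting point for predictive maintenance. The sensor values and the threshold are assumptions made up for illustration.

            // Hypothetical sensor readings; values and threshold are assumptions, not session data.
            val readings: Vector[Double] = Vector(20.1, 20.3, 20.2, 27.8, 20.4, 20.5, 20.6)
            val window = 3
            val threshold = 3.0

            // Simple moving average over a sliding window.
            val movingAvg = readings.sliding(window).map(w => w.sum / window).toVector

            // Flag readings that deviate sharply from the average of the preceding window.
            val anomalies = readings.drop(window).zip(movingAvg).zipWithIndex.collect {
              case ((value, avg), i) if math.abs(value - avg) > threshold => (i + window, value)
            }
            println(anomalies) // Vector((3, 27.8)): the spike at index 3 stands out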

      • Deep learning at scale - Mateusz Dymczyk (H2O.ai)

        Deep learning has made a huge impact on predictive analytics and is here to stay, so you'd better get up to speed with the neural net craze. Mateusz Dymczyk explains why all the top companies are using deep learning, what it's all about, and how you can start experimenting and implementing deep learning solutions in your business in only a few easy steps.
      • 00:41:31

      • Robust stream processing with Apache Flink - Aljoscha Krettek (data Artisans)

        Aljoscha Krettek offers a very short introduction to stream processing before diving into writing code and demonstrating the features in Apache Flink that make truly robust stream processing possible. All of this will be done in the context of a real-time analytics application that we'll be modifying on the fly based on the topics we're working through.
      • 00:42:43

      • Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 1

        Apache Spark is written in Scala. Hence, many—if not most—data engineers adopting Spark are also adopting Scala, while most data scientists continue to use Python and R. Dean Wampler offers an overview of the core features of Scala you need to use Spark effectively, using hands-on exercises with the Spark APIs.
      • 00:33:27

      • Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 2

        Apache Spark is written in Scala. Hence, many—if not most—data engineers adopting Spark are also adopting Scala, while most data scientists continue to use Python and R. Dean Wampler offers an overview of the core features of Scala you need to use Spark effectively, using hands-on exercises with the Spark APIs.
      • 00:52:24

      • Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 3

        Apache Spark is written in Scala. Hence, many—if not most—data engineers adopting Spark are also adopting Scala, while most data scientists continue to use Python and R. Dean Wampler offers an overview of the core features of Scala you need to use Spark effectively, using hands-on exercises with the Spark APIs.
      • 00:36:18

      • Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 4

        Apache Spark is written in Scala. Hence, many—if not most—data engineers adopting Spark are also adopting Scala, while most data scientists continue to use Python and R. Dean Wampler offers an overview of the core features of Scala you need to use Spark effectively, using hands-on exercises with the Spark APIs (an illustrative Dataset sketch follows this entry).
      • 00:37:51
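
        As a rough illustration of the Scala-with-Spark style these sessions cover (a sketch under stated assumptions, not code from the tutorial), the following snippet pairs a case class with the typed Dataset API. The Ride schema, the rides.json path, and the application name are hypothetical.

            import org.apache.spark.sql.SparkSession

            // Hypothetical schema and input path, for illustration only.
            case class Ride(city: String, fare: Double)

            object ScalaForSparkSketch {
              def main(args: Array[String]): Unit = {
                val spark = SparkSession.builder.appName("ScalaForSparkSketch").getOrCreate()
                import spark.implicits._

                // A case class lets Spark derive an encoder, giving a typed Dataset instead of untyped Rows.
                val rides = spark.read.json("rides.json").as[Ride]

                // Functional-style filtering plus SQL-like aggregation.
                val avgFares = rides.filter(_.fare > 0).groupBy($"city").avg("fare")
                avgFares.show()

                spark.stop()
              }
            }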

      • The business case for Spark, Kafka, and friends - John Akred (Silicon Valley Data Science)

        Spark is white-hot, but why does it matter? Some technologies cause more excitement than others, and at first the only people who understand why are the developers who use them. John Akred offers a tour through the hottest emerging data technologies of 2016 and explains why they’re exciting, in the context of the new capabilities and economies they bring.
      • 00:42:18

      • Scala and the JVM as a big data platform: Lessons from Apache Spark - Dean Wampler (Lightbend)

        The success of Apache Spark is bringing developers to Scala. For big data, the JVM uses memory inefficiently, causing significant GC challenges. Spark's Project Tungsten fixes these problems with custom data layouts and code generation. Dean Wampler gives an overview of Spark, explaining ongoing improvements and what we should do to improve Scala and the JVM for big data.
      • 00:44:07

      • Apache Beam: A unified model for batch and streaming data processing - Dan Halperin (Google)

        Apache Beam (incubating) defines a new data processing programming model evolved from more than a decade of experience building big data infrastructure within Google. Beam pipelines are portable across open source and private cloud runtimes. Dan Halperin covers the basics of Apache Beam—its evolution, main concepts in the programming model, and how it compares to similar systems.
      • 00:42:24

      • Organizing the data lake - Mark Madsen (Third Nature)

        Building a data lake involves more than installing and using Hadoop. The goal in most organizations is to build multiuse data infrastructure that is not subject to past constraints. Mark Madsen discusses hidden design assumptions, reviews design principles to apply when building multiuse data infrastructure, and provides a reference architecture.
      • 00:44:50

      • Next-generation data governance - Clara Fletcher (Accenture)

        Implementing a data governance strategy that is agile enough to take on the new technical challenges of big data while being robust enough to meet corporate standards is a huge, emerging challenge. Clara Fletcher explores what next-generation data governance will look like and what the trends will be in this space.
      • 00:26:32

      • Modern telecom analytics with streaming data - Ted Dunning (MapR Technologies)

        Modern telecommunications systems are an alphabet soup that produces massive amounts of diagnostic data. Ted Dunning offers an overview of a real-time, low-fidelity simulation of the edge protocols of such a system to help illustrate how modern big data tools can be used for telecom analytics. Ted demos the system and shows how several tools can produce useful analytical results and system understanding.
      • 00:38:37

      • Evolution of big data analytics - KC Wong (Fusionex)

        Organizations used to store information in separate silos. As a result, searching for the data you needed was a difficult affair. KC Wong explores big data analytics (BDA) platforms that can produce what you need in a much shorter timeframe and are even intelligent enough to present exactly what you need for greater efficiency, productivity, and profits.
      • 00:28:52

      • A stream-first approach to drive real-time applications - Ted Dunning (MapR Technologies)

        Ted Dunning explains how a stream-first approach simplifies and speeds development of applications, resulting in real-time applications that have significant impact. Along the way, Ted contrasts a stream-first approach with existing approaches that start with an application that dictates specialized data structures, ETL activities, data silos, and processing delays.
      • 00:38:45

      • Stopping your data lake from becoming a swamp - Steve Jones (Capgemini)

        Garbage in, garbage out—this truism has become significantly more impactful for big data as companies have moved away from traditional schema-based approaches to more flexible and dynamic file system approaches. Steve Jones explains how to add governance, schema evolution, and the industrialization required to deliver true enterprise-grade big data solutions.
      • 00:42:21

      • Evolving beyond the data lake - Jim Scott (MapR Technologies, Inc.)

        Everyone is talking about data lakes. The intended use of a data lake is as a central storage facility for performing analytics. But, Jim Scott asks, why have a separate data lake when your entire (or most of your) infrastructure can run directly on top of your storage, minimizing or eliminating the need for data movement, separate processes and clusters, and ETL?
      • 00:42:42

      • OpenStreetMap for urban resilience - Yantisa Akhadi (Humanitarian OpenStreetMap Team)

        Maps are clearly vital in disaster response. Yantisa Akhadi explores how to use OpenStreetMap (OSM), the biggest crowdsourced mapping platform, for safer urban environments, drawing on case studies from several major cities in Indonesia where citizen and government mapping has played a major role in improving resilience.
      • 00:30:26

      • Government open data: Tales from a deep dive into CKAN - Audrey Lobo-Pulo (Phoensight)

        In early 2016, a team set out to score the usability of government open data across 5 countries. What was to be a small-scale project giving a data-driven picture of the supply side of open data grew into a lengthy, all-consuming quest to decipher the depths of government CKAN repositories. Audrey Lobo-Pulo shares the team's findings and explores the future possibilities of open data.
      • 00:29:19

      • Fast cars, big data: The Internet of Formula 1 Things - Asit Parija (MapR Technologies)

        Modern cars produce data. Lots of data. And Formula 1 cars produce more than their fair share. Asit Parija presents a demo of how data streaming can be applied to the analytics problems posed by modern motorsports. Although he won't be bringing Formula 1 cars to the talk, Asit demonstrates a physics-based simulator to analyze realistic data from simulated cars.
      • 00:30:26

Strata + Hadoop World 2016 - Singapore

  • Publisher: O'Reilly Media
  • Released: December 2016
  • Run time: 62 hours 47 minutes

The 2016 Strata + Hadoop World conference in Singapore was an amazing rojak experience in the best sense of the term. It provided a wide-ranging collection of the globe’s most insightful information about big data and machine learning and how these technologies are reshaping the world’s businesses, institutions, and governments. This video compilation gives you a complete recording of each of the conference’s 73 sessions, 8 tutorials, and 10 keynotes covering topics like big data in retail, finance, and telecommunications; real-time IoT analytics; recommendation algorithms; high-efficiency AI and ML distributed systems; ubiquitous computing; collaboration; peer analytics; Apache Beam; Apache Flink; structured streaming; and much, much more. Get this video and you’ll enjoy the opportunity to learn from 122 of the world's best data engineers and data scientists working in Asia and elsewhere at data-centric companies such as Singtel, ShopBack, IHI, Lazada, Mediacorp, Qunar, Cloudera, MapR, Microsoft, Cisco, Teradata, BT Group, Google, IBM, Qubole, StarHub, SKY TV, Capgemini, and NTT Data. So Singapore, so rojak!

  • Gain total access to all 73 sessions, 8 tutorials, and 10 keynotes: nearly 63 hours of material
  • John Akred (Silicon Valley Data Science) on how to develop a modern enterprise data strategy
  • Qiaoliang Xiang (ShopBack) on handling 25M e-commerce products with Hadoop-related tools
  • Dean Wampler (Lightbend) on the core features of Scala necessary to write Spark code
  • Rebecca Tien Yu Lin (is-land Systems) on big data solutions in the semiconductor industry
  • Haoyuan Li (Alluxio Co-Creator) on Alluxio use cases at Alibaba, Baidu, and elsewhere
  • Sean Owen (Cloudera) on doing full Python development on the Hadoop stack at Hadoop scale
  • Vivian Peng (Médecins Sans Frontières) on designing human emotion into data visualizations
  • Get 24 sessions on becoming a data science company, data science technology, and data analytics
  • Get 12 sessions related to Apache Spark, including the highly popular 8-hour Spark Camp tutorial
  • Get 10 sessions on IoT and intelligent real-time applications; and 9 sessions on ML and AI
  • Get multiple sessions on Hadoop use cases; VR and visualization; and security and data law