Strata Conference New York 2011: Video Compilation

      • Jumpstart - Opening Remarks: The Harsh Light of Data - Alistair Croll  (Free)

        The business of running businesses is getting a wake-up call. From flattened hierarchies to an explosion of information, Big Data is reinforcing some old business assumptions and upending others. If you've got a business degree, and you haven't figured out how to apply Big Data to administration, you're vulnerable. In this opening session, Strata chair Alistair Croll sets the stage for JumpStart with an overview of business in a data-driven economy.
      • 00:12:01

      • Jumpstart - Big Data, Big Legal Impact - Nolan Goldberg  (Free)

        Emerging technologies such as big data can change both the scope and practical impact of the law. For example, a court in California recently held for the first time that a zip code was personally identifiable information within the meaning of the state's Song-Beverly Credit Card Act. Prior case law treated zip codes as identifying only groups of people at best; zip codes were therefore not personally identifiable information, and requesting them at the point of sale did not appear to be prohibited. The California court, however, was forced to depart from precedent after acknowledging that easily accessible de-anonymization technologies now enabled retailers to use zip codes to determine exact home addresses. Accordingly, the well-established practice of asking for zip codes at points of purchase needed to stop (and a flood of new lawsuits followed). Putting the above example aside, Big Data technologies have the potential to require changes in important areas of the law, such as intellectual property, e-discovery, data ownership and privacy. This session will explore both the potential impact of big data on the law generally and some of the specific legal issues of concern when monetizing big data.
      • 00:28:10

      • Jumpstart - What Kinds of People and Processes are Needed for Data Management and Analytics? - Cathy O'Neil  (Free)

        Data management teams need strong cloud computing and database management skills, and proficiency with tools like Hadoop, MapReduce jobs and SQL queries. The analysts need to be deep thinkers and creative modelers with experience in machine learning and financial modeling, ideally both. Being model- and data-driven means overnight data-crunching to produce daily data reports, which often lead to more overnight questions. There are also inherent difficulties in talking to clients and internal stakeholders when inherently unstable data and statistics are key tools for decision-making. From a process standpoint, we need to start asking the new kinds of questions that Big Data is opening up for the first time. Speaker Cathy O'Neil will use her unique experience in finance, which is the field that is the most developed in terms of modeling, to explain how she sees today's business world as relatively unsophisticated and 'spoiled for data.' She'll explain various techniques that financial analysts employ to improve models, and reconsider the practice of A/B testing in a model-driven world.
      • 00:37:51

      • Jumpstart - Big Data, Stupid Decisions: The Importance of Measuring the Right Thing - Panagiotis Ipeirotis  (Free)

        The abundance of data has made it easy to collect and analyze data from online user behavior and discussions. Although it is encouraging to see businesses using such data and making data-driven decisions, we often see decisions based on flawed analyses, or simply on data that measure the wrong things. Panos will illustrate cases where a simple reading of the available data points to one conclusion, while a deeper study uncovers a different truth. Examples will be drawn from online reputation systems, online reviews, crowdsourcing, and other case studies.
      • 00:32:18

      • Jumpstart - Situation Normal, Everything Must Change - Simon Wardley

        Life is all about competition, whether between nation states, organizations or individuals. This competition invokes a process of evolution and whilst the future cannot be predicted with any certainty, patterns are emerging. Two of the latest patterns include the use of ecosystems in the warfare between organizations and how open source is increasingly being deployed as a tactical weapon. This talk will explore these patterns, the underlying process of evolution, the new management practices that are appearing and the importance of big data in this new environment.
      • 00:28:57

      • Jumpstart - Applying Lean Methods to Fat Companies - Hiten Shah

        Hiten Shah is the founder of KISSmetrics, which uses data to help online businesses make better business decisions. A recognized authority on data-driven marketing and entrepreneurship, he founded marketing consultancy ACS and Crazy Egg, an analytics tool that visualizes users' experience on a website. Now with KISSmetrics he is building a data-driven solution to help online businesses make better business decisions. Hiten advises a variety of startups, including iSocket, LaunchRock, Lolapps, Recurly, SlideShare, SocializedHR and Sponge.
      • 00:35:28

      • Jumpstart - NBA (Next Best Action) for MBAs - James Kobielus

        Customers are the reason you're in business. To earn their continued love and loyalty, you must strive to enhance their experiences across all touchpoints. To earn a larger share of their pocketbooks, you must also target them with offers that suit their specific profiles, needs, and propensities. To these ends, leading-edge organizations have implemented next best action (NBA) technologies, such as Big Data analytics, within their multichannel customer relationship management programs. In this session, Forrester senior analyst James Kobielus will provide a vision, case studies, ROI metrics, and guidance for business professionals evaluating applications of NBA in their organizations.
      • 00:42:47

      • Jumpstart - The New Marketing - Jennifer Zeszut

        Few disciplines in business have changed as much as marketing. The cost of transmitting a message has dropped to zero, and the flow of content has flipped from one-way broadcast to two-way interaction. Consumers expect genuine interactions; marketers need to scale their numbers efficiently while still targeting with laser precision. In this session, Jennifer Zeszut, a 15-year veteran of marketing at brands like Procter and Gamble, eBay, and Cost Plus who honed her tech skills at Razorfish and launched sentiment analysis pioneer Scout Labs, looks at the new marketing in a world of numbers and analysis.
      • 00:21:10

      • Jumpstart - People Analytics: Using Data to Drive HR Strategy and Action - Kathryn Dekas

        Google does HR by the numbers. The company's ambitious Project Oxygen initiative compiled management data across managers and employees, then analyzed and applied its findings to make teams more effective. In this session, Kathryn Dekas, a manager in Google's People Analytics department and Project Manager for Google's annual employee survey, looks at what it means to use data to drive decisions and action in the HR function, and examples of how this process can transform organizations.
      • 00:38:33

      • Jumpstart - Creating Your Transparency Strategy in the Age of Wikileaks - Michael Nelson

        At many companies, the total amount of data processed each year will double in less than two years. And the need to share that data, with employees and with partners, vendors, and customers, is growing. The odds that corporate information will leak are growing rapidly; we've already seen dramatic examples from the US State Department, and a number of visible banks. Most executives' first reaction to such leaks is to lock things down: "Make sure this doesn't happen to me." But at some leading-edge companies, CIOs are realizing that their job is no longer about information technology; rather, it's about how the information itself moves within the organization. By asking "what needs to be protected?" and "what are the upsides of sharing?" they're reconsidering the idea of leakage. There's never been a better time to reconsider transparency and talk about strategic leaking. In this session, Michael Nelson, whose career has taken him from the White House to the boardrooms of the Fortune 500, looks at the naked corporation and what information can do when it flows intentionally between companies and their ecosystems. His new report for the CSC Leading Edge Forum Research examines how radical transparency can be a powerful business tool, with companies sharing the previously unthinkable: salary, pricing, project roadmaps, and more.
      • 00:35:16

      • Jumpstart - Data Science and Building Data Teams - DJ Patil

        Given the importance of data to critical decision making, the demand for Data Scientists is at an all-time high. Yet, what does it really mean to be a Data Scientist and to build a team of Data Scientists? It's a new world, with everyone from startups to well-established organizations struggling daily to recruit and structure their teams to best leverage those who can extract valuable insights from data. Unfortunately, most organizations don't realize all the places where a good Data Science team can make an impact. In this talk we'll walk through different organizational structures, how to empower teams, and the key technologies required to establish a great Data Science team.
      • 00:37:05

      • Jumpstart - Closing Thoughts: The Data Imperative - Alistair Croll

        What does a data MBA do differently? Wrapping up JumpStart, we'll consider where businesses can find the low-hanging fruit, and how corporate culture will change as analytical thinking becomes the norm rather than the exception. As we face the data imperative, we'll look at what it means to run a data-driven organization from the boardroom to the mailroom.
      • 00:01:03

      • Strata Summit - Big Data: The Next Frontier - Michael Chui

        McKinsey's influential Big Data report has helped define and explain the opportunity created by the torrent of data flowing daily through business. Michael Chui outlines the big picture of data innovation, challenges and competitive advantage.
      • 00:25:49

      • Strata Summit - Computing the World - Stephen Wolfram

        How does one take thousands of data domains, and tens of thousands of models and algorithms, and make it so that anyone can get answers to their natural language questions? Stephen Wolfram, the creator of Wolfram|Alpha and Mathematica, will describe how this works in Wolfram|Alpha and what the paradigm of computational knowledge is now making possible.
      • 00:24:35

      • Strata Summit - Lies, Damned Lies, and the Data Scientist - Monica Rogati

        When it comes to big data insights, how do you know you're asking the right questions? Hiring data scientists is a good start; we're seeing their growth both on LinkedIn and at LinkedIn. But even data scientists are not immune from the myriad of hidden pitfalls that keep your key insights out of sight. Drawing from a deceptively simple exercise that I've used to haze dozens of data scientists on their first day, I will discuss the good, the bad and the ugly lessons we've learned about asking the right questions, denominators and being a data skeptic.
      • 00:08:53

      • Strata Summit - Juice Data Viz Contest winner

        O'Reilly Media's Julie Steele showcases the finalists and Best in Show winner from the Strata Vizathlon, a data visualization contest produced in partnership with Juice Analytics. (Note: the contest has already closed.)
      • 00:04:12

      • Strata Summit - Managing IT Operations and Analyzing Big Data With HPCC Systems From LexisNexis - Armando Escalante

        HPCC Systems from LexisNexis Risk Solutions offers a proven, data-intensive supercomputing platform designed for the enterprise to solve big data problems. HPCC Systems offers a consistent data-centric programming language, two processing platforms and a single architecture for efficient processing. Customers such as financial institutions, insurance carriers, insurance companies, law enforcement agencies, the federal government and other enterprise-class organizations leverage the HPCC Systems technology through LexisNexis products and services. This keynote is sponsored by LexisNexis.
      • 00:03:52

      • Strata Summit - UN Global Pulse - Robert Kirkpatrick

        The time has come for policymakers to begin using innovative technologies to analyze data exhaust, in order to protect communities from multiple slow-onset crises that threaten to reverse hard-won progress in human development.
      • 00:10:33

      • Strata Summit - Designing for Human Sensors, Not Human Barcodes - Cory Doctorow

        As the saying goes, "If you're not paying for a product, then you're the product." But the world is a better place when human beings are enlisted as sensors (as distributed mechanisms for mapping, understanding, and connecting the world and the data it generates) and not treated as barcodes, mere bits of data to be read, logged and analyzed. This is a huge blind spot in contemporary network service design, abetted by our own inability to correctly price privacy disclosures. There are tools, services (and yes, even analytics) waiting to be invented and productized which will make our services into something better than Skinner boxes that train us to undervalue our data and our privacy.
      • 00:19:57

      • Strata Summit - Generating Stories from Data - Kristian Hammond

        As the world of data expands, the challenge of understanding it expands as well. Narrative Science is addressing this challenge with a software platform that uses data to drive the generation of compelling narratives that tell the stories contained within it. The technology tells the stories that are hidden in the numbers.
      • 00:09:14

      • Strata Summit - Transparency and Strategic Leaking - Michael Nelson

        There's never been a better time to reconsider transparency and talk about strategic leaking. In this talk, Michael Nelson, whose career has taken him from the White House to the boardrooms of the Fortune 500, looks at the naked corporation and what information can do when it flows intentionally between companies and their ecosystems. His new report for the CSC Leading Edge Forum Research examines how radical transparency can be a powerful business tool, with companies sharing the previously unthinkable: salary, pricing, project roadmaps, and more.
      • 00:10:46

      • Strata Summit - The New Corporate Intelligence - Sean Gourley

        Disruptive technology shapes the world, defining political, military, financial, and commercial opportunities and threats. Whether originating in academic research, in National Labs, or in privately held or public companies, these technologies can emerge with explosive impact, creating and destroying value. Yet there are few tools to track these innovations at a global scale and at a pace that keeps up with the rate of change. What if corporate strategists could literally draw a map to find growth opportunities? A technique called semantic clustering analysis makes this possible. When applied to technology entities worldwide, this analysis can reveal not only which innovation areas are thick with competition, but also where in the market there are opportunities, or white spaces, ripe for innovation. The result is a data-driven visual tool that can be used to drive corporate innovation strategy.
      • 00:11:06

      • Strata Summit - Big Data: Keep it Small, Stupid! - James Kobielus

        Big Data can become an unmanageable business burden if you're not careful. As your company's analytics initiatives rapidly grow, you're going to max out your IT budget if you don't keep the data as compact, compressed, and storage-efficient as possible. Just as critical, your users will find all the information far too massive to wade through if you don't deliver targeted subsets to their tablets, smartphones, and other devices for speedy consumption. In this session, Forrester senior analyst James Kobielus will help you understand how to keep your company's data as small and nimble as practical while scaling it out into the petabytes.
      • 00:21:14

      • Strata Summit - Sponsored Session: LexisNexis HPCC Systems Finds Potential Fraud and Collusion in Health Care - Bill Fox and Jo Prichard

        This case study will focus on the customer challenge, the data pieces leveraged and the results found. HPCC Systems from LexisNexis Risk Solutions offers a proven, data-intensive supercomputing platform designed for the enterprise to solve big data problems. HPCC Systems offers a consistent data-centric programming language, two processing platforms and a single architecture for efficient processing. Customers such as financial institutions, insurance carriers, insurance companies, law enforcement agencies, the federal government and other enterprise-class organizations leverage the HPCC Systems technology through LexisNexis products and services. This session is sponsored by LexisNexis.
      • 00:39:34

      • Strata Summit - Big Data In Banking - Moderated by: Abhishek Mehta - Panelists: Roy E. Lowrance, Richie Prager, and Allen Weinberg

        Coming off the worst financial crisis of our generation, an entire industry (and the global economy) is not just rebuilding the very foundations of a robust banking system, but also rethinking the new normal for banking. Data has rapidly emerged as a key lever for this reboot. Hear from some of the leading business practitioners in the industry: How are tectonic shifts driven by Big Data technologies, emerging business models, and macro socio-economic conditions unleashing an unprecedented redesign of the global financial system? And who are the emerging Big Data players at the forefront of this shift?
      • 00:23:57

      • Strata Summit - Big Data Analytics Is Changing the World, and Your Business - Bill Cook

        The most important question to ask about big data is: what's in it for my organization? Does it create net-new economic value, increase enterprise agility, improve collaboration, or enable new efficiencies? And if so, in which scenarios? What are the implications for people, process and technologies in any given organization? In this keynote, Bill Cook, President of EMC Greenplum, will provide insight into today's business opportunity with big data analytics and three key steps to consider as you begin your journey. This keynote is sponsored by EMC Greenplum.
      • 00:11:39

      • Strata Summit - Atmospheric Analytics - Michael Ferrari

        This is not your father's weather forecast. Businesses across the commercial spectrum can be positively and negatively affected by weather conditions in deep and sometimes unanticipated ways. Whether they are surprise acute weather events (i.e., weather black swans) or prolonged patterns that slowly enhance or curtail product demand, it is hard to find an industry that does not have some sort of operational or financial exposure to the atmosphere. Hear how companies in sectors ranging from banks managing commodity risk to home centers staging seasonal demand-driven products are analyzing weather in different ways to get ahead, and how they can tap into the unexploited possibilities of the treasure trove of government-maintained weather data.
      • 00:17:14

      • Strata Summit - Why Dirty Data Loves A Crowd - Ariel Seidman

        We all need clean, useful, validated data, but most of us don't have an easy way to gather it and the data we do have is messy. Over the last year I've spent my time hanging with crowds: crowds of tech-savvy, mobile-device-enabled, modern-day distributed workers who are changing the way we should think about acquiring and managing data. These workers can convert a masterpiece like Moby Dick into an artful expression by translating it into emoji, or they can walk across the nation validating in-store beer pricing, product placement and restaurant times. Hear some of the numbers behind these crowds and how startups like Gigwalk are changing the way we work.
      • 00:10:24

      • Strata Summit - The Business of Illegal Data: Innovation from the Criminal Underground - Marc Goodman

        While businesses around the world struggle to understand how to profit from the information revolution, one class of enterprise has successfully mastered the challenge: international organized crime. Globally, crime groups are rapidly transforming themselves into consumers of big data. Lessons in how organized crime and terrorists are innovatively consuming both illegal and open source data will be presented.
      • 00:15:18

      • Strata Summit - Towards a Global Brain - Tim O'Reilly

        At the same time as we're seeing breakthrough after breakthrough in artificial intelligence, we're also seeing the fulfillment of the vision of Vannevar Bush, JCR Licklider, and Doug Engelbart that computers could augment human information retrieval and problem solving. AI turned out not to be a matter of developing better algorithms, but of having enough data. The key applications of the web combine machine learning algorithms with techniques for harnessing the collective intelligence of users as captured in massive, interlinked cloud databases. Bit by bit, this is leading us towards a new kind of global brain, in which we have met the AI, and it is us. We and our devices are its senses, our databases are its memory, its habits, and even its dreams. This global brain is still a child, but as its parents, we have a responsibility to think about how best to raise it. What should we be teaching our future augmented selves? How can we make the emerging global consciousness not only more resilient, but more moral?
      • 00:22:38

      • Strata Conference - A Profusion of Exoplanets: NASA's Kepler Mission - Jon Jenkins

        The Kepler Mission began its science observations just over two years ago on May 12, 2009, initiating NASA's first search for Earth-like planets. Initial results and light curves from Kepler are simply breathtaking, including confirmation of the first unquestionable rocky planet, Kepler-10b, and Kepler-11, a system of 6 transiting planets orbiting one Sun-like star. Kepler released light curves for the first 120 days of observations for over 150,000 target stars on February 2, 2011, and announced the identification of over 1235 planetary candidates, including 68 candidates smaller than 1.25 Earth radii, and 54 candidates in or near the habitable zone of their parent star. An astounding 408 candidates orbiting 170 stars as planetary systems were found. Dr. Jenkins will discuss how much we've learned over the past 24 months about the instrument, the planets and the stars.
      • 00:14:29

      • Strata Conference - 9/11 and The Weight of Data - Jer Thorp

        Almost every piece of data is tethered to something in the real world. When we work with numbers, we are often able (and willing) to ignore the real world objects and systems that these numbers represent. In this presentation, Jer Thorp will discuss his work with names: designing an arrangement algorithm for the 9/11 Memorial in Manhattan. He'll walk through collaborative processes, admit to a series of failures and ultimately show how humans and software can combine to solve extraordinary problems.
      • 00:16:43

      • Strata Conference - Simplifying Big Analytics for the Business - Randy Lea

        The opportunity exists for organizations in every industry to unlock the power of iterative, big data analysis for new applications such as digital marketing optimization and social network analysis that improve the bottom line. Big data analysis is not just the ability to analyze large volumes of data, but also the ability to analyze more varieties of data and perform more complex analysis than is possible with more traditional technologies. But it doesn't have to be as complicated as it sounds. This session will show you how you can bring the science of data to the art of business and empower more business users and analysts to operationalize insights and drive results. You'll see examples of how data science is applied by making emerging analytic technologies more accessible to businesses and easily managed by enterprise architects across retail, financial services, and media companies. This keynote is sponsored by Aster Data.
      • 00:10:38

      • Strata Conference - What is a Career in Big Data? - John Rauser

        Quantitative Engineer? Business Intelligence Analyst? Data Scientist? The data deluge has come upon us so quickly that we don't even know what to call ourselves, much less how to make a career of working with data. This talk examines the critical traits that lead to success by looking back to what may be the first act of data science.
      • 00:17:35

      • Strata Conference - Doing Good With Data: Data Without Borders - Jake Porway and Drew Conway

        Data scientists and technology companies are rapidly recognizing the immense power of data for drawing insights about their impact and operations, yet NGOs and non-profits are increasingly being left behind with mounting data and few resources to make use of it. Data Without Borders seeks to bridge this data divide by matching underserved NGOs with pro bono data scientists so that they can collect, manage, and analyze their data together in the service of humanity, creating a more open environment for socially conscious data and bringing greater change to the world.
      • 00:12:29

      • Strata Conference - First, Firster, Firstest - Mark Madsen

        History seems irrelevant in the software world, particularly when dealing with lots of information. It isn't. Information explosions are not new. They've happened repeatedly throughout human history. A little looking will turn up prior incarnations of information management patterns and concepts that can be repurposed using today's technologies. The first person to conceive of something is usually not the first. They're the first to re-conceive at a point where the current technology caught up to someone else's idea. We're at a point today where many old ideas are being reinvented. Come hear why looking to the past beyond your core field of interest is worthwhile.
      • 00:17:07

      • Strata Conference - Announcement of the Winner of the First Heritage Health Prize Progress Prize - Richard Merkin

        Dr. Richard Merkin, President and CEO of Heritage Provider Network, is pleased to announce the winner of the first $3 million Heritage Health Progress Prize. Responding to our country's $2 trillion health care crisis, Dr. Merkin created, developed and sponsored the $3 million Heritage Health Prize for predictive modeling to save more than $30 billion in avoidable hospitalizations. It is the largest predictive modeling prize in the world, larger than the Nobel Prize for Medicine and the Gates Prize for Health. Dr. Merkin is genuinely excited to bring new minds to the healthcare table with the prize and believes the competition has great potential not only to produce a winning algorithm, but also to grab the attention of data miners globally and raise awareness about competitive innovation, changing the world through healthcare delivery. Dr. Merkin will present the top two teams with $50,000 in the first progress prize, split as $30,000 and $20,000.
      • 00:02:56

      • Strata Conference - Health Empowerment through Self-Tracking - Anne Wright

        The BodyTrack project has interviewed a number of people who have improved their health by discovering certain foods or environmental exposures to avoid, or learning other types of behavioral changes. Many describe greatly improved quality of life, overcoming in some cases chronic problems in areas such as sleep, pain, gastrointestinal function, and energy levels. In some cases, a doctor or specialist's diagnosis led to treatment which mitigated symptoms (e.g. asthma or migraine headache), but where discovery of triggers required self-tracking and self-experimentation. Importantly, the act of starting to search for one's sensitivities or triggers appears to be empowering: people who embarked on this path changed their relationship to their health situation even before making the discoveries that helped lead to symptom improvement. The BodyTrack Project is building tools, both technological and cultural, to empower more people to embrace an investigator role in their own lives. The core of the BodyTrack system is an open source web service which allows users to aggregate, visualize, and analyze data from a myriad of sources: physiological metrics from wearable sensors, image and self-observation capture from smart phones, local environmental measures such as bedroom light and noise levels and in-house air quality monitoring, and regional environmental measures such as pollen/mold counts and air particulates. We believe empowering a broader set of people with these tools will help individuals and medical practitioners alike to better address health conditions with complex environmental or behavioral components.
      • 00:14:51

      • Strata Conference - Big Data, Big Opportunity - Ken Bado

        Big Data is more than just volume and velocity. MarkLogic CEO Ken Bado will address why complexity is the key gotcha for organizations trying to outflank their competition by managing Big Data in real time. Learn how winners today are using MarkLogic to manage the complexity of their unstructured information to drive revenue and results.
      • 00:05:42

      • Strata Conference - Short URLs, Big Data: Learning About the World in Realtime - Hilary Mason

        The flow of data across the social web tells us what people, around the world, are paying attention to at any given moment. Understanding this flow is both a mathematical and a human problem, as we develop and adapt techniques to find stories in the data. Come hear about the expected and the surprises in the bitly data, as well as generalized techniques that apply to any realtime data system.
      • 00:12:40

      • Strata Conference - Calling for a New Paradigm: Machines Plus Humans - Arnab Gupta

        It's a well-known dichotomy: man versus machine, and, depending on who's doing the talking, good (human) versus evil (machine). Today, as technology continues to evolve and machines are capable of ever more advanced processes and functions, the dichotomy is becoming even more pronounced. Look no further than IBM's Watson, an advanced artificial intelligence machine that squared off against Jeopardy's best human contestants in 2011, and won.
      • 00:16:29

      • Strata Conference - The Human Dimension: Organizational and Social Challenges of Business Analytics - John Lucker

        Herbert Simon once wrote that the central concern of administrative theory is with the boundary between rational and nonrational aspects of human social behavior. Simon's comment is especially pertinent to the still-emerging field of business analytics. The human dimension of business analytics might facetiously be called the discipline's dark matter: it looms large while tending to remain hidden from view. In many and diverse domains, human experts must repeatedly make decisions that require weighing together disparate pieces of information. Unfortunately, we are not very good at this. We rely on mental heuristics (rules of thumb), which, as psychological research shows, have surprising biases that limit our ability to make truly objective decisions. The implication is that society is replete with inefficient markets and business processes that can be improved with business analytics. Analytics projects are often bedeviled or simply stopped in their tracks by challenges emanating from organizational culture, misunderstanding of statistical concepts, and discomfort with probabilistic reasoning. Compounding these challenges is the fact that data scientists often speak a different language from the business domain experts that they are charged to help. In our experience, these challenges can be among the most difficult ones faced in an analytics project, and are ignored at one's peril. This talk will provide a number of case studies and vignettes; relate these examples to relevant ideas from the decision sciences; and offer practical tips for achieving organizational buy-in.
      • 00:36:41

      • Strata Conference - Marketing with Data - Joseph Adler

        Marketing is the art of telling potential customers or users about products or services that they might find useful. Some technology people might look down on marketing as a dirty, but necessary, part of running a company. That's unfortunate, because marketing is one of the most interesting and valuable things that you can do with data. At LinkedIn, we look at marketing as a recommendation problem, not a sales problem. Our goal is to help our users get the most benefit from our service. We use a lot of data and technology to market our own services. To do this, we use a variety of big data systems: recommendation engines, data processing, and content delivery. We rely on a team of marketing professionals, designers, engineers, and data scientists. We approach marketing scientifically, and constantly test new hypotheses to learn how to market better. In this talk, I'm going to describe LinkedIn's approach to personalized marketing, using the story of the award-winning Year in Review email message. I'll talk about how we come up with ideas, how we test new ideas, and how we quickly turn ideas into scalable production processes. And finally, I'll talk about Tickle, the Hadoop-based system that we built to generate and prioritize marketing email messages.
      • 00:35:30

      • Strata Conference - Data Prediction Competitions: What Archimedes and Roger Bannister Can Teach Us about the Business of Data - Jeremy Howard

        Crowdsourcing big data might sound like a randomly generated selection of buzzwords, but it turns out to represent a powerful leap forward in the accuracy of predictive analytics. As companies and researchers are fast discovering, data prediction competitions provide a unique opportunity for advancing the state of the art in fields as diverse as astronomy, health care, insurance pricing, sports ratings systems and tourism forecasting. This session will focus not simply on the mechanics of data prediction competitions, but on why they work so effectively. As it turns out, the why boils down to a couple of simple propositions, one associated with Archimedes and the other with world record-breaking runner Roger Bannister. Those propositions are not unique to the world of data science, but, as this session will show, have a particularly compelling application to it.
      • 00:39:04

      • Strata Conference - Big Data, Emergency Management and Business Continuity - Jeannie Stamberger

        Information technology has been meeting disaster head on, with new software, crowdsourcing inputs, and mapping tools gaining incredible potential since the Haiti earthquake. How big data really fits into benefiting disaster response, on both the humanitarian relief and business continuity sides, has yet to mature. I will discuss needs (filtering, interfaces, real-time data processing) specific to the unique sociological and extreme environment constraints in professional disaster response, and the untapped potential for business continuity.
      • 00:35:08

      • Strata Conference - Extracting Microbial Threats From Big Data - Robert Munro

        Until now, no organization has succeeded in the task of tracking every global outbreak and epidemic. The necessary information is spread across too many locations, languages and formats: a field report in Spanish, a news article in Chinese, an email in Arabic, a text message in Swahili. Even among open data, simple keyword- or whitelist-based searches tend to fall short as they are unable to separate the signal (an outbreak of influenza) from the noise (a new flu remedy). In a project called EpidemicIQ, the Global Viral Forecasting Initiative has taken on the challenge of tracking all outbreaks. We are complementing existing field surveillance efforts in 23 countries with a new initiative that leverages large-scale processing of outbreak reports across a myriad of formats, utilizing machine learning, natural language processing and microtasking coupled with advanced epidemiological analysis.
      • 00:29:12

      • Strata Conference - HunchWorks: Combining Human Expertise and Big Data - Chris van der Walt, Dane Petersen, and Sara Farmer

        Global Pulse is a United Nations innovation initiative that is developing a new approach to crisis impact monitoring. One of the key outputs of the project is HunchWorks, a place where experts can post hypotheses, or hunches, that may warrant further exploration and then crowdsource data and verification. HunchWorks will be a key global platform for rapidly detecting emerging crises and their impacts on vulnerable communities. Using it, experts will be able to quickly surface ground truth and detect anomalies in data about collective behavior for further analysis, investigation and action. The presentation will open with an introduction by Chris van der Walt (Project Lead, Global Pulse) to the problem that HunchWorks is being designed to address: How to detect the emerging impacts of global crises in real time? A short discussion of the design thinking behind HunchWorks will follow, plus an overview of the HunchWorks feature set. Dane Petersen (Experience Designer, Adaptive Path) will then discuss some of the complex user experience design challenges that emerged as the team started to wrestle with developing HunchWorks and the approaches used to address them. Sara Farmer (Chief Platform Architect, Global Pulse) will follow up with a discussion of the technology powering HunchWorks, which is based on autonomy, uncertain reasoning and human-machine team theories, and is designed to allow users and automated tools to work collaboratively to reduce the uncertainty and missing data issues inherent in hunch formation and management. The presentation will conclude with 10 minutes of Q&A from the audience.
      • 00:41:20

      • Strata Conference - Creating a Fact-based Decision Making Culture in Organizations - Amaresh Tripathy

        Analytical culture is the last mile problem of organizations. More data and analytics frequently lead to decision ambiguity. Insights are either not actionable or, when they are actionable, not widely adopted at an operational level. There has been a lot of emphasis on the technology and data quality aspects of analytics; however, without an analytical culture most organizations will not be able to take advantage of the benefits. After partnering with more than 100 client organizations as a consultant, from small point solution pilots to deploying large decision support systems, I have developed a series of principles which I think are critical to creating and fostering an analytical culture. I want to introduce the framework and highlight the organizational principles with some real-life war stories.
      • 00:31:56

      • Strata Conference - Gaining New Insights from Massive Amounts of Machine Data - Jake Flomenberg and Denise Hemke

        Many enterprises are being overwhelmed by the proliferation of machine data. Websites, communications, networking and complex IT infrastructures constantly generate massive streams of data in highly variable and unpredictable formats that are difficult to process and analyze by traditional methods or in a timely manner. Yet this data holds a definitive record of all activity and behavior, including user transactions, customer behavior, system behavior, security threats and fraudulent activity. Quickly understanding and using this data can provide added value to a company's services, customer satisfaction, revenue growth and profitability. This session examines the challenges and approaches for collecting, organizing and deriving real-time insights from terabytes to petabytes of data, with examples from Salesforce.com, the nation's leading enterprise cloud computing company.
      • 00:36:51

      • Strata Conference - Why MongoDB Was Created: What I Wish I Knew at DoubleClick - Dwight Merriman

        When I was CTO of DoubleClick, we scaled to serve 400,000 ads/second. We developed and used many custom data stores long before NoSQL was a buzzword. Over the years, I've seen companies I've worked with struggle with both scalability and agility. Writing the first lines of MongoDB code in 2007, we drew upon these experiences building large scale, high availability, robust systems. We wanted MongoDB to be a new kind of database that tackled the challenges we were trying to solve at DoubleClick. This session will focus on internet infrastructure scaling and also cover the history and philosophy of MongoDB.
      • 00:49:51

      • Strata Conference - Dedupe, Merge, and Purge: The Art of Normalization - Tyler Bell and Leo Polovets

        Big Noise always accompanies Big Data, especially when extracting entities from the tangle of duplicate, partial, fragmented and heterogeneous information we call the Internet. The ~17m physical businesses in the US, for example, are found on over 1 billion webpages and endpoints across 5 million domains and applications. Organizing such a disparate collection of pages into a canonical set of things requires a combination of distributed data processing and human-based domain knowledge. This presentation stresses the importance of entity resolution within a business context and provides real-world examples and pragmatic insight into the process of canonicalization.
      • 00:39:16

      • Strata Conference - Bringing the Rest of the World Into Your Data Warehouse - Philip Kromer

        You've collected a ton of data and your team is busily crunching numbers and coming to conclusions, but are they the right ones? You can only know with the right context, and you can't get context working in a silo. We invite you to bring the rest of the world into your data warehouse. Don't worry, it'll add more value than it takes, and instead of working on the data, you can work on your vision. In this talk, we'll allay your fears of open data, demonstrate the difference between making decisions with and without context, and show you other neat things that happen when you share.
      • 00:32:53

      • Strata Conference - Entities, Relationships, and Semantics: The State of Structured Search - Daniel Tunkelang, Andrew Hogue, Breck Baldwin, Evan Sandhaus, and Wlodek Zadrozny

        Structured search improves the search experience through the identification of entities and their relationships in documents and queries. This panel will explore the current state of structured and semi-structured search, as well as the open problems in an area that promises to revolutionize information seeking. The four panelists work on some of the world's largest structured search problems, from offering users structured search on Google's web corpus to building a computing system that defeated Jeopardy! champions in an extreme test of natural language understanding. They work on the data, tools, and research that are driving this field. They are all excellent researchers and presenters, promising to offer an informative and engaging panel discussion.
      • 00:39:55

      • Strata Conference - Big (Bad) Data - Elizabeth Charnock

        Whether you believe the hype around Big Data or not, the amount of information accruing throughout large organizations is getting more profound every day. And it's not simply a question of volume; of equal concern is the variety of data. There are emails, IMs, tweets, Facebook updates and the fastest-growing category of data: video. This variety makes it difficult to generate an apples-to-apples comparison of data from a single individual or entity. Combine this with the fact that experts think that there is no such thing as clean data, and you have a growing problem. This is why it is better to focus on understanding digital character. As with individuals, electronic data has character. That character helps to disambiguate the relationship between one piece of data and another. This is particularly important because communication is more fragmented than ever, which makes relevance more difficult to ascertain. Digital character is similar to individual character in the real world, particularly in the sense that character emerges over time. Does one embarrassing photo or comment on Facebook define an individual's lifetime character? Can't everyone recollect an email they wish they had never sent? Just as in the real world, digital character requires a large enough body of work to make an accurate character judgment. Elizabeth Charnock, CEO of Cataphora and author of E-Habits, will discuss the pitfalls of Bad Data, and how it manifests itself in the interaction between a male stripper and a Harvard professor.
      • 00:26:50

      • Strata Conference - Agile Clouds for Big Data: Empowering the Data Scientist - Richard McDougall

        This talk will address the question of how to enable a much more agile data provisioning model for business units and data scientists. We're in a mode shift where data unlocks new growth, and almost every Fortune 1000 company is scrambling to architect a new platform to enable data to be stored, shared and analyzed for competitive advantage. Many companies are finding that this shift requires major rethinking of how systems should be architected (and scaled) to enable agile, self-service access to critical data.
      • 00:34:11

      • Strata Conference - Big Data and Big Analytics: SciDB Is not Hadoop - Paul Brown

        SciDB is an emerging open source analytical database that runs on a commodity hardware grid or in the cloud. SciDB natively supports:
        - An array data model: a flexible, compact, extensible data model for rich, highly dimensional data
        - Massive-scale math: non-embarrassingly parallel operations like linear algebra operations on matrices too large to fit in memory, as well as transparently scalable R, MatLab, and SAS style analytics without requiring code for data distribution or parallel computation
        - Versioning and provenance: data is updated, but never overwritten. The raw data, the derived data, and the derivation are kept for reproducibility, what-if modeling, back-testing, and re-analysis
        - Uncertainty support: data carry error bars, probability distribution or confidence metrics that can be propagated through calculations
        - Smart storage: compact storage for both dense and sparse data that is efficient for location-based, time-series, and instrument data
        We will sketch the design of SciDB and talk about how it's different from other proposals, and why that matters. We will also put out some early benchmarking data and present a computational genomics use case that showcases SciDB's massively scalable parallel analytics.
      • 00:42:50

      • Strata Conference - Optimizing Scarce Resources Using Real-time Decision Making - Alasdair Allan

        In the last few years the ubiquitous availability of high bandwidth networks has changed the way both robotic and non-robotic telescopes operate, with single isolated telescopes being integrated into expanding smart telescope networks that can span continents and respond to transient events in seconds. At the same time the rise of data warehousing has made data mining more practical, and correlations between new and existing data can be drawn in real time. These changes have led to fundamental shifts in the way astronomers pursue their science. Astronomy, once a data-poor science, has become data-rich.
      • 00:40:28

      • Strata Conference - Navigating the Data Pipeline - Tim Moreton

        At the heart of every system that harnesses big data is a pipeline that comprises collecting large volumes of raw data, extracting value from it through analytics or data transformations, then delivering that condensed set of results back out, potentially to millions of users. This talk examines the challenges of building manageable, robust pipelines, a great simplifying paradigm that will help participants looking to architect their own big data systems. I'll look at what you want from each of these stages, using Google Analytics as a canonical big data example, as well as case studies of systems deployed at LinkedIn. I'll look at how collecting, analyzing and serving data pose conflicting demands on the storage and compute components of the underlying hardware. I'll talk about what available tools do to address these challenges. I'll move on to consider two holy grails: real-time analytics, and dual data center support. The pipeline metaphor highlights a challenge in deriving real-time value from huge datasets: I'll explore what happens when you compose multiple, segregated platforms into a single pipeline, and how you can dodge the issue with a fast and slow two-tier architecture. Then I'll look at how you can figure dual data center support into the design, particularly important for highly available deployments on the cloud. In summary, this talk will present a useful metaphor for architecting big data systems, and describe, using deployed examples, how to go about fitting together the tools available to fit a range of settings.
      • 00:33:07

      • Strata Conference - The Accidental Chief Privacy Officer - Jim Adler

        The first generation of chief privacy officers were typically attorneys, charged with the formulation and enforcement of privacy policies. Times have changed. Given the speed and complexity of technology, the privacy policy is necessary but hardly sufficient. Because we live much of our lives in public, both online and offline, the Internet is transforming the anonymity of our cities into the familiarity of small towns. Privacy is deeply ingrained within the technology that manages this personal data. The products and services driving this transformation must consider privacy from the earliest design sessions. Today's engineer CPO, and I'm one, must deeply involve themselves with the technology and product design process to bake in privacy. This new breed of CPO is comfortable in an engineering scrum, product focus group, reviewing pending regulations, or analyzing A/B test results. They have the historical awareness, frontier spirit, regulatory caution, technical chops, and innovator's curiosity to work through the toughest data issues. The promise of the engineer CPO is that products not only safeguard privacy, but compete on it.
      • 00:40:16

      • Strata Conference - Journey or Destination: Using Models to Explore Big Data - Ben Gimpert

        A data scientist working in isolation could train a predictive model with perfect in-sample accuracy, but only an understanding of how the business will use the model lets her balance the crucial bias / variance trade-off. Put more simply, applied business knowledge is how we can assume a model trained on historical data will do decently with situations we have never seen. Models can also reveal predictors in our data we never expected. The business can learn from the automatic ranking of predictor importance with statistical entropy and multicollinearity tools. In the extreme, a surprisingly important variable that turns up during the modeling of a big data set could be the trigger of an organizational pivot. What if a movie recommendation model reveals a strange variable for predicting gross at the box office? My presentation introduces exploratory model feedback in the context of big (training) data. I will use a real-life case study from Altos Research that forecasts a complex system: real estate prices. Rapid prototyping with Ruby and an EC2 cluster allowed us to optimize human time, but not necessarily computing cycles. I will cover how exploratory model feedback blurs the line between domain expert and data scientist, and also blurs the distinction between supervised and unsupervised learning. This is all a data ecology, in which a model of big data can surprise us and suggest its own future enhancement.
      • 00:35:46

      • Strata Conference - Data as the Building Block at Foursquare - Justin Moore

        Foursquare stores and processes everything from check-ins to screen views using a combination of homegrown and open source tools. This talk gives an overview of our stack, highlighting specific examples of how, and why, it grew to what it is today, and continues with the many ways that this infrastructure is employed.
      • 00:20:54

      • Strata Conference - Humble Pie: Helping the Guardian Chart Big Stories Through Small Details - Alastair Dant

        Nowadays, major news events prompt millions of responses online. Every message passing through the internet has a voice. Aggregate analysis and visualization helps us see the roar of the crowd. The Guardian first explored this last year with an award-winning graphic that replays World Cup games, condensing 90 minutes of tweets into 90 seconds of interactive animation. By juxtaposing match events with surges in word popularity, viewers can relive the ripples of human reaction passing through Twitter. Asked to apply similar techniques to the News International saga, we partnered with Datasift to capture and display public responses during key events in the story. This talk steps through the process of recording, processing and displaying a large volume of tweets which enabled a small team to build complex pieces of interactive content at newsroom speeds. Above all, the presentation will aim to portray the delicate balance of design, data and storytelling at the heart of interactive news content.
      • 00:48:01

      • Strata Conference - The Charts You Want Might Not Be the Charts You Need - Irene Ros

        Data visualization is an important communication medium in personal and public conversation spheres. Its wide use in entertainment and business settings alike has encouraged the creation of tools and frameworks that allow anyone to create visualizations and share them with their audience. While these tools offer tried-and-true visualization metaphors, they also pose risks such as missing important data points or creating meaningless visuals. This talk will introduce the concept of responsible data visualization in the context of two distinct uses: exploration and narrative. Using personal and industry examples to show best and worst practices in each approach, this talk will offer practical suggestions for bringing data visualization into one's data workflow.
      • 00:36:46

      • Strata Conference - Data Science from the Perspective of an Applied Economist - Scott Nicholson

        Economists utilize a data analysis toolkit and intuition that can be very helpful to Data Scientists. In particular, econometric methods are quite useful in disentangling correlation and causation, a use case not well-handled by standard machine learning and statistical techniques. This session will cover examples of econometric methods in action, as well as other economics-related insights. Think of it as a crash-course in basic econometric intuition that one receives during a PhD in Economics (I received my PhD from Stanford in 2008). Why econometrics? The difference between econometrics and statistics is that statistical modeling is more concerned with fit, and econometric modeling is more concerned with properly estimating the coefficients in a regression. Getting the right (consistent & unbiased) estimates means that the analyst can more effectively measure how a change in one variable can strongly predict (or cause) a change in the dependent variable. These techniques can help solve problems in social/web data that previously were only solvable using future data collection from randomized multivariate experiments. To do this, the analyst first develops an intuition for whether or not there is a source of endogeneity in the regression. This largely is determined by the relationship between the predictors and the error term in the regression. Once the source of the endogeneity is understood, econometric techniques like fixed/random effects and instrumental variables can be quite useful. The type of data that is collected and available is key to the extent to which the power of these techniques can be used. [I might also go into some other techniques, but these are the most useful] The methods will be presented in a way so that a non-technical person can understand the basic intuition, and also so that a practitioner can apply the methods in the future. Examples will be provided. 
        For panel data econometrics, we will discuss the example of how to identify actions taken early on by a LinkedIn member that are predictive of their future engagement with the product, a problem that is difficult due to the confounding of correlation and causation. For instrumental variables techniques, we will discuss how to use random variation in the weather to say cool things about politics, economics, and web usage. In addition to the discussion of applied econometric techniques, there may also be time for economics-related data insights. Currently we are developing unemployment rate prediction models using time-series econometrics as well as indexes to measure changes in the supply/demand for talent across regions and industries.
      • 00:35:34

      • Strata Conference - 1M. 10M. 100M. Data! - Monica Rogati

        How do data infrastructure, insights and products change when your user base grows by orders of magnitude? When should you move your user-facing data product off your laptop? (Hint: now!) Does your data offer insights about the world at large, or is it just mirroring your early adopters? In this talk, I will share some of the data scaling lessons we've learned at LinkedIn, recount war stories (and close calls!) and document the evolution of the data scientist.
      • 00:39:42

      • Strata Conference - Chart Wars: The Political Power of Data Visualization - Alex Lundry

        Political campaigns and causes have added another powerful weapon to their messaging arsenal: graphs, charts, infographics and other forms of data visualization. Over just the last year, Barack Obama urged voters to distribute and share a bar graph of job losses, a line graph of labor costs by a New York Times columnist prompted an official graphical response from the government of Spain, and an organizational chart of a health care reform bill became the subject of a Congressional investigation in the United States. To be sure, a good graph has been used as an advocacy tool for years, but only recently, with the rise of the Internet, blogs, hardware and software advances, and freely available machine-readable data, have political data visualizations exploded into political discourse. Conveying objective authority, yet the product of dozens of subjective design decisions, political infographics imply hard truths despite their inherently editorial nature. This talk, given by a political data scientist who has built persuasive data visualizations for political organizations, will dissect some of the most extraordinary and powerful examples of political data visualization used over the last election cycle, focusing upon the methods that make them work so well.
      • 00:44:06

      • Strata Conference - Designing Data Visualizations: Telling Stories With Data - Noah Iliinsky

        This is a talk aimed at people who know their data, and want to learn how to visualize it most effectively. If you have data, a need for answers, and a blank page, this is a great place to start. We'll start by briefly addressing the value of visualization, and discuss the differences between visualization for analysis and presentation. From there we'll figure out what story to tell with your visualization by examining the holy visualization trinity:
        - your goals
        - your customers' needs
        - the shape of your data
        Once the story has been selected, we need to construct it. We'll discuss key considerations to make good choices about:
        - selecting appropriate data
        - selecting appropriate axes
        - visually encoding the data
        We'll end with a brief discussion of some current tools, and look at some classic and innovative visualization examples.
      • 00:38:56

      • Strata Conference - Big Data Use Cases in the Cloud - Peter Sirota and Justin Moore

        By pairing the elasticity and pay-as-you-go nature of the cloud with the flexibility and scalability of Hadoop, Amazon Elastic MapReduce has brought Big Data analytics to an even wider array of companies looking to maximize the value of their data. Each day, thousands of Hadoop clusters are run on the Amazon Elastic MapReduce infrastructure by users of every size, from university students to Fortune 50 companies, exposing the Elastic MapReduce team to an unparalleled number of use cases. In this session, we will contrast how three of these users, Amazon.com, Yelp, and Etsy, leverage the marriage of Hadoop and the cloud to drive their businesses in the face of explosive growth, including generating customer insights, powering recommendations, and managing core operations.
      • 00:27:40

      • Strata Conference - Data Environmentalism - Trevor Hughes

        Data fuels 21st century business and society. Thanks to the rapid pace of innovation and widespread adoption of information technologies, data has become both a strategic asset and a potentially crippling liability. As consumers grow increasingly concerned about the stewardship of their data, policymakers, academics and advocates around the world are questioning boundaries and considering risks: What is private and what is not? How should organizations explain what they're doing with data? What should happen when data is stolen or misused? And, in an era of globalization, how do we manage the diverse social and legal expectations? These questions are urgent in the current business climate where trust in our most basic institutions has been eroded. As organizations cope with growing tension between innovation, privacy and security, they are discovering that appropriate use and protection of data has broad impact on their reputations and bottom lines. A new, holistic ethos of data environmentalism is necessary.
      • 00:46:04

      • Strata Conference - Hazarding a Guess: Ethical, Legal, and Policy Issues in Analytics and Big Data Applications - Solon Barocas, Betsy Masiello, and Jane Yakowitz

        Analytics can push the frontier of knowledge well beyond the useful facts that already reside in big data, revealing latent correlations that empower organizations to make statistically motivated guesses, or inferences, about the character, attributes, and future actions of their stakeholders and the groups to which they belong. This is cause for both celebration and caution. Analytic insights can add to the stock of scientific and social scientific knowledge, significantly improve decision-making in both the public and private sector, and greatly enhance individual self-knowledge and understanding. They can even lead to entirely new classes of goods and services, providing value to institutions and individuals alike. But they also invite new applications of data that involve serious hazards. This panel considers these hazards, asking how analytics implicate:
        - Privacy: What are the privacy concerns involved in the kinds of inferences and applications that analytics enable? Are these concerns sufficiently well understood and accounted for?
        - Autonomy: What are the ethical stakes of applications that draw on analytic findings to selectively (and perhaps inadvertently) influence or limit individuals' choices or decision-making?
        - Fairness: If organizations rely on certain discoveries to set criteria for unequal treatment or access, do analytics implicate questions of fairness and due process? More specifically, what if organizations draw on analytics to individualize risks or engage in adverse selection or cream skimming?
        - Fragmentation: Do attempts to personalize and customize goods and services (including media content) to individuals on the basis of inferred preferences shield individuals from certain views and issues and thus undermine social belonging and the functioning of the public sphere?
        The panel will also debate the appropriate response to these issues, reviewing the place of norms, policies, legal frameworks, regulation, and technology.
      • 00:45:59

      • Strata Conference - Apache Cassandra 1.0: Ready for the Enterprise - Jonathan Ellis

        The Apache Cassandra database has added many new enterprise features this year based on the real-world needs of companies like Twitter, Netflix, Openwave, and others building massively scalable systems. Apache Cassandra addresses a wide variety of real-time big data needs. Capable of tracking transactions in financial markets or the actions of millions of users in massively multiplayer games, Cassandra handles the demands of large-volume applications and data streams. Whether it's storing billions of emails or backing up terabytes of files, Cassandra can store large amounts of data and scale near-infinitely. In today's information age, Cassandra excels at storing and serving massive amounts of data at low latency, from geolocation data to server performance metrics and more. This talk will cover the motivation and use cases behind features such as secondary indexes, Hadoop integration, SQL support, bulk loading, and more. This session is sponsored by DataStax
      • 00:43:12

      • Strata Conference - MapReduce for the Rest of Us: Unlocking Data Science for the Business User - Tasso Argyros

        MapReduce, Hadoop, and other NoSQL big data approaches open opportunities for data scientists in every industry to develop new data-driven applications for digital marketing optimization and social network analysis through the power of iterative, big data analysis. But what about the business user or analyst? How can they unlock insights through standard business intelligence (BI) tools or SQL access? The challenge with emerging big data technologies is finding staff with the specialized skill sets of the data scientist to implement and use these solutions. Business leaders and enterprise architects struggle to understand, implement, and integrate these big data technologies with their existing business processes and IT investments and provide value to the business. This session will explore a new class of analytic platforms and technologies, such as SQL-MapReduce, which bring the science of data to the art of business. By fusing standard business intelligence and analytics with next-generation data processing techniques such as MapReduce, big data analysis is no longer just in the hands of the few data science or MapReduce specialists in an organization! You'll learn how business users can easily access, explore, and iterate their analysis of big data to unlock deeper insights. See example applications with digital marketing optimization, fraud detection and prevention, social network and relationship analysis, and more. This session is sponsored by Aster Data
      • 00:31:21

      • Strata Conference - How Thomson Reuters Finds a Needle in Many Haystacks within Seconds - Steve Jackson

        How do you efficiently and effectively search the world's leading collection of legal content (2.2 billion documents), then quickly zero in on exactly what you need, all in a matter of seconds? Thomson Reuters Professional built a Big Data information management architecture to do just this for their clients. WestlawNext gives legal professionals comprehensive, specialized content plus unique search technologies and tools that help them find, understand, and apply the law and legal concepts in the service of their clients. Learn how Thomson Reuters manages and processes a variety of very large and diverse data sources to quickly publish timely, trusted, and relevant information to their clients. This session is sponsored by Informatica
      • 00:33:10

      • Strata Conference - Gaining Adoption Through Data Visualization - Lee Feinberg

        Sophisticated data analytics is a great thing. But great analytics is only valuable if people use it. The worst thing is a great analysis filled with answers sitting on the shelf going unused. In this session you will learn how to present and show analytics in highly compelling ways. You'll learn how to use it as a cultural change agent, and how you must shift to a data marketing mindset to make it all happen. This session is sponsored by Tableau Software
      • 00:38:10

      • Strata Conference - How Hadoop is Revolutionizing Business Intelligence and Advanced Data Analytics - Amr Awadallah

        The introduction of Apache Hadoop is changing the business intelligence data stack. In this presentation, Dr. Amr Awadallah, chief technology officer at Cloudera, will discuss how the architecture is evolving and the advanced capabilities it lends to solving key business challenges. Awadallah will illustrate how enterprises can leverage Hadoop to derive complete value from both unstructured and structured data, gaining the ability to ask and get answers to previously un-addressable big questions. He will also explain how Hadoop and relational databases complement each other, enabling organizations to access the latent information in all their data under a variety of operational and economic constraints. This session is sponsored by Cloudera
      • 00:41:05

      • Strata Conference - Telecom Network Switches: Big Value from Big Data - Jim Falgout

        Telecom network switches, network servers, and other equipment generate and store large amounts of data every day. The data is mainly used for billing and network operations. If utilized fully, this data can have an enormous impact on network operations and overall profitability. Many communications service providers (CSPs) do not have the tools to mine this data quickly and deeply enough to realize its value. The tools in use are usually home grown and not scalable. Valuable information is being lost: information that could be used to predict network issues rather than respond to them after the fact. The alternative of a full analytic database can be cost-prohibitive. By applying big data tools and predictive analytics upstream of the database, CSPs can move from reactive to proactive use of the data. Network quality problems can be identified in minutes rather than days. By analyzing all the data, analytics tools can pinpoint root cause and suggest corrective actions. Finding and fixing these issues more quickly leads to higher call quality, more profitable service, and increased customer satisfaction. This session is sponsored by Pervasive
      • 00:32:25

      • Strata Conference - LexisNexis: Reinventing New Business with Big Data - Ron Avnur and Mark Rodgers

        Ron Avnur, SVP Engineering, MarkLogic, and Mark Rodgers, Sr. Director of Product Engineering, LexisNexis, will reveal how LexisNexis is rebuilding its business platform to handle Big Data in real time. LexisNexis is renowned for the technical solutions it has been building for 40+ years. It is well aware of the challenges of Big Data as it has gathered a huge amount of content. Avnur will explain how Big Data and unstructured information are slowly overtaking organizations. Rodgers will discuss the challenges LexisNexis faced as a global organization that was building new products to remain on the cutting edge of Big Data. Together, Avnur and Rodgers will give a brief overview of the technical implementation that enabled LexisNexis to address those challenges. Finally, Rodgers will detail the business benefits LexisNexis is experiencing as a result of its new Big Data business platform. This session is sponsored by MarkLogic
      • 00:38:43

      • Strata Conference - Big Data Revolution: Benefit from MapReduce Without the Risk - Ted Dunning

        Map-reduce and Hadoop provide new scaling opportunities for analyzing data. As a result, organizations are beginning to analyze and derive business value from large amounts of data that, in many cases, were previously simply being discarded. In some cases, such as on-line advertising, the ability to analyze these previously impenetrable volumes of data has disrupted entire industries. Such green field opportunities are rare, however, and few companies can afford to build an entirely new analytics pipeline. Integrating big data analytics systems like Apache Hadoop into existing analytics systems can be very difficult because there are huge differences in the fundamental approaches being taken to the basic problems of how data should be accessed and analyzed. These differences are exactly what makes these new technologies hugely effective, but they are also what makes integration between conventional and new approaches so difficult. This talk will provide detailed descriptions of how to use new technologies to: get data into and out of the Hadoop cluster as quickly as possible; allow real-time components to easily access cluster data; use well-known and understood standard tools to access cluster data; make Hadoop easier to use and operate; capitalize on existing code in map-reduce settings; and integrate map-reduce systems into existing analytic systems. These descriptions will be taken from real-life customer situations. Each will describe the problems faced and the solutions that solved these problems. This session is sponsored by MapR Technologies
      • 00:36:38

      • Strata Conference - Beyond BI: Transforming Your Business with Big Data Analytics - Steven Hillion

        Do you use all the information you should when you make your most important decisions? Is your organization prepared to go beyond BI to enable breakthrough insights and decisions that transform the way you do business? Increasingly, organizations realize that data-intensive predictive analytics is a necessary tool for a company to compete and succeed, even if the organization has already deployed a full-blown BI and DW stack. Armed with advanced analytics insights, business users can make well-informed decisions to support their organization's tactical and strategic goals and create competitive advantage. Steven Hillion, VP of EMC Greenplum's Data Analytics Lab, lends insight into emerging technologies to take advantage of the big data opportunity and how big data challenges today's BI architectures and approaches to data management. This session is sponsored by EMC Greenplum
      • 00:35:48

      • Strata Conference - Big Data Architectures 2.0: Beyond the Elephant Ride - Vineet Tyagi

        Businesses today are moving beyond the buzz and experimentation with batch processing options of Hadoop and MapReduce, stretching the limits for cutting-edge performance and scalability. This session will talk about emerging trends of a new generation of NoHadoop (Not Only Hadoop) architectures for future-proof big data scalability and will prepare you for life beyond the elephant ride! This session is sponsored by Impetus Technologies, Inc.
      • 00:43:28

Strata Conference New York 2011: Video Compilation

  • Publisher: O'Reilly Media
  • Released: October 2011
  • Run time: 48 hours 30 minutes

The Best of Strata New York 2011:

Learn the tools for making data work with this complete video series

Demand has skyrocketed for data scientists proficient in the technologies for gleaning insight and utility from big data. At O’Reilly’s Strata New York Conference in September 2011, developers and data professionals learned about the best tools and technologies for everything from gathering, cleaning, analyzing, and storing data to communicating data intelligence effectively.

This video compilation gives you access to every session

Tune in to hardcore technical sessions on parallel computing, machine learning, and interactive visualizations. Examine case studies from finance, media, healthcare, and technology. And get provocative reports from experts and innovators. Join Ken Bado (MarkLogic CEO), Alistair Croll (Bitcurrent Founder), Mark Madsen (Third Nature CEO), and 75 other presenters as they explore new methods to make data work.

Included among the 30 sessions you’ll receive in this video package:


  • Bringing the Rest of the World into Your Data Warehouse—Philip Kromer (Infochimps)
  • Journey or Destination: Using Models to Explore Big Data—Ben Gimpert (Altos Research)
  • Taming Data Logistics: the Hardest Part of Data Science—Ken Farmer (IBM)


  • When Elephants Mate: Will Hadoop Transform Banking?—Abhishek Mehta (Tresata)
  • Extracting Microbial Threats from Big Data—Robert Munro (EpidemicIQ, Global Viral Forecasting)
  • Do it Right: Proven Techniques for Exploiting Big Data Analytics—Bill Schmarzo (EMC)


  • Designing Data Visualizations: Telling Stories with Data—Noah Iliinsky (Complex Diagrams)
  • Chart Wars: The Political Power of Data Visualization—Alex Lundry (TargetPoint Consulting)
  • How to Avoid Some Common Graphical Mistakes—Naomi Robbins (NBR)