The Ultimate Hands-On Hadoop: Tame your Big Data!

Data Engineering and Hadoop tutorial with MapReduce, HDFS, Spark, Flink, Hive, HBase, MongoDB, Cassandra, Kafka + more!

4.5

(30925)

188808 enrolled on this course

Last updated 11/2021

Sundog Education by Frank Kane

₹ 3999.0

₹ 5499.0

30-Day Money-Back Guarantee

Duration: 13 Hours
Skill level: Beginner
Language: English
Certificate: Yes
Full lifetime access: Yes

Description

The world of Hadoop and "Big Data" can be intimidating - hundreds of different technologies with cryptic names form the Hadoop ecosystem. With this Hadoop tutorial, you'll not only understand what those systems are and how they fit together - but you'll go hands-on and learn how to use them to solve real business problems!
Learn and master the most popular data engineering technologies in this comprehensive course, taught by a former engineer and senior manager from Amazon and IMDb. We'll go way beyond Hadoop itself, and dive into all sorts of distributed systems you may need to integrate with.
Install and work with a real Hadoop installation right on your desktop with Hortonworks (now part of Cloudera) and the Ambari UI
Manage big data on a cluster with HDFS and MapReduce
Write programs to analyze data on Hadoop with Pig and Spark
Store and query your data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto
Design real-world systems using the Hadoop ecosystem
Learn how your cluster is managed with YARN, Mesos, Zookeeper, Oozie, Zeppelin, and Hue
Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm
Spark and Hadoop developers are hugely valued at companies with large amounts of data; these are very marketable skills to learn.
Almost every large company you might want to work at uses Hadoop in some way, including Amazon, eBay, Facebook, Google, LinkedIn, IBM, Spotify, Twitter, and Yahoo! And it's not just technology companies that need Hadoop; even the New York Times uses Hadoop for processing images.
This course is comprehensive, covering over 25 different technologies in over 14 hours of video lectures. It's filled with hands-on activities and exercises, so you get some real experience in using Hadoop - it's not just theory.
You'll find a range of activities in this course for people at every level. If you're a project manager who just wants to learn the buzzwords, there are web UIs for many of the activities in the course that require no programming knowledge. If you're comfortable with command lines, we'll show you how to work with them too. And if you're a programmer, I'll challenge you with writing real scripts on a Hadoop system using Scala, Pig Latin, and Python.
You'll walk away from this course with a real, deep understanding of Hadoop and its associated distributed systems, and you can apply Hadoop to real-world problems. Plus a valuable completion certificate is waiting for you at the end!
Please note the focus of this course is on application development, not Hadoop administration, although you will pick up some administration skills along the way.
Knowing how to wrangle "big data" is an incredibly valuable skill for today's top tech employers. Don't be left behind - enroll now!
"The Ultimate Hands-On Hadoop... was a crucial discovery for me. I supplemented your course with a bunch of literature and conferences until I managed to land an interview. I can proudly say that I landed a job as a Big Data Engineer around a year after I started your course. Thanks so much for all the great content you have generated and the crystal clear explanations." - Aldo Serrano
"I honestly wouldn’t be where I am now without this course. Frank makes the complex simple by helping you through the process every step of the way. Highly recommended and worth your time especially the Spark environment. This course helped me achieve a far greater understanding of the environment and its capabilities." - Tyler Buck
Who this course is for:
Software engineers and programmers who want to understand the larger Hadoop ecosystem, and use it to store, analyze, and vend "big data" at scale.
Project, program, or product managers who want to understand the lingo and high-level architecture of Hadoop.
Data analysts and database administrators who are curious about Hadoop and how it relates to their work.
System architects who need to understand the components available in the Hadoop ecosystem, and how they fit together.

What you'll learn

Design distributed systems that manage "big data" using Hadoop and related data engineering technologies.

Use HDFS and MapReduce for storing and analyzing data at scale.

Use Pig and Spark to create scripts to process data on a Hadoop cluster in more complex ways.

Analyze relational data using Hive and MySQL

Analyze non-relational data using HBase, Cassandra, and MongoDB

Query data interactively with Drill, Phoenix, and Presto

Choose an appropriate data storage technology for your application

Understand how Hadoop clusters are managed by YARN, Tez, Mesos, Zookeeper, Zeppelin, Hue, and Oozie.

Publish data to your Hadoop cluster using Kafka, Sqoop, and Flume

Consume streaming data using Spark Streaming, Flink, and Storm

Course Content

27 sections • 95 lectures

1-Learn all the buzzwords! And install the Hortonworks Data Platform Sandbox.

1.1-Udemy 101: Getting the Most From This Course

1.2-Tips for Using This Course

1.3-If you have trouble downloading Hortonworks Data Platform...

1.4-Warning for Apple M1 users

1.5-Installing Hadoop [Step by Step]

1.6-The Hortonworks and Cloudera Merger, and how it affects this course.

1.7-Hadoop Overview and History

1.8-Overview of the Hadoop Ecosystem

1.9-Important note

2-Using Hadoop's Core: HDFS and MapReduce

2.1-HDFS: What it is, and how it works

2.2-Alternate MovieLens download location

2.3-Installing the MovieLens Dataset

2.4-[Activity] Install the MovieLens dataset into HDFS using the command line

2.5-MapReduce: What it is, and how it works

2.6-How MapReduce distributes processing

2.7-MapReduce example: Break down movie ratings by rating score

2.8-[Activity] Install Python, MRJob, and nano

2.9-[Activity] Code up the ratings histogram MapReduce job and run it

2.10-[Exercise] Rank movies by their popularity

2.11-Note: Sorting will only work by partition.

2.12-[Activity] Check your results against mine!

3-Programming Hadoop with Pig

3.1-Introducing Ambari

3.2-Introducing Pig

3.3-Example: Find the oldest movie with a 5-star rating using Pig

3.4-[Activity] Find old 5-star movies with Pig

3.5-More Pig Latin

3.6-[Exercise] Find the most-rated one-star movie

3.7-Pig Challenge: Compare Your Results to Mine!

4-Programming Hadoop with Spark

4.1-Why Spark?

4.2-The Resilient Distributed Dataset (RDD)

4.3-[Activity] Find the movie with the lowest average rating - with RDDs

4.4-Datasets and Spark 2.0

4.5-[Activity] Find the movie with the lowest average rating - with DataFrames

4.6-[Activity] Movie recommendations with MLlib

4.7-[Exercise] Filter the lowest-rated movies by number of ratings

4.8-[Activity] Check your results against mine!

5-Using relational data stores with Hadoop

5.1-What is Hive?

5.2-[Activity] Use Hive to find the most popular movie

5.3-How Hive works

5.4-[Exercise] Use Hive to find the movie with the highest average rating

5.5-Compare your solution to mine.

5.6-Integrating MySQL with Hadoop

5.7-Cheat sheet for the following lecture

5.8-[Activity] Install MySQL and import our movie data

5.9-[Activity] Use Sqoop to import data from MySQL to HDFS/Hive

5.10-[Activity] Use Sqoop to export data from Hadoop to MySQL

6-Using non-relational data stores with Hadoop

6.1-Why NoSQL?

6.2-What is HBase

6.3-[Activity] Import movie ratings into HBase

6.4-[Activity] Use HBase with Pig to import data at scale.

6.5-Cassandra overview

6.6-If you have trouble installing Cassandra...

6.7-[Activity] Installing Cassandra

6.8-[Activity] Write Spark output into Cassandra

6.9-MongoDB overview

6.10-[Activity] Install MongoDB, and integrate Spark with MongoDB

6.11-[Activity] Using the MongoDB shell

6.12-Choosing a database technology

6.13-[Exercise] Choose a database for a given problem

7-Querying your Data Interactively

7.1-Overview of Drill

7.2-[Activity] Setting up Drill

7.3-[Activity] Querying across multiple databases with Drill

7.4-Overview of Phoenix

7.5-[Activity] Install Phoenix and query HBase with it

7.6-[Activity] Integrate Phoenix with Pig

7.7-Overview of Presto

7.8-[Activity] Install Presto, and query Hive with it.

7.9-[Activity] Query both Cassandra and Hive using Presto.

8-Managing your Cluster

8.1-YARN explained

8.2-Tez explained

8.3-[Activity] Use Hive on Tez and measure the performance benefit

8.4-Mesos explained

8.5-ZooKeeper explained

8.6-[Activity] Simulating a failing master with ZooKeeper

8.7-Oozie explained

8.8-[Activity] Set up a simple Oozie workflow

8.9-Zeppelin overview

8.10-[Activity] Use Zeppelin to analyze movie ratings, part 1

8.11-[Activity] Use Zeppelin to analyze movie ratings, part 2

8.12-Hue overview

8.13-Other technologies worth mentioning

9-Feeding Data to your Cluster

9.1-Kafka explained

9.2-[Activity] Setting up Kafka, and publishing some data.

9.3-[Activity] Publishing web logs with Kafka

9.4-Flume explained

9.5-[Activity] Set up Flume and publish logs with it.

9.6-[Activity] Set up Flume to monitor a directory and store its data in HDFS

10-Analyzing Streams of Data

10.1-Spark Streaming: Introduction

10.2-[Activity] Analyze web logs published with Flume using Spark Streaming

10.3-[Exercise] Monitor Flume-published logs for errors in real time

10.4-Exercise solution: Aggregating HTTP access codes with Spark Streaming

10.5-Apache Storm: Introduction

10.6-[Activity] Count words with Storm

10.7-Flink: An Overview

10.8-[Activity] Counting words with Flink

11-Designing Real-World Systems

11.1-The Best of the Rest

11.2-Review: How the pieces fit together

11.3-Understanding your requirements

11.4-Sample application: consume webserver logs and keep track of top-sellers

11.5-Sample application: serving movie recommendations to a website

11.6-[Exercise] Design a system to report web sessions per day

11.7-Exercise solution: Design a system to count daily sessions

12-Learning More

12.1-Books and online resources

12.2-Bonus Lecture: More courses to explore!
