Big Data and Hadoop



Introduction

The Big Data and Hadoop training course is designed to provide the knowledge and skills needed to become a successful Hadoop developer. The course covers, in depth, concepts such as the Hadoop Distributed File System (HDFS), single-node and multi-node Hadoop clusters, Hadoop 2.x, Flume, Sqoop, MapReduce, Pig, Hive, HBase, ZooKeeper, and Oozie.

Why Big Data and Hadoop?

Big data analytics helps organizations harness their data and use it to identify new opportunities. That, in turn, leads to smarter business moves, more efficient operations, higher profits and happier customers. In his report Big Data in Big Companies, IIA Director of Research Tom Davenport interviewed more than 50 businesses to understand how they used big data. He found they got value in the following ways:

  1. Cost reduction: Big data technologies such as Hadoop and cloud-based analytics bring significant cost advantages when it comes to storing large amounts of data – plus they can identify more efficient ways of doing business.

  2. Faster, better decision making: With the speed of Hadoop and in-memory analytics, combined with the ability to analyze new sources of data, businesses are able to analyze information immediately – and make decisions based on what they’ve learned.

  3. New products and services: With the ability to gauge customer needs and satisfaction through analytics comes the power to give customers what they want. Davenport points out that with big data analytics, more companies are creating new products to meet customers' needs.




Understanding Big Data and Hadoop
  • Big Data
  • Limitations and Solutions of existing Data Analytics Architecture
  • Hadoop
  • Hadoop Features
  • Hadoop Ecosystem
  • Hadoop 2.x core components
  • Hadoop Storage: HDFS, Hadoop Processing: MapReduce Framework
  • Anatomy of File Write and Read, Rack Awareness

Hadoop Architecture and HDFS
  • Hadoop 2.x Cluster Architecture - Federation and High Availability
  • A Typical Production Hadoop Cluster
  • Hadoop Cluster Modes
  • Common Hadoop Shell Commands
  • Hadoop 2.x Configuration Files
  • Password-Less SSH
  • MapReduce Job Execution
  • Data Loading Techniques: Hadoop Copy Commands
  • Flume
  • Sqoop

Hadoop MapReduce Framework - I
  • MapReduce Use Cases
  • Traditional Way vs. MapReduce Way
  • Why MapReduce
  • Hadoop 2.x MapReduce Architecture
  • Hadoop 2.x MapReduce Components
  • YARN MR Application Execution Flow
  • YARN Workflow
  • Anatomy of MapReduce Program
  • Demo on MapReduce
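
As a rough preview of the "Anatomy of MapReduce Program" topic, the three phases (map, shuffle, reduce) can be sketched in plain Python without a cluster. This is only an illustrative word count, not the course's demo code and not the Hadoop API; the function names map_phase, shuffle_phase, reduce_phase, and word_count are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

In a real Hadoop job the mapper and reducer run on different nodes and the framework performs the shuffle; here everything runs in one process purely to show the data flow.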

Hadoop MapReduce Framework - II
  • Input Splits
  • Relation between Input Splits and HDFS Blocks
  • MapReduce Job Submission Flow
  • Demo of Input Splits
  • MapReduce: Combiner & Partitioner
  • Demo on de-identifying Health Care Data set
  • Demo on Weather Dataset
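
The Combiner and Partitioner topics in this module can be pictured with a small Python sketch. It assumes a word-count style job with integer counts; the names combine, partition, and route are hypothetical, and zlib.crc32 is used only as a stand-in for a stable hash (Hadoop's default partitioner hashes the key with Java's hashCode instead).

```python
import zlib
from collections import Counter


def combine(pairs):
    """Combiner: pre-aggregate (key, count) pairs on the map side to cut shuffle traffic."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return sorted(totals.items())


def partition(key, num_reducers):
    """Partitioner: map a key to a reducer index using a stable hash (assumed stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(pairs, num_reducers):
    """Combine map output, then place each combined pair in its reducer's bucket."""
    buckets = [[] for _ in range(num_reducers)]
    for key, count in combine(pairs):
        buckets[partition(key, num_reducers)].append((key, count))
    return buckets


if __name__ == "__main__":
    map_output = [("a", 1), ("b", 1), ("a", 1)]
    print(route(map_output, 3))
```

Because the partition depends only on the key, every pair for a given key reaches the same reducer, which is what lets the reducer compute a complete total.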

Advanced MapReduce
  • Counters
  • Distributed Cache
  • MRUnit
  • Reduce Join
  • Custom Input Format
  • Sequence Input Format

Pig
  • About Pig, MapReduce vs. Pig, Pig Use Cases
  • Programming Structure in Pig, Pig Running Modes, Pig Components
  • Pig Execution, Pig Latin Program, Data Models in Pig, Pig Data Types
  • Pig Latin: Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators
  • Pig UDF
  • Pig Demo on Healthcare Data set

Hive
  • Hive Background, Hive Use Case, About Hive
  • Hive vs. Pig, Hive Architecture and Components, Metastore in Hive
  • Limitations of Hive
  • Comparison with Traditional Database
  • Hive Data Types and Data Models, Partitions and Buckets
  • Hive Tables (Managed Tables and External Tables)
  • Importing Data, Querying Data, Managing Outputs
  • Hive Script, Hive UDF
  • Hive Demo on Healthcare Data set

Advanced Hive and HBase
  • HiveQL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts
  • Hive: Thrift Server, User Defined Functions
  • HBase: Introduction to NoSQL Databases and HBase, HBase vs. RDBMS, HBase Components, HBase Architecture, HBase Cluster Deployment

Advanced HBase
  • HBase Data Model, HBase Shell, HBase Client API, Data Loading Techniques
  • ZooKeeper Data Model, ZooKeeper Service, ZooKeeper
  • Demos on Bulk Loading, Getting and Inserting Data, Filters in HBase

Oozie and Hadoop Project
  • Flume and Sqoop Demo
  • Oozie, Oozie Components, Oozie Workflow
  • Scheduling with Oozie, Demo on Oozie Workflow
  • Oozie Coordinator, Oozie Commands
  • Oozie Web Console, Hadoop Project Demo



Certification Process

Once you have successfully completed the project (reviewed by an AIITC expert), you will be awarded AIITC's Big Data and Hadoop certificate.


