+91 79073 12685

Hadoop Training


TMCS is one of the leading institutes for Hadoop training in Marathahalli. TMCS has highly professional real-time trainers, good infrastructure, and a well-organized placement cell to help students with placements. TMCS has helped many students get quality Hadoop training and placement. If you are looking for professional, real-time Hadoop training in Marathahalli, please contact TMCS.

Hadoop Course Content

INTRODUCTION TO LINUX AND BIG DATA VIRTUAL MACHINE (VM)

Introduction/Installation of VirtualBox and the Big Data VM, Introduction to Linux, Why Linux?, Windows and the Linux equivalents, Different flavors of Linux, Unity Shell (Ubuntu UI), Basic Linux Commands (enough to get started with Hadoop)

UNDERSTANDING BIG DATA

3V (Volume, Variety, Velocity) characteristics, Structured and Unstructured Data, Applications and use cases of Big Data, Limitations of traditional large-scale systems, How a distributed way of computing is superior (cost and scale), Opportunities and challenges with Big Data

HDFS (THE HADOOP DISTRIBUTED FILE SYSTEM)

HDFS Overview and Architecture, Deployment Architecture, Name Node, Data Node and Checkpoint Node (aka Secondary Name Node), Safe mode, Configuration files, HDFS Data Flows (Read v/s Write)

HOW HDFS ADDRESSES FAULT TOLERANCE

CRC Checksum, Data replication, Rack awareness and Block placement policy, Small files problem

HDFS INTERFACES

Command Line Interface, File System, Administrative, Web Interface

ADVANCED HDFS FEATURES

Load Balancer, DistCp (Distributed Copy), HDFS Federation, HDFS High Availability, Hadoop Archives

MAP REDUCE – 1 (THEORETICAL CONCEPTS)

MapReduce overview, Functional Programming paradigms, How to think in a MapReduce way

MAPREDUCE ARCHITECTURE

Legacy MR v/s Next Generation MapReduce (aka YARN/MRv2), Slots v/s Containers, Schedulers, Shuffling, Sorting, Hadoop Data Types, Input and Output Formats, Input Splits, Partitioning (Hash Partitioner v/s Custom Partitioner), Configuration files, Distributed Cache
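To illustrate the partitioning topic, here is a minimal sketch in plain Python (not the Hadoop Java API). Hadoop's default HashPartitioner derives the reducer from the key's hash code modulo the number of reduce tasks; the sketch below uses `zlib.crc32` as a stand-in hash, and `first_letter_partition` is a hypothetical custom rule invented for this example.

```python
import zlib


def hash_partition(key, num_reducers):
    """Pick a reducer from a deterministic hash of the key.

    crc32 is a stand-in (assumption) for the key's hash code in Hadoop.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def first_letter_partition(key, num_reducers):
    """Hypothetical custom rule: keys starting with a-m go to reducer 0,
    all other keys go to reducer 1 (or 0 when only one reducer exists)."""
    first = key[:1].lower()
    if "a" <= first <= "m":
        return 0
    return min(1, num_reducers - 1)


if __name__ == "__main__":
    for word in ["apple", "hadoop", "zebra"]:
        print(word, hash_partition(word, 4), first_letter_partition(word, 4))
```

A hash partitioner spreads keys evenly without knowing anything about them, while a custom partitioner lets you control which keys meet at the same reducer, at the risk of uneven load.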

MR ALGORITHM AND DATA FLOW

Word Count
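As a rough illustration of the Word Count data flow, the sketch below walks through the map, shuffle, and reduce steps in plain Python. It runs on a single machine and does not use Hadoop; the function names are chosen for this example only.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle and sort: group all values by key, ordered by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hello Hadoop", "hello world"]))
```

In a real cluster the map and reduce steps run in parallel on many nodes, and the framework performs the shuffle between them.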

ALTERNATIVES TO MR – BSP (BULK SYNCHRONOUS PARALLEL)

Ad hoc querying, Graph Computing Engines

MAP REDUCE – 2 (PRACTICE) DEVELOPING, DEBUGGING AND DEPLOYING MR PROGRAMS

Stand alone mode (in Eclipse), Pseudo distributed mode (as in the Big Data VM), Fully distributed mode (as in Production), MR API, Old and the new MR API, Java Client API, Hadoop data types and custom Writable

WRITABLECOMPARABLES

Different input and output formats, Saving Binary Data using Sequence Files and Avro Files, Hadoop Streaming (developing and debugging non-Java MR programs in Ruby and Python)

OPTIMIZATION TECHNIQUES

Speculative execution, Combiners, JVM Reuse, Compression

MR ALGORITHMS (NON-GRAPH)

Sorting, Term Frequency, Inverse Document Frequency, Student Database, Max Temperature, Different ways of joining data, Word Co-Occurrence

MR ALGORITHMS (GRAPH)

PageRank, Inverted Index

HIGHER LEVEL ABSTRACTIONS FOR MR (PIG)

Introduction and Architecture, Different Modes of executing Pig constructs, Data Types, Dynamic invokers, Pig streaming, Macros, Pig Latin language constructs (LOAD, STORE, DUMP, SPLIT, etc.), User Defined Functions, Use Cases

HIGHER LEVEL ABSTRACTIONS FOR MR (HIVE)

Introduction and Architecture, Different Modes of executing Hive queries, Metastore Implementations, HiveQL (DDL & DML Operations), External v/s Managed Tables, Views, Partitions & Buckets, User Defined Functions, Transformations using Non-Java, Use Cases

COMPARISON OF PIG AND HIVE

NOSQL DATABASES – 1 (THEORETICAL CONCEPTS)

NoSQL Concepts, Review of RDBMS

Need for NoSQL, Brewer's CAP Theorem, ACID v/s BASE, Schema on Read v/s Schema on Write, Different levels of consistency, Bloom filters

DIFFERENT TYPES OF NOSQL DATABASES

Key Value, Columnar, Document, Graph

NOSQL DATABASES – 2 (PRACTICE) COLUMNAR DATABASE CONCEPTS

HBase Architecture, Master and the Region Server, Catalog tables (ROOT and META), Major and Minor compaction, Configuration files, HBase v/s Cassandra

INTERFACES TO HBASE (FOR DDL AND DML OPERATIONS)

Java API, Client API, Filters, Scan Caching and Batching, Command Line Interface, REST API

ADVANCED HBASE FEATURES

HBase Data Modeling, Bulk loading data in HBase, HBase Coprocessors – Endpoints (similar to Stored Procedures in RDBMS), HBase Coprocessors – Observers (similar to Triggers in RDBMS)

SPARK

Introduction to RDD, Installation and Configuration of Spark, Spark Architecture, Different interfaces to Spark, Sample Python programs in Spark

INTRODUCTION TO YARN

Use case of YARN, YARN Architecture, YARN Demo

INTRODUCTION TO OOZIE

Use case of Oozie, Oozie Architecture, Oozie Demo

INTRODUCTION TO FLUME

Use case of Flume, Flume Architecture, Flume Demo

INTRODUCTION TO SQOOP

Use case of Sqoop, Sqoop Architecture, Sqoop Demo

SETTING UP A HADOOP CLUSTER USING APACHE HADOOP

Cloudera Hadoop cluster on the Amazon Cloud (Practice), Using EMR (Elastic MapReduce), Using EC2 (Elastic Compute Cloud)

SSH CONFIGURATION

Stand alone mode (Theory), Distributed mode (Theory), Pseudo distributed, Fully distributed

HADOOP ECOSYSTEM AND USE CASES

Hadoop industry solutions, Importing/exporting data across RDBMS and HDFS using Sqoop, Getting real-time events into HDFS using Flume, Creating workflows in Oozie, Introduction to Graph processing, Graph processing with Neo4j, Using the Mongo Document Database, Using the Cassandra Columnar Database, Distributed Coordination with ZooKeeper

PROOF OF CONCEPTS AND USE CASES

Clickstream Analysis using Pig and Hive, Analyzing Twitter data with Hive, Further ideas for data analysis