Big Data

Thane | Andheri | Online

Admissions open: 9th July 2020

Weekdays & Weekend Batches

(Rated 4.6 based on 112 customer reviews)

Your data is safe with us: no unnecessary marketing calls!

Big data can be analyzed for insights that help organizations make better decisions and strategic moves.


  1. Introduction to Big Data and Hadoop

    • Data growth
    • Data challenges (the 4 Vs)
    • Why Big Data, and what is Big Data?
    • Overview of Big Data tools, the different vendors providing Hadoop, and where it fits in the industry
    • Setting up a development environment and performing a Hadoop installation on the user's laptop
      • Hadoop daemons
      • Starting and stopping daemons using the command line and Cloudera Manager
  2. Hadoop Installation

    • Linux Primer
    • Ubuntu Primer
    • Downloading and installing Apache Hadoop in an all-in-one configuration
    • HDFS commands
    • MapReduce program execution
    • Exploring HDFS blocks & metadata
  3. HDFS

    • Significance of HDFS in Hadoop
    • Features of HDFS
    • The 5 daemons of Hadoop
      • Name Node and its functionality
      • Data Node and its functionality
      • Secondary Name Node and its functionality
      • Job Tracker and its functionality
      • Task Tracker and its functionality
    • Data Storage in HDFS
      • Introduction about Blocks
      • Data replication
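
The block and replication concepts above boil down to simple storage math. The sketch below is illustrative only; it assumes a 128 MB block size and replication factor of 3, which are common Hadoop 2.x defaults but are fully configurable:

```python
import math

def hdfs_storage(file_mb, block_mb=128, replication=3):
    """Estimate how HDFS stores a file: number of blocks and raw capacity
    consumed. Defaults are common Hadoop 2.x settings, not universal."""
    blocks = math.ceil(file_mb / block_mb)
    # Each block is replicated across DataNodes; the NameNode tracks only
    # metadata (which DataNodes hold which block), not the data itself.
    raw_mb = file_mb * replication
    return blocks, raw_mb

# A 300 MB file -> 3 blocks (128 + 128 + 44 MB), 900 MB of raw capacity used
print(hdfs_storage(300))
```

Because a missing replica can be re-copied from the surviving two, a default cluster tolerates the loss of any two DataNodes holding a given block.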
  4. Map Reduce

    • Map Reduce Story
    • Map Reduce Architecture
    • How Map Reduce works
    • Developing Map Reduce
    • Map Reduce Programming Model
      • Different phases of the Map Reduce algorithm
      • Different data types in Map Reduce
      • How to write a basic Map Reduce program
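
The map, shuffle and reduce phases can be mimicked in plain Python with the classic word-count example. This is a single-process sketch of the programming model, not Hadoop's distributed implementation:

```python
from collections import defaultdict

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in the input split
    return [(word.lower(), 1) for word in line.split()]

def shuffle(mapped):
    # Shuffle/sort: group all emitted values by key before reducers run
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: aggregate the value list for each key
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big insights", "big decisions"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'big': 3, 'data': 1, 'insights': 1, 'decisions': 1}
```

In real Hadoop each phase runs in parallel across the cluster, with the shuffle moving data between mapper and reducer nodes.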
  5. Yarn

    • Resource Manager
    • Node Manager
    • Job Flow Sequence Revisited
    • Classical version of Apache Hadoop (MRv1)
    • Limitations of classical MapReduce
    • Addressing scalability and resource-utilization issues, and the need to support different programming paradigms
    • YARN: The next generation of Hadoop's compute platform (MRv2)
    • Architecture of YARN
    • Application submission in YARN
    • Types of YARN schedulers (FIFO, Capacity and Fair)
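
The difference between the schedulers is easiest to see with FIFO, the simplest of the three. The toy model below grants containers strictly in arrival order until capacity is exhausted; it deliberately ignores the real ResourceManager/ApplicationMaster negotiation, queues and preemption:

```python
from collections import deque

def fifo_schedule(cluster_containers, apps):
    """Toy FIFO scheduler: each app requests some containers; apps are
    served strictly in arrival order until the cluster runs out."""
    queue = deque(apps)            # items are (app_name, containers_requested)
    allocations, free = {}, cluster_containers
    while queue and free > 0:
        name, wanted = queue.popleft()
        granted = min(wanted, free)
        allocations[name] = granted
        free -= granted
    return allocations

# 10 containers, three apps arriving in order: the last one is starved
print(fifo_schedule(10, [("etl", 6), ("report", 6), ("adhoc", 2)]))
```

That starvation of late arrivals is exactly what the Capacity and Fair schedulers address, by splitting the cluster into queues or by rebalancing shares over time.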
  6. PIG

    • Introduction to Apache Pig
      • Pig architecture
    • Map Reduce Vs. Apache Pig
    • SQL vs. Apache Pig
    • Different data types in Pig
    • Modes of Execution in Pig
    • Grunt shell
    • Loading data
    • Exploring Pig Latin commands
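
Pig Latin scripts are dataflows: load, filter, group, generate. The Python sketch below mirrors a typical FILTER / GROUP / COUNT pipeline (the Pig statements are shown as comments); the sample rows are invented for illustration:

```python
from itertools import groupby

# Rows as (user, url) tuples, standing in for a LOAD of a tab-delimited file
visits = [("alice", "/home"), ("bob", "/home"), ("alice", "/cart")]

# home = FILTER visits BY url == '/home';
home = [row for row in visits if row[1] == "/home"]

# grouped = GROUP visits BY user;
# counts = FOREACH grouped GENERATE group, COUNT(visits);
by_user = sorted(visits, key=lambda r: r[0])
counts = {user: len(list(rows)) for user, rows in groupby(by_user, key=lambda r: r[0])}

print(home)    # [('alice', '/home'), ('bob', '/home')]
print(counts)  # {'alice': 2, 'bob': 1}
```

Pig compiles the same style of pipeline into one or more MapReduce jobs, which is why it suits analysts who want dataflow logic without writing Java mappers and reducers.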
  7. HIVE

    • Hive introduction and architecture
    • Hive vs RDBMS
    • HiveQL and the shell
    • Managing tables (external vs managed)
    • Data types and schemas
    • Partitions and buckets
    • Installation
    • Hive Services, Hive Server and Hive Web Interface (HWI)
    • Metastore
    • Derby Database
    • Working with Tables
    • Primitive data types and complex data types
    • Working with Partitions
    • Hive Bucketed Tables and Sampling
    • External partitioned tables
    • Differences between ORDER BY, DISTRIBUTE BY and SORT BY
    • Log Analysis on Hive
    • Hands on Exercises
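
Partitions and buckets, the two layout concepts above, can be sketched as path and hash arithmetic. This is a simplified model: `/user/hive/warehouse` is Hive's default warehouse location, and CRC32 here merely stands in for Hive's own hash function:

```python
import zlib

def partition_path(table, partition_col, value):
    # Hive maps each partition value to its own warehouse subdirectory,
    # so a query filtering on the partition column prunes whole directories
    return f"/user/hive/warehouse/{table}/{partition_col}={value}"

def bucket_for(key, num_buckets):
    # Bucketing spreads rows across a fixed number of files by hashing
    # the bucket column; CRC32 is a stand-in for Hive's hash
    return zlib.crc32(key.encode()) % num_buckets

print(partition_path("logs", "dt", "2020-07-09"))
# /user/hive/warehouse/logs/dt=2020-07-09
```

Partition pruning skips whole directories at query time, while buckets give predictable file placement for sampling and map-side joins.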
  8. HBASE

    • Architecture
    • HBase vs. RDBMS
    • Column Families and Regions
    • Write pipeline
    • Read pipeline
    • HBase commands
    • HBase Installation
    • HBase concepts
    • HBase Data Model and comparison between RDBMS and NoSQL
    • Master & Region Servers
    • HBase Operations (DDL and DML) through Shell and Programming and HBase Architecture
    • Catalog Tables
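
The HBase data model above is essentially a sorted, versioned, multi-level map. The toy class below captures just that shape (row key → column family → qualifier → versioned values); real HBase layers regions, the WAL, memstores and HFiles on top:

```python
import time

class ToyHBaseTable:
    """Sketch of HBase's data model: row key -> column family ->
    qualifier -> list of (timestamp, value) versions."""
    def __init__(self):
        self.rows = {}

    def put(self, row, family, qualifier, value, ts=None):
        # Writes append a new version of the cell rather than overwrite
        cell = (self.rows.setdefault(row, {})
                         .setdefault(family, {})
                         .setdefault(qualifier, []))
        cell.append((ts if ts is not None else time.time(), value))

    def get(self, row, family, qualifier):
        versions = self.rows[row][family][qualifier]
        return max(versions)[1]   # newest timestamp wins

t = ToyHBaseTable()
t.put("row1", "info", "name", "alice", ts=1)
t.put("row1", "info", "name", "alicia", ts=2)
print(t.get("row1", "info", "name"))  # alicia
```

Unlike an RDBMS, columns need no schema declaration up front: any qualifier can appear in any row, which is the schemaless half of the RDBMS vs. NoSQL comparison.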
  9. ZooKeeper

    • The ZooKeeper Service: Data Model
    • Operations
    • Implementation
    • Consistency
    • Sessions
    • States
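
ZooKeeper's data model is a tree of znodes addressed by filesystem-like paths. The toy store below shows plain and sequential znodes (the monotonically numbered suffix underpins leader election and distributed locks); it skips parents, watches, ephemerality and, of course, replication:

```python
class ToyZooKeeper:
    """Sketch of ZooKeeper's data model: path -> small data blob.
    Sequential znodes get a monotonically increasing 10-digit suffix."""
    def __init__(self):
        self.znodes = {"/": b""}
        self.counter = 0

    def create(self, path, data=b"", sequential=False):
        if sequential:
            path = f"{path}{self.counter:010d}"   # e.g. /locks/lock-0000000000
            self.counter += 1
        self.znodes[path] = data                  # toy: no parent checks
        return path

    def get(self, path):
        return self.znodes[path]

zk = ToyZooKeeper()
zk.create("/config", b"replication=3")
lock1 = zk.create("/locks/lock-", sequential=True)
lock2 = zk.create("/locks/lock-", sequential=True)
print(lock1, lock2)  # /locks/lock-0000000000 /locks/lock-0000000001
```

Whichever client holds the lowest-numbered sequential znode "wins", which is how Hadoop components such as HBase use ZooKeeper for coordination.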
  10. Flume

    • Understanding the Flume architecture and how it differs from Sqoop
    • Flume Agent Setup
    • Setting up data
    • Types of sources, channels and sinks
    • Multi-agent flow
    • Different Flume implementations
    • Hands-on exercises (configuring and running flume agent to load streaming data from web server)
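
A Flume agent is a source feeding a channel feeding a sink. The sketch below models that pipeline with an in-memory queue; real Flume channels are transactional and can be file-backed, and sinks typically write to HDFS rather than a list:

```python
from collections import deque

class ToyFlumeAgent:
    """Sketch of a Flume agent: the source pushes events into a channel
    (a buffer decoupling producer and consumer rates), and the sink
    drains the channel in batches toward its destination."""
    def __init__(self):
        self.channel = deque()
        self.delivered = []

    def source(self, events):
        for event in events:
            self.channel.append(event)           # source -> channel

    def sink(self, batch_size):
        batch = []
        while self.channel and len(batch) < batch_size:
            batch.append(self.channel.popleft()) # channel -> sink
        self.delivered.extend(batch)
        return batch

agent = ToyFlumeAgent()
agent.source(["GET /home", "GET /cart", "POST /buy"])
print(agent.sink(batch_size=2))  # ['GET /home', 'GET /cart']
print(list(agent.channel))       # ['POST /buy']
```

The channel is what lets a bursty web-server source outpace a slow HDFS sink without losing events, and chaining one agent's sink to another's source gives the multi-agent flows listed above.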
  11. Sqoop

    • What is Sqoop?
    • How it works
    • Sqoop architecture
    • Data Imports
    • Data Exports
    • Integration with Hadoop Ecosystem
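
Sqoop parallelizes an import by dividing the range of a split column among mappers, each of which issues its own bounded SELECT. The function below sketches that split arithmetic only; the real tool also handles type mapping, boundary queries and non-numeric keys:

```python
def sqoop_splits(min_id, max_id, num_mappers):
    """Sketch of Sqoop's --split-by logic: divide the key range of the
    split column into one interval per mapper."""
    span = (max_id - min_id + 1) / num_mappers
    splits = []
    for i in range(num_mappers):
        # Each mapper imports rows WHERE split_col BETWEEN lo AND hi
        lo = min_id + round(i * span)
        hi = min_id + round((i + 1) * span) - 1
        splits.append((lo, hi))
    return splits

# ids 1..100 imported with 4 mappers
print(sqoop_splits(1, 100, 4))  # [(1, 25), (26, 50), (51, 75), (76, 100)]
```

A skewed split column (say, mostly-null or clustered ids) gives unbalanced mappers, which is why choosing a good `--split-by` column matters in practice.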
  12. Apache Spark with Scala (Core Spark and Spark SQL)

    • Understanding the Spark architecture and why it is better than MapReduce
    • Working with RDDs
    • Hands on examples with various transformations on RDD
    • Perform Spark actions on RDD
    • Spark SQL concepts: DataFrames & Datasets
    • Hands-on examples with Spark SQL to create and work with DataFrames and Datasets
    • Create Spark DataFrames from an existing RDD
    • Create Spark DataFrames from external files
    • Create Spark DataFrames from Hive tables
    • Perform operations on a DataFrame
    • Using Hive tables in Spark
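
Spark's core idea, and a key reason it outperforms MapReduce, is that transformations are lazy: they only record a lineage, and nothing executes until an action runs. The toy class below demonstrates that behaviour in plain Python; it is a conceptual model, not Spark's API, and omits partitioning, caching and fault recovery:

```python
class ToyRDD:
    """Sketch of Spark's lazy-evaluation model: map/filter only record
    lineage; collect() (an action) replays it over the data."""
    def __init__(self, data, lineage=None):
        self.data = data
        self.lineage = lineage or []

    def map(self, fn):
        # Transformation: returns a new RDD, computes nothing yet
        return ToyRDD(self.data, self.lineage + [("map", fn)])

    def filter(self, pred):
        return ToyRDD(self.data, self.lineage + [("filter", pred)])

    def collect(self):
        # Action: only now is the recorded lineage actually executed
        out = list(self.data)
        for kind, fn in self.lineage:
            out = [fn(x) for x in out] if kind == "map" else [x for x in out if fn(x)]
        return out

rdd = ToyRDD([1, 2, 3, 4, 5]).map(lambda x: x * 10).filter(lambda x: x > 20)
print(rdd.collect())  # [30, 40, 50]
```

Because lineage describes how to rebuild a dataset, Spark can recompute lost partitions instead of checkpointing everything to disk between stages, which is where much of its speed advantage over MapReduce comes from.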

