Big Data Hadoop

Big data is the term for collections of datasets so large that they cannot be processed using traditional computing techniques. Enterprise systems generate huge amounts of data, from terabytes up to petabytes of information. Hadoop is one of the tools designed to handle big data; it and other software products parse and interpret the results of big data searches through specialized algorithms and methods. Apache Hadoop is not actually a single product but a collection of several components.


Course Syllabus


  • What is Big Data?
  • The core technologies of Hadoop
  • How the Hadoop Distributed File System (HDFS) and MapReduce work
  • How to develop MapReduce jobs
  • Algorithms for common MapReduce tasks
  • How to create large workflows using multiple MapReduce jobs
  • Best practices for debugging Hadoop jobs
  • Advanced features of the Hadoop API
  • Motivation for Hadoop
  • Big Data characteristics and challenges with traditional systems
  • Hadoop History
  • Core Hadoop Concepts
  • Hadoop Clusters, Installation and Configuration
  • How MapReduce works
  • Data Types
  • Input & Output Formats
  • Hadoop Cluster Functionality
  • Cluster sizing
  • Capacity planning
  • Writing a MapReduce Program (a WordCount sketch follows this list)
  • Examining a sample MapReduce program
  • The Driver Code
  • The Mapper
  • The Reducer
  • The Streaming API
  • Developing a MapReduce program
  • Hive Basics (a Hive query sketch follows this list)
  • HQL queries
  • Internal & External Tables
  • Partitioning
  • Buckets
  • Pig Basics
  • Loading data files
  • Writing queries with the following operators: SPLIT, FILTER, JOIN, GROUP, SAMPLE, ILLUSTRATE, etc.
  • Flume
  • Sqoop
  • Importing and exporting data between an RDBMS and Hadoop
  • Other Sqoop options
  • Zookeeper
  • Oozie
  • HBase commands (an HBase client sketch follows this list)
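
To ground the MapReduce module, below is a minimal WordCount sketch in Java showing the three pieces named in the syllabus: the driver, the mapper, and the reducer. It uses the standard Hadoop MapReduce API (org.apache.hadoop.mapreduce); input and output paths come from the command line.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // The Mapper: emits (word, 1) for every token in the input split
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }

      // The Reducer: sums all counts emitted for a given word
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      // The Driver: configures the job and submits it to the cluster
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Packaged into a jar, this would typically be run with: hadoop jar wordcount.jar WordCount /input /output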
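
For the Hive module, the sketch below runs HQL from Java over JDBC. It is an illustration only: the HiveServer2 address, the logs table, its columns, and the dt partition key are all assumptions, and the Hive JDBC driver jar must be on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQueryExample {
      public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; host, port, and credentials are assumptions
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection con = DriverManager.getConnection(
            "jdbc:hive2://localhost:10000/default", "hive", "");
        Statement stmt = con.createStatement();

        // External table: Hive manages only the metadata, so dropping the
        // table leaves the underlying files in /data/logs untouched
        stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS logs "
            + "(ip STRING, url STRING) "
            + "PARTITIONED BY (dt STRING) "
            + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' "
            + "LOCATION '/data/logs'");

        // An HQL aggregation restricted to a single partition, so only
        // the files under dt=2020-01-01 are scanned
        ResultSet rs = stmt.executeQuery(
            "SELECT url, COUNT(*) FROM logs "
            + "WHERE dt = '2020-01-01' GROUP BY url");
        while (rs.next()) {
          System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }
        con.close();
      }
    }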
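
The syllabus closes with HBase shell commands; the same put and get operations can also be issued from Java through the HBase client API, as sketched below. The users table and its info column family are assumptions, and cluster settings are read from hbase-site.xml on the classpath.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseExample {
      public static void main(String[] args) throws Exception {
        // Reads cluster settings from hbase-site.xml on the classpath
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {

          // Shell equivalent: put 'users', 'row1', 'info:name', 'alice'
          Put put = new Put(Bytes.toBytes("row1"));
          put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
              Bytes.toBytes("alice"));
          table.put(put);

          // Shell equivalent: get 'users', 'row1'
          Result result = table.get(new Get(Bytes.toBytes("row1")));
          byte[] name = result.getValue(Bytes.toBytes("info"),
              Bytes.toBytes("name"));
          System.out.println(Bytes.toString(name));
        }
      }
    }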