Big Data Analytics with Hadoop 3 : Build highly effective analytics solutions to gain valuable insight into your big data.

Apache Hadoop is the most popular platform for big data processing and for building powerful analytics solutions. This book shows you how to do just that, with the help of practical examples. You will become well-versed in the analytical capabilities of the Hadoop ecosystem with Apache Spark and Apache Flink to p...

Bibliographic Details
Main Author: Alla, Sridhar
Format: eBook
Language: English
Published: Birmingham : Packt Publishing, 2018.
Table of Contents:
  • Cover; Title Page; Copyright and Credits; Packt Upsell; Contributors; Table of Contents; Preface; Chapter 1: Introduction to Hadoop; Hadoop Distributed File System; High availability; Intra-DataNode balancer; Erasure coding; Port numbers; MapReduce framework; Task-level native optimization; YARN; Opportunistic containers; Types of container execution; YARN timeline service v. 2; Enhancing scalability and reliability; Usability improvements; Architecture; Other changes; Minimum required Java version; Shell script rewrite; Shaded-client JARs; Installing Hadoop 3; Prerequisites; Downloading.
  • Installation; Setup password-less ssh; Setting up the NameNode; Starting HDFS; Setting up the YARN service; Erasure Coding; Intra-DataNode balancer; Installing YARN timeline service v. 2; Setting up the HBase cluster; Simple deployment for HBase; Enabling the co-processor; Enabling timeline service v. 2; Running timeline service v. 2; Enabling MapReduce to write to timeline service v. 2; Summary; Chapter 2: Overview of Big Data Analytics; Introduction to data analytics; Inside the data analytics process; Introduction to big data; Variety of data; Velocity of data; Volume of data; Veracity of data.
  • Variability of data; Visualization; Value; Distributed computing using Apache Hadoop; The MapReduce framework; Hive; Downloading and extracting the Hive binaries; Installing Derby; Using Hive; Creating a database; Creating a table; SELECT statement syntax; WHERE clauses; INSERT statement syntax; Primitive types; Complex types; Built-in operators and functions; Built-in operators; Built-in functions; Language capabilities; A cheat sheet on retrieving information; Apache Spark; Visualization using Tableau; Summary; Chapter 3: Big Data Processing with MapReduce; The MapReduce framework; Dataset.
  • Record reader; Map; Combiner; Partitioner; Shuffle and sort; Reduce; Output format; MapReduce job types; Single mapper job; Single mapper reducer job; Multiple mappers reducer job; SingleMapperCombinerReducer job; Scenario; MapReduce patterns; Aggregation patterns; Average temperature by city; Record count; Min/max/count; Average/median/standard deviation; Filtering patterns; Join patterns; Inner join; Left anti join; Left outer join; Right outer join; Full outer join; Left semi join; Cross join; Summary; Chapter 4: Scientific Computing and Big Data Analysis with Python and Hadoop; Installation.
  • Installing standard Python; Installing Anaconda; Using Conda; Data analysis; Summary; Chapter 5: Statistical Big Data Computing with R and Hadoop; Introduction; Install R on workstations and connect to the data in Hadoop; Install R on a shared server and connect to Hadoop; Utilize Revolution R Open; Execute R inside of MapReduce using RMR2; Summary and outlook for pure open source options; Methods of integrating R and Hadoop; RHADOOP - install R on workstations and connect to data in Hadoop; RHIPE - execute R inside Hadoop MapReduce; R and Hadoop Streaming; RHIVE - install R on workstations and connect to data in Hadoop.