Deploy a scalable Hadoop cluster using Apache Ambari for efficient big data processing. Configure Hadoop ecosystem components, ensuring security and optimization. Utilize Ambari's monitoring and management capabilities for seamless cluster administration.

chaudharysurya14/Ambari-Hadoop_Installation

Learning how to tame Big Data with Hadoop and related technologies

Hadoop

  • Hadoop is an open-source software platform for distributed storage and distributed processing of very large datasets on clusters of computers built from commodity hardware
  • Why Hadoop?
    • Data is too big
    • Vertical scaling isn't an option
      • Disk seek times
      • Hardware failures
      • Processing times
    • Horizontal scaling is linear
    • Hadoop can do much more than batch processing

Installation

  • Download VirtualBox from https://www.virtualbox.org/
  • Download a Hadoop image that runs on VirtualBox
  • Import the image into VirtualBox
  • Once the VM boots, you will have a CentOS instance with Hadoop up and running
  • You can use the CLI or the browser interface
    • Ambari is available to easily navigate and manage the different systems on Hadoop
    • Go to http://localhost:8888
  • Launch the Dashboard and log in to Ambari
    • Username: maria_dev
    • Password: maria_dev
  • Troubleshooting
    • Enable virtualization in your BIOS
    • Disable Hyper-V acceleration in Windows
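
If the browser cannot load http://localhost:8888 after the VM boots, it may help to check whether anything is listening on that port before trying the troubleshooting steps. The sketch below is one possible way to do that with the Python standard library. The function name `is_port_open` and the timeout value are illustrative assumptions and not part of this repository. An open port only shows that something accepts connections there, not that the Hadoop services have finished starting.

```python
import socket
from urllib.parse import urlsplit

# Address taken from the installation steps above.
SANDBOX_URL = "http://localhost:8888"


def is_port_open(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds.

    Hypothetical helper for illustration; it does not log in to Ambari
    or inspect the page that is served.
    """
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    if is_port_open(SANDBOX_URL):
        print(f"{SANDBOX_URL} accepts connections; open it in a browser.")
    else:
        print(f"Nothing is listening at {SANDBOX_URL}; the VM may still be booting.")
```

If the check keeps failing several minutes after boot, the troubleshooting items above are the next things to look at.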
