Apache Hadoop

Posted on

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.Hadoop has moved far beyond its beginnings in web indexing and is now used in many industries for a huge variety of tasks that all share the common theme of lots of variety, volume and velocity of data – both structured and unstructured. It is now widely used across industries, including finance, media, entertainment,government, healthcare, information services, retail, and other industries with Big Data requirements but the limitations of the original storage infrastructure remain.

Apache Hadoop includes a Distributed File System (HDFS), which breaks up input data and stores data on the compute nodes. This makes it possible for data to be processed in parallel using all of the machines in the cluster. The Apache Hadoop Distributed File System is written in Java and runs on different operating systems.Apache Hadoop is 100% open source, and pioneered a fundamentally new way of storing and processing data. Instead of relying on expensive, proprietary hardware and different systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and can scale without limits. With Hadoop, no data is too big. And in today’s hyper-connected world where more and more data is being created every day, Hadoop’s breakthrough advantages mean that businesses and organizations can now find value in data that was recently considered useless.

hadoop

 Hadoop Different from Past Techniques?

  • Hadoop can handle data in a very fluid way
  • Hadoop has a simplified programming model
  • Hadoop is easy to administer
  • Hadoop is agile

Benefits of Apache Hadoop?

  • cost effective.
  •  fault-tolerant
  • flexible.
  • scalable.

Overview of Hadoop

hadoopDiagram

 

Advertisements

One thought on “Apache Hadoop

    AVON Pur Blanca Eau De Toilette said:
    February 7, 2014 at 3:26 pm

    Great informational blog! I love your writing style Awesome work Thank you!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s