A Hadoop cluster is a special type of computational cluster designed specifically for storing and analyzing huge amounts of unstructured data in a distributed computing environment. Hadoop clusters are known for boosting the speed of data analysis applications. They also are highly scalable.
Generally, any set of loosely or tightly connected computers that work together as a single system is called a cluster. In simple terms, a computer cluster used for Hadoop is called a Hadoop cluster.
Hadoop clusters typically run on low-cost commodity computers.
Hadoop clusters are often referred to as “shared nothing” systems because the only thing shared between nodes is the network that connects them. Large Hadoop clusters are arranged in several racks. Network traffic between nodes in the same rack is preferable to traffic across racks, since cross-rack traffic has to pass through additional switches and typically has less bandwidth available.
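The preference for same-rack traffic can be illustrated with a small sketch. It assumes each node is identified by a hypothetical path such as /rack1/node1 and measures distance as the number of hops from each node up to their closest common ancestor. This follows the tree-distance idea behind Hadoop's rack awareness, but it is only an illustration under those assumptions, not Hadoop's actual implementation.

```python
def network_distance(a: str, b: str) -> int:
    """Sum of hops from each node up to their closest common ancestor.

    Node paths such as "/rack1/node1" are hypothetical names for this sketch.
    """
    path_a = a.strip("/").split("/")
    path_b = b.strip("/").split("/")

    # Count how many leading path components the two nodes share.
    common = 0
    for part_a, part_b in zip(path_a, path_b):
        if part_a != part_b:
            break
        common += 1

    return (len(path_a) - common) + (len(path_b) - common)


def closest_node(reader: str, candidates: list[str]) -> str:
    """Pick the candidate with the smallest network distance to the reader."""
    return min(candidates, key=lambda node: network_distance(reader, node))


if __name__ == "__main__":
    reader = "/rack1/node1"
    replicas = ["/rack2/node5", "/rack1/node3"]
    print(network_distance(reader, "/rack1/node3"))  # same rack
    print(network_distance(reader, "/rack2/node5"))  # different rack
    print(closest_node(reader, replicas))
```

Under this measure, a node on the same rack is always closer than one on another rack, which is why a scheduler or reader would favor it when both hold the same data.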
Hadoop clusters comprise three node types: master nodes, worker nodes, and client nodes. Understanding the different node types will help you plan your cluster and configure the appropriate number and type of nodes when creating one.
Master nodes oversee the two key operations that make up Hadoop: storing data in the Hadoop Distributed File System (HDFS) and running parallel computations on that data using MapReduce.
The NameNode coordinates the data storage function (with HDFS), while the JobTracker oversees and coordinates the parallel processing of data using MapReduce.
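To make the MapReduce side more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to a word count. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen for illustration; a real Hadoop job would distribute these steps across worker nodes rather than run them in one Python process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Group all emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

Each call to map_phase depends only on its own input line, and each call to reduce_phase depends only on one key's values, which is what allows the work to be spread over many nodes.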