Hadoop, like big data itself, is a relatively new platform, and few professionals are experts in it. SQL-on-Hadoop simplifies access to the Hadoop framework and makes it easier to implement on existing enterprise systems. Because SQL was originally developed for relational databases, it has to be adapted for the Hadoop 1 model, which uses the Hadoop Distributed File System (HDFS) and MapReduce, or for the Hadoop 2 model, which can work without either HDFS or MapReduce.
The different means of executing SQL in Hadoop environments can be divided into three groups: connectors that translate SQL into a MapReduce format; push-down systems that forgo batch-oriented MapReduce and execute SQL within Hadoop clusters; and systems that apportion SQL work between MapReduce-HDFS clusters and raw HDFS clusters, depending on the workload.
One of the earliest efforts to combine SQL and Hadoop resulted in the Hive data warehouse, which featured HiveQL, a SQL-like query language whose queries are translated into MapReduce jobs. Other tools that help support SQL-on-Hadoop include BigSQL, Drill, Hadapt, Hawq, H-SQL, Impala, JethroData, Polybase, Presto, Shark, Spark, Splice Machine, Stinger and Tez.
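To make the idea of translating a SQL-like query into a MapReduce job more concrete, the sketch below shows how a query such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` might be expressed as map, shuffle and reduce steps. It is a simplified, single-process illustration in plain Python, not Hive's actual query planner; the `employees` table, its sample rows and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a table employees(name, dept).
EMPLOYEES = [
    ("ana", "sales"),
    ("ben", "eng"),
    ("cho", "eng"),
    ("dev", "sales"),
    ("eli", "ops"),
]


def map_phase(rows):
    """Emit one (dept, 1) pair per input row, as a mapper would."""
    for _name, dept in rows:
        yield dept, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, which yields COUNT(*) per dept."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_dept(rows):
    """Run the three phases in sequence for the GROUP BY / COUNT(*) query."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_dept(EMPLOYEES))
```

On a real cluster the map and reduce steps would run in parallel across many nodes and the shuffle would move data over the network; the point here is only that the `GROUP BY` column becomes the key and the aggregate becomes the reducer.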
SQL is one of the most widely used languages for accessing, analyzing and manipulating structured data. As Hadoop gains traction in enterprise data architectures across industries, the need for SQL on both structured and loosely structured data in Hadoop is growing rapidly.