Apache Hive Installation on a Single-Node Hadoop 1.x Cluster

Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analysis. Although initially developed by Facebook, Apache Hive is now used and developed by other companies such as Netflix. Amazon maintains a software fork of Apache Hive that is included in Amazon Elastic MapReduce on Amazon Web Services.

Steps for installation
1. First, download Apache Hive.

You can download the Apache Hive 1.2.1 binary release from the following mirror:
http://www.eu.apache.org/dist/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz

2. Untar the downloaded package using the following command:
sudo tar -xvzf apache-hive-1.2.1-bin.tar.gz

3. Rename the extracted folder to hive
sudo mv apache-hive-1.2.1-bin hive

4. Give the Hadoop user and group ownership of the hive folder
sudo chown -R hduser:hdgroup hive

5. Next, move the folder to /usr/local
sudo mv hive /usr/local/

6. Now edit the ~/.bashrc file
vim ~/.bashrc

Add the following at the end:

# Set Hive-related environment variables

export HIVE_HOME=/usr/local/hive
export HIVE_CONF=/usr/local/hive/conf
export HIVE_LIB=/usr/local/hive/lib
export HIVE_CLASSPATH=/usr/local/hive/lib
export PATH=$PATH:$HIVE_HOME/bin
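
The new variables only take effect in a new shell, or after reloading the file with source ~/.bashrc. As a quick sanity check, the sketch below sets the two key variables and prints the last PATH entry, which should be the expanded Hive bin folder. It assumes Hive lives in /usr/local/hive as in step 5; last_entry is just a name chosen for this illustration.

```shell
# Assumes Hive was moved to /usr/local/hive (step 5).
export HIVE_HOME=/usr/local/hive
export PATH="$PATH:$HIVE_HOME/bin"

# The last PATH entry should be the expanded Hive bin directory.
# last_entry is a hypothetical helper variable for this check.
last_entry="${PATH##*:}"
echo "$last_entry"
```

If the output is a literal HIVE_HOME/bin rather than /usr/local/hive/bin, the dollar sign in front of HIVE_HOME is missing from the PATH line.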

7. Add a file called "hive-site.xml" under the /usr/local/hive/conf folder
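
This post does not show what goes into hive-site.xml. As a hedged sketch only, the snippet below writes a minimal file into the current directory that points the Hive warehouse at the HDFS path used in step 8. hive.metastore.warehouse.dir is a standard Hive property, but setting only this one property is an assumption made for illustration; a real setup may need more.

```shell
# Write a minimal hive-site.xml to the current directory.
# Assumption: only the warehouse location is configured here;
# the path matches the HDFS directory created in step 8.
cat > hive-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
EOF

# Then copy it into the Hive conf folder, for example:
# sudo cp hive-site.xml /usr/local/hive/conf/
```

Writing to the current directory first keeps the step free of sudo until the file has been reviewed.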

8. Now create two directories, tmp and warehouse

Create /tmp and /user/hive/warehouse on HDFS and give them group write permission. Run the commands from the Hadoop installation folder.

bin/hadoop dfs -mkdir /tmp
bin/hadoop dfs -mkdir /user/hive/warehouse

Set the permissions

bin/hadoop dfs -chmod g+w /tmp
bin/hadoop dfs -chmod g+w /user/hive/warehouse
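
To confirm that both directories exist on HDFS, a check along these lines can help. This is a sketch under assumptions: it expects a hadoop executable on the PATH (substitute bin/hadoop from the Hadoop folder otherwise), and check_hdfs_dirs is a hypothetical helper name, not part of Hadoop.

```shell
# check_hdfs_dirs is a hypothetical helper, not a Hadoop command.
# Assumes the hadoop launcher is on the PATH.
check_hdfs_dirs() {
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop not found on PATH"
    return 1
  fi
  for dir in /tmp /user/hive/warehouse; do
    # "dfs -test -d" exits with 0 when the path is a directory.
    if hadoop dfs -test -d "$dir"; then
      echo "ok: $dir"
    else
      echo "missing: $dir"
    fi
  done
}
```

Call check_hdfs_dirs after the mkdir and chmod commands; any line starting with "missing" means that directory still has to be created.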
