Tuesday, September 30, 2014

Hadoop (HDFS) Command line basics

Once you have Hadoop installed, you can open a terminal (aka command line). There is a program called hadoop, and we pass it different switches and arguments to make it do what we want. Most of the hadoop fs (file system shell) commands behave like the corresponding UNIX commands. Below are some of the commands you may find useful.

As a general rule, all hadoop filesystem commands start with hadoop fs.

Referencing HDFS Paths
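When you reference an HDFS path by its full URI, you need to use the hostname and port associated with the name node. In the examples below, the host is bivm and the name node is running on port 9000. You can also leave off the hdfs://bivm:9000 prefix. An absolute path such as /user/biadmin/test.txt is then resolved against the default filesystem, and a relative path such as test.txt is resolved against your HDFS home directory (typically /user/<username>, here /user/biadmin).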

For example, to copy a file from the local file system to the HDFS file system we could specify the full path with:

hadoop fs -put file:///home/biadmin/test.txt hdfs://bivm:9000/user/biadmin/test.txt

However, we can also use a relative path for the destination:

hadoop fs -put file:///home/biadmin/test.txt test.txt

This convention applies to all hadoop fs commands.
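
To make the path rule concrete, here is a minimal sketch of a shell helper that expands an HDFS path into the full URI form used in the examples above. The function name hdfs_full_uri is hypothetical, the name node address bivm:9000 and the user biadmin come from the examples, and it assumes the HDFS home directory is /user/<username>, which is the usual default but can be configured differently.

```shell
#!/bin/sh
# Hypothetical helper: expand an HDFS path to a full URI.
# Assumptions: name node at bivm:9000 and an HDFS home directory
# of /user/<username>, as in the examples above.
NAMENODE="${NAMENODE:-hdfs://bivm:9000}"
HDFS_USER="${HDFS_USER:-biadmin}"

hdfs_full_uri() {
    case "$1" in
        *://*) printf '%s\n' "$1" ;;                                   # already a URI (hdfs://, file://)
        /*)    printf '%s%s\n' "$NAMENODE" "$1" ;;                     # absolute HDFS path
        *)     printf '%s/user/%s/%s\n' "$NAMENODE" "$HDFS_USER" "$1" ;;  # relative to the home directory
    esac
}

hdfs_full_uri test.txt
```

With the defaults above, the last line prints hdfs://bivm:9000/user/biadmin/test.txt, which is the destination used in the first -put example.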

help - Get help and see all the commands for hadoop fs
hadoop fs -help

help - Get help on a specific fs command
hadoop fs -help du

ls - Show the files in the current user's HDFS home directory. The two commands below are equivalent, assuming the user name is biadmin.

hadoop fs -ls
hadoop fs -ls /user/biadmin

ls - Show the contents of the /user directory (the parent of the users' home directories)

hadoop fs -ls /user

cp - Copy a file between locations given as paths or URIs, here from the local file system to HDFS
hadoop fs -cp file:///home/biadmin/test.txt hdfs://bivm:9000/user/biadmin/test.txt

put or copyFromLocal - Copy files from the local filesystem to HDFS (the opposite of copyToLocal)
hadoop fs -put file:///home/biadmin/test.txt hdfs://bivm:9000/user/biadmin/test.txt
hadoop fs -copyFromLocal file:///home/biadmin/test.txt hdfs://bivm:9000/user/biadmin/test.txt

get or copyToLocal - Copy files from HDFS to the local filesystem (the opposite of copyFromLocal)
hadoop fs -copyToLocal hdfs://bivm:9000/user/biadmin/test.txt file:///home/biadmin/test.txt
hadoop fs -get hdfs://bivm:9000/user/biadmin/test.txt file:///home/biadmin/test.txt
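
Since put and get are opposites, one way to check that a file survives the trip is to upload it, download it again, and compare the two copies. The following is only a sketch: the function name hdfs_roundtrip and the .roundtrip suffix are hypothetical, and the relative HDFS path means the temporary copy lands in your HDFS home directory.

```shell
#!/bin/sh
# Sketch: copy a local file into HDFS, copy it back, and compare both copies.
# hdfs_roundtrip and the ".roundtrip" suffix are hypothetical names.
hdfs_roundtrip() {
    src="$1"                                # local file to upload
    remote="$(basename "$src").roundtrip"   # relative HDFS path (home directory)
    back="$src.roundtrip"                   # local copy fetched back from HDFS

    hadoop fs -put "$src" "$remote" || return 1
    hadoop fs -get "$remote" "$back" || return 1
    hadoop fs -rm "$remote" >/dev/null || return 1

    # cmp -s prints nothing and succeeds only if the files are byte-identical
    if cmp -s "$src" "$back"; then
        status=0; echo "round trip OK"
    else
        status=1; echo "round trip MISMATCH"
    fi
    rm -f "$back"
    return "$status"
}

# Usage (requires a working Hadoop installation):
#   hdfs_roundtrip /home/biadmin/test.txt
```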

tail - View the last kilobyte of a file
hadoop fs -tail hdfs://bivm:9000/user/biadmin/test.txt

cat - View the entire contents of a file
hadoop fs -cat hdfs://bivm:9000/user/biadmin/test.txt

rm - Remove a file
hadoop fs -rm hdfs://bivm:9000/user/biadmin/test.txt

du - Find the size of a file
hadoop fs -du hdfs://bivm:9000/user/biadmin/test.txt

du - Get the size of each file in a directory
hadoop fs -du hdfs://bivm:9000/user/biadmin

du - Get the total size of all files in a directory
hadoop fs -du -s hdfs://bivm:9000/user/biadmin
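
The per-file du output can be combined with standard UNIX tools to find the biggest entries in a directory. Here is a small sketch, assuming that the first column of each du line is the size in bytes (the remaining columns vary between Hadoop versions). The function name largest_entries is hypothetical.

```shell
#!/bin/sh
# Sketch: print the N largest entries from `hadoop fs -du` output read on stdin.
# Assumes the first column of each line is the size in bytes.
largest_entries() {
    # Sort numerically on the first column, largest first, and keep N lines (default 5)
    sort -k1,1 -n -r | head -n "${1:-5}"
}

# Usage (requires a working Hadoop installation):
#   hadoop fs -du /user/biadmin | largest_entries 3
```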

mkdir - Make a new directory
hadoop fs -mkdir hdfs://bivm:9000/user/biadmin/test

Other Unix-like HDFS Commands

setrep - Set the replication factor of a file, or of an entire directory tree when given a directory
getmerge - Take all files in the directories that match the source pattern and concatenate them into a single file on the local filesystem
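
As a rough sketch of how these two commands are typically invoked, the wrappers below build the command lines and either print them (when DRY_RUN=1) or run them. The names run, set_replication and merge_to_local are hypothetical, as is the local file name merged.txt. The -w switch asks setrep to wait until the replication has completed.

```shell
#!/bin/sh
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Set the replication factor ($1) of an HDFS path ($2) and wait for it to finish.
set_replication() { run hadoop fs -setrep -w "$1" "$2"; }

# Concatenate the files under an HDFS directory ($1) into one local file ($2).
merge_to_local() { run hadoop fs -getmerge "$1" "$2"; }

# Usage (drop DRY_RUN=1 to execute against a working Hadoop installation):
#   DRY_RUN=1 set_replication 2 /user/biadmin/test.txt
#   DRY_RUN=1 merge_to_local /user/biadmin/test merged.txt
```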

You can pipe the output of these commands to UNIX commands. For example, we can grep the output of the ls command.

hadoop fs -ls | grep test

The result would be something like:
-rw-r--r--   1 biadmin biadmin          5 2014-09-24 00:48 test.txt
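
The same piping idea works with awk. The sketch below pulls just the size and path out of each file line, assuming the 8-column ls layout shown above (paths containing spaces would break it). The function name ls_sizes is hypothetical.

```shell
#!/bin/sh
# Sketch: print "size path" for each file in `hadoop fs -ls` output on stdin.
# Assumes the 8-column layout shown above; header and directory lines are skipped.
ls_sizes() {
    awk 'NF >= 8 && $1 !~ /^d/ { print $5, $8 }'
}

# Try it on the sample line from above (no Hadoop needed):
echo '-rw-r--r--   1 biadmin biadmin          5 2014-09-24 00:48 test.txt' | ls_sizes

# Usage against a real cluster:
#   hadoop fs -ls | ls_sizes
```

For the sample line this prints 5 test.txt.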
