To access files on HDFS, you can use various Hadoop commands from the UNIX shell. Hadoop also provides powerful Java APIs with which a programmer can write code for accessing files on HDFS. Before going into more detail, let's understand the basics of accessing files on HDFS.
How to leverage HDFS?
HDFS is not like a normal file system. It was designed to work well with sequential reads and writes, so it may not suit use cases that require random read and write operations on a file.
Most Big Data problems follow a write-once, read-many-times data processing pattern, and the initial design of HDFS was built around this concept: you write a file to HDFS once and cannot modify it afterwards. Recent releases of Hadoop support appending data to the end of a file, but random modifications within a file are still not permitted.
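As a rough sketch of the append support described above, the `-appendToFile` shell command adds the contents of a local file to the end of an HDFS file. The function name `append_to_hdfs` and the paths in the usage note are hypothetical, and the sketch assumes a Hadoop 2.x `hdfs` client on the PATH.

```shell
# Hypothetical helper: append a local file to the end of an HDFS file.
# Assumes the `hdfs` client is on the PATH. There is no counterpart for
# editing bytes in the middle of an HDFS file.
append_to_hdfs() {
  local_src=$1   # local file whose contents are appended
  hdfs_dst=$2    # HDFS file that receives the data
  hdfs dfs -appendToFile "$local_src" "$hdfs_dst"
}
```

For example, `append_to_hdfs ./more.txt /user/root/backtobazics/hdfsfile.txt` would add the contents of `more.txt` to the end of `hdfsfile.txt`.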
Additionally, Hadoop is designed to handle a small number of very big files better than a very large number of tiny files.
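One way to act on this preference, sketched here under assumptions, is to combine many small local files into one larger file before uploading it. The function name `merge_small_files` and the `.log` extension are hypothetical, and the sketch assumes line-oriented text files for which plain concatenation is acceptable.

```shell
# Hypothetical helper: concatenate many small local text files into one
# larger file, so a single big file is uploaded instead of many tiny ones.
# Keep the output file outside the *.log pattern so it is not re-read.
merge_small_files() {
  src_dir=$1   # local directory holding the small files
  merged=$2    # local path of the combined output file
  : > "$merged"                  # start with an empty output file
  for f in "$src_dir"/*.log; do
    [ -f "$f" ] || continue      # skip when the glob matches nothing
    cat "$f" >> "$merged"
  done
}
```

The merged file could then be uploaded with a single command such as `hdfs dfs -copyFromLocal ./merged.txt /user/root/backtobazics/`.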
What are the operations supported by HDFS?
If we think about the functionality that HDFS should provide, these are the main points:
- It should offer most of the commands available on a UNIX/Linux file system
- As a distributed file system, it should have some mechanism to move a file from the local file system to HDFS
- Similarly, it should have a reverse mechanism to copy a file from HDFS to the local file system
Before we go into further details, make sure that you have Hadoop installed properly. If you don’t, follow the steps given in my post Setup Multi Node Hadoop 2.6.0 Cluster with YARN.
Below is a list of commands for accessing HDFS:
## Create the /user directory under the HDFS root
$ hdfs dfs -mkdir /user
## -p creates any missing parent directories along the path
$ hdfs dfs -mkdir -p /user/root/backtobazics
## A relative path (no leading /) is created under the user's HDFS home directory, e.g. /user/root/test for user root
$ hdfs dfs -mkdir test
## Create an empty (zero-length) file "emptyfile" under the /user/root/backtobazics/ directory
$ hdfs dfs -touchz /user/root/backtobazics/emptyfile
## List files and directories under the HDFS root
$ hdfs dfs -ls /
## List files and directories under the user's HDFS home directory (no path given)
$ hdfs dfs -ls
## Recursively list all files and subdirectories
$ hdfs dfs -ls -R /user/root/backtobazics/
## Copy a local file to HDFS: copy "localfile.txt" from the current local directory
$ hdfs dfs -copyFromLocal ./localfile.txt /user/root/backtobazics/
## Copy files from HDFS to local file system (. is current directory in local file system)
$ hdfs dfs -copyToLocal /user/root/backtobazics/hdfsfile.txt .
## Delete hdfsfile.txt file from HDFS
$ hdfs dfs -rm /user/root/backtobazics/hdfsfile.txt
## Recursively delete the backtobazics directory and its contents
$ hdfs dfs -rm -r /user/root/backtobazics
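The commands above can also be combined in a script. As a minimal sketch, the helper below copies a local file to HDFS only when the target path does not exist yet, using `hdfs dfs -test -e`, which exits with status 0 if the path exists. The function name `upload_if_absent` is hypothetical, and the sketch assumes the `hdfs` client is on the PATH.

```shell
# Hypothetical helper: copy a local file to HDFS only if the target path
# does not exist yet. Assumes the `hdfs` client is on the PATH.
upload_if_absent() {
  local_src=$1   # file on the local file system
  hdfs_dst=$2    # full target path on HDFS, including the file name
  if hdfs dfs -test -e "$hdfs_dst"; then
    echo "skipped: $hdfs_dst already exists"
  else
    hdfs dfs -copyFromLocal "$local_src" "$hdfs_dst"
  fi
}
```

For example, `upload_if_absent ./localfile.txt /user/root/backtobazics/localfile.txt` uploads the file on the first run and prints a "skipped" message on later runs.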
I’ll cover examples of using the Java API to access files on HDFS in upcoming posts.