How to access files from HDFS?

To access files in HDFS, you can use various Hadoop commands from the UNIX shell. Hadoop also provides powerful Java APIs that a programmer can use to write code for accessing files in HDFS. Before we go into more detail, let’s understand the terminology for accessing files in HDFS.

How to leverage HDFS?

HDFS is not like a normal file system. It was designed to work well with sequential reads and writes, which means it may not work well for use cases that require random read and write operations on a file.

Most Big Data problems follow a write-once, read-many-times data processing pattern, and the initial design of HDFS was built around this concept: you can write a file to HDFS once, but you cannot modify it afterwards. Recent releases of Hadoop support appending data to the end of a file, but random modifications to a file are still not permitted.
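As a rough sketch of what write once plus append looks like from the shell, here is a short example. It assumes the hdfs client is on your PATH; the paths and file names are made up for illustration, and the small run helper is my own addition so that the script only prints the commands unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only: /user/demo and the events-*.log files are hypothetical.
# DRY_RUN=1 (the default here) just prints each command; set DRY_RUN=0 on a
# machine with a configured Hadoop client to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Write the file once.
run hdfs dfs -put ./events-day1.log /user/demo/events.log

# Later, add more data to the END of the same file.
run hdfs dfs -appendToFile ./events-day2.log /user/demo/events.log

# There is no command to edit the middle of a file. To change earlier
# content, one option is to fetch it, edit locally, and overwrite with -f.
run hdfs dfs -get /user/demo/events.log ./events-local.log
run hdfs dfs -put -f ./events-local.log /user/demo/events.log
```

The last two commands replace the whole file rather than modifying it in place, which is the point of the write-once model.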

Additionally, Hadoop’s design favors a small number of very large files over a very large number of tiny files.

What are the operations supported by HDFS?

If we think about the kinds of functionality HDFS should provide, these are the main points.

  • It should offer most of the commands of a UNIX/Linux file system
  • As a distributed file system, there should be some mechanism to move a file from the local file system to HDFS
  • Similarly, there should be a reverse mechanism to copy a file from HDFS to the local file system
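
The three points above can be sketched with a handful of hdfs dfs commands. This is only an illustration, assuming the hdfs client is on your PATH: the /user/demo directory and sample.txt are hypothetical, and the run helper is my own addition so that the script prints the commands by default instead of touching a cluster.

```shell
#!/bin/sh
# Sketch only: /user/demo and sample.txt are hypothetical names.
# DRY_RUN=1 (the default here) just prints each command; set DRY_RUN=0 on a
# machine with a configured Hadoop client to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# UNIX-like commands: these mirror mkdir -p, ls and cat.
run hdfs dfs -mkdir -p /user/demo/input
run hdfs dfs -ls /user/demo

# Local file system -> HDFS.
run hdfs dfs -put ./sample.txt /user/demo/input/
run hdfs dfs -cat /user/demo/input/sample.txt

# HDFS -> local file system.
run hdfs dfs -get /user/demo/input/sample.txt ./sample-copy.txt

# Clean up, like rm.
run hdfs dfs -rm /user/demo/input/sample.txt
```

The same commands should also work through hadoop fs in place of hdfs dfs.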

HDFS Commands

Before we go into further details, make sure that you have Hadoop installed properly. If you don’t, follow the steps given in my post Setup Multi Node Hadoop 2.6.0 Cluster with YARN.

Below is the list of commands for accessing HDFS.

I’ll cover the examples of Java API for accessing files from HDFS in my upcoming posts.

