Flume Hadoop Agent – Spool directory to HDFS

In the previous post, we saw how to write data from a spooling directory to the console log. In this post, we will learn how a Flume agent can write data from a spooling directory to HDFS.

To write data from the spooling directory to HDFS, we only need to edit the properties file and change the sink configuration from the logger sink to the HDFS sink.

spool-to-hdfs.properties
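
The original file contents are not reproduced here, so the following is a minimal sketch of what spool-to-hdfs.properties could look like. The agent name (agent1), component names, and local/HDFS paths are illustrative assumptions; adjust them to your environment.

```properties
# spool-to-hdfs.properties -- minimal sketch; names and paths are assumptions

agent1.sources  = spool-source
agent1.channels = mem-channel
agent1.sinks    = hdfs-sink

# Spooling directory source: watches a local directory for new files
agent1.sources.spool-source.type     = spooldir
agent1.sources.spool-source.spoolDir = /home/hadoop/spool-dir
agent1.sources.spool-source.channels = mem-channel

# In-memory channel buffering events between source and sink
agent1.channels.mem-channel.type = memory

# HDFS sink: writes events under the given HDFS path
agent1.sinks.hdfs-sink.type            = hdfs
agent1.sinks.hdfs-sink.hdfs.path       = /user/hadoop/flume-events
agent1.sinks.hdfs-sink.hdfs.filePrefix = events_
agent1.sinks.hdfs-sink.hdfs.fileSuffix = .log
# Write plain text instead of the default binary SequenceFile
agent1.sinks.hdfs-sink.hdfs.fileType   = DataStream
agent1.sinks.hdfs-sink.channel         = mem-channel
```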

With the above properties file, files written to HDFS will be named with the events_ prefix and the .log suffix (Flume inserts a timestamp between the two).

How many events read from the source will go into one HDFS file?

The files in HDFS are rolled over every 30 seconds by default. You can change this interval by setting the "hdfs.rollInterval" property, whose value is given in seconds. You can also roll over files by event count ("hdfs.rollCount") or by cumulative event size ("hdfs.rollSize"), as shown below.
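
For example, with the sink name assumed from the sketch above and illustrative (non-default) values, the file rolls as soon as any one of these limits is reached:

```properties
agent1.sinks.hdfs-sink.hdfs.rollInterval = 60        # roll every 60 seconds (0 disables)
agent1.sinks.hdfs-sink.hdfs.rollCount    = 10000     # or after 10,000 events (0 disables)
agent1.sinks.hdfs-sink.hdfs.rollSize     = 134217728 # or after 128 MB of data (0 disables)
```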

The "hdfs.fileType" property accepts three values: SequenceFile, DataStream (plain text), or CompressedStream.

The default is SequenceFile, a binary format: each event body is a byte array, and that byte array is written to the sequence file.

It is expected that whoever reads the sequence file knows how to deserialize the binary data back into objects.

DataStream writes uncompressed output, such as plain text files, while CompressedStream compresses the output with a codec such as gzip or bzip2.
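
For instance, switching the sink from the sketch above to gzip-compressed output (the "hdfs.codeC" property selects the compression codec):

```properties
agent1.sinks.hdfs-sink.hdfs.fileType = CompressedStream
agent1.sinks.hdfs-sink.hdfs.codeC    = gzip
```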

Now, start the Flume agent using the command below:
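
This is a sketch that assumes the properties file sits in the current directory, a ./conf directory holds the Flume configuration, and the agent is named agent1 as in the example above:

```sh
flume-ng agent \
  --conf ./conf \
  --conf-file spool-to-hdfs.properties \
  --name agent1 \
  -Dflume.root.logger=INFO,console
```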

Once the Flume agent is running, start placing files in the spooling directory; each new file triggers the agent to read and forward its contents.
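
For example, copying a log file into the assumed spooling directory from the sketch above:

```sh
cp application.log /home/hadoop/spool-dir/
```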

Once you see that the files in the spooling directory have been renamed with the ".COMPLETED" suffix, check HDFS to confirm the data has arrived. Use the command below to list the files in the HDFS directory.
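
The path here is the assumed hdfs.path from the properties sketch above:

```sh
hdfs dfs -ls /user/hadoop/flume-events
```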

Use the ‘cat’ command to print the content of the file.
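
Again assuming the path and file naming from the sketch above (the glob matches the timestamped file names Flume generates):

```sh
hdfs dfs -cat /user/hadoop/flume-events/events_*.log
```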