Apache Flume Architecture Explained

5 min read · Oct 31, 2020


Apache Flume is a tool used to transfer large amounts of streaming data into HDFS. A typical use of Flume is collecting log data from the log files of web servers and aggregating it in HDFS for analysis.

Flume supports several sources, such as the following.

  • ‘tail’ (which pipes data from a local file and writes via Flume to HDFS, similar to the ‘tail’ Unix command)
  • System logs
  • Apache log4j (enables Java applications to write events to files in HDFS via Flume)

Apache Flume Architecture

A Flume agent is a JVM process with three components: Flume Source, Flume Channel, and Flume Sink. Events propagate through these components after being initiated at an external source.

In this architecture, the events generated by an external source (a web server) are consumed by the Flume source. The external source sends events to the Flume source in a format that the target source recognizes.

The Flume source receives an event and stores it in one or more channels. The channel acts as a store that holds the event until the Flume sink consumes it. The channel may use the local file system to store these events.

The Flume sink removes the event from the channel and stores it in an external repository such as HDFS. There may be several Flume agents, in which case the Flume sink forwards the event to the Flume source of the next Flume agent in the flow.
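The source, channel, and sink wiring described above can be sketched as a minimal agent configuration. This is only an illustration and not part of the setup in this article: the agent name a1, the component names r1, c1, and k1, the netcat source, the logger sink, the port number, and the file name example-agent.conf are all assumptions chosen for the example.

```shell
# Scratch directory so the sketch does not touch a real Flume installation.
workdir="$(mktemp -d)"
conf="$workdir/example-agent.conf"   # hypothetical file name

# One agent (a1) with one source, one channel, and one sink.
cat > "$conf" <<'EOF'
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: receives events from an external client (netcat is an assumption).
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Channel: holds events until the sink consumes them.
a1.channels.c1.type = memory

# Sink: removes events from the channel (the logger sink just logs them).
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
EOF

# Show the wiring: a source lists its channels, a sink names exactly one channel.
grep -E 'channels? =' "$conf"
```

Note the asymmetry in the property names: a source uses ‘channels’ (plural) because it can write to several channels, while a sink uses ‘channel’ (singular) because it reads from only one.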

Some important features of Apache Flume

Flume has a flexible design based on streaming data flows. It is fault-tolerant and robust, with several failover and recovery mechanisms. Flume offers different levels of reliability, including ‘best-effort delivery’ and ‘end-to-end delivery’. Best-effort delivery does not tolerate any Flume node failure, while ‘end-to-end delivery’ mode guarantees delivery even in the event of multiple node failures.

Flume carries data between sources and sinks. This gathering of data can be either scheduled or event-driven. Flume has its own query processing engine, which makes it easy to transform each new batch of data before it is moved to the intended sink.

Possible sinks for Flume include HDFS and HBase.

Setting Up Apache Flume, Libraries, and Source Code

Before starting the actual procedure, ensure that you have Hadoop installed. Change the user to ‘hduser’ (the id used during Hadoop configuration; you can switch to whichever user id you used for your Hadoop setup).

Step 1)

Create a new directory named ‘FlumeTutorial’.

Give it read, write, and execute permissions:

sudo chmod -R 777 FlumeTutorial

Copy the files MyTwitterSource.java and MyTwitterSourceForFlume.java into this directory.

Download the input files from here.

Check the file permissions of all these files and, if ‘read’ permission is missing, grant it.
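The permission check can be sketched with ‘find’. The sketch below works on a scratch copy of the directory so that it is safe to run anywhere; the scratch path and the simulated missing permission are assumptions made for the example.

```shell
# Scratch stand-in for the tutorial directory, so the sketch is safe to run anywhere.
dir="$(mktemp -d)/FlumeTutorial"
mkdir -p "$dir"
touch "$dir/MyTwitterSource.java" "$dir/MyTwitterSourceForFlume.java"
chmod a-r "$dir/MyTwitterSource.java"   # simulate a missing 'read' permission

# List files that lack 'read' for user, group, or other.
find "$dir" -type f ! -perm -444

# Grant 'read' to everyone on exactly those files.
find "$dir" -type f ! -perm -444 -exec chmod a+r {} +

ls -l "$dir"
```

On the real directory, replace "$dir" with the path to FlumeTutorial; sudo may be needed if the files belong to another user.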


Step 2)

Download ‘Apache Flume’ from https://flume.apache.org/download.html

Apache Flume 1.4.0 is used in this tutorial.


Next Click on the


Step 3)

Copy the downloaded tarball to the directory of your choice and extract its contents using the following command.

sudo tar -xvf apache-flume-1.4.0-bin.tar.gz

This command creates a new directory called apache-flume-1.4.0-bin and extracts the files into it. This directory is referred to as the Flume installation directory in the rest of the article.

Step 4)

Apache Flume library setup

It is possible that one or all of the copied JARs have execute permission. This may cause a problem with code compilation. So, revoke execute permission on any such JAR.

In my case, twitter4j-core-4.0.1.jar had execute permission. I revoked it as shown below.

sudo chmod -x twitter4j-core-4.0.1.jar

After this command, give ‘read’ permission on twitter4j-core-4.0.1.jar to everyone.

sudo chmod a+r /usr/local/apache-flume-1.4.0-bin/lib/twitter4j-core-4.0.1.jar

Please note that I have downloaded:

- twitter4j-core-4.0.1.jar from https://mvnrepository.com/artifact/org.twitter4j/twitter4j-core

- All Flume JARs, i.e. flume-ng-*-1.4.0.jar, from http://mvnrepository.com/artifact/org.apache.flume

Using Apache Flume to load data from Twitter

Step 1)

Go to the directory containing the source code files.

Step 2)

Set CLASSPATH to include /lib/* (under the Flume installation directory) and ~/FlumeTutorial/flume/mytwittersource/*
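A minimal sketch of this step follows. FLUME_HOME is a hypothetical variable introduced only for the example; its default here is the installation path that appears elsewhere in this article, so adjust it to your own installation directory.

```shell
# FLUME_HOME is a hypothetical variable; the default is the path used elsewhere in this article.
FLUME_HOME="${FLUME_HOME:-/usr/local/apache-flume-1.4.0-bin}"

# The double quotes stop the shell from expanding '*'; Java expands dir/* to the JARs in dir itself.
# $HOME is used instead of '~' because '~' is not expanded inside double quotes.
export CLASSPATH="$FLUME_HOME/lib/*:$HOME/FlumeTutorial/flume/mytwittersource/*"

echo "$CLASSPATH"
```

The export only lasts for the current shell session, so run the compile step from the same terminal.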

Step 3)

Compile the code

javac -d . MyTwitterSourceForFlume.java MyTwitterSource.java

Step 4)

Create a JAR

First, create the Manifest.txt file using a text editor of your choice and add the line below to it:

Main-Class: flume.mytwittersource.MyTwitterSourceForFlume

Here, flume.mytwittersource.MyTwitterSourceForFlume is the name of the main class. Please note that you must press the Enter key at the end of this line.
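If you prefer the command line to a text editor, the manifest can be written with printf, which makes the trailing newline explicit. The newline matters because the jar tool may not parse the last line of a manifest that does not end with one. The scratch directory below is an assumption for the example; in practice the file goes in the source code directory.

```shell
# Scratch location for the example; in practice write Manifest.txt in the source code directory.
manifest="$(mktemp -d)/Manifest.txt"

# printf writes the Main-Class line followed by the newline the manifest needs.
printf 'Main-Class: flume.mytwittersource.MyTwitterSourceForFlume\n' > "$manifest"

# wc -l counts newline characters, so 1 confirms the single line is newline-terminated.
wc -l < "$manifest"
```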

Now, create the JAR ‘MyTwitterSourceForFlume.jar’ as follows:

jar cfm MyTwitterSourceForFlume.jar Manifest.txt flume/mytwittersource/*.class

Step 5)

Copy this JAR to /lib/ (under the Flume installation directory)

sudo cp MyTwitterSourceForFlume.jar /lib/

Step 6)

Go to the configuration directory of Apache Flume, /conf

If flume.conf does not exist, copy flume-conf.properties.template and rename it to flume.conf.

sudo cp flume-conf.properties.template flume.conf

If flume-env.sh does not exist, copy flume-env.sh.template and rename it to flume-env.sh.

sudo cp flume-env.sh.template flume-env.sh
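The "copy only if missing" logic of this step can be sketched as a small shell function. The function name copy_if_missing and the scratch directory standing in for the conf directory are assumptions made for the example; in a real conf directory the cp may need sudo.

```shell
# Scratch stand-in for the conf directory, holding the two template files.
confdir="$(mktemp -d)"
touch "$confdir/flume-conf.properties.template" "$confdir/flume-env.sh.template"

# Copy a template to its target name only when the target does not exist yet.
copy_if_missing() {
  # $1 = template file, $2 = target file
  [ -e "$2" ] || cp "$1" "$2"
}

copy_if_missing "$confdir/flume-conf.properties.template" "$confdir/flume.conf"
copy_if_missing "$confdir/flume-env.sh.template" "$confdir/flume-env.sh"

ls "$confdir"
```

Because an existing target is left alone, running the sketch a second time does not overwrite a flume.conf that has already been edited.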

Creating a Twitter application for Apache Flume

Step 1)

Create a Twitter application by signing in to https://developer.twitter.com/


Step 2)

Go to ‘My Applications’ (this option drops down when the ‘Egg’ button in the top right corner is clicked)

Step 3)

Create a new application by clicking ‘Create New App’

Step 4)

Fill in the application details by specifying the name of the application, a description, and a website. You may refer to the notes given underneath each input box.

Step 5)

Scroll down the page and accept the terms by selecting ‘Yes, I agree’, then click ‘Create your Twitter application’

Step 6)

In the window of the newly created application, go to the ‘API Keys’ tab, scroll down the page, and click the ‘Create my access token’ button.

Step 7)

Refresh the page.

Step 8)

Click ‘Test OAuth’. This displays the ‘OAuth’ settings of the application.

Step 9)

Modify ‘flume.conf’ using these OAuth settings. The steps to modify ‘flume.conf’ are given below.

To update ‘flume.conf’, we need to copy the consumer key, consumer secret, access token, and access token secret.

Note: These values are user-specific and therefore confidential, so they should not be shared.

Modifying the ‘flume.conf’ file

Step 1)

Open ‘flume.conf’ in write mode and set the values for the parameters below:

sudo gedit flume.conf

Copy the contents below:

MyTwitAgent.sources = Twitter
MyTwitAgent.channels = MemChannel
MyTwitAgent.sinks = HDFS

MyTwitAgent.sources.Twitter.type = flume.mytwittersource.MyTwitterSourceForFlume
MyTwitAgent.sources.Twitter.channels = MemChannel
MyTwitAgent.sources.Twitter.consumerKey = <consumer key from the Twitter application>
MyTwitAgent.sources.Twitter.consumerSecret = <consumer secret from the Twitter application>
MyTwitAgent.sources.Twitter.accessToken = <access token from the Twitter application>
MyTwitAgent.sources.Twitter.accessTokenSecret = <access token secret from the Twitter application>
MyTwitAgent.sources.Twitter.keywords = <keywords to track>

MyTwitAgent.sinks.HDFS.channel = MemChannel
MyTwitAgent.sinks.HDFS.type = hdfs
MyTwitAgent.sinks.HDFS.hdfs.path = hdfs://localhost:54310/user/hduser/flume/tweets/
MyTwitAgent.sinks.HDFS.hdfs.fileType = DataStream
MyTwitAgent.sinks.HDFS.hdfs.writeFormat = Text
MyTwitAgent.sinks.HDFS.hdfs.batchSize = 1000
MyTwitAgent.sinks.HDFS.hdfs.rollSize = 0
MyTwitAgent.sinks.HDFS.hdfs.rollCount = 10000

MyTwitAgent.channels.MemChannel.type = memory
MyTwitAgent.channels.MemChannel.capacity = 1000000
MyTwitAgent.channels.MemChannel.transactionCapacity = 1000

Step 2)

Also, set MyTwitAgent.sinks.HDFS.hdfs.path as below:

MyTwitAgent.sinks.HDFS.hdfs.path = hdfs://<host name>:<port number>/<HDFS home directory>/flume/tweets/

To find the host name and port number, see the value of the fs.defaultFS parameter set in $HADOOP_HOME/etc/hadoop/core-site.xml.
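The lookup of fs.defaultFS can be sketched as follows. The sample core-site.xml is a scratch stand-in created only for the example, with the host and port used elsewhere in this article, and the sed expression assumes the <value> element sits on the line right after the <name> element. On a machine with Hadoop on the PATH, ‘hdfs getconf -confKey fs.defaultFS’ should report the same setting directly.

```shell
# Scratch stand-in for $HADOOP_HOME/etc/hadoop/core-site.xml (sample content is an assumption).
site="$(mktemp -d)/core-site.xml"
cat > "$site" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
EOF

# Find the fs.defaultFS <name> line, move to the next line, and print its <value> content.
default_fs="$(sed -n '/<name>fs.defaultFS<\/name>/{n;s:.*<value>\(.*\)</value>.*:\1:p;}' "$site")"

# Append the tweets directory used in this article to form the sink path.
echo "MyTwitAgent.sinks.HDFS.hdfs.path = $default_fs/user/hduser/flume/tweets/"
```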

Step 3)

To flush the data to HDFS as and when it arrives, delete the entry below if it exists.


I hope this gives you a clear picture of Apache Flume in big data. You can learn more big data concepts through Big Data online training.



