Apache Flume is a tool used to move large amounts of streaming data into HDFS. A typical use of Flume is collecting log data from web servers' log files and aggregating it in HDFS for analysis.
Flume supports several kinds of data sources, such as the following.
- ‘tail’ (pipes data from a local file and writes it via Flume to HDFS, similar to the Unix ‘tail’ command)
- System logs from the machine
- Apache log4j (allows Java applications to write events via Flume to files in HDFS)
Apache Flume Architecture
A Flume agent is a JVM process with three components: Flume Source, Flume Channel, and Flume Sink.
In the diagram above, the Flume Data Source consumes events produced by an external source (a web server). The external source sends events to the Flume source in a format that the target source recognizes.
Flume Source receives an event and stores it in one or more channels. The channel acts as a store that holds the event until it is consumed by the Flume sink. The channel may use a local file system to store these events.
Flume Sink removes the event from the channel and stores it in an external repository such as HDFS. There may be multiple Flume agents, in which case the Flume sink forwards the event to the Flume source of the next Flume agent in the flow.
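As a standalone illustration of this source–channel–sink wiring (separate from the Twitter setup that follows), a minimal agent definition might look like the sketch below. It uses the netcat source, memory channel, and logger sink, which are standard Flume component types; the agent and component names are arbitrary.

```properties
# a1 is the agent name; r1/c1/k1 are component names
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# netcat source: listens on a TCP port and turns each line into an event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# memory channel: buffers events between source and sink
a1.channels.c1.type = memory

# logger sink: writes events to Flume's log, handy for testing a flow
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```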
Some essential characteristics of Apache Flume
Flume has a flexible architecture based on streaming data flows. It is fault-tolerant and robust, with multiple failover and recovery mechanisms. Flume offers various levels of reliability, including ‘best-effort delivery’ and ‘end-to-end delivery’. Best-effort delivery does not tolerate any Flume node failure, whereas ‘end-to-end delivery’ mode guarantees delivery even in the event of multiple node failures.
Flume carries data between sources and sinks. This gathering of data can be either scheduled or event-driven. Flume has its own query processing engine, which makes it easy to transform each new batch of data before it is moved to the intended sink.
Possible sinks for Flume include HDFS and HBase.
Setup Apache Flume, Library, and Source Code
Before we begin with the actual process, ensure that you have Hadoop installed. Change the user to ‘hduser’ (you can switch to whichever user id was used during your Hadoop configuration).
Create a new directory called ‘FlumeTutorial’
Give it read, write, and execute permissions:

sudo chmod -R 777 FlumeTutorial
Copy the files MyTwitterSource.java and MyTwitterSourceForFlume.java into this directory.
Download the input files from here.
Check the file permissions of all these files; if ‘read’ permission is missing, grant it.
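The permission check above can also be scripted. The snippet below is a small sketch, assuming it is run in the directory that holds the two .java files:

```shell
# Grant 'read' permission on each source file only when it is missing
for f in MyTwitterSource.java MyTwitterSourceForFlume.java; do
  if [ -e "$f" ] && [ ! -r "$f" ]; then
    chmod +r "$f"
  fi
done
```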
Create Your First FLUME Program- for Beginners
Download ‘Apache Flume’ from the site https://flume.apache.org/download.html

In this tutorial, Apache Flume 1.4.0 is used.
In the directory of your choosing, copy the downloaded tarball, and extract the contents using the following command.
sudo tar -xvf apache-flume-1.4.0-bin.tar.gz
This command creates a new directory named apache-flume-1.4.0-bin and extracts the files into it. This directory is referred to as the Flume installation directory in the remainder of this tutorial.
Configuration of the Apache Flume library
It is possible that some or all of the copied JARs have execute permission. This can cause problems with code compilation, so revoke execute permission on such JARs.

In my case, twitter4j-core-4.0.1.jar had execute permission. I revoked it as below.
sudo chmod -x twitter4j-core-4.0.1.jar
After this command, give ‘read’ permission on twitter4j-core-4.0.1.jar to all:

sudo chmod +r /usr/local/apache-flume-1.4.0-bin/lib/twitter4j-core-4.0.1.jar
Please note that I have downloaded:
- twitter4j-core-4.0.1.jar from https://mvnrepository.com/artifact/org.twitter4j/twitter4j-core
- all Flume JARs, i.e., flume-ng-*-1.4.0.jar, from https://mvnrepository.com/artifact/org.apache.flume
Using Apache Flume to load data from Twitter
Go to the directory containing the source code files.
Set CLASSPATH to contain the Flume installation directory’s /lib/* and ~/FlumeTutorial/flume/mytwittersource/*:

export CLASSPATH="/usr/local/apache-flume-1.4.0-bin/lib/*:~/FlumeTutorial/flume/mytwittersource/*"
Compile the code
javac -d . MyTwitterSourceForFlume.java MyTwitterSource.java
Create the JAR
First, create the Manifest.txt file using a text editor of your choice and add the line below to it:

Main-Class: flume.mytwittersource.MyTwitterSourceForFlume
Here, flume.mytwittersource.MyTwitterSourceForFlume is the name of the main class. Please note that you must press the Enter key at the end of this line.
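Because a missing trailing newline in the manifest is a common cause of jar packaging problems, you can also create Manifest.txt from the shell; this sketch uses printf so the required newline is written explicitly:

```shell
# printf writes the line plus the trailing newline that the jar tool expects
printf 'Main-Class: flume.mytwittersource.MyTwitterSourceForFlume\n' > Manifest.txt
```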
Now, build ‘MyTwitterSourceForFlume.jar’ as:

jar cfm MyTwitterSourceForFlume.jar Manifest.txt flume/mytwittersource/*.class
Copy this JAR to the Flume installation directory’s /lib/:

sudo cp MyTwitterSourceForFlume.jar /usr/local/apache-flume-1.4.0-bin/lib/
Go to Apache Flume’s configuration directory, conf/
If there is no flume.conf, copy flume-conf.properties.template and rename it as flume.conf.
sudo cp flume-conf.properties.template flume.conf
If flume-env.sh does not exist, copy and rename flume-env.sh.template to flume-env.sh.
sudo cp flume-env.sh.template flume-env.sh
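The two copy steps above can be made conditional so they only run when the target file is actually missing. A sketch, assuming it is run inside Flume’s conf/ directory (sudo dropped here for brevity):

```shell
# Copy each template only if the real config file does not exist yet
[ -f flume.conf ]   || cp flume-conf.properties.template flume.conf
[ -f flume-env.sh ] || cp flume-env.sh.template flume-env.sh
```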
Creating an Apache Flume application for Twitter
Sign in to https://developer.twitter.com/ and create a Twitter application.
Go to ‘My Applications’ (this option appears in the drop-down menu when the ‘Egg’ profile icon in the top-right corner is clicked).
Create a new application by clicking ‘Create New App’.
Fill in the application details by specifying the name of the application, a description, and a website. You may refer to the notes given underneath each input box.
Scroll down the page, accept the terms by selecting ‘Yes, I agree’, and click ‘Create your Twitter application’.
Go to the ‘API Keys’ tab of the newly created application window, scroll down the page, and press the ‘Create my access token’ button.
Press ‘Test OAuth’. This will display the application’s ‘OAuth’ settings.
Modify ‘flume.conf’ using these OAuth settings. The steps to modify ‘flume.conf’ are given below.

We need to copy the consumer key, consumer secret, access token, and access token secret in order to update ‘flume.conf’.
Note: These values are user-specific and hence private, so they should not be shared.
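Since these values are private, one option is to keep them out of any shared copy of the file and substitute them at deploy time. A hypothetical sketch, assuming flume.conf contains made-up placeholder tokens such as __CONSUMER_KEY__ (the placeholder and variable names here are illustrations, not part of Flume):

```shell
# Replace placeholder tokens in flume.conf with values from environment variables
# (placeholder names are hypothetical; adapt them to your own file)
sed -i \
  -e "s|__CONSUMER_KEY__|$TWITTER_CONSUMER_KEY|" \
  -e "s|__CONSUMER_SECRET__|$TWITTER_CONSUMER_SECRET|" \
  flume.conf
```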
Modify the ‘flume.conf’ file
Open ‘flume.conf’ in write mode and set values for the parameters below:

sudo gedit flume.conf

Copy the contents below:
MyTwitAgent.sources = Twitter
MyTwitAgent.channels = MemChannel
MyTwitAgent.sinks = HDFS
MyTwitAgent.sources.Twitter.type = flume.mytwittersource.MyTwitterSourceForFlume
MyTwitAgent.sources.Twitter.channels = MemChannel
MyTwitAgent.sources.Twitter.consumerKey = <Copy consumer key>
MyTwitAgent.sources.Twitter.consumerSecret = <Copy consumer secret>
MyTwitAgent.sources.Twitter.accessToken = <Copy access token>
MyTwitAgent.sources.Twitter.accessTokenSecret = <Copy access token secret>
MyTwitAgent.sources.Twitter.keywords = <keywords to track>
MyTwitAgent.sinks.HDFS.channel = MemChannel
MyTwitAgent.sinks.HDFS.type = hdfs
MyTwitAgent.sinks.HDFS.hdfs.path = hdfs://localhost:54310/user/hduser/flume/tweets/
MyTwitAgent.sinks.HDFS.hdfs.fileType = DataStream
MyTwitAgent.sinks.HDFS.hdfs.writeFormat = Text
MyTwitAgent.sinks.HDFS.hdfs.batchSize = 1000
MyTwitAgent.sinks.HDFS.hdfs.rollSize = 0
MyTwitAgent.sinks.HDFS.hdfs.rollCount = 10000
MyTwitAgent.channels.MemChannel.type = memory
MyTwitAgent.channels.MemChannel.capacity = 1000000
MyTwitAgent.channels.MemChannel.transactionCapacity = 1000
Also, set MyTwitAgent.sinks.HDFS.hdfs.path as below:

MyTwitAgent.sinks.HDFS.hdfs.path = hdfs://<Host Name>:<Port Number>/<HDFS Home Directory>/flume/tweets/
To find the host name, port number, and HDFS home directory, check the value of the fs.defaultFS parameter set in $HADOOP_HOME/etc/hadoop/core-site.xml.
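To read that value from the shell, you can grep around the property name. The sketch below runs against a sample core-site.xml written to /tmp so it is self-contained (the hdfs://localhost:54310 value mirrors the path used earlier in this tutorial); on a real cluster, point grep at $HADOOP_HOME/etc/hadoop/core-site.xml instead:

```shell
# Sample core-site.xml for illustration only
cat > /tmp/core-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
EOF

# Print the property name and the <value> line that follows it
grep -A 1 'fs.defaultFS' /tmp/core-site-sample.xml
```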
To flush the data to HDFS as and when it arrives, delete the entry below if it exists.
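Once ‘flume.conf’ is filled in, the agent is started with Flume’s flume-ng launcher, where -n names the agent, -c points at the configuration directory, and -f points at the agent’s property file. A sketch that assembles and prints the command, assuming the installation path used earlier in this tutorial:

```shell
# Path used earlier in this tutorial; adjust if Flume lives elsewhere
FLUME_HOME=/usr/local/apache-flume-1.4.0-bin

# -n: agent name (must match the property prefix in flume.conf)
# -c: configuration directory   -f: the agent's property file
CMD="$FLUME_HOME/bin/flume-ng agent -n MyTwitAgent -c $FLUME_HOME/conf -f $FLUME_HOME/conf/flume.conf"
echo "$CMD"
```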
I hope this helps you form a clear picture of Apache Flume in big data. You can learn more big data concepts through Big Data online training.