What are the steps for MapReduce in big data?

What is MapReduce?

MapReduce is a data processing tool used to process data in parallel across a distributed cluster. The model was described in 2004 in the paper titled “MapReduce: Simplified Data Processing on Large Clusters,” published by Google.

MapReduce is a paradigm with two phases: the mapper phase and the reducer phase. The Mapper receives its input as key-value pairs. The output of the Mapper is fed to the reducer as input. The reducer runs only after the Mapper has finished. The reducer also takes its input in key-value format, and the output of the reducer is the final output.

Steps in MapReduce

  • The map function takes data in the form of <key, value> pairs and returns a list of <key, value> pairs. The output keys need not be unique.

Sort and Shuffle

The sort and shuffle step runs on the output of the Mapper, before the reducer. When a Mapper task completes, its results are sorted by key, partitioned if there are multiple reducers, and then written to disk. From the <k2, v2> pairs emitted by each Mapper, all the values for each unique key k2 are collected. The resulting output of the shuffle phase, in the form <k2, list(v2)>, is sent as input to the reducer phase.
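
The grouping from <k2, v2> to <k2, list(v2)> can be sketched in plain Java. This is a minimal in-memory illustration, not Hadoop code: the class name ShuffleSketch and the method groupByKey are hypothetical, and a TreeMap stands in for the sort step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: ShuffleSketch is a hypothetical name, not a Hadoop class.
final class ShuffleSketch {

    // Collects every value emitted for a key into one list: <k2, v2> -> <k2, list(v2)>.
    // The TreeMap keeps keys in sorted order, standing in for the sort step.
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> mapperOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapperOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapperOutput = List.of(
                Map.entry("hadoop", 1), Map.entry("big", 1), Map.entry("hadoop", 1));
        System.out.println(groupByKey(mapperOutput)); // {big=[1], hadoop=[1, 1]}
    }
}
```

In a real cluster this grouping happens across machines and spills to disk. The sketch only shows the shape of the data before and after.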

Usage of MapReduce

  • It can be used in various applications such as document clustering, distributed sorting, and web link-graph reversal.

Prerequisite

Before learning MapReduce, you should have a basic knowledge of Big Data.

Audience

Our MapReduce tutorial is designed to help beginners and professionals.

Problem

We assure you that you will not find any problem in this MapReduce tutorial. If you do find a mistake, please report it through the contact form.

Data Flow In MapReduce

MapReduce is used to process huge amounts of data. To handle the incoming data in a parallel and distributed form, the data has to flow through various phases.

Phases of MapReduce data flow

Input reader

The input reader reads the incoming data and splits it into data blocks of an appropriate size (64 MB to 128 MB). Each data block is associated with a Map function.

Once the input reader has read the data, it generates the corresponding key-value pairs. The input files reside in HDFS.
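
As a rough worked example of the splitting arithmetic, the sketch below counts how many fixed-size blocks a file occupies. The class InputSplitSketch and the method countBlocks are hypothetical names, and the sketch assumes every block except possibly the last is full. The split logic in a real Hadoop job takes more settings into account.

```java
// Illustrative only: InputSplitSketch is a hypothetical name, not a Hadoop class.
final class InputSplitSketch {

    // Number of blocks needed for a file, rounding up so the last
    // partially filled block is counted too (ceiling division).
    static long countBlocks(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes < 0 || blockSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(countBlocks(300 * mb, 128 * mb)); // 3
        System.out.println(countBlocks(300 * mb, 64 * mb));  // 5
    }
}
```

Under these assumptions, a 300 MB file yields three blocks of up to 128 MB, so three Map functions would be associated with it.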

Map function

The map function processes the incoming key-value pairs and generates the corresponding output key-value pairs. The map input and output types may differ from each other.
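
A map function for word counting might look like the sketch below. It is plain Java rather than the Hadoop Mapper API. The name MapFunctionSketch is hypothetical, and the input pair is assumed to be <line offset, line text>, a common choice for text input. Note that the input types (long, String) differ from the output types (String, Integer).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: MapFunctionSketch is a hypothetical name, not a Hadoop class.
final class MapFunctionSketch {

    // Input pair: <offset of the line, line text>. Output pairs: <word, 1>.
    // The offset is unused here; it is kept to show the key-value input shape.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map(0L, "big data big")); // [big=1, data=1, big=1]
    }
}
```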

Partition function

The partition function assigns the output of each Map function to the appropriate reducer. It receives the key and value and returns the index of the target reducer.
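
A hash-based partition function can be sketched as follows. The class PartitionSketch and its method signature are hypothetical. The formula follows the same idea as Hadoop's default hash partitioner: clear the sign bit of the key's hash code, then take the remainder by the number of reducers.

```java
// Illustrative only: PartitionSketch is a hypothetical name, not a Hadoop class.
final class PartitionSketch {

    // Returns a reducer index in the range [0, numReducers).
    // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
    // The value is accepted but unused, because this scheme routes by key alone.
    static int partition(String key, int value, int numReducers) {
        if (numReducers <= 0) {
            throw new IllegalArgumentException("numReducers must be positive");
        }
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // "a".hashCode() is 97, and 97 % 3 is 1.
        System.out.println(partition("a", 1, 3)); // 1
    }
}
```

Because the index depends only on the key, every pair with the same key reaches the same reducer, which is what the reduce step relies on.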

Shuffling and Sorting

The data is shuffled between and within nodes so that it moves out of the map phase and is ready to be processed by the reduce function. Shuffling can sometimes take considerable computation time.

The sorting operation is performed on the input data for the Reduce function. Here, the data is compared using a comparison function and arranged in sorted order.
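
The role of the comparison function can be illustrated with a small in-memory sort. SortSketch and sortByKey are hypothetical names. The sketch sorts intermediate pairs by key using whatever comparator is supplied, and makes no claim about how a cluster performs the sort internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative only: SortSketch is a hypothetical name, not a Hadoop class.
final class SortSketch {

    // Returns a new list with the pairs ordered by key according to keyComparator.
    // List.sort is stable, so pairs with equal keys keep their relative order.
    static List<Map.Entry<String, Integer>> sortByKey(
            List<Map.Entry<String, Integer>> pairs, Comparator<String> keyComparator) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(pairs);
        sorted.sort(Map.Entry.comparingByKey(keyComparator));
        return sorted;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
                Map.entry("data", 1), Map.entry("big", 1), Map.entry("data", 1));
        System.out.println(sortByKey(pairs, Comparator.naturalOrder())); // [big=1, data=1, data=1]
    }
}
```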

Reduce function

The Reduce function is invoked once for each unique key. These keys are already arranged in sorted order. The Reduce function iterates over the values associated with each key and generates the corresponding output.
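
For word counting, a reduce function could be sketched like this. The name ReduceFunctionSketch is hypothetical, and the code is plain Java rather than the Hadoop Reducer API. It receives one key with all of its values and emits a single <key, sum> pair.

```java
import java.util.List;
import java.util.Map;

// Illustrative only: ReduceFunctionSketch is a hypothetical name, not a Hadoop class.
final class ReduceFunctionSketch {

    // Called once per unique key: iterates over the key's values and sums them.
    static Map.Entry<String, Integer> reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return Map.entry(key, sum);
    }

    public static void main(String[] args) {
        System.out.println(reduce("big", List.of(1, 1))); // big=2
    }
}
```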

Output writer

Once the data has flowed through all the above phases, the output writer executes. Its role is to write the Reduce output to stable storage.

MapReduce API

In this section, we focus on MapReduce APIs. Here, we learn about the classes and methods used in MapReduce programming.

MapReduce Mapper Class

In MapReduce, the role of the Mapper class is to map the input key-value pairs to a set of intermediate key-value pairs. It transforms the input records into intermediate records.

These intermediate records are associated with a given output key and passed to the Reducer for the final output.

MapReduce Word Count Example

In the MapReduce word count example, we find the frequency of each word. Here, the role of the Mapper is to emit each word as a key with a count as its value, and the role of the Reducer is to aggregate the values that share a common key. So, everything is represented in the form of key-value pairs.
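
The whole word count flow can be simulated in memory in a few lines of plain Java. This is only a sketch of the idea: WordCountSketch is a hypothetical name, it does not use the Hadoop API, and it runs on a single machine with the three phases marked in comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: WordCountSketch is a hypothetical name, not a Hadoop class.
final class WordCountSketch {

    static Map<String, Integer> wordCount(List<String> lines) {
        // Map phase: emit <word, 1> for every word in every line.
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    intermediate.add(Map.entry(word, 1));
                }
            }
        }
        // Shuffle and sort: group the values by key, with keys in sorted order.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : intermediate) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        // Reduce phase: sum the values of each key.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("big data", "big data hadoop"))); // {big=2, data=2, hadoop=1}
    }
}
```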

Prerequisite

  • Java installation: check whether Java is installed, for example by running java -version.

Steps to execute MapReduce word count example

  • Create a text file on your local machine and write some text into it.

  • Check the text written in the data.txt file.

In this example, we find the frequency of each word that appears in this text file.

  • Create a directory in HDFS in which to keep the text file.

  • Write the MapReduce program using Eclipse.

File: WC_Mapper.java

  package com.javatpoint;

File: WC_Reducer.java

  package com.javatpoint;

File: WC_Runner.java

  package com.javatpoint;

  • Create the jar file of this program and name it countworddemo.jar.

  • Now execute the command to see the output.