Aggregation Framework and Map-Reduce in MongoDB

Sandeshjain
Sep 7, 2022

In MongoDB, map-reduce is a data processing programming model that helps to perform operations on large data sets and produce aggregated results.

MongoDB provides the mapReduce() function to perform map-reduce operations. It takes two main functions: a map function and a reduce function. The map function groups the data by key, and the reduce function performs an operation on the values collected for each key. The data is mapped and reduced independently and then combined, and the result is saved to the specified new collection.

Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single purpose aggregation methods.

The mapReduce() function is generally used on large data sets. Using map-reduce you can perform aggregation operations such as max or avg on data grouped by some key, similar to GROUP BY in SQL. It processes the data independently and in parallel.

Let's try to understand mapReduce() using the following example:

In this example, we have eight records, each with the keys id, sec, marks and sub, and we need to find the maximum marks of each section.

Here we need to find the maximum physics marks in each section. So the key by which we group documents is sec, and the value is marks. Inside the map function, we call emit(this.sec, this.marks), which emits the sec and marks of each record (document). This is similar to GROUP BY in MySQL.

var map = function () { emit(this.sec, this.marks); };

After iterating over every document, the emitted values are grouped by key like this:

{"A":[90,95]},{"B":[80,87,95]},{"C":[80,89,97]}
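To make the map step concrete, here is a minimal sketch in plain JavaScript that runs outside MongoDB. The eight documents are hypothetical sample data chosen only to match the grouped values shown above (the id and sub values are assumptions), and the emit function here is a stand-in for the one MongoDB provides.

```javascript
// Hypothetical sample records; only sec and marks are taken from the example.
const students = [
  { id: 1, sec: "A", marks: 90, sub: "physics" },
  { id: 2, sec: "A", marks: 95, sub: "physics" },
  { id: 3, sec: "B", marks: 80, sub: "physics" },
  { id: 4, sec: "B", marks: 87, sub: "physics" },
  { id: 5, sec: "B", marks: 95, sub: "physics" },
  { id: 6, sec: "C", marks: 80, sub: "physics" },
  { id: 7, sec: "C", marks: 89, sub: "physics" },
  { id: 8, sec: "C", marks: 97, sub: "physics" },
];

// Stand-in for MongoDB's emit(): collects each value under its key.
const grouped = {};
function emit(key, value) {
  if (!(key in grouped)) grouped[key] = [];
  grouped[key].push(value);
}

// Same shape as a MongoDB map function: `this` is the current document.
const map = function () {
  emit(this.sec, this.marks);
};

// Run the map function once per document.
for (const doc of students) {
  map.call(doc);
}

console.log(JSON.stringify(grouped));
// {"A":[90,95],"B":[80,87,95],"C":[80,89,97]}
```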

That is all the map() function does. The emitted data is grouped by the sec key and becomes the input to our reduce function, which is where the actual aggregation takes place. In our example we pick the maximum of each section: for section A the max of [90,95] is 95, for B the max of [80,87,95] is 95, and for C the max of [80,89,97] is 97.

var reduce = function (section, marks) { return Math.max.apply(null, marks); };
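One detail worth checking: MongoDB can call the reduce function more than once for the same key and feed an earlier result back in as one of the values, so the return value should have the same type as the values emitted by map. The sketch below is plain JavaScript run outside MongoDB, and the names reduceMax and reduceWrapped are only illustrative. It compares a reduce that returns a bare number with one that wraps the result in an object.

```javascript
// Reduce that returns a plain number, the same type as the emitted marks.
const reduceMax = function (section, marks) {
  return Math.max.apply(null, marks);
};

// Reducing all values for section "B" in a single call.
const oneStep = reduceMax("B", [80, 87, 95]);

// Reducing a partial batch first, then feeding that result back in,
// which is what a repeated reduce call looks like.
const partial = reduceMax("B", [80, 87]);
const twoStep = reduceMax("B", [partial, 95]);

console.log(oneStep, twoStep); // 95 95

// A reduce that wraps its result in an object breaks when it is fed its own
// output, because Math.max then receives an object instead of a number.
const reduceWrapped = function (section, marks) {
  return { maximum_marks: Math.max.apply(null, marks) };
};
const wrappedPartial = reduceWrapped("B", [80, 87]);
const broken = reduceWrapped("B", [wrappedPartial, 95]);

console.log(broken.maximum_marks); // NaN
```

If a labelled field such as maximum_marks is wanted in the output, the map function would need to emit values of that same object shape so that reduce input and output stay compatible.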

Here in the reduce() function we have reduced the records. Now we output them into a new collection using {out: "collectionName"}.

Syntax:

db.collectionName.mapReduce(map, reduce, { query: <query>, out: <output> });

map() function: It calls the emit() function, which takes two parameters, a key and a value. The key is what we group on, like GROUP BY in MySQL, for example grouping by age or name. The second parameter is the value on which an aggregation such as avg() or sum() is calculated.

reduce() function: This is the step in which we perform our aggregate function, such as avg() or sum().

query: Here we pass the query that filters which documents are passed to the map function.

out: Here we specify the name of the collection where the result will be stored.

db.user_collection.mapReduce(map, reduce, { out: "output" });

In the above query we pass the map and reduce functions we have already defined. To check the result, we look into the newly created collection with db.output.find(), which returns one document per key, with the emitted key in _id and the reduced result in value.
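The whole flow can be sketched end to end in plain JavaScript, without a MongoDB server. This is only an in-memory approximation under stated assumptions: the documents are hypothetical sample data matching the example values, the query predicate stands in for a { sub: "physics" } filter, and the output array imitates the { _id, value } documents that mapReduce writes to the out collection.

```javascript
// Hypothetical sample records; all of them are physics marks here,
// so the query step keeps every record.
const userCollection = [
  { id: 1, sec: "A", marks: 90, sub: "physics" },
  { id: 2, sec: "A", marks: 95, sub: "physics" },
  { id: 3, sec: "B", marks: 80, sub: "physics" },
  { id: 4, sec: "B", marks: 87, sub: "physics" },
  { id: 5, sec: "B", marks: 95, sub: "physics" },
  { id: 6, sec: "C", marks: 80, sub: "physics" },
  { id: 7, sec: "C", marks: 89, sub: "physics" },
  { id: 8, sec: "C", marks: 97, sub: "physics" },
];

// Stand-in for the query option: selects the input documents.
const query = (doc) => doc.sub === "physics";

// Stand-in for MongoDB's emit(): groups values by key, keeping key order.
const grouped = new Map();
function emit(key, value) {
  if (!grouped.has(key)) grouped.set(key, []);
  grouped.get(key).push(value);
}

const map = function () {
  emit(this.sec, this.marks);
};

const reduce = function (section, marks) {
  return Math.max.apply(null, marks);
};

// 1. Filter, 2. map each remaining document.
userCollection.filter(query).forEach((doc) => map.call(doc));

// 3. Reduce each group and 4. build the output documents.
const output = [];
for (const [key, values] of grouped) {
  // A key with a single value is stored as-is without calling reduce.
  const value = values.length === 1 ? values[0] : reduce(key, values);
  output.push({ _id: key, value: value });
}

console.log(JSON.stringify(output));
// [{"_id":"A","value":95},{"_id":"B","value":95},{"_id":"C","value":97}]
```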
