Home >>MongoDB Tutorial >MongoDB Aggregation

MongoDB Aggregation

MongoDB Aggregation

Aggregation tasks process records of data and return estimated results. Aggregation operations group values together from multiple documents, and may perform a variety of operations to return a single result on the grouped data. An alternative to MongoDB aggregation is in SQL count (*) and group by.

The aggregate () Method

You can use the aggregate () method when aggregate in MongoDB.

Syntax

The basic aggregate () method syntax is as follows –

>db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)

Example

In the collection you have the following data −

{
   _id: ObjectId(7df78ad8902c)
   title: 'MongoDB Overview', 
   description: 'MongoDB is no sql database',
   by_user: 'phptpoint',
   url: 'http://www.phptpoint.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 100
},
{
   _id: ObjectId(7df78ad8902d)
   title: 'NoSQL Overview', 
   description: 'No sql database is very fast',
   by_user: 'phptpoint',
   url: 'http://www.phptpoint.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 10
},
{
   _id: ObjectId(7df78ad8902e)
   title: 'Neo4j Overview', 
   description: 'Neo4j is no sql database',
   by_user: 'Neo4j',
   url: 'http://www.neo4j.com',
   tags: ['neo4j', 'database', 'NoSQL'],
   likes: 750
},

If you want to show a list of how many tutorials are written by each user from the above set, then you can use the following aggregate() method –

> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{ "_id" : "phptpoint", "num_tutorial" : 2 }
{ "_id" : "Neo4j", "num_tutorial" : 1 }
>

For the above use case, the Sql equivalent query will be selected by user, count (*) by user from mycol group.

We have grouped documents by field by user in the above example and the previous total value is increased for each occurrence of the user value. The following is a list of available expressions for aggregation.

Expression Description Example
$sum From all documents in the set, sum up the given value. db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])
$avg The average of all the values given from all the documents in the set is calculated. db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])
$min Obtains a minimum of the corresponding values from all the documents in the collection. db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])
$max Gets the full value of all documents in the collection for the respective values. db.mycol.aggregate([{$group : {_id : "$by_user", db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])
$push Inserts the value in the resulting document in an array. db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])
$addToSet Inserts the value in the resulting document to an array, but does not produce duplicates. db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])
$first Gets the first document according to grouping from the source documents. Usually, with some previously implemented "$sort"-stage, this only makes sense. db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])
$last Get the last document, by sorting, from the source documents. Usually, with some previously implemented "$sort"-stage, this only makes sense. db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])

Pipeline Concept

The shell pipeline in the UNIX command implies the ability to perform an operation on some input and use the output as the input for the next command, etc. MongoDB also supports the same definition within the framework of aggregation. There is a collection of possible phases, each of which is taken as an input to a set of documents and produces the resulting set of documents (or the final resulting JSON document at the end of the pipeline). This can then be used for the next stage and so on, in turn.

The possible steps in the aggregation framework are below.

$project −Used to select a list of certain specific fields.

$match -This is a filtering operation and can therefore reduce the amount of documents that are given to the next stage as input.

$group − This is the actual aggregation, as discussed above. $group.

$sort − Sort documents.

$skip − With this, for a given number of documents it is possible to skip forward in the document list.

$limit −This limits the amount of documents to look at, starting from the current positions, by the specified number.

$unwind − This is used to unwind documents that use arrays. The data is kind of pre-joined by using an array, and this operation will be undone for this to provide individual documents again. With this phase, we will also increase the number of documents for the next step.


No Sidebar ads