MongoDB Aggregation Pipeline Tutorial with Real-World Examples

by Didin J. on Jun 15, 2025

Learn MongoDB Aggregation Pipeline with real-world examples. Master $match, $group, $lookup, $unwind, and more to build powerful data analytics reports.

MongoDB is one of the most popular NoSQL databases for modern applications due to its flexible schema and powerful querying capabilities. Among its most powerful features is the Aggregation Pipeline, which allows developers to process data records and return computed results, making it ideal for building reports, analytics dashboards, and real-time data transformations.

In this tutorial, we’ll walk you through the fundamentals of the MongoDB aggregation pipeline and demonstrate its application to real-world use cases using practical examples. Whether you're working with product sales, user data, or nested arrays, you’ll learn how to transform and analyze your data more effectively.


What is the MongoDB Aggregation Pipeline?

The aggregation pipeline is a framework for data aggregation modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline where each stage transforms the documents as they pass through.

Instead of writing complex queries or logic in your application code, you can perform operations like filtering, grouping, sorting, and joining directly in MongoDB using stages such as:

  • $match: Filter documents.

  • $group: Group by a field and perform accumulations (e.g., sum, count).

  • $project: Shape the output documents.

  • $sort: Order results.

  • $lookup: Perform joins between collections.

The aggregation pipeline is efficient, expressive, and perfect for reporting use cases.


Understanding Aggregation Stages

Here’s a quick overview of the commonly used stages in the aggregation pipeline:

  • $match: Filters documents based on a condition, similar to the find() query.

  • $group: Groups documents by a specified field and calculates aggregate values such as $sum, $avg, and $max.

  • $project: Includes, excludes, or reshapes document fields.

  • $sort: Orders the documents by the specified fields.

  • $limit: Restricts the number of documents passed to the next stage.

  • $skip: Skips the first N documents.

  • $unwind: Deconstructs an array field into multiple documents, one per array element.

  • $lookup: Performs a left outer join with another collection.

Example: Simple Pipeline

db.orders.aggregate([
  { $match: { status: "completed" } },
  { $group: { _id: "$customerId", totalSpent: { $sum: "$amount" } } },
  { $sort: { totalSpent: -1 } }
]);

This pipeline filters completed orders, groups them by customer, and calculates the total spend for each customer.


Setting Up the Project

To follow along with this tutorial, you’ll need a basic development environment with MongoDB installed and a sample dataset to experiment with.

Requirements

  • MongoDB installed locally or access to MongoDB Atlas (cloud version)

  • MongoDB Compass (optional GUI for browsing data)

  • Mongo Shell or a MongoDB driver (Node.js, Python, etc.)

  • A JSON or BSON dataset to import

Sample Dataset: E-Commerce Orders

We’ll use a sample dataset simulating an e-commerce platform. The orders collection might look like this:

{
  "_id": ObjectId("..."),
  "orderId": "ORD001",
  "customerId": "CUST123",
  "orderDate": ISODate("2024-12-10T10:15:00Z"),
  "status": "completed",
  "amount": 150.75,
  "items": [
    { "productId": "PROD001", "quantity": 2, "price": 30.00 },
    { "productId": "PROD005", "quantity": 1, "price": 90.75 }
  ]
}

You can also create a customers collection if you'd like to use $lookup later:

{
  "_id": ObjectId("..."),
  "customerId": "CUST123",
  "name": "Alice Johnson",
  "email": "[email protected]",
  "joinedDate": ISODate("2023-01-15T12:00:00Z")
}

Importing Data into MongoDB

If you're using MongoDB locally, you can import JSON data using the following command:

mongoimport --db ecommerce --collection orders --file orders.json --jsonArray
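
If you also created a customers.json file (the file name here is just an assumption about how you saved the sample customer data), you can import it the same way:

mongoimport --db ecommerce --collection customers --file customers.json --jsonArray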

If you're using MongoDB Atlas:

  1. Use MongoDB Compass to connect to your cluster.

  2. Navigate to the ecommerce database.

  3. Create the orders and customers collections.

  4. Use the "Add Data" → "Import File" option to upload your JSON files.


Real-World Example 1: Sales Report by Product

Once the data is in place, you're ready to start building powerful aggregations. Let's begin with a common reporting example using $group and $sort.

One of the most common use cases for aggregation is generating a sales report, such as calculating total revenue per product.

Let’s say the orders collection has an embedded items array, with each item representing a product purchased, its quantity, and its price. To get a summary of total sales per product, we can use the following stages:

Goal:

  • Extract each product from the items array

  • Multiply quantity × price to get the subtotal

  • Group by productId and sum up the totals

  • Sort the result by total sales in descending order

Aggregation Pipeline

db.orders.aggregate([
  { $unwind: "$items" },
  {
    $project: {
      productId: "$items.productId",
      subtotal: { $multiply: ["$items.quantity", "$items.price"] }
    }
  },
  {
    $group: {
      _id: "$productId",
      totalSales: { $sum: "$subtotal" }
    }
  },
  { $sort: { totalSales: -1 } }
]);

Explanation

  • $unwind: Flattens the items array so each item becomes its own document.

  • $project: Calculates the subtotal (price × quantity) for each item.

  • $group: Groups the data by productId and sums the subtotal values.

  • $sort: Sorts the result by totalSales in descending order.

Sample Output

[
  { "_id": "PROD005", "totalSales": 4520.75 },
  { "_id": "PROD001", "totalSales": 3760.00 },
  { "_id": "PROD003", "totalSales": 2150.50 }
]

This aggregation gives you a ranked list of top-selling products by revenue, perfect for dashboards, analytics reports, or business decision-making.
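
If you only need the top few products, for example for a dashboard widget, you can append a $limit stage after the $sort. Here is a minimal variant of the pipeline above that keeps only the five best sellers:

db.orders.aggregate([
  { $unwind: "$items" },
  {
    $project: {
      productId: "$items.productId",
      subtotal: { $multiply: ["$items.quantity", "$items.price"] }
    }
  },
  { $group: { _id: "$productId", totalSales: { $sum: "$subtotal" } } },
  { $sort: { totalSales: -1 } },
  { $limit: 5 }
]);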


Real-World Example 2: Monthly User Registrations

Tracking user growth over time is a common requirement in analytics dashboards. Using the aggregation pipeline, we can group users by registration month and count how many joined in each period.

Let’s assume we have a customers collection where each document includes a joinedDate field.

Goal:

  • Filter users based on a timeframe (e.g., past 12 months)

  • Format the date to "YYYY-MM" using $dateToString

  • Group by month and count the users

  • Sort by date in ascending order

Aggregation Pipeline

db.customers.aggregate([
  {
    $match: {
      joinedDate: {
        $gte: ISODate("2024-06-01T00:00:00Z")
      }
    }
  },
  {
    $project: {
      month: { $dateToString: { format: "%Y-%m", date: "$joinedDate" } }
    }
  },
  {
    $group: {
      _id: "$month",
      registrations: { $sum: 1 }
    }
  },
  { $sort: { _id: 1 } }
]);

Explanation

  • $match: Filters users who registered after a given date.

  • $project: Converts joinedDate to a "YYYY-MM" formatted string.

  • $group: Groups users by month and counts them.

  • $sort: Orders the result chronologically by month.

Sample Output

[
  { "_id": "2024-06", "registrations": 15 },
  { "_id": "2024-07", "registrations": 23 },
  { "_id": "2024-08", "registrations": 31 },
  { "_id": "2024-09", "registrations": 18 }
]

This output can easily be visualized in charts to understand user trends, spikes, or drops in activity.

✅ Tip: You can customize the date range in $match, switch the format string to %Y-%m-%d for daily stats, or group by the $week, $year, and $month operators if needed.
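
For example, here is a minimal sketch of a weekly variant that groups by $year and $week instead of a formatted string (the cutoff date is the same one used above):

db.customers.aggregate([
  { $match: { joinedDate: { $gte: ISODate("2024-06-01T00:00:00Z") } } },
  {
    $group: {
      _id: { year: { $year: "$joinedDate" }, week: { $week: "$joinedDate" } },
      registrations: { $sum: 1 }
    }
  },
  { $sort: { "_id.year": 1, "_id.week": 1 } }
]);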


Real-World Example 3: Joining Collections with $lookup

In relational databases, joining tables is a standard practice. MongoDB supports a similar concept using the $lookup stage, which lets you join documents from different collections, such as linking orders with customers.

Let’s say you have:

  • An orders collection containing customerId fields

  • A customers collection containing detailed customer info

You want to enrich each order with the customer’s name and email.

Goal:

  • Join orders.customerId with customers.customerId

  • Merge matching customer data into each order

  • Optionally reshape the output

Aggregation Pipeline

db.orders.aggregate([
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "customerId",
      as: "customer"
    }
  },
  {
    $unwind: "$customer"
  },
  {
    $project: {
      orderId: 1,
      orderDate: 1,
      amount: 1,
      "customer.name": 1,
      "customer.email": 1
    }
  }
]);

Explanation

  • $lookup: Joins each order with its matching customer using customerId.

  • $unwind: Converts the joined customer array into a flat object.

  • $project: Selects the fields to include in the final result.

Sample Output

{
  "orderId": "ORD001",
  "orderDate": "2024-12-10T10:15:00Z",
  "amount": 150.75,
  "customer": {
    "name": "Alice Johnson",
    "email": "[email protected]"
  }
}

This enriched result is useful for:

  • Generating full invoices

  • Displaying customer info in admin dashboards

  • Performing deeper analytics (e.g., customer lifetime value)

✅ Note: If multiple customers could share a customerId, you'd need to handle multiple results, but in most cases customerId is unique.
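
Also note that the plain $unwind stage above drops any order that has no matching customer. If you want to keep such orders in the result, $unwind accepts a preserveNullAndEmptyArrays option; a minimal sketch of the adjusted pipeline:

db.orders.aggregate([
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "customerId",
      as: "customer"
    }
  },
  { $unwind: { path: "$customer", preserveNullAndEmptyArrays: true } },
  { $project: { orderId: 1, orderDate: 1, amount: 1, "customer.name": 1, "customer.email": 1 } }
]);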


Real-World Example 4: Nested Data and $unwind

MongoDB allows arrays inside documents, which is great for flexibility, but analyzing or filtering items inside those arrays often requires flattening them. That’s where the $unwind stage becomes powerful.

Let’s say your orders collection includes an items array, where each item represents a product with a productId, quantity, and price.

Goal:

  • Break each item in the items array into its own document

  • Analyze product performance (e.g., total quantity sold per product)

Aggregation Pipeline

db.orders.aggregate([
  { $unwind: "$items" },
  {
    $group: {
      _id: "$items.productId",
      totalQuantity: { $sum: "$items.quantity" },
      totalRevenue: { $sum: { $multiply: ["$items.quantity", "$items.price"] } }
    }
  },
  { $sort: { totalRevenue: -1 } }
]);

Explanation

  • $unwind: Deconstructs the items array so each item becomes a separate document.

  • $group: Aggregates quantity and revenue by productId.

  • $sort: Orders the result by total revenue (highest to lowest).

Sample Output

[
  {
    "_id": "PROD005",
    "totalQuantity": 120,
    "totalRevenue": 10920.00
  },
  {
    "_id": "PROD003",
    "totalQuantity": 88,
    "totalRevenue": 7920.00
  }
]

This kind of analysis is essential for:

  • Inventory planning

  • Identifying best-selling products

  • Revenue breakdowns by product

🧠 Tip: You can also apply $match after $unwind to analyze specific product categories or apply date filters.
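
For example, here is a minimal sketch that restricts the analysis to a date range and to specific products (the date and product IDs are just illustrative values from the sample dataset):

db.orders.aggregate([
  { $match: { orderDate: { $gte: ISODate("2024-01-01T00:00:00Z") } } },
  { $unwind: "$items" },
  { $match: { "items.productId": { $in: ["PROD001", "PROD005"] } } },
  {
    $group: {
      _id: "$items.productId",
      totalQuantity: { $sum: "$items.quantity" }
    }
  }
]);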


Advanced Aggregation Tips

As you build more complex pipelines, MongoDB provides advanced operators and stages to handle multi-dimensional queries, conditional logic, and performance optimizations.

Let’s look at some powerful techniques to level up your aggregations.

Use $facet for Multi-Query Pipelines

$facet allows you to run multiple sub-pipelines in parallel on the same input set—ideal for dashboards that need multiple summaries at once.

Example: Total sales and top 5 products in one query
db.orders.aggregate([
  { $unwind: "$items" },
  {
    $facet: {
      totalRevenue: [
        {
          $group: {
            _id: null,
            revenue: {
              $sum: { $multiply: ["$items.quantity", "$items.price"] }
            }
          }
        }
      ],
      topProducts: [
        {
          $group: {
            _id: "$items.productId",
            totalSold: { $sum: "$items.quantity" }
          }
        },
        { $sort: { totalSold: -1 } },
        { $limit: 5 }
      ]
    }
  }
]);

Use $cond for Conditional Aggregation

You can use $cond to apply conditional logic inside stages, like $project or $group.

Example: Tag high-value orders
db.orders.aggregate([
  {
    $project: {
      orderId: 1,
      amount: 1,
      highValue: {
        $cond: { if: { $gt: ["$amount", 500] }, then: true, else: false }
      }
    }
  }
]);
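
$cond also works inside $group accumulators, for example to count high-value orders in a single pass. A minimal sketch (the 500 threshold is just illustrative):

db.orders.aggregate([
  {
    $group: {
      _id: null,
      totalOrders: { $sum: 1 },
      highValueOrders: {
        $sum: { $cond: [{ $gt: ["$amount", 500] }, 1, 0] }
      }
    }
  }
]);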

Performance Tips for Aggregation

  • Use Indexes Early: Try to place $match as early as possible to leverage indexes.

  • Avoid $project with all fields unless needed: It can increase processing overhead.

  • Limit Output: Use $limit after $sort to reduce memory usage.

  • Use $merge or $out to write results to a new collection for caching or batch processing.

Example: Cache results into a collection
db.orders.aggregate([
  /* your pipeline here */,
  { $merge: { into: "cachedReports", whenMatched: "merge", whenNotMatched: "insert" } }
]);

📌 Note: $merge is available in MongoDB 4.2 and later.
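
To check whether an early $match can actually use an index, you can ask the server to explain the pipeline. A minimal sketch in the shell (the exact output shape depends on your server version):

db.orders.explain("executionStats").aggregate([
  { $match: { status: "completed" } },
  { $group: { _id: "$customerId", totalSpent: { $sum: "$amount" } } }
]);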

With these tools, you can build high-performance, scalable data aggregation workflows suitable for complex analytics, dashboards, or scheduled reports.


Tools for Working with Aggregations

While the MongoDB aggregation framework is powerful, writing and testing pipelines manually can be complex, especially as they grow. Fortunately, several tools can simplify development, debugging, and visualization.

1. MongoDB Compass

MongoDB Compass is MongoDB’s official GUI. It includes a built-in aggregation pipeline builder with a visual interface.

Features:

  • Step-by-step stage previews

  • Auto-complete and syntax suggestions

  • Ability to save and share pipelines

  • Export to JSON or shell syntax

✅ Ideal for testing pipelines before using them in production code.

2. MongoDB Atlas Aggregation Builder

If you're using MongoDB Atlas, the cloud interface includes a visual Aggregation Pipeline Builder similar to Compass but within the browser.

Bonus: You can test aggregations directly on live data in your cluster.

3. Playground in MongoDB for VS Code

The MongoDB for VS Code extension allows you to:

  • Connect to your database

  • Run queries and aggregations inside .mongodb playground files

  • View real-time results inside your editor

Great for developers who prefer staying inside their IDE.
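
As a minimal sketch, a playground file (saved with a .mongodb extension) that runs one of this tutorial's pipelines might look like this, assuming the ecommerce database from earlier:

// Minimal MongoDB Playground sketch for the VS Code extension.
use("ecommerce");

db.getCollection("orders").aggregate([
  { $match: { status: "completed" } },
  { $group: { _id: "$customerId", totalSpent: { $sum: "$amount" } } },
  { $sort: { totalSpent: -1 } }
]);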

4. Online Aggregation Builders and Translators

Several third-party web tools also offer visual pipeline builders and query translators that can help you prototype aggregations quickly.

5. Using Drivers and ORMs

When integrating aggregation into code, use official drivers for:

  • Node.js (official driver or Mongoose) – Supports aggregation pipelines via the .aggregate() method (see the sketch after this list).

  • Python (PyMongo) – Use collection.aggregate() with Python syntax.

  • Java, Go, C#, Rust – All support aggregation pipelines via native syntax.
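
As a minimal sketch, here is how the customer-spend pipeline from earlier could be run with the official Node.js driver (the connection string is just a local default; adjust it for your environment):

// Run an aggregation pipeline from Node.js using the official "mongodb" driver.
const { MongoClient } = require("mongodb");

async function totalSpentPerCustomer() {
  const client = new MongoClient("mongodb://localhost:27017");
  try {
    await client.connect();
    const orders = client.db("ecommerce").collection("orders");
    const results = await orders.aggregate([
      { $match: { status: "completed" } },
      { $group: { _id: "$customerId", totalSpent: { $sum: "$amount" } } },
      { $sort: { totalSpent: -1 } }
    ]).toArray();
    console.log(results);
  } finally {
    await client.close();
  }
}

totalSpentPerCustomer();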

💡 Pro Tip: For frequently-used reports, cache aggregation results in a new collection using $merge to improve performance.


Conclusion and Best Practices

MongoDB’s aggregation pipeline is a powerful and flexible tool for data transformation, analysis, and reporting. Whether you’re building dashboards, generating analytics, or preparing data for machine learning, mastering aggregation opens the door to building fast, real-time insights directly from your database.

Key Takeaways

  • Aggregation Stages like $match, $group, $project, and $sort form the foundation of most pipelines.

  • Nested Data can be flattened using $unwind to make analysis easier.

  • Joins between collections are possible using $lookup, bringing relational-style power to document databases.

  • Date Handling with $dateToString or the $year and $month operators makes time-based reports a breeze.

  • Advanced Features like $facet, $cond, and $merge help optimize complex analytics and support real-world use cases.

Best Practices

  1. Filter Early: Use $match as soon as possible in the pipeline to reduce the number of documents processed in later stages.

  2. Limit Fields: Use $project to include only necessary fields, especially when working with large documents.

  3. Avoid Unbounded $group: Grouping on unindexed or high-cardinality fields can affect performance.

  4. Use $merge or $out: For complex pipelines that run frequently, write results to a cache collection to improve performance.

  5. Test with Real Data: Use MongoDB Compass or the Atlas Aggregation Builder to validate pipelines visually before deploying them.

  6. Index Strategically: Ensure fields used in $match or $lookup are indexed to speed up pipeline execution (see the index sketch below).
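
As a minimal sketch, indexes covering the fields this tutorial filters and joins on might look like this (adjust them to your own query patterns):

db.orders.createIndex({ status: 1 });        // supports the $match on order status
db.orders.createIndex({ orderDate: 1 });     // supports date-range filters
db.customers.createIndex({ customerId: 1 }); // supports the $lookup foreignField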

MongoDB’s aggregation framework may seem intimidating at first, but with the right tools and patterns, it becomes a powerful asset in your backend or analytics stack.

Keep exploring more operators and pipeline stages through MongoDB’s official documentation to push the boundaries even further.

You can find those examples on our GitHub.

Thanks!