MongoDB Performance Tuning: Indexing and Optimization Best Practices

by Didin J. on Nov 08, 2025

A practical MongoDB performance tuning guide covering indexing, query optimization, schema design, and monitoring techniques to improve speed, efficiency, and scalability.

MongoDB can deliver high performance at scale, but only when data models, indexes, and queries are designed carefully. As applications grow, inefficient queries or poorly designed indexes can create slow responses, increased CPU usage, and unnecessary disk I/O. Performance tuning ensures that your MongoDB cluster continues to serve fast, predictable results even under heavy workloads.

This tutorial provides a practical guide to MongoDB performance tuning with a strong focus on indexing and query optimization. You will learn how MongoDB’s query planner works, how indexes affect query execution, and how to design collections and queries that scale efficiently. Each section includes clear explanations, code examples, and real-world best practices that you can apply immediately to any MongoDB project.


By the end of this tutorial, you will be able to:
• Identify slow queries and understand why they occur
• Build the right indexes for read-heavy and write-heavy workloads
• Optimize schemas to support fast indexing and querying
• Use MongoDB tools such as the profiler and explain plans to diagnose issues
• Apply proven techniques to reduce latency and improve throughput


Understanding How MongoDB Executes Queries

Efficient performance tuning begins with understanding how MongoDB selects and executes a query plan. MongoDB uses a cost-based query planner that evaluates available indexes, estimates their efficiency, and chooses the best plan to answer the query. Knowing how this process works helps you design indexes that the planner can use effectively.

1. Query Planner Basics

When a query is issued, MongoDB:

  1. Identifies all indexes that could satisfy the query.

  2. Generates candidate plans for each possible index.

  3. Runs a short trial phase (“plan ranking”) to evaluate performance.

  4. Selects a winning plan and caches it for subsequent queries.

MongoDB may re-plan automatically if data distributions change or if cached plans become inefficient.

2. Winning Plan vs. Rejected Plans

The explain() output contains:
• winningPlan: the plan MongoDB chooses to execute.
• rejectedPlans: alternative plans evaluated but not selected.

Common plan stages:
• IXSCAN: the query uses an index scan
• FETCH: an index scan followed by a document fetch
• COLLSCAN: a full collection scan (usually bad for performance)
• SORT: an in-memory sort when no suitable index exists

A COLLSCAN or in-memory SORT indicates missing or suboptimal indexes.
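You can inspect the chosen plan directly with explain() (the collection and field names here are illustrative):

db.orders.find({ userId: 42 }).explain("queryPlanner")

In the output, queryPlanner.winningPlan shows the selected stage tree (for example, FETCH on top of IXSCAN), and queryPlanner.rejectedPlans lists the alternatives that lost plan ranking.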

3. Covered Queries

A query is covered when every field it filters on and returns exists in a single index, and the projection excludes _id (unless _id is part of the index).
Benefits:
• No document fetch from disk
• Faster queries
• Lower I/O and memory usage

Example of a potentially covered query:

db.users.find(
  { email: "[email protected]" },
  { email: 1, _id: 0 }
)

If an index { email: 1 } exists, this query becomes covered.

4. Key explain() Output Fields

Key fields to focus on:
• stage: indicates COLLSCAN, IXSCAN, SORT, FETCH, etc.
• nReturned: number of documents returned.
• executionTimeMillis: total execution time.
• totalKeysExamined: number of index entries scanned.
• totalDocsExamined: number of documents scanned.

Goal: for efficient queries, totalDocsExamined should be as close to nReturned as possible (and 0 for covered queries).

5. Practical Indicators of Problems

• High totalDocsExamined → missing or inefficient index.
• Winning plan = COLLSCAN → index not used.
• SORT stage → index order not optimized.
• Large gap between totalKeysExamined and nReturned → index not selective.


Indexing Fundamentals

Indexes are the core of MongoDB performance tuning. A well-designed index can reduce query time from seconds to milliseconds, while a poorly chosen or missing index can force MongoDB to scan entire collections. This section covers the essential index types and the rules that guide their effective use.

1. Why Indexes Matter

Indexes allow MongoDB to locate data without scanning every document. They:
• Reduce query latency
• Lower CPU and disk I/O
• Support efficient sorting
• Enable covered queries
• Improve scalability under high load

However, indexes also consume memory and slow down writes, so they must be designed deliberately.

2. Single-Field Indexes

A single-field index is the simplest form:

db.users.createIndex({ email: 1 })

Use cases:
• Equality lookups (email = …)
• High-cardinality fields (many unique values)
• Supporting covered queries

Avoid indexing fields with very few unique values (e.g., active: true) unless combined with another field.

3. Compound Indexes

Compound indexes include multiple fields and are critical for real-world query optimization.

Example:

db.orders.createIndex({ userId: 1, createdAt: -1 })

Key rules:
• Prefix rule: MongoDB can use any leftmost prefix of the index (e.g., a filter on { userId } can use it; a filter on { createdAt } alone cannot).
• Order matters for sorting; the sort direction must match the index direction (or its exact reverse).
• Place equality fields first, sort fields next, and range fields last.

4. Equality → Sort → Range Rule

When designing compound indexes:

  1. Equality fields

  2. Sort fields

  3. Range fields ($gt, $lt, $in, etc.)

Example query:

db.orders.find(
  { userId: 42, status: "PAID", createdAt: { $gte: ISODate("2025-01-01") } }
).sort({ createdAt: -1 })

Best index:

{ userId: 1, status: 1, createdAt: -1 }

Following this structure ensures MongoDB avoids in-memory sorts and unnecessary scans.
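A quick way to confirm that the compound index removes the in-memory sort (using the same collection and fields as above):

db.orders.createIndex({ userId: 1, status: 1, createdAt: -1 })
db.orders.find(
  { userId: 42, status: "PAID", createdAt: { $gte: ISODate("2025-01-01") } }
).sort({ createdAt: -1 }).explain("executionStats")

If the winning plan shows an IXSCAN with no SORT stage, the index is handling both the filtering and the ordering.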

5. Sparse vs Partial Indexes

Sparse Indexes
Include documents only where the indexed field exists.
Useful when many documents lack the field.

db.users.createIndex({ phone: 1 }, { sparse: true })

Partial Indexes
Include documents that meet a filter expression.
More efficient and more flexible than sparse indexes.

db.users.createIndex(
  { status: 1 },
  { partialFilterExpression: { status: "ACTIVE" } }
)

Use partial indexes to reduce index size and improve performance when queries always filter by a known condition.

6. TTL (Time-to-Live) Indexes

TTL indexes automatically remove documents after a specified time.

db.logs.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 })

Common for logs, sessions, or temporary data.

7. When to Use Each Index Type

Scenario → Recommended Index
• Lookup by unique field → Single-field index
• Combined filters and sort → Compound index
• Large collection with many missing fields → Sparse or partial index
• Time-based auto-delete → TTL index
• Range queries with sort → Compound index with the range field last


Advanced Indexing Strategies

Once the fundamentals are in place, advanced indexing strategies help optimize complex queries, support specialized workloads, and ensure performance at scale. This section covers multikey, text, geospatial, hashed, and wildcard indexes—when to use them and when to avoid them.

1. Multikey Indexes

Multikey indexes support fields that contain arrays. MongoDB creates index entries for each element of the array.

Example:

db.products.createIndex({ tags: 1 })

Query example:

db.products.find({ tags: "electronics" })

Considerations:
• Only one multikey field is allowed per compound index.
• Multikey indexes can cause increased index size.
• Index order rules still apply for compound multikey indexes.

Avoid unbounded or growing arrays—they lead to index bloat and slower writes.

2. Text Indexes

Text indexes support full-text search for fields containing string content.

db.articles.createIndex({ title: "text", body: "text" })

Query:

db.articles.find({ $text: { $search: "mongodb indexing" } })

Features:
• Stemming and tokenization
• Text score sorting
• Case-insensitive search

Limitations:
• Only one text index per collection
• Compound indexes mixing text and non-text fields are restricted (non-text prefix fields require equality matches)
• Not suitable for large-scale search; consider Atlas Search for production-level needs
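To rank results by relevance, project and sort on the textScore metadata (using the articles index above):

db.articles.find(
  { $text: { $search: "mongodb indexing" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })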

3. Geospatial Indexes

MongoDB supports geospatial data with 2dsphere and 2d indexes.

Example:

db.locations.createIndex({ coordinates: "2dsphere" })

Query:

db.locations.find({
  coordinates: {
    $near: {
      $geometry: { type: "Point", coordinates: [106.8456, -6.2088] },
      $maxDistance: 1000
    }
  }
})

Use cases:
• Maps and location-based features
• Radius queries
• GeoJSON-based applications

4. Hashed Indexes

Hashed indexes hash field values and distribute them evenly. Commonly used as a shard key.

db.users.createIndex({ userId: "hashed" })

Advantages:
• Good for horizontal scaling
• Avoids hotspotting on monotonically increasing values

Limitations:
• Not usable for range queries
• Not suitable for sorting

Use hashed indexes when shard key distribution is more important than query flexibility.

5. Wildcard Indexes

Wildcard indexes support indexing arbitrary nested fields, useful for semi-structured or user-generated data.

db.events.createIndex({ "payload.$**": 1 })


Use cases:
• Dynamic schemas
• Analytics events
• Logs with variable fields

Considerations:
• Large index size
• Must be filtered or limited where possible
• Slower writes due to many index entries

Example of a filtered wildcard index (note that wildcardProjection is only allowed with the top-level "$**" key pattern):

db.events.createIndex(
  { "$**": 1 },
  { wildcardProjection: { "payload.meta": 1 } }
)

6. Choosing the Right Advanced Index

Scenario → Best Index Type
• Array fields → Multikey index
• Full-text search → Text index
• Location queries → 2dsphere index
• Sharded cluster with high write volume → Hashed index
• Flexible schema → Wildcard index


Common Indexing Mistakes

Misconfigured indexes are one of the most common causes of slow MongoDB performance. Even with strong hardware, poorly designed indexes can force MongoDB to scan millions of documents, perform unnecessary in-memory sorts, or waste RAM with oversized index structures. This section highlights the most frequent mistakes and how to avoid them.

1. Over-Indexing

Every index speeds up reads but slows down writes. Inserts, updates, and deletes must update all relevant indexes.

Symptoms:
• High write latency
• High disk I/O during updates
• Large memory footprint for the index cache

Best practice:
Keep only the indexes your queries use. Remove unused indexes:

db.collection.getIndexes()
db.collection.dropIndex("indexName")

Use the profiler or explain() to verify real usage before dropping.
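$indexStats reports per-index access counters, which is a safer basis for dropping an index than guesswork (the collection name here is illustrative):

db.orders.aggregate([
  { $indexStats: {} },
  { $project: { name: 1, "accesses.ops": 1 } }
])

Indexes whose accesses.ops stays near zero over a representative period are candidates for removal.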

2. Indexes on Low-Cardinality Fields

Low-cardinality fields have few distinct values, such as:

status: "ACTIVE"
gender: "M"
isDeleted: false

Such fields are rarely selective and lead to poor performance.

Bad index example:

db.users.createIndex({ active: 1 })


If 95% of users are active, the index does not meaningfully reduce scanned keys.

Fix:
Combine low-cardinality fields with high-cardinality fields in a compound index:

db.users.createIndex({ active: 1, createdAt: -1 })

3. Misordered Compound Indexes

Order matters. Wrong ordering forces MongoDB into in-memory sorts or collection scans.

Example query:

db.orders.find({ userId: 42 }).sort({ createdAt: -1 })

Bad index:

{ createdAt: -1, userId: 1 }

Good index:

{ userId: 1, createdAt: -1 }

Follow the rule: Equality → Sort → Range.

4. Indexes on Frequently Updated Fields

When a field changes frequently, updating its index repeatedly increases write cost.

Problematic field types:
• lastLogin
• updatedAt
• Frequently changing status flags
• Rolling counters

If the field is not used for filtering or sorting, avoid indexing it.

5. Missing Indexes on $lookup (Join-Like Operations)

For aggregation pipelines with $lookup, the foreignField on the joined collection must be indexed.

Example:

{
  $lookup: {
    from: "orders",
    localField: "userId",
    foreignField: "userId",
    as: "orders"
  }
}

Index required:

db.orders.createIndex({ userId: 1 })

Without this, MongoDB performs a full scan on the foreign collection for every document.

6. Large, Unbounded Arrays Causing Multikey Bloat

Large arrays generate multiple index entries per document.
Arrays that grow indefinitely lead to:

• Huge multikey indexes
• Slow updates due to index rebuild
• Poor write throughput

If the array grows continuously, consider:
• Capping array size
• Moving array items into a separate collection
• Using referencing instead of embedding

7. Forgetting to Analyze Query Patterns Before Indexing

A common mistake is creating indexes before understanding how the application queries the database.

Correct sequence:

  1. Observe real queries using profiler or logs

  2. Group by frequency and performance impact

  3. Create or tune indexes based on actual workload

Indexes should match query patterns, not assumptions.
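With the profiler enabled (see the Monitoring section), one way to group captured operations by namespace and cost is a small aggregation over system.profile:

db.system.profile.aggregate([
  { $group: { _id: { op: "$op", ns: "$ns" }, count: { $sum: 1 }, avgMillis: { $avg: "$millis" } } },
  { $sort: { count: -1 } }
])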

8. Relying on Implicit Indexes

MongoDB automatically indexes _id.
Do not assume it helps your queries unless you explicitly query by _id.

Example mistake:

db.posts.find({ slug: "my-first-post" })

Without an index on slug, MongoDB performs COLLSCAN even though _id is indexed.

9. Using Too Many Text or Wildcard Indexes

Text and wildcard indexes are large and expensive.

Guidelines:
• Use one text index per collection
• Use filtered wildcard indexes to reduce bloat
• Prefer Atlas Search for heavy full-text workloads


Schema Optimization for Performance

Schema design has a direct impact on query speed, index efficiency, and overall memory usage. MongoDB’s flexible document model is powerful, but without careful planning, it can lead to oversized documents, inefficient index use, and slow queries. This section covers practical schema strategies to maintain high performance.

1. Embedding vs. Referencing

Choosing between embedding and referencing affects read performance, write cost, and index behavior.

Embedding (denormalization)
Store related data inside the parent document.

Benefits:
• Fewer queries
• No $lookup overhead
• Atomic updates for nested fields

Example:

{
  _id: 1,
  name: "Alice",
  address: {
    street: "Main",
    city: "Jakarta"
  }
}

Best for:
• Small, bounded subdocuments
• Data frequently accessed together

Referencing (normalized)
Store related data separately and link via an ID.

{ _id: 1, userId: 45, productId: 99 }

Benefits:
• Smaller document size
• Avoids unbounded array growth
• Independent indexing and lifecycle

Use referencing when:
• Embedded items grow continuously
• Data is accessed separately
• Many-to-many relationships exist

2. Controlling Document Size

Large documents degrade performance by increasing:
• Disk usage
• RAM requirements
• Network transfer time
• Index entry size

Avoid:
• Storing large binary blobs (store in GridFS instead)
• Unbounded nested arrays
• Excessive history or logs inside a single document

Target: well-structured documents far below the 16 MB BSON limit, ideally just a few kilobytes.

3. Avoiding Unbounded Arrays

Arrays are convenient but can cause write slowdowns and multikey index bloat.

Problematic patterns:
• comments: [...] growing indefinitely
• logs: [...] appended daily
• events: [...] appended in real time

Fixes:
• Use a separate collection
• Limit array size using application logic
• Store only recent items and archive older ones

Example: capping an array to its most recent 100 items, as sketched below.
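A minimal sketch using $push with $slice (recentLogins is an illustrative field name):

db.users.updateOne(
  { _id: 1 },
  { $push: { recentLogins: { $each: [new Date()], $slice: -100 } } }
)

$slice: -100 keeps only the last 100 elements after each push.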

4. Precomputing Fields to Improve Query Speed

Compute expensive values during writes, not during reads.

Examples:
• Storing totalPrice instead of computing from the items array
• Storing normalized or denormalized fields for fast lookup
• Adding a searchable field combining several text fields

Precomputation reduces:
• Query CPU usage
• Aggregation cost
• Need for large pipelines
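For example, totalPrice can be maintained at write time rather than recomputed from the items array on every read (orderId and newItem stand in for values supplied by the application):

db.orders.updateOne(
  { _id: orderId },
  {
    $push: { items: newItem },
    $inc: { totalPrice: newItem.price * newItem.qty }
  }
)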

5. Optimizing for Range Queries

Range queries ($gt, $lt, $gte, $lte) are common for time-series workloads.

Schema improvements:
• Use ISODate for timestamps
• Avoid storing dates as strings
• Keep range fields monotonic where possible

Index rule:
Place range fields last in compound indexes.

6. Controlling Field Cardinality

Cardinality affects index selectivity.

Guidelines:
• Use high-cardinality fields (unique or near-unique) for queries
• Avoid indexing low-cardinality fields alone
• Consider combining fields in compound indexes for better selectivity

Example:
Instead of indexing status, index { status: 1, createdAt: -1 }.

7. Using Lean, Consistent Field Types

Inconsistent field types prevent index use.

Example issue:
createdAt stored as both ISODate and string makes queries unpredictable.

Ensure:
• Consistent field types across all documents
• Flexible schemas only where truly needed
• Validation rules enforced via $jsonSchema (see below)

8. Schema Validation for Performance

Schema validation ensures clean, predictable documents:

db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["email", "createdAt"],
      properties: {
        email: { bsonType: "string" },
        createdAt: { bsonType: "date" }
      }
    }
  }
})

Benefits:
• Prevents invalid data
• Ensures indexes remain usable
• Improves query planner consistency


Query Optimization Techniques

Optimizing queries is essential for ensuring MongoDB consistently delivers low-latency responses. Even with correct indexes and a good schema, inefficient queries can force unnecessary scans, in-memory sorts, and wasted CPU cycles. This section provides practical techniques to improve query performance.

1. Rewrite Queries to Use Indexes

Queries must match the index structure to avoid full scans.

Example:
Query:

db.users.find({ age: { $gt: 30 } })

Index:

{ age: 1 }

This is efficient.

However, if the query wraps fields unnecessarily:

db.users.find({ $expr: { $gt: ["$age", 30] } })

MongoDB generally cannot use the index (index support for $expr is very limited).

Guideline:
Avoid $expr, $function, and computed expressions when an indexed field can be queried directly.

2. Avoid Leading Wildcards in $regex

Regex patterns that are not anchored to the start of the string (for example, those beginning with .*) cannot use index ranges efficiently.

Bad:

db.products.find({ name: { $regex: ".*book" } })

Good:

db.products.find({ name: { $regex: "^book" } })

For more advanced use cases, consider:
• Storing normalized fields (lowercase, trimmed)
• Using $text index or Atlas Search

3. Use Projections to Limit Returned Fields

Returning unnecessary fields increases disk I/O and network transfer.

Example:

db.users.find(
  { active: true },
  { email: 1, name: 1, _id: 0 }
)

Benefits:
• Reduces document size
• Increases chance of covered queries
• Lowers memory usage

Avoid projections that pull in whole subdocuments when only a few nested fields are needed.

4. Avoid N+1 Query Patterns

An N+1 pattern occurs when your application repeatedly queries for related data inside a loop.

Example:

users.forEach((u) => {
  db.orders.find({ userId: u._id })
})

Fix via:
• $lookup with proper indexing
• Denormalized fields (e.g., store the last order date on the user)
• Batched queries using $in

Example optimized:

db.orders.find({ userId: { $in: userIds } })

5. Keep Sorts Indexed

To sort using an index, the sort pattern must match the index key pattern (or its exact reverse).
If MongoDB cannot use an index for sorting, it performs an in-memory sort (subject to a 100 MB per-query limit).

Bad:

db.posts.find({ status: "PUBLISHED" }).sort({ likes: -1 })

If no matching index exists, MongoDB sorts in memory.


Fix:

db.posts.createIndex({ status: 1, likes: -1 })

6. Optimize Pagination Queries

Offset-based pagination (skip()) becomes slower as the skip count increases.

Bad:

db.posts.find().skip(50000).limit(10)

Better: cursor-based pagination, where lastId is the last _id from the previous page:

db.posts.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(10)

Benefits:
• Avoids scanning skipped documents
• Faster at scale

7. Use Hinting Only for Diagnostics

hint() forces MongoDB to use an index.

Example:

db.users.find({ email: "[email protected]" }).hint({ email: 1 })

Use cases:
• Debugging index selection
• Temporary fixes during query planner misestimation

Do not use hint() permanently unless absolutely necessary—MongoDB may choose better plans over time.

8. Prefer Covered Queries When Possible

Covered queries avoid fetching documents from disk.

Example:

db.logs.find(
  { level: "ERROR" },
  { timestamp: 1, message: 1, _id: 0 }
)

Index requirement:

{ level: 1, timestamp: 1, message: 1 }

Benefits:
• Lower latency
• Reduced I/O
• Index-only read operations

9. Avoid Unnecessary Aggregation Pipelines

Aggregation is powerful but can be slower than simple finds.

Prefer:

db.users.find({ age: { $gt: 30 } })

Over:

db.users.aggregate([{ $match: { age: { $gt: 30 } } }])

Aggregation becomes necessary only when:
• Multiple transformation stages are needed
• $project, $group, or $lookup is required


Write Performance Tuning

MongoDB write performance depends on how efficiently documents are inserted, updated, indexed, and replicated. Inefficient schemas or excessive indexes can slow down writes, while improper write configurations can increase latency or create bottlenecks. This section explains practical techniques to boost write throughput.

1. Batch Writes for Higher Throughput

Batching reduces round-trip and improves efficiency.

Example using the legacy (but still supported) bulk operations API:

const bulk = db.logs.initializeUnorderedBulkOp();
bulk.insert({ level: "INFO", message: "Start" });
bulk.insert({ level: "INFO", message: "Processing" });
bulk.execute();


Benefits:
• Fewer network calls
• Better compression
• More predictable throughput

Applications ingesting large volumes should always batch writes.

2. Use Bulk Update Operations

Instead of performing multiple update commands one-by-one:

Bad:

ids.forEach(id => db.items.updateOne({ _id: id }, { $set: { active: true } }))

Use:

db.items.bulkWrite(
  ids.map(id => ({
    updateOne: {
      filter: { _id: id },
      update: { $set: { active: true } }
    }
  }))
)

Results:
• Lower latency
• Better batching internally
• Reduced lock contention

3. Optimize Write Concern

Write concern controls replication durability. Higher write concern = more latency.

Typical levels:
• { w: 1 }: fastest; acknowledged by the primary only
• { w: "majority" }: safer and most durable; the default since MongoDB 5.0
• { w: 0 }: fire-and-forget (risky; not recommended)

For high-throughput ingestion pipelines:
Use { w: 1 } unless strong consistency is required.
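Write concern can also be set per operation (the collection name is illustrative):

db.logs.insertOne(
  { level: "INFO", message: "ingested" },
  { writeConcern: { w: 1 } }
)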

4. Reduce Index Overhead

Indexes significantly slow down writes because every insert or update must modify all index entries.

Guidelines:
• Keep indexes minimal
• Avoid indexing frequently updated fields
• Use compound indexes instead of multiple single-field indexes

Example: replace two indexes

{ userId: 1 }
{ createdAt: -1 }

with one:

{ userId: 1, createdAt: -1 }

5. Avoid Document Growth (Update Inflation)

When updates increase document size, the storage engine must rewrite the document, which amplifies I/O, fragments storage, and slows writes.

Patterns that cause growth:
• Appending logs to arrays
• Adding new fields frequently
• Expanding nested subdocuments

Fixes:
• Use fixed-size arrays
• Use separate collections for logs and events
• Preallocate predictable fields

6. Use $setOnInsert and Upserts Wisely

Upserts can cause extra work if misused.

Example upsert:

db.cache.updateOne(
  { key: "x" },
  { $set: { value: 42 }, $setOnInsert: { createdAt: new Date() } },
  { upsert: true }
)

Tips:
• Ensure upsert fields match indexes
• Avoid upserts on high-traffic collections unless needed
• Avoid unindexed upsert queries

7. Improve Update Selectivity

Ensure the filter portion of an update hits indexed fields.

Bad:

db.users.updateMany(
  { lastLogin: { $exists: true } }, // not indexed
  { $set: { active: true } }
)

Better:

db.users.updateMany(
  { status: "ACTIVE" }, // indexed
  { $set: { active: true } }
)

Selective updates reduce document scans and improve performance.

8. Use Time-Series Collections for High-Volume Metrics

For metrics, logs, telemetry, or IoT data, use native time-series collections.

Benefits:
• Automatically optimized internal schema
• Built-in compression
• Fast range queries
• Lower write amplification

Example:

db.createCollection("temperature", {
  timeseries: { timeField: "timestamp", metaField: "deviceId" }
})

9. Tune Journaling and Write-Ahead Logging

For self-managed clusters:
• Journaling provides durability but increases write latency
• Use filesystem barriers wisely
• Tune WiredTiger cache size for write-heavy workloads

In MongoDB Atlas, most low-level tuning is handled automatically.


Sharding and Scalability Optimization

Sharding allows MongoDB to scale horizontally by distributing data across multiple nodes. When implemented correctly, sharding maintains high throughput and balanced cluster performance. When designed poorly, it can lead to hotspots, uneven chunk distribution, and slow queries across shards. This section covers essential tuning practices for scalable, distributed MongoDB deployments.

1. Choosing the Right Shard Key

The shard key determines how data is distributed. A good shard key ensures both read and write operations scale evenly.

Characteristics of a good shard key:
• High cardinality: many unique values
• Even distribution: avoids hotspots
• Commonly used in queries: enables targeted operations
• Stable: shard key values can be updated only with restrictions (MongoDB 4.2+), and resharding a collection is expensive

Examples of good shard keys:
• { userId: 1 } for user-centric data
• { deviceId: 1, timestamp: 1 } for time-series workloads
• { region: 1, customerId: 1 } for multi-region applications

Avoid:
• Monotonically increasing keys (e.g., timestamps alone)
• Low-cardinality fields
• Frequently updated fields
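Sharding is enabled per database and per collection (the namespace here is illustrative):

sh.enableSharding("app")
sh.shardCollection("app.orders", { userId: 1 })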

2. Hashed vs. Ranged Sharding

MongoDB supports two primary shard key strategies:

Hashed Shard Key

{ userId: "hashed" }

Use for:
• Large collections with random access patterns
• High write throughput
• Avoiding hotspots

Limitations:
• Not suitable for range queries
• Harder to perform sorted reads

Ranged Shard Key

{ createdAt: 1 }

Use for:
• Time-series workloads
• Range-based queries ($gt, $lt)
• Sorting on range fields

Limitations:
• Risk of hotspots if inserts always target the latest chunk
• Requires careful pre-splitting or zone configuration

3. Avoiding Hotspots

Hotspots occur when most operations hit the same shard.

Common causes:
• Ranged key based on timestamp
• Sequential auto-increment fields
• Users grouped by region
• Non-uniform traffic patterns

Mitigations:
• Use a hashed shard key for write-heavy workloads
• Use compound shard keys combining high-cardinality fields
• Pre-split chunks for sequential shard keys
• Use zone sharding for regional isolation

4. Balancer Best Practices

The balancer distributes chunks across shards. Poor configuration may cause performance degradation.

Guidelines:
• Keep balancer enabled for normal operations
• Avoid running balancer during peak traffic
• Use windowed balancing for predictable timing
• Monitor balancer activity via sh.status()
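The balancing window lives in the config database; one documented pattern restricts migrations to off-peak hours (the times are illustrative):

db.getSiblingDB("config").settings.updateOne(
  { _id: "balancer" },
  { $set: { activeWindow: { start: "23:00", stop: "06:00" } } },
  { upsert: true }
)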

Chunk migrations cause:
• Temporary locking
• Network load
• Increased replication lag

5. Querying in a Sharded Cluster

To achieve high performance, queries should be targeted to one shard whenever possible.

Targeted query example:

db.orders.find({ userId: 123 })

If userId is the shard key, this hits only one shard.

Scatter-gather query example:

db.orders.find({ status: "PAID" })

This hits all shards and scales poorly.

Strategies to avoid scatter-gather:
• Include the shard key in queries whenever possible
• Use compound shard keys matching major query patterns
• Add supporting indexes on each shard for non-shard-key filters

6. Distributed Writes and Reads

Best practices:
• Use write concern { w: 1 } for high-throughput ingestion
• Pin read operations to appropriate nodes using read preferences (see the sketch below)
• Avoid $lookup across shards—inefficient and slow
• Ensure every shard carries the secondary indexes your queries need
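Read preference can be set per query in mongosh (a minimal sketch):

db.orders.find({ userId: 42 }).readPref("secondaryPreferred")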

7. Time-Series Sharding

For time-series workloads, use:

Shard key example:

{ "meta.deviceId": 1, timestamp: 1 }

Benefits:
• Even distribution of devices across shards
• Efficient range queries
• Avoids writing concentration on the latest chunk

8. Monitoring Sharded Clusters

Monitor:
• Chunk distribution
• Chunk migration frequency
• Hot partitions
• Query scatter ratios
• Replication lag

Key tools:
• Atlas metrics dashboards
• mongosh commands (sh.status(), db.currentOp())
• Query profiler
• FTDC diagnostic data


Monitoring Tools and Diagnostics

Effective MongoDB performance tuning depends on continuous monitoring. MongoDB provides built-in tools to analyze query execution, track slow operations, inspect resource usage, and identify bottlenecks. This section covers the essential monitoring and diagnostic tools you should use regularly.

1. Database Profiler

The profiler captures slow operations and detailed query performance data.

Enable slow operation logging:

db.setProfilingLevel(1, { slowms: 50 }) // logs operations slower than 50ms

Profiler levels:
0 — Off
1 — Slow operations only
2 — All operations (use with caution)

Inspect profiler output:

db.system.profile.find().sort({ ts: -1 }).limit(5)

Use the profiler to:
• Identify slow queries
• Detect missing indexes
• Understand actual query patterns in production

2. explain() for Query Diagnostics

explain() reveals how MongoDB executes a query.

Example:

db.orders.find({ userId: 123 }).explain("executionStats")

Key metrics:
• totalDocsExamined
• totalKeysExamined
• executionTimeMillis
• Winning plan vs. rejected plans

Goal:
totalDocsExamined should stay close to nReturned for indexed queries, and at 0 for covered queries.

3. db.currentOp() for Real-Time Analysis

Shows currently running operations.

db.currentOp()

Useful for:
• Identifying long-running queries
• Finding blocked or stuck operations
• Inspecting lock usage

You can also terminate an operation:

db.killOp(opid)

4. Slow Query Log

MongoDB logs slow operations automatically.

Enable diagnostic logging (self-managed clusters):

systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
  verbosity: 0

Look for entries containing:
• COLLSCAN
• SORT
• High-duration queries

These logs are critical for production tuning.

5. FTDC (Full-Time Diagnostic Data Capture)

FTDC collects performance metrics at regular intervals.

Captured metrics include:
• CPU, memory, disk usage
• WiredTiger cache metrics
• I/O latency
• Replication metrics

Tools that consume FTDC data:
• Compass
• Atlas UI
• MongoDB Diagnostic Archive

FTDC provides long-term historical insight for capacity planning.

6. Atlas Monitoring Dashboards

If using MongoDB Atlas, built-in monitoring includes:
• Real-time and historical charts
• Slow query analyzer
• Automatic performance recommendations
• Index usage metrics
• Query profiler integrated UI

Key charts to watch:
• Opcounters (reads/writes)
• CPU usage
• Disk IOPS
• Replication lag
• Cache usage (WiredTiger)

7. Monitoring Index Usage

Identify unused indexes:

db.collection.getIndexes()
db.collection.aggregate([
  { $indexStats: {} }
])

Look for:
• accesses.ops near zero
• Indexes larger than the actual collection
• Indexes that slow write performance

Unused indexes should be removed to improve write throughput.

8. Memory and Cache Diagnostics

The WiredTiger engine caches data and indexes for rapid access.

Inspect cache stats:

db.serverStatus().wiredTiger.cache

Key indicators:
• High eviction rate → cache pressure
• Dirty bytes → write backlogs
• Pages read into cache → insufficient working set memory

If cache pressure is high:
• Reduce index count
• Optimize schema
• Scale instance memory

9. Lock Diagnostics

Check lock status:

db.serverStatus().locks

High lock wait times indicate:
• Write-heavy workloads
• Long-running queries
• Large document updates

Solutions:
• Improve indexing
• Reduce document size
• Break large updates into batches

10. End-to-End Performance Baseline

Track these metrics over time:
• Query latency
• Throughput (ops/sec)
• CPU/disk usage
• Cache hit ratio
• Index effectiveness
• Slow query frequency
• Replication lag

A performance baseline helps detect regressions early.


Real-World Examples

This section illustrates how indexing and optimization techniques apply to real workloads. Each example demonstrates a common performance issue, how to diagnose it, and the exact steps taken to fix it.

1. Slow Query Diagnosis and Fix

Scenario
A query on the orders collection takes 800ms:

db.orders.find({ customerId: 123 }).sort({ createdAt: -1 }).limit(20)

Diagnosis
explain("executionStats") output shows:

  • totalDocsExamined: 240,000

  • totalKeysExamined: 240,000

  • Stage: COLLSCAN → no index

  • In-memory sort occurs

Cause
Missing compound index matching filter + sort.

Fix

db.orders.createIndex({ customerId: 1, createdAt: -1 })

Outcome

  • totalDocsExamined: 0

  • totalKeysExamined: ~20

  • Execution time: <5ms

This is the classic Equality → Sort index pattern.

2. Index Redesign Improves Dashboard Queries

Scenario
Analytics dashboard queries are slow:

db.logs.find({
  level: "ERROR",
  createdAt: { $gte: ISODate("2025-01-01") }
})
.sort({ createdAt: -1 })
.limit(50)

Current indexes:

{ level: 1 }
{ createdAt: 1 }

Diagnosis
The planner can use { createdAt: 1 } to walk the date range, but it still examines thousands of non-ERROR documents; if it chooses { level: 1 } instead, it must sort in memory. No single index satisfies both the filter and the sort.

Fix
Use a single compound index:

db.logs.createIndex({ level: 1, createdAt: -1 })

Remove older redundant indexes:

db.logs.dropIndex({ level: 1 })
db.logs.dropIndex({ createdAt: 1 })

Outcome
Query is now fully indexed and sortable with near-zero docs examined.


3. Schema Refactor Fixes Array Bloat

Scenario
A users collection stores login history:

{
  "_id": 1,
  "email": "[email protected]",
  "loginHistory": ["2025-01-02", "2025-01-03", ...]
}

Over time, loginHistory becomes unbounded (thousands of entries), causing:

  • Large document size

  • Slow updates

  • Large multikey index

Fix
Move login events to a separate logins collection:

{
  userId: 1,
  timestamp: ISODate(...)
}

Indexes:

db.logins.createIndex({ userId: 1, timestamp: -1 })

Outcome:

  • User documents become small and fast to update

  • Login queries remain efficient

  • No multikey index bloat

4. Rewriting a Query to Enable Index Use

Scenario
Product search endpoint uses:

db.products.find({ $expr: { $gt: ["$price", 100] } })

price has an index, but $expr disables it.

Fix
Rewrite query:

db.products.find({ price: { $gt: 100 } })

Result:
Index is used, and scan drops from 100k docs to a few hundred.

5. Fixing a Scatter-Gather Query in a Sharded Cluster

Scenario
On a sharded cluster, this query is slow:

db.payments.find({ status: "SUCCESS" })

The collection is sharded on { userId: 1 }, so the query is broadcast to all shards.

Fix:
Modify API to include shard key:

db.payments.find({ userId: 42, status: "SUCCESS" })

To support combined filtering, create an index:

db.payments.createIndex({ userId: 1, status: 1 })

Outcome:
Query becomes targeted (single shard) and latency drops dramatically.


Conclusion and Next Steps

Effective MongoDB performance tuning requires a combination of well-designed indexes, efficient query patterns, and a schema that supports your application’s real-world workloads. By understanding how MongoDB’s query planner works, building the right indexes, optimizing read and write operations, and monitoring cluster behavior continuously, you can ensure predictable, high-performance data access at any scale.

Key takeaways:
• Use the Equality → Sort → Range rule for compound index design.
• Avoid over-indexing and remove unused indexes to improve write throughput.
• Keep schemas lean—avoid unbounded arrays, large documents, and inconsistent field types.
• Diagnose slow queries using explain, the profiler, and slow query logs.
• Use batching, bulk writes, and appropriate write concerns to increase write performance.
• Choose shard keys that provide high cardinality, even distribution, and targeted queries.
• Monitor cache, IOPS, replication lag, and index statistics to maintain long-term performance.

Next steps:
• Benchmark your queries using explain("executionStats").
• Review existing indexes and drop those not used.
• Evaluate schema consistency and adjust where documents are too large or too dynamic.
• Enable profiling on non-production environments to identify slow queries.
• Use Atlas Performance Advisor or similar tools to discover missing indexes.
• Document your key query patterns and align index design with real application access patterns.


Thanks!