MongoDB can deliver high performance at scale, but only when data models, indexes, and queries are designed carefully. As applications grow, inefficient queries or poorly designed indexes can create slow responses, increased CPU usage, and unnecessary disk I/O. Performance tuning ensures that your MongoDB cluster continues to serve fast, predictable results even under heavy workloads.
This tutorial provides a practical guide to MongoDB performance tuning with a strong focus on indexing and query optimization. You will learn how MongoDB’s query planner works, how indexes affect query execution, and how to design collections and queries that scale efficiently. Each section includes clear explanations, code examples, and real-world best practices that you can apply immediately to any MongoDB project.
By the end of this tutorial, you will be able to:
• Identify slow queries and understand why they occur
• Build the right indexes for read-heavy and write-heavy workloads
• Optimize schemas to support fast indexing and querying
• Use MongoDB tools such as the profiler and explain plans to diagnose issues
• Apply proven techniques to reduce latency and improve throughput
Understanding How MongoDB Executes Queries
Efficient performance tuning begins with understanding how MongoDB selects and executes a query plan. MongoDB's query planner generates candidate plans from the available indexes, trials them against each other, and caches the winner for subsequent queries. Knowing how this process works helps you design indexes that the planner can use effectively.
1. Query Planner Basics
When a query is issued, MongoDB:
1. Identifies all indexes that could satisfy the query.
2. Generates candidate plans for each possible index.
3. Runs a short trial phase (“plan ranking”) to evaluate performance.
4. Selects a winning plan and caches it for subsequent queries.
MongoDB may re-plan automatically if data distributions change or if cached plans become inefficient.
2. Winning Plan vs. Rejected Plans
The explain() output contains:
• winningPlan: the plan MongoDB chooses to execute.
• rejectedPlans: alternative plans evaluated but not selected.
Common winning plan types:
• IXSCAN: query uses an index scan
• FETCH: index scan followed by document fetch
• COLLSCAN: full collection scan (usually bad for performance)
• SORT: in-memory sort when no suitable index exists
A COLLSCAN or in-memory SORT indicates missing or suboptimal indexes.
3. Covered Queries
A query is covered when a single index contains every field the query filters on and returns, so MongoDB never needs to load the documents themselves.
Benefits:
• No document fetch from disk
• Faster queries
• Lower I/O and memory usage
Example of a potentially covered query:
db.users.find(
{ email: "[email protected]" },
{ email: 1, _id: 0 }
)
If an index { email: 1 } exists, this query becomes covered.
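To confirm coverage, run the query through explain; a minimal check, assuming the { email: 1 } index has been created:
db.users.createIndex({ email: 1 })
db.users.find(
{ email: "[email protected]" },
{ email: 1, _id: 0 }
).explain("executionStats")
// A covered plan shows an IXSCAN with no FETCH stage,
// and executionStats.totalDocsExamined is 0.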
4. Reading Explain Output
Key fields to focus on:
• stage: Indicates COLLSCAN, IXSCAN, SORT, FETCH, etc.
• nReturned: Number of documents returned.
• executionTimeMillis: Total execution time.
• totalKeysExamined: Number of index entries scanned.
• totalDocsExamined: Number of documents scanned.
Goal: for efficient queries, totalDocsExamined should be close to nReturned, and 0 for covered queries.
5. Practical Indicators of Problems
• High totalDocsExamined → missing or inefficient index.
• Winning plan = COLLSCAN → index not used.
• SORT stage → index order not optimized.
• Large gap between totalKeysExamined and nReturned → index not selective.
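A minimal diagnostic pass tying these indicators together, using a hypothetical query with no supporting index:
db.orders.find({ status: "PAID" }).explain("executionStats")
// Symptoms to look for in the output:
//   queryPlanner.winningPlan stage: "COLLSCAN" (no index used)
//   executionStats.totalDocsExamined: far larger than nReturned
// After creating { status: 1 }, expect IXSCAN and docsExamined close to nReturned.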
Indexing Fundamentals
Indexes are the core of MongoDB performance tuning. A well-designed index can reduce query time from seconds to milliseconds, while a poorly chosen or missing index can force MongoDB to scan entire collections. This section covers the essential index types and the rules that guide their effective use.
1. Why Indexes Matter
Indexes allow MongoDB to locate data without scanning every document. They:
• Reduce query latency
• Lower CPU and disk I/O
• Support efficient sorting
• Enable covered queries
• Improve scalability under high load
However, indexes also consume memory and slow down writes, so they must be designed deliberately.
2. Single-Field Indexes
A single-field index is the simplest form:
db.users.createIndex({ email: 1 })
Use cases:
• Equality lookups (email = …)
• High-cardinality fields (many unique values)
• Supporting covered queries
Avoid indexing fields with very few unique values (e.g., active: true) unless combined with another field.
3. Compound Indexes
Compound indexes include multiple fields and are critical for real-world query optimization.
Example:
db.orders.createIndex({ userId: 1, createdAt: -1 })
Key rules:
• Prefix rule: MongoDB can use any leftmost prefix of the index (e.g., { userId } works, { createdAt } does not).
• Order matters for sorting; sort direction must match index order.
• Place equality fields first, sort fields next, and range fields last.
4. Equality → Sort → Range Rule
When designing compound indexes:
1. Equality fields
2. Sort fields
3. Range fields ($gt, $lt, $in, etc.)
Example query:
db.orders.find(
{ userId: 42, status: "PAID", createdAt: { $gte: ISODate("2025-01-01") } }
).sort({ createdAt: -1 })
Best index:
{ userId: 1, status: 1, createdAt: -1 }
Following this structure ensures MongoDB avoids in-memory sorts and unnecessary scans.
5. Sparse vs Partial Indexes
Sparse Indexes
Include documents only where the indexed field exists.
Useful when many documents lack the field.
db.users.createIndex({ phone: 1 }, { sparse: true })
Partial Indexes
Include documents that meet a filter expression.
More efficient and more flexible than sparse indexes.
db.users.createIndex(
{ status: 1 },
{ partialFilterExpression: { status: "ACTIVE" } }
)
Use partial indexes to reduce index size and improve performance when queries always filter by a known condition.
6. TTL (Time-to-Live) Indexes
TTL indexes automatically remove documents after a specified time.
db.logs.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 })
Common for logs, sessions, or temporary data.
7. When to Use Each Index Type
| Scenario | Recommended Index |
|---|---|
| Lookup by unique field | Single-field index |
| Combined filters and sort | Compound index |
| Large collection with many missing fields | Sparse or partial index |
| Time-based auto-delete | TTL index |
| Range queries with sort | Compound index with range field last |
Advanced Indexing Strategies
Once the fundamentals are in place, advanced indexing strategies help optimize complex queries, support specialized workloads, and ensure performance at scale. This section covers multikey, text, geospatial, hashed, and wildcard indexes—when to use them and when to avoid them.
1. Multikey Indexes
Multikey indexes support fields that contain arrays. MongoDB creates index entries for each element of the array.
Example:
db.products.createIndex({ tags: 1 })
Query example:
db.products.find({ tags: "electronics" })
Considerations:
• Only one multikey field is allowed per compound index.
• Multikey indexes can cause increased index size.
• Index order rules still apply for compound multikey indexes.
Avoid unbounded or growing arrays—they lead to index bloat and slower writes.
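A brief sketch of a compound multikey index, assuming tags is an array and price a scalar (only the array field becomes multikey):
db.products.createIndex({ tags: 1, price: 1 })
// One index entry is created per array element, so this supports:
db.products.find({ tags: "electronics", price: { $lte: 500 } })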
2. Text Indexes
Text indexes support full-text search for fields containing string content.
db.articles.createIndex({ title: "text", body: "text" })
Query:
db.articles.find({ $text: { $search: "mongodb indexing" } })
Features:
• Stemming and tokenization
• Text score sorting
• Case-insensitive search
Limitations:
• Only one text index per collection
• Compound text indexes restrict how non-text fields can be combined (non-text prefix fields require equality matches)
• Not suitable for large-scale search; consider Atlas Search for production-level needs
3. Geospatial Indexes
MongoDB supports geospatial data with 2dsphere and 2d indexes.
Example:
db.locations.createIndex({ coordinates: "2dsphere" })
Query:
db.locations.find({
coordinates: {
$near: {
$geometry: { type: "Point", coordinates: [106.8456, -6.2088] },
$maxDistance: 1000
}
}
})
Use cases:
• Maps and location-based features
• Radius queries
• GeoJSON-based applications
4. Hashed Indexes
Hashed indexes hash field values and distribute them evenly. Commonly used as a shard key.
db.users.createIndex({ userId: "hashed" })
Advantages:
• Good for horizontal scaling
• Avoids hotspotting on monotonically increasing values
Limitations:
• Not usable for range queries
• Not suitable for sorting
Use hashed indexes when shard key distribution is more important than query flexibility.
5. Wildcard Indexes
Wildcard indexes support indexing arbitrary nested fields, useful for semi-structured or user-generated data.
db.events.createIndex({ "payload.$**": 1 })
Use cases:
• Dynamic schemas
• Analytics events
• Logs with variable fields
Considerations:
• Large index size
• Must be filtered or limited where possible
• Slower writes due to many index entries
Example of a filtered wildcard index (wildcardProjection is only valid with the all-fields "$**" key):
db.events.createIndex(
{ "$**": 1 },
{ wildcardProjection: { "payload.meta": 1 } }
)
6. Choosing the Right Advanced Index
| Scenario | Best Index Type |
|---|---|
| Array fields | Multikey index |
| Full-text search | Text index |
| Location queries | 2dsphere index |
| Sharded cluster with high write volume | Hashed index |
| Flexible schema | Wildcard index |
Common Indexing Mistakes
Misconfigured indexes are one of the most common causes of slow MongoDB performance. Even with strong hardware, poorly designed indexes can force MongoDB to scan millions of documents, perform unnecessary in-memory sorts, or waste RAM with oversized index structures. This section highlights the most frequent mistakes and how to avoid them.
1. Over-Indexing
Every index speeds up reads but slows down writes. Inserts, updates, and deletes must update all relevant indexes.
Symptoms:
• High write latency
• High disk I/O during updates
• Large memory footprint for the index cache
Best practice:
Keep only the indexes your queries use. Remove unused indexes:
db.collection.getIndexes()
db.collection.dropIndex("indexName")
Use the profiler or explain() to verify real usage before dropping.
2. Indexes on Low-Cardinality Fields
Low-cardinality fields have few distinct values, such as:
• status: "ACTIVE"
• gender: "M"
• isDeleted: false
Such fields are rarely selective and lead to poor performance.
Bad index example:
db.users.createIndex({ active: 1 })
If 95% of users are active, the index does not meaningfully reduce scanned keys.
Fix:
Combine low-cardinality fields with high-cardinality fields in a compound index:
db.users.createIndex({ active: 1, createdAt: -1 })
3. Misordered Compound Indexes
Order matters. Wrong ordering forces MongoDB into in-memory sorts or collection scans.
Example query:
db.orders.find({ userId: 42 }).sort({ createdAt: -1 })
Bad index:
{ createdAt: -1, userId: 1 }
Good index:
{ userId: 1, createdAt: -1 }
Follow the rule: Equality → Sort → Range.
4. Indexes on Frequently Updated Fields
When a field changes frequently, updating its index repeatedly increases write cost.
Problematic field types:
• lastLogin
• updatedAt
• Frequently changing status flags
• Rolling counters
If the field is not used for filtering or sorting, avoid indexing it.
5. Missing Indexes on $lookup (Join-Like Operations)
For aggregation pipelines with $lookup, the foreignField side of the join must be indexed.
Example:
{
$lookup: {
from: "orders",
localField: "userId",
foreignField: "userId",
as: "orders"
}
}
Index required:
db.orders.createIndex({ userId: 1 })
Without this, MongoDB performs a full scan on the foreign collection for every document.
6. Large, Unbounded Arrays Causing Multikey Bloat
Large arrays generate multiple index entries per document.
Arrays that grow indefinitely lead to:
• Huge multikey indexes
• Slow updates due to index rebuild
• Poor write throughput
If the array grows continuously, consider:
• Capping array size
• Moving array items into a separate collection
• Using referencing instead of embedding
7. Forgetting to Analyze Query Patterns Before Indexing
A common mistake is creating indexes before understanding how the application queries the database.
Correct sequence:
1. Observe real queries using the profiler or logs.
2. Group them by frequency and performance impact.
3. Create or tune indexes based on the actual workload.
Indexes should match query patterns, not assumptions.
8. Relying on Implicit Indexes
MongoDB automatically indexes _id.
Do not assume it helps your queries unless you explicitly query by _id.
Example mistake:
db.posts.find({ slug: "my-first-post" })
Without an index on slug, MongoDB performs COLLSCAN even though _id is indexed.
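The fix is an explicit index on the queried field; a sketch, assuming slug values are unique:
db.posts.createIndex({ slug: 1 }, { unique: true })
// The slug lookup now runs as an IXSCAN instead of a COLLSCAN.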
9. Using Too Many Text or Wildcard Indexes
Text and wildcard indexes are large and expensive.
Guidelines:
• Use one text index per collection
• Use filtered wildcard indexes to reduce bloat
• Prefer Atlas Search for heavy full-text workloads
Schema Optimization for Performance
Schema design has a direct impact on query speed, index efficiency, and overall memory usage. MongoDB’s flexible document model is powerful, but without careful planning, it can lead to oversized documents, inefficient index use, and slow queries. This section covers practical schema strategies to maintain high performance.
1. Embedding vs. Referencing
Choosing between embedding and referencing affects read performance, write cost, and index behavior.
Embedding (denormalization)
Store related data inside the parent document.
Benefits:
• Fewer queries
• No $lookup overhead
• Atomic updates for nested fields
Example:
{
_id: 1,
name: "Alice",
address: {
street: "Main",
city: "Jakarta"
}
}
Best for:
• Small, bounded subdocuments
• Data frequently accessed together
Referencing (normalized)
Store related data separately and link via an ID.
{ _id: 1, userId: 45, productId: 99 }
Benefits:
• Smaller document size
• Avoids unbounded array growth
• Independent indexing and lifecycle
Use referencing when:
• Embedded items grow continuously
• Data is accessed separately
• Many-to-many relationships exist
2. Controlling Document Size
Large documents degrade performance by increasing:
• Disk usage
• RAM requirements
• Network transfer time
• Index entry size
Avoid:
• Storing large binary blobs (store in GridFS instead)
• Unbounded nested arrays
• Excessive history or logs inside a single document
Target: keep documents well below the 16MB BSON limit, and preferably much smaller.
3. Avoiding Unbounded Arrays
Arrays are convenient but can cause write slowdowns and multikey index bloat.
Problematic patterns:
• comments: [...] growing indefinitely
• logs: [...] appended daily
• events: [...] added real-time
Fixes:
• Use a separate collection
• Limit array size using application logic
• Store only recent items and archive older ones
Example: capping the array to the most recent 100 items, as sketched below.
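A minimal sketch using $push with $each and $slice; userId and newLogin are placeholders:
db.users.updateOne(
{ _id: userId },
{ $push: { loginHistory: { $each: [newLogin], $slice: -100 } } }
)
// $slice: -100 retains only the 100 most recent entries.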
4. Precomputing Fields to Improve Query Speed
Compute expensive values during writes, not during reads.
Examples:
• Storing totalPrice instead of computing it from the items array (see the sketch after this list)
• Storing normalized or denormalized fields for fast lookup
• Adding a searchable field combining several text fields
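For instance, a hypothetical order insert that computes totalPrice once at write time rather than summing the items array on every read:
const items = [
{ sku: "A1", price: 10, qty: 2 },
{ sku: "B2", price: 5, qty: 1 }
];
db.orders.insertOne({
items,
totalPrice: items.reduce((sum, i) => sum + i.price * i.qty, 0), // precomputed at write time
createdAt: new Date()
});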
Precomputation reduces:
• Query CPU usage
• Aggregation cost
• Need for large pipelines
5. Optimizing for Range Queries
Range queries ($gt, $lt, $gte, $lte) are common for time-series workloads.
Schema improvements:
• Use ISODate for timestamps
• Avoid storing dates as strings
• Keep range fields monotonic where possible
Index rule:
Place range fields last in compound indexes.
6. Controlling Field Cardinality
Cardinality affects index selectivity.
Guidelines:
• Use high-cardinality fields (unique or near-unique) for queries
• Avoid indexing low-cardinality fields alone
• Consider combining fields in compound indexes for better selectivity
Example:
Instead of indexing status, index { status: 1, createdAt: -1 }.
7. Using Lean, Consistent Field Types
Inconsistent field types prevent index use.
Example issue:
createdAt stored as both ISODate and string makes queries unpredictable.
Ensure:
• Consistent field types across all documents
• Avoid flexible schemas unless truly needed
• Enforce types with application-level schemas or MongoDB schema validation
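When types have already drifted, a one-off migration can normalize them; a sketch using a pipeline update with $toDate, assuming the string values are parseable dates:
db.users.updateMany(
{ createdAt: { $type: "string" } },                      // only documents with the wrong type
[ { $set: { createdAt: { $toDate: "$createdAt" } } } ]   // pipeline update (MongoDB 4.2+)
)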
8. Schema Validation for Performance
Schema validation ensures clean, predictable documents:
db.createCollection("users", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["email", "createdAt"],
properties: {
email: { bsonType: "string" },
createdAt: { bsonType: "date" }
}
}
}
})
Benefits:
• Prevents invalid data
• Ensures indexes remain usable
• Improves query planner consistency
Query Optimization Techniques
Optimizing queries is essential for ensuring MongoDB consistently delivers low-latency responses. Even with correct indexes and a good schema, inefficient queries can force unnecessary scans, in-memory sorts, and wasted CPU cycles. This section provides practical techniques to improve query performance.
1. Rewrite Queries to Use Indexes
Queries must match the index structure to avoid full scans.
Example:
Query:
db.users.find({ age: { $gt: 30 } })
Index:
{ age: 1 }
This is efficient.
However, if the query wraps fields unnecessarily:
db.users.find({ $expr: { $gt: ["$age", 30] } })
MongoDB typically cannot use the index.
Guideline:
Avoid $expr, $function, and computed expressions when an indexed field can be queried directly.
2. Avoid Leading Wildcards in $regex
Regex patterns starting with .* disable index use.
Bad:
db.products.find({ name: { $regex: ".*book" } })
Good:
db.products.find({ name: { $regex: "^book" } })
For more advanced use cases, consider:
• Storing normalized fields (lowercase, trimmed), as sketched below
• Using $text index or Atlas Search
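A sketch of the normalized-field approach, with nameLower as a hypothetical field populated at write time:
db.products.updateMany({}, [ { $set: { nameLower: { $toLower: "$name" } } } ])
db.products.createIndex({ nameLower: 1 })
db.products.find({ nameLower: { $regex: "^book" } })  // anchored, index-friendly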
3. Use Projections to Limit Returned Fields
Returning unnecessary fields increases disk I/O and network transfer.
Example:
db.users.find(
{ active: true },
{ email: 1, name: 1, _id: 0 }
)
Benefits:
• Reduces document size
• Increases chance of covered queries
• Lowers memory usage
Avoid projections that pull in whole subdocuments when only a few nested fields are needed.
4. Avoid N+1 Query Patterns
An N+1 pattern occurs when your application repeatedly queries for related data inside a loop.
Example:
users.forEach((u) => {
db.orders.find({ userId: u._id })
})
Fix via:
• $lookup with proper indexing
• Denormalized fields (e.g., store last order date in user)
• Batched queries using $in
Example optimized:
db.orders.find({ userId: { $in: userIds } })
5. Keep Sorts Indexed
MongoDB requires the sort order to match an index.
If it cannot use an index for sorting, it performs an in-memory sort (up to 100MB limit).
Bad:
db.posts.find({ status: "PUBLISHED" }).sort({ likes: -1 })
If no matching index exists, MongoDB sorts in memory.
Fix:
db.posts.createIndex({ status: 1, likes: -1 })
6. Optimize Pagination Queries
Offset-based pagination (skip()) becomes slower as the skip count increases.
Bad:
db.posts.find().skip(50000).limit(10)
Better: Cursor-based pagination
db.posts.find({ _id: { $gt: lastId } }).limit(10)
Benefits:
• Avoids scanning skipped documents
• Faster at scale
7. Use Hinting Only for Diagnostics
hint() forces MongoDB to use an index.
Example:
db.users.find({ email: "[email protected]" }).hint({ email: 1 })
Use cases:
• Debugging index selection
• Temporary fixes during query planner misestimation
Do not use hint() permanently unless absolutely necessary—MongoDB may choose better plans over time.
8. Prefer Covered Queries When Possible
Covered queries avoid fetching documents from disk.
Example:
db.logs.find(
{ level: "ERROR" },
{ timestamp: 1, message: 1, _id: 0 }
)
Index requirement:
{ level: 1, timestamp: 1, message: 1 }
Benefits:
• Lower latency
• Reduced I/O
• Index-only read operations
9. Avoid Unnecessary Aggregation Pipelines
Aggregation is powerful but can be slower than simple finds.
Prefer:
db.users.find({ age: { $gt: 30 } })
Over:
db.users.aggregate([{ $match: { age: { $gt: 30 } } }])
Aggregation becomes necessary only when:
• Multiple transformations are needed
• $project, $group, or $lookup is required
Write Performance Tuning
MongoDB write performance depends on how efficiently documents are inserted, updated, indexed, and replicated. Inefficient schemas or excessive indexes can slow down writes, while improper write configurations can increase latency or create bottlenecks. This section explains practical techniques to boost write throughput.
1. Batch Writes for Higher Throughput
Batching reduces network round-trips and improves efficiency.
Example using bulk operations:
const bulk = db.logs.initializeUnorderedBulkOp();
bulk.insert({ level: "INFO", message: "Start" });
bulk.insert({ level: "INFO", message: "Processing" });
bulk.execute();
Benefits:
• Fewer network calls
• Better compression
• More predictable throughput
Applications ingesting large volumes should always batch writes.
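For pure inserts, insertMany provides the same batching with less ceremony; { ordered: false } lets the server continue past individual failures:
db.logs.insertMany(
[
{ level: "INFO", message: "Start" },
{ level: "INFO", message: "Processing" }
],
{ ordered: false }  // unordered: one failed document does not abort the batch
)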
2. Use Bulk Update Operations
Instead of performing multiple update commands one-by-one:
Bad:
ids.forEach(id => db.items.updateOne({ _id: id }, { $set: { active: true } }))
Use:
db.items.bulkWrite(
ids.map(id => ({
updateOne: {
filter: { _id: id },
update: { $set: { active: true } }
}
}))
)
Results:
• Lower latency
• Better batching internally
• Reduced lock contention
3. Optimize Write Concern
Write concern controls replication durability. Higher write concern = more latency.
Typical levels:
• { w: 1 } — Fastest; acknowledged by primary only
• { w: "majority" } — Safer; most durable
• { w: 0 } — Fire-and-forget (risky; not recommended)
For high-throughput ingestion pipelines:
Use { w: 1 } unless strong consistency is required.
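Write concern can be set per operation; a brief sketch in which batch and doc are placeholders:
db.metrics.insertMany(batch, { writeConcern: { w: 1 } })         // fast ingestion path
db.payments.insertOne(doc, { writeConcern: { w: "majority" } })  // durability-critical write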
4. Reduce Index Overhead
Indexes significantly slow down writes because every insert or update must modify all index entries.
Guidelines:
• Keep indexes minimal
• Avoid indexing frequently updated fields
• Use compound indexes instead of multiple single-field indexes
Example: replace two indexes
{ userId: 1 }
{ createdAt: -1 }
with one:
{ userId: 1, createdAt: -1 }
5. Avoid Document Growth (Update Inflation)
When updates increase document size, the storage engine must rewrite the entire document, increasing write amplification and cache pressure.
Patterns that cause growth:
• Appending logs to arrays
• Adding new fields frequently
• Expanding nested subdocuments
Fixes:
• Use fixed-size arrays
• Use separate collections for logs and events
• Preallocate predictable fields
6. Use $setOnInsert and Upserts Wisely
Upserts can cause extra work if misused.
Example upsert:
db.cache.updateOne(
{ key: "x" },
{ $set: { value: 42 }, $setOnInsert: { createdAt: new Date() } },
{ upsert: true }
)
Tips:
• Ensure upsert fields match indexes
• Avoid upserts on high-traffic collections unless needed
• Avoid unindexed upsert queries
7. Improve Update Selectivity
Ensure the filter portion of an update hits indexed fields.
Bad:
db.users.updateMany(
{ lastLogin: { $exists: true } }, // not indexed
{ $set: { active: true } }
)
Better:
db.users.updateMany(
{ status: "ACTIVE" }, // indexed
{ $set: { active: true } }
)
Selective updates reduce document scans and improve performance.
8. Use Time-Series Collections for High-Volume Metrics
For metrics, logs, telemetry, or IoT data, use native time-series collections.
Benefits:
• Automatically optimized internal schema
• Built-in compression
• Fast range queries
• Lower write amplification
Example:
db.createCollection("temperature", {
timeseries: { timeField: "timestamp", metaField: "deviceId" }
})
9. Tune Journaling and Write-Ahead Logging
For self-managed clusters:
• Journaling provides durability but increases write latency
• Place journal files on low-latency storage
• Tune WiredTiger cache size for write-heavy workloads
In MongoDB Atlas, most low-level tuning is handled automatically.
Sharding and Scalability Optimization
Sharding allows MongoDB to scale horizontally by distributing data across multiple nodes. When implemented correctly, sharding maintains high throughput and balanced cluster performance. When designed poorly, it can lead to hotspots, uneven chunk distribution, and slow queries across shards. This section covers essential tuning practices for scalable, distributed MongoDB deployments.
1. Choosing the Right Shard Key
The shard key determines how data is distributed. A good shard key ensures both read and write operations scale evenly.
Characteristics of a good shard key:
• High cardinality — many unique values
• Even distribution — avoids hotspots
• Commonly used in queries — improves targeted operations
• Stable — shard key values should rarely change; updating them is restricted and costly
Examples of good shard keys:
• { userId: 1 } for user-centric data
• { deviceId: 1, timestamp: 1 } for time-series workloads
• { region: 1, customerId: 1 } for multi-region applications
Avoid:
• Monotonically increasing keys (e.g., timestamps alone)
• Low-cardinality fields
• Frequently updated fields
2. Hashed vs. Ranged Sharding
MongoDB supports two primary shard key strategies:
Hashed Shard Key
{ userId: "hashed" }
Use for:
• Large collections with random access patterns
• High write throughput
• Avoiding hotspots
Limitations:
• Not suitable for range queries
• Harder to perform sorted reads
Ranged Shard Key
{ createdAt: 1 }
Use for:
• Time-series workloads
• Range-based queries ($gt, $lt)
• Sorting on range fields
Limitations:
• Risk of hotspots if inserts always target the latest chunk
• Requires careful pre-splitting or zone configuration
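Hypothetical commands for both strategies, assuming a shop database:
sh.enableSharding("shop")
sh.shardCollection("shop.users", { userId: "hashed" })          // hashed strategy
sh.shardCollection("shop.orders", { region: 1, createdAt: 1 })  // ranged strategy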
3. Avoiding Hotspots
Hotspots occur when most operations hit the same shard.
Common causes:
• Ranged key based on timestamp
• Sequential auto-increment fields
• Users grouped by region
• Non-uniform traffic patterns
Mitigations:
• Use a hashed shard key for write-heavy workloads
• Use compound shard keys combining high-cardinality fields
• Pre-split chunks for sequential shard keys (see the sketch after this list)
• Use zone sharding for regional isolation
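Pre-splitting can be done with sh.splitAt; a sketch with a hypothetical namespace, assuming the collection is range-sharded on { createdAt: 1 }:
sh.splitAt("shop.events", { createdAt: ISODate("2025-06-01") })
// Creates an explicit chunk boundary at the given key value.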
4. Balancer Best Practices
The balancer distributes chunks across shards. Poor configuration may cause performance degradation.
Guidelines:
• Keep balancer enabled for normal operations
• Avoid running balancer during peak traffic
• Use windowed balancing for predictable timing
• Monitor balancer activity via sh.status()
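A balancing window can be configured via the config database; a sketch restricting chunk migrations to off-peak hours:
use config
db.settings.updateOne(
{ _id: "balancer" },
{ $set: { activeWindow: { start: "23:00", stop: "06:00" } } },  // HH:MM, server local time
{ upsert: true }
)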
Chunk migrations cause:
• Temporary locking
• Network load
• Increased replication lag
5. Querying in a Sharded Cluster
To achieve high performance, queries should be targeted to one shard whenever possible.
Targeted query example:
db.orders.find({ userId: 123 })
If userId is the shard key, this hits only one shard.
Scatter-gather query example:
db.orders.find({ status: "PAID" })
This hits all shards and scales poorly.
Strategies to avoid scatter-gather:
• Include the shard key in queries whenever possible
• Use compound shard keys matching major query patterns
• Add supporting indexes on each shard for non-shard-key filters
6. Distributed Writes and Reads
Best practices:
• Use write concern { w: 1 } for high-throughput ingestion
• Pin read operations to appropriate nodes using read preferences
• Avoid $lookup across shards—inefficient and slow
• Ensure each shard has the secondary indexes your queries need
7. Time-Series Sharding
For time-series workloads, use:
Shard key example:
{ "meta.deviceId": 1, timestamp: 1 }
Benefits:
• Even distribution of devices across shards
• Efficient range queries
• Avoids writing concentration on the latest chunk
8. Monitoring Sharded Clusters
Monitor:
• Chunk distribution
• Chunk migration frequency
• Hot partitions
• Query scatter ratios
• Replication lag
Key tools:
• Atlas metrics dashboards
• mongosh commands (sh.status(), db.currentOp())
• Query profiler
• FTDC diagnostic data
Monitoring Tools and Diagnostics
Effective MongoDB performance tuning depends on continuous monitoring. MongoDB provides built-in tools to analyze query execution, track slow operations, inspect resource usage, and identify bottlenecks. This section covers the essential monitoring and diagnostic tools you should use regularly.
1. Database Profiler
The profiler captures slow operations and detailed query performance data.
Enable slow operation logging:
db.setProfilingLevel(1, { slowms: 50 }) // logs operations slower than 50ms
Profiler levels:
• 0 — Off
• 1 — Slow operations only
• 2 — All operations (use with caution)
Inspect profiler output:
db.system.profile.find().sort({ ts: -1 }).limit(5)
Use the profiler to:
• Identify slow queries
• Detect missing indexes
• Understand actual query patterns in production
2. explain() for Query Diagnostics
explain() reveals how MongoDB executes a query.
Example:
db.orders.find({ userId: 123 }).explain("executionStats")
Key metrics:
• totalDocsExamined
• totalKeysExamined
• executionTimeMillis
• Winning plan vs rejected plans
Goal:
totalDocsExamined should be close to nReturned for indexed queries (0 for covered queries).
3. db.currentOp() for Real-Time Analysis
Shows currently running operations.
db.currentOp()
Useful for:
• Identifying long-running queries
• Finding blocked or stuck operations
• Inspecting lock usage
You can also terminate an operation:
db.killOp(opid)
4. Slow Query Log
MongoDB logs slow operations automatically.
Enable diagnostic logging (self-managed clusters):
systemLog:
destination: file
path: /var/log/mongodb/mongod.log
logAppend: true
verbosity: 0
Look for entries containing:
• COLLSCAN
• SORT
• High-duration queries
These logs are critical for production tuning.
5. FTDC (Full-Time Diagnostic Data Capture)
FTDC collects performance metrics at regular intervals.
Captured metrics include:
• CPU, memory, disk usage
• WiredTiger cache metrics
• I/O latency
• Replication metrics
Tools that use FTDC:
• Compass
• Atlas UI
• MongoDB Diagnostic Archive
FTDC provides long-term historical insight for capacity planning.
6. Atlas Monitoring Dashboards
If using MongoDB Atlas, built-in monitoring includes:
• Real-time and historical charts
• Slow query analyzer
• Automatic performance recommendations
• Index usage metrics
• Query profiler integrated UI
Key charts to watch:
• Opcounters (reads/writes)
• CPU usage
• Disk IOPS
• Replication lag
• Cache usage (WiredTiger)
7. Monitoring Index Usage
Identify unused indexes:
db.collection.getIndexes()
db.collection.aggregate([
{ $indexStats: {} }
])
Look for:
• accesses.ops near zero
• Indexes larger than the actual collection
• Indexes that slow write performance
Unused indexes should be removed to improve write throughput.
8. Memory and Cache Diagnostics
The WiredTiger engine caches data and indexes for rapid access.
Inspect cache stats:
db.serverStatus().wiredTiger.cache
Key indicators:
• High eviction rate → cache pressure
• Dirty bytes → write backlogs
• Pages read into cache → insufficient working set memory
If cache pressure is high:
• Reduce index count
• Optimize schema
• Scale instance memory
9. Lock Diagnostics
Check lock status:
db.serverStatus().locks
High lock wait times indicate:
• Write-heavy workloads
• Long-running queries
• Large document updates
Solutions:
• Improve indexing
• Reduce document size
• Break large updates into batches
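A mongosh sketch of batching a large update; the migrated flag is hypothetical:
let ids;
do {
// process up to 1,000 unmigrated documents per iteration
ids = db.users.find({ migrated: { $ne: true } }, { _id: 1 })
.limit(1000).toArray().map(d => d._id);
if (ids.length > 0) {
db.users.updateMany({ _id: { $in: ids } }, { $set: { migrated: true } });
}
} while (ids.length > 0);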
10. End-to-End Performance Baseline
Track these metrics over time:
• Query latency
• Throughput (ops/sec)
• CPU/disk usage
• Cache hit ratio
• Index effectiveness
• Slow query frequency
• Replication lag
A performance baseline helps detect regressions early.
Real-World Examples
This section illustrates how indexing and optimization techniques apply to real workloads. Each example demonstrates a common performance issue, how to diagnose it, and the exact steps taken to fix it.
1. Slow Query Diagnosis and Fix
Scenario
A query on the orders collection takes 800ms:
db.orders.find({ customerId: 123 }).sort({ createdAt: -1 }).limit(20)
Diagnosis
explain("executionStats") output shows:
• totalDocsExamined: 240,000
• totalKeysExamined: 0 (a collection scan reads documents, not index keys)
• Stage: COLLSCAN → no index
• An in-memory sort occurs
Cause
Missing compound index matching filter + sort.
Fix
db.orders.createIndex({ customerId: 1, createdAt: -1 })
Outcome
• totalDocsExamined: ~20
• totalKeysExamined: ~20
• Execution time: <5ms
This is the classic Equality → Sort index pattern.
2. Index Redesign Improves Dashboard Queries
Scenario
Analytics dashboard queries are slow:
db.logs.find({
level: "ERROR",
createdAt: { $gte: ISODate("2025-01-01") }
})
.sort({ createdAt: -1 })
.limit(50)
Current indexes:
{ level: 1 }
{ createdAt: 1 }
Diagnosis
Whichever single-field index the planner picks, it loses: { createdAt: 1 } satisfies the sort but fetches and filters thousands of documents on level, while { level: 1 } matches the filter but forces an in-memory sort.
Fix
Use a single compound index:
db.logs.createIndex({ level: 1, createdAt: -1 })
Remove older redundant indexes:
db.logs.dropIndex({ level: 1 })
db.logs.dropIndex({ createdAt: 1 })
Outcome
Query is now fully indexed and sortable with near-zero docs examined.
3. Schema Refactor Fixes Array Bloat
Scenario
A users collection stores login history:
{
"_id": 1,
"email": "[email protected]",
"loginHistory": ["2025-01-02", "2025-01-03", ...]
}
Over time, loginHistory becomes unbounded (thousands of entries), causing:
• Large document size
• Slow updates
• Large multikey indexes
Fix
Move login events to a separate logins collection:
{
userId: 1,
timestamp: ISODate(...)
}
Indexes:
db.logins.createIndex({ userId: 1, timestamp: -1 })
Outcome:
• User documents become small and fast to update
• Login queries remain efficient
• No multikey index bloat
4. Rewriting a Query to Enable Index Use
Scenario
Product search endpoint uses:
db.products.find({ $expr: { $gt: ["$price", 100] } })
price has an index, but $expr disables it.
Fix
Rewrite query:
db.products.find({ price: { $gt: 100 } })
Result:
Index is used, and scan drops from 100k docs to a few hundred.
5. Fixing a Scatter-Gather Query in a Sharded Cluster
Scenario
On a sharded cluster, this query is slow:
db.payments.find({ status: "SUCCESS" })
The collection is sharded on { userId: 1 }, so the query is broadcast to all shards.
Fix:
Modify API to include shard key:
db.payments.find({ userId: 42, status: "SUCCESS" })
To support combined filtering, create an index:
db.payments.createIndex({ userId: 1, status: 1 })
Outcome:
Query becomes targeted (single shard) and latency drops dramatically.
Conclusion and Next Steps
Effective MongoDB performance tuning requires a combination of well-designed indexes, efficient query patterns, and a schema that supports your application’s real-world workloads. By understanding how MongoDB’s query planner works, building the right indexes, optimizing read and write operations, and monitoring cluster behavior continuously, you can ensure predictable, high-performance data access at any scale.
Key takeaways:
• Use the Equality → Sort → Range rule for compound index design.
• Avoid over-indexing and remove unused indexes to improve write throughput.
• Keep schemas lean—avoid unbounded arrays, large documents, and inconsistent field types.
• Diagnose slow queries using explain, the profiler, and slow query logs.
• Use batching, bulk writes, and appropriate write concerns to increase write performance.
• Choose shard keys that provide high cardinality, even distribution, and targeted queries.
• Monitor cache, IOPS, replication lag, and index statistics to maintain long-term performance.
Next steps:
• Benchmark your queries using explain("executionStats").
• Review existing indexes and drop those not used.
• Evaluate schema consistency and adjust where documents are too large or too dynamic.
• Enable profiling on non-production environments to identify slow queries.
• Use Atlas Performance Advisor or similar tools to discover missing indexes.
• Document your key query patterns and align index design with real application access patterns.
That's just the basics. If you want to go deeper into MongoDB, you can take one of the following courses:
- MongoDB - The Complete Developer's Guide 2025
- The Complete MongoDB Course
- Node.js, Express, MongoDB & More: The Complete Bootcamp
- MongoDB - The Ultimate Administration and Developer's Guide
- MongoDB Masterclass: Excel in NoSQL & Pass Certification!
- MongoDB Associate Developer Exam - Practice Tests
- MongoDB Associate Database Administrator DBA Exam - Tests
- MongoDB - Learn NoSQL Databases - Complete Bootcamp
Thanks!
