Database Indexing Strategies: A Performance Optimization Guide
Master database indexing with practical strategies for query optimization. Learn when to use B-tree, hash, and composite indexes, understand index trade-offs, and implement effective indexing patterns for high-performance databases.
Database queries running slowly? Before scaling up your hardware or switching databases, understanding indexing strategies can often improve query performance by 100x or more. This comprehensive guide explores practical indexing techniques, common pitfalls, and real-world optimization strategies that every developer should master.
Indexing is one of the most powerful yet frequently misunderstood tools in database performance optimization. A well-designed indexing strategy can transform a query that takes minutes into one that completes in milliseconds. However, poorly implemented indexes can actually degrade performance and waste storage space.
Understanding Database Indexes: The Fundamentals
At its core, a database index works like a book’s index. It provides a fast lookup mechanism to locate data without scanning every row. When you query a table without an index, the database performs a full table scan, examining every single row. With an appropriate index, the database can jump directly to relevant rows.
How Indexes Work Under the Hood
Most databases use B-tree (balanced tree) structures for indexes. Here’s what happens when you query an indexed column:
- Without Index: Database scans all rows sequentially (O(n) complexity)
- With Index: Database traverses the B-tree structure (O(log n) complexity)
For a table with 1 million rows:
- Full table scan: ~1,000,000 operations
- Indexed lookup: ~20 operations (log₂ 1,000,000 ≈ 20)
This fundamental difference explains why proper indexing can yield 100x-1000x performance improvements.
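The scan-versus-traversal difference is easy to observe with SQLite's EXPLAIN QUERY PLAN, using Python's built-in sqlite3 module. This is an illustrative sketch (the table and index names are made up, and the exact plan wording varies slightly between SQLite versions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
con.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1_000)],
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes that step
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM users WHERE email = 'user500@example.com'"
plan_before = plan(query)  # full table scan, e.g. "SCAN users"
con.execute("CREATE INDEX idx_users_email ON users(email)")
plan_after = plan(query)   # e.g. "SEARCH users USING INDEX idx_users_email (email=?)"
print(plan_before)
print(plan_after)
```

The same query goes from touching every row to a handful of tree-node visits, which is exactly the O(n) versus O(log n) difference described above.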
Index Types and When to Use Them
Different index types serve different purposes. Understanding these variations is crucial for effective optimization.
1. B-tree Indexes: The Default Workhorse
What They Are: Balanced tree structures that maintain sorted data and support range queries.
Best For:
- Equality comparisons (WHERE user_id = 123)
- Range queries (WHERE created_at > '2025-01-01')
- Sorting operations (ORDER BY last_name)
- Prefix pattern matching (WHERE email LIKE 'john%'); a pattern with a leading wildcard, such as LIKE '%john', cannot use a B-tree index
Implementation Example:
-- Create a B-tree index on user email
CREATE INDEX idx_users_email ON users(email);
-- This query will use the index efficiently
SELECT * FROM users WHERE email = 'user@example.com';
-- Range query also benefits
SELECT * FROM users WHERE created_at BETWEEN '2025-01-01' AND '2025-12-31';
Performance Characteristics:
- Insert/Update Cost: O(log n)
- Lookup Cost: O(log n)
- Storage Overhead: ~10-20% of table size
2. Hash Indexes: Fast Exact Matches
What They Are: Hash-based structures optimized for exact equality comparisons.
Best For:
- Exact match queries only
- High-cardinality columns (many unique values)
- Scenarios where range queries are never needed
Implementation Example:
-- PostgreSQL hash index
CREATE INDEX idx_users_api_token_hash ON users USING HASH (api_token);
-- Perfect for exact lookups
SELECT * FROM users WHERE api_token = 'abc123xyz';
-- NOT suitable for ranges or pattern matching
-- These won't use the hash index:
SELECT * FROM users WHERE api_token LIKE 'abc%'; -- Won't use index
SELECT * FROM users WHERE api_token > 'abc'; -- Won't use index
Performance Characteristics:
- Lookup Cost: O(1) average case
- Not suitable for range queries
- Smaller than B-tree indexes
When to Choose Hash Over B-tree:
- You only perform equality checks (never ranges)
- Storage space is constrained
- The column has very high cardinality
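The underlying trade-off can be illustrated outside any database: a hash table answers exact lookups in O(1) on average but keeps no ordering, while sorted data supports prefix and range queries via binary search. A small Python analogy (not a database internal, just the data-structure intuition):

```python
import bisect

# Hash-style structure: a dict gives O(1) average-case exact lookups
tokens = {"abc123xyz": 1, "def456uvw": 2, "xyz789abc": 3}
exact = tokens["abc123xyz"]  # fast exact match

# B-tree-style structure: sorted data supports range/prefix queries
sorted_tokens = sorted(tokens)
lo = bisect.bisect_left(sorted_tokens, "abc")
hi = bisect.bisect_right(sorted_tokens, "abd")
prefix_matches = sorted_tokens[lo:hi]  # every token in ['abc', 'abd')
print(exact)           # 1
print(prefix_matches)  # ['abc123xyz']
```

The dict has no equivalent of the slice operation without scanning every key, which is why a hash index cannot serve LIKE 'abc%' or > comparisons.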
3. Composite (Multi-Column) Indexes
What They Are: Indexes spanning multiple columns, creating a hierarchical lookup structure.
Best For:
- Queries filtering on multiple columns
- Queries with specific column order requirements
- Covering indexes that eliminate table lookups
Implementation Example:
-- Composite index on user queries
CREATE INDEX idx_users_country_city_age
ON users(country, city, age);
-- These queries can use the index effectively:
-- Full match (all columns)
SELECT * FROM users
WHERE country = 'USA' AND city = 'New York' AND age = 25;
-- Prefix match (leftmost columns)
SELECT * FROM users
WHERE country = 'USA' AND city = 'New York';
-- Leftmost column only
SELECT * FROM users
WHERE country = 'USA';
-- These queries cannot use the index fully:
-- Missing leftmost column
SELECT * FROM users WHERE city = 'New York'; -- Won't use index
-- Skipping a middle column
SELECT * FROM users WHERE country = 'USA' AND age = 25; -- Uses only the country portion
The Leftmost Prefix Rule: Composite indexes work left-to-right. You can use them for queries that match the leftmost columns, but not for queries that skip columns.
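The leftmost prefix rule can be verified directly with SQLite (a sketch via Python's sqlite3; table contents and names are illustrative, and plan wording varies by SQLite version):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (country TEXT, city TEXT, age INTEGER, name TEXT)")
con.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [("USA", "New York", 20 + i % 50, f"user{i}") for i in range(100)],
)
con.execute("CREATE INDEX idx_users_country_city_age ON users(country, city, age)")

def plan(sql):
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# Leftmost column present: the composite index applies
uses = plan("SELECT * FROM users WHERE country = 'USA'")
# Leftmost column skipped: the planner falls back to a table scan
skips = plan("SELECT * FROM users WHERE city = 'New York'")
print(uses)
print(skips)
```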
Column Ordering Strategy:
- Equality filters first: columns compared with =
- High selectivity next: columns that filter out the most rows
- Range filters last: columns compared with >, <, or BETWEEN
-- Good ordering: equality → high selectivity → range
CREATE INDEX idx_orders_status_customer_date
ON orders(status, customer_id, created_at);
-- Optimal for this common query pattern:
SELECT * FROM orders
WHERE status = 'pending'
AND customer_id = 123
AND created_at > '2025-01-01';
4. Partial Indexes: Targeted Optimization
What They Are: Indexes that only cover rows matching a specific condition.
Best For:
- Queries that consistently filter on the same condition
- Tables with skewed data distribution
- Reducing index size and maintenance cost
Implementation Example:
-- Only index active users (90% of queries target active users)
CREATE INDEX idx_users_active_email
ON users(email)
WHERE status = 'active';
-- This query uses the smaller, faster partial index
SELECT * FROM users
WHERE status = 'active' AND email = 'user@example.com';
-- Example with nulls
CREATE INDEX idx_orders_pending_customer
ON orders(customer_id)
WHERE status = 'pending' AND shipped_at IS NULL;
Benefits:
- Smaller index size (less disk I/O)
- Faster index scans
- Reduced maintenance overhead
- Lower storage costs
Use Cases:
- Active/inactive records (index active only)
- Soft deletes (index non-deleted records)
- Status-based queries (index pending orders)
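SQLite also supports partial indexes, so the predicate-matching behavior can be demonstrated with a short sketch (illustrative names; the key point is that the query's WHERE clause must imply the index's WHERE clause):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (email TEXT, status TEXT, name TEXT)")
con.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(f"user{i}@example.com", "active" if i % 10 else "banned", f"user{i}")
     for i in range(100)],
)
# Partial index: only rows with status = 'active' are indexed
con.execute(
    "CREATE INDEX idx_users_active_email ON users(email) WHERE status = 'active'"
)

def plan(sql):
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# The query's filter implies the index predicate, so the partial index is used
matched = plan(
    "SELECT * FROM users WHERE status = 'active' AND email = 'user1@example.com'"
)
# Without the status filter, the partial index might miss rows, so it is skipped
unmatched = plan("SELECT * FROM users WHERE email = 'user1@example.com'")
print(matched)
print(unmatched)
```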
5. Covering Indexes: Eliminating Table Lookups
What They Are: Indexes that include all columns needed by a query, eliminating the need to access the table.
Best For:
- Queries accessing the same small set of columns repeatedly
- Read-heavy workloads
- Reducing I/O operations
Implementation Example:
-- Standard index
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- Covering index carries extra payload columns (INCLUDE is PostgreSQL 11+ syntax)
CREATE INDEX idx_orders_customer_covering
ON orders(customer_id)
INCLUDE (order_date, total_amount, status);
-- This query can be satisfied entirely from the index
SELECT order_date, total_amount, status
FROM orders
WHERE customer_id = 123;
-- No table lookup needed!
Performance Impact:
- Can reduce query time by 50-80%
- Particularly effective for queries returning many rows
- Trade-off: larger index size
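An index-only access path can be demonstrated in SQLite as well. SQLite has no INCLUDE clause, so in this sketch the payload columns go into the index key itself, which achieves the same covering effect (names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    """CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
       order_date TEXT, total_amount REAL, status TEXT, notes TEXT)"""
)
# All columns the query needs live in the index, so no table lookup is required
con.execute(
    """CREATE INDEX idx_orders_customer_covering
       ON orders(customer_id, order_date, total_amount, status)"""
)

def plan(sql):
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

covered = plan(
    """SELECT order_date, total_amount, status
       FROM orders WHERE customer_id = 123"""
)
print(covered)  # mentions a COVERING INDEX: the base table is never touched
```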
Real-World Indexing Patterns
Let’s explore common application scenarios and optimal indexing strategies.
Pattern 1: User Lookup and Authentication
Scenario: Application frequently looks up users by email for authentication.
-- Table structure
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL,
password_hash VARCHAR(255),
status VARCHAR(20),
last_login_at TIMESTAMP,
created_at TIMESTAMP DEFAULT NOW()
);
-- Optimal indexing strategy
CREATE UNIQUE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_status_last_login
ON users(status, last_login_at)
WHERE status = 'active';
Rationale:
- UNIQUE on email prevents duplicates and provides fast lookups
- Partial index on active users optimizes dashboard queries
- Composite index supports “recent active users” queries
Pattern 2: Time-Series Data Queries
Scenario: Logs or events table with frequent range queries.
-- Table structure
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT,
event_type VARCHAR(50),
created_at TIMESTAMP DEFAULT NOW(),
data JSONB
);
-- Optimal indexing strategy
CREATE INDEX idx_events_user_created
ON events(user_id, created_at DESC);
CREATE INDEX idx_events_type_created
ON events(event_type, created_at DESC)
WHERE created_at > '2025-01-01';
-- Note: PostgreSQL requires partial index predicates to be immutable,
-- so NOW() cannot appear here; use a fixed cutoff date and recreate
-- the index periodically to keep it focused on recent data
Rationale:
- Composite index supports user-specific queries with date ranges
- DESC ordering optimizes “recent events first” queries
- Partial index reduces size by focusing on recent data
Pattern 3: E-commerce Product Search
Scenario: Product catalog with filtering by category, price, and availability.
-- Table structure
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(255),
category_id INTEGER,
price DECIMAL(10,2),
stock_quantity INTEGER,
is_active BOOLEAN DEFAULT true,
created_at TIMESTAMP DEFAULT NOW()
);
-- Optimal indexing strategy
-- For category browsing
CREATE INDEX idx_products_category_price
ON products(category_id, price)
WHERE is_active = true AND stock_quantity > 0;
-- For full-text search (PostgreSQL)
CREATE INDEX idx_products_name_gin
ON products USING GIN (to_tsvector('english', name));
-- For admin queries
CREATE INDEX idx_products_active_stock
ON products(is_active, stock_quantity);
Rationale:
- Partial index excludes inactive/out-of-stock products from customer queries
- GIN index enables fast full-text search
- Separate index for administrative queries
Pattern 4: Social Media Timeline
Scenario: Fetching posts for user’s feed with followers/following relationships.
-- Relationships table
CREATE TABLE follows (
follower_id BIGINT,
following_id BIGINT,
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (follower_id, following_id)
);
-- Posts table
CREATE TABLE posts (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT,
content TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
-- Optimal indexing strategy
CREATE INDEX idx_follows_follower ON follows(follower_id, created_at DESC);
CREATE INDEX idx_follows_following ON follows(following_id, created_at DESC);
CREATE INDEX idx_posts_user_created ON posts(user_id, created_at DESC);
Rationale:
- Both directions of follow relationship are indexed
- Posts ordered by date support chronological feeds
- Enables efficient “posts from people I follow” queries
Common Indexing Mistakes and How to Avoid Them
Mistake 1: Over-Indexing
Problem: Creating indexes on every column or every query pattern.
Impact:
- Slows down INSERT/UPDATE/DELETE operations
- Wastes storage space
- Index maintenance overhead
- Query optimizer confusion (too many index choices)
Solution:
-- Bad: Index every column
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_first_name ON users(first_name);
CREATE INDEX idx_users_last_name ON users(last_name);
CREATE INDEX idx_users_created_at ON users(created_at);
CREATE INDEX idx_users_status ON users(status);
-- Good: Index based on actual query patterns
CREATE INDEX idx_users_email ON users(email); -- Frequent exact lookups
CREATE INDEX idx_users_status_created
ON users(status, created_at)
WHERE status = 'active'; -- Common query pattern
Guidelines:
- Analyze actual query patterns before creating indexes
- Monitor index usage and remove unused indexes
- Focus on columns in WHERE, JOIN, and ORDER BY clauses
- Limit indexes to 5-7 per table in most cases
Mistake 2: Wrong Column Order in Composite Indexes
Problem: Creating composite indexes without considering query patterns and column selectivity.
Impact:
- Index cannot be used for many queries
- Wasted storage and maintenance cost
Example:
-- Bad: Low selectivity column first
CREATE INDEX idx_orders_status_customer
ON orders(status, customer_id);
-- Only 3-4 distinct statuses, but thousands of customers
-- Good: High selectivity column first
CREATE INDEX idx_orders_customer_status
ON orders(customer_id, status);
-- Filters to specific customer first, then status
Column Ordering Best Practices:
- Equality conditions before range conditions
- High selectivity before low selectivity
- Most frequently queried columns first
Mistake 3: Ignoring Index Maintenance
Problem: Creating indexes but never monitoring their performance or usage.
Impact:
- Bloated, fragmented indexes
- Unused indexes wasting resources
- Outdated statistics causing poor query plans
Solution:
-- PostgreSQL: Find indexes that have never been scanned
-- (pg_stat_user_indexes exposes relname/indexrelname, not tablename/indexname)
SELECT
schemaname,
relname AS table_name,
indexrelname AS index_name,
idx_scan AS index_scans,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Remove unused indexes
DROP INDEX IF EXISTS idx_rarely_used;
-- Rebuild fragmented indexes (CONCURRENTLY requires PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_heavily_used;
-- Update statistics
ANALYZE users;
Maintenance Schedule:
- Review index usage quarterly
- Rebuild fragmented indexes monthly (for high-write tables)
- Update statistics after bulk operations
- Monitor index bloat
Mistake 4: Indexing Low-Cardinality Columns
Problem: Creating indexes on columns with few distinct values (boolean, status with 2-3 values).
Impact:
- Index provides minimal benefit
- Maintenance overhead outweighs benefits
Example:
-- Bad: Index on boolean column
CREATE INDEX idx_users_is_active ON users(is_active);
-- Only 2 possible values (true/false)
-- Good: Use partial index if needed
CREATE INDEX idx_users_inactive_email
ON users(email)
WHERE is_active = false;
-- Only indexes inactive users if that's what you query
Low-Cardinality Threshold:
- Avoid indexing columns with < 5-10% distinct values
- Exception: Use partial indexes for specific value queries
- Consider composite indexes where low-cardinality column filters significantly
Mistake 5: Not Using EXPLAIN to Verify Index Usage
Problem: Assuming an index will be used without verification.
Impact:
- Queries don’t use expected indexes
- Inefficient query plans
- Wasted indexing effort
Solution:
-- Check if index is being used
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'user@example.com';
-- Look for:
-- PostgreSQL: "Index Scan using idx_users_email"
-- MySQL: "Using index"
-- If you see "Seq Scan" or "Full Table Scan", investigate why:
-- 1. Index doesn't match query pattern
-- 2. Statistics are outdated
-- 3. Query optimizer chose full scan (table too small)
-- 4. Query uses functions on indexed column
Verification Checklist:
- Always EXPLAIN queries after creating indexes
- Check actual execution time, not just estimated cost
- Monitor slow query logs
- Review query plans for production queries
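Point 4 above (functions on indexed columns) is a particularly common trap, and it is easy to reproduce with SQLite's EXPLAIN QUERY PLAN (a sketch with illustrative names):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (email TEXT, name TEXT)")
con.execute("CREATE INDEX idx_users_email ON users(email)")

def plan(sql):
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# Direct comparison on the indexed column: the index is used
direct = plan("SELECT * FROM users WHERE email = 'user@example.com'")
# Wrapping the column in a function hides it from the index
wrapped = plan("SELECT * FROM users WHERE lower(email) = 'user@example.com'")
print(direct)
print(wrapped)
```

The fix is either to store the normalized value in a separate column or to create an expression index on lower(email), as shown in the PostgreSQL section below.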
Index Performance Tuning Strategies
Strategy 1: Analyze Query Patterns First
Before creating any index, understand your actual query patterns:
-- PostgreSQL: Enable query logging
ALTER DATABASE mydb SET log_min_duration_statement = 100;
-- Logs queries taking > 100ms
-- Analyze slow queries (requires the pg_stat_statements extension)
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
min_exec_time,
max_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- On PostgreSQL 12 and earlier, these columns are named
-- total_time, mean_time, min_time, and max_time
Questions to Answer:
- Which queries are slowest?
- Which queries are most frequent?
- What columns appear in WHERE/JOIN/ORDER BY?
- Are there recurring query patterns (e.g., always filtering by status = 'active')?
Strategy 2: Measure Index Impact
After creating an index, measure its actual impact:
-- Before indexing
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 123 AND status = 'pending';
-- Note: Execution time: 250 ms
-- Create index
CREATE INDEX idx_orders_customer_status
ON orders(customer_id, status);
-- After indexing
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 123 AND status = 'pending';
-- Note: Execution time: 5 ms (50x improvement!)
Metrics to Track:
- Query execution time (before/after)
- Number of rows scanned
- Index size and overhead
- Write operation impact (INSERT/UPDATE timing)
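The before/after measurement loop can be prototyped end to end with SQLite (a sketch via Python's sqlite3; the table, data distribution, and index name are illustrative, and absolute timings will differ from any production system):

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)
con.executemany(
    "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
    [(i % 500, "pending" if i % 3 == 0 else "shipped") for i in range(50_000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending'"

def timed():
    start = time.perf_counter()
    rows = con.execute(QUERY).fetchall()
    return rows, time.perf_counter() - start

rows_before, t_before = timed()
con.execute("CREATE INDEX idx_orders_customer_status ON orders(customer_id, status)")
rows_after, t_after = timed()

# The index changes the access path, never the result set
assert sorted(rows_before) == sorted(rows_after)
print(f"before: {t_before * 1000:.2f} ms, after: {t_after * 1000:.2f} ms")
```

The assertion is the important habit: an index must change only the access path, so verifying identical results alongside the timing catches measurement mistakes.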
Strategy 3: Use Index-Only Scans
Optimize indexes to avoid table lookups entirely:
-- Query that needs three columns
SELECT order_id, customer_id, total_amount
FROM orders
WHERE customer_id = 123;
-- Create covering index
CREATE INDEX idx_orders_customer_covering
ON orders(customer_id)
INCLUDE (order_id, total_amount);
-- Verify index-only scan
EXPLAIN SELECT order_id, customer_id, total_amount
FROM orders
WHERE customer_id = 123;
-- Should show "Index Only Scan"
Benefits:
- No table access needed
- Reduced I/O operations
- Faster query execution
- Better cache efficiency
Strategy 4: Optimize for Write Performance
Balance read performance with write performance:
-- High-write table
CREATE TABLE metrics (
id BIGSERIAL PRIMARY KEY,
metric_name VARCHAR(100),
value DECIMAL,
recorded_at TIMESTAMP DEFAULT NOW()
);
-- Strategy: Minimize indexes on high-write tables
-- Only index what's absolutely necessary
CREATE INDEX idx_metrics_name_time
ON metrics(metric_name, recorded_at)
WHERE recorded_at > '2025-01-01';
-- Partial index reduces write overhead
-- (PostgreSQL requires an immutable predicate, so use a fixed cutoff
-- date rather than NOW() and recreate the index periodically)
Write-Optimized Guidelines:
- Fewer indexes on high-write tables
- Use partial indexes to reduce index size
- Consider async index creation during off-peak hours
- Batch inserts to amortize index maintenance cost
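The write-side cost of index maintenance can be measured directly. The sketch below (SQLite via Python's sqlite3, with made-up table and index names) bulk-loads the same rows into an unindexed and a triple-indexed table; on most machines the indexed load is visibly slower:

```python
import sqlite3
import time

def load(with_indexes):
    con = sqlite3.connect(":memory:")
    con.execute(
        """CREATE TABLE metrics (id INTEGER PRIMARY KEY,
           metric_name TEXT, value REAL, recorded_at TEXT)"""
    )
    if with_indexes:
        # Every extra index must be updated on each INSERT
        con.execute("CREATE INDEX idx_metrics_name ON metrics(metric_name)")
        con.execute("CREATE INDEX idx_metrics_value ON metrics(value)")
        con.execute("CREATE INDEX idx_metrics_time ON metrics(recorded_at)")
    rows = [
        (f"metric_{i % 50}", float(i), f"2025-01-{i % 28 + 1:02d}")
        for i in range(20_000)
    ]
    start = time.perf_counter()
    with con:  # single transaction amortizes commit cost across the batch
        con.executemany(
            "INSERT INTO metrics (metric_name, value, recorded_at) VALUES (?, ?, ?)",
            rows,
        )
    elapsed = time.perf_counter() - start
    count = con.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
    return count, elapsed

n_plain, t_plain = load(False)
n_indexed, t_indexed = load(True)
print(f"no indexes: {t_plain * 1000:.1f} ms, three indexes: {t_indexed * 1000:.1f} ms")
```

Running the same comparison against a staging copy of a real table gives a concrete number for the write penalty each proposed index adds.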
Database-Specific Indexing Features
PostgreSQL Advanced Features
-- Expression indexes
CREATE INDEX idx_users_lower_email
ON users(LOWER(email));
-- Now queries with LOWER(email) can use the index
-- BRIN indexes for large, sequentially ordered tables
CREATE INDEX idx_logs_created_brin
ON logs USING BRIN (created_at);
-- Much smaller than B-tree, suitable for time-series data
-- GIN indexes for array and full-text search
CREATE INDEX idx_posts_tags_gin
ON posts USING GIN (tags);
-- Fast array containment queries
CREATE INDEX idx_posts_content_fts
ON posts USING GIN (to_tsvector('english', content));
-- Full-text search optimization
MySQL Specific Considerations
-- Prefix indexes for long strings
CREATE INDEX idx_users_email_prefix
ON users(email(20));
-- Only indexes first 20 characters
-- Full-text indexes
CREATE FULLTEXT INDEX idx_posts_content
ON posts(content);
-- Use InnoDB (default) which clusters data by primary key
-- Optimize primary key choice for range queries
Monitoring and Maintaining Indexes
Regular Index Health Checks
-- PostgreSQL: Check index size and usage
SELECT
schemaname,
relname AS table_name,
indexrelname AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
idx_scan AS number_of_scans,
idx_tup_read AS tuples_read,
idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
-- Identify candidates for removal
SELECT
schemaname || '.' || relname AS table_name,
indexrelname AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
idx_scan AS scans
FROM pg_stat_user_indexes
WHERE idx_scan < 100 -- Adjust threshold as needed
AND indexrelname NOT LIKE '%_pkey' -- Exclude primary keys
ORDER BY pg_relation_size(indexrelid) DESC;
Index Rebuild Strategy
-- PostgreSQL 12+: Rebuild without blocking writes
REINDEX INDEX CONCURRENTLY idx_orders_customer;
-- Or rebuild all indexes on a table
REINDEX TABLE CONCURRENTLY orders;
-- Update statistics after major data changes
ANALYZE orders;
VACUUM ANALYZE orders; -- Also reclaims space
When to Rebuild:
- After bulk imports or updates
- When query performance degrades over time
- Index bloat exceeds 30%
- Following database version upgrades
Practical Decision Framework
Use this framework to decide on indexing strategy:
Step 1: Identify Candidates
- Analyze slow query logs
- Check columns in WHERE, JOIN, ORDER BY clauses
- Review query frequency and business impact
Step 2: Evaluate Trade-offs
- Read frequency vs. write frequency: High-read → more indexes acceptable
- Query complexity: Complex queries benefit more from indexes
- Data size: Larger tables need indexes more urgently
- Storage constraints: Consider index size vs. available space
Step 3: Choose Index Type
- Equality only: Hash index
- Range queries: B-tree index
- Multiple columns: Composite index (order carefully)
- Subset of rows: Partial index
- Specific columns only: Covering index
Step 4: Implement and Verify
- Create index using CONCURRENTLY (no table locks)
- EXPLAIN queries to verify usage
- Measure actual performance improvement
- Monitor write operation impact
Step 5: Maintain and Review
- Quarterly: Review index usage and remove unused indexes
- Monthly: Check for bloat and rebuild if needed
- After bulk ops: Update statistics
- Continuously: Monitor slow query logs
Conclusion
Effective database indexing is both an art and a science. While indexes can dramatically improve query performance, over-indexing or poor index design can hurt overall system performance. The key is understanding your query patterns, measuring impact, and maintaining your indexes over time.
Key Takeaways:
- Always analyze query patterns before creating indexes
- Choose appropriate index types for your use case
- Order composite index columns strategically
- Use partial and covering indexes to optimize specific queries
- Monitor index usage and remove unused indexes
- Balance read performance with write overhead
- Verify index usage with EXPLAIN
- Maintain indexes regularly
Remember, indexing is not a one-time task. It requires ongoing monitoring, adjustment, and optimization as your application evolves and query patterns change. Start with the queries that matter most, measure the impact, and iterate from there.
The difference between a slow application and a fast one often comes down to proper indexing strategy. Master these principles, and you’ll have a powerful tool for optimizing database performance across any application scale.