Database queries running slowly? Before scaling up your hardware or switching databases, understanding indexing strategies can often improve query performance by 100x or more. This comprehensive guide explores practical indexing techniques, common pitfalls, and real-world optimization strategies that every developer should master.

Indexing is one of the most powerful yet frequently misunderstood tools in database performance optimization. A well-designed indexing strategy can transform a query that takes minutes into one that completes in milliseconds. However, poorly implemented indexes can actually degrade performance and waste storage space.

Understanding Database Indexes: The Fundamentals

At its core, a database index works like a book’s index. It provides a fast lookup mechanism to locate data without scanning every row. When you query a table without an index, the database performs a full table scan, examining every single row. With an appropriate index, the database can jump directly to relevant rows.

How Indexes Work Under the Hood

Most databases use B-tree (balanced tree) structures for indexes. Here’s what happens when you query an indexed column:

  1. Without Index: Database scans all rows sequentially (O(n) complexity)
  2. With Index: Database traverses the B-tree structure (O(log n) complexity)

For a table with 1 million rows:

  • Full table scan: ~1,000,000 operations
  • Indexed lookup: ~20 operations (log₂ 1,000,000 ≈ 20)

This fundamental difference explains why proper indexing can yield 100x-1000x performance improvements.
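
Here is a minimal sketch you can run in PostgreSQL to see the difference yourself (table and index names are illustrative):

-- Populate a table with one million rows
CREATE TABLE scan_demo (id BIGINT, payload TEXT);
INSERT INTO scan_demo
SELECT g, md5(g::text) FROM generate_series(1, 1000000) AS g;

EXPLAIN ANALYZE SELECT * FROM scan_demo WHERE id = 123456;
-- Plan shows "Seq Scan on scan_demo": every row is examined

CREATE INDEX idx_scan_demo_id ON scan_demo(id);
ANALYZE scan_demo;

EXPLAIN ANALYZE SELECT * FROM scan_demo WHERE id = 123456;
-- Plan now shows "Index Scan using idx_scan_demo_id"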

Index Types and When to Use Them

Different index types serve different purposes. Understanding these variations is crucial for effective optimization.

1. B-tree Indexes: The Default Workhorse

What They Are: Balanced tree structures that maintain sorted data and support range queries.

Best For:

  • Equality comparisons (WHERE user_id = 123)
  • Range queries (WHERE created_at > '2025-01-01')
  • Sorting operations (ORDER BY last_name)
  • Prefix pattern matching (WHERE email LIKE 'john%'); a leading wildcard (LIKE '%john') cannot use the index

Implementation Example:

-- Create a B-tree index on user email
CREATE INDEX idx_users_email ON users(email);

-- This query will use the index efficiently
SELECT * FROM users WHERE email = 'user@example.com';

-- Range query also benefits
SELECT * FROM users WHERE created_at BETWEEN '2025-01-01' AND '2025-12-31';
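
Worth verifying with EXPLAIN: the prefix pattern works because a B-tree keeps values in sorted order, but a leading wildcard defeats it. (In PostgreSQL, prefix LIKE additionally requires the C collation or an index created with text_pattern_ops.)

-- Prefix pattern: can use a B-tree index
SELECT * FROM users WHERE email LIKE 'john%';

-- Leading wildcard: cannot use a B-tree index; forces a scan
SELECT * FROM users WHERE email LIKE '%@gmail.com';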

Performance Characteristics:

  • Insert/Update Cost: O(log n)
  • Lookup Cost: O(log n)
  • Storage Overhead: ~10-20% of table size

2. Hash Indexes: Fast Exact Matches

What They Are: Hash-based structures optimized for exact equality comparisons.

Best For:

  • Exact match queries only
  • High-cardinality columns (many unique values)
  • Scenarios where range queries are never needed

Implementation Example:

-- PostgreSQL hash index (WAL-logged and crash-safe since PostgreSQL 10)
CREATE INDEX idx_users_api_token_hash ON users USING HASH (api_token);

-- Perfect for exact lookups
SELECT * FROM users WHERE api_token = 'abc123xyz';

-- NOT suitable for ranges or pattern matching
-- These won't use the hash index:
SELECT * FROM users WHERE api_token LIKE 'abc%';  -- Won't use index
SELECT * FROM users WHERE api_token > 'abc';       -- Won't use index

Performance Characteristics:

  • Lookup Cost: O(1) average case
  • Not suitable for range queries
  • Can be smaller than B-tree indexes, especially for long key values

When to Choose Hash Over B-tree:

  • You only perform equality checks (never ranges)
  • Storage space is constrained
  • The column has very high cardinality

3. Composite (Multi-Column) Indexes

What They Are: Indexes spanning multiple columns, creating a hierarchical lookup structure.

Best For:

  • Queries filtering on multiple columns
  • Queries with specific column order requirements
  • Covering indexes that eliminate table lookups

Implementation Example:

-- Composite index on user queries
CREATE INDEX idx_users_country_city_age 
ON users(country, city, age);

-- These queries can use the index effectively:
-- Full match (all columns)
SELECT * FROM users 
WHERE country = 'USA' AND city = 'New York' AND age = 25;

-- Prefix match (leftmost columns)
SELECT * FROM users 
WHERE country = 'USA' AND city = 'New York';

-- Leftmost column only
SELECT * FROM users 
WHERE country = 'USA';

-- This query CANNOT use the index (missing leftmost column):
SELECT * FROM users WHERE city = 'New York';  -- Won't use index

-- Skipping a middle column limits the index to a prefix:
SELECT * FROM users WHERE country = 'USA' AND age = 25;
-- Index used for country only; age is filtered afterward

The Leftmost Prefix Rule: Composite indexes work left-to-right. A query can use the index only for an unbroken prefix of its columns: omit the leading column and the index is useless; skip a middle column and only the matched prefix is used.

Column Ordering Strategy:

  1. Equality filters first: Columns with = conditions
  2. High selectivity: Columns that filter out the most rows
  3. Range filters last: Columns with >, <, BETWEEN

-- Good ordering: equality columns (most selective first), then the range column
CREATE INDEX idx_orders_customer_status_date 
ON orders(customer_id, status, created_at);

-- Optimal for this common query pattern:
SELECT * FROM orders 
WHERE status = 'pending' 
  AND customer_id = 123 
  AND created_at > '2025-01-01';

4. Partial Indexes: Targeted Optimization

What They Are: Indexes that only cover rows matching a specific condition.

Best For:

  • Queries that consistently filter on the same condition
  • Tables with skewed data distribution
  • Reducing index size and maintenance cost

Implementation Example:

-- Only index active users (90% of queries target active users)
CREATE INDEX idx_users_active_email 
ON users(email) 
WHERE status = 'active';

-- This query uses the smaller, faster partial index
SELECT * FROM users 
WHERE status = 'active' AND email = 'user@example.com';

-- Example with nulls
CREATE INDEX idx_orders_pending_customer 
ON orders(customer_id) 
WHERE status = 'pending' AND shipped_at IS NULL;

Benefits:

  • Smaller index size (less disk I/O)
  • Faster index scans
  • Reduced maintenance overhead
  • Lower storage costs

Use Cases:

  • Active/inactive records (index active only)
  • Soft deletes (index non-deleted records)
  • Status-based queries (index pending orders)
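
For the soft-delete case, for example, a partial index keeps deleted rows out of the index entirely (assuming a deleted_at column):

-- Only index rows that have not been soft-deleted
CREATE INDEX idx_users_not_deleted 
ON users(email) 
WHERE deleted_at IS NULL;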

5. Covering Indexes: Eliminating Table Lookups

What They Are: Indexes that include all columns needed by a query, eliminating the need to access the table.

Best For:

  • Queries accessing the same small set of columns repeatedly
  • Read-heavy workloads
  • Reducing I/O operations

Implementation Example:

-- Standard index
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- Covering index includes additional columns (INCLUDE requires PostgreSQL 11+)
CREATE INDEX idx_orders_customer_covering 
ON orders(customer_id) 
INCLUDE (order_date, total_amount, status);

-- This query can be satisfied entirely from the index
SELECT order_date, total_amount, status 
FROM orders 
WHERE customer_id = 123;
-- No table lookup needed!

Performance Impact:

  • Can reduce query time by 50-80%
  • Particularly effective for queries returning many rows
  • Trade-off: larger index size

Real-World Indexing Patterns

Let’s explore common application scenarios and optimal indexing strategies.

Pattern 1: User Lookup and Authentication

Scenario: Application frequently looks up users by email for authentication.

-- Table structure
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    password_hash VARCHAR(255),
    status VARCHAR(20),
    last_login_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Optimal indexing strategy
CREATE UNIQUE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_status_last_login 
ON users(status, last_login_at) 
WHERE status = 'active';

Rationale:

  • UNIQUE on email prevents duplicates and provides fast lookups
  • The partial composite index covers only active users and supports “recent active users” dashboard queries ordered by last_login_at
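
As a sketch, the kind of dashboard query this strategy serves:

-- Recently active users, served by the partial composite index
SELECT id, email, last_login_at 
FROM users 
WHERE status = 'active' 
ORDER BY last_login_at DESC 
LIMIT 50;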

Pattern 2: Time-Series Data Queries

Scenario: Logs or events table with frequent range queries.

-- Table structure
CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT,
    event_type VARCHAR(50),
    created_at TIMESTAMP DEFAULT NOW(),
    data JSONB
);

-- Optimal indexing strategy
CREATE INDEX idx_events_user_created 
ON events(user_id, created_at DESC);

CREATE INDEX idx_events_type_created 
ON events(event_type, created_at DESC) 
WHERE created_at > '2025-01-01';
-- Partial index predicates must be immutable, so NOW() cannot appear
-- here; use a constant cutoff and recreate the index periodically

Rationale:

  • Composite index supports user-specific queries with date ranges
  • DESC ordering optimizes “recent events first” queries
  • Partial index reduces size by focusing on recent data (recreate it periodically to advance the constant cutoff)
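
A representative query the first index serves (values are illustrative):

-- A user's recent activity, newest first
SELECT event_type, created_at, data 
FROM events 
WHERE user_id = 42 
ORDER BY created_at DESC 
LIMIT 100;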

Pattern 3: E-commerce Product Catalog

Scenario: Product catalog with filtering by category, price, and availability.

-- Table structure
CREATE TABLE products (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(255),
    category_id INTEGER,
    price DECIMAL(10,2),
    stock_quantity INTEGER,
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Optimal indexing strategy
-- For category browsing
CREATE INDEX idx_products_category_price 
ON products(category_id, price) 
WHERE is_active = true AND stock_quantity > 0;

-- For full-text search (PostgreSQL)
CREATE INDEX idx_products_name_gin 
ON products USING GIN (to_tsvector('english', name));

-- For admin queries
CREATE INDEX idx_products_active_stock 
ON products(is_active, stock_quantity);

Rationale:

  • Partial index excludes inactive/out-of-stock products from customer queries
  • GIN index enables fast full-text search
  • Separate index for administrative queries
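
A sketch of how the GIN index is actually queried in PostgreSQL (search terms are illustrative):

-- Full-text search served by idx_products_name_gin
SELECT id, name, price 
FROM products 
WHERE to_tsvector('english', name) @@ to_tsquery('english', 'wireless & speaker');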

Pattern 4: Social Media Timeline

Scenario: Fetching posts for user’s feed with followers/following relationships.

-- Relationships table
CREATE TABLE follows (
    follower_id BIGINT,
    following_id BIGINT,
    created_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (follower_id, following_id)
);

-- Posts table
CREATE TABLE posts (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT,
    content TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Optimal indexing strategy
CREATE INDEX idx_follows_follower ON follows(follower_id, created_at DESC);
CREATE INDEX idx_follows_following ON follows(following_id, created_at DESC);
CREATE INDEX idx_posts_user_created ON posts(user_id, created_at DESC);

Rationale:

  • Both directions of follow relationship are indexed
  • Posts ordered by date support chronological feeds
  • Enables efficient “posts from people I follow” queries
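
A sketch of the feed query these indexes combine to serve (the follower ID is illustrative):

-- "Posts from people I follow", newest first
SELECT p.id, p.user_id, p.content, p.created_at 
FROM posts p 
JOIN follows f ON f.following_id = p.user_id 
WHERE f.follower_id = 123 
ORDER BY p.created_at DESC 
LIMIT 20;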

Common Indexing Mistakes and How to Avoid Them

Mistake 1: Over-Indexing

Problem: Creating indexes on every column or every query pattern.

Impact:

  • Slows down INSERT/UPDATE/DELETE operations
  • Wastes storage space
  • Index maintenance overhead
  • Query optimizer confusion (too many index choices)

Solution:

-- Bad: Index every column
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_first_name ON users(first_name);
CREATE INDEX idx_users_last_name ON users(last_name);
CREATE INDEX idx_users_created_at ON users(created_at);
CREATE INDEX idx_users_status ON users(status);

-- Good: Index based on actual query patterns
CREATE INDEX idx_users_email ON users(email);  -- Frequent exact lookups
CREATE INDEX idx_users_status_created 
ON users(status, created_at) 
WHERE status = 'active';  -- Common query pattern

Guidelines:

  • Analyze actual query patterns before creating indexes
  • Monitor index usage and remove unused indexes
  • Focus on columns in WHERE, JOIN, and ORDER BY clauses
  • Limit indexes to 5-7 per table in most cases

Mistake 2: Wrong Column Order in Composite Indexes

Problem: Creating composite indexes without considering query patterns and column selectivity.

Impact:

  • Index cannot be used for many queries
  • Wasted storage and maintenance cost

Example:

-- Bad: Low selectivity column first
CREATE INDEX idx_orders_status_customer 
ON orders(status, customer_id);
-- Only 3-4 distinct statuses, but thousands of customers

-- Good: High selectivity column first
CREATE INDEX idx_orders_customer_status 
ON orders(customer_id, status);
-- Filters to specific customer first, then status

Column Ordering Best Practices:

  1. Equality conditions before range conditions
  2. High selectivity before low selectivity
  3. Most frequently queried columns first

Mistake 3: Ignoring Index Maintenance

Problem: Creating indexes but never monitoring their performance or usage.

Impact:

  • Bloated, fragmented indexes
  • Unused indexes wasting resources
  • Outdated statistics causing poor query plans

Solution:

-- PostgreSQL: Find indexes that are never scanned
SELECT 
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan AS index_scans,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- Remove unused indexes
DROP INDEX IF EXISTS idx_rarely_used;

-- Rebuild fragmented indexes (REINDEX CONCURRENTLY requires PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_heavily_used;

-- Update statistics
ANALYZE users;

Maintenance Schedule:

  • Review index usage quarterly
  • Rebuild fragmented indexes monthly (for high-write tables)
  • Update statistics after bulk operations
  • Monitor index bloat

Mistake 4: Indexing Low-Cardinality Columns

Problem: Creating indexes on columns with few distinct values (boolean, status with 2-3 values).

Impact:

  • Index provides minimal benefit
  • Maintenance overhead outweighs benefits

Example:

-- Bad: Index on boolean column
CREATE INDEX idx_users_is_active ON users(is_active);
-- Only 2 possible values (true/false)

-- Good: Use partial index if needed
CREATE INDEX idx_users_inactive_email 
ON users(email) 
WHERE is_active = false;
-- Only indexes inactive users if that's what you query

Low-Cardinality Threshold:

  • Avoid indexing columns whose distinct values number less than roughly 5-10% of the rows
  • Exception: Use partial indexes for specific value queries
  • Consider composite indexes where low-cardinality column filters significantly

Mistake 5: Not Using EXPLAIN to Verify Index Usage

Problem: Assuming an index will be used without verification.

Impact:

  • Queries don’t use expected indexes
  • Inefficient query plans
  • Wasted indexing effort

Solution:

-- Check if index is being used
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'user@example.com';

-- Look for:
-- PostgreSQL: "Index Scan using idx_users_email"
-- MySQL: the "key" column naming your index ("Using index" means a covering scan)

-- If you see "Seq Scan" or "Full Table Scan", investigate why:
-- 1. Index doesn't match query pattern
-- 2. Statistics are outdated
-- 3. Query optimizer chose full scan (table too small)
-- 4. Query uses functions on indexed column

Verification Checklist:

  • Always EXPLAIN queries after creating indexes
  • Check actual execution time, not just estimated cost
  • Monitor slow query logs
  • Review query plans for production queries
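
To illustrate the fourth cause above: wrapping an indexed column in a function hides it from a plain B-tree index. The fix is an expression index, covered in the PostgreSQL features section below.

-- The index on users(email) cannot serve this query
EXPLAIN SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- Shows a Seq Scan; an index on LOWER(email) would fix it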

Index Performance Tuning Strategies

Strategy 1: Analyze Query Patterns First

Before creating any index, understand your actual query patterns:

-- PostgreSQL: Enable query logging
ALTER DATABASE mydb SET log_min_duration_statement = 100;
-- Logs queries taking > 100ms

-- Analyze slow queries (requires the pg_stat_statements extension;
-- in PostgreSQL 13+ these columns are total_exec_time, mean_exec_time, etc.)
SELECT 
    query,
    calls,
    total_time,
    mean_time,
    min_time,
    max_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;

Questions to Answer:

  • Which queries are slowest?
  • Which queries are most frequent?
  • What columns appear in WHERE/JOIN/ORDER BY?
  • Are there query patterns (e.g., always filtering by status = 'active')?

Strategy 2: Measure Index Impact

After creating an index, measure its actual impact:

-- Before indexing
EXPLAIN ANALYZE
SELECT * FROM orders 
WHERE customer_id = 123 AND status = 'pending';
-- Note: Execution time: 250 ms

-- Create index
CREATE INDEX idx_orders_customer_status 
ON orders(customer_id, status);

-- After indexing
EXPLAIN ANALYZE
SELECT * FROM orders 
WHERE customer_id = 123 AND status = 'pending';
-- Note: Execution time: 5 ms (50x improvement!)

Metrics to Track:

  • Query execution time (before/after)
  • Number of rows scanned
  • Index size and overhead
  • Write operation impact (INSERT/UPDATE timing)

Strategy 3: Use Index-Only Scans

Optimize indexes to avoid table lookups entirely:

-- Query that needs three columns
SELECT order_id, customer_id, total_amount 
FROM orders 
WHERE customer_id = 123;

-- Create covering index
CREATE INDEX idx_orders_customer_covering 
ON orders(customer_id) 
INCLUDE (order_id, total_amount);

-- Verify index-only scan
EXPLAIN SELECT order_id, customer_id, total_amount 
FROM orders 
WHERE customer_id = 123;
-- Should show "Index Only Scan"
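-- Note: in PostgreSQL, index-only scans also depend on the table's
-- visibility map; if the plan still fetches heap pages, run VACUUM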

Benefits:

  • No table access needed
  • Reduced I/O operations
  • Faster query execution
  • Better cache efficiency

Strategy 4: Optimize for Write Performance

Balance read performance with write performance:

-- High-write table
CREATE TABLE metrics (
    id BIGSERIAL PRIMARY KEY,
    metric_name VARCHAR(100),
    value DECIMAL,
    recorded_at TIMESTAMP DEFAULT NOW()
);

-- Strategy: Minimize indexes on high-write tables
-- Only index what's absolutely necessary
CREATE INDEX idx_metrics_name_time 
ON metrics(metric_name, recorded_at) 
WHERE recorded_at > '2025-01-01';
-- Partial index reduces write overhead; the predicate must be a
-- constant (NOW() is not allowed here), so recreate it periodically

Write-Optimized Guidelines:

  • Fewer indexes on high-write tables
  • Use partial indexes to reduce index size
  • Consider async index creation during off-peak hours
  • Batch inserts to amortize index maintenance cost
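
On the last point, a minimal sketch of batching with the metrics table above (values are illustrative): a single multi-row INSERT amortizes per-statement and per-commit overhead, including index maintenance, across the batch.

-- One statement, many rows
INSERT INTO metrics (metric_name, value, recorded_at) VALUES
    ('cpu_usage', 42.5, NOW()),
    ('cpu_usage', 43.1, NOW()),
    ('mem_usage', 71.0, NOW());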

Database-Specific Indexing Features

PostgreSQL Advanced Features

-- Expression indexes
CREATE INDEX idx_users_lower_email 
ON users(LOWER(email));
-- Now queries with LOWER(email) can use the index

-- BRIN indexes for large, sequentially ordered tables
CREATE INDEX idx_logs_created_brin 
ON logs USING BRIN (created_at);
-- Much smaller than B-tree, suitable for time-series data

-- GIN indexes for array and full-text search
CREATE INDEX idx_posts_tags_gin 
ON posts USING GIN (tags);
-- Fast array containment queries

CREATE INDEX idx_posts_content_fts 
ON posts USING GIN (to_tsvector('english', content));
-- Full-text search optimization

MySQL Specific Considerations

-- Prefix indexes for long strings
CREATE INDEX idx_users_email_prefix 
ON users(email(20));
-- Only indexes first 20 characters

-- Full-text indexes
CREATE FULLTEXT INDEX idx_posts_content 
ON posts(content);

-- Use InnoDB (default) which clusters data by primary key
-- Optimize primary key choice for range queries
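
A sketch of that last point (schema is illustrative): because InnoDB clusters rows by primary key, a composite key matching your hottest range scan keeps those rows physically adjacent.

-- Rows for one device are stored contiguously, so per-device
-- time-range scans read sequential pages
CREATE TABLE readings (
    device_id BIGINT NOT NULL,
    recorded_at TIMESTAMP NOT NULL,
    value DECIMAL(10,2),
    PRIMARY KEY (device_id, recorded_at)
);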

Monitoring and Maintaining Indexes

Regular Index Health Checks

-- PostgreSQL: Review index size and usage
SELECT 
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS number_of_scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

-- Identify candidates for removal
SELECT 
    schemaname || '.' || relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS scans
FROM pg_stat_user_indexes
WHERE idx_scan < 100  -- Adjust threshold as needed
    AND indexrelname NOT LIKE '%_pkey'  -- Exclude primary keys
ORDER BY pg_relation_size(indexrelid) DESC;

Index Rebuild Strategy

-- PostgreSQL: Rebuild without blocking writes
REINDEX INDEX CONCURRENTLY idx_orders_customer;

-- Or rebuild all indexes on a table
REINDEX TABLE CONCURRENTLY orders;

-- Update statistics after major data changes
ANALYZE orders;
VACUUM ANALYZE orders;  -- Also reclaims space

When to Rebuild:

  • After bulk imports or updates
  • When query performance degrades over time
  • When index bloat exceeds roughly 30%
  • Following database version upgrades

Practical Decision Framework

Use this framework to decide on indexing strategy:

Step 1: Identify Candidates

  • Analyze slow query logs
  • Check columns in WHERE, JOIN, ORDER BY clauses
  • Review query frequency and business impact

Step 2: Evaluate Trade-offs

  • Read frequency vs. write frequency: High-read → more indexes acceptable
  • Query complexity: Complex queries benefit more from indexes
  • Data size: Larger tables need indexes more urgently
  • Storage constraints: Consider index size vs. available space

Step 3: Choose Index Type

  • Equality only: Hash index
  • Range queries: B-tree index
  • Multiple columns: Composite index (order carefully)
  • Subset of rows: Partial index
  • Specific columns only: Covering index

Step 4: Implement and Verify

  • Create the index with CONCURRENTLY (avoids blocking writes)
  • EXPLAIN queries to verify usage
  • Measure actual performance improvement
  • Monitor write operation impact

Step 5: Maintain and Review

  • Quarterly: Review index usage and remove unused indexes
  • Monthly: Check for bloat and rebuild if needed
  • After bulk ops: Update statistics
  • Continuously: Monitor slow query logs

Conclusion

Effective database indexing is both an art and a science. While indexes can dramatically improve query performance, over-indexing or poor index design can hurt overall system performance. The key is understanding your query patterns, measuring impact, and maintaining your indexes over time.

Key Takeaways:

  • Always analyze query patterns before creating indexes
  • Choose appropriate index types for your use case
  • Order composite index columns strategically
  • Use partial and covering indexes to optimize specific queries
  • Monitor index usage and remove unused indexes
  • Balance read performance with write overhead
  • Verify index usage with EXPLAIN
  • Maintain indexes regularly

Remember, indexing is not a one-time task. It requires ongoing monitoring, adjustment, and optimization as your application evolves and query patterns change. Start with the queries that matter most, measure the impact, and iterate from there.

The difference between a slow application and a fast one often comes down to proper indexing strategy. Master these principles, and you’ll have a powerful tool for optimizing database performance across any application scale.