Last updated: November 14, 2025
Reading time: 8 minutes
Level: Intermediate


Introduction

PostgreSQL is one of the most powerful open-source relational databases available today. It’s trusted by companies like Instagram, Netflix, and Spotify to handle billions of transactions. But here’s the thing: PostgreSQL’s power comes with responsibility. Out of the box, it’s configured conservatively to run on almost any hardware—which means it’s probably not optimized for your specific workload.

Over the past 15 years working with databases, I’ve seen the same optimization mistakes repeated across dozens of companies. These mistakes can slow your application by 10x or more, cause unexpected downtime, and waste thousands in cloud costs. The good news? They’re all preventable.

In this article, you’ll learn the five most common PostgreSQL mistakes and exactly how to fix them. Whether you’re running a startup with a 10GB database or an enterprise with terabytes of data, these optimizations will make a measurable difference.


Mistake #1: Not Using Appropriate Indexes

This is by far the most common mistake. I’ve walked into companies where critical queries were doing full table scans on million-row tables—simply because no one had created the right index.

Understanding Index Types

PostgreSQL offers several index types, each optimized for different scenarios:

  • B-tree (default): Best for equality and range queries. Use for columns in WHERE clauses, JOIN conditions, and ORDER BY.
  • Hash: Optimized for simple equality comparisons. Faster than B-tree for =, but can’t handle <, >, or ranges.
  • GiST: For geometric data, range types, and nearest-neighbor searches (also usable for full-text search).
  • GIN: Perfect for JSONB, arrays, and full-text search.
  • BRIN: Block Range Index—extremely compact for large tables with naturally sorted data (like timestamps).

When to Use Each Index Type

Example scenario: You have a table with 10 million orders.

-- B-tree: For customer lookups
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- B-tree composite: For date range queries by customer
CREATE INDEX idx_orders_customer_date ON orders(customer_id, created_at);

-- BRIN: For time-series data (orders are naturally sorted by created_at)
CREATE INDEX idx_orders_created_at_brin ON orders USING BRIN(created_at);

-- GIN: For JSONB metadata column
CREATE INDEX idx_orders_metadata_gin ON orders USING GIN(metadata);

-- Hash: For exact UUID lookups (if you only use =)
CREATE INDEX idx_orders_uuid_hash ON orders USING HASH(order_uuid);

Pro Tip: Use EXPLAIN ANALYZE before and after creating an index to measure the actual improvement. A “Seq Scan” on a large table in the query plan usually means you’re missing an index (on small tables, a sequential scan can be the fastest plan).


Mistake #2: Not Running VACUUM Regularly

PostgreSQL uses Multi-Version Concurrency Control (MVCC), which means when you UPDATE or DELETE a row, the old version isn’t immediately removed. Over time, these “dead tuples” accumulate, causing table bloat.
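As a quick illustration (the users table and its columns here are hypothetical), a single UPDATE leaves the old row version behind as a dead tuple, which you can observe in the statistics views:

-- Hypothetical example: the UPDATE writes a new row version;
-- the old one becomes a dead tuple until VACUUM reclaims it
UPDATE users SET last_login = now() WHERE user_id = 42;

-- Dead tuples appear here once the statistics collector catches up
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'users';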

What is Table Bloat?

Imagine your users table should be 100MB, but it’s actually 500MB because it’s full of dead rows. This means:

  • Slower sequential scans (more data to read)
  • Wasted disk space
  • More expensive backups
  • Higher cloud storage costs

How to Configure autovacuum

PostgreSQL has autovacuum enabled by default, but the default settings are often too conservative for production workloads.

-- Check current autovacuum settings
SHOW autovacuum_vacuum_scale_factor;
SHOW autovacuum_vacuum_threshold;

-- Recommended production settings (in postgresql.conf)
autovacuum = on
autovacuum_max_workers = 4  # Increase for high-write workloads
autovacuum_naptime = 15s    # Check more frequently (default: 1min)
autovacuum_vacuum_scale_factor = 0.05  # Vacuum when 5% of table is dead (default: 20%)
autovacuum_vacuum_threshold = 50

How to Check for Bloat

-- Check dead tuples per table
SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup,
    round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC
LIMIT 20;

Pro Tip: If you see tables with >20% dead tuples, run a manual VACUUM ANALYZE immediately, then tune autovacuum settings.
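If the global settings are too blunt, you can also tune autovacuum per table. A sketch (the users table is a stand-in for whatever is bloated in your database):

-- One-off cleanup of a bloated table
VACUUM ANALYZE users;

-- Per-table autovacuum tuning for a hot, frequently updated table
ALTER TABLE users SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_threshold = 1000
);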


Mistake #3: Using Default shared_buffers Configuration

The default shared_buffers setting is 128MB. This made sense in 2005, but today’s servers have 16GB, 64GB, or more RAM. Using the default is like buying a Ferrari and only putting one gallon of gas in it.

Why 128MB Is Not Enough

shared_buffers is PostgreSQL’s internal cache for data pages. When properly sized, frequently accessed data stays in memory instead of being read from disk repeatedly. This can improve query performance by 10-100x.

How to Calculate the Appropriate Value

General rule of thumb:

  • Dedicated database server: 25% of total RAM
  • Shared server: 15-20% of total RAM
  • Maximum: Don’t exceed 40% of RAM

-- Example: Server with 32GB RAM
shared_buffers = 8GB          # 25% of 32GB

-- Other critical memory settings
effective_cache_size = 24GB   # 75% of RAM (helps query planner)
work_mem = 64MB               # Per-operation memory (sort, hash)
maintenance_work_mem = 2GB    # For VACUUM, CREATE INDEX

Warning: Changing shared_buffers requires a PostgreSQL restart. Always test changes in a staging environment first.
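One way to apply these values without hand-editing postgresql.conf is ALTER SYSTEM, which persists them to postgresql.auto.conf (values below are from the 32GB example above):

-- Persist the memory settings via SQL
ALTER SYSTEM SET shared_buffers = '8GB';
ALTER SYSTEM SET effective_cache_size = '24GB';
ALTER SYSTEM SET work_mem = '64MB';
ALTER SYSTEM SET maintenance_work_mem = '2GB';

-- shared_buffers takes effect only after a restart;
-- the other three can be picked up with a reload:
SELECT pg_reload_conf();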


Mistake #4: Not Using Connection Pooling

I once debugged a production issue where a web application was creating 500+ database connections, bringing the entire database to its knees. Each PostgreSQL connection consumes 5-10MB of memory. With 500 connections, that’s 2.5-5GB just for connection overhead!

The “Too Many Connections” Problem

PostgreSQL uses a process-per-connection model. This is different from MySQL’s thread-based model and means PostgreSQL doesn’t handle thousands of concurrent connections efficiently.

Symptoms of connection overload:

  • Error: “FATAL: sorry, too many clients already”
  • High CPU usage even with simple queries
  • Slow connection establishment
  • Memory exhaustion
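Before reaching for a pooler, confirm how many connections you actually have and what they are doing:

-- How close are you to the limit?
SHOW max_connections;

-- Connections grouped by state; a large "idle" count is the classic
-- sign of an app opening connections without pooling them
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;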

Solution: PgBouncer Implementation

PgBouncer is a lightweight connection pooler that sits between your application and PostgreSQL. It maintains a small pool of actual database connections and multiplexes application connections through them.

# /etc/pgbouncer/pgbouncer.ini

[databases]
myapp = host=localhost port=5432 dbname=myapp_production

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

# Connection pool settings
pool_mode = transaction          # Most efficient mode
max_client_conn = 1000           # Max connections from apps
default_pool_size = 25           # Actual Postgres connections per database
reserve_pool_size = 10           # Emergency pool
reserve_pool_timeout = 3

Result: 1000 application connections use only 25-35 actual database connections. Memory usage drops from 5GB to 250MB.

Pro Tip: Use pool_mode = transaction for web applications (most efficient). Use session mode only if your app uses session-level features like temporary tables.
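Switching the application over is usually just a port change, since PgBouncer speaks the PostgreSQL wire protocol. A hypothetical connection-string example (the variable name and credentials are placeholders):

# Before: app connects straight to PostgreSQL
DATABASE_URL=postgres://app_user:secret@localhost:5432/myapp_production

# After: same database, now routed through PgBouncer
DATABASE_URL=postgres://app_user:secret@localhost:6432/myapp_production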


Mistake #5: Running Queries Without EXPLAIN ANALYZE

This is the debugging mistake. When a query is slow, developers often try random fixes: “Let me add an index here… maybe change this JOIN… perhaps rewrite the subquery…” This is optimization by guesswork.

How to Read Query Plans

EXPLAIN ANALYZE shows you exactly what PostgreSQL is doing to execute your query, with actual timing and row counts.

EXPLAIN ANALYZE
SELECT o.order_id, o.total, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.created_at >= '2025-01-01'
  AND o.status = 'completed';

Key Things to Look For

  • Seq Scan: Full table scan (BAD for large tables)—usually means missing index
  • Index Scan: Using an index (GOOD)
  • Bitmap Heap Scan: Using multiple indexes (GOOD for complex conditions)
  • Nested Loop: Acceptable for small datasets; problematic with large tables
  • Hash Join / Merge Join: Efficient for larger datasets
  • Estimated “rows” far from “actual rows”: Statistics are outdated—run ANALYZE
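That last point is worth showing: when estimates drift from reality, refreshing statistics is a one-liner, and for skewed columns you can raise the sampling detail:

-- Refresh planner statistics for the orders table
ANALYZE orders;

-- Optional: sample a skewed column in more detail, then re-analyze
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
ANALYZE orders;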

Real-World Example

Before optimization:

Seq Scan on orders  (cost=0.00..180000.00 rows=50 width=40)
                    (actual time=12.453..2847.234 rows=52 loops=1)
  Filter: ((created_at >= '2025-01-01') AND (status = 'completed'))
  Rows Removed by Filter: 4999948
Planning Time: 0.324 ms
Execution Time: 2847.489 ms

After adding index:

CREATE INDEX idx_orders_date_status ON orders(created_at, status);

Index Scan using idx_orders_date_status on orders
  (cost=0.43..178.45 rows=50 width=40)
  (actual time=0.034..0.156 rows=52 loops=1)
  Index Cond: ((created_at >= '2025-01-01') AND (status = 'completed'))

Planning Time: 0.298 ms
Execution Time: 0.178 ms

Result: Query went from 2.8 seconds to 0.18 milliseconds—16,000x faster!

Pro Tip: Use explain.depesz.com to visualize complex query plans. It makes them much easier to understand.


Conclusion

PostgreSQL is incredibly powerful, but it requires proper configuration and maintenance. These five mistakes are responsible for 80% of the performance problems I’ve encountered in production systems.

Key takeaways:

  1. Create the right indexes—B-tree for most cases, BRIN for time-series, GIN for JSONB
  2. Monitor and tune autovacuum—Don’t let table bloat slow you down
  3. Increase shared_buffers—25% of RAM for dedicated database servers
  4. Use connection pooling—PgBouncer can reduce connections by 95%
  5. Always use EXPLAIN ANALYZE—Never optimize blind

Next steps:

  1. Run the bloat check query on your database today
  2. Review your top 10 slowest queries with EXPLAIN ANALYZE
  3. Install and configure PgBouncer if you’re not using connection pooling

Need Help Optimizing Your PostgreSQL Database?

At INNOVABASE, we’ve been optimizing PostgreSQL databases for Fortune 500 companies and fast-growing startups since 2010. We’ve helped clients:

  • Reduce query times from minutes to milliseconds
  • Cut cloud database costs by 60-80%
  • Scale from thousands to millions of users
  • Migrate from Oracle to PostgreSQL (saving $100K+ annually in licensing)

Whether you need a one-time performance audit or ongoing database consulting, we can help.


Tags: #PostgreSQL #DatabaseOptimization #Performance #SQL #DevOps

