Estimating PostgreSQL Resource Needs

Estimating resources for a PostgreSQL database from expected website traffic involves a few key considerations: traffic patterns, query complexity, data size, and the overall workload the database must handle.

An example calculation assuming 100,000 visitors/day appears at the end of the article.

Here’s a general approach to estimating the necessary resources:

1. Understand Traffic Patterns

  • Number of Daily Active Users (DAU): Estimate the number of unique users who will interact with your site daily.
  • Peak Traffic: Identify peak times when user activity is at its highest (e.g., during certain hours of the day).
  • Requests Per Second (RPS): Estimate the average and peak number of requests your database will need to handle per second.
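
As a quick sanity check, the minimal Python sketch below turns these traffic figures into average and peak query rates. The inputs (100,000 DAU, 5 page views per user, 10 queries per page view, a 3x peak factor) are illustrative assumptions taken from the example at the end of the article, not measurements.

```python
# Rough traffic-to-RPS estimate; all figures are illustrative assumptions.
DAU = 100_000            # daily active users
PAGE_VIEWS_PER_USER = 5  # average page views per user per day
QUERIES_PER_PAGE = 10    # database queries per page view
PEAK_FACTOR = 3          # assume peak traffic is ~3x the daily average

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

queries_per_day = DAU * PAGE_VIEWS_PER_USER * QUERIES_PER_PAGE
avg_rps = queries_per_day / SECONDS_PER_DAY
peak_rps = avg_rps * PEAK_FACTOR

print(f"Queries/day: {queries_per_day:,}")  # 5,000,000
print(f"Average RPS: {avg_rps:.0f}")        # ~58
print(f"Peak RPS:    {peak_rps:.0f}")       # ~174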

2. Estimate Query Load

  • Simple vs. Complex Queries: Categorize the types of queries (e.g., simple read queries vs. complex joins and transactions).
  • Read vs. Write Ratio: Determine the ratio of read to write operations (e.g., 80% reads, 20% writes).
  • Cache Hit Ratio: Estimate how many of these requests can be served from a cache (like Redis) rather than hitting the database.
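
The sketch below splits an estimated query rate into reads and writes and applies an assumed cache hit ratio to see how much load actually reaches PostgreSQL. The 80/20 split and 70% hit ratio are placeholders to be replaced with your own measurements.

```python
# Split estimated query load into reads/writes and account for caching.
# All ratios below are assumptions for illustration.
total_rps = 58            # from the traffic estimate above
read_ratio = 0.80         # 80% reads
write_ratio = 0.20        # 20% writes
cache_hit_ratio = 0.70    # fraction of reads served by a cache (e.g., Redis)

read_rps = total_rps * read_ratio
write_rps = total_rps * write_ratio
db_read_rps = read_rps * (1 - cache_hit_ratio)  # reads that actually hit PostgreSQL

print(f"Reads:  {read_rps:.0f}/s ({db_read_rps:.0f}/s reach the database)")
print(f"Writes: {write_rps:.0f}/s")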

3. Estimate Data Size

  • Total Data Size: Estimate the total size of your database, including tables, indexes, and logs.
  • Data Growth: Consider how much data will grow over time and the retention policy for older data.
  • Working Set: Determine the “hot” data that will be accessed frequently.
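
A rough growth projection can be done the same way. The row volume, row size, index overhead, and retention below are assumptions; in practice, measure real sizes with pg_total_relation_size() on your actual tables.

```python
# Project raw table growth from assumed insert volume and row size.
# Figures are placeholders; measure real sizes in your own database.
rows_per_day = 200_000   # assumed new rows per day across hot tables
avg_row_bytes = 512      # assumed average row size, including per-row overhead
index_overhead = 0.5     # assume indexes add ~50% on top of table data
retention_days = 365

raw_bytes = rows_per_day * avg_row_bytes * retention_days
total_bytes = raw_bytes * (1 + index_overhead)

print(f"Estimated data after {retention_days} days: {total_bytes / 1024**3:.1f} GiB")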

4. Resource Estimation

  • CPU:
    • Complex queries, large joins, and OLAP workloads drive high CPU demand.
    • For simpler queries or OLTP workloads, fewer CPUs may be required.
    • Start with at least 2-4 vCPUs and scale based on load testing.
  • Memory:
    • More memory allows more of your database to be kept in memory, reducing disk I/O.
    • Consider 1GB of RAM per 10GB of database size as a baseline. This varies depending on the workload.
    • Aim for at least 8-16GB of RAM to start, increasing as necessary based on your database size and access patterns.
  • Disk I/O:
    • Disk performance is critical, especially for write-heavy workloads.
    • Use SSDs for better I/O performance.
    • Estimate disk I/O needs based on write volume, transaction log size, and frequency of full-table scans.
  • Storage:
    • Provision storage based on data size, with at least 20-30% extra space for growth.
    • Ensure proper backup and replication strategies are in place.
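
The sketch below applies these baselines to an assumed 100GB database: roughly 1GB of RAM per 10GB of data, about 30% storage headroom, and the common (but not universal) starting points of ~25% of RAM for shared_buffers and ~75% for effective_cache_size. Treat all of these as starting values to validate with load testing.

```python
# Apply the rules of thumb above to an assumed database size.
# All figures are starting points, not guarantees.
db_size_gb = 100

ram_gb = max(db_size_gb / 10, 8)         # ~1GB RAM per 10GB of data, minimum 8GB
storage_gb = db_size_gb * 1.3            # ~30% headroom for growth
shared_buffers_gb = ram_gb * 0.25        # common starting point: ~25% of RAM
effective_cache_size_gb = ram_gb * 0.75  # common starting point: ~75% of RAM

print(f"RAM:                  {ram_gb:.0f} GB")
print(f"Storage:              {storage_gb:.0f} GB")
print(f"shared_buffers:       {shared_buffers_gb:.0f} GB")
print(f"effective_cache_size: {effective_cache_size_gb:.0f} GB")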

5. Scaling Strategies

  • Vertical Scaling: Start with a small instance and scale up CPU, memory, and I/O resources as needed.
  • Horizontal Scaling: If you anticipate very high traffic, consider setting up replication (e.g., read replicas) to distribute the read load.
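
At the application level, read/write routing can be as simple as choosing a connection string. The sketch below uses psycopg2 with hypothetical hostnames, credentials, and a hypothetical page_views table; connection pooling and replication-lag handling are deliberately left out.

```python
# Minimal read/write routing sketch using psycopg2; hostnames and table are hypothetical.
import psycopg2

PRIMARY_DSN = "host=db-primary.example.com dbname=app user=app password=secret"
REPLICA_DSN = "host=db-replica.example.com dbname=app user=app password=secret"

def get_connection(readonly: bool):
    """Send read-only work to the replica and writes to the primary."""
    return psycopg2.connect(REPLICA_DSN if readonly else PRIMARY_DSN)

# Example: a read-only query goes to the replica.
with get_connection(readonly=True) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM page_views")
        print(cur.fetchone())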

6. Load Testing

  • Simulate Traffic: Use load testing tools like JMeter or Locust to simulate traffic and measure the database’s response.
  • Monitor Performance: Monitor CPU, memory usage, disk I/O, and query performance under different loads.
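
A minimal Locust script might look like the following. The endpoints and the 4:1 task weighting (approximating an 80/20 read/write mix) are assumptions to be replaced with your real traffic profile.

```python
# locustfile.py -- minimal load test sketch; endpoints are hypothetical.
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 5)  # simulated think time between requests

    @task(4)
    def view_page(self):
        # Weighted 4:1 against writes to approximate an 80/20 read/write mix.
        self.client.get("/articles/1")

    @task(1)
    def post_comment(self):
        self.client.post("/comments", json={"article_id": 1, "body": "hello"})
```

Run it with `locust -f locustfile.py --host https://your-site.example`, ramp up the number of simulated users, and watch the database metrics described in the next section.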

7. Continuous Monitoring and Optimization

  • Continuously monitor the database performance, adjust resource allocations, and optimize queries as traffic patterns evolve.
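
For PostgreSQL itself, the built-in statistics views are a good starting point. The sketch below reads the buffer cache hit ratio from pg_stat_database and the most expensive queries from pg_stat_statements (which requires the extension to be enabled; the total_exec_time column name applies to PostgreSQL 13 and later). The connection string is a placeholder.

```python
# Pull basic health metrics from PostgreSQL's statistics views.
# pg_stat_database is always available; pg_stat_statements needs the extension enabled.
import psycopg2

with psycopg2.connect("host=localhost dbname=app user=app") as conn:
    with conn.cursor() as cur:
        # Buffer cache hit ratio: how often reads are served from memory.
        cur.execute("""
            SELECT round(sum(blks_hit) * 100.0
                         / nullif(sum(blks_hit) + sum(blks_read), 0), 2)
            FROM pg_stat_database
        """)
        print("cache hit %:", cur.fetchone()[0])

        # Top 5 queries by total execution time (requires pg_stat_statements).
        cur.execute("""
            SELECT query, calls, round(total_exec_time::numeric, 1) AS total_ms
            FROM pg_stat_statements
            ORDER BY total_exec_time DESC
            LIMIT 5
        """)
        for row in cur.fetchall():
            print(row)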

Example Calculation

Let’s consider a website with the following metrics:

  • 100,000 daily active users
  • 5 page views per user per day
  • 10 queries per page view
  • 80% reads, 20% writes

Request Estimation:

  • Total Requests per Day: 100,000 × 5 × 10 = 5,000,000 queries/day
  • Seconds per Day: 24 hours × 60 minutes × 60 seconds = 86,400 seconds/day
  • Average RPS: 5,000,000 / 86,400 ≈ 58 queries/second

Resource Estimation:

  • CPU: Start with 4-8 vCPUs, given the expected RPS and query complexity.
  • Memory: Start with 16-32GB RAM, considering both working set size and data size.
  • Disk I/O: Ensure SSDs are used; monitor for bottlenecks and scale accordingly.
  • Storage: Provision based on data size (e.g., if database size is 100GB, start with 120-150GB of storage).
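
To tie the example together, the sketch below recomputes the request rate and derives starting resource figures in one place. The 100GB database size and the 3x peak factor are assumptions layered on top of the averages above.

```python
# End-to-end version of the example; the 100GB size and 3x peak factor are assumptions.
dau, views_per_user, queries_per_view = 100_000, 5, 10
read_ratio, db_size_gb, peak_factor = 0.80, 100, 3

queries_per_day = dau * views_per_user * queries_per_view  # 5,000,000
avg_rps = queries_per_day / 86_400                         # ~58
peak_rps = avg_rps * peak_factor                           # ~174

ram_gb = max(db_size_gb / 10, 16)                          # baseline, floored at 16GB
storage_gb = db_size_gb * 1.3                              # ~30% growth headroom

print(f"Average load: {avg_rps:.0f} q/s ({avg_rps * read_ratio:.0f} reads, "
      f"{avg_rps * (1 - read_ratio):.0f} writes)")
print(f"Peak load:    {peak_rps:.0f} q/s")
print(f"Starting point: 4-8 vCPUs, {ram_gb:.0f}+ GB RAM, {storage_gb:.0f} GB storage")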

This estimation will need to be refined based on actual performance monitoring and further testing.