Choosing your Compute Add-on

Choosing the right Compute Add-on for your vector workload.

You have two options for scaling your vector workload:

  1. Increase the size of your database. This guide will help you choose the right size for your workload.
  2. Spread your workload across multiple databases. You can find more details about this approach in Engineering for Scale.

Dimensionality#

The number of dimensions in your embeddings is the most important factor in choosing the right Compute Add-on. In general, the lower the dimensionality, the better the performance. We've provided guidance for some of the most common embedding dimensions below. For each benchmark, we used Vecs to create a collection, upload the embeddings to a single table, and create an inner-product index for the embedding column. We then ran a series of queries to measure the performance of different Compute Add-ons.
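That setup can be sketched with the vecs client. The connection string, collection name, and sample record below are placeholders, not the exact benchmark code:

```python
def create_indexed_collection(db_url: str, dimension: int = 1536):
    """Create a vecs collection and build an inner-product index,
    mirroring the benchmark setup described above."""
    import vecs  # Supabase's Python client for pgvector (third-party dependency)

    vx = vecs.create_client(db_url)  # e.g. "postgresql://user:pass@host:5432/db"
    docs = vx.get_or_create_collection(name="docs", dimension=dimension)

    # upsert embeddings as (id, vector, metadata) tuples
    docs.upsert(records=[("vec0", [0.1] * dimension, {})])

    # index the collection using the inner-product measure (as in the benchmarks)
    docs.create_index(measure=vecs.IndexMeasure.max_inner_product)
    return docs
```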

1536 Dimensions#

This benchmark uses the dbpedia-entities-openai-1M dataset, which contains 1,000,000 text embeddings. Each embedding has 1536 dimensions and was created with the OpenAI Embeddings API.

| Plan | Vectors | Lists | RPS | Latency (mean) | Latency (p95) | RAM Usage | RAM |
|------|---------|-------|-----|----------------|---------------|-----------|-----|
| Free | 20,000 | 40 | 135 | 0.372 sec | 0.412 sec | 1 GB + 200 MB Swap | 1 GB |
| Small | 50,000 | 100 | 140 | 0.357 sec | 0.398 sec | 1.8 GB | 2 GB |
| Medium | 100,000 | 200 | 130 | 0.383 sec | 0.446 sec | 3.7 GB | 4 GB |
| Large | 250,000 | 500 | 130 | 0.378 sec | 0.434 sec | 7 GB | 8 GB |
| XL | 500,000 | 1000 | 235 | 0.213 sec | 0.271 sec | 13.5 GB | 16 GB |
| 2XL | 1,000,000 | 2000 | 380 | 0.133 sec | 0.236 sec | 30 GB | 32 GB |
| 4XL | 1,000,000 | 2000 | 720 | 0.068 sec | 0.120 sec | 35 GB | 64 GB |
| 8XL | 1,000,000 | 2000 | 1250 | 0.039 sec | 0.066 sec | 38 GB | 128 GB |
| 12XL | 1,000,000 | 2000 | 1600 | 0.030 sec | 0.052 sec | 41 GB | 192 GB |
| 16XL | 1,000,000 | 2000 | 1790 | 0.029 sec | 0.051 sec | 45 GB | 256 GB |

For 1,000,000 vectors, 10 probes yields a precision of 0.91. For 500,000 vectors and below, 10 probes yields a precision in the range of 0.95 to 0.99. To increase precision, increase the number of probes.
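If you run queries with raw SQL, probes are set per-transaction with `SET LOCAL ivfflat.probes`. Here is a small helper that builds such a query; the `vecs.docs` table and `vec` column names are assumptions based on vecs' default layout, and `<#>` is pgvector's negative inner-product operator:

```python
def probe_query_sql(probes: int, limit: int = 10) -> str:
    """Build SQL that raises ivfflat.probes for the current transaction,
    then runs an inner-product search. Table/column names are illustrative."""
    return (
        f"SET LOCAL ivfflat.probes = {probes}; "
        f"SELECT id FROM vecs.docs "
        f"ORDER BY vec <#> %(query_embedding)s LIMIT {limit};"
    )
```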


960 Dimensions#

This benchmark uses the gist-960-angular dataset, which contains 1,000,000 embeddings of images. Each embedding is 960 dimensions.

| Plan | Vectors | Lists | RPS | Latency (mean) | Latency (p95) | RAM Usage | RAM |
|------|---------|-------|-----|----------------|---------------|-----------|-----|
| Free | 30,000 | 30 | 75 | 0.065 sec | 0.088 sec | 1 GB + 100 MB Swap | 1 GB |
| Small | 100,000 | 100 | 78 | 0.064 sec | 0.092 sec | 1.8 GB | 2 GB |
| Medium | 250,000 | 250 | 58 | 0.085 sec | 0.129 sec | 3.2 GB | 4 GB |
| Large | 500,000 | 500 | 55 | 0.088 sec | 0.140 sec | 5 GB | 8 GB |
| XL | 1,000,000 | 1000 | 110 | 0.046 sec | 0.070 sec | 14 GB | 16 GB |
| 2XL | 1,000,000 | 1000 | 235 | 0.083 sec | 0.136 sec | 10 GB | 32 GB |
| 4XL | 1,000,000 | 1000 | 420 | 0.071 sec | 0.106 sec | 11 GB | 64 GB |
| 8XL | 1,000,000 | 1000 | 815 | 0.072 sec | 0.106 sec | 13 GB | 128 GB |
| 12XL | 1,000,000 | 1000 | 1150 | 0.052 sec | 0.078 sec | 15.5 GB | 192 GB |
| 16XL | 1,000,000 | 1000 | 1345 | 0.072 sec | 0.106 sec | 17.5 GB | 256 GB |

512 Dimensions#

This benchmark uses the GloVe Reddit comments dataset, which contains 1,623,397 embeddings of text. Each embedding is 512 dimensions. Random vectors were generated for queries.

| Plan | Vectors | Lists | RPS | Latency (mean) | Latency (p95) | RAM Usage | RAM |
|------|---------|-------|-----|----------------|---------------|-----------|-----|
| Free | 100,000 | 100 | 250 | 0.395 sec | 0.432 sec | 1 GB + 300 MB Swap | 1 GB |
| Small | 250,000 | 250 | 440 | 0.223 sec | 0.250 sec | 2 GB + 200 MB Swap | 2 GB |
| Medium | 500,000 | 500 | 425 | 0.116 sec | 0.143 sec | 3.7 GB | 4 GB |
| Large | 1,000,000 | 1000 | 515 | 0.096 sec | 0.116 sec | 7.5 GB | 8 GB |
| XL | 1,623,397 | 1275 | 465 | 0.212 sec | 0.272 sec | 14 GB | 16 GB |
| 2XL | 1,623,397 | 1275 | 1400 | 0.061 sec | 0.075 sec | 22 GB | 32 GB |
| 4XL | 1,623,397 | 1275 | 1800 | 0.027 sec | 0.043 sec | 20 GB | 64 GB |
| 8XL | 1,623,397 | 1275 | 2850 | 0.032 sec | 0.049 sec | 21 GB | 128 GB |
| 12XL | 1,623,397 | 1275 | 3700 | 0.020 sec | 0.036 sec | 26 GB | 192 GB |
| 16XL | 1,623,397 | 1275 | 3700 | 0.025 sec | 0.042 sec | 29 GB | 256 GB |

note

It is possible to upload more vectors to a single table if memory allows it (for example, on the 4XL plan and higher for OpenAI embeddings), but this will affect query performance: RPS will be lower and latency will be higher. Scaling should be almost linear, but we recommend benchmarking your workload to find the optimal number of vectors per table and per database instance.
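As a back-of-envelope check on whether a plan's RAM can hold your vectors, you can count raw vector bytes. This sketch assumes pgvector's storage layout of 4 bytes per dimension plus a small per-vector header; index size and Postgres tuple overhead are ignored, so real usage is higher:

```python
def estimated_table_gb(n_vectors: int, dimensions: int) -> float:
    """Rough lower bound on table size for a pgvector column:
    4 bytes per float dimension plus ~8 bytes of header per vector."""
    bytes_per_vector = 4 * dimensions + 8
    return n_vectors * bytes_per_vector / 1024**3

# 1,000,000 OpenAI-sized embeddings (1536 dims) come to roughly 5.7 GB
# of raw vector data before any index is built
```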

Performance tips#

There are various ways to improve your pgvector performance. Here are some tips:

Pre-warming your database#

It's useful to execute a few thousand “warm-up” queries before going into production. This helps with RAM utilization, and it can also help you confirm that you've selected the right instance size for your workload.
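A warm-up pass can be sketched as a loop of random-vector queries that pulls index and table pages into RAM before production traffic arrives. Here, `query_fn` is a hypothetical callable that runs one similarity search against your table:

```python
import random

def warm_up(query_fn, dimension: int = 1536, n_queries: int = 2000) -> None:
    """Issue n_queries similarity searches with random query vectors so
    Postgres caches the pages the index will touch in production."""
    for _ in range(n_queries):
        query_embedding = [random.uniform(-1.0, 1.0) for _ in range(dimension)]
        query_fn(query_embedding)
```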

Increase the number of lists#

You can increase the Requests per Second by increasing the number of lists. This comes with an important caveat: building the index takes longer with more lists.
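pgvector's commonly cited rule of thumb for IVFFlat is `lists = rows / 1000` up to about a million rows, and `sqrt(rows)` beyond that, which matches the list counts used in the benchmark tables above (the 512-dimension benchmark used 1275 for 1,623,397 rows, effectively this heuristic):

```python
import math

def pick_lists(n_vectors: int) -> int:
    """Pick an IVFFlat lists value using pgvector's rule of thumb:
    rows / 1000 for up to ~1M rows, sqrt(rows) beyond that."""
    if n_vectors <= 1_000_000:
        return max(1, n_vectors // 1000)
    return int(math.sqrt(n_vectors))
```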


Check out more tips and the complete step-by-step guide in Going to Production for AI applications.

Benchmark Methodology#

We follow techniques outlined in the ANN Benchmarks methodology. A Python test runner is responsible for uploading the data, creating the index, and running the queries. The pgvector engine is implemented using vecs, a Python client for pgvector.


Each test runs for 30-40 minutes and includes a series of experiments executed at different concurrency levels to measure the engine's performance under different load types. The results are then averaged.

As a general recommendation, we suggest using a concurrency level of 5 or more for most workloads and 30 or more for high-load workloads.
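The concurrency-level measurement can be sketched with a thread pool; `query_fn` is a placeholder for a single similarity query, and the RPS/latency figures it reports correspond to the columns in the tables above:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_benchmark(query_fn, concurrency: int = 5, n_queries: int = 100) -> dict:
    """Execute query_fn n_queries times across `concurrency` worker threads
    and report throughput (RPS) and mean per-query latency in seconds."""
    latencies = []

    def timed_query(_):
        start = time.perf_counter()
        query_fn()
        latencies.append(time.perf_counter() - start)

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(timed_query, range(n_queries)))
    wall = time.perf_counter() - wall_start

    return {"rps": n_queries / wall, "latency_mean": sum(latencies) / len(latencies)}
```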