# Choosing your Compute Add-on
Choosing the right Compute Add-on for your vector workload.
You have two options for scaling your vector workload:
- Increase the size of your database. This guide will help you choose the right size for your workload.
- Spread your workload across multiple databases. You can find more details about this approach in Engineering for Scale.
## Dimensionality
The number of dimensions in your embeddings is the most important factor in choosing the right Compute Add-on. In general, the lower the dimensionality, the better the performance. We've provided guidance for some of the more common embedding dimensions below. For each benchmark, we used Vecs to create a collection, upload the embeddings to a single table, and create an inner-product index for the embedding column. We then ran a series of queries to measure the performance of different compute add-ons.
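Under the hood, this setup amounts to a plain pgvector table plus an IVFFlat inner-product index. The following is a minimal SQL sketch of that shape; the table and column names (`documents`, `embedding`) are illustrative, since Vecs manages the actual schema for you:

```sql
-- Enable pgvector and create a table for 1536-dimension embeddings.
create extension if not exists vector;

create table documents (
  id text primary key,
  embedding vector(1536)
);

-- IVFFlat index using the inner-product operator class.
-- "lists" corresponds to the Lists column in the tables below.
create index on documents
  using ivfflat (embedding vector_ip_ops)
  with (lists = 2000);
```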
### 1536 Dimensions
This benchmark uses the dbpedia-entities-openai-1M dataset, which contains 1,000,000 embeddings of text. Each embedding is 1536 dimensions created with the OpenAI Embeddings API.
Plan | Vectors | Lists | RPS | Latency Mean | Latency p95 | RAM Used | Total RAM |
---|---|---|---|---|---|---|---|
Free | 20,000 | 40 | 135 | 0.372 sec | 0.412 sec | 1 GB + 200 MB swap | 1 GB |
Small | 50,000 | 100 | 140 | 0.357 sec | 0.398 sec | 1.8 GB | 2 GB |
Medium | 100,000 | 200 | 130 | 0.383 sec | 0.446 sec | 3.7 GB | 4 GB |
Large | 250,000 | 500 | 130 | 0.378 sec | 0.434 sec | 7 GB | 8 GB |
XL | 500,000 | 1000 | 235 | 0.213 sec | 0.271 sec | 13.5 GB | 16 GB |
2XL | 1,000,000 | 2000 | 380 | 0.133 sec | 0.236 sec | 30 GB | 32 GB |
4XL | 1,000,000 | 2000 | 720 | 0.068 sec | 0.120 sec | 35 GB | 64 GB |
8XL | 1,000,000 | 2000 | 1250 | 0.039 sec | 0.066 sec | 38 GB | 128 GB |
12XL | 1,000,000 | 2000 | 1600 | 0.030 sec | 0.052 sec | 41 GB | 192 GB |
16XL | 1,000,000 | 2000 | 1790 | 0.029 sec | 0.051 sec | 45 GB | 256 GB |
For 1,000,000 vectors, 10 probes yields a precision of 0.91; for 500,000 vectors and below, 10 probes yields a precision in the range of 0.95 to 0.99. To increase precision, you need to increase the number of probes.
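Probes are set per connection. Here's a short sketch of tuning them, reusing the illustrative `documents` table from above; pgvector's `<#>` operator returns the negative inner product:

```sql
-- Search more lists per query: higher precision, lower RPS.
set ivfflat.probes = 10;

-- Nearest-neighbor query by inner product (an arbitrary stored
-- embedding stands in for a real query vector here).
select id
from documents
order by embedding <#> (select embedding from documents limit 1)
limit 10;
```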


### 960 Dimensions
This benchmark uses the gist-960-angular dataset, which contains 1,000,000 embeddings of images. Each embedding is 960 dimensions.
Plan | Vectors | Lists | RPS | Latency Mean | Latency p95 | RAM Used | Total RAM |
---|---|---|---|---|---|---|---|
Free | 30,000 | 30 | 75 | 0.065 sec | 0.088 sec | 1 GB + 100 MB swap | 1 GB |
Small | 100,000 | 100 | 78 | 0.064 sec | 0.092 sec | 1.8 GB | 2 GB |
Medium | 250,000 | 250 | 58 | 0.085 sec | 0.129 sec | 3.2 GB | 4 GB |
Large | 500,000 | 500 | 55 | 0.088 sec | 0.140 sec | 5 GB | 8 GB |
XL | 1,000,000 | 1000 | 110 | 0.046 sec | 0.070 sec | 14 GB | 16 GB |
2XL | 1,000,000 | 1000 | 235 | 0.083 sec | 0.136 sec | 10 GB | 32 GB |
4XL | 1,000,000 | 1000 | 420 | 0.071 sec | 0.106 sec | 11 GB | 64 GB |
8XL | 1,000,000 | 1000 | 815 | 0.072 sec | 0.106 sec | 13 GB | 128 GB |
12XL | 1,000,000 | 1000 | 1150 | 0.052 sec | 0.078 sec | 15.5 GB | 192 GB |
16XL | 1,000,000 | 1000 | 1345 | 0.072 sec | 0.106 sec | 17.5 GB | 256 GB |
### 512 Dimensions
This benchmark uses the GloVe Reddit comments dataset, which contains 1,623,397 embeddings of text. Each embedding is 512 dimensions. Random vectors were generated for queries.
Plan | Vectors | Lists | RPS | Latency Mean | Latency p95 | RAM Used | Total RAM |
---|---|---|---|---|---|---|---|
Free | 100,000 | 100 | 250 | 0.395 sec | 0.432 sec | 1 GB + 300 MB swap | 1 GB |
Small | 250,000 | 250 | 440 | 0.223 sec | 0.250 sec | 2 GB + 200 MB swap | 2 GB |
Medium | 500,000 | 500 | 425 | 0.116 sec | 0.143 sec | 3.7 GB | 4 GB |
Large | 1,000,000 | 1000 | 515 | 0.096 sec | 0.116 sec | 7.5 GB | 8 GB |
XL | 1,623,397 | 1275 | 465 | 0.212 sec | 0.272 sec | 14 GB | 16 GB |
2XL | 1,623,397 | 1275 | 1400 | 0.061 sec | 0.075 sec | 22 GB | 32 GB |
4XL | 1,623,397 | 1275 | 1800 | 0.027 sec | 0.043 sec | 20 GB | 64 GB |
8XL | 1,623,397 | 1275 | 2850 | 0.032 sec | 0.049 sec | 21 GB | 128 GB |
12XL | 1,623,397 | 1275 | 3700 | 0.020 sec | 0.036 sec | 26 GB | 192 GB |
16XL | 1,623,397 | 1275 | 3700 | 0.025 sec | 0.042 sec | 29 GB | 256 GB |
> **Note:** You can store more vectors in a single table if memory allows it (for example, a 4XL plan or higher for OpenAI embeddings), but this affects query performance: RPS will be lower and latency will be higher. Scaling should be almost linear, but we recommend benchmarking your workload to find the optimal number of vectors per table and per database instance.
## Performance tips
There are various ways to improve your pgvector performance. Here are some tips:
### Pre-warming your database
It's useful to execute a few thousand “warm-up” queries before going into production. This helps with RAM utilization by pulling frequently accessed pages and the index into memory, and it can also help you confirm that you've selected the right instance size for your workload.
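If you'd rather warm the cache directly than replay queries, one option is the pg_prewarm extension, which loads a relation into the Postgres buffer cache. A sketch, assuming pg_prewarm is enabled and reusing the illustrative names from above:

```sql
-- Pull the table and its IVFFlat index into the buffer cache so the
-- first production queries don't pay cold-read I/O.
create extension if not exists pg_prewarm;

select pg_prewarm('documents');
-- Substitute your index's actual name (check pg_indexes).
select pg_prewarm('documents_embedding_idx');
```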
### Increase the number of lists
You can increase requests per second (RPS) by increasing the number of lists in your index. This comes with an important caveat: building the index takes longer with more lists.
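Because `lists` is fixed when the index is built, raising it means rebuilding the index. A sketch using the illustrative names from earlier; the starting points in the comment are pgvector's own guidance:

```sql
-- Rebuild with more lists. pgvector suggests lists = rows / 1000
-- up to ~1M rows, and sqrt(rows) beyond that. More lists generally
-- raises RPS at query time but slows the index build.
drop index if exists documents_embedding_idx;

create index documents_embedding_idx on documents
  using ivfflat (embedding vector_ip_ops)
  with (lists = 2000);
```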


Check out more tips and the complete step-by-step guide in Going to Production for AI applications.
## Benchmark Methodology
We follow techniques outlined in the ANN Benchmarks methodology. A Python test runner is responsible for uploading the data, creating the index, and running the queries. The pgvector engine is implemented using vecs, a Python client for pgvector.


Each test runs for a minimum of 30 to 40 minutes and includes a series of experiments executed at different concurrency levels to measure the engine's performance under different load types. The results are then averaged.
As a general recommendation, we suggest using a concurrency level of 5 or more for most workloads and 30 or more for high-load workloads.