The vector-DB market has consolidated. Capability differences across the top 4 are now small, so picking one is no longer a capability decision. It is a cost-model decision.
The top 4 in 2026
- Pinecone (docs) — managed, serverless.
- Weaviate (docs) — open-source + managed cloud option.
- pgvector (repo) — Postgres extension.
- Cloudflare Vectorize (docs) — edge-local.
(Honourable mentions: Qdrant, Milvus, MongoDB Atlas Vector Search — all viable but smaller market share. ChromaDB is increasingly used in dev environments but rarely production.)
The four cost models
Pinecone — pay per query + storage
Per-namespace pricing: a storage tier (~$0.33/GB/month) plus per-second billing for pod time, which varies by pod size and replication. 1M vectors × 1536 dimensions at 4 bytes per float32 value comes to ~6GB; with a single small pod, expect ~$70/month at minimum, scaling with query volume.
Right when: small-to-medium scale, want zero-ops, query volume is the variable. Most agentic firms below ~50M vectors.
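The ~6GB figure can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes float32 vectors (4 bytes per value) and counts only the raw vectors, with no index or metadata overhead; the function names are illustrative and not part of any vendor SDK.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats


def raw_vector_gb(num_vectors: int, dims: int,
                  bytes_per_value: int = BYTES_PER_FLOAT32) -> float:
    """Size of the raw vectors in decimal GB (no index or metadata overhead)."""
    return num_vectors * dims * bytes_per_value / 1e9


def monthly_storage_usd(size_gb: float, usd_per_gb_month: float = 0.33) -> float:
    """Storage-only cost, using the ~$0.33/GB/month figure quoted above."""
    return size_gb * usd_per_gb_month


size_gb = raw_vector_gb(1_000_000, 1536)
print(f"{size_gb:.2f} GB raw, ~${monthly_storage_usd(size_gb):.2f}/month for storage alone")
```

Under these assumptions storage is a small slice of the ~$70/month minimum, which suggests pod time is the part of the bill to watch.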
Weaviate Cloud — pay per cluster
Cluster pricing (managed) plus storage, tiered by RAM. A 4GB cluster runs ~$100/month and holds ~2–4M vectors; cost scales roughly linearly with cluster size.
Right when: want hybrid search (vector + keyword) without a separate engine. Want the open-source escape hatch (you can self-host the same software if managed-cloud pricing changes).
pgvector — pay for Postgres + your own ops
Free extension on a Postgres instance you already run. Cost is whatever your Postgres costs. At small scale on shared infrastructure: essentially free. At large scale (>10M vectors, high query rate): substantial Postgres compute.
Right when: already on Postgres, want a single data plane, willing to manage indexes + tuning. Especially right for early-stage when the vector store is one piece of a broader SQL workload.
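To make "manage indexes + tuning" concrete, here is a rough sketch of what a pgvector setup involves. The table and column names (`documents`, `embedding`) are hypothetical, the `%s` placeholders assume a psycopg-style driver, and the HNSW index with cosine distance is one reasonable choice rather than a recommendation from this article. Nothing below connects to a database; it only prepares the statements.

```python
# Statements assume the pgvector extension is available on the Postgres server.
# Table, column, and index names are hypothetical.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id bigserial PRIMARY KEY,
    body text,
    embedding vector(1536)
);
CREATE INDEX IF NOT EXISTS documents_embedding_hnsw
    ON documents USING hnsw (embedding vector_cosine_ops);
"""

# <=> is pgvector's cosine-distance operator; smaller means more similar.
QUERY_SQL = (
    "SELECT id, body FROM documents "
    "ORDER BY embedding <=> %s::vector LIMIT %s;"
)


def to_vector_literal(values) -> str:
    """Format numbers as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


params = (to_vector_literal([0.1, 0.2, 0.3]), 5)  # pass alongside QUERY_SQL to your driver
```

The index definition is the main operational cost here: the index type and its build parameters are yours to choose and tune, where the managed options decide them for you.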
Cloudflare Vectorize — pay per query + storage, edge-local
Usage-based pricing at the edge: ~$0.01 per million queried vector dimensions, plus storage. Indexes are globally distributed.
Right when: need low-latency global queries (consumer-facing apps in particular). Want zero-ops + edge distribution. Pairs naturally with Cloudflare Workers for the agent runtime.
Decision shortcuts
If you're already on Postgres at <10M vectors: pgvector. Single data plane wins.
If you're consumer-facing with global users: Cloudflare Vectorize. Latency wins.
If you want maximum ops simplicity and you're at small scale: Pinecone. Default for "I don't want to think about it."
If you want hybrid search + the option to escape to self-hosting: Weaviate.
If you're at >100M vectors and willing to operate it: re-evaluate. Self-hosted Weaviate or Milvus at that scale typically beats managed pricing.
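The shortcuts above can be written as a small rule function. This is only a sketch: the parameter names are made up for illustration, and the order in which rules are checked (scale first, then Postgres, then global latency, then hybrid search, with Pinecone as the fallback) is an assumption, since the shortcuts do not state a precedence.

```python
def recommend(num_vectors: int,
              on_postgres: bool = False,
              global_consumer_app: bool = False,
              wants_hybrid_or_self_host_option: bool = False,
              willing_to_operate: bool = False) -> str:
    """Map the decision shortcuts to a single suggestion (rule order is an assumption)."""
    if num_vectors > 100_000_000 and willing_to_operate:
        return "re-evaluate: self-hosted Weaviate or Milvus"
    if on_postgres and num_vectors < 10_000_000:
        return "pgvector"
    if global_consumer_app:
        return "Cloudflare Vectorize"
    if wants_hybrid_or_self_host_option:
        return "Weaviate"
    # Fallback: the "I don't want to think about it" default at small scale.
    return "Pinecone"


print(recommend(2_000_000, on_postgres=True))
print(recommend(5_000_000, global_consumer_app=True))
```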
The cost components
For any vector DB, total cost = storage + query + index-build + replication. The four DBs above weight these differently:
| DB | Storage | Query | Index-build | Replication |
|---|---|---|---|---|
| Pinecone | Medium | Pod-bound | Auto | Replica pods |
| Weaviate | Medium | Cluster-bound | Auto | Replica cluster |
| pgvector | Low (Postgres) | CPU-bound | Manual | Postgres replica |
| Cloudflare Vectorize | Low | Per-query | Auto | Global by default |
The dominant cost for most agentic firms is query. Optimising for query patterns (right index type, right replica count, right filter usage) matters more than picking between the four.
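A simple way to see which component dominates is to write the total out as a sum of its four parts. The sketch below does that; the dollar figures in the example are illustrative placeholders, not vendor quotes, and the class name is hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass
class MonthlyCost:
    """Total cost = storage + query + index-build + replication (USD/month)."""
    storage: float
    query: float
    index_build: float
    replication: float

    def total(self) -> float:
        return sum(asdict(self).values())

    def dominant(self) -> str:
        """Name of the largest component."""
        parts = asdict(self)
        return max(parts, key=parts.get)


# Placeholder numbers for illustration only.
example = MonthlyCost(storage=2.0, query=60.0, index_build=0.0, replication=8.0)
print(f"total=${example.total():.2f}, dominant={example.dominant()}")
```

Filling this in with your own bill, per provider, makes it clearer whether tuning query patterns or switching vendors is the bigger lever.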
The hidden cost: embeddings
Vector DBs charge for storing + querying vectors. They don't generate them. Embedding generation is a separate cost (OpenAI text-embedding-3-small at ~$0.02/M tokens; open-source alternatives carry no per-token fee if you self-host, though you still pay for the compute).
For a small firm at ~500K-document corpus: one-time embedding cost ~$50–200. For a firm re-embedding nightly: monthly cost grows with corpus size.
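The embedding arithmetic is tokens times price. The sketch below assumes an average of 5,000 tokens per document, a figure chosen only because it lands on the low end of the range above at the ~$0.02/M-token rate; your corpus will differ. The nightly case assumes the whole corpus is re-embedded every night for 30 nights.

```python
def embedding_cost_usd(num_docs: int, avg_tokens_per_doc: int,
                       usd_per_million_tokens: float = 0.02) -> float:
    """One full pass over the corpus at a flat per-token price."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Assumption: 5,000 tokens per document on average.
one_time = embedding_cost_usd(500_000, 5_000)
nightly_per_month = one_time * 30  # full re-embed every night

print(f"one-time: ${one_time:,.2f}, nightly re-embed: ${nightly_per_month:,.2f}/month")
```

Re-embedding only changed documents, rather than the full corpus, shrinks the monthly figure in proportion to the change rate.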
The vector-db-cost-calculator models all four DBs + embedding costs at varying corpus + query scale.
FAQ
Q: What about latency differences?
A: p95 latencies for all 4 are <50ms at small-to-medium scale. Cloudflare wins at the global edge; the others tie. At very large scale (>100M vectors), latency curves diverge, so benchmark before committing.
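A benchmark for this can be small. The sketch below times a caller-supplied `run_query` function (hypothetical: wrap whichever client call you are evaluating) and reports a nearest-rank p95. It measures wall-clock time from one machine, so run it from the region your users are in.

```python
import time


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integer math
    return ordered[rank - 1]


def measure_latencies_ms(run_query, n=200):
    """Call run_query() n times and return per-call wall-clock latency in ms."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        run_query()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


# Stand-in workload; replace the lambda with a real query against your index.
latencies = measure_latencies_ms(lambda: sum(range(1000)), n=50)
print(f"p95 = {p95(latencies):.3f} ms over {len(latencies)} calls")
```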
Q: Migration risk?
A: Open-source options (Weaviate, pgvector) carry lower migration risk because you can always self-host the same software. Pinecone is the most locked-in, but it exports cleanly to other DBs if needed.
Q: Embedding-model choice?
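A: Separate question. Most firms in 2026 use OpenAI text-embedding-3-large or Voyage AI's voyage models (the embedding provider that Anthropic's documentation points to; Anthropic does not offer an embedding model of its own). The cost difference is small; the quality difference on domain-specific tasks is measurable. Run an eval before committing.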
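One common shape for such an eval is recall@k over a small labelled set of queries. The sketch below is one way to compute it, not a prescribed method; it assumes you already have, for each query, the ranked IDs each embedding model retrieved and a hand-labelled set of relevant IDs.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top-k retrieved."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("each query needs at least one relevant id")
    hits = relevant.intersection(retrieved_ids[:k])
    return len(hits) / len(relevant)


def mean_recall_at_k(runs, k):
    """Average recall@k over (retrieved_ids, relevant_ids) pairs."""
    if not runs:
        raise ValueError("need at least one query")
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


# Toy data for illustration only.
runs = [
    (["d1", "d4", "d9"], {"d1", "d9"}),
    (["d2", "d3", "d7"], {"d7", "d8"}),
]
print(mean_recall_at_k(runs, k=3))
```

Running the same labelled set through each candidate model gives a like-for-like number to weigh against the price difference.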
Q: How does this map to the Pillar P8 essay?
A: P8 covers the three memory layers + the RAG-vs-context decision tree. This spoke is the buyer-side guidance for the specific component that powers semantic memory.
Want to model the cost for your firm? Try the calculator →