
Vector DB Cost Models: A Buyer's Guide for 2026

AgentsBooks Team

2026-05-19 · 5 min read

The vector-DB market has consolidated. Capability differences across the top 4 are now small, so the choice is driven by cost model rather than by features.

The top 4 in 2026

Pinecone (docs) — managed, serverless.
Weaviate (docs) — open-source + managed cloud option.
pgvector (repo) — Postgres extension.
Cloudflare Vectorize (docs) — edge-local.

(Honourable mentions: Qdrant, Milvus, MongoDB Atlas Vector Search — all viable but smaller market share. ChromaDB is increasingly used in dev environments but rarely production.)

The four cost models

Pinecone — pay per query + storage

Per-namespace pricing. Storage tier (~$0.33/GB/month) plus per-second pod time (varies by pod size and replication). 1M vectors × 1536 dimensions × 4 bytes (float32) ≈ 6GB; with a single small pod, expect ~$70/month at minimum, scaling with query volume.
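The ~6GB figure can be reproduced from raw vector size. The sketch below assumes float32 embeddings (4 bytes per dimension) and uses the ~$0.33/GB/month rate quoted above; the function names are illustrative, and metadata, index overhead, and pod charges are deliberately left out.

```python
BYTES_PER_FLOAT32 = 4  # assumption: embeddings are stored as float32


def raw_vector_gb(num_vectors: int, dims: int,
                  bytes_per_dim: int = BYTES_PER_FLOAT32) -> float:
    """Raw embedding payload in decimal GB (no metadata or index overhead)."""
    return num_vectors * dims * bytes_per_dim / 1e9


def monthly_storage_usd(gb: float, usd_per_gb_month: float = 0.33) -> float:
    """Storage-only charge at a flat per-GB rate (rate taken from the text)."""
    return gb * usd_per_gb_month


gb = raw_vector_gb(1_000_000, 1536)
print(f"{gb:.2f} GB raw, ~${monthly_storage_usd(gb):.2f}/month for storage")
```

On these numbers storage is only about $2/month, so nearly all of the ~$70/month minimum comes from pod time rather than from the stored vectors.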

Right when: small-to-medium scale, want zero-ops, query volume is the variable. That describes most agentic firms below ~50M vectors.

Weaviate Cloud — pay per cluster

Cluster pricing (managed) + storage. Tiered by RAM. A 4GB cluster runs ~$100/month and holds ~2–4M vectors; cost scales roughly linearly with cluster size.

Right when: want hybrid search (vector + keyword) without a separate engine. Want the open-source escape hatch (you can self-host the same software if managed-cloud pricing changes).

pgvector — pay for Postgres + your own ops

Free extension on a Postgres instance you already run. Cost is whatever your Postgres costs. At small scale on shared infrastructure: essentially free. At large scale (>10M vectors, high query rate): substantial Postgres compute.

Right when: already on Postgres, want a single data plane, willing to manage indexes + tuning. Especially right for early-stage when the vector store is one piece of a broader SQL workload.
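"Manage indexes + tuning" in practice means choosing the index type and its build and search parameters yourself. The sketch below collects the relevant pgvector statements as Python strings. The table name `items`, the helper names, and the `%s` placeholder style (as used by psycopg) are assumptions for illustration. The `m` and `ef_construction` values shown are pgvector's documented HNSW defaults, and running the statements requires a Postgres instance with the extension installed.

```python
def pgvector_setup_sql(table: str = "items", dims: int = 1536,
                       m: int = 16, ef_construction: int = 64) -> list[str]:
    """DDL for a vector column plus an HNSW index using cosine distance."""
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE TABLE IF NOT EXISTS {table} "
        f"(id bigserial PRIMARY KEY, embedding vector({dims}));",
        f"CREATE INDEX ON {table} USING hnsw (embedding vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});",
    ]


def pgvector_query_sql(table: str = "items", k: int = 5,
                       ef_search: int = 40) -> list[str]:
    """Per-session recall/speed knob, then a top-k cosine-distance query."""
    return [
        f"SET hnsw.ef_search = {ef_search};",
        f"SELECT id FROM {table} ORDER BY embedding <=> %s::vector LIMIT {k};",
    ]


for statement in pgvector_setup_sql() + pgvector_query_sql():
    print(statement)
```

Raising `ef_search` trades query CPU for recall, which is where the "substantial Postgres compute" at high query rates tends to come from.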

Cloudflare Vectorize — pay per query + storage, edge-local

Edge-local pricing. ~$0.01 per million queried vector dimensions + storage. Globally distributed.

Right when: need low-latency global queries (consumer-facing apps in particular). Want zero-ops + edge distribution. Pairs naturally with Cloudflare Workers for the agent runtime.

Decision shortcuts

If you're already on Postgres at <10M vectors: pgvector. Single data plane wins.

If you're consumer-facing with global users: Cloudflare Vectorize. Latency wins.

If you want maximum ops simplicity and you're at small scale: Pinecone. Default for "I don't want to think about it."

If you want hybrid search + the option to escape to self-hosting: Weaviate.

If you're at >100M vectors and willing to operate it: re-evaluate. Self-hosted Weaviate or Milvus at that scale typically beats managed pricing.
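The shortcuts above can be written as a small rule function. This is only a sketch: the `Workload` fields and the precedence of the rules (the >100M check first, Pinecone as the fallback) are assumptions, since the shortcuts do not say which one wins when several apply.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    num_vectors: int
    already_on_postgres: bool = False
    global_consumer_facing: bool = False
    wants_hybrid_search: bool = False
    wants_self_host_option: bool = False
    willing_to_operate: bool = False


def shortlist(w: Workload) -> str:
    """Return the first matching shortcut; rule order is an assumption."""
    if w.num_vectors > 100_000_000 and w.willing_to_operate:
        return "re-evaluate: self-hosted Weaviate or Milvus"
    if w.already_on_postgres and w.num_vectors < 10_000_000:
        return "pgvector"
    if w.global_consumer_facing:
        return "Cloudflare Vectorize"
    if w.wants_hybrid_search or w.wants_self_host_option:
        return "Weaviate"
    return "Pinecone"  # the "I don't want to think about it" default


print(shortlist(Workload(num_vectors=5_000_000, already_on_postgres=True)))
```

Treat the result as a starting shortlist rather than a verdict; a workload that matches two rules deserves a closer look at both options.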

The cost components

For any vector DB, total cost = storage + query + index-build + replication. The four DBs above weight these differently:

DB	Storage	Query	Index-build	Replication
Pinecone	Medium	Pod-bound	Auto	Replica pods
Weaviate	Medium	Cluster-bound	Auto	Replica cluster
pgvector	Low (Postgres)	CPU-bound	Manual	Postgres replica
Cloudflare Vectorize	Low	Per-query	Auto	Global by default

The dominant cost for most agentic firms is query. Optimising for query patterns (right index type, right replica count, right filter usage) matters more than picking between the four.
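To see which component dominates for a given workload, the four-part formula can be kept as a small record. The class name and the dollar figures below are made up for illustration and are not vendor quotes.

```python
from dataclasses import asdict, dataclass


@dataclass
class MonthlyCost:
    storage: float
    query: float
    index_build: float
    replication: float

    def total(self) -> float:
        """total cost = storage + query + index-build + replication."""
        return self.storage + self.query + self.index_build + self.replication

    def dominant(self) -> str:
        """Name of the largest component."""
        parts = asdict(self)
        return max(parts, key=parts.get)


# Hypothetical monthly figures in USD, for illustration only.
cost = MonthlyCost(storage=2.0, query=60.0, index_build=0.0, replication=8.0)
print(f"total ${cost.total():.2f}, dominated by {cost.dominant()}")
```

Tracking the breakdown per month makes it easier to tell whether an optimisation should target query patterns or something else.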

The hidden cost: embeddings

Vector DBs charge for storing + querying vectors. They don't generate them. Embedding generation is a separate cost (OpenAI text-embedding-3-small at ~$0.02/M tokens; open-source alternatives carry no per-token fee if you host them yourself, though you still pay for the compute).

For a small firm with a ~500K-document corpus: one-time embedding cost ~$50–200, depending on document length and model. For a firm re-embedding nightly: the cost recurs on every run, so the monthly bill grows with corpus size.
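Embedding cost is token count times a per-token rate. The sketch below uses the ~$0.02/M-token rate quoted above; the average tokens per document and the 30 runs per month are assumptions, since the estimate above does not state them.

```python
def embedding_cost_usd(num_docs: int, avg_tokens_per_doc: int,
                       usd_per_million_tokens: float = 0.02) -> float:
    """One full pass over the corpus at a flat per-token rate."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * usd_per_million_tokens


def monthly_reembedding_usd(num_docs: int, avg_tokens_per_doc: int,
                            runs_per_month: int = 30,
                            usd_per_million_tokens: float = 0.02) -> float:
    """Cost of re-embedding the whole corpus on every run (e.g. nightly)."""
    one_pass = embedding_cost_usd(num_docs, avg_tokens_per_doc,
                                  usd_per_million_tokens)
    return one_pass * runs_per_month


# Assumed document lengths; the corpus size comes from the text.
for tokens in (1_000, 5_000, 20_000):
    one_time = embedding_cost_usd(500_000, tokens)
    print(f"{tokens:>6} tokens/doc: ${one_time:,.2f} one-time")
```

At this rate the ~$50–200 range corresponds to roughly 5,000–20,000 tokens per document; shorter documents cost less, and a pricier model raises the figure proportionally. Re-embedding only changed documents instead of the full corpus avoids most of the nightly cost.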

The vector-db-cost-calculator models all four DBs + embedding costs at varying corpus + query scale.

FAQ

Q: What about latency differences?
A: p95 latencies for all 4 are <50ms at small-to-medium scale. Cloudflare wins for globally distributed users; the others roughly tie. At very large scale (>100M vectors), latency curves diverge, so benchmark before committing.
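A benchmark for this can be very small. The sketch below times any zero-argument callable and reports p95 using the nearest-rank method; `run_query` stands in for whatever client call you are testing and is a placeholder, not part of any vendor SDK.

```python
import math
import time
from typing import Callable, Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p95_latency_ms(run_query: Callable[[], object], n_runs: int = 200) -> float:
    """Wall-clock p95 latency in milliseconds over n_runs sequential calls."""
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000)
    return percentile_nearest_rank(timings, 95)


# Placeholder workload; swap in a real query against your index.
print(f"p95: {p95_latency_ms(lambda: sum(range(1000))):.3f} ms")
```

Run it from the region your users are in and with your real filters, since both affect the tail more than the median.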

Q: Migration risk?
A: Open-source options (Weaviate, pgvector) have lower migration risk because you can self-host the same software if a vendor's terms change. Pinecone is the most locked-in, but its data exports cleanly to other DBs if needed.

Q: Embedding-model choice?
A: Separate question. Most firms in 2026 use OpenAI text-embedding-3-large or Voyage AI's voyage models (the embedding provider Anthropic recommends). The cost difference is small; the quality difference on domain-specific tasks is measurable. Run an eval before committing.

Q: How does this map to the Pillar P8 essay?
A: P8 covers the three memory layers + the RAG-vs-context decision tree. This spoke is the buyer-side guidance for the specific component that powers semantic memory.

Want to model the cost for your firm? Try the calculator →
