The following are just notes to self while building
Architecting
Tech stack choice (Tentative)
- DB
- Convex (Postgres) : OSS (both self-hostable and available as a DBaaS that offers much more than just a DB); built on top of PlanetScale Postgres
- Vector DB : Convex
- Backend
- Convex : Convex is a reactive backend/database where server logic, data schema and API surface live together as TypeScript functions. So no separate backend needed yet; would only need one if the workload becomes way too much (billions of vectors), not needed right now.
- I also won't have to deal with WebSockets, since Convex provides real-time sync using either WebSockets or optimized HTTP polling.
- Expose the required API endpoints via Convex HTTP actions (see the sketch after this list)
- tRPC if needed
- Auth (not yet decided)
- Not setting up my own auth
- Convex Auth : Inbuilt auth, easiest (Using this)
- BetterAuth : OSS, but the Convex integration is in a very alpha stage right now; don't know if I'll run into issues
- Clerk : Managed, easy compared to other options, convex has nice integration with clerk
- Backend Host
- Vercel : love fluid compute
- Or Fly.io
- Or Railway
- DB Host
- Self Host convex
- Use convex cloud
- Analytics (if)
- PostHog : OSS
- Captcha / Ratelimiting (if)
- Vercel Bot ID
- FrontEnd
- ReactJS + Vite
- Deployment (Will handle in the end)
- Docker compose
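Since one of the items above mentions exposing endpoints via Convex HTTP actions, here is a minimal sketch using Convex's httpRouter/httpAction. The /health route is just an illustration, not something from these notes:

// convex/http.ts: minimal sketch of exposing an endpoint via a Convex HTTP action.
import { httpRouter } from "convex/server";
import { httpAction } from "./_generated/server";

const http = httpRouter();

http.route({
  path: "/health", // illustrative route, not from the notes
  method: "GET",
  handler: httpAction(async (_ctx, _request) => {
    return new Response(JSON.stringify({ ok: true }), {
      status: 200,
      headers: { "Content-Type": "application/json" },
    });
  }),
});

export default http;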
Keeping scope limited to 1:1 conversations (no groups) and text-only messages (no file uploads), although the design allows easy extension to both.
Schema
5 entities : Users, Conversations, ConversationParticipants, Messages and MessageEmbeddings (plus Convex Auth's authTables)
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";
import { authTables } from "@convex-dev/auth/server";

export default defineSchema({
  ...authTables,

  users: defineTable({
    name: v.string(),
    email: v.string(),
  }).index("by_email", ["email"]),

  conversations: defineTable({
    participants: v.array(v.id("users")),
    type: v.union(v.literal("direct"), v.literal("group")),
    lastMessageTime: v.optional(v.number()),
  })
    .index("by_participants", ["participants"])
    .index("by_last_activity", ["lastMessageTime"]),

  conversationParticipants: defineTable({
    conversationId: v.id("conversations"),
    userId: v.id("users"),
    joinedAt: v.number(),
  })
    .index("by_userId_and_joinedAt", ["userId", "joinedAt"])
    .index("by_conversationId_and_userId", ["conversationId", "userId"]),

  messages: defineTable({
    senderId: v.id("users"),
    conversationId: v.id("conversations"),
    body: v.string(),
  })
    // Convex appends _creationTime to every index, so this gives
    // (conversationId, _creationTime) ordering for free.
    .index("by_conversation_time", ["conversationId"]),

  messageEmbeddings: defineTable({
    messageId: v.id("messages"),
    // Used for filtering by conversation access before vector search.
    conversationId: v.id("conversations"),
    embedding: v.array(v.float64()),
    senderId: v.id("users"),
  }).vectorIndex("by_embedding", {
    vectorField: "embedding",
    dimensions: 1536,
    filterFields: ["conversationId"],
    staged: false,
  }),
});
Both embedding generation (staged) and vector searches are implemented as Convex actions (asynchronous)
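A minimal sketch of what the embedding-generation action can look like, assuming the Cohere TypeScript SDK (cohere-ai) and a model that emits 1536-dimensional vectors to match the schema. File and function names here are illustrative, not the project's actual code:

// convex/embeddings.ts: sketch only; model name and response handling are assumptions.
"use node"; // run in the Node runtime so the Cohere SDK works
import { internalAction } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";
import { CohereClient } from "cohere-ai";

const cohere = new CohereClient({ token: process.env.COHERE_API_KEY! });

export const generateEmbedding = internalAction({
  args: {
    messageId: v.id("messages"),
    conversationId: v.id("conversations"),
    senderId: v.id("users"),
    body: v.string(),
  },
  handler: async (ctx, args) => {
    const res = await cohere.embed({
      texts: [args.body],
      model: "embed-v4.0", // assumed; must produce 1536-dim vectors to match the schema
      inputType: "search_document",
    });
    const embedding = (res.embeddings as number[][])[0];
    // Actions can't write to the DB directly; go through an internal mutation.
    // ("use node" files may only export actions, so it lives in a separate file.)
    await ctx.runMutation(internal.messagesDb.storeEmbedding, {
      messageId: args.messageId,
      conversationId: args.conversationId,
      senderId: args.senderId,
      embedding,
    });
  },
});

// convex/messagesDb.ts: the internal mutation the action above calls.
import { internalMutation } from "./_generated/server";
import { v } from "convex/values";

export const storeEmbedding = internalMutation({
  args: {
    messageId: v.id("messages"),
    conversationId: v.id("conversations"),
    senderId: v.id("users"),
    embedding: v.array(v.float64()),
  },
  handler: async (ctx, args) => {
    await ctx.db.insert("messageEmbeddings", args);
  },
});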
Backend
- Most queries/mutations/actions start with requireAuth()
- User-facing mutations like sendMessage are fast. Expensive stuff (in terms of time, e.g. embedding generation) happens async in the background.
- All the heavy queries use indexes: user lookup by email, conversations by participants, messages by conversation + time.
- Vector searches filtered by conversation for security + speed.
- Workflow : sendMessage → quick insert + schedule background job → handleSentMessage workflow → update conversation activity + generate embeddings (see the sketch below).
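A minimal sketch of that flow, using the function names from the notes; the argument shapes and the requireAuth helper (sketched in the Auth section) are illustrative assumptions:

// convex/messages.ts: sketch of the sendMessage flow.
import { mutation } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";
import { requireAuth } from "./auth"; // hypothetical helper, sketched later

export const sendMessage = mutation({
  args: { conversationId: v.id("conversations"), body: v.string() },
  handler: async (ctx, args) => {
    const senderId = await requireAuth(ctx);
    // Fast path: just insert the message.
    const messageId = await ctx.db.insert("messages", {
      senderId,
      conversationId: args.conversationId,
      body: args.body,
    });
    // Defer the expensive work (activity bump + embeddings) to a background job.
    // handleSentMessage is assumed to be an internal function in this file.
    await ctx.scheduler.runAfter(0, internal.messages.handleSentMessage, {
      messageId,
      conversationId: args.conversationId,
    });
    return messageId;
  },
});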
Vector
- Chose Convex vector storage: since everything else is Convex, this is too. It currently supports vectors on the order of millions efficiently, more than enough for school chats, and we never vector-search across the whole vector DB anyway (searches are filtered per conversation).
- Used the Cohere API and its TypeScript SDK for generating embeddings when messages are stored, and for vector ANN (cosine similarity) search.
- Vector search and storage pretty much follow the Convex docs (see the sketch after this list).
- Scaling : Convex itself scales, but it doesn't seem to support sharding the vector index. Two options: either use a separate vector DB like Qdrant, or hack it by logically splitting data across multiple Convex tables/deployments.
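A matching sketch of the search side. ctx.vectorSearch is Convex's actual vector search API; the Cohere call and names are the same assumptions as before:

// convex/search.ts: sketch of conversation-scoped vector search.
"use node";
import { action } from "./_generated/server";
import { v } from "convex/values";
import { CohereClient } from "cohere-ai";

const cohere = new CohereClient({ token: process.env.COHERE_API_KEY! });

export const searchMessages = action({
  args: { conversationId: v.id("conversations"), query: v.string() },
  handler: async (ctx, args) => {
    // TODO: verify the caller is a participant of this conversation first.
    const res = await cohere.embed({
      texts: [args.query],
      model: "embed-v4.0", // assumed, see above
      inputType: "search_query",
    });
    const vector = (res.embeddings as number[][])[0];
    // ANN (cosine similarity) search, pre-filtered to this conversation.
    const results = await ctx.vectorSearch("messageEmbeddings", "by_embedding", {
      vector,
      limit: 10,
      filter: (q) => q.eq("conversationId", args.conversationId),
    });
    return results; // [{ _id, _score }]; fetch the message bodies via a query
  },
});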
Auth Setup
- Convex password auth set up, email & password only (Convex Auth has many more options like OAuth, magic links, etc. that can be easily integrated)
- Convex Auth issues JWT access tokens and refresh tokens (each refresh token is single-use; when a new JWT is issued, a fresh JWT + refresh token pair is sent). The access token is sent over the WebSocket connection to the backend for authentication. Both are stored in localStorage for persistence.
- Wrapped main.tsx (frontend) in ConvexAuthProvider
- Added authTables to the schema (authTables is provided by Convex Auth)
- Added auth checks in queries and mutations (a sketch of the requireAuth helper is below)
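A minimal sketch of the requireAuth() helper the notes mention, built on Convex Auth's getAuthUserId; the project's actual implementation may differ:

// convex/auth.ts: sketch of the shared auth check.
import { getAuthUserId } from "@convex-dev/auth/server";
import type { QueryCtx, MutationCtx } from "./_generated/server";

export async function requireAuth(ctx: QueryCtx | MutationCtx) {
  const userId = await getAuthUserId(ctx);
  if (userId === null) {
    throw new Error("Not authenticated");
  }
  return userId;
}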
Analysis
This architecture, although heavily abstracted, is similar to a backend connected to Postgres. The abstraction has very minimal performance overhead compared to building an Express backend ourselves, and it comes with a bunch of QOL improvements, especially for an app like this that requires real-time features.
Break point estimation
The most critical part of the app : Database Connection & Query Performance.
Primary Limiting Factor: CPU
Processing data, auth checks, DB queries, WebSocket pushes, etc.: that is a lot of work per message.
- Let's assume x ms of CPU time is consumed per message
- Since we have 2 vCPUs ⇒ 2 × 1000 = 2000 ms of CPU time per second
- That means our server can serve 2000/x messages a second
- A decent assumption might be that the CPU takes 5 ms per message ⇒ 400 messages a second
- Assuming one person sends 2 messages a second ⇒ 200 concurrent users
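The same back-of-envelope estimate as runnable code, so the assumptions are easy to tweak (all numbers are the assumptions above, nothing measured):

// Back-of-envelope CPU throughput estimate.
const vcpus = 2;
const cpuMsPerMessage = 5; // assumed CPU cost per message
const messagesPerUserPerSec = 2; // assumed per-user send rate

const cpuBudgetMsPerSec = vcpus * 1000; // 2000 ms of CPU time per wall-clock second
const messagesPerSec = cpuBudgetMsPerSec / cpuMsPerMessage; // 400 msg/s
const concurrentUsers = messagesPerSec / messagesPerUserPerSec; // 200 users

console.log({ messagesPerSec, concurrentUsers });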
Other parts
- Network : Datacenters have fast links. Even at a very conservative 100 Mbps (~12.5 MB/s), with one message plus all protocol overhead at about 1 KB, that is roughly 12,500 messages a second, so other parts will bottleneck much earlier.
- Disk : Assume 3000 IOPS and 2 IOPS/message: 3000 ÷ 2 = 1500 msg/sec
- Sockets : depends on how many open file descriptors the system allows: ❯ ulimit -n = 1024
Bottleneck Identification
Tools → Prometheus + Grafana: To collect and visualize metrics like request duration, memory usage, and CPU utilization
Application profiling tools might also help reveal bottleneck functions (more CPU time spent in them)
Considering our app, CPU is probably the best guess for the bottleneck
Monitor
- cpu_usage_percent for all cores
- load_average_1m/5m/15m
- iowait (CPU usage and iowait come from /proc/stat; load averages come from /proc/loadavg)
Scaling
Horizontal scaling is the straightforward and easy solution.
Could go either serverless (might not be good for long-lived WebSockets) or servers behind load balancers (cool paper: https://www.usenix.org/conference/nsdi24/presentation/wydrowski)
On application logic layer, heavy loads are already async.
A side note about vector searches : they are the main bottleneck right now. Currently an embedding is generated after each message, which is not ideal:
- It's impractical for any user to search for a message immediately after sending it
- Batch embedding requests are cheaper
Therefore, do not send embedding requests immediately; batch them instead (a sketch follows).
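A minimal sketch of the deferred, batched variant: a Convex cron kicks off a hypothetical internal action (embedPendingBatch) that finds messages without embeddings and embeds them in one Cohere call. listUnembedded is also hypothetical, and the batch size limit may differ by model:

// convex/crons.ts: sketch; embedPendingBatch is a hypothetical internal action.
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();

// Every few minutes, embed all messages that don't yet have a row in
// messageEmbeddings, in one batched Cohere request instead of one per message.
crons.interval(
  "embed pending messages",
  { minutes: 5 },
  internal.embeddings.embedPendingBatch,
);

export default crons;

// In convex/embeddings.ts (same assumptions as the earlier sketch):
export const embedPendingBatch = internalAction({
  args: {},
  handler: async (ctx) => {
    // Hypothetical internal query returning messages missing embeddings.
    const pending = await ctx.runQuery(internal.messagesDb.listUnembedded, { limit: 96 });
    if (pending.length === 0) return;
    const res = await cohere.embed({
      texts: pending.map((m) => m.body), // one API call for the whole batch
      model: "embed-v4.0", // assumed
      inputType: "search_document",
    });
    const embeddings = res.embeddings as number[][];
    for (let i = 0; i < pending.length; i++) {
      await ctx.runMutation(internal.messagesDb.storeEmbedding, {
        messageId: pending[i]._id,
        conversationId: pending[i].conversationId,
        senderId: pending[i].senderId,
        embedding: embeddings[i],
      });
    }
  },
});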