Chat Application Architecture

Build room-based messaging with ordering guarantees and reconnect resilience.

Quick take: Use WebSocket for active rooms and recover from snapshot + cursor on reconnect. Sequence IDs prevent out-of-order rendering.

Production notes: Persist the send queue locally during transient disconnects. Throttle typing indicators and presence fanout.

TL;DR

Use sequenced events over WebSocket for active rooms, plus cursor-based history and deterministic reconnect.

Steel thread

Open room, send one message, receive echo and teammate reply, then reload and replay latest history window.

C - Collect

  • What ordering guarantees are required per room?
  • Is message edit/delete in scope for V1?
  • Which accessibility requirements apply for live announcements?

C - Component structure

  • RoomPage coordinates connection lifecycle.
  • MessageList handles virtualization and anchor positioning.
  • Composer owns input and optimistic send queue.
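
The Composer's optimistic send queue can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: `createSendQueue`, `transmit`, and the `local-N` ID scheme are hypothetical names introduced here, and a real client would likely generate UUIDs and persist the map locally.

```typescript
// Sketch of an optimistic send queue. All names are hypothetical.
type PendingStatus = "sending" | "failed";

interface PendingMessage {
  clientId: string; // client-generated ID, reused on every retry
  body: string;
  status: PendingStatus;
  attempts: number;
}

// `transmit` stands in for whatever writes a message.send frame to the socket.
type Transmit = (clientId: string, body: string) => void;

function createSendQueue(transmit: Transmit) {
  const pending = new Map<string, PendingMessage>();
  let counter = 0; // assumption: a real client would use a UUID instead

  return {
    // Record the message as pending first, then hand it to the transport.
    enqueue(body: string): string {
      counter += 1;
      const clientId = `local-${counter}`;
      pending.set(clientId, { clientId, body, status: "sending", attempts: 1 });
      transmit(clientId, body);
      return clientId;
    },
    // Called on timeout or socket error; the entry stays so the UI can show "failed".
    markFailed(clientId: string): void {
      const entry = pending.get(clientId);
      if (entry !== undefined) entry.status = "failed";
    },
    // Retry reuses the original clientId so the receiver can dedupe.
    retry(clientId: string): boolean {
      const entry = pending.get(clientId);
      if (entry === undefined || entry.status !== "failed") return false;
      entry.status = "sending";
      entry.attempts += 1;
      transmit(entry.clientId, entry.body);
      return true;
    },
    // A message.ack removes the pending entry; a repeated ack is a no-op.
    acknowledge(clientId: string): boolean {
      return pending.delete(clientId);
    },
    pendingCount(): number {
      return pending.size;
    },
    statusOf(clientId: string): PendingStatus | undefined {
      return pending.get(clientId)?.status;
    },
  };
}
```

Reusing the same client-generated ID across retries is the design choice that matters here: it gives the server (or the receiving client) a stable key for dropping repeated sends.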

D - Data modeling

  • Message entities keyed by messageId.
  • Per-room ordered index keyed by monotonic sequence.
  • Pending outbound map keyed by client-generated IDs.
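
The three structures above might look like this in a client store. This is a sketch only: the field names (`authorId`, `body`, `idBySequence`, and so on) are assumptions, as is the premise that the server assigns a strictly increasing `sequence` per room.

```typescript
interface Message {
  messageId: string;
  sequence: number; // assumption: assigned by the server, monotonic per room
  authorId: string;
  body: string;
}

interface PendingOutbound {
  clientId: string; // client-generated ID
  body: string;
}

interface RoomState {
  messagesById: Map<string, Message>; // entities keyed by messageId
  sequences: number[]; // ascending per-room index
  idBySequence: Map<number, string>; // sequence -> messageId
  pendingByClientId: Map<string, PendingOutbound>;
}

function createRoomState(): RoomState {
  return {
    messagesById: new Map(),
    sequences: [],
    idBySequence: new Map(),
    pendingByClientId: new Map(),
  };
}

// Inserts a message at its sorted position.
// Returns false when the message is a duplicate (same ID or same sequence).
function insertMessage(state: RoomState, message: Message): boolean {
  if (state.messagesById.has(message.messageId)) return false;
  if (state.idBySequence.has(message.sequence)) return false;

  // Binary search for the first index whose sequence is >= the new one.
  let low = 0;
  let high = state.sequences.length;
  while (low < high) {
    const mid = (low + high) >>> 1;
    const midSequence = state.sequences[mid];
    if (midSequence !== undefined && midSequence < message.sequence) {
      low = mid + 1;
    } else {
      high = mid;
    }
  }
  state.sequences.splice(low, 0, message.sequence);
  state.idBySequence.set(message.sequence, message.messageId);
  state.messagesById.set(message.messageId, message);
  return true;
}

// Walks the ordered index to produce the render list.
function messagesInOrder(state: RoomState): Message[] {
  const result: Message[] = [];
  for (const sequence of state.sequences) {
    const id = state.idBySequence.get(sequence);
    const message = id === undefined ? undefined : state.messagesById.get(id);
    if (message !== undefined) result.push(message);
  }
  return result;
}
```

Keeping entities and ordering separate means an edit to a message body touches only `messagesById`, while the render order stays derived from sequence alone.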

A - API design

  • GET /rooms/:id/messages?cursor=... for history slices.
  • WebSocket message.send, message.ack, message.event.
  • Reconnect endpoint returns snapshot and latest sequence token.
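
To make the API surface concrete, here is one hypothetical shape for the frames and the history URL. Only the route and the three frame names come from the list above; every field name and helper (`historyUrl`, `encodeSend`, `parseServerFrame`) is an assumption for illustration.

```typescript
// Frame shapes are assumptions; only the frame names appear in the design above.
interface SendFrame {
  type: "message.send";
  clientId: string;
  roomId: string;
  body: string;
}

interface AckFrame {
  type: "message.ack";
  clientId: string;
  messageId: string;
  sequence: number;
}

interface EventFrame {
  type: "message.event";
  messageId: string;
  sequence: number;
  authorId: string;
  body: string;
}

type ServerFrame = AckFrame | EventFrame;

// Builds GET /rooms/:id/messages?cursor=... ; omitting the cursor asks for the newest slice.
function historyUrl(roomId: string, cursor?: string): string {
  const path = `/rooms/${encodeURIComponent(roomId)}/messages`;
  return cursor === undefined ? path : `${path}?cursor=${encodeURIComponent(cursor)}`;
}

function encodeSend(frame: SendFrame): string {
  return JSON.stringify(frame);
}

// Validates an incoming text frame; returns null for anything malformed or unknown.
function parseServerFrame(raw: string): ServerFrame | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const fields = data as Record<string, unknown>;
  const type = fields["type"];
  const messageId = fields["messageId"];
  const sequence = fields["sequence"];
  if (typeof messageId !== "string" || typeof sequence !== "number") return null;

  if (type === "message.ack") {
    const clientId = fields["clientId"];
    if (typeof clientId !== "string") return null;
    return { type: "message.ack", clientId, messageId, sequence };
  }
  if (type === "message.event") {
    const authorId = fields["authorId"];
    const body = fields["body"];
    if (typeof authorId !== "string" || typeof body !== "string") return null;
    return { type: "message.event", messageId, sequence, authorId, body };
  }
  return null;
}
```

Carrying `clientId` on the ack lets the client match the server-assigned `messageId` and `sequence` back to the optimistic entry it created.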

O - Optimization

  • Coalesce typing and presence events to reduce chatter.
  • Virtualize histories that grow beyond the latest 100 messages.
  • Announce only high-signal live updates for screen-reader users.
  • See realtime-transport-sse-websocket-long-poll for selecting the active transport.
  • See optimistic-ui-rollback for pending sends and retry/failed states.
  • See pagination-offset-cursor-infinite for deterministic history retrieval.
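
Coalescing typing and presence events could be approached as below. This is a rough sketch: `createTypingThrottle` and `coalescePresence` are hypothetical helpers, the window length is whatever the caller chooses, and the clock is passed in so the behavior stays easy to test.

```typescript
// Allows at most one typing signal per user per window.
// The caller passes the current time so no real timer is needed.
function createTypingThrottle(windowMs: number) {
  const lastSentAt = new Map<string, number>();
  return function shouldSend(userId: string, nowMs: number): boolean {
    const last = lastSentAt.get(userId);
    if (last !== undefined && nowMs - last < windowMs) return false;
    lastSentAt.set(userId, nowMs);
    return true;
  };
}

type Presence = "online" | "away" | "offline";

interface PresenceUpdate {
  userId: string;
  status: Presence;
}

// Collapses a burst of presence updates so only the latest status per user is fanned out.
function coalescePresence(updates: PresenceUpdate[]): PresenceUpdate[] {
  const latest = new Map<string, Presence>();
  for (const update of updates) {
    latest.set(update.userId, update.status);
  }
  const result: PresenceUpdate[] = [];
  latest.forEach((status, userId) => {
    result.push({ userId, status });
  });
  return result;
}
```

A caller would typically gate each keystroke-triggered signal through `shouldSend` and flush buffered presence updates through `coalescePresence` on a fixed interval.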

Failure modes

  • Sequence gaps appear after reconnect and messages render out of order.
  • Duplicate outbound sends after retry produce repeated bubbles.
  • Aggressive live-region updates interrupt screen-reader workflow.
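
One way to guard against the first failure mode is to hold events that arrive ahead of a gap and release them only once the gap is filled. The sketch below assumes sequences increase by exactly 1 per room; `createGapBuffer` and `missingRange` are hypothetical names, and the range it reports would be fetched through the history or reconnect endpoint.

```typescript
interface SequencedEvent {
  sequence: number;
  messageId: string;
}

// Holds events that arrive ahead of a gap and releases them in order.
// Assumption: sequences are contiguous integers within a room.
function createGapBuffer(lastAppliedSequence: number) {
  let nextExpected = lastAppliedSequence + 1;
  const held = new Map<number, SequencedEvent>();

  return {
    // Returns the events that are now safe to render, in sequence order.
    accept(event: SequencedEvent): SequencedEvent[] {
      // Already applied or already held: drop as a duplicate.
      if (event.sequence < nextExpected || held.has(event.sequence)) return [];
      held.set(event.sequence, event);

      const ready: SequencedEvent[] = [];
      let current = held.get(nextExpected);
      while (current !== undefined) {
        ready.push(current);
        held.delete(nextExpected);
        nextExpected += 1;
        current = held.get(nextExpected);
      }
      return ready;
    },
    // The inclusive range to backfill, or null when nothing is waiting.
    missingRange(): { from: number; to: number } | null {
      if (held.size === 0) return null;
      let lowestHeld = Infinity;
      held.forEach((_event, sequence) => {
        if (sequence < lowestHeld) lowestHeld = sequence;
      });
      return { from: nextExpected, to: lowestHeld - 1 };
    },
  };
}
```

Because nothing past a gap is released, the list never renders out of order; the cost is added latency until the backfill arrives, which is why gap-recovery latency is worth tracking.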

Testing checklist

Testing + Accessibility Rubric

Category | Level | Requirement | Done When
functional | unit | Incoming events respect sequence and dedupe semantics. | Out-of-order, duplicate, and missing event tests are all covered.
functional | integration | Reconnect path correctly hydrates snapshot and applies deltas. | Reconnect test proves no duplicate rows or missing updates.
a11y | integration | Live updates are announced without overwhelming screen readers. | Announcement batching and wording pass manual SR checks.
a11y | e2e | Focus remains stable when real-time updates occur. | Keyboard-only user can continue primary task uninterrupted.
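
The announcement batching mentioned in the a11y rows could be as simple as summarizing a batch of arrivals into one sentence. This is an illustrative sketch: `summarizeForLiveRegion` and its wording are assumptions, and the resulting string is meant to be written to a polite `aria-live` region once per batch rather than once per message.

```typescript
// Turns the author names of a batch of new messages into one announcement.
// Returns an empty string when there is nothing to announce.
function summarizeForLiveRegion(authorNames: string[]): string {
  if (authorNames.length === 0) return "";

  // Preserve first-seen order while removing repeated authors.
  const unique: string[] = [];
  for (const name of authorNames) {
    if (unique.indexOf(name) === -1) unique.push(name);
  }

  const noun = authorNames.length === 1 ? "new message" : "new messages";
  const first = unique[0] ?? "";
  let from: string;
  if (unique.length === 1) {
    from = first;
  } else if (unique.length === 2) {
    from = `${first} and ${unique[1] ?? ""}`;
  } else {
    from = `${first} and ${unique.length - 1} others`;
  }
  return `${authorNames.length} ${noun} from ${from}`;
}
```

The exact wording still needs the manual screen-reader check called for in the rubric; the function only guarantees one announcement per batch.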

Explain it clearly

  • Short version: room connection, sequence ordering, reconnect from snapshot.
  • Longer version: transport fallback, pending queue semantics, and a11y announcements.

Production hardening notes

  • Track reconnect success rate and sequence-gap recovery latency.
  • Canary new transport behaviors by region before global rollout.

Related Patterns