Build room-based messaging with ordering guarantees and reconnect resilience.
Quick take: Use WebSocket for active rooms and recover from snapshot + cursor on reconnect. / Sequence IDs prevent out-of-order rendering.
Production notes: Persist the send queue locally during transient disconnects. / Throttle typing indicators and presence fanout.
TL;DR
Use sequenced events over WebSocket for active rooms, plus cursor-based history and deterministic reconnect.
Steel thread
Open room, send one message, receive echo and teammate reply, then reload and replay latest history window.
C - Collect
- What ordering guarantees are required per room?
- Is message edit/delete in scope for V1?
- Which accessibility requirements apply for live announcements?
C - Component structure
- RoomPage coordinates connection lifecycle.
- MessageList handles virtualization and anchor positioning.
- Composer owns input and optimistic send queue.
D - Data modeling
- Message entities keyed by messageId.
- Per-room ordered index keyed by monotonic sequence.
- Pending outbound map keyed by client-generated IDs.
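The three structures above can be sketched as a small in-memory store. This is a minimal illustration, not a confirmed schema: the field names (authorId, body, status), the RoomState shape, and the helpers emptyRoom, insertMessage, and orderedMessages are all assumptions introduced here.

```typescript
// Hypothetical shapes; only messageId and sequence come from the notes above.
type Message = {
  messageId: string;
  sequence: number; // assumed to be assigned by the server, monotonic per room
  authorId: string;
  body: string;
};

type PendingSend = { clientId: string; body: string; status: "sending" | "failed" };

type RoomState = {
  byId: Map<string, Message>;        // message entities keyed by messageId
  order: number[];                   // sequence numbers, kept ascending
  idBySequence: Map<number, string>; // ordered index: sequence -> messageId
  pending: Map<string, PendingSend>; // outbound sends keyed by client-generated ID
};

function emptyRoom(): RoomState {
  return { byId: new Map(), order: [], idBySequence: new Map(), pending: new Map() };
}

// Inserts a message at its sorted position. Returns false for duplicates
// (same messageId or same sequence), so redelivery never adds a second row.
function insertMessage(state: RoomState, msg: Message): boolean {
  if (state.byId.has(msg.messageId) || state.idBySequence.has(msg.sequence)) {
    return false;
  }
  state.byId.set(msg.messageId, msg);
  state.idBySequence.set(msg.sequence, msg.messageId);

  // Binary search for the first position whose sequence is >= msg.sequence.
  let lo = 0;
  let hi = state.order.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    const midSeq = state.order[mid] as number;
    if (midSeq < msg.sequence) lo = mid + 1;
    else hi = mid;
  }
  state.order.splice(lo, 0, msg.sequence);
  return true;
}

// Reads messages back in sequence order, regardless of arrival order.
function orderedMessages(state: RoomState): Message[] {
  const out: Message[] = [];
  for (const seq of state.order) {
    const id = state.idBySequence.get(seq);
    const msg = id === undefined ? undefined : state.byId.get(id);
    if (msg) out.push(msg);
  }
  return out;
}
```

Keeping entities and ordering separate means a later edit or delete only touches byId, while rendering always walks order.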
A - API design
- GET /rooms/:id/messages?cursor=... for history slices.
- WebSocket message.send, message.ack, message.event.
- Reconnect endpoint returns snapshot and latest sequence token.
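One way the client might combine the reconnect snapshot with events buffered from the socket is sketched below. It assumes sequences are contiguous integers and that the snapshot's latest sequence token is a plain number. The RoomEvent, Snapshot, and Recovery types and the recover function are hypothetical names, not part of any confirmed API.

```typescript
// Hypothetical payload shapes for illustration only.
type RoomEvent = { messageId: string; sequence: number; body: string };
type Snapshot = { messages: RoomEvent[]; latestSequence: number };
type Recovery = {
  messages: RoomEvent[];  // snapshot plus every delta that could be applied
  lastApplied: number;    // highest sequence applied without a gap
  gapAt: number | null;   // first missing sequence, or null if none
};

// Hydrates from the snapshot, then applies buffered deltas in sequence order.
// Deltas already covered by the snapshot are dropped; the first gap stops the
// replay so the caller can refetch instead of rendering out of order.
function recover(snapshot: Snapshot, buffered: RoomEvent[]): Recovery {
  const messages = [...snapshot.messages].sort((a, b) => a.sequence - b.sequence);
  const deltas = [...buffered].sort((a, b) => a.sequence - b.sequence);
  let lastApplied = snapshot.latestSequence;
  let gapAt: number | null = null;

  for (const ev of deltas) {
    if (ev.sequence <= lastApplied) continue; // duplicate or already in snapshot
    if (ev.sequence !== lastApplied + 1) {
      gapAt = lastApplied + 1; // assumption: sequences are contiguous integers
      break;
    }
    messages.push(ev);
    lastApplied = ev.sequence;
  }
  return { messages, lastApplied, gapAt };
}
```

When gapAt is not null, a reasonable next step is to request a history slice starting from lastApplied rather than guessing at the missing events.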
O - Optimization
- Coalesce typing and presence events to reduce chatter.
- Virtualize long histories once they extend beyond the latest 100 messages.
- Announce only high-signal live updates for screen-reader users.
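The coalescing of typing events mentioned above could look like the leading-edge throttle below. It is a sketch under assumptions: createTypingThrottle, its parameters, and the 3000 ms interval in the usage note are illustrative choices, and the clock is injectable only to make the behavior testable.

```typescript
// Returns a notifier that forwards at most one typing signal per room
// within each minIntervalMs window and drops the rest.
function createTypingThrottle(
  send: (roomId: string) => void,
  minIntervalMs: number,
  now: () => number = Date.now,
): (roomId: string) => boolean {
  const lastSent = new Map<string, number>();

  return function notifyTyping(roomId: string): boolean {
    const t = now();
    const prev = lastSent.get(roomId);
    if (prev !== undefined && t - prev < minIntervalMs) {
      return false; // coalesced: a signal for this room went out recently
    }
    lastSent.set(roomId, t);
    send(roomId);
    return true;
  };
}
```

Calling the notifier on every keystroke with, say, a 3000 ms interval would then emit one typing event per room every three seconds at most, however fast the user types.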
Pattern links
- realtime-transport-sse-websocket-long-poll to select active transport.
- optimistic-ui-rollback for pending sends and retry/failed states.
- pagination-offset-cursor-infinite for deterministic history retrieval.
Failure modes
- Sequence gaps appear after reconnect and messages render out of order.
- Duplicate outbound sends after retry produce repeated bubbles.
- Aggressive live-region updates interrupt screen-reader workflow.
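The duplicate-send failure mode is commonly avoided by reusing one client-generated ID across retries. The sketch below assumes the server treats that ID as an idempotency key, which the notes above do not confirm. createSendQueue, Outbound, and the method names are hypothetical.

```typescript
type Outbound = {
  clientId: string;
  body: string;
  status: "sending" | "failed";
  attempts: number;
};

// Tracks optimistic sends keyed by client-generated ID. `transmit` stands in
// for whatever writes message.send to the socket.
function createSendQueue(transmit: (clientId: string, body: string) => void) {
  const pending = new Map<string, Outbound>();

  return {
    // Queues and transmits once; a repeated call with the same ID is a no-op.
    send(clientId: string, body: string): void {
      if (pending.has(clientId)) return;
      pending.set(clientId, { clientId, body, status: "sending", attempts: 1 });
      transmit(clientId, body);
    },
    // Marks a send as failed, for example after a timeout or socket close.
    markFailed(clientId: string): void {
      const item = pending.get(clientId);
      if (item) item.status = "failed";
    },
    // Retransmits with the SAME clientId so the server can dedupe.
    retry(clientId: string): boolean {
      const item = pending.get(clientId);
      if (!item || item.status !== "failed") return false;
      item.status = "sending";
      item.attempts += 1;
      transmit(item.clientId, item.body);
      return true;
    },
    // Removes the pending bubble on message.ack; false means a repeated ack.
    ack(clientId: string): boolean {
      return pending.delete(clientId);
    },
    pendingBubbles(): Outbound[] {
      return Array.from(pending.values());
    },
  };
}
```

Because the pending entry is removed on the first ack and later acks are ignored, a retried message resolves to a single confirmed bubble on the client.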
Testing checklist
Testing + Accessibility Rubric
| Category | Level | Requirement | Done When |
|---|---|---|---|
| functional | unit | Incoming events respect sequence and dedupe semantics. | Out-of-order, duplicate, and missing event tests are all covered. |
| functional | integration | Reconnect path correctly hydrates snapshot and applies deltas. | Reconnect test proves no duplicate rows or missing updates. |
| a11y | integration | Live updates are announced without overwhelming screen readers. | Announcement batching and wording pass manual SR checks. |
| a11y | e2e | Focus remains stable when real-time updates occur. | Keyboard-only user can continue primary task uninterrupted. |
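For the announcement batching row, one possible approach is to collapse each batch of incoming messages into a single string before it is written to a polite live region. The wording policy, the Incoming shape, and summarizeForLiveRegion below are assumptions for illustration; actual wording should come out of the manual screen-reader checks the rubric calls for.

```typescript
type Incoming = { author: string; body: string; mentionsMe: boolean };

// Produces at most one announcement per batch. Mentions are treated as
// high-signal and read in full; everything else is summarized by count.
function summarizeForLiveRegion(batch: Incoming[]): string | null {
  const first = batch[0];
  if (first === undefined) return null; // nothing to announce

  const mention = batch.find((m) => m.mentionsMe);
  if (mention) return `${mention.author} mentioned you: ${mention.body}`;

  if (batch.length === 1) return `New message from ${first.author}: ${first.body}`;

  const authors = new Set(batch.map((m) => m.author));
  const who = authors.size === 1 ? first.author : `${authors.size} people`;
  return `${batch.length} new messages from ${who}`;
}
```

The caller would collect messages over a short window, pass them in as one batch, and set the result as the live region's text only when it is not null.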
Explain it clearly
- Short version: room connection, sequence ordering, reconnect from snapshot.
- Longer version: transport fallback, pending queue semantics, and a11y announcements.
Production hardening notes
- Track reconnect success rate and sequence-gap recovery latency.
- Canary new transport behaviors by region before global rollout.