Whiteboard Collaboration Architecture

Build a collaborative whiteboard with low-latency drawing sync and conflict-resilient state.

Quick take

  • Use local-first drawing with server reconciliation.
  • Model the stroke stream with snapshot + delta recovery.

Production notes

  • Persist periodic snapshots for fast room hydration.
  • Backpressure high-frequency events on constrained networks.

TL;DR

Render strokes locally first, sync deltas over low-latency transport, and recover state from snapshot plus operation log.

Steel thread

Draw one stroke, broadcast it to a collaborator, undo it, and recover an identical canvas after reconnecting.

C - Collect

  • What maximum collaboration room size should V1 support?
  • Is strict ordering needed across all tool types?
  • Which non-pointer interactions are required for accessibility?

C - Component structure

  • WhiteboardPage coordinates room and connection state.
  • CanvasStage renders local and remote strokes.
  • ToolPanel manages drawing mode and undo/redo commands.

D - Data modeling

  • Stroke, Layer, Operation, and Snapshot entities.
  • Keep an append-only operation log with sequence numbers.
  • Maintain per-user cursors/presence metadata separately from drawing model.
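
The entities above can be sketched as plain types plus a pure reducer that replays the operation log on top of a snapshot. This is a minimal sketch, not a definitive schema: every field name (`layerId`, `authorId`, `targetStrokeId`) and both function names are assumptions made for illustration.

```typescript
// Hypothetical shapes: field names are illustrative assumptions.
type Point = { x: number; y: number };

interface Stroke {
  id: string;
  layerId: string;
  authorId: string;
  points: Point[];
}

type Operation =
  | { seq: number; type: "stroke.create"; stroke: Stroke }
  | { seq: number; type: "operation.undo"; targetStrokeId: string };

interface Snapshot {
  seq: number; // sequence number of the last operation folded in
  strokes: Stroke[]; // visible strokes in draw order
}

// Pure reducer: returns a new snapshot and never mutates its input.
function applyOperation(state: Snapshot, op: Operation): Snapshot {
  // Operations at or below the snapshot's sequence are already applied.
  if (op.seq <= state.seq) return state;
  if (op.type === "stroke.create") {
    return { seq: op.seq, strokes: [...state.strokes, op.stroke] };
  }
  return {
    seq: op.seq,
    strokes: state.strokes.filter((s) => s.id !== op.targetStrokeId),
  };
}

// Rebuild the canvas state from a snapshot plus the operations after it.
function replay(snapshot: Snapshot, ops: Operation[]): Snapshot {
  const ordered = [...ops].sort((a, b) => a.seq - b.seq);
  return ordered.reduce<Snapshot>(
    (state, op) => applyOperation(state, op),
    snapshot,
  );
}
```

Because this snapshot keeps only visible strokes, an undone stroke cannot be restored after compaction. A real design would need to keep undo history somewhere, for example by retaining a tail of the operation log.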

A - API design

  • GET /boards/:id/snapshot and GET /boards/:id/ops?since=...
  • WebSocket event channels for stroke.create, operation.undo, presence.update
  • POST /boards/:id/snapshot for periodic compaction.
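
The three WebSocket event names above map naturally onto a discriminated union, with a parser that rejects malformed messages before they reach the canvas. The sketch below assumes JSON-encoded messages. The payload fields (`seq`, `strokeId`, `targetStrokeId`, `userId`, `x`, `y`) and the function name `parseServerEvent` are hypothetical, since the source names only the event types.

```typescript
// Hypothetical payloads: only the `type` values come from the event list above.
type ServerEvent =
  | { type: "stroke.create"; seq: number; strokeId: string }
  | { type: "operation.undo"; seq: number; targetStrokeId: string }
  | { type: "presence.update"; userId: string; x: number; y: number };

// Returns a typed event, or null when the message is not valid JSON
// or does not match one of the known shapes.
function parseServerEvent(raw: string): ServerEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;

  const msg = data as Record<string, unknown>;
  const type = msg["type"];
  const seq = msg["seq"];
  const strokeId = msg["strokeId"];
  const targetStrokeId = msg["targetStrokeId"];
  const userId = msg["userId"];
  const x = msg["x"];
  const y = msg["y"];

  if (type === "stroke.create" && typeof seq === "number" && typeof strokeId === "string") {
    return { type: "stroke.create", seq, strokeId };
  }
  if (type === "operation.undo" && typeof seq === "number" && typeof targetStrokeId === "string") {
    return { type: "operation.undo", seq, targetStrokeId };
  }
  if (
    type === "presence.update" &&
    typeof userId === "string" &&
    typeof x === "number" &&
    typeof y === "number"
  ) {
    return { type: "presence.update", userId, x, y };
  }
  return null;
}
```

Note that presence events carry no sequence number in this sketch, which matches the idea of keeping presence metadata outside the ordered drawing model.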

O - Optimization

  • Batch points into stroke segments to reduce network overhead.
  • Use dirty-rectangle rendering for large canvases.
  • Provide keyboard shortcuts and text alternatives for tool changes.
  • See realtime-transport-sse-websocket-long-poll for event delivery trade-offs.
  • See optimistic-ui-rollback for local-first drawing and conflict handling.
  • See a11y-testing-strategy-dynamic-uis for non-visual collaboration support.
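
The first two optimizations in this list, point batching and dirty-rectangle rendering, can be illustrated with two small pure functions. This is a sketch under assumptions: the names `batchPoints` and `dirtyRect`, the shared-endpoint rule between segments, and padding by half the stroke width are all illustrative choices rather than a prescribed approach.

```typescript
type Point = { x: number; y: number };
type Rect = { x: number; y: number; width: number; height: number };

// Split a point stream into segments of at most `size` points.
// Adjacent segments share one point so the rendered line has no gaps.
function batchPoints(points: Point[], size: number): Point[][] {
  if (size < 2) throw new Error("size must be at least 2");
  if (points.length === 1) return [points.slice()]; // a single dot
  const segments: Point[][] = [];
  for (let start = 0; start < points.length - 1; start += size - 1) {
    segments.push(points.slice(start, start + size));
  }
  return segments;
}

// Bounding box of a segment, padded by half the stroke width on each side,
// so only this region of the canvas needs to be cleared and redrawn.
function dirtyRect(points: Point[], strokeWidth: number): Rect | null {
  if (points.length === 0) return null;
  let minX = Infinity;
  let minY = Infinity;
  let maxX = -Infinity;
  let maxY = -Infinity;
  for (const p of points) {
    minX = Math.min(minX, p.x);
    minY = Math.min(minY, p.y);
    maxX = Math.max(maxX, p.x);
    maxY = Math.max(maxY, p.y);
  }
  const pad = strokeWidth / 2;
  return {
    x: minX - pad,
    y: minY - pad,
    width: maxX - minX + strokeWidth,
    height: maxY - minY + strokeWidth,
  };
}
```

Each batched segment can be sent as one message and redrawn within its own rectangle, so network and paint costs scale with the segment rather than with the whole canvas.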

Failure modes

  • Network jitter reorders operations and distorts final stroke shape.
  • Snapshot compaction drops undo history unexpectedly.
  • Presence and update announcements overwhelm assistive technologies.
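
For the first failure mode, one common mitigation is a client-side buffer that holds out-of-order operations until the gap is filled and drops duplicates. The sketch below assumes the server assigns contiguous integer sequence numbers. The class name `Resequencer` and the `missingFrom` accessor are hypothetical names used only for this example.

```typescript
class Resequencer<T extends { seq: number }> {
  private nextSeq: number;
  private readonly pending = new Map<number, T>();
  private readonly deliver: (op: T) => void;

  constructor(lastAppliedSeq: number, deliver: (op: T) => void) {
    this.nextSeq = lastAppliedSeq + 1;
    this.deliver = deliver;
  }

  receive(op: T): void {
    // Already applied or already buffered: treat as a duplicate.
    if (op.seq < this.nextSeq || this.pending.has(op.seq)) return;
    this.pending.set(op.seq, op);

    // Deliver every operation that is now contiguous with the applied prefix.
    let next = this.pending.get(this.nextSeq);
    while (next !== undefined) {
      this.pending.delete(this.nextSeq);
      this.nextSeq += 1;
      this.deliver(next);
      next = this.pending.get(this.nextSeq);
    }
  }

  // First missing sequence number while a gap blocks delivery, else null.
  // A caller could use this to request the missing operations.
  get missingFrom(): number | null {
    return this.pending.size > 0 ? this.nextSeq : null;
  }
}
```

A buffer like this only fixes ordering in the shared log. Points inside a single stroke still need to be applied in the order they were drawn, which is easier when each batched segment is one sequenced operation.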

Testing checklist

Testing + Accessibility Rubric

Category   | Level       | Requirement                                                       | Done When
-----------+-------------+-------------------------------------------------------------------+------------------------------------------------------------------
functional | unit        | Incoming events respect sequence and dedupe semantics.            | Out-of-order, duplicate, and missing event tests are all covered.
functional | integration | Reconnect path correctly hydrates snapshot and applies deltas.    | Reconnect test proves no duplicate rows or missing updates.
a11y       | integration | Live updates are announced without overwhelming screen readers.   | Announcement batching and wording pass manual SR checks.
a11y       | e2e         | Focus remains stable when real-time updates occur.                | Keyboard-only user can continue primary task uninterrupted.
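
The announcement-batching requirement in the rubric can be made unit-testable by separating the batching logic from the DOM. The sketch below collects collaborator activity and produces one summary string per flush. The class name `AnnouncementBatcher`, the event shape, and the wording are assumptions; in a browser the returned string would typically be written into an `aria-live="polite"` region on a timer.

```typescript
// Hypothetical event shape: only the collaborator's display name is tracked.
interface ActivityEvent {
  userName: string;
}

class AnnouncementBatcher {
  private queue: ActivityEvent[] = [];

  push(event: ActivityEvent): void {
    this.queue.push(event);
  }

  // Returns one combined message for everything queued since the last flush,
  // or null when there is nothing to announce.
  flush(): string | null {
    if (this.queue.length === 0) return null;

    // Count changes per collaborator; Map keeps first-seen order.
    const counts = new Map<string, number>();
    for (const event of this.queue) {
      counts.set(event.userName, (counts.get(event.userName) ?? 0) + 1);
    }
    this.queue = [];

    const parts = Array.from(counts, ([name, n]) => {
      const noun = n === 1 ? "change" : "changes";
      return `${name} made ${n} ${noun}`;
    });
    return parts.join("; ") + ".";
  }
}
```

Keeping `flush` free of timers and DOM access lets the unit tests cover batching, while manual screen reader checks focus on wording and timing.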

Explain it clearly

  • Short version: local-first canvas, sequenced ops, reconnect via snapshot + delta.
  • Longer version: conflict strategy, compaction policy, and collaboration a11y flow.

Production hardening notes

  • Track op delivery latency, sequence-gap events, and reconnect time-to-consistency.
  • Gate high-frequency collaboration features for low-end devices.
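
One way to apply backpressure to high-frequency events, as called for in the production notes, is an outbound queue that never drops drawing operations but coalesces presence updates so only the latest per user is sent. This is a sketch under assumptions: the class name `OutboundQueue`, the `budget` parameter, and the rule that operations drain before presence are illustrative, not a prescribed policy.

```typescript
interface PresenceUpdate {
  userId: string;
  x: number;
  y: number;
}

class OutboundQueue<Op> {
  private ops: Op[] = [];
  private readonly presence = new Map<string, PresenceUpdate>();

  // Drawing operations are never dropped or reordered.
  enqueueOp(op: Op): void {
    this.ops.push(op);
  }

  // Presence is lossy by design: a newer update replaces the older one.
  enqueuePresence(update: PresenceUpdate): void {
    this.presence.set(update.userId, update);
  }

  // Called when the transport can accept up to `budget` more messages.
  // Operations go first; presence uses whatever budget is left.
  drain(budget: number): { ops: Op[]; presence: PresenceUpdate[] } {
    const ops = this.ops.splice(0, Math.max(0, budget));
    const remaining = Math.max(0, budget) - ops.length;
    const presence: PresenceUpdate[] = [];
    for (const [userId, update] of this.presence) {
      if (presence.length >= remaining) break;
      presence.push(update);
      this.presence.delete(userId);
    }
    return { ops, presence };
  }
}
```

On a constrained network or low-end device the caller can lower `budget` per tick, which delays cursor updates first while stroke data still arrives complete and in order.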

Related Patterns