Companies building SaaS products for distributed teams (real-time collaboration platforms, async-first communication tools, virtual workspace software, and employee experience platforms) face the same set of infrastructure challenges. Real-time presence and messaging at scale require WebSocket connection management across millions of concurrent users. Document co-editing requires operational transformation or CRDT-based conflict resolution. Enterprise customers demand SSO integration, SCIM provisioning, GDPR data residency controls, and SOC 2 compliance before they will deploy to their workforce. Scrums.com builds dedicated engineering teams for product companies building the tools that distributed teams depend on daily.
Remote Work App Development
Build remote work platforms with Scrums.com. Teams for real-time collaboration, WebSocket presence, SCIM provisioning, and SOC 2. Deploy in 21 days.
Real-Time Collaboration Infrastructure
Real-time collaboration at scale requires a pub-sub architecture where WebSocket connections from individual clients connect to stateless gateway servers that subscribe to a shared message broker (Redis Pub/Sub, NATS, or Kafka depending on throughput requirements). This decouples the connection layer from the application logic layer, allowing horizontal scaling of gateway servers without sticky sessions. Each workspace, channel, or document is a pub-sub topic; subscriptions are managed by the gateway server based on the client's authorised resource set.
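As a rough illustration of that decoupling, the sketch below wires two gateway instances to one shared broker. It is a minimal in-process model, not a production design: InMemoryBroker stands in for Redis Pub/Sub, NATS, or Kafka, the outbox dictionary stands in for real WebSocket sends, and every class, method, and topic name is hypothetical. Disconnect handling and unsubscription are omitted.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class InMemoryBroker:
    """Stand-in for the shared message broker: fans a message out to topic subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str, str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str, str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> None:
        for callback in list(self._subscribers[topic]):
            callback(topic, message)


class Gateway:
    """Gateway server: holds only live connections and their topic subscriptions."""

    def __init__(self, broker: InMemoryBroker) -> None:
        self._broker = broker
        self._topic_clients: Dict[str, Set[str]] = defaultdict(set)
        # client_id -> messages delivered; a real gateway would write to the socket.
        self.outbox: Dict[str, List[str]] = defaultdict(list)

    def connect(self, client_id: str, authorised_topics: Set[str]) -> None:
        for topic in authorised_topics:
            if not self._topic_clients[topic]:
                # First local client for this topic: subscribe the gateway once.
                self._broker.subscribe(topic, self._on_message)
            self._topic_clients[topic].add(client_id)

    def _on_message(self, topic: str, message: str) -> None:
        for client_id in self._topic_clients[topic]:
            self.outbox[client_id].append(message)


# Two gateways share one broker; clients may land on either gateway.
broker = InMemoryBroker()
gw_a, gw_b = Gateway(broker), Gateway(broker)
gw_a.connect("alice", {"workspace:1:channel:general"})
gw_b.connect("bob", {"workspace:1:channel:general", "workspace:1:doc:42"})

broker.publish("workspace:1:channel:general", "hello")
broker.publish("workspace:1:doc:42", "edit")
```

Because neither gateway keeps anything beyond its own connections, a client can reconnect to any instance and receive the same topics, which is what makes sticky sessions unnecessary.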
Presence systems (showing which users are online, typing, or active) are one of the harder scalability problems in collaboration platforms. The correct approach is an in-memory presence store updated via periodic heartbeats from the client every 15-30 seconds: either a Redis hash keyed by workspace_id with user_id as the field and the last heartbeat timestamp as the value, or a Redis sorted set keyed by workspace_id with user_id as the member and the timestamp as the score. Presence consumers read from the in-memory store; stale entries with heartbeats older than the TTL are treated as offline. Heartbeat writes should be batched at the gateway server (collect N heartbeats, write one Redis pipeline per interval) to avoid per-connection write amplification at scale.
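The batching idea can be sketched without a Redis dependency. In the example below, PresenceStore is an in-memory stand-in for the Redis hash, write_batch models one pipeline round trip, and the 45-second TTL is an assumed value (slightly more than one missed 30-second heartbeat). All names are hypothetical.

```python
from typing import Dict, List, Tuple

HEARTBEAT_TTL_SECONDS = 45.0  # assumption: tolerate one missed 30-second heartbeat


class PresenceStore:
    """In-memory stand-in for a Redis hash per workspace: user_id -> last heartbeat."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, Dict[str, float]] = {}
        self.write_calls = 0  # counts batched writes, i.e. pipeline round trips

    def write_batch(self, heartbeats: List[Tuple[str, str, float]]) -> None:
        self.write_calls += 1
        for workspace_id, user_id, timestamp in heartbeats:
            self._last_seen.setdefault(workspace_id, {})[user_id] = timestamp

    def is_online(self, workspace_id: str, user_id: str, now: float) -> bool:
        timestamp = self._last_seen.get(workspace_id, {}).get(user_id)
        return timestamp is not None and now - timestamp <= HEARTBEAT_TTL_SECONDS


class HeartbeatBatcher:
    """Gateway-side buffer: collect heartbeats, flush them in one write per interval."""

    def __init__(self, store: PresenceStore) -> None:
        self._store = store
        self._pending: Dict[Tuple[str, str], float] = {}

    def record(self, workspace_id: str, user_id: str, timestamp: float) -> None:
        # Keep only the latest heartbeat per user within the interval.
        self._pending[(workspace_id, user_id)] = timestamp

    def flush(self) -> None:
        if not self._pending:
            return
        batch = [(ws, user, ts) for (ws, user), ts in self._pending.items()]
        self._pending.clear()
        self._store.write_batch(batch)


store = PresenceStore()
batcher = HeartbeatBatcher(store)
for i in range(1000):
    batcher.record("ws_1", f"user_{i}", 100.0)
batcher.flush()  # one write for 1,000 heartbeats
```

In a real gateway, flush would run on a timer, and the timestamps would come from the server clock rather than from the client.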
Document co-editing requires a conflict resolution strategy for concurrent edits. The two established approaches are Operational Transformation (OT, used by Google Docs, complex to implement correctly for all edge cases) and Conflict-Free Replicated Data Types (CRDTs, adopted in full or in adapted form by tools such as Figma and Notion, with simpler correctness guarantees but a larger document state footprint). For most SaaS collaboration tools, using a battle-tested CRDT library such as Yjs or Automerge is preferable to implementing OT from scratch. The CRDT state for each document is stored server-side as the authoritative snapshot, with client updates transmitted as CRDT delta operations, applied server-side, and then broadcast to other connected clients.
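To make the convergence property concrete, here is a deliberately tiny state-based CRDT: a last-writer-wins map in which each key holds a (logical clock, replica id, value) entry. This is not how Yjs or Automerge represent text, and it is not their API; it only illustrates why deltas can be applied in any order and still yield the same server state. The entry layout and all names are assumptions made for this sketch.

```python
from typing import Dict, Tuple

Entry = Tuple[int, str, str]  # (logical clock, replica id, value)
State = Dict[str, Entry]


def merge(a: State, b: State) -> State:
    """Join two states (or a state and a delta): keep the larger entry per key.

    Entries are ordered by (clock, replica id), so ties on the clock are broken
    deterministically and every replica picks the same winner.
    """
    merged = dict(a)
    for key, entry in b.items():
        if key not in merged or entry[:2] > merged[key][:2]:
            merged[key] = entry
    return merged


server: State = {"title": (1, "alice", "Draft")}
delta_alice: State = {"title": (2, "alice", "Q3 plan")}
delta_bob: State = {"title": (2, "bob", "Q3 roadmap"), "owner": (1, "bob", "bob")}

# The two deltas arrive in different orders on two servers.
one = merge(merge(server, delta_alice), delta_bob)
two = merge(merge(server, delta_bob), delta_alice)
```

Both orders converge on the same state, and re-applying a delta that was already merged changes nothing, which is why a redelivered update after a reconnect is harmless.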
Async Communication and Notification Routing
Async-first communication platforms (async video messaging, thread-based discussions, document commenting) require a notification routing layer that handles four delivery modes: real-time push for active users (WebSocket push to connected clients), push notifications for inactive mobile users (APNs for iOS, FCM for Android), email digest for users inactive for more than N hours (configurable per user), and an in-app notification inbox that persists unread notifications for users who come online after the event. The routing decision for each notification is made at dispatch time, based on the recipient's presence state at that moment and their stored notification preferences.
Email digest delivery requires a batch aggregation pipeline: rather than sending one email per notification event, the digest collects all undelivered notifications for a user over a configurable window and sends a single summarised email. The aggregation pipeline must be idempotent: if the digest job re-runs due to failure, it must not re-send already-delivered digests. The authoritative record of digest delivery is a digest_delivery event written to the notification log at send time; the digest job skips users with a delivered digest within the current window.
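A minimal sketch of that idempotency check follows. NotificationLog is an in-memory stand-in for the notification log, the window_id string is an assumed way of naming a digest window, and send_email is a hypothetical callback rather than a real mail API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class NotificationLog:
    """Stand-in for the notification log; field names are hypothetical."""
    undelivered: Dict[str, List[str]] = field(default_factory=dict)     # user_id -> items
    digest_delivery: Set[Tuple[str, str]] = field(default_factory=set)  # (user_id, window_id)


def run_digest_job(
    log: NotificationLog,
    window_id: str,
    send_email: Callable[[str, List[str]], None],
) -> int:
    """Send at most one digest per user per window; safe to re-run after a failure."""
    sent = 0
    for user_id, items in log.undelivered.items():
        if not items or (user_id, window_id) in log.digest_delivery:
            continue  # nothing to send, or already delivered in this window
        send_email(user_id, items)
        # Record the digest_delivery event immediately after sending.
        log.digest_delivery.add((user_id, window_id))
        sent += 1
    return sent


outbox: List[Tuple[str, int]] = []
log = NotificationLog(undelivered={"u1": ["Comment on Spec", "Mention in #design"], "u2": []})


def fake_send(user_id: str, items: List[str]) -> None:
    outbox.append((user_id, len(items)))


first = run_digest_job(log, "2024-06-01", fake_send)
second = run_digest_job(log, "2024-06-01", fake_send)  # simulated re-run
```

The sketch leaves a small gap between sending and recording; in a real system that gap should be as narrow as the mail provider and datastore allow, for example by passing the same (user_id, window_id) pair to the provider as a deduplication key where one is supported.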
Async video message delivery requires: client-side recording (screen, audio, and optional camera), upload to a cloud storage bucket via a pre-signed upload URL (never proxy large binary uploads through the application server), server-side transcoding (AWS Elemental MediaConvert, Mux, or Cloudinary), thumbnail generation, and transcript generation (Amazon Transcribe or OpenAI Whisper). The video record references the original uploaded file, the transcoded delivery formats (HLS for adaptive bitrate streaming), the thumbnail URL, and the transcript blob, with status fields tracking each stage of the processing pipeline. Remote work platform builds are one of the specialist disciplines within our mobile app development service.
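One possible shape for the video record described above is sketched below. The stage names, field names, and the rule that a video is playable once upload and transcoding are done are all assumptions made for illustration, not a prescribed schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class StageStatus(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    DONE = "done"
    FAILED = "failed"


STAGES = ("upload", "transcode", "thumbnail", "transcript")  # assumed stage names


def _initial_stages() -> Dict[str, StageStatus]:
    return {stage: StageStatus.PENDING for stage in STAGES}


@dataclass
class VideoMessage:
    video_id: str
    original_key: str                       # storage key of the uploaded file
    hls_manifest_url: Optional[str] = None  # set once transcoding finishes
    thumbnail_url: Optional[str] = None
    transcript_key: Optional[str] = None
    stages: Dict[str, StageStatus] = field(default_factory=_initial_stages)

    def mark(self, stage: str, status: StageStatus) -> None:
        if stage not in self.stages:
            raise KeyError(f"unknown stage: {stage}")
        self.stages[stage] = status

    @property
    def playable(self) -> bool:
        # Assumption: playback needs only upload and transcode to be done;
        # thumbnail and transcript can arrive later.
        return all(self.stages[s] is StageStatus.DONE for s in ("upload", "transcode"))


video = VideoMessage(video_id="vid_1", original_key="uploads/vid_1/original.webm")
video.mark("upload", StageStatus.DONE)
playable_after_upload = video.playable
video.mark("transcode", StageStatus.DONE)
playable_after_transcode = video.playable
```

Tracking each stage separately lets the client show a playable video while the transcript is still being generated, and lets a failed stage be retried on its own.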
Enterprise Identity, SSO, and Multi-Tenancy
Enterprise SaaS platforms for distributed teams live or die on their SSO and directory integration story. SAML 2.0 SSO is the baseline requirement for enterprise buyers (Okta, Azure AD, Google Workspace, Ping Identity). OIDC is increasingly preferred by modern enterprise IT teams. The SP-initiated SAML flow (user clicks Log in with SSO on your app) is straightforward: store the ID of each AuthnRequest and validate the InResponseTo field of the response against it. The IdP-initiated flow (user clicks an app tile in Okta) requires more careful handling, because the response is unsolicited and carries no InResponseTo value to check. To limit CSRF and replay attacks in IdP-initiated flows, verify the signature, audience, recipient, and validity window of every assertion, and reject any assertion ID that has already been consumed.
SCIM 2.0 (System for Cross-domain Identity Management) is the enterprise standard for automated user provisioning and deprovisioning. A SCIM endpoint on your platform allows the enterprise's IdP to create users, update user attributes (name, department, manager), and, critically, deprovision users when they leave the organisation (deactivation, not deletion, to preserve content and audit history). SCIM implementation requires: a /Users endpoint (GET, POST, PUT, PATCH with RFC 7644-compliant filter queries), a /Groups endpoint for mapping IdP groups to workspace roles, and correct handling of the SCIM PATCH operation (sparse updates with an operations array, not full replacement). Incorrect PATCH handling is the most common SCIM implementation bug caught during enterprise IT validation.
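The sparse-update behaviour of PATCH can be sketched as follows. This is a simplified illustration that handles only top-level, single-valued attributes; a compliant implementation also needs path filters, multi-valued attributes, and schema-qualified paths as defined in RFC 7644. The function name and the sample user are hypothetical.

```python
import copy
from typing import Any, Dict, List


def apply_scim_patch(resource: Dict[str, Any], operations: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply PATCH operations in order to a copy of the resource.

    Simplification: only top-level, single-valued attribute paths are supported.
    """
    updated = copy.deepcopy(resource)
    for operation in operations:
        # Tolerate capitalised op values, which some clients send.
        op = str(operation.get("op", "")).lower()
        path = operation.get("path")
        value = operation.get("value")
        if op in ("add", "replace"):
            if path is None:
                # Without a path, the value is a set of attributes to merge in.
                if not isinstance(value, dict):
                    raise ValueError("value must be an object when path is omitted")
                updated.update(value)
            else:
                updated[path] = value
        elif op == "remove":
            if path is None:
                raise ValueError("remove requires a path")
            updated.pop(path, None)
        else:
            raise ValueError(f"unsupported op: {operation.get('op')!r}")
    return updated


user = {"userName": "ada@example.com", "active": True, "title": "Engineer"}
operations = [
    {"op": "replace", "path": "title", "value": "Staff Engineer"},
    {"op": "Replace", "value": {"active": False}},  # deprovision: deactivate, do not delete
    {"op": "remove", "path": "title"},
]
patched = apply_scim_patch(user, operations)
```

Attributes not named in any operation (userName here) are left untouched, which is the difference between PATCH and a full PUT replacement.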
Workspace multi-tenancy requires per-tenant data isolation, configurable workspace settings (allowed SSO domains, invite permissions, content retention policies, feature flag overrides), and an admin console for workspace administrators to manage members, roles, and integrations without requiring platform operator involvement. The workspace entity is the top-level tenant boundary in the data model; all user-generated content (messages, documents, files) carries a workspace_id foreign key enforced by row-level security at the database layer.
Scrums.com deploys dedicated engineering teams experienced in WebSocket infrastructure, enterprise SSO, and SCIM provisioning; start a conversation to discuss your remote work platform requirements.
Analytics, Data Residency, and Compliance
Enterprise buyers of remote work SaaS platforms require compliance documentation before purchase. SOC 2 Type II (access controls, availability SLA evidence, change management audit trail, incident response) is the minimum bar for selling to mid-market and enterprise customers. GDPR compliance for a collaboration platform requires: a DPA template for EU customers, documented data flows for all subprocessors, a lawful transfer mechanism for any personal data leaving the EEA, right-to-erasure workflows cascading deletion across messages, documents, file attachments, and search indexes, and DSAR export pipelines that produce a structured archive of all user-generated content for a given user. In practice, most EU enterprise customers also expect data residency controls that keep their workspace data in EU-region data centres.
Workspace analytics (showing administrators which teams are active, which integrations are in use, which features have adoption, and where engagement drops) are a product differentiator for enterprise sales. The analytics pipeline must be separate from the user-visible workspace content: user activity events (message sent, document opened, meeting joined, feature clicked) are streamed to a separate analytics store (ClickHouse or BigQuery) with PII pseudonymisation applied at ingestion. Workspace-level aggregate metrics (messages per day, active users per week, feature adoption rates) are pre-computed and served via a dedicated analytics API, never via live queries over the operational database.
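One common way to pseudonymise at ingestion is a keyed hash, sketched below with HMAC-SHA256 from the Python standard library. The choice of HMAC, the list of PII fields, and the event shape are assumptions for illustration; the source text does not prescribe a specific technique.

```python
import hashlib
import hmac
from typing import Any, Dict

PII_FIELDS = ("user_id", "email")  # assumed set of identifying fields


def pseudonymise(event: Dict[str, Any], secret_key: bytes) -> Dict[str, Any]:
    """Return a copy of the event with PII fields replaced by keyed-hash tokens.

    The same input and key always give the same token, so per-user aggregates
    still work, but the token cannot be reversed without the key.
    """
    out = dict(event)
    for field_name in PII_FIELDS:
        if field_name in out:
            digest = hmac.new(secret_key, str(out[field_name]).encode("utf-8"), hashlib.sha256)
            out[field_name] = digest.hexdigest()
    return out


event = {"workspace_id": "ws_1", "user_id": "u_123", "event": "message_sent"}
token_a = pseudonymise(event, b"key-one")
token_b = pseudonymise(event, b"key-one")
token_c = pseudonymise(event, b"key-two")
```

Keeping the key outside the analytics store means that analysts querying ClickHouse or BigQuery see stable tokens but cannot map them back to users on their own.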
Audit logging is a compliance requirement for enterprise customers and a legal hold mechanism. An immutable audit log must record: authentication events (login, logout, failed login, SSO assertion), content operations (message edit/delete, document share, file download), administrative operations (role change, member add/remove, SSO configuration change), and integration events (OAuth grant, API key creation). The audit log must be append-only with no delete API, exportable via SIEM integration or periodic export to customer-owned storage, and retained for a configurable period (typically 1-7 years depending on the enterprise contract).
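The append-only property can be sketched as a class that simply exposes no update or delete method. The hash chain in this example is an addition not mentioned above: it is one possible way to make tampering detectable, and the entry fields and method names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple

GENESIS_HASH = "0" * 64


def _hash_body(body: Dict[str, Any]) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log: there is deliberately no update or delete method."""

    def __init__(self) -> None:
        self._entries: List[Dict[str, Any]] = []

    def append(self, timestamp: str, actor: str, action: str, target: str) -> Dict[str, Any]:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        body = {
            "timestamp": timestamp,
            "actor": actor,
            "action": action,
            "target": target,
            "prev_hash": prev_hash,  # assumption: chain entries for tamper evidence
        }
        entry = {**body, "hash": _hash_body(body)}
        self._entries.append(entry)
        return dict(entry)

    def export(self) -> Tuple[Dict[str, Any], ...]:
        """Return copies of all entries, e.g. for a SIEM or customer-owned storage."""
        return tuple(dict(entry) for entry in self._entries)

    def verify(self) -> bool:
        """Check that no stored entry was altered or removed from the middle."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or _hash_body(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


audit = AuditLog()
audit.append("2024-06-01T09:00:00Z", "admin_1", "role_change", "user_42")
audit.append("2024-06-01T09:05:00Z", "user_42", "file_download", "file_7")
exported = audit.export()
intact = audit.verify()
```

In production the same guarantee is usually enforced below the application as well, for example with database permissions that grant INSERT but not UPDATE or DELETE, or with write-once object storage.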
Engineering teams building remote work platforms with Scrums.com typically include backend engineers with WebSocket and real-time systems experience, and platform engineers for enterprise SSO/SCIM. View team composition options or explore the FinTech software context for compliance-heavy remote work infrastructure.
Frequently Asked Questions
How do you architect WebSocket presence at scale without database polling?
The correct approach is an in-memory presence store in Redis. Each connected client sends a heartbeat (a lightweight message) to the gateway server every 15-30 seconds. The gateway server batches heartbeats and writes them to a Redis hash or sorted set keyed by workspace_id, with user_id as the field or member and the last heartbeat timestamp as the value or score. Presence readers compute online/offline status by comparing the stored timestamp against the current time minus the TTL. The gateway server also writes an offline entry when a WebSocket connection closes normally (for users with several tabs or devices, only once their last connection has closed), providing instant offline signalling without waiting for the TTL to expire. At scale, the key optimisation is batching: a gateway server handling 10,000 connections should write one Redis pipeline call per interval containing all batched heartbeats, not one call per connection.
Should we use OT or CRDTs for document co-editing?
For most SaaS collaboration tools, CRDTs are the better choice. Operational Transformation has been implemented correctly in production by a small number of teams (Google, Apache Wave) and the edge cases in concurrent operation transformation are subtle and difficult to test exhaustively. CRDTs (specifically Yjs or Automerge) provide mathematical correctness guarantees for convergence and have mature open-source implementations with active maintenance. The trade-off is document state size: CRDT documents retain tombstones for deleted content, so very long-lived collaborative documents grow larger over time. The standard mitigation is periodic document compaction (producing a new CRDT snapshot that discards unreachable tombstones), run as a background job when the document has no active editing sessions.
How do you implement SCIM 2.0 so enterprise IT teams can automate user lifecycle management?
A compliant SCIM 2.0 endpoint requires: a /Users endpoint supporting GET (list with filter), GET by ID, POST (create), PUT (full replace), and PATCH (partial update with an RFC 7644 Operations array); a /Groups endpoint for group-to-role mapping; a /ServiceProviderConfig endpoint declaring your supported SCIM features; and Bearer token authentication (a long-lived token generated per enterprise customer, not OAuth). The most common implementation errors are: (1) not handling the PATCH Operations array correctly: each operation has an op (add, replace, or remove), a path (optional for add and replace, required for remove), and a value (omitted for remove), and the server must process the operations in order; (2) not returning the correct resource location in the Location header on POST; (3) deprovisioning users with a DELETE rather than a PATCH setting active=false, which destroys content history and breaks audit trails. Test against Okta's SCIM tester tool before claiming enterprise readiness.
How should the notification routing system decide between real-time push, mobile push, and email digest?
The routing decision is a function of two inputs: the recipient's current presence state and their notification preference for the event type. Presence state is read from the Redis presence store at notification dispatch time (not at event creation time, to reflect the recipient's current state). If the recipient is online (heartbeat within TTL), dispatch via WebSocket push and mark the notification as delivered. If the recipient is offline and has a mobile device registered, dispatch via APNs/FCM push and mark as push-delivered. If neither condition is met, add the notification to the user's digest queue for the next digest window. Each notification must have a unique idempotency key (notification_id) checked before dispatch to prevent duplicate delivery if the routing job re-processes an event.
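The decision described above fits in a small pure function. In this sketch the preference model (a set of channels the user allows for the event type) and the SKIPPED result for duplicates are assumptions, and the dispatched set stands in for a durable idempotency store.

```python
from enum import Enum
from typing import Set


class Channel(Enum):
    WEBSOCKET = "websocket"
    MOBILE_PUSH = "mobile_push"
    DIGEST = "digest"
    SKIPPED = "skipped"  # duplicate: this notification_id was already dispatched


def route(
    notification_id: str,
    online: bool,
    has_mobile_device: bool,
    allowed: Set[Channel],
    dispatched: Set[str],
) -> Channel:
    """Pick a delivery channel from presence state and user preferences."""
    if notification_id in dispatched:
        return Channel.SKIPPED
    dispatched.add(notification_id)
    if online and Channel.WEBSOCKET in allowed:
        return Channel.WEBSOCKET
    if not online and has_mobile_device and Channel.MOBILE_PUSH in allowed:
        return Channel.MOBILE_PUSH
    # Neither real-time path applies: fall back to the digest queue.
    return Channel.DIGEST


seen: Set[str] = set()
everything = {Channel.WEBSOCKET, Channel.MOBILE_PUSH, Channel.DIGEST}

r1 = route("n1", online=True, has_mobile_device=True, allowed=everything, dispatched=seen)
r2 = route("n2", online=False, has_mobile_device=True, allowed=everything, dispatched=seen)
r3 = route("n3", online=False, has_mobile_device=False, allowed=everything, dispatched=seen)
r4 = route("n1", online=True, has_mobile_device=True, allowed=everything, dispatched=seen)
r5 = route("n5", online=False, has_mobile_device=True, allowed={Channel.DIGEST}, dispatched=seen)
```

The online flag would be read from the presence store at the moment of dispatch, and the idempotency check would need to be atomic (for example a set-if-absent write) when several routing workers run in parallel.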
What does SOC 2 Type II compliance require from a remote work SaaS platform's engineering team?
SOC 2 Type II requires evidence that controls operated effectively over a continuous observation period (typically 6-12 months), not just that they exist at a point in time. The controls most relevant to a collaboration platform are: access control (role-based access with quarterly access reviews, MFA enforcement for administrative access, off-boarding process with documented evidence), change management (all production changes deployed via a reviewed and approved process: git PR with approval, CI/CD pipeline logs as evidence), availability (uptime monitoring with SLA measurement, incident post-mortems for downtime events), and confidentiality (encryption at rest and in transit, key management documentation, data classification policy). The engineering team's contribution is the audit trail: every deployment, access grant, configuration change, and security incident must produce a timestamped log entry that the auditor can review as evidence.
Build your Remote Work app with Scrums.com
DEDICATED TEAMS · OPERATED DELIVERY · FIRST SPRINT IN 21 DAYS