Automerge Migration

Date: 2025-01-05

Note: This is an exploration document. For the actual implementation plan, see 505-automerge-migration-plan.

This document explores migrating Minuta to Automerge, a mature CRDT library, instead of implementing custom CRDT primitives.

What is Automerge?

Automerge is a library of data structures for building collaborative applications. Key characteristics:

  • Conflict-free Replicated Data Type (CRDT) - automatic merge without conflicts
  • Cross-platform - Swift, JavaScript, Rust, WASM
  • Network-agnostic - works with any transport (WebSocket, Bluetooth, file sync)
  • Immutable state - each change produces a new document state
  • Change history - full audit trail, branching, time travel
  • Binary format - compact storage, efficient sync

Automerge-Swift

automerge-swift v0.6.1 provides:

  • Document - the core CRDT container
  • Built-in types: Text, List, Map, Counter
  • fork() / merge() for branching
  • Binary serialization (document.save() / Document(data))
  • Sync protocol for efficient delta exchange

Automerge-Repo-Swift

automerge-repo-swift extends the base library with:

  • Multi-document management
  • Pluggable storage providers
  • Pluggable network providers
  • Document discovery and sharing

Note: automerge-repo-swift is pre-release and API is not stable yet.


Custom CRDT vs Automerge

AspectCustom CRDT (503)Automerge
Implementation effort~2000 lines~200 lines (integration)
Battle-testedNoYes (years of production use)
Cross-platform syncManualBuilt-in (JS, Rust, Swift)
Storage formatJSON (readable)Binary (compact)
DebuggabilityHigh (JSON files)Lower (binary, need tools)
Text collaborationLWW onlyCharacter-level CRDT
DependenciesNoneautomerge-swift (~2MB)
FlexibilityFull controlAutomerge’s model
Sync protocolBuild yourselfBuilt-in, efficient
PerformanceUnknownOptimized (Rust core)

Document Model Design

Option A: Single Document (All Data)

One Automerge document contains all tags and records:

// Document structure
{
    "tags": {
        "<uuid>": { "name": "Work", "color": "#4A90D9", ... },
        "<uuid>": { "name": "Personal", "color": "#E57373", ... }
    },
    "records": {
        "<uuid>": { "startTime": ..., "endTime": ..., "tagId": ..., ... },
        "<uuid>": { ... }
    }
}

Pros:

  • Simple to implement
  • Atomic operations across tags and records
  • Single sync unit

Cons:

  • Entire document syncs on any change
  • Poor scalability (1000+ records = large doc)
  • No partial sync

Option B: Document Per Entity

Each tag and record is a separate Automerge document:

~/Documents/Minuta/
├── tags/
│   └── {uuid}.automerge          # One doc per tag
└── records/
    └── 2024/01/
        └── {uuid}.automerge      # One doc per record

Pros:

  • Granular sync (only changed docs transfer)
  • Better scalability
  • Parallel loading

Cons:

  • More files to manage
  • Cross-document references need care
  • Requires automerge-repo for multi-doc management

Option C: Hybrid (Recommended)

  • Tags: Single document (small, rarely changes)
  • Records: Document per month or per record
~/Documents/Minuta/
├── tags.automerge                # All tags in one doc
└── records/
    └── 2024/
        └── 01.automerge          # All records for Jan 2024

Pros:

  • Balanced granularity
  • Monthly documents keep size manageable (~100 records/month)
  • Simpler than per-record files

Data Structure Mapping

Current Tag → Automerge

// Current
public struct Tag {
    public let id: UUID
    public var name: String
    public var color: String
    public let createdAt: Date
    public var isArchived: Bool
}

// Automerge document structure
// Root is a Map with tag IDs as keys
doc.putObject(obj: .ROOT, key: tagId.uuidString, ty: .Map)
doc.put(obj: tagObj, key: "name", value: .String(name))
doc.put(obj: tagObj, key: "color", value: .String(color))
doc.put(obj: tagObj, key: "createdAt", value: .Timestamp(createdAt))
doc.put(obj: tagObj, key: "isArchived", value: .Boolean(isArchived))

Current TimeRecord → Automerge

// Current
public struct TimeRecord {
    public let id: UUID
    public var startTime: Date
    public var endTime: Date?
    public var tagId: UUID?
    public var comment: String?
    public var images: [String]
}

// Automerge document structure
let recordObj = doc.putObject(obj: .ROOT, key: recordId.uuidString, ty: .Map)
doc.put(obj: recordObj, key: "startTime", value: .Timestamp(startTime))
if let endTime = endTime {
    doc.put(obj: recordObj, key: "endTime", value: .Timestamp(endTime))
}
if let tagId = tagId {
    doc.put(obj: recordObj, key: "tagId", value: .String(tagId.uuidString))
}
if let comment = comment {
    // Use Automerge Text for collaborative editing
    let textObj = doc.putObject(obj: recordObj, key: "comment", ty: .Text)
    doc.spliceText(obj: textObj, start: 0, delete: 0, value: comment)
}
// Images as a List
let imagesObj = doc.putObject(obj: recordObj, key: "images", ty: .List)
for (i, image) in images.enumerated() {
    doc.insert(obj: imagesObj, index: UInt64(i), value: .String(image))
}

Wrapper Types

AutomergeTag

import Automerge

/// Wrapper to read/write Tag from Automerge document
struct AutomergeTag {
    let document: Document
    let objectId: ObjId

    var id: UUID {
        // ID is the key in parent map, stored externally
        fatalError("ID should be tracked by caller")
    }

    var name: String {
        get {
            guard case .Scalar(.String(let value)) = try? document.get(obj: objectId, key: "name") else {
                return ""
            }
            return value
        }
        set {
            try? document.put(obj: objectId, key: "name", value: .String(newValue))
        }
    }

    var color: String {
        get {
            guard case .Scalar(.String(let value)) = try? document.get(obj: objectId, key: "color") else {
                return "#808080"
            }
            return value
        }
        set {
            try? document.put(obj: objectId, key: "color", value: .String(newValue))
        }
    }

    var createdAt: Date {
        get {
            guard case .Scalar(.Timestamp(let value)) = try? document.get(obj: objectId, key: "createdAt") else {
                return Date()
            }
            // Automerge.Timestamp is milliseconds since epoch
            return Date(timeIntervalSince1970: Double(value) / 1000.0)
        }
        set {
            // Automerge.Timestamp is milliseconds since epoch
            try? document.put(obj: objectId, key: "createdAt", value: .Timestamp(Int64(newValue.timeIntervalSince1970 * 1000)))
        }
    }

    var isArchived: Bool {
        get {
            guard case .Scalar(.Boolean(let value)) = try? document.get(obj: objectId, key: "isArchived") else {
                return false
            }
            return value
        }
        set {
            try? document.put(obj: objectId, key: "isArchived", value: .Boolean(newValue))
        }
    }

    /// Convert to plain Tag for UI
    func asTag(id: UUID) -> Tag {
        Tag(
            id: id,
            name: name,
            color: color,
            createdAt: createdAt,
            isArchived: isArchived
        )
    }
}

AutomergeTimeRecord

import Automerge

/// Wrapper to read/write TimeRecord from Automerge document
struct AutomergeTimeRecord {
    let document: Document
    let objectId: ObjId

    var startTime: Date {
        get {
            guard case .Scalar(.Timestamp(let value)) = try? document.get(obj: objectId, key: "startTime") else {
                return Date()
            }
            // Automerge.Timestamp is milliseconds since epoch
            return Date(timeIntervalSince1970: Double(value) / 1000.0)
        }
        set {
            // Automerge.Timestamp is milliseconds since epoch
            try? document.put(obj: objectId, key: "startTime", value: .Timestamp(Int64(newValue.timeIntervalSince1970 * 1000)))
        }
    }

    var endTime: Date? {
        get {
            guard case .Scalar(.Timestamp(let value)) = try? document.get(obj: objectId, key: "endTime") else {
                return nil
            }
            // Automerge.Timestamp is milliseconds since epoch
            return Date(timeIntervalSince1970: Double(value) / 1000.0)
        }
        set {
            if let date = newValue {
                // Automerge.Timestamp is milliseconds since epoch
                try? document.put(obj: objectId, key: "endTime", value: .Timestamp(Int64(date.timeIntervalSince1970 * 1000)))
            } else {
                try? document.delete(obj: objectId, key: "endTime")
            }
        }
    }

    var tagId: UUID? {
        get {
            guard case .Scalar(.String(let value)) = try? document.get(obj: objectId, key: "tagId") else {
                return nil
            }
            return UUID(uuidString: value)
        }
        set {
            if let id = newValue {
                try? document.put(obj: objectId, key: "tagId", value: .String(id.uuidString))
            } else {
                try? document.delete(obj: objectId, key: "tagId")
            }
        }
    }

    var comment: String? {
        get {
            guard case .Object(let textId, .Text) = try? document.get(obj: objectId, key: "comment") else {
                return nil
            }
            return try? document.text(obj: textId)
        }
        set {
            if let text = newValue {
                // Get or create text object
                if case .Object(let textId, .Text) = try? document.get(obj: objectId, key: "comment") {
                    // Replace existing text
                    let currentLen = (try? document.length(obj: textId)) ?? 0
                    try? document.spliceText(obj: textId, start: 0, delete: Int64(currentLen), value: text)
                } else {
                    // Create new text object
                    let textId = try? document.putObject(obj: objectId, key: "comment", ty: .Text)
                    if let textId = textId {
                        try? document.spliceText(obj: textId, start: 0, delete: 0, value: text)
                    }
                }
            } else {
                try? document.delete(obj: objectId, key: "comment")
            }
        }
    }

    var images: [String] {
        get {
            guard case .Object(let listId, .List) = try? document.get(obj: objectId, key: "images") else {
                return []
            }
            let length = (try? document.length(obj: listId)) ?? 0
            var result: [String] = []
            for i in 0..<length {
                if case .Scalar(.String(let value)) = try? document.get(obj: listId, index: i) {
                    result.append(value)
                }
            }
            return result
        }
        set {
            // Delete existing list and recreate
            try? document.delete(obj: objectId, key: "images")
            let listId = try? document.putObject(obj: objectId, key: "images", ty: .List)
            if let listId = listId {
                for (i, image) in newValue.enumerated() {
                    try? document.insert(obj: listId, index: UInt64(i), value: .String(image))
                }
            }
        }
    }

    /// Convert to plain TimeRecord for UI
    func asTimeRecord(id: UUID) -> TimeRecord {
        TimeRecord(
            id: id,
            startTime: startTime,
            endTime: endTime,
            tagId: tagId,
            comment: comment,
            images: images
        )
    }
}

Storage Service

AutomergeStorageService

import Automerge
import Foundation

/// Storage service using Automerge for CRDT sync
actor AutomergeStorageService: FileStorageServiceProtocol {
    private let storageURL: URL
    private var tagsDocument: Document
    private var recordDocuments: [String: Document] = [:]  // "YYYY-MM" -> Document

    init(storageURL: URL) async throws {
        self.storageURL = storageURL

        // Load or create tags document
        let tagsURL = storageURL.appendingPathComponent("tags.automerge")
        if FileManager.default.fileExists(atPath: tagsURL.path) {
            let data = try Data(contentsOf: tagsURL)
            self.tagsDocument = try Document(data)
        } else {
            self.tagsDocument = Document()
        }
    }

    // MARK: - Tags

    func loadTags() async throws -> [Tag] {
        var tags: [Tag] = []
        let keys = try tagsDocument.keys(obj: .ROOT)

        for key in keys {
            guard let id = UUID(uuidString: key),
                  case .Object(let objId, .Map) = try tagsDocument.get(obj: .ROOT, key: key) else {
                continue
            }
            let wrapper = AutomergeTag(document: tagsDocument, objectId: objId)
            tags.append(wrapper.asTag(id: id))
        }

        return tags
    }

    func saveTag(_ tag: Tag) async throws {
        let key = tag.id.uuidString

        // Create or get existing object
        let objId: ObjId
        if case .Object(let existingId, .Map) = try? tagsDocument.get(obj: .ROOT, key: key) {
            objId = existingId
        } else {
            objId = try tagsDocument.putObject(obj: .ROOT, key: key, ty: .Map)
        }

        // Update fields
        var wrapper = AutomergeTag(document: tagsDocument, objectId: objId)
        wrapper.name = tag.name
        wrapper.color = tag.color
        wrapper.createdAt = tag.createdAt
        wrapper.isArchived = tag.isArchived

        // Persist
        try await persistTagsDocument()
    }

    func deleteTag(id: UUID) async throws {
        try tagsDocument.delete(obj: .ROOT, key: id.uuidString)
        try await persistTagsDocument()
    }

    private func persistTagsDocument() async throws {
        let data = tagsDocument.save()
        let url = storageURL.appendingPathComponent("tags.automerge")
        try data.write(to: url)
    }

    // MARK: - Records

    func loadRecords(from startDate: Date, to endDate: Date) async throws -> [TimeRecord] {
        var records: [TimeRecord] = []

        // Determine which monthly documents to load
        let months = monthsBetween(start: startDate, end: endDate)

        for month in months {
            let doc = try await loadOrCreateRecordDocument(for: month)
            let keys = try doc.keys(obj: .ROOT)

            for key in keys {
                guard let id = UUID(uuidString: key),
                      case .Object(let objId, .Map) = try doc.get(obj: .ROOT, key: key) else {
                    continue
                }
                let wrapper = AutomergeTimeRecord(document: doc, objectId: objId)
                let record = wrapper.asTimeRecord(id: id)

                // Filter by date range
                if record.startTime >= startDate && record.startTime <= endDate {
                    records.append(record)
                }
            }
        }

        return records.sorted { $0.startTime > $1.startTime }
    }

    func saveRecord(_ record: TimeRecord) async throws {
        let month = monthKey(for: record.startTime)
        let doc = try await loadOrCreateRecordDocument(for: month)
        let key = record.id.uuidString

        // Create or get existing object
        let objId: ObjId
        if case .Object(let existingId, .Map) = try? doc.get(obj: .ROOT, key: key) {
            objId = existingId
        } else {
            objId = try doc.putObject(obj: .ROOT, key: key, ty: .Map)
        }

        // Update fields
        var wrapper = AutomergeTimeRecord(document: doc, objectId: objId)
        wrapper.startTime = record.startTime
        wrapper.endTime = record.endTime
        wrapper.tagId = record.tagId
        wrapper.comment = record.comment
        wrapper.images = record.images

        // Persist
        try await persistRecordDocument(doc, for: month)
    }

    func deleteRecord(_ record: TimeRecord) async throws {
        let month = monthKey(for: record.startTime)
        let doc = try await loadOrCreateRecordDocument(for: month)
        try doc.delete(obj: .ROOT, key: record.id.uuidString)
        try await persistRecordDocument(doc, for: month)
    }

    // MARK: - Helpers

    private func monthKey(for date: Date) -> String {
        let calendar = Calendar.current
        let year = calendar.component(.year, from: date)
        let month = calendar.component(.month, from: date)
        return String(format: "%04d-%02d", year, month)
    }

    private func monthsBetween(start: Date, end: Date) -> [String] {
        var months: [String] = []
        var current = start
        let calendar = Calendar.current

        while current <= end {
            months.append(monthKey(for: current))
            guard let next = calendar.date(byAdding: .month, value: 1, to: current) else { break }
            current = next
        }

        return months
    }

    private func loadOrCreateRecordDocument(for month: String) async throws -> Document {
        if let cached = recordDocuments[month] {
            return cached
        }

        let components = month.split(separator: "-")
        let year = String(components[0])
        let monthNum = String(components[1])

        let dirURL = storageURL.appendingPathComponent("records/\(year)")
        try FileManager.default.createDirectory(at: dirURL, withIntermediateDirectories: true)

        let url = dirURL.appendingPathComponent("\(monthNum).automerge")
        let doc: Document

        if FileManager.default.fileExists(atPath: url.path) {
            let data = try Data(contentsOf: url)
            doc = try Document(data)
        } else {
            doc = Document()
        }

        recordDocuments[month] = doc
        return doc
    }

    private func persistRecordDocument(_ doc: Document, for month: String) async throws {
        let components = month.split(separator: "-")
        let year = String(components[0])
        let monthNum = String(components[1])

        let dirURL = storageURL.appendingPathComponent("records/\(year)")
        try FileManager.default.createDirectory(at: dirURL, withIntermediateDirectories: true)

        let url = dirURL.appendingPathComponent("\(monthNum).automerge")
        let data = doc.save()
        try data.write(to: url)
    }
}

Folder Sync Integration

Automerge works exceptionally well with folder-based cloud sync (Dropbox, iCloud Documents, Google Drive, WebDAV). The key insight: folder sync services create conflict files, Automerge merges them automatically.

The Problem

Folder sync services don’t understand file contents - they sync bytes. When two devices edit the same file offline:

Device A (offline): Edits 01.automerge
Device B (offline): Edits 01.automerge
Both come online...

Dropbox creates:
├── 01.automerge                                    # "Winner" (last upload)
└── 01 (conflicted copy 2025-01-05).automerge      # "Loser"

The Solution: Auto-Merge Conflict Files

actor FolderSyncConflictResolver {
    private let storageURL: URL

    /// Scan for and resolve conflict files
    func resolveConflicts() async throws {
        let files = try FileManager.default.contentsOfDirectory(
            at: storageURL,
            includingPropertiesForKeys: nil
        )

        // Group by base name
        let groups = groupConflictFiles(files)

        for (baseName, conflictFiles) in groups where conflictFiles.count > 1 {
            try await mergeConflictFiles(baseName: baseName, files: conflictFiles)
        }
    }

    private func mergeConflictFiles(baseName: String, files: [URL]) async throws {
        // Load primary document
        let primaryURL = files.first { !$0.lastPathComponent.contains("conflicted") }!
        var primaryDoc = try Document(Data(contentsOf: primaryURL))

        // Merge all conflict copies into primary
        for conflictURL in files where conflictURL != primaryURL {
            let conflictDoc = try Document(Data(contentsOf: conflictURL))
            try primaryDoc.merge(other: conflictDoc)  // <-- Automerge magic!

            // Delete conflict file after successful merge
            try FileManager.default.removeItem(at: conflictURL)
        }

        // Save merged document
        try primaryDoc.save().write(to: primaryURL)

        AppLogger.sync.info("Merged \(files.count) versions of \(baseName)")
    }

    private func groupConflictFiles(_ files: [URL]) -> [String: [URL]] {
        var groups: [String: [URL]] = [:]

        for file in files where file.pathExtension == "automerge" {
            let name = file.lastPathComponent
            let baseName = extractBaseName(from: name)
            groups[baseName, default: []].append(file)
        }

        return groups
    }

    private func extractBaseName(from filename: String) -> String {
        // Handle various conflict patterns:
        // Dropbox: "file (conflicted copy 2025-01-05).ext"
        // iCloud: "file 2.ext" or "file (conflict).ext"
        // Google Drive: "file (1).ext"

        let patterns = [
            #"\s*\(conflicted copy[^)]*\)"#,  // Dropbox
            #"\s*\(\d+\)"#,                    // Google Drive
            #"\s*\d+(?=\.)"#,                  // iCloud numeric suffix
            #"\s*\(conflict[^)]*\)"#           // Generic
        ]

        var result = filename.replacingOccurrences(of: ".automerge", with: "")
        for pattern in patterns {
            result = result.replacingOccurrences(
                of: pattern,
                with: "",
                options: .regularExpression
            )
        }
        return result.trimmingCharacters(in: .whitespaces)
    }
}

Sync Flow Diagram

┌─────────────────────────────────────────────────────────────────────────┐
│                         Folder Sync + Automerge                          │
│                                                                          │
│  ┌──────────────┐                              ┌──────────────┐         │
│  │   Device A   │                              │   Device B   │         │
│  │              │                              │              │         │
│  │ 01.automerge │                              │ 01.automerge │         │
│  │ (edit locally)│                             │ (edit locally)│        │
│  └──────┬───────┘                              └──────┬───────┘         │
│         │                                             │                  │
│         │ sync                                   sync │                  │
│         v                                             v                  │
│  ┌──────────────────────────────────────────────────────────────┐       │
│  │                     Cloud Folder                              │       │
│  │  ┌─────────────────┐  ┌────────────────────────────────────┐ │       │
│  │  │ 01.automerge    │  │ 01 (conflicted copy).automerge     │ │       │
│  │  │ (Device B wins) │  │ (Device A's version)               │ │       │
│  │  └─────────────────┘  └────────────────────────────────────┘ │       │
│  └──────────────────────────────────────────────────────────────┘       │
│                                  │                                       │
│                                  │ sync down to both devices             │
│                                  v                                       │
│  ┌──────────────────────────────────────────────────────────────┐       │
│  │                    App Conflict Resolution                    │       │
│  │                                                               │       │
│  │   1. Detect conflict files (pattern matching)                │       │
│  │   2. Load both Automerge documents                           │       │
│  │   3. primaryDoc.merge(other: conflictDoc)  <-- AUTOMATIC!    │       │
│  │   4. Save merged document                                    │       │
│  │   5. Delete conflict file                                    │       │
│  │                                                               │       │
│  │   Result: Single 01.automerge with ALL changes from A and B  │       │
│  └──────────────────────────────────────────────────────────────┘       │
│                                  │                                       │
│                                  │ syncs back to cloud                   │
│                                  v                                       │
│  ┌──────────────────────────────────────────────────────────────┐       │
│  │                     Cloud Folder                              │       │
│  │  ┌─────────────────┐                                         │       │
│  │  │ 01.automerge    │  <-- Merged, no conflicts               │       │
│  │  └─────────────────┘                                         │       │
│  └──────────────────────────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────────────────────────┘

Integration with Cloud Adapters

/// Extended storage service with conflict resolution
actor AutomergeFolderSyncService {
    private let localStorage: AutomergeStorageService
    private let cloudAdapter: CloudAdapterProtocol
    private let conflictResolver: FolderSyncConflictResolver

    /// Full sync cycle
    func sync() async throws {
        // 1. Download all remote files
        let remoteFiles = try await cloudAdapter.listFiles()
        for file in remoteFiles {
            let localPath = localURL(for: file.path)

            if shouldDownload(remote: file, local: localPath) {
                let data = try await cloudAdapter.download(path: file.path)
                try data.write(to: localPath)
            }
        }

        // 2. Resolve any conflict files created by folder sync
        try await conflictResolver.resolveConflicts()

        // 3. Upload local changes
        let localFiles = try listLocalAutomergeFiles()
        for localFile in localFiles {
            let remotePath = remotePath(for: localFile)

            if shouldUpload(local: localFile, remotePath: remotePath) {
                let data = try Data(contentsOf: localFile)
                _ = try await cloudAdapter.upload(path: remotePath, data: data)
            }
        }

        // 4. Clean up remote conflict files (already merged locally)
        for file in remoteFiles where isConflictFile(file.path) {
            try await cloudAdapter.delete(path: file.path)
        }
    }
}

App Integration

/// On app launch or periodic sync
func onSyncTrigger() async {
    // 1. Let folder sync do its thing (Dropbox/iCloud/etc syncs files)
    await waitForFolderSyncToSettle()

    // 2. Resolve any conflicts Automerge-style
    try await conflictResolver.resolveConflicts()

    // 3. Reload data from merged documents
    await appState.reloadData()
}

/// On file change (FSEvents / DispatchSource)
func onFileChange(url: URL) async {
    if isConflictFile(url) {
        // New conflict detected - merge immediately
        try await conflictResolver.resolveConflicts()
        await appState.reloadData()
    } else if url.pathExtension == "automerge" {
        // Regular update from another device
        await appState.reloadData()
    }
}

Comparison: JSON CRDT vs Automerge for Folder Sync

AspectCustom CRDT (JSON)Automerge (Binary)
Conflict detectionSame (file patterns)Same (file patterns)
Merge complexityManual field-by-fieldOne line: doc.merge(other)
Merge correctnessMust implement correctlyBattle-tested
Text conflictsLWW (loses edits)Character-level merge
History after mergeLostPreserved
Debug conflictsEasy (read JSON)Hard (binary)

Why Automerge Excels at Folder Sync

  1. Conflict files are input to merge() - designed for this
  2. Merging is automatic and correct - no custom merge logic
  3. No data loss - concurrent edits are merged, not overwritten
  4. Simple integration - detect conflict files, call merge(), delete conflict

Sync Implementation

Sync Protocol

Automerge provides a built-in sync protocol for efficient delta exchange:

import Automerge

class AutomergeSyncManager {
    private var syncStates: [String: SyncState] = [:]  // peerId -> state

    /// Generate sync message to send to peer
    func generateSyncMessage(for document: Document, peerId: String) -> Data? {
        let state = syncStates[peerId] ?? SyncState()

        guard let message = document.generateSyncMessage(state: state) else {
            return nil  // Already in sync
        }

        return message
    }

    /// Receive sync message from peer
    func receiveSyncMessage(_ message: Data, for document: Document, peerId: String) throws {
        var state = syncStates[peerId] ?? SyncState()
        try document.receiveSyncMessage(state: &state, message: message)
        syncStates[peerId] = state
    }

    /// Check if in sync with peer
    func isInSync(for document: Document, peerId: String) -> Bool {
        let state = syncStates[peerId] ?? SyncState()
        return document.generateSyncMessage(state: state) == nil
    }
}

Cloud Sync with Automerge

/// Sync Automerge documents to cloud storage
class CloudAutomergeSyncService {
    private let cloudAdapter: CloudAdapterProtocol
    private let syncManager: AutomergeSyncManager

    func syncDocument(_ document: Document, path: String) async throws {
        // 1. Download remote document if exists
        if let remoteData = try? await cloudAdapter.download(path: path) {
            let remoteDoc = try Document(remoteData)
            // 2. Merge remote into local
            try document.merge(other: remoteDoc)
        }

        // 3. Upload merged document
        let data = document.save()
        _ = try await cloudAdapter.upload(path: path, data: data)
    }

    /// Efficient sync using sync protocol (for real-time)
    func syncWithPeer(
        _ document: Document,
        peerId: String,
        send: (Data) async throws -> Void,
        receive: () async throws -> Data?
    ) async throws {
        // Exchange sync messages until converged
        while true {
            // Send our changes
            if let outgoing = syncManager.generateSyncMessage(for: document, peerId: peerId) {
                try await send(outgoing)
            }

            // Receive their changes
            if let incoming = try await receive() {
                try syncManager.receiveSyncMessage(incoming, for: document, peerId: peerId)
            }

            // Check if converged
            if syncManager.isInSync(for: document, peerId: peerId) {
                break
            }
        }
    }
}

Migration Strategy

Phase 1: Add Automerge Dependency

// Package.swift
dependencies: [
    .package(url: "https://github.com/automerge/automerge-swift.git", from: "0.6.1")
]

Phase 2: Create Migration Service

/// Migrates JSON data to Automerge format
class AutomergeMigrationService {
    private let legacyStorage: LocalFileStorageService
    private let automergeStorage: AutomergeStorageService

    func migrate() async throws {
        // 1. Migrate tags
        let tags = try await legacyStorage.loadTags()
        for tag in tags {
            try await automergeStorage.saveTag(tag)
        }

        // 2. Migrate records (in batches by month)
        let allRecords = try await legacyStorage.loadAllRecords()
        for record in allRecords {
            try await automergeStorage.saveRecord(record)
        }

        // 3. Mark migration complete
        UserDefaults.standard.set(true, forKey: "automerge_migration_complete")

        // 4. Optionally backup and remove legacy files
    }

    var needsMigration: Bool {
        !UserDefaults.standard.bool(forKey: "automerge_migration_complete")
    }
}

Phase 3: Dual-Read Support

/// Storage service that reads from both formats during transition
actor HybridStorageService: FileStorageServiceProtocol {
    private let legacy: LocalFileStorageService
    private let automerge: AutomergeStorageService
    private let migrationComplete: Bool

    func loadRecords(from: Date, to: Date) async throws -> [TimeRecord] {
        if migrationComplete {
            return try await automerge.loadRecords(from: from, to: to)
        } else {
            // Read from both, prefer Automerge if exists
            let automergeRecords = try await automerge.loadRecords(from: from, to: to)
            let legacyRecords = try await legacy.loadRecords(from: from, to: to)

            // Merge, preferring Automerge versions
            let automergeIds = Set(automergeRecords.map(\.id))
            let uniqueLegacy = legacyRecords.filter { !automergeIds.contains($0.id) }

            return automergeRecords + uniqueLegacy
        }
    }

    func saveRecord(_ record: TimeRecord) async throws {
        // Always write to Automerge
        try await automerge.saveRecord(record)
    }
}

Storage Format Comparison

JSON (Current)

records/2024/01/2024-01-15_093000.json (250 bytes)
{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "startTime": "2024-01-15T09:30:00Z",
  "endTime": "2024-01-15T11:45:00Z",
  "tagId": "550e8400-e29b-41d4-a716-446655440000",
  "comment": "Working on feature X",
  "images": ["123e4567_0.jpg"]
}

Automerge Binary

records/2024/01.automerge (~150 bytes per record + overhead)
Binary format containing:
- Document metadata (actor ID, sequence numbers)
- Operations log (compressed)
- Current state snapshot
- Change history

Size comparison (100 records):

  • JSON: ~25KB (100 files)
  • Automerge: ~20KB (1 file with history)

Image Storage

Never store images inside Automerge documents. Store only filename references.

Why Not Store Images in Automerge

  1. History bloat - Every image modification is stored forever (tombstones)
  2. Sync inefficiency - Entire document re-syncs when any image changes
  3. Memory pressure - Document must load all images into memory
  4. No partial sync - Can’t download just the record without its images
  5. No true deletion - Removed images stay in history

Recommended Structure

~/Documents/Minuta/
├── tags.automerge
├── records/
│   └── 2024/
│       └── 01.automerge          # Only references image filenames
└── images/
    └── 2024/
        └── 01/
            ├── abc123_0.jpg      # Actual image files
            ├── abc123_1.jpg
            └── def456_0.jpg

Record Stores Only References

// In TimeRecord (inside Automerge document)
{
    "id": "abc123",
    "startTime": ...,
    "images": ["abc123_0.jpg", "abc123_1.jpg"]  // Just filenames, not data
}

Image Storage Service

/// Image storage - separate from Automerge
actor ImageStorageService {
    private let imagesURL: URL

    func saveImage(_ data: Data, filename: String) async throws {
        let url = imagesURL.appendingPathComponent(filename)
        try data.write(to: url)
    }

    func loadImage(filename: String) async throws -> Data {
        let url = imagesURL.appendingPathComponent(filename)
        return try Data(contentsOf: url)
    }

    func deleteImage(filename: String) async throws {
        let url = imagesURL.appendingPathComponent(filename)
        try FileManager.default.removeItem(at: url)  // Actually deleted!
    }
}

Image Conflict Handling

Images are immutable by filename - no conflicts possible:

// Filename includes record ID + index - unique per image
func imageFilename(recordId: UUID, index: Int, ext: String) -> String {
    "\(recordId.uuidString)_\(index).\(ext)"
}

// Same filename = same content = no conflict
// New image = new filename = no conflict

For content-addressable storage (optional):

// Hash-based filename (like Git)
func imageFilename(data: Data, ext: String) -> String {
    let hash = SHA256.hash(data: data).prefix(16).hexString
    return "\(hash).\(ext)"
}

Comparison

StorageIn AutomergeAs Separate Files
Size impactPermanent bloatDeletable
SyncFull doc re-syncOnly changed files
MemoryAll in RAMLoad on demand
ConflictsCRDT overheadNone (immutable)
True deletionNo (tombstone)Yes

Current Minuta approach is correct - images: [String] stores filenames, actual images are separate files. Keep this pattern with Automerge.


Data Deletion and Compaction

Automerge uses tombstones - deleted items are marked as deleted but remain in document history. This is necessary for CRDT correctness.

Options for Permanent Deletion

Option 1: Tombstones (Default)

try doc.delete(obj: .ROOT, key: recordId)  // Marked deleted, stays in history

Option 2: Rebuild Document (Breaks Sync)

/// Permanently remove deleted items by rebuilding
func compactDocument(_ doc: Document) -> Document {
    let newDoc = Document()
    // Copy only non-deleted items to new document
    // WARNING: Breaks sync with other devices!
    return newDoc
}

Option 3: Monthly Documents (Recommended)

With monthly document structure, old months can be deleted entirely:

func purgeOldData(olderThan years: Int) async throws {
    let cutoff = Calendar.current.date(byAdding: .year, value: -years, to: Date())!

    for monthFile in try listMonthlyFiles() {
        if monthFile.date < cutoff {
            // Delete entire file - truly gone
            try FileManager.default.removeItem(at: monthFile.url)
        }
    }
}

This is why monthly documents work well - delete entire .automerge files for old months without breaking sync for current data.

Deletion Summary

MethodTruly DeletesSync SafeUse Case
doc.delete()NoYesNormal deletion
Rebuild documentYesNoNever (breaks sync)
Delete old monthly filesYesYesData retention policy

Trade-offs Summary

Advantages of Automerge

  1. Proven implementation - Years of production use, edge cases handled
  2. Text CRDT - Character-level merging for comments (better than LWW)
  3. Built-in sync protocol - Efficient delta exchange
  4. Change history - Time travel, branching, undo built-in
  5. Cross-platform - Same document format in Swift, JS, Rust
  6. Active development - Regular updates, growing ecosystem

Disadvantages of Automerge

  1. Binary format - Can’t inspect/edit files manually
  2. Dependency - ~2MB added to app bundle
  3. Learning curve - Different mental model than JSON
  4. Pre-release repo - automerge-repo-swift not stable yet
  5. Debugging - Harder to diagnose sync issues
  6. Overkill - Full history may be unnecessary

When to Choose Automerge

  • Need real-time collaboration
  • Cross-platform sync (iOS + web)
  • Text editing with concurrent users
  • Want proven CRDT implementation

When to Choose Custom CRDT

  • JSON debuggability is critical
  • Simple sync (file-based, not real-time)
  • No cross-platform requirements
  • Want full control over format

Recommendation

For Minuta’s use case (single user, multiple devices, offline-first, multi-cloud folder sync):

Automerge is the better choice for folder-based sync because:

  1. Conflict resolution is trivial - doc.merge(other) handles everything
  2. No custom merge logic - battle-tested, correct by construction
  3. Text merging - character-level merge for comments (vs LWW losing edits)
  4. Works with any folder sync - Dropbox, iCloud, Google Drive, WebDAV

Trade-off accepted:

  • Binary format (can’t inspect files manually)
  • ~2MB dependency added to bundle

Custom CRDT (document 503) is better if:

  • JSON debuggability is critical for development
  • Want zero dependencies
  • Don’t need text collaboration

Recommendation: Start with Automerge for production sync. The folder sync + conflict merge pattern is simpler than implementing custom CRDT merge logic correctly.

Related

References