Differential State Management

In software engineering, the "Diff Problem" is that LLMs are bad at precise edits. In Novel-to-Video, the same weakness is the root cause of Video Flicker and Character Hallucination.

When you ask an AI to generate Scene 2, it doesn't "edit" Scene 1; it re-imagines it. It accidentally "refactors" your main character's face, their clothes, or the room layout because it has no way to execute a precise "diff."

The Mapping: Code vs. Narrative

We implement the Cursor/Composer "Diff Architecture" directly in Continuum Flow.

The "Cursor" Model (Code)
  The "Codebase": file.ts (1,000 lines)
  The Diff Problem: LLM creates Syntax Errors by deleting a closing bracket '}'.
  The Goal: Apply change ONLY to the function.

The "Continuum Flow" Model
  The "Visual State": TOON File (Scene & Characters)
  The Hallucination: LLM gives sword but changes shirt from Blue to Red.
  The Goal: Add sword, FREEZE all other pixels.

The Solution: "Narrative Edit Trajectories"

You can’t retrain a model easily, but you can force your Orchestrator to use State Diffs instead of State Descriptions.

The "Anti-Regeneration" Rule

Standard prompting forces re-rendering of known assets.

"Generate Scene 2: Arjun is standing in the cave. He is wearing armor. He draws his sword."
Risk: Armor design changes, Cave lighting shifts.

The "Diff-Based" Protocol

The Orchestrator generates a PATCH, not a new file.

Step A: Baseline (Locked State)
State_Frame_01:
Arjun: [Wear: Rusty Armor], [Face: Scarred], [Hand: Empty]
Bg: [Cave, Wet Walls]
Step B: The Diff Command
{
  "operation": "UPDATE",
  "target": "Arjun.Hand",
  "value": "Iron Sword",
  "constraint": "PRESERVE_ALL_OTHER_ATTRIBUTES"
}
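
As a rough illustration of how an Orchestrator could arrive at the diff command above, the sketch below compares the locked baseline with the desired next state and emits one UPDATE operation per changed attribute. The function name diff_states, the nested-dictionary layout, and the "Scene" key under Bg are assumptions made for this example; the source only shows the state as bracketed tags.

```python
import copy
from typing import Any

# Assumed layout: entity name -> attribute name -> value.
State = dict[str, dict[str, Any]]


def diff_states(old: State, new: State) -> list[dict[str, Any]]:
    """Return one UPDATE operation per attribute that was added or changed.

    Removed attributes are not handled in this sketch.
    """
    patches = []
    for entity, attrs in new.items():
        for attr, value in attrs.items():
            if old.get(entity, {}).get(attr) != value:
                patches.append({
                    "operation": "UPDATE",
                    "target": f"{entity}.{attr}",
                    "value": value,
                    "constraint": "PRESERVE_ALL_OTHER_ATTRIBUTES",
                })
    return patches


# Step A: the locked baseline. The "Scene" key is a hypothetical name.
state_frame_01: State = {
    "Arjun": {"Wear": "Rusty Armor", "Face": "Scarred", "Hand": "Empty"},
    "Bg": {"Scene": "Cave, Wet Walls"},
}

# The only narrative change in Scene 2: Arjun draws his sword.
state_frame_02 = copy.deepcopy(state_frame_01)
state_frame_02["Arjun"]["Hand"] = "Iron Sword"

patches = diff_states(state_frame_01, state_frame_02)
```

Here patches holds a single operation targeting Arjun.Hand, so the armor, face, and cave are never restated to the model and cannot drift.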

Implementation: The "Visual Patching" Workflow

To leverage "Search and Replace" logic, we implement a specific tool in the Orchestrator that acts like git apply.

State Patching Algorithm

1. Input Patch: Receive DIFF Packet (Target, New Value).
2. Verify State (Safety Guard): Check Old Value matches Current State.
3. Apply Patch: Update State Buffer. Freeze other pixels.
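
The three steps can be sketched in a few lines, in the spirit of git apply refusing a patch whose context no longer matches. This is a minimal sketch, not the Continuum Flow implementation: apply_patch, PatchConflict, and the "old_value" field are hypothetical names, and "old_value" in particular is an assumption, since the diff packet shown earlier carries only a target and a new value.

```python
import copy
from typing import Any

State = dict[str, dict[str, Any]]


class PatchConflict(Exception):
    """The patch was written against a state that no longer exists."""


def apply_patch(state: State, patch: dict[str, Any]) -> State:
    """Verify the patch against the current state, then return a patched copy."""
    # Step 1: read the incoming diff packet.
    if patch["operation"] != "UPDATE":
        raise ValueError(f"unsupported operation: {patch['operation']}")
    entity, attr = patch["target"].split(".", 1)

    # Step 2 (safety guard): the expected old value must match the current one.
    current = state[entity][attr]  # KeyError if the target does not exist
    if current != patch["old_value"]:
        raise PatchConflict(
            f"{patch['target']}: expected {patch['old_value']!r}, found {current!r}"
        )

    # Step 3: change exactly one attribute; everything else is copied untouched.
    new_state = copy.deepcopy(state)
    new_state[entity][attr] = patch["value"]
    return new_state


frame_01: State = {
    "Arjun": {"Wear": "Rusty Armor", "Face": "Scarred", "Hand": "Empty"},
}
patch = {
    "operation": "UPDATE",
    "target": "Arjun.Hand",
    "old_value": "Empty",  # assumed field, not part of the packet shown earlier
    "value": "Iron Sword",
}
frame_02 = apply_patch(frame_01, patch)
```

Applying the same patch a second time, to frame_02, raises PatchConflict because the hand is no longer empty. That is the behaviour the safety guard is meant to provide: a stale patch fails loudly instead of silently overwriting state.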

Why This Fixes Video Generation

The biggest issue in AI video is Temporal Stability.

  • ✕
    Without Diffs:

    Frame 1 and Frame 2 are treated as two different paintings. The AI "guesses" the continuity.

  • ✓
    With Diffs:

    You instruct the video model (for example, via ControlNet conditioning): "Keep 90% exactly the same. Only use diffusion to change the pixels around the hand."
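
The "freeze everything except the hand" idea can be illustrated with a plain mask composite. The sketch below does not call ControlNet or any diffusion model. The generated frame is a stand-in, the frames are tiny grayscale grids, and box_mask and composite are hypothetical helper names. It only shows that pixels outside the masked region are copied from the previous frame unchanged.

```python
Frame = list[list[int]]   # grayscale pixel grid, row-major
Mask = list[list[bool]]   # True where regeneration is allowed


def box_mask(height: int, width: int, box: tuple[int, int, int, int]) -> Mask:
    """Build a mask that is True inside box = (top, left, bottom, right)."""
    top, left, bottom, right = box
    return [
        [top <= y < bottom and left <= x < right for x in range(width)]
        for y in range(height)
    ]


def composite(previous: Frame, generated: Frame, mask: Mask) -> Frame:
    """Take generated pixels inside the mask, previous-frame pixels elsewhere."""
    return [
        [g if m else p for p, g, m in zip(prev_row, gen_row, mask_row)]
        for prev_row, gen_row, mask_row in zip(previous, generated, mask)
    ]


previous = [[10] * 4 for _ in range(4)]    # frame 1
generated = [[99] * 4 for _ in range(4)]   # stand-in for the model's output
hand_region = box_mask(4, 4, (1, 1, 3, 3))  # assumed location of the hand

frame_2 = composite(previous, generated, hand_region)
changed = sum(
    a != b
    for row_a, row_b in zip(previous, frame_2)
    for a, b in zip(row_a, row_b)
)
```

In this toy case changed is 4 out of 16 pixels, and the other 12 are identical to frame 1. A real pipeline would take the mask from the patched attribute's screen region and hand it to an inpainting-capable model.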