Remaining Audit Fixes: Robustness, SSE, Frontend
Closed the remaining high, medium, and blind-spot issues from the audit with a pure stability and correctness session. No new features for their own sake. Just reliability work.
What We Set Out To Do
Session 01 solved mesh quality and added the viewer, but the rest of the audit backlog was still open. This session was about finishing the remaining issues across backend behavior, SSE delivery, frontend cleanup, and recovery paths. The goal was not feature expansion; it was to close out the known weaknesses still in the system.
Bug 5: SSE Subscriber Starvation
If two tabs subscribed to the same job stream, the second tab got nothing because the job used a single queue and the first subscriber consumed every event. This also made reconnection behavior fragile. The fix replaced the shared queue with per-subscriber queues plus an event log replay buffer.
Bug / Fix
- _queue: asyncio.Queue = field(default_factory=asyncio.Queue)
- async def push(self, event): await self._queue.put(event)
- async def stream(self): event = await self._queue.get()
+ _subscribers: list[asyncio.Queue] = field(default_factory=list)
+ _event_log: list[str] = field(default_factory=list)
+ async def push(self, event): ...  # append to the log, then broadcast to every subscriber queue
+ async def stream(self): ...  # subscribe, replay the log, then consume live events
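The fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual class: `JobStream` and the `TERMINAL` set are assumed names, and only `_subscribers`, `_event_log`, `push`, and `stream` come from the diff. The "done" and "error" markers are borrowed from the event names listed later in these notes.

```python
import asyncio
from dataclasses import dataclass, field

# Assumed terminal events; the real stream may end differently.
TERMINAL = {"done", "error"}


@dataclass
class JobStream:
    """Hypothetical stand-in for the job object; only the fan-out is shown."""

    _subscribers: list[asyncio.Queue] = field(default_factory=list)
    _event_log: list[str] = field(default_factory=list)

    async def push(self, event: str) -> None:
        # Log first so late subscribers can replay, then broadcast to all.
        self._event_log.append(event)
        for queue in self._subscribers:
            await queue.put(event)

    async def stream(self):
        queue: asyncio.Queue = asyncio.Queue()
        # Snapshot the log and register with no await in between, so an
        # event is never both replayed and delivered live, and never missed.
        backlog = list(self._event_log)
        self._subscribers.append(queue)
        try:
            for event in backlog:
                yield event
                if event in TERMINAL:
                    return
            while True:
                event = await queue.get()
                yield event
                if event in TERMINAL:
                    return
        finally:
            # Drop the queue so finished subscribers do not accumulate.
            self._subscribers.remove(queue)
```

Each subscriber owns its queue, so one slow or greedy consumer cannot drain events meant for another. A tab that connects late starts from the replayed log instead of an empty stream.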
Bug 6: Checkpoint Could Go Backwards
Retry paths could regress a job’s completed step count. That meant expensive earlier work could be rerun needlessly. The fix was a monotonic checkpoint guard.
Bug / Fix
- self.last_step = step
+ if step > self.last_step:
+     self.last_step = step
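As a self-contained sketch of the guard, assuming a hypothetical `Job` class with a `checkpoint` method (the method name matches the verification steps later in these notes; the boolean return value is an addition for illustration):

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; only the checkpoint field is modeled."""

    last_step: int = 0

    def checkpoint(self, step: int) -> bool:
        """Advance the checkpoint; return True only if it moved forward."""
        if step > self.last_step:
            self.last_step = step
            return True
        # A retry reporting an earlier step is ignored, not recorded.
        return False
```

Calling `checkpoint(3)` and then `checkpoint(1)` on the same job leaves `last_step` at 3, so a retry cannot discard completed work.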
Bug 7: Quality Gate File Crash
Missing image files caused FileNotFoundError to bubble out of the quality gate and crash the worker without clean SSE feedback. The fix was to catch that state and return a structured failure result.
Bug / Fix
- gen = _load_rgb(generated_path, compare_size)
- ref = _load_rgb(reference_path, compare_size)
+ try:
+     gen = _load_rgb(generated_path, compare_size)
+     ref = _load_rgb(reference_path, compare_size)
+ except FileNotFoundError as exc:
+     return QualityResult(... failure_reasons=[f"Image file not found: {exc}"])
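The shape of that fix can be sketched as below. Everything beyond the try/except is an assumption made so the example runs: the `QualityResult` fields other than `failure_reasons`, the `check_identity` name, the byte-reading `_load_rgb` stand-in, and the placeholder comparison are not the project's real code.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class QualityResult:
    # Assumed shape; the real class likely carries more fields.
    passed: bool
    failure_reasons: list[str] = field(default_factory=list)


def _load_rgb(path, size):
    # Stand-in loader: reads raw bytes instead of decoding and resizing.
    return Path(path).read_bytes()


def check_identity(generated_path, reference_path, compare_size=(256, 256)):
    try:
        gen = _load_rgb(generated_path, compare_size)
        ref = _load_rgb(reference_path, compare_size)
    except FileNotFoundError as exc:
        # Report the missing file as a failed check instead of crashing.
        return QualityResult(
            passed=False,
            failure_reasons=[f"Image file not found: {exc}"],
        )
    # Placeholder comparison; the real gate compares image content.
    return QualityResult(passed=gen == ref)
```

Returning a result object keeps the worker alive, so the caller can forward the failure reason over SSE like any other failed quality check.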
Bug 8: FBX Export Silently Returned None
Selecting FBX appeared to succeed while producing an empty or missing file. The fix was to detect unsupported FBX earlier, export GLB instead, and surface a clear runtime message.
Bug / Fix
- mesh.export(str(out_path))
+ if export_fmt == "fbx":
+     out_path = out_path.with_suffix(".glb")
+     mesh.export(str(out_path))
+     raise RuntimeError("FBX export is not yet supported ... exported as GLB instead.")
+ mesh.export(str(out_path))
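A hedged sketch of the fallback, assuming a hypothetical `export_mesh` wrapper and any mesh object exposing `export(path)`. The error wording here is placeholder text, not the project's actual message.

```python
from pathlib import Path


def export_mesh(mesh, out_path: Path, export_fmt: str) -> Path:
    """Export the mesh; fall back to GLB and raise if FBX was requested."""
    if export_fmt == "fbx":
        # Write the GLB first, so the user still gets a usable file.
        out_path = out_path.with_suffix(".glb")
        mesh.export(str(out_path))
        # Placeholder wording; raise so the caller can show the message.
        raise RuntimeError(f"FBX is unsupported; wrote GLB fallback to {out_path}")
    mesh.export(str(out_path))
    return out_path
```

Raising after the fallback export means the caller has to catch `RuntimeError` and still treat the GLB on disk as a valid artifact. An alternative design would return the path together with a warning, at the cost of making the substitution easier to overlook.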
Bug 9 and 10: ComfyUI Timeout + Upload Handle Leak
Large SDXL downloads were timing out too aggressively and the upload path could leak a file handle if the request failed mid-transfer. The timeout was raised from 30s to 60s, and uploads moved from open file handles to in-memory bytes.
Bug / Fix
- async with httpx.AsyncClient(timeout=30.0) as client:
+ async with httpx.AsyncClient(timeout=60.0) as client:
- files = {"image": (path.name, open(image_path, "rb"), "image/png")}
+ image_bytes = path.read_bytes()
+ files = {"image": (path.name, image_bytes, "image/png")}
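The upload half of the fix can be isolated into a small helper, sketched here under the assumption of a hypothetical `build_upload_files` function. `Path.read_bytes()` opens and closes the file internally, so no handle is left open by the time the request starts. httpx accepts raw bytes as the file content in a `files` tuple.

```python
from pathlib import Path


def build_upload_files(image_path) -> dict:
    """Build a multipart `files` mapping without leaving a file handle open."""
    path = Path(image_path)
    # read_bytes() closes the file before returning, even if the
    # HTTP request that follows fails partway through.
    image_bytes = path.read_bytes()
    return {"image": (path.name, image_bytes, "image/png")}
```

The resulting mapping would be passed as `files=` to the client's `post` call. The trade-off is that the whole image sits in memory for the duration of the request, which is reasonable for single PNG uploads but worth revisiting for very large files.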
Bug 11 and 12: Event Typing and EventSource Cleanup
The frontend type union for SSE events was missing multiple backend event types, and EventSource instances were never consistently cleaned up when tabs unmounted. The fix was to expand the frontend event types and hold stream references in refs with cleanup on unmount.
Bug / Fix
- "job_done" | "job_error"
+ "step_active" | "image_ready" | "view_ready" | "mesh_ready" | "log" | "done" | "error"
- const sse = new EventSource(...)
+ const sseRef = useRef<EventSource | null>(null)
+ useEffect(() => () => { sseRef.current?.close(); }, [])
+ sseRef.current = sse
Bug 13 and 14: Timeout Cleanup + Download Button Stub
Prospecting still had a stale 15-minute watchdog timeout that could fire after unmount, and the gallery download button did nothing beyond stopping event propagation. Cleanup was added for the timeout, and the button was wired into Tauri’s save dialog and file-writing path.
Bug / Fix
+ useEffect(() => () => {
+   if (genTimeoutRef.current) clearTimeout(genTimeoutRef.current);
+   sseRef.current?.close();
+ }, [])
- onClick={e => e.stopPropagation()}
+ onClick={e => { e.stopPropagation(); handleDownload(src, i); }}
Feature: Error Boundary and Cancel Button
A render crash used to take out the entire app with a white screen. A class-based error boundary was added around tab content so failures become visible and recoverable. The session also exposed the existing backend cancel route in the Forge UI so in-flight jobs can be stopped.
Feature
+ class ErrorBoundary extends Component { ... }
+ <ErrorBoundary>{tab content}</ErrorBoundary>
  <button disabled>Processing...</button>
+ <button onClick={cancelJob} title="Cancel pipeline">x</button>
Performance Benchmarks
How To Reproduce / Verify
Open the same job stream in two tabs to reproduce subscriber starvation. Pre-fix, tab two gets nothing. Post-fix, both tabs get the full stream. To verify checkpoint monotonicity, call checkpoint(3) then checkpoint(1). Pre-fix, the checkpoint regresses. Post-fix, it stays at three.
For quality gate handling, pass nonexistent paths into the identity check. Pre-fix, the worker crashes with FileNotFoundError. Post-fix, it returns a structured failed result. For download behavior, generate a Prospecting image and click the gallery download control. Pre-fix, nothing happens. Post-fix, the save dialog opens.
Test Results
Backlog
- Real FBX export via Blender headless.
- Disk persistence and project browser support.
- MiDaS depth estimation UI.
- Texture baking.
- Sprite pipeline completion.
- Anvil layer system.
- Auto-retry for ComfyUI disconnects.
- CSS consolidation and shared token cleanup.