This article discusses an architectural shift in visual regression testing for embedded devices, moving from storing large golden image files and performing pixel-by-pixel comparisons to using MD5 hashes of raw frame buffers. This change significantly reduced storage requirements, improved CI pipeline speed, and eliminated flaky tests caused by image encoding variations. It highlights the specific conditions under which this technique is effective and its inherent trade-offs.
Read original on DZone MicroservicesTraditional visual regression testing often relies on storing "golden" images and performing pixel-by-pixel comparisons against new outputs. This approach presents several architectural and operational challenges, particularly in CI/CD environments. The most prominent issues include large storage requirements (gigabytes of PNGs), slow comparison times, and flaky tests due to inconsistencies in image encoders across different runs or environments. This often leads to increased repository size, slower cloning, and less efficient code reviews for UI changes.
The article introduces frame buffer hashing as an alternative, where instead of storing entire images, an MD5 hash of the raw GPU frame buffer is computed and stored as a reference. This dramatically reduces storage (from 18GB to 19KB in their case) and makes comparisons instantaneous string equality checks. Intentional UI changes then appear as clear diffs in JSON reference files within source control, streamlining code reviews.
import hashlib
import json
from pathlib import Path
REFERENCE_FILE = Path("references/visual_hashes.json")
def frame_hash(frame_bytes: bytes) -> str:
"""MD5 of the raw GPU frame buffer."""
return hashlib.md5(frame_bytes).hexdigest()
def load_references() -> dict:
if REFERENCE_FILE.exists():
return json.loads(REFERENCE_FILE.read_text())
return {}
def check_frame(test_id: str, frame_bytes: bytes, references: dict) -> tuple[bool, str]:
"""Returns (passed, actual_hash)."""
actual = frame_hash(frame_bytes)
expected = references.get(test_id)
if expected is None:
return False, actual # no reference yet
return actual == expected, actual
def on_failure(test_id: str, frame_bytes: bytes, actual: str):
"""Only called when hashes diverge. Save the frame for review."""
artifact_dir = Path(f"artifacts/{test_id}")
artifact_dir.mkdir(parents=True, exist_ok=True)
(artifact_dir / f"{actual}.raw").write_bytes(frame_bytes)Trade-offs and Limitations
The primary drawback is failure diagnosis. When hashes diverge, you only know *that* there's a difference, not *what* the difference is. Reconstructing the reference image for a side-by-side comparison requires re-running the test against a known-good build, which can be cumbersome if failures are frequent. Additionally, this approach does not support fuzzy matching, as it relies on exact byte-for-byte determinism.