No Code, No Cloud: On-Device Mockup-to-Code with Lightweight Vision-Language AI

Overhead view of a designer's workspace showing hands working with color swatches, a smartphone displaying a mobile interface mockup, wireframe sketches on paper, and a laptop on a wooden desk.
Image Credit: Photo by Firmbee on Pixabay (Source, License)

AI Summary of Scholarly Research

This page presents an AI-generated summary of a published research paper. The original authors did not write or review this article. See the full disclosure below.

Publication Signals show what we were able to verify about where this research was published. Rating: STANDARD. Available publication signals for this source were verified. Publication Signals reflect the source's verifiable credentials, not the quality of the research.

Fewer signals were independently confirmable for this source. That reflects the limits of what is on record, not a judgment about the research.

  • ✔ Published in indexed journal
  • ✔ No retraction or integrity flags

Overview

LiteViT5 is a lightweight, on-device vision-language model designed to generate HTML code directly from design mockup images. With 235M parameters organized in a ViT-T5 encoder-decoder architecture, the model operates without cloud infrastructure, enabling private prototyping workflows. The system addresses the persistent gap between visual design artifacts and functional code implementation, particularly targeting small teams and non-programmer practitioners constrained by dependencies on proprietary APIs or computationally intensive model architectures.

Methods and Approach

LiteViT5 employs a compact ViT-T5 encoder-decoder framework to process mockup images and generate corresponding HTML markup. The model was evaluated on both in-distribution (WebSight benchmark) and out-of-distribution (Design2Code benchmark) datasets. Performance assessment encompassed structure-based metrics, positional accuracy, color fidelity, and CLIP-based similarity measures. Comparative evaluation against substantially larger models—PaliGemma-3B, LLaVA-7B, and DeepSeek-VL-7B—quantified efficiency gains. A user study with 24 participants assessed perceived accuracy, code quality, editability, and practical utility in design iteration workflows.
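To make the structure-based metrics concrete, the sketch below illustrates one simple way such a metric can be built: flatten each HTML document into its sequence of opening tags and score the overlap between the reference and the generated markup. This is a minimal, hypothetical illustration of the idea, not the paper's actual evaluation code, and the function names are invented for this example.

```python
# Hypothetical sketch of a structure-based similarity score in the spirit
# of the paper's evaluation (not the authors' implementation). Each HTML
# document is flattened to its opening-tag sequence, and the two sequences
# are compared with difflib's matching-ratio measure.
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects opening tag names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html_source: str) -> list[str]:
    """Return the ordered list of opening tags in an HTML string."""
    collector = TagCollector()
    collector.feed(html_source)
    return collector.tags


def structure_similarity(reference_html: str, generated_html: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical tag structure."""
    ref = tag_sequence(reference_html)
    gen = tag_sequence(generated_html)
    return SequenceMatcher(None, ref, gen).ratio()


reference = "<html><body><div><h1></h1><p></p></div></body></html>"
generated = "<html><body><div><h1></h1><span></span></div></body></html>"
print(round(structure_similarity(reference, generated), 2))  # → 0.8
```

A metric like this rewards matching layout hierarchy while ignoring text content and styling, which is why evaluations of mockup-to-code systems typically pair it with positional, color, and CLIP-based measures.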

Key Findings

LiteViT5 achieved competitive performance on both benchmark datasets despite containing 10-30 times fewer parameters than comparable baseline models. Quantitative metrics across structural, positional, color, and perceptual similarity dimensions demonstrated comparable results to substantially larger architectures. User study findings indicated that the model supports rapid design iteration cycles and reduces friction in developer-designer handoff workflows. Generated code exhibited sufficient quality and editability to function as practical output in iterative design processes rather than requiring complete regeneration.
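The 10-30x parameter-efficiency claim follows directly from the nominal model sizes. The snippet below checks the arithmetic; the parameter counts are the approximate figures implied by the model names, not exact published counts.

```python
# Quick arithmetic check of the parameter-efficiency claim: LiteViT5 (235M)
# versus the larger baselines named in the evaluation. Counts are nominal
# sizes taken from the model names, not exact figures.
LITEVIT5_PARAMS = 235e6
baselines = {
    "PaliGemma-3B": 3e9,
    "LLaVA-7B": 7e9,
    "DeepSeek-VL-7B": 7e9,
}

for name, params in baselines.items():
    ratio = params / LITEVIT5_PARAMS
    print(f"{name}: {ratio:.1f}x larger than LiteViT5")
```

The ratios come out at roughly 12.8x for the 3B baseline and 29.8x for the 7B baselines, consistent with the reported 10-30x range.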

Implications

The model's compact parameter footprint and on-device execution reduce infrastructure costs and avoid the privacy concerns associated with cloud-dependent solutions. The demonstration of competitive performance at 235M parameters suggests that model compression and architectural efficiency can be achieved without proportional degradation in code generation quality, potentially reshaping economic and accessibility considerations in interface design tooling. Results indicate that lightweight generative models can effectively support non-expert practitioners in translating visual designs to functional prototypes, with implications for democratizing web development practices.

Disclosure

  • Research title: No Code, No Cloud: On-Device Mockup-to-Code with Lightweight Vision-Language AI
  • Authors: Abinas Kuganathan, Mitra Purandare, Markus Stolze
  • Institutions: Ostschweizer Fachhochschule OST
  • Publication date: 2026-03-03
  • DOI: https://doi.org/10.1145/3742413.3789144
  • OpenAlex record: View
  • Image credit: Photo by Firmbee on Pixabay (Source, License)
  • Disclosure: This post was generated by Claude (Anthropic). The original authors did not write or review this post.
