No Code, No Cloud: On-Device Mockup-to-Code with Lightweight Vision-Language AI

Overhead view of a designer's workspace showing hands working with color swatches, a smartphone displaying a mobile interface mockup, wireframe sketches on paper, and a laptop on a wooden desk.
Image Credit: Photo by Firmbee on Pixabay (Source, License)

AI Summary of Scholarly Research

This page presents an AI-generated summary of a published research paper. The original authors did not write or review this article. See the full disclosure below.

Publication Signals show what we were able to verify about where this research was published. Rating: STANDARD. Available publication signals for this source were verified. Publication Signals reflect the source's verifiable credentials, not the quality of the research.

Fewer signals were independently confirmable for this source. That reflects the limits of what is on record, not a judgment about the research.

  • ✔ Published in indexed journal
  • ✔ No retraction or integrity flags

Overview

LiteViT5 is a lightweight, on-device vision-language model designed to generate HTML code directly from design mockup images. With 235M parameters organized in a ViT-T5 encoder-decoder architecture, the model operates without cloud infrastructure, enabling private prototyping workflows. The system addresses the persistent gap between visual design artifacts and functional code implementation, particularly targeting small teams and non-programmer practitioners constrained by dependencies on proprietary APIs or computationally intensive model architectures.

Methods and Approach

LiteViT5 employs a compact ViT-T5 encoder-decoder framework to process mockup images and generate corresponding HTML markup. The model was evaluated on both in-distribution (WebSight benchmark) and out-of-distribution (Design2Code benchmark) datasets. Performance assessment encompassed structure-based metrics, positional accuracy, color fidelity, and CLIP-based similarity measures. Comparative evaluation against substantially larger models—PaliGemma-3B, LLaVA-7B, and DeepSeek-VL-7B—quantified efficiency gains. A user study with 24 participants assessed perceived accuracy, code quality, editability, and practical utility in design iteration workflows.
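To make the structure-based metrics concrete, the sketch below illustrates one simple way such a metric can be built: flatten each HTML document into its sequence of opening tags and score the overlap between the reference and the generated markup. This is a minimal, hypothetical illustration of the idea, not the paper's actual evaluation code, and the function names are invented for this example.

```python
# Hypothetical sketch of a structure-based similarity score in the spirit
# of the paper's evaluation (not the authors' implementation). Each HTML
# document is flattened to its opening-tag sequence, and the two sequences
# are compared with difflib's matching-ratio measure.
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects opening tag names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html_source: str) -> list[str]:
    """Return the ordered list of opening tags in an HTML string."""
    collector = TagCollector()
    collector.feed(html_source)
    return collector.tags


def structure_similarity(reference_html: str, generated_html: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical tag structure."""
    ref = tag_sequence(reference_html)
    gen = tag_sequence(generated_html)
    return SequenceMatcher(None, ref, gen).ratio()


reference = "<html><body><div><h1></h1><p></p></div></body></html>"
generated = "<html><body><div><h1></h1><span></span></div></body></html>"
print(round(structure_similarity(reference, generated), 2))  # → 0.8
```

A metric like this rewards matching layout hierarchy while ignoring text content and styling, which is why evaluations of mockup-to-code systems typically pair it with positional, color, and CLIP-based measures.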

Key Findings

LiteViT5 achieved competitive performance on both benchmark datasets despite containing 10-30 times fewer parameters than comparable baseline models. Quantitative metrics across structural, positional, color, and perceptual similarity dimensions demonstrated comparable results to substantially larger architectures. User study findings indicated that the model supports rapid design iteration cycles and reduces friction in developer-designer handoff workflows. Generated code exhibited sufficient quality and editability to function as practical output in iterative design processes rather than requiring complete regeneration.
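The 10-30x parameter-efficiency claim follows directly from the nominal model sizes. The snippet below checks the arithmetic; the parameter counts are the approximate figures implied by the model names, not exact published counts.

```python
# Quick arithmetic check of the parameter-efficiency claim: LiteViT5 (235M)
# versus the larger baselines named in the evaluation. Counts are nominal
# sizes taken from the model names, not exact figures.
LITEVIT5_PARAMS = 235e6
baselines = {
    "PaliGemma-3B": 3e9,
    "LLaVA-7B": 7e9,
    "DeepSeek-VL-7B": 7e9,
}

for name, params in baselines.items():
    ratio = params / LITEVIT5_PARAMS
    print(f"{name}: {ratio:.1f}x larger than LiteViT5")
```

The ratios come out at roughly 12.8x for the 3B baseline and 29.8x for the 7B baselines, consistent with the reported 10-30x range.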

Implications

The model's compact parameter footprint and on-device execution reduce infrastructure costs and avoid the privacy concerns associated with cloud-dependent solutions. The demonstration of competitive performance at 235M parameters suggests that model compression and architectural efficiency can be achieved without proportional degradation in code generation quality, potentially reshaping economic and accessibility considerations in interface design tooling. Results indicate that lightweight generative models can effectively support non-expert practitioners in translating visual designs to functional prototypes, with implications for democratizing web development practices.

Disclosure

  • Research title: No Code, No Cloud: On-Device Mockup-to-Code with Lightweight Vision-Language AI
  • Authors: Abinas Kuganathan, Mitra Purandare, Markus Stolze
  • Institutions: Ostschweizer Fachhochschule OST
  • Publication date: 2026-03-03
  • DOI: https://doi.org/10.1145/3742413.3789144
  • OpenAlex record: View
  • Image credit: Photo by Firmbee on Pixabay (Source, License)
  • Disclosure: This post was generated by Claude (Anthropic). The original authors did not write or review this post.
