Coframe – Coframe Develops Model Fine-Tuned for UI Code Generation with OpenAI

Visually compelling code generation has been a constant challenge for people working with websites. We are changing that.

‍

Over the past few months, we’ve been quietly building a set of powerful new capabilities that will allow our customers to run far more complex and impactful optimizations on their websites and UIs. These capabilities, falling under the umbrella of generative UI, make it possible to create new website and UI sections using visually-grounded code generation, accelerating the work of growth and frontend engineering teams.

‍

A recent study conducted by a leading player in the experimentation space found overall that larger, more complex experiments, more variations rather than fewer, and higher experiment velocity led to superior results. As it turns out, these findings are at the core of what differentiates Coframe from the prior generation of experimentation products and allows us to drive significant metric improvements for our customers.

‍

However, as experiment complexity grows, UI complexity grows. To date, one of the most difficult challenges faced by code generation tools has been generating compelling, on-brand UI. Existing UI-specific offerings are capable of producing highly opinionated UI given a design system, but the current capabilities of generally available vision language models have limited ability to faithfully pick up on existing aesthetics and brand design. This is due to today’s LLMs not having visual grounding for this task during their training.

‍

Today, this changes.

‍

We've developed a first-of-its-kind model fine-tuned to generate high-quality, grounded, aesthetically aligned UI code, in collaboration with OpenAI.

‍

Working with OpenAI on an alpha version of their recently announced GPT-4o Vision Fine-Tuning, we fine-tuned GPT-4o on a large collection of websites, achieving a clear jump in improvement on current models and demonstrating the potential for AI to generate on-brand, high-quality UI. This will allow our customers to quickly and easily create compelling new sections on their websites, and for Coframe to continuously personalize and optimize them over time with our Optimizer algorithm.

Since there isn’t a straightforward way to quantitatively assess performance in this domain, we constructed an LLM-as-a-judge benchmark which we’ve named DesignAlign. DesignAlign consists of 194 held-out websites in our validation set, one version with one section omitted for each and one without, as well as natural language instructions for reproducing the omitted section, in our validation set. The task defined for a model is to generate a section for each of the websites. The judge LLM will then compare the full website with the website with the AI-generated section inserted in the place where the section was omitted, assigning a style alignment score between 0 and 1, based on logprobs. With GPT-4o base as a judge, our fine-tuned model achieved a 26% increase in StyleBench performance above GPT-4o. We will be continuing to work on DesignAlign with the intention of releasing it to the community soon.

‍

We’re excited about the potential these models have to accelerate growth for businesses by making website optimization 1000x faster and cheaper, enabling methods of experimentation and personalization that simply weren’t possible before. We’d like to thank OpenAI for working with us and enabling this use-case to come to life.

‍

We are currently working on incorporating responsiveness as well as interoperability with our generative data model for easy editing into the code generated and will be releasing this capability to our customers soon.

‍

Our work with OpenAI is featured in their blog. You can read more about it here.