Published October 9th, 2024, Motiff Tech

The engineering practices behind AI Generates UI

Kai Fan
Director of AI Engineering

Motiff firmly believes that AI will revolutionize UI design, and has introduced several industry-first AI features. With the rapid advancements in large language models, our vision for AI capabilities has grown even broader. We aim to develop AI capabilities not just as a designer’s copilot, but also as an autopilot, potentially generating complete design drafts.

After numerous trials and setbacks, Motiff has finally launched the AI Generates UI feature. Now, users can simply input their design requirements and receive a polished UI design draft immediately.

In this blog post, I'd like to share the journey and insights we gained during the development of this feature.

Early Exploration

When we first embarked on the AI Generates UI project, we faced numerous uncertainties regarding the technical direction.

The core question was whether to generate every detail from scratch or to assemble designs using UI components. While utilizing components appeared simpler in theory, we believed that allowing the AI to control every detail would yield the best outcome in terms of design expressive power. This became our initial approach.

Generating UI as Images

At first glance, a UI design draft resembles an image, and our tests confirmed that text-to-image models can generate visually appealing UI draft images. However, several significant issues emerged:

  • It is challenging to control the layout and content at the pixel level.
  • The output is an unstructured image, making it difficult to edit and optimize further.
  • Text generation poses a substantial challenge; even the best text-to-image models struggle with generating longer, coherent texts.

These issues are unacceptable for a professional tool, leading us to conclude that this approach isn't viable.

Generating UI as Web Pages

Another natural direction was to generate the UI as structured text (web pages), but this approach also had its drawbacks.

Realistic UI drafts contain numerous details, requiring a large number of output tokens, which strains most mainstream LLMs. We experimented with custom DSLs to enhance efficiency, but the improvements were minimal.

More critically, there was a noticeable quality gap between AI-generated web pages and high-quality UI designs, particularly regarding complex visual effects, as the models often failed to capture design aesthetics.

Rethinking UI Generation

UI drafting is a unique task requiring rigorous logical structure in layout, pixel-level precision in detail, and high design aesthetics in visual effects. Current AI models struggle to effectively generate satisfactory results for this task, prompting us to reassess our technical direction.

The content of a UI draft is mainly composed of three categories:

  1. The overall structure, such as navigation bars, titles, search boxes, and content cards.
  2. Page content, such as the texts and images for each section.
  3. Proper placement and detailed control of UI elements to achieve a good visual presentation.

While LLMs have demonstrated significant capabilities in generating overall structure and content, they typically fall short in precise UI detail control. By leveraging a predefined UI component library instead of generating every minute detail, we could address these shortcomings.

Certainly, there are challenging scenarios, like densely packed product cards overflowing with tags, which are hard to match with suitable components. However, such cases stem from a lack of UI personalization, which forces too much information into limited space.

We believe that as AI reduces software development costs, future applications will become more intelligent and personalized, minimizing such scenarios.

Consequently, we adjusted our technical approach by employing a rich set of expressive UI components to ensure well-designed UI details. Simultaneously, we let LLMs focus on generating the overall page structure and specific content.
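
To make this concrete, here is a minimal sketch of what such a component-based generation target can look like. The schema and component names are purely illustrative, not Motiff's actual format: the LLM decides which predefined components to use and what content fills them, while pixel-level styling lives in the component library.

```python
from dataclasses import dataclass, field

# Illustrative schema: the LLM chooses components and fills in content;
# visual details are owned by the predefined component library.
@dataclass
class ComponentInstance:
    component_id: str  # key into the predefined library
    content: dict      # texts, image slots, etc.
    children: list["ComponentInstance"] = field(default_factory=list)

page = ComponentInstance(
    component_id="page.scroll",
    content={},
    children=[
        ComponentInstance("nav.bar.search", {"placeholder": "Search products"}),
        ComponentInstance("card.product.simple", {
            "title": "Wireless Headphones",
            "subtitle": "$59.99",
            "image": "{{generated_image_1}}",
        }),
    ],
)
```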

Implementing the Solution

After adopting the new approach, we initially thought the implementation would be straightforward; we believed that by simply describing the page requirements and the component library to the LLMs, we could generate complete pages. However, the results were far from satisfactory.

Despite our efforts to meticulously describe each UI component, the LLMs struggled to select the appropriate ones. One reason was our need for diverse components to achieve high-quality UI effects: many components were quite similar, for instance, list items that display simple text versus richer configurations with images and buttons.
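
To illustrate the ambiguity (with invented names and descriptions, not our real catalog), consider two list components whose prose descriptions overlap heavily; given only such descriptions, an LLM often cannot reliably pick between them:

```python
# Invented catalog entries, for illustration only: the two descriptions
# differ in just a few words, which is hard for an LLM to weigh when
# selecting a component from requirements alone.
COMPONENT_CATALOG = {
    "list.item.text": (
        "A list row showing a title and an optional secondary line of text."
    ),
    "list.item.rich": (
        "A list row showing a title, secondary text, a leading image, "
        "and an optional trailing action button."
    ),
}
```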

We also attempted to enhance accuracy using few-shot prompting, but generating a UI page produces long outputs, making it challenging to fit a meaningful number of examples into a single inference. The improvements were thus limited.

These practices made it clear that although a component-based approach significantly reduced complexity, enabling the LLM to directly generate pages wasn't straightforward.

Adopting Flow Engineering

We quickly shifted to Flow Engineering, breaking the generation process into multiple sub-tasks that collaborate to complete the entire feature.

One splitting method involved dividing the process into two major steps, as sketched below:

  • Generation: First, the LLM generates a reasonable page layout and content using its preferred format (like HTML or ASCII Art) based on the requirements.
  • Conversion: Attempt to convert the generated page into a component-based format, matching each part of the page to a predefined component.
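
A minimal sketch of this generate-then-convert flow; every helper is a hypothetical stand-in for a model endpoint or matching step, since we are only illustrating the shape of the pipeline:

```python
# Minimal sketch of the two-step flow. All helpers are hypothetical
# stand-ins for real model endpoints and matching logic.

def llm_call(prompt: str) -> str:
    raise NotImplementedError("stand-in for an LLM endpoint")

def split_into_regions(html: str) -> list[str]:
    raise NotImplementedError("stand-in for layout segmentation")

def match_to_component(region: str, catalog: dict[str, str]) -> str:
    raise NotImplementedError("stand-in for component matching")

def generate_page(requirements: str) -> str:
    # Step 1 (generation): the LLM lays out the page in a format it
    # already handles well, such as semantic HTML.
    return llm_call(
        f"Design a mobile page for: {requirements}\n"
        "Respond with semantic HTML describing layout and content."
    )

def convert_page(html: str, catalog: dict[str, str]) -> list[str]:
    # Step 2 (conversion): map each free-form region onto a predefined
    # component, the step that proved exceptionally hard in practice.
    return [match_to_component(r, catalog) for r in split_into_regions(html)]
```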

While the "generation" step worked reasonably well, the "conversion" process proved to be exceptionally challenging. High-freedom page elements were hard to match with limited components. We abandoned attempts to define manual conversion rules due to scalability issues. Multiple iterations of splitting the process yielded less-than-ideal results.

We realized the core problem remained: getting the LLM to better understand the UI design task.

Developing a Proprietary UI-Specific Model

UI design is a niche and professional field, and existing large general-purpose models have demonstrated clear limitations.

Beyond AI Generates UI, our other AI features also needed a model that better understands UI specifics. Hence, we developed Motiff's own MLLM tailored to the UI industry: the Motiff AI Model.

This integrated expert model connects pre-trained language and visual models through a modal converter, enabling them to work collaboratively. We then trained it on a substantial amount of professional data so it could acquire fundamental UI domain capabilities.
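
For intuition only (this is a generic pattern, not the Motiff AI Model's actual design), one common way to connect a pre-trained vision encoder to a pre-trained LLM is a small projection module that maps visual features into the LLM's embedding space:

```python
import torch
import torch.nn as nn

class ModalConnector(nn.Module):
    """Generic illustration of a modal converter: project features from a
    frozen vision encoder into the token-embedding space of an LLM so the
    two pre-trained models can work together."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vision_features: torch.Tensor) -> torch.Tensor:
        # (batch, patches, vision_dim) -> (batch, patches, llm_dim); the
        # result is prepended to text embeddings before the LLM forward pass.
        return self.proj(vision_features)
```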

Through targeted optimizations, the Motiff AI Model significantly outperformed general-purpose models in various UI-specific tasks, such as ScreenQA, Screen2Words, and MoTIF-Automation.

For more details on the training process, you can read "MLLM by Motiff: Shaping the future of UI design".

Further Optimization

Leveraging the Motiff AI Model, we finally merged the expressiveness of the UI component library with the generative capabilities of the LLM. By integrating processes like content generation, component matching, and image generation, we can accurately produce design drafts based on user requirements.
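
The exact pipeline is proprietary, so the sketch below only illustrates how such stages might be chained; every function is a hypothetical stand-in:

```python
# Hypothetical stage functions standing in for the model calls and
# matching steps named above; none of this is the actual pipeline.
def generate_structure(requirements: str) -> dict:
    raise NotImplementedError("UI-model call: overall page structure")

def generate_content(requirements: str, structure: dict) -> dict:
    raise NotImplementedError("UI-model call: texts for each section")

def match_components(structure: dict, content: dict) -> list[dict]:
    raise NotImplementedError("map structure and content onto the library")

def generate_images(components: list[dict]) -> list[dict]:
    raise NotImplementedError("fill image slots via image generation")

def generate_ui_draft(requirements: str) -> list[dict]:
    # Chain the stages: structure -> content -> component matching -> imagery.
    structure = generate_structure(requirements)
    content = generate_content(requirements, structure)
    components = match_components(structure, content)
    return generate_images(components)
```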

Our internal evaluations showed that Motiff’s AI-generated results had notably improved compared to other solutions and supported multi-language generation effectively.

Motiff continues iterating on both the process and the model. We've also developed a feature to generate two different design drafts simultaneously, allowing users to choose the most suitable design. After generating the UI, users can seamlessly move on to Motiff's powerful professional features, such as rapid prototyping or further editing and optimization.

Experiences and Reflections

The Motiff AI team encountered numerous challenges and setbacks throughout this exploration. Here, we share some insights and experiences from AI application development.

Stay Open to Product Design

Given the technical uncertainties, we maintained a flexible attitude toward the functional form of AI products. For instance, the project began with a vague product direction, becoming more refined through technical exploration.

Currently, much of the project's work focuses on finding Technology-Product-Fit, differing significantly from previous development experiences. While technology has always been crucial, in past projects, the tech stack was mature, making it easier to gauge a demand's technical feasibility and development cost.

However, rapidly evolving AI models and our ongoing learning curve regarding underlying logic make it tougher to ascertain technical feasibility.

To tackle these challenges, we adopt an iterative approach. Instead of aiming for a complete product solution upfront, we prefer to ship functionality early and gather user feedback. This reduces product-side uncertainty and deepens our understanding of the technology's boundaries in real-world applications.

Focus on Flow Engineering

Prompt engineering remains fundamental. As models evolve and team experience grows, we can usually tune prompts to a reasonable level quickly; however, pushing for better accuracy or functionality sharply increases the difficulty of prompt tuning. By comparison, splitting tasks into simpler sub-tasks reduces the complexity of each individual inference.

Dividing tasks also makes collaboration and engineering easier than wrestling with the black-box nature of a single LLM inference.

Much of our existing engineering experience and infrastructure integrates seamlessly into this approach, such as performance monitoring, error alerting, and retries under rate limits. These are vital for stable online operations.
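
As one small example of this kind of infrastructure, a generic retry wrapper with exponential backoff and logging (a sketch, not our production code) might look like this:

```python
import logging
import random
import time

logger = logging.getLogger("ui_generation")

def call_with_retries(call, max_attempts: int = 3, base_delay: float = 1.0):
    """Retry a flaky operation (e.g. a rate-limited LLM request) with
    exponential backoff, logging each failure for monitoring."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:  # in practice, catch rate-limit errors only
            logger.warning("attempt %d failed: %s", attempt, exc)
            if attempt == max_attempts:
                raise  # surface the failure so alerting can fire
            time.sleep(base_delay * 2 ** (attempt - 1) + random.random())
```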

Avoid Premature Optimization

In system design, we emphasize relying on stable abstractions rather than concrete implementations.

For AI applications, we need to consider the underlying capabilities that support the product. Over the past year, the context length of LLMs has increased by an order of magnitude, costs have decreased by an order of magnitude, and inference speeds have significantly improved. Amidst such rapid changes, we must avoid letting current metrics dictate the fundamental logic of our system architecture. Motiff has learned hard lessons, such as when we previously adopted complex concurrent processes to optimize generation speed, only to find it counterproductive.

Thus, we believe it is crucial to carefully assess long-term limitations when building systems. Focus on delivering core capabilities, and delay decisions on other aspects until bottlenecks arise.

Streamline the Exploration Process

We believe the most essential engineering capability today is enabling the team to quickly experiment, iterate, and learn from failures. Therefore, we have invested substantial resources to refine our experimentation platform, allowing team members to flexibly customize new processes and parameters. All operational data from these processes are recorded in databases for easier debugging and evaluation.
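
As a simplified illustration (our actual platform and schema differ), recording each run for later debugging and evaluation can be as lightweight as:

```python
import json
import sqlite3
import time

# Simplified stand-in for an experimentation platform's run log: every
# flow execution is recorded with its parameters and output.
conn = sqlite3.connect("experiments.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS runs (ts REAL, flow TEXT, params TEXT, output TEXT)"
)

def record_run(flow: str, params: dict, output: str) -> None:
    conn.execute(
        "INSERT INTO runs VALUES (?, ?, ?, ?)",
        (time.time(), flow, json.dumps(params), output),
    )
    conn.commit()
```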

In traditional software development, we are used to the determinism of program results. However, in AI applications, we must adapt to result uncertainty. For example, we've aimed for comprehensive end-to-end automated testing but found it exceedingly difficult. Consequently, we had to adjust our workflows, implementing one to two rounds of manual evaluations daily to verify effectiveness.

Build Data Moats

Ultimately, the success of AI Generates UI through our in-house model is closely tied to Motiff's long-term accumulation of high-quality, specialized data in the UI domain. These valuable datasets are costly to acquire and require significant resources for curation and annotation.

Looking ahead, the logical capabilities of LLMs will become increasingly powerful and affordable. Data, being the scarcer resource, will serve as the paramount moat.

While we aim to quickly launch new products with a lightweight approach, ensuring service quality typically requires heavier investment in the longer term, and AI applications are no exception. Therefore, for sustainable growth, we must build our data moat as early as possible.

Just the Beginning

The current capabilities of AI Generates UI are merely a starting point.

Motiff is continuously monitoring advancements in AI foundational capabilities and iterating its technical solutions. We are excited to discover new ideas brought forth by recent LLM progress. Our goal is to rapidly implement these improvements to achieve better generation outcomes, further revolutionizing the future of design through AI.
