Industry

LLMs, Model Effect

Client

ByteDance, Dreamina AI

Year

2024-2025

Enhance the Model Effect of Dreamina AI

Overview

During my internship on ByteDance's Seed-Large Image Model Team, I helped design the AI image-generation model behind ByteDance products including Dreamina, Doubao, and CapCut.
As a result, the updated version of the Prompt Engineering (PE) function launched in Dreamina AI on September 11, 2024, with a 175% increase in aesthetic performance.

My Role

User Research
Prompt Engineer
AI Product Design

Time

2024.3-2024.8

Outcome

The latest version launched on September 11, 2024, with the most up-to-date PE.

Aesthetic Performance

170%+

Images generated with the new PE model

Model Saving Rate

40%+

Users who save the model as their image-generation tool

Image Download

50%+

Downloads of user-generated images from the model after PE

Problem

Because the quality of user prompts varies widely, ByteDance's new image generation model needs prompt optimization to improve image aesthetics. The two major issues are:

Textual Problems

Semantic errors, disorganized content, logical flaws, lack of detail, style mismatches, and unrecognized keywords.

Image Problems

Limited aesthetic improvement, chaotic visuals, lack of responsiveness to input styles, and poor image structure.

Solutions

Collaborative Prompt Refinement

AI designers and labelers refine language-model prompts and images together.

Aesthetic Style Enhancement

Exploring strategies to boost model responsiveness to aesthetics (color, lighting, texture…)

Precision in User Preference Analysis

Enhancing image quality and model performance accurately.

Case Study

From May to July 2024, I refined PE templates to improve image aesthetics. I tested strategies across style types (horizontal) and fine-tuned tactics within each type (vertical), running trials to compare the aesthetic gains. Effective strategies include:

1. Sequence and Word Count

Original:

Short, less descriptive.

After 1st PE version:

More description and aesthetic references.

Prompt Length

Reduce the length of the prompt after PE by 10%-30%.

Switch Prompt Sequence

Switch from Object-Style-Aesthetics to Aesthetics-Object-Style.

Conclusion

  1. Effective order structure: Placing the art and CG aesthetic dimensions first greatly improves the perceived aesthetics (order: aesthetics, main content, style). The order of the other categories remains unchanged.

  2. Reasonable word count: 250 words overall, with no more than 90 words for each of the three dimensions, preferably in a 1:1:1 ratio.
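
The ordering and word-count guidelines above can be sketched in Python. This is a minimal illustration, not the team's actual PE template: the function names (`truncate_words`, `build_prompt`) are hypothetical, and counting words by splitting on whitespace is an assumption that only suits space-delimited languages.

```python
MAX_TOTAL_WORDS = 250    # overall budget stated in the conclusion
MAX_SECTION_WORDS = 90   # per-dimension cap stated in the conclusion


def truncate_words(text: str, limit: int) -> str:
    """Keep at most `limit` whitespace-separated words (assumed word definition)."""
    return " ".join(text.split()[:limit])


def build_prompt(aesthetics: str, content: str, style: str) -> str:
    """Assemble a prompt in aesthetics, main content, style order.

    Each dimension is capped at 90 words, then the whole prompt at 250 words.
    """
    sections = [
        truncate_words(section, MAX_SECTION_WORDS)
        for section in (aesthetics, content, style)
    ]
    prompt = ", ".join(section for section in sections if section)
    return truncate_words(prompt, MAX_TOTAL_WORDS)


if __name__ == "__main__":
    print(build_prompt("soft golden light", "a red fox in snow", "watercolor"))
```

Because three 90-word dimensions can exceed 250 words together, the final cap trims from the end of the style dimension; keeping the dimensions close to the 1:1:1 ratio avoids that loss.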

2. Supplementary Explanation of Key Style Words

Original:

Short, less descriptive.

After 1st PE version:

More description and aesthetic references.

Add Style Word

Superimposing styles can significantly improve aesthetics.

Conclusion

  1. In the PE style dimension, superimposing styles can significantly improve the perceived aesthetics.

  2. Create a list of high-frequency words and map each one to 20 style words of varying similarity.

  3. Add reverse style words for the high-frequency words, so that the model does not automatically generate styles that reduce text-image matching and aesthetics.
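
One way to picture the style-word mapping is a small lookup table. The sketch below is illustrative only: the `STYLE_MAP` entries, the similarity scores, and the `expand_style` function are made-up placeholders rather than the real word list, and returning reverse words as a separate list (for example, to feed a negative prompt) is an assumption about how they might be used.

```python
# Hypothetical mapping: high-frequency word -> related style words with a
# placeholder similarity score, plus reverse style words to steer away from.
# A real list would hold about 20 related words per entry.
STYLE_MAP = {
    "anime": {
        "related": [
            ("cel shading", 0.9),
            ("manga line art", 0.8),
            ("pastel palette", 0.5),
        ],
        "reverse": ["photorealistic", "3d render"],
    },
}


def expand_style(style_text, top_k=2):
    """Superimpose the top_k most similar style words and collect reverse words."""
    positive = [style_text]
    reverse = []
    for keyword, entry in STYLE_MAP.items():
        if keyword in style_text.lower():
            ranked = sorted(entry["related"], key=lambda pair: pair[1], reverse=True)
            positive.extend(word for word, _ in ranked[:top_k])
            reverse.extend(entry["reverse"])
    return ", ".join(positive), reverse


if __name__ == "__main__":
    print(expand_style("anime portrait"))
```

Raising `top_k` superimposes more styles; input that matches no high-frequency word passes through unchanged.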

Key learnings
  1. Make Effective Design Strategies

The effects of models trained by developers are often hard to control, so effective design strategies for optimizing generative model output are crucial. It pays to stay insightful and curious, and it is often worth testing new strategies to find the best outcome.

  2. Be a Good Collaborator

Working on a multi-faceted product involves collaborating with people from various backgrounds, including data scientists and product managers. Don't be afraid to ask technical questions while working with them. Learning how the model operates from their perspectives fosters smooth communication and insightful ideation.