
Feature Request: PhotoBuilder #90

@manuelkiessling

Description


The SiteBuilder application needs to be extended with a PhotoBuilder sub-application.

The goal is to allow users to easily generate images that naturally match the contents of their web pages, allowing a flow of "generate new content page, generate images that match the content page, enhance the content page with the generated images".

The workflow should be as follows:

From the content editor UI, the list of preview pages features a "generate matching images" CTA for each page

The CTA leads to a dedicated, self-contained PhotoBuilder UI for the chosen page.

When this page is opened, a waiting/loading animation is shown while, on the backend, an agent is fed the contents of the chosen page together with an app-provided system prompt that is invisible to the user, plus a pre-filled, user-editable prompt like the following (the User Prompt language is set depending on the currently chosen SiteBuilder UI language):

== SYSTEM PROMPT ==
You are a friendly AI assistant that helps the user to generate 5 prompts that each will be fed into an LLM-backed AI image generation agent, in order to generate images that shall be used on a web page with the following contents:

{content_page_html}

Think about what each of the 5 images should show in order to optimally fit the narrative of the web page content.

== USER PROMPT ==

The generated images should convey professionalism and competence.
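A minimal sketch of how the backend could assemble this master prompt from the template above; `build_master_prompt`, `SYSTEM_PROMPT_TEMPLATE`, `DEFAULT_USER_PROMPTS`, and the language codes are assumptions for illustration, not existing SiteBuilder code:

```python
# Hypothetical helper for assembling the master prompt; names are assumptions.

SYSTEM_PROMPT_TEMPLATE = (
    "You are a friendly AI assistant that helps the user to generate 5 prompts "
    "that each will be fed into an LLM-backed AI image generation agent, in order "
    "to generate images that shall be used on a web page with the following "
    "contents:\n\n{content_page_html}\n\n"
    "Think about what each of the 5 images should show in order to optimally fit "
    "the narrative of the web page content."
)

# The pre-filled user prompt is localized to the current SiteBuilder UI language.
DEFAULT_USER_PROMPTS = {
    "en": "The generated images should convey professionalism and competence.",
    "de": "Die generierten Bilder sollen Professionalität und Kompetenz vermitteln.",
}

def build_master_prompt(content_page_html: str, ui_language: str) -> tuple[str, str]:
    """Return (system_prompt, prefilled_user_prompt) for the prompt-generation agent."""
    system = SYSTEM_PROMPT_TEMPLATE.format(content_page_html=content_page_html)
    # Fall back to English when the UI language has no translation yet.
    user = DEFAULT_USER_PROMPTS.get(ui_language, DEFAULT_USER_PROMPTS["en"])
    return system, user
```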

Once the 5 different image generation prompts have been generated from this, the waiting/loading animation is hidden, and the actual PhotoBuilder UI is presented:

  • A textarea with the User Prompt, and a "Regenerate image prompts" call-to-action.
  • A grid with 5 elements, one for each image.
  • Each grid element consists of 5 sub-elements:
    1. The generated image, or, if it is currently being generated, a placeholder element that conveys the notion of "this is currently being generated"
    2. A textarea with the text prompt that is used for generating the image; on initial page load, this is prefilled with the text prompt that was generated by the agent
    3. A checkbox, set to unchecked by default, with label "Keep prompt"
    4. A call-to-action labeled "Regenerate image"
    5. A call-to-action labeled "Upload to media store"
  • The Media Store Upload and Browser UI, identical to the one embedded on the Content Editor UI
  • An "Embed generated images into content page" CTA
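The grid elements described above map naturally onto a small per-image state model. A sketch of what the frontend could track per element (all names are hypothetical, not existing SiteBuilder code); `edit_prompt` also shows the rule that editing a prompt immediately checks "Keep prompt":

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class ImageStatus(Enum):
    GENERATING = auto()  # grid element shows a "being generated" placeholder
    READY = auto()       # grid element shows the generated image

@dataclass
class GridElement:
    prompt: str                # textarea, prefilled with the agent-generated prompt
    keep_prompt: bool = False  # "Keep prompt" checkbox, unchecked by default
    status: ImageStatus = ImageStatus.GENERATING
    image_url: Optional[str] = None
    file_name: Optional[str] = None

def edit_prompt(element: GridElement, new_prompt: str) -> None:
    """As soon as one character of the prompt has been edited,
    check the "Keep prompt" checkbox for that grid element."""
    if new_prompt != element.prompt:
        element.prompt = new_prompt
        element.keep_prompt = True
```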

On first page load, after all prompts have been generated, the frontend immediately triggers image generation on all five images using the generated prompts. As soon as an image has been generated, it is shown on its grid element. While at least one image generation is ongoing, all CTAs ("Regenerate image prompts", "Regenerate image") are disabled and cannot be triggered by the user.
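The initial fan-out above can be sketched as a concurrent task per image, with all CTAs disabled until every image has arrived. `generate_image`, `on_image_ready`, and `set_ctas_enabled` are hypothetical callbacks, assuming an async image-generation backend call:

```python
import asyncio

async def generate_all_images(prompts, generate_image, on_image_ready, set_ctas_enabled):
    """Trigger image generation for all prompts concurrently.

    Each image is shown (via on_image_ready) as soon as it has been generated;
    the CTAs stay disabled while at least one generation is still ongoing.
    """
    set_ctas_enabled(False)

    async def run_one(index: int, prompt: str) -> None:
        url, file_name = await generate_image(prompt)
        on_image_ready(index, url, file_name)  # show this image immediately

    await asyncio.gather(*(run_one(i, p) for i, p in enumerate(prompts)))
    set_ctas_enabled(True)  # all five images done: re-enable the CTAs
```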

Once all images have been generated, the user can do several things:

  • Edit the user prompt section of the "master prompt" and click "Regenerate image prompts", which updates the image prompts for all images where "Keep prompt" is unchecked and triggers image regeneration for all images whose prompts were thus modified
  • Edit a single image prompt, which immediately checks the "Keep prompt" checkbox for that image element (as soon as one character of the prompt has been edited)
  • Click the "Regenerate image" CTA for a single image, which triggers regeneration
  • Click the "Upload to media store" CTA for a single image, which has the same effect as uploading an image file from the user's computer to the media store via the Content Editor UI
  • Click the "Embed generated images into content page" CTA, which results in the user being sent to the Content Editor UI, with the user chat message pre-filled with "Embed images 1.jpg, 2.jpg, 3.jpg, 4.jpg, 5.jpg into page x.html".
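The "Regenerate image prompts" behavior above reduces to a small selection rule. A sketch (the function name and the dict-based element shape are assumptions for illustration):

```python
def apply_regenerated_prompts(elements, new_prompts):
    """Apply freshly generated prompts after "Regenerate image prompts".

    elements: list of dicts with "prompt" and "keep_prompt" keys.
    Updates the prompt of every image whose "Keep prompt" checkbox is
    unchecked, and returns the indices of the images whose prompts were
    modified and therefore need image regeneration.
    """
    to_regenerate = []
    for i, element in enumerate(elements):
        if not element["keep_prompt"]:
            element["prompt"] = new_prompts[i]
            to_regenerate.append(i)
    return to_regenerate
```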

Caveats: The embedded Media Store upload and browse element, the "Upload to media store" CTA on each image grid element, and the "Embed generated images into content page" CTA are only shown if the project that the page belongs to has an assets upload target defined.

General considerations: The image generation process must return two pieces of information to the frontend whenever a new image has been generated: the URL to the image itself, so that it can be presented in the UI, and an LLM-generated image file name that is very descriptive (think "a-cozy-cafe-with-people-of-all-ages-in-a-wintry-city.jpg", not "care.jpg" or "83476346.jpg"). This is so that when the images are put on the Media Store, their name alone tells the Content Editor LLM what the image depicts.
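A possible way to derive such a descriptive file name from an LLM-generated image description; the slugification rule here is an assumption (a real implementation would also need proper transliteration of non-ASCII characters):

```python
import re

def descriptive_file_name(description: str, extension: str = "jpg") -> str:
    """Turn an LLM-generated image description into a descriptive,
    URL-safe file name whose name alone tells what the image depicts."""
    slug = description.lower()
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", slug).strip("-")
    return f"{slug}.{extension}"
```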

Metadata

Labels: enhancement (New feature or request)
Status: In progress
Milestone: no milestone
Relationships: none yet
Development: no branches or pull requests