Build a Text-to-Image Generator Web App: Flask and Streamlit

Lu Zhenna · Towards AI · Jan 8, 2024

Summary

This article uses the Hugging Face DiffusionPipeline to generate images from several popular pre-trained checkpoints, entirely without a GPU. It then demonstrates how to build a front-end web app, in both Flask and Streamlit, that lets users type a text prompt and then download the generated image.

Target Audience

  • Data scientists who want to learn the difference between Flask and Streamlit.
  • Python developers who want to make web applications that support text input and image file output.

Outline

  1. Run the Stable Diffusion pipeline without a GPU
  2. Conceptualize the web app UI
  3. Build a web app using Streamlit
  4. Build a web app using Flask
  5. Compare images generated by different models

Hugging Face has developed a base class, diffusers.DiffusionPipeline, for users to load different pre-trained models for inference. The online documentation assumes users have a GPU. However, if you do not have a GPU and want to run some quick Stable Diffusion experiments, please read section 1.

1. Run the Stable Diffusion pipeline without a GPU

(This section is not applicable if your machine can use CUDA.)

This is the code snippet I copied from the Hugging Face website.

The original code to load the Stable Diffusion Pipeline.
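
For reference, the docs snippet looks roughly like this; the checkpoint name below is illustrative, not necessarily the one in the original screenshot:

from diffusers import DiffusionPipeline
import torch

# Load a pre-trained checkpoint in half precision and move it to the GPU.
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative checkpoint
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")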

Because my laptop doesn’t have a GPU, running the code directly returned this error message: Torch not compiled with CUDA enabled.

Error message for running Stable Diffusion pipeline without a GPU.

So I removed the .to("cuda") from the code snippet. However, I ran into another error while running the inference code.

The original code for inference using the Stable Diffusion pipeline.
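
For reference, the inference step looks roughly like this; the prompt is just an example:

# Generate an image from a text prompt and save it to disk.
prompt = "a photo of an astronaut riding a horse"
image = pipeline(prompt).images[0]
image.save("output.png")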

This time, the error message was "LayerNormKernelImpl" not implemented for 'Half'.

I searched for the error online; apparently, CPUs do not support the “half” format, a.k.a. torch.float16. So I changed it to torch.float32, and the code finally ran, albeit slowly.

The code for inference without a GPU.
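
In short, the working CPU version loads the checkpoint in full precision and skips the .to("cuda") call; the checkpoint and prompt below are illustrative:

from diffusers import DiffusionPipeline
import torch

# float32 instead of float16, and no .to("cuda"): slow, but it runs on a CPU.
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float32,
    use_safetensors=True,
)
image = pipeline("a woman meditating in a garden").images[0]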

If you are not sure whether a GPU is available, you can use the code below:

from diffusers import StableDiffusionPipeline
import torch

if torch.cuda.is_available():
    print("Using GPU")
    distilled = StableDiffusionPipeline.from_pretrained(
        "nota-ai/bk-sdm-small",
        torch_dtype=torch.float16,
        use_safetensors=True,
    ).to("cuda")
else:
    print("Using CPU")
    distilled = StableDiffusionPipeline.from_pretrained(
        "nota-ai/bk-sdm-small",
        torch_dtype=torch.float32,
        use_safetensors=True,
    )

Now you should be able to use the Stable Diffusion pipeline to generate images from a text prompt. To read my inference code, feel free to check out my GitHub repo. For this article, I will try out 7 different pre-trained checkpoints and compare them later on.
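
The web apps in the next sections import a text2image helper from inference.py. The full version lives in the repo; a minimal sketch consistent with how the apps call it could look like the following (the time.time() bookkeeping and the automatic dtype switch are my assumptions):

import time

import torch
from diffusers import DiffusionPipeline


def text2image(prompt: str, repo_id: str):
    """Run a pre-trained checkpoint on a text prompt; return (image, start, end)."""
    start = time.time()
    # Half precision on GPU, full precision on CPU (see section 1).
    dtype = torch.float16 if torch.cuda.is_available() else torch.float32
    pipeline = DiffusionPipeline.from_pretrained(
        repo_id, torch_dtype=dtype, use_safetensors=True
    )
    if torch.cuda.is_available():
        pipeline = pipeline.to("cuda")
    image = pipeline(prompt).images[0]
    end = time.time()
    return image, start, end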

2. Conceptualize the web app UI

We must design the web app with users’ needs in mind.

First, the text-to-image generator should allow users to type a text message and download the generated image. Second, users may want to try out different checkpoints for the diffusion pipeline class. Last, users may also want to know how long these checkpoints take to load the pipeline components and then make an inference.

In sum, we need a text box input, a drop-down list input, a text output to show the processing time, and an image output with a download button.

As for the sequence of interactions, users should be able to submit a text prompt and a selected model by clicking a “submit” button. The inference function then captures these inputs and returns an image along with start and end timestamps. Finally, the image is displayed on the web page together with the corresponding processing time.

Let’s get started!

3. Build a web app using Streamlit

Streamlit is a free and open-source framework to rapidly build and share ML web apps.

To start, make sure you have installed Streamlit by running pip install streamlit. Next, create a Python script and add all the components, e.g., header, buttons, text box, and spinner, to this script.

import streamlit as st
from inference import text2image
from io import BytesIO


def app():
    st.header("Text-to-image Web App")
    st.subheader("Powered by Hugging Face")
    user_input = st.text_area(
        "Enter your text prompt below and click the button to submit."
    )

    option = st.selectbox(
        "Select model (in order of processing time)",
        (
            "nota-ai/bk-sdm-small",
            "CompVis/stable-diffusion-v1-4",
            "runwayml/stable-diffusion-v1-5",
            "prompthero/openjourney",
            "hakurei/waifu-diffusion",
            "stabilityai/stable-diffusion-2-1",
            "dreamlike-art/dreamlike-photoreal-2.0",
        ),
    )

    with st.form("my_form"):
        submit = st.form_submit_button(label="Submit text prompt")

    if submit:
        with st.spinner(text="Generating image ... It may take up to 1 hour."):
            im, start, end = text2image(prompt=user_input, repo_id=option)

            # Convert the PIL image to bytes for the download button.
            buf = BytesIO()
            im.save(buf, format="PNG")
            byte_im = buf.getvalue()

            # Format the elapsed seconds as HH:MM:SS.
            hours, rem = divmod(end - start, 3600)
            minutes, seconds = divmod(rem, 60)

            st.success(
                "Processing time: {:0>2}:{:0>2}:{:05.2f}.".format(
                    int(hours), int(minutes), seconds
                )
            )

            st.image(im)

            st.download_button(
                label="Click here to download",
                data=byte_im,
                file_name="generated_image.png",
                mime="image/png",
            )


if __name__ == "__main__":
    app()

Last, test the web app by running streamlit run <script-name>.py in the terminal and then opening the localhost URL http://localhost:8501 in your browser.

Run `streamlit run app.py` in the terminal

Let’s try it out!

Streamlit has done a great job! Without any HTML or CSS, we can render a pretty decent web application!

Enter a text prompt in the Streamlit image generator web app.

I entered a text prompt and chose a model. FYI, mine was not a great example due to a lack of detail. “Chinese woman” is a vague term, and I do not know how the chosen model was trained. If the training images did not include enough samples of Chinese women, it is hard to generate one. I also did not elaborate on “meditating”. To generate a high-quality image, we should use simple verbs and provide plenty of detail, as if we were describing the scene to a visually impaired person.

Select a pre-trained model from the drop-down menu.

After clicking the “submit” button, we just patiently wait for the image to be generated!

An image generated by “stabilityai/stable-diffusion-2-1”.

Here is my favorite image, generated by "stabilityai/stable-diffusion-2-1". AI is way too smart. It was partially my fault that I did not describe what this woman should look like, so most of the generated images did not show the face from the front. At the same time, the hairstyle, the style of dress, and even the makeup portrayed an East Asian lady.

Off on a tangent: I totally understand why the artists at one of my former companies hated AI image generators. Generative AI is a smart copycat. When AI copies your creativity and charges a lowball price, you do not know whose face to punch. Creativity was the highest form of intelligence, at least until the advent of generative AI.

4. Build a web app using Flask

If a minimalistic web app does not satisfy you, you might want to try Flask instead. It means you have more decisions to make, for instance, how to align the text, what color your buttons should be, etc. “With great freedom comes great responsibility.” You have to pick up HTML, CSS, JavaScript, and more.

Here is my code to run a Flask app; the HTML templates can be found in my GitHub repo.

from flask import Flask, request, render_template, flash, send_file
from inference import text2image

app = Flask(__name__)
# Flash messages need a session, and a session needs a secret key; without one,
# you get: RuntimeError: The session is unavailable because no secret key was set.
app.secret_key = "super secret key"

# Make sure the static/ directory exists, e.g., os.mkdir("static").
IMAGE_PATH = "static/image.jpg"


@app.route("/", methods=["GET", "POST"])
def index():
    return render_template("index.html")


@app.route("/download")
def download_img():
    return send_file(IMAGE_PATH, as_attachment=True)


@app.route("/index_response", methods=["GET", "POST"])
def generate_image():
    user_input = request.form.get("user_input")
    model = request.form.get("models")

    im, start, end = text2image(prompt=user_input, repo_id=model)
    im.save(IMAGE_PATH)

    # Format the elapsed seconds as HH:MM:SS.
    hours, rem = divmod(end - start, 3600)
    minutes, seconds = divmod(rem, 60)

    msg = "Processing time: {:0>2}:{:0>2}:{:05.2f}.".format(
        int(hours), int(minutes), seconds
    )
    flash(msg, category="success")

    return render_template(
        "index_response.html",
        image_url=IMAGE_PATH,
    )

I will highlight the key differences between Streamlit and Flask.

First, let’s compare how Streamlit and Flask implement the success message. The former only requires one line of code: st.success("<message-content>"). The latter needs Flask’s flash() function AND a session secret key. In short, without a properly configured secret key, you cannot show flash messages and will run into RuntimeError: The session is unavailable because no secret key was set.

Second, Streamlit lets you configure the download button, its text label, the file contents, the file name, and the MIME type within one function, st.download_button(). Flask, by contrast, requires you to define a separate endpoint for the download logic and to invoke that endpoint from the HTML templates, not to mention the additional CSS needed to style the button.

Third, you need HTML templates for Flask apps. If you have great aesthetic sensitivity, this is probably a plus point. Otherwise, you might end up making ugly web apps after all your efforts. With Streamlit, you can easily create minimalistic web pages that will never be ugly.

Last, Flask cannot show a loader while the backend function is making an inference unless you write JavaScript. In Streamlit, st.spinner() does this out of the box.

If your script is named app.py, simply run flask run in the terminal and open http://localhost:5000 in your browser (for a different file name, use flask --app <script-name> run).

Run `flask run` in the terminal.

Let’s take a look at my Flask app.

Enter a text prompt in the Flask app.
The Flask app displays the generated image.

By now, I hope you are convinced to learn Streamlit. I am now motivated to learn JavaScript because I want to get better at Flask and probably Django, too.

5. Compare images generated by different models

With the same text prompt, let’s look at images generated by different models and my not-so-objective reviews.

Image generated by “nota-ai/bk-sdm-small”

The "nota-ai/bk-sdm-small" checkpoint is the fastest. I should not complain.

Image generated by “CompVis/stable-diffusion-v1-4”

This garden looks haunted. Well, to be fair, CompVis/stable-diffusion-v1-4 dares to show a front face. I don’t even show my face without makeup.

Image generated by “runwayml/stable-diffusion-v1-5”

Beautiful sunset. However, I asked for a pond and a garden, and the runwayml/stable-diffusion-v1-5 overdelivered. Good job. AI knows where to meditate.

Image generated by “prompthero/openjourney”

It looks haunted too.

Image generated by “hakurei/waifu-diffusion”

It could not generate a sunset. With a different seed, you can see an Asian woman. Overall, the art style is still impressive. Maybe “hakurei/waifu-diffusion” is better suited to anime-themed content.

Image generated by “stabilityai/stable-diffusion-2-1”

Beautiful garden generated by “stabilityai/stable-diffusion-2-1”.

Image generated by “dreamlike-art/dreamlike-photoreal-2.0”

The “dreamlike-art/dreamlike-photoreal-2.0” model is very impressive, although it takes the longest time. As long as it does not show a front face, its output can easily pass as a real photo.

End of AI art exhibition. Artists can rest assured that AI is still not as competent as you! Joking aside, the AI models were trained on fantastic art pieces created by humans. AI is great at detecting patterns. However, spontaneous sparks of creativity are a unique gift of the human mind that is not easily simulated by a computer.
