A Comprehensive Guide to Setting Up and Using Stable Diffusion 3 Medium Locally

Understanding Stable Diffusion 3 Medium

Stability AI publishes the weights of models such as Stable Diffusion 3 Medium so that they can be downloaded and deployed locally. Unlike closed-source, hosted alternatives, the model runs entirely on your own servers and infrastructure, which gives you more control over security and over where your data goes.

Downloading SD3 Medium Model Weights

To begin, you will need to download the SD3 Medium model weights. Stability AI provides several variants, each with different components and resource requirements:

sd3_medium.safetensors: This file contains the MMDiT and VAE weights but excludes text encoders.
sd3_medium_incl_clips_t5xxlfp16.safetensors: This comprehensive package includes all necessary weights, featuring the fp16 version of the T5XXL text encoder.
sd3_medium_incl_clips_t5xxlfp8.safetensors: Offers a balance between quality and resource usage, incorporating the fp8 version of the T5XXL text encoder.
sd3_medium_incl_clips.safetensors: This version includes all essential weights except for the T5XXL text encoder. It has the lowest resource requirements of the bundled variants, though output quality, particularly on long or detailed prompts, may differ without the T5XXL component.

The download also includes a text_encoders folder containing the three text encoders, each with a link to its original model card. Every component in the text_encoders folder remains subject to its own original license. An example_workflows folder provides pre-configured workflows for ComfyUI.
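
If you would rather fetch a single variant from Python than through the browser, the sketch below may help. It assumes the checkpoints are hosted in the stabilityai/stable-diffusion-3-medium repository on Hugging Face, that the huggingface_hub library is installed, and that you are already logged in (both are covered later in this guide). The short variant labels and the helper names are hypothetical, chosen only for this illustration; the file names are the ones listed above.

```python
from pathlib import Path

# Assumption: the single-file checkpoints are hosted in this Hugging Face repo.
REPO_ID = "stabilityai/stable-diffusion-3-medium"

# Hypothetical short labels mapped to the file names listed in this guide.
VARIANTS = {
    "core": "sd3_medium.safetensors",
    "clips": "sd3_medium_incl_clips.safetensors",
    "clips_t5_fp8": "sd3_medium_incl_clips_t5xxlfp8.safetensors",
    "clips_t5_fp16": "sd3_medium_incl_clips_t5xxlfp16.safetensors",
}


def variant_filename(variant: str) -> str:
    """Return the checkpoint file name for a short variant label."""
    try:
        return VARIANTS[variant]
    except KeyError:
        choices = ", ".join(sorted(VARIANTS))
        raise ValueError(f"unknown variant {variant!r}; choose one of: {choices}") from None


def download_variant(variant: str, target_dir: str = "models") -> Path:
    """Download one checkpoint file into target_dir and return its local path."""
    # Imported lazily so the mapping above works without the library installed.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id=REPO_ID,
        filename=variant_filename(variant),
        local_dir=target_dir,
    )
    return Path(local_path)


# Example call (performs a multi-gigabyte download, so it is left commented out):
# path = download_variant("clips")
```

Keeping the file names in one mapping makes it easy to switch variants later without touching the download logic.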

Setting Up with ComfyUI

ComfyUI is a popular open-source interface developed by the Stable Diffusion community, widely adopted for its flexible node-based workflow system. This allows users to construct custom image generation pipelines by connecting various pre-built or community-developed nodes.

For a straightforward ComfyUI setup, the sd3_medium_incl_clips.safetensors model is recommended. This file bundles the CLIP text encoders, so they do not have to be loaded separately, which simplifies the workflow.
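
As a rough illustration of where the file goes, the sketch below copies a downloaded checkpoint into a ComfyUI installation. It assumes the default ComfyUI layout, in which checkpoints are read from models/checkpoints under the installation root; if your installation is configured differently, adjust the path. The function names and the example paths are hypothetical.

```python
import shutil
from pathlib import Path


def checkpoint_destination(comfyui_root: str, checkpoint_file: str) -> Path:
    """Return the path a checkpoint would have inside a ComfyUI install.

    Assumes the default layout: <comfyui_root>/models/checkpoints/.
    """
    return Path(comfyui_root) / "models" / "checkpoints" / Path(checkpoint_file).name


def install_checkpoint(comfyui_root: str, checkpoint_file: str) -> Path:
    """Copy a downloaded checkpoint into the ComfyUI checkpoints folder."""
    destination = checkpoint_destination(comfyui_root, checkpoint_file)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(checkpoint_file, destination)
    return destination


# Example call (paths are placeholders for your own locations):
# install_checkpoint("ComfyUI", "downloads/sd3_medium_incl_clips.safetensors")
```

After copying, restart or refresh ComfyUI so that the new checkpoint appears in the loader node.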

While specific installation steps for ComfyUI are beyond the scope of this text, numerous video tutorials and resources are available within the Stable Diffusion community and linked in the provided resources section to guide you through the installation process.

Utilizing the Hugging Face Diffusers Library with Python

For users who prefer a programmatic approach or wish to integrate SD3 Medium into their Python applications, the Hugging Face Diffusers library offers a powerful and convenient solution.

Installing Necessary Libraries

Begin by installing the required libraries. Open your terminal and execute the following commands:

# For installing via your terminal
pip install -U "huggingface_hub[cli]"
pip install -U torch diffusers transformers accelerate

If you are working within a Jupyter Notebook environment, use the following commands:

# For installing in your Jupyter runtime
%pip install -U "huggingface_hub[cli]"
%pip install -U torch diffusers transformers accelerate
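
Once the installation finishes, it can be useful to confirm which versions ended up in your environment. The following is a minimal sketch using only the Python standard library; the helper name installed_versions and the REQUIRED tuple are illustrative, not part of any of the libraries above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names of the libraries installed in the commands above.
REQUIRED = ("torch", "diffusers", "transformers", "accelerate", "huggingface-hub")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print a short report; a missing package shows up as "not installed".
for package, installed in installed_versions().items():
    print(f"{package}: {installed or 'not installed'}")
```

If any entry is reported as not installed, rerun the corresponding pip command in the same environment or notebook kernel.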

Setting Up Hugging Face Credentials

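The SD3 Medium repositories on Hugging Face are gated, so first accept the license terms on the model page while signed in to your Hugging Face account. Then authenticate from your terminal by running the following command and following the prompts: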

huggingface-cli login
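
In scripts or notebooks where an interactive prompt is inconvenient, you can also log in from Python. The sketch below assumes you have stored an access token in an environment variable; the helper name login_from_env is hypothetical, while login is the function provided by huggingface_hub.

```python
import os


def login_from_env(env_var: str = "HF_TOKEN") -> bool:
    """Log in to Hugging Face using a token stored in an environment variable.

    Returns False without contacting the Hub if the variable is unset or empty.
    """
    token = os.environ.get(env_var)
    if not token:
        return False

    # Imported lazily so the check above works without the library installed.
    from huggingface_hub import login

    login(token=token)
    return True


# Example call:
# if not login_from_env():
#     print("No token found; run `huggingface-cli login` instead.")
```

Reading the token from the environment keeps it out of your source files and notebook history.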

Generating Images with Diffusers

Once the libraries are installed and credentials are set up, you can generate images using the following Python script. This script checks for CUDA availability to leverage GPU acceleration if possible.

import torch
from diffusers import StableDiffusion3Pipeline

# Check if a CUDA-enabled GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model
# For optimal performance, use torch.float16 on CUDA-enabled GPUs
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32
)
pipe.to(device)

# Define the prompt and generation parameters
prompt = "a photo of a cat holding a sign that says hello world"
negative_prompt = ""  # Add any negative prompts here to guide the generation away from unwanted elements
num_inference_steps = 28
height = 1024
width = 1024
guidance_scale = 7.0

# Generate an image
# The pipeline output exposes the results as a list in .images; take the first one.
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=num_inference_steps,
    height=height,
    width=width,
    guidance_scale=guidance_scale
).images[0]

# Save the generated image
image.save("sd3_hello_world.png")

print("Image generated and saved as sd3_hello_world.png")

System Requirements Considerations

When running Stable Diffusion 3 Medium locally, consider the hardware requirements. For the full model including the T5XXL text encoder, a GPU with approximately 12 GB of VRAM is recommended. A smaller variant, such as the one without the T5XXL encoder (e.g., sd3_medium_incl_clips.safetensors), can operate with around 8 GB of VRAM, making it more accessible for a wider range of consumer hardware.
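
With the Diffusers pipeline, the same trade-off can be approximated in code. The sketch below is one possible approach, not a definitive recipe: it uses the 12 GB and 8 GB figures from the paragraph above as thresholds, skips the T5XXL encoder by passing text_encoder_3=None and tokenizer_3=None, and falls back to model CPU offloading (which relies on accelerate) when even less memory is available. The helper names choose_load_options and load_pipeline are hypothetical, and the offloading threshold is an assumption rather than a measured value.

```python
def choose_load_options(vram_gb: float) -> dict:
    """Pick memory-saving options from the approximate figures in this guide."""
    return {
        # Below about 12 GB, skip the T5XXL text encoder.
        "drop_t5": vram_gb < 12,
        # Below about 8 GB, also offload idle components to CPU (assumption).
        "cpu_offload": vram_gb < 8,
    }


def load_pipeline(vram_gb: float):
    """Load SD3 Medium on a CUDA GPU with options matched to available VRAM."""
    # Imported lazily so choose_load_options works without these libraries.
    import torch
    from diffusers import StableDiffusion3Pipeline

    options = choose_load_options(vram_gb)
    kwargs = {"torch_dtype": torch.float16}
    if options["drop_t5"]:
        # Run with the two CLIP encoders only.
        kwargs.update(text_encoder_3=None, tokenizer_3=None)

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers", **kwargs
    )
    if options["cpu_offload"]:
        # Moves each component to the GPU only while it is in use.
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")
    return pipe


# Example call (downloads the model on first use):
# pipe = load_pipeline(vram_gb=10)
```

Offloading lowers peak memory at the cost of slower generation, so it is worth enabling only when the model does not otherwise fit.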

Further Resources

For more in-depth information and community support, refer to the following resources:

Hugging Face: Explore the official model repository for detailed documentation, additional files, and community discussions.
ComfyUI: Visit the ComfyUI project page for installation guides, tutorials, and community-developed workflows.

With the steps in this guide, you can set up Stable Diffusion 3 Medium on your local machine and generate images while keeping control over your data and workflow.