Text-to-Image Task
How to define a text-to-image task
The Stable Diffusion Task Framework has two components:
A generalized schema to define a Stable Diffusion task.
An execution engine that runs the task defined in the above schema.
The task definition is represented as key-value pairs that can be serialized into, among other formats, a JSON string, which can be validated against a JSON schema. Validation tools exist for most popular programming languages.
The execution engine is integrated into the Hydrogen Network node, and tasks are sent across the Hydrogen Network as JSON strings of the task definition.
The following is an intuitive look at a task definition:
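The sections below describe the schema piece by piece. As an overall illustration, a complete task definition might look like the following sketch (all field names here are assumptions drawn from the sections below, not the authoritative schema):

```json
{
  "base_model": "runwayml/stable-diffusion-v1-5",
  "prompt": "a realistic photo of an old man sitting on a brown chair",
  "negative_prompt": "low resolution, blurry",
  "task_config": {
    "image_width": 512,
    "image_height": 512,
    "steps": 25,
    "cfg": 7,
    "seed": 42
  }
}
```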
Accelerating Image Generation
SDXL Turbo
1. Use the SDXL Turbo model as the base model:
2. Set the timestep_spacing scheduler argument:
3. Set cfg to zero, and set steps to 1-4:
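Putting the three steps together, a task using SDXL Turbo might look like the sketch below (stabilityai/sdxl-turbo is the model's Huggingface ID, and "trailing" is the timestep_spacing value commonly recommended for SDXL Turbo; the exact field layout is an assumption):

```json
{
  "base_model": "stabilityai/sdxl-turbo",
  "prompt": "a cinematic shot of a raccoon wearing a suit",
  "scheduler": {
    "args": {
      "timestep_spacing": "trailing"
    }
  },
  "task_config": {
    "steps": 1,
    "cfg": 0
  }
}
```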
Latent Consistency Models (LCM)
Negative prompts won't work with LCM methods.
There are two ways LCM could be used in a Stable Diffusion task: LCM and LCM-LoRA:
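As an illustration of the LCM-LoRA approach, the sketch below loads an LCM-LoRA on top of Stable Diffusion 1.5 (latent-consistency/lcm-lora-sdv1-5 is the published Huggingface ID, and the low cfg and step count follow the usual LCM-LoRA recipe; the field layout itself is an assumption):

```json
{
  "base_model": "runwayml/stable-diffusion-v1-5",
  "prompt": "a watercolor painting of a lighthouse at dawn",
  "lora": {
    "model": "latent-consistency/lcm-lora-sdv1-5"
  },
  "task_config": {
    "steps": 4,
    "cfg": 1
  }
}
```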
Base Model
The base model can be one of the original Stable Diffusion models, such as Stable Diffusion 1.5 or Stable Diffusion XL, or a checkpoint fine-tuned from one of the original models.
The model can be specified in two ways: a Huggingface model ID, or a file download URL.
Huggingface Model ID
The Huggingface model IDs for the original Stable Diffusion models are listed below:
Stable Diffusion 1.5: runwayml/stable-diffusion-v1-5
Stable Diffusion 2.1: stabilityai/stable-diffusion-2-1
Stable Diffusion XL: stabilityai/stable-diffusion-xl-base-1.0
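For example, a task could reference Stable Diffusion XL by its Huggingface ID as follows (the surrounding field names are assumptions):

```json
{
  "base_model": "stabilityai/stable-diffusion-xl-base-1.0",
  "prompt": "an astronaut riding a green horse"
}
```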
Custom Fine-tuned Checkpoints
File Download URL
A URL can also be used as the base model. The execution engine will download the file before executing the task.
The model can then be used in the task as follows:
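A hedged sketch of a task whose base model is fetched from a download URL (the URL is a placeholder, not a real checkpoint):

```json
{
  "base_model": "https://example.com/checkpoints/my-finetuned-model.safetensors",
  "prompt": "a portrait of a cyberpunk samurai"
}
```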
LoRA Model
LoRA models can be specified using the same format as the base model: the Huggingface model ID or the file download URL. The weight of the LoRA model can also be set in the arguments:
The weight should be an integer between 1 and 100.
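For illustration, a LoRA section with a weight might look like the following (the model and weight field names are assumptions based on the description above, and the URL is a placeholder):

```json
{
  "base_model": "runwayml/stable-diffusion-v1-5",
  "prompt": "a pencil sketch of a castle",
  "lora": {
    "model": "https://example.com/loras/pencil-sketch-style.safetensors",
    "weight": 80
  }
}
```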
If the given LoRA model is not compatible with the base model (for example, a LoRA model fine-tuned on Stable Diffusion 1.5 is used while the base model is set to Stable Diffusion XL), the execution engine will throw an exception.
Controlnet
The Controlnet section has two parts: the Controlnet model, and the preprocess method.
The Controlnet model also supports the Huggingface ID and the download URL, in exactly the same format as the LoRA model.
The control image should be a PNG image encoded in the DataURL format. The DataURL string should be filled in the image_dataurl field.
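A hedged sketch of a Controlnet section (lllyasviel/sd-controlnet-canny is a real Huggingface Controlnet ID for Stable Diffusion 1.5; the DataURL is truncated for brevity, and the field layout is an assumption):

```json
{
  "base_model": "runwayml/stable-diffusion-v1-5",
  "prompt": "a modern house by a lake",
  "controlnet": {
    "model": "lllyasviel/sd-controlnet-canny",
    "image_dataurl": "data:image/png;base64,iVBORw0KGgo..."
  }
}
```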
Image Preprocessing
Here is a list of all the available preprocess methods and their arguments:
canny: high_threshold, low_threshold
scribble_hed: no arguments
scribble_hedsafe: no arguments
softedge_hed: no arguments
softedge_hedsafe: no arguments
depth_midas: no arguments
mlsd: thr_v, thr_d
openpose: no arguments
openpose_face: no arguments
openpose_faceonly: no arguments
openpose_full: no arguments
openpose_hand: no arguments
dwpose: no arguments
scribble_pidinet: apply_filter
softedge_pidinet: apply_filter
scribble_pidisafe: apply_filter
softedge_pidisafe: apply_filter
normal_bae: no arguments
lineart_coarse: no arguments
lineart_realistic: no arguments
lineart_anime: no arguments
depth_zoe: gamma_corrected
depth_leres: thr_a, thr_b
depth_leres++: thr_a, thr_b
shuffle: h, w, f
mediapipe_face: max_faces, min_confidence
If preprocessing is not needed, just set the value of the preprocess section to null, or simply delete the section from the JSON.
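For example, a canny preprocess with its two thresholds might be written as follows (the threshold values are arbitrary illustrations, and the method/args layout is an assumption):

```json
{
  "controlnet": {
    "model": "lllyasviel/sd-controlnet-canny",
    "image_dataurl": "data:image/png;base64,iVBORw0KGgo...",
    "preprocess": {
      "method": "canny",
      "args": {
        "high_threshold": 200,
        "low_threshold": 100
      }
    }
  }
}
```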
Prompt
Unlike the basic SD models, the length of the prompt is not limited in this framework. The prompt and the negative prompt are specified separately:
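A sketch of separately specified prompts (the field names are assumptions):

```json
{
  "prompt": "best quality, a close-up photo of a fox in a snowy forest",
  "negative_prompt": "low quality, blurry, deformed"
}
```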
Prompt Weighting
Textual Inversion
Textual Inversion models are also supported:
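For illustration, a Textual Inversion embedding could be referenced by its Huggingface ID and then triggered from the prompt (sd-concepts-library/cat-toy is a real published embedding whose trigger token is <cat-toy>; the textual_inversion field name is an assumption):

```json
{
  "base_model": "runwayml/stable-diffusion-v1-5",
  "textual_inversion": "sd-concepts-library/cat-toy",
  "prompt": "a <cat-toy> sitting on a beach"
}
```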
VAE
The VAE model used in the Stable Diffusion pipeline can also be replaced with another one, either from the Huggingface ID, or a file download URL:
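For example, the widely used fine-tuned VAE stabilityai/sd-vae-ft-mse could be swapped in like this (the vae field name is an assumption):

```json
{
  "base_model": "runwayml/stable-diffusion-v1-5",
  "vae": "stabilityai/sd-vae-ft-mse",
  "prompt": "a bowl of fruit on a wooden table"
}
```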
SDXL Refiner
If Stable Diffusion XL is selected as the base model in the task, the SDXL Refiner can also be used to further refine the image, as intended by the design of the SDXL:
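A hedged sketch of a refiner section (stabilityai/stable-diffusion-xl-refiner-1.0 is the refiner's Huggingface ID; the refiner field layout and the denoising_cutoff name are assumptions):

```json
{
  "base_model": "stabilityai/stable-diffusion-xl-base-1.0",
  "prompt": "a majestic lion jumping from a big stone at night",
  "refiner": {
    "model": "stabilityai/stable-diffusion-xl-refiner-1.0",
    "denoising_cutoff": 0.8
  }
}
```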
Task Config
There are also some config options that can be tuned:
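As an illustration, a task_config section might tune options like these (every field name here is an assumption based on common Stable Diffusion parameters, not the authoritative schema):

```json
{
  "task_config": {
    "image_width": 768,
    "image_height": 768,
    "steps": 30,
    "cfg": 7,
    "seed": 12345,
    "num_images": 4,
    "safety_checker": false
  }
}
```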