Text-to-Image Task
How to define a text-to-image task
The Stable Diffusion Task Framework has two components:
A generalized schema to define a Stable Diffusion task.
An execution engine that runs the task defined in the above schema.
The task definition is represented as key-value pairs that can be serialized into many formats, including a JSON string, which can be validated against a JSON schema. Validation tools exist for most popular programming languages.
The execution engine is integrated into the nodes of the Hydrogen Network, and task definitions are sent across the network as JSON strings.
The following is an intuitive look at a task definition:
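As a rough sketch of what such a definition could look like, the Python snippet below builds a task as a dictionary and serializes it to a JSON string. Only seed, steps and cfg are names taken from this document. The other key names (base_model, prompt, negative_prompt, task_config), the nesting and the example values are assumptions made for illustration; the JSON schema in the repository is the authoritative reference.

```python
import json

# Illustrative task definition. Key names other than seed, steps and cfg
# are assumptions, not confirmed field names of the framework.
task = {
    "base_model": "stabilityai/stable-diffusion-xl-base-1.0",  # assumed key
    "prompt": "a photo of a cat sitting on a sofa",            # assumed key
    "negative_prompt": "blurry, low quality",                  # assumed key
    "task_config": {                                           # assumed nesting
        "seed": 42,
        "steps": 30,
        "cfg": 7,
    },
}

# Key-value pairs serialize directly into the JSON string form
# that is sent across the network.
task_json = json.dumps(task, indent=2)
print(task_json)
```

Because the definition is plain data, the same structure can be produced from any language that can emit JSON.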
More examples of the different Stable Diffusion tasks can be found in the GitHub repository.
Acceleration of the Image Generation
SDXL Turbo
SDXL Turbo is an adversarial time-distilled Stable Diffusion XL (SDXL) model capable of running inference in as little as 1 step. To use SDXL Turbo in your task:
1. Use the SDXL Turbo model as the base model:
2. Set the timestep_spacing scheduler argument:
3. Set cfg to zero, and set steps to 1-4:
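The three steps above can be sketched together as follows. The model ID stabilityai/sdxl-turbo and the "trailing" value for timestep_spacing follow the public diffusers guidance for SDXL Turbo rather than this document. The key names base_model, scheduler, method, args and task_config, and the helper check_turbo_settings, are hypothetical and used only for illustration.

```python
import json

# Sketch of an SDXL Turbo task. Key names other than timestep_spacing,
# cfg, steps and seed are assumptions, not confirmed schema fields.
turbo_task = {
    "base_model": "stabilityai/sdxl-turbo",          # step 1: Turbo as base model
    "prompt": "a cinematic photo of a lighthouse at dusk",
    "scheduler": {
        "method": "EulerAncestralDiscreteScheduler",  # assumed scheduler choice
        "args": {"timestep_spacing": "trailing"},     # step 2
    },
    "task_config": {"seed": 42, "steps": 1, "cfg": 0},  # step 3
}


def check_turbo_settings(task: dict) -> list:
    """Return a list of problems with the Turbo-specific settings (hypothetical helper)."""
    problems = []
    config = task.get("task_config", {})
    if config.get("cfg") != 0:
        problems.append("cfg should be 0 for SDXL Turbo")
    if not 1 <= config.get("steps", 0) <= 4:
        problems.append("steps should be between 1 and 4")
    scheduler_args = task.get("scheduler", {}).get("args", {})
    if "timestep_spacing" not in scheduler_args:
        problems.append("timestep_spacing scheduler argument is not set")
    return problems


print(check_turbo_settings(turbo_task))
print(json.dumps(turbo_task, indent=2))
```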
Latent Consistency Models (LCM)
Negative prompts won't work with LCM methods.
Latent Consistency Models (LCMs) enable fast high-quality image generation by directly predicting the reverse diffusion process in the latent rather than pixel space. In other words, LCMs try to predict the noiseless image from the noisy image in contrast to typical diffusion models that iteratively remove noise from the noisy image. By avoiding the iterative sampling process, LCMs are able to generate high-quality images in 2-4 steps instead of 20-30 steps.
There are two ways to use LCM in a Stable Diffusion task: LCM and LCM-LoRA.
Base Model
The base model can be one of the original Stable Diffusion models, such as Stable Diffusion 1.5 or Stable Diffusion XL, or a checkpoint fine-tuned from one of them.
The model can be specified in two ways: a Huggingface model ID, or a file download URL.
Huggingface Model ID
The Huggingface model IDs for the original Stable Diffusion models are listed below:
Stable Diffusion 1.5
Stable Diffusion 2.1
Stable Diffusion XL
Custom Fine-tuned Checkpoints
Other custom fine-tuned checkpoints based on the original SD models can also be used, for example, the ChilloutMix model on Huggingface:
File Download URL
A URL can also be used as the base model. The execution engine will download the file before executing the task.
For example, suppose we want to use an SDXL fine-tuned checkpoint from Civitai. The webpage of the model is https://civitai.com/models/169868/thinkdiffusionxl and the download link of the model file can be copied from the download button on the webpage:
https://civitai.com/api/download/models/190908
The model can then be used in the task as follows:
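A minimal sketch of the two ways to reference a base model is shown below. The key name base_model and the helper model_ref_kind are assumptions for illustration. The download URL is the one given above, and stabilityai/stable-diffusion-xl-base-1.0 is the public Huggingface ID of the SDXL base model.

```python
from urllib.parse import urlparse


def model_ref_kind(ref: str) -> str:
    """Classify a model reference as a download URL or a Huggingface model ID.

    Hypothetical helper: the real engine may distinguish the two differently.
    """
    parsed = urlparse(ref)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "download_url"
    return "huggingface_id"


# The same (assumed) base_model key holds either kind of reference.
task_from_url = {"base_model": "https://civitai.com/api/download/models/190908"}
task_from_id = {"base_model": "stabilityai/stable-diffusion-xl-base-1.0"}

print(model_ref_kind(task_from_url["base_model"]))
print(model_ref_kind(task_from_id["base_model"]))
```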
Only the safetensors format is supported in the download URL.
The execution engine assumes the download URL to be a binary stream of a model file in the safetensors format. If other formats are used, or the content of the link is not a model file at all, the execution engine will throw an exception during the execution.
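To catch an unsuitable file before submitting a task, a client could run a quick sanity check on the downloaded bytes. The sketch below relies on the published safetensors layout: an 8-byte little-endian unsigned integer giving the header length, followed by a JSON header of that length. The function name looks_like_safetensors is hypothetical, and this is not necessarily how the execution engine itself validates the file.

```python
import json
import struct


def looks_like_safetensors(data: bytes) -> bool:
    """Heuristic check that `data` begins with a well-formed safetensors header."""
    if len(data) < 8:
        return False
    # First 8 bytes: little-endian u64 holding the JSON header length.
    (header_len,) = struct.unpack("<Q", data[:8])
    header = data[8:8 + header_len]
    if len(header) != header_len:
        return False
    try:
        parsed = json.loads(header.decode("utf-8"))
    except ValueError:  # covers both invalid UTF-8 and invalid JSON
        return False
    return isinstance(parsed, dict)


# Build a tiny fake file with an empty header object for demonstration.
fake_header = json.dumps({}).encode("utf-8")
fake_file = struct.pack("<Q", len(fake_header)) + fake_header
print(looks_like_safetensors(fake_file))
print(looks_like_safetensors(b"<html>not a model</html>"))
```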
LoRA Model
LoRA models can be specified using the same format as the base model: the Huggingface model ID or the file download URL. The weight of the LoRA model can also be set in the arguments:
The weight should be an integer between 1 and 100.
If the given LoRA model is not compatible with the base model, for example a LoRA model fine-tuned on Stable Diffusion 1.5 used with a Stable Diffusion XL base model, the execution engine will also throw an exception.
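A small sketch of a LoRA section with the weight rule enforced on the client side is given below. The key names model and weight, the helper make_lora_section and the example URL are assumptions; only the rule that the weight is an integer between 1 and 100 comes from the text above.

```python
def make_lora_section(model: str, weight: int) -> dict:
    """Build a LoRA section, enforcing that weight is an integer in 1..100.

    Hypothetical helper; key names are illustrative, not confirmed schema fields.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(weight, bool) or not isinstance(weight, int):
        raise TypeError("weight must be an integer")
    if not 1 <= weight <= 100:
        raise ValueError("weight must be between 1 and 100")
    return {"model": model, "weight": weight}


# Placeholder URL for illustration only.
lora = make_lora_section("https://example.com/my-lora.safetensors", 80)
print(lora)
```

Compatibility between the LoRA model and the base model cannot be checked this way; that error only surfaces when the execution engine runs the task.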
Controlnet
The Controlnet section has two parts: the Controlnet model, and the preprocess method.
The Controlnet model can also be specified by a Huggingface model ID or a download URL, exactly as for the LoRA model.
The control image should be a PNG image encoded in the DataURL format. The DataURL string should be filled in the image_dataurl field.
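The encoding step can be sketched in Python as follows, assuming the standard data URL form (data:image/png;base64,...) is what the image_dataurl field expects. The function name png_bytes_to_dataurl is hypothetical.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def png_bytes_to_dataurl(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a base64 data URL."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("input does not look like a PNG image")
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


# Demonstration with the bare signature only; a real call would pass the
# full file contents, e.g. open("pose.png", "rb").read().
dataurl = png_bytes_to_dataurl(PNG_SIGNATURE)
print(dataurl)
```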
Image Preprocessing
The image preprocessing function is implemented using the controlnet_aux project. All the preprocessing methods and models in this project can be used:
Here is a list of all the available preprocess methods and their arguments (methods listed without arguments take none):
canny: high_threshold, low_threshold
scribble_hed
scribble_hedsafe
softedge_hed
softedge_hedsafe
depth_midas
mlsd: thr_v, thr_d
openpose
openpose_face
openpose_faceonly
openpose_full
openpose_hand
dwpose
scribble_pidinet: apply_filter
softedge_pidinet: apply_filter
scribble_pidisafe: apply_filter
softedge_pidisafe: apply_filter
normal_bae
lineart_coarse
lineart_realistic
lineart_anime
depth_zoe: gamma_corrected
depth_leres: thr_a, thr_b
depth_leres++: thr_a, thr_b
shuffle: h, w, f
mediapipe_face: max_faces, min_confidence
If preprocessing is not needed, just set the value of the controlnet section to be null, or just delete the section from the JSON.
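A sketch of how a client might assemble a preprocess configuration and reject arguments that a method does not accept is shown below. The method and argument names come from the list above (only a subset is included). The key names model, preprocess, method and args, the helper make_preprocess, and the threshold values are assumptions for illustration.

```python
# Subset of the method/argument list above; extend with the remaining methods as needed.
PREPROCESS_ARGS = {
    "canny": {"high_threshold", "low_threshold"},
    "mlsd": {"thr_v", "thr_d"},
    "openpose": set(),
    "depth_zoe": {"gamma_corrected"},
    "shuffle": {"h", "w", "f"},
    "mediapipe_face": {"max_faces", "min_confidence"},
}


def make_preprocess(method: str, **args) -> dict:
    """Build a preprocess section, rejecting unknown methods and arguments."""
    if method not in PREPROCESS_ARGS:
        raise ValueError(f"unknown preprocess method: {method}")
    unexpected = set(args) - PREPROCESS_ARGS[method]
    if unexpected:
        raise ValueError(f"unexpected arguments for {method}: {sorted(unexpected)}")
    return {"method": method, "args": args}


controlnet_section = {
    "model": "lllyasviel/control_v11p_sd15_canny",   # example Huggingface ID
    "image_dataurl": "data:image/png;base64,...",    # placeholder, not a real image
    "preprocess": make_preprocess("canny", high_threshold=200, low_threshold=100),
}
print(controlnet_section["preprocess"])
```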
Prompt
Unlike the basic SD models, the length of the prompt is not limited in this framework. The prompt and the negative prompt are specified separately:
Prompt Weighting
Prompt weighting is supported using the Compel library. The basic idea is to append more plus signs (+) to a word to give it more weight. More complex usage is described in the documentation of the Compel library.
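The plus-sign syntax can be illustrated with a small string helper. This sketch only builds the prompt text and does not call Compel; the function name upweight is hypothetical. In Compel's syntax a multi-word phrase is wrapped in parentheses before the plus signs, and its documentation describes each plus sign as scaling the weight by a factor of 1.1.

```python
def upweight(term: str, level: int = 1) -> str:
    """Append `level` plus signs to a term, using Compel-style syntax.

    Multi-word phrases are parenthesized so the weight applies to the whole phrase.
    """
    if level < 0:
        raise ValueError("level must be non-negative")
    if " " in term:
        term = f"({term})"
    return term + "+" * level


prompt = ", ".join([
    "best quality",
    upweight("photorealistic", 2),
    upweight("soft lighting", 1),
    "portrait of an astronaut",
])
print(prompt)
```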
Textual Inversion
Textual Inversion models are also supported:
VAE
The VAE model used in the Stable Diffusion pipeline can also be replaced with another one, specified by either a Huggingface model ID or a file download URL:
SDXL Refiner
If Stable Diffusion XL is selected as the base model in the task, the SDXL Refiner can also be used to further refine the image, as SDXL is designed to allow:
The denoising_cutoff argument stops the base model's denoising process early, when the noise level reaches the cutoff value, and leaves the remaining steps to the refiner model. This approach is called the ensemble of expert denoisers.
If the Controlnet is used with the Stable Diffusion XL base model, the denoising_cutoff argument is not supported due to the current limitations in the diffusers library. If a refiner is configured, it is executed after the base model generation is completed, and the cutoff value is ignored.
Task Config
There are also some config options that can be tuned:
Hydrogen Network requires a deterministic image generation process, which means the images generated on the different nodes, given the same task definition, should be as close as possible. This is a requirement for the consensus protocol to work. The seed is left as a required argument in the task definition so that all the nodes could use the same seed to initialize their random number generators, which will hopefully produce the same random numbers across all the nodes.
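The role of the seed can be illustrated with a toy example. The snippet uses Python's standard random module purely as a stand-in for the generators used in the real pipeline, and the function sample_noise is hypothetical. It shows only the principle: two nodes that initialize their generator with the same seed draw the same sequence of numbers.

```python
import random


def sample_noise(seed: int, count: int = 4) -> list:
    """Draw `count` Gaussian samples from a generator initialized with `seed`."""
    rng = random.Random(seed)  # independent generator, no shared global state
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


node_a = sample_noise(seed=42)
node_b = sample_noise(seed=42)
other = sample_noise(seed=43)

print(node_a == node_b)  # same seed, same sequence
print(node_a == other)   # different seed, different sequence
```

Identical random numbers are necessary but not sufficient for identical images, since numerical differences between hardware and software versions can still affect the result.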
Besides the seed, the Stable Diffusion Task Framework is implemented to maximize reproducibility across every component used in the image generation process.