Fine-Tune Your Text-to-Image Models with lucataco/train-text-to-image-lora Actions

22 Apr 2025
Fine-tuning pre-trained models for specific tasks has become a core workflow in AI-generated content. The lucataco/train-text-to-image-lora Cognitive Action lets developers fine-tune Stable Diffusion models using Low-Rank Adaptation (LoRA), efficiently adapting an existing model for custom text-to-image generation while leaving its original weights frozen.

Prerequisites

Before diving into the integration of Cognitive Actions, ensure you have:

  • An API key for the Cognitive Actions platform to authenticate your requests.
  • Familiarity with JSON payload structures to format your input correctly.

Authentication typically involves passing your API key in the request headers. Keep your API key secure and never commit it to public repositories.
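As a sketch (the exact header scheme is illustrative; confirm against the platform's documentation), a bearer-token header in Python might be built like this, reading the key from the environment rather than hard-coding it:

```python
import os

# Read the key from the environment so it never lands in source control.
# The variable name and bearer scheme are assumptions, not confirmed API details.
api_key = os.environ.get("COGNITIVE_ACTIONS_API_KEY", "YOUR_COGNITIVE_ACTIONS_API_KEY")

headers = {
    "Authorization": f"Bearer {api_key}",  # assumed bearer-token scheme
    "Content-Type": "application/json",
}
```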

Cognitive Actions Overview

Fine-Tune Stable Diffusion Using LoRA

This action uses the Hugging Face Diffusers library to fine-tune Stable Diffusion models (SD v1.4, v1.5, v2.0, v2.1) with LoRA. Instead of updating the full model, LoRA trains small low-rank adapter matrices, which keeps training fast and the resulting artifacts compact, while the input fields below give you control over the training process.

Input

The input schema for this action includes various fields that dictate how the fine-tuning process will be conducted. Here’s a breakdown of the required and optional fields:

  • Dataset (string): The Hugging Face dataset identifier (default: lambdalabs/naruto-blip-captions).
  • Model ID (string): The Hugging Face model identifier under which results are uploaded (default: naruto-lora).
  • Base Model (string): The Hugging Face base model to fine-tune (default: runwayml/stable-diffusion-v1-5).
  • Resolution (integer): Resolution of the training images (default: 512, range: 128-1024).
  • Learning Rate (number): Learning rate for training (default: 0.0001, range: 0.0001-0.01).
  • Validation Prompt (string): Prompt used to generate validation images during training (default: "A naruto with blue eyes.").
  • Training Batch Size (integer): Batch size for training (default: 1, range: 1-4).
  • Maximum Gradient Norm (number): Maximum allowed gradient norm for gradient clipping (default: 1, range: 0.1-10).
  • Maximum Training Steps (integer): Maximum number of training steps (default: 1000, range: 1-100,000).
  • Learning Rate Scheduler (string): Learning rate scheduling strategy (default: cosine).
  • Number of Training Epochs (integer): Total number of training epochs (default: 100, range: 1-10,000).
  • Dataloader Number of Workers (integer): Number of worker processes for data loading (default: 8, range: 1-16).
  • Gradient Accumulation Steps (integer): Number of gradient accumulation steps (default: 4, range: 1-8).
  • HF Token (string): A Hugging Face authentication token, required for accessing Hugging Face resources.

Example Input

{
  "dataset": "lambdalabs/naruto-blip-captions",
  "modelId": "naruto-lora",
  "baseModel": "runwayml/stable-diffusion-v1-5",
  "resolution": 512,
  "learningRate": 0.0001,
  "validationPrompt": "A naruto with blue eyes.",
  "trainingBatchSize": 1,
  "maximumGradientNorm": 1,
  "maximumTrainingSteps": 1000,
  "learningRateScheduler": "cosine",
  "numberOfTrainingEpochs": 100,
  "dataloaderNumberOfWorkers": 8,
  "gradientAccumulationSteps": 4
}
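Note how batch size and gradient accumulation interact: with trainingBatchSize: 1 and gradientAccumulationSteps: 4, the optimizer only steps once every 4 micro-batches, so the effective batch size is 4. A quick sanity check of the arithmetic (this is just math, not a platform API):

```python
def effective_batch_size(batch_size: int, grad_accum_steps: int) -> int:
    # Gradient accumulation multiplies the effective batch size.
    return batch_size * grad_accum_steps

def optimizer_steps_per_epoch(num_samples: int, batch_size: int, grad_accum_steps: int) -> int:
    # Optimizer updates per epoch: micro-batches divided by accumulation steps.
    micro_batches = num_samples // batch_size
    return micro_batches // grad_accum_steps

print(effective_batch_size(1, 4))                 # 4
print(optimizer_steps_per_epoch(1200, 1, 4))      # 300 updates per epoch for a 1200-image dataset
```

Training stops at maximumTrainingSteps even if the requested number of epochs has not completed, so it is worth checking that the two settings are consistent with your dataset size.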

Output

Upon successful execution, this action typically returns a URL pointing to the output files generated from the fine-tuning process. For example:

https://assets.cognitiveactions.com/invocations/73487cd6-6539-43cd-9a71-39463689b266/620545bd-dc1f-4948-8537-4f364314be7b.tar

This URL can be used to access your trained model files.

Conceptual Usage Example (Python)

Here's a conceptual example of how you might call this action using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "4afaf1df-158b-495b-9e7a-9ac288c1d09e" # Action ID for Fine-Tune Stable Diffusion Using LoRA

# Construct the input payload based on the action's requirements
payload = {
    "dataset": "lambdalabs/naruto-blip-captions",
    "modelId": "naruto-lora",
    "baseModel": "runwayml/stable-diffusion-v1-5",
    "resolution": 512,
    "learningRate": 0.0001,
    "validationPrompt": "A naruto with blue eyes.",
    "trainingBatchSize": 1,
    "maximumGradientNorm": 1,
    "maximumTrainingSteps": 1000,
    "learningRateScheduler": "cosine",
    "numberOfTrainingEpochs": 100,
    "dataloaderNumberOfWorkers": 8,
    "gradientAccumulationSteps": 4
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=600,  # fine-tuning jobs can run long; adjust or poll asynchronously
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body: {e.response.text}")

In this example, the action ID and input payload are clearly defined. The endpoint URL and request structure are illustrative, aimed at helping you understand the integration process.
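The shape of the JSON response is likewise illustrative. Assuming a hypothetical "output" field carrying the artifact URL, a small helper can pull it out defensively; adapt the key to the actual Cognitive Actions response schema:

```python
from typing import Optional

def get_output_url(result: dict) -> Optional[str]:
    """Return the artifact URL from an action result, if present.

    Assumes a hypothetical response shape like {"output": "https://...tar"}.
    """
    url = result.get("output")
    if isinstance(url, str) and url.startswith("https://"):
        return url
    return None

print(get_output_url({"output": "https://assets.cognitiveactions.com/invocations/abc/model.tar"}))
```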

Conclusion

The lucataco/train-text-to-image-lora Cognitive Action provides an effective way to fine-tune Stable Diffusion models for customized text-to-image generation. By leveraging Hugging Face's resources, developers can create unique models tailored to their specific needs.

Explore the possibilities of AI-generated content by integrating these Cognitive Actions into your applications today!