Fine-Tune Your Imagery with the alexgenovese/train-sdxl-lora Cognitive Actions

22 Apr 2025

Integrating advanced image processing capabilities into your applications has never been easier. The alexgenovese/train-sdxl-lora Cognitive Actions offer developers the ability to enhance image datasets through custom training using Realistic Vision XL 4.0. This API provides pre-built actions that streamline the training process, allowing for advanced customization and optimization. Whether you aim to improve model accuracy or automate the creation of image datasets, these actions are designed to facilitate your development workflow.

Prerequisites

Before diving into the integration of Cognitive Actions, ensure you have the following set up:

  • An API key for accessing the Cognitive Actions platform.
  • Familiarity with handling JSON payloads and API requests.

Authentication typically involves passing your API key in the request headers to authenticate your calls.
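As a minimal sketch, the authentication header can be assembled like this (the key value is a placeholder; substitute your own):

```python
# Hypothetical API key; replace with your actual Cognitive Actions key.
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"

# The key is passed as a Bearer token in the Authorization header,
# alongside a JSON content type for the request body.
headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}
```

These same headers are reused in the full Python example later in this post.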

Cognitive Actions Overview

Perform Custom Training with Realistic Vision XL 4.0

This action enables you to fine-tune image datasets using Realistic Vision XL 4.0 (RealVisXL 4). It supports bf16 mixed precision and LoRA integration, offering advanced customization through a choice of optimizers and learning rate schedulers.

  • Category: Image Processing

Input

The input schema for this action defines the following fields (only inputImages is required):

  • inputImages (required): A URI to a .zip or .tar file containing the images for fine-tuning.
  • seed (optional): An integer for a reproducible random seed.
  • verbose (optional): Boolean to enable detailed output logs.
  • resolution (optional): The resolution to which your images will be resized (default is 1024).
  • tokenString (optional): A unique identifier string for the concept in the images.
  • optimization (optional): The optimization algorithm to use (default is "AdamW"). Options: "AdamW", "AdaFactor", "AdamWeightDecay".
  • captionPrefix (optional): A prefix for captions used in automatic captioning.
  • loraAlphaRank (optional): Dimension rank for LoRA Alpha (default is 16).
  • cropBySalience (optional): Boolean to crop images based on salient regions.
  • trainingBatchSize (optional): Batch size for training (default is 3).
  • numberOfTrainingEpochs (optional): Number of times to loop over the training dataset (default is 20).
  • maximumTrainingSteps (optional): Total training steps, overriding epoch number.
  • checkpointingSteps (optional): Steps between saving model checkpoints (default is 999999).
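Since inputImages is the only required field, a minimal payload that leans on the defaults above could be as small as this (the URL is a placeholder):

```json
{
  "inputImages": "https://example.com/your_image_dataset.zip"
}
```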

Example Input

Here’s an example of the JSON payload you would use. Note that it also sets additional optional fields supported by the action beyond those listed above (such as useLoraTraining, loraLearningRate, and objectClassToken):

{
  "seed": 82,
  "verbose": true,
  "resolution": 1024,
  "inputImages": "https://replicate.delivery/pbxt/Jai9c4k3MPpZHVP6Ypro2J8viBe9wgzQ7hgvK5YE1cWm2yKS/image_dataset_1024.zip",
  "tokenString": "siduhc",
  "optimization": "AdamW",
  "captionPrefix": "a photo of siduhc blazer",
  "loraAlphaRank": 32,
  "cropBySalience": false,
  "useLoraTraining": true,
  "loraLearningRate": 0.0004,
  "objectClassToken": "blazer",
  "uNetLearningRate": 0.0001,
  "useFaceDetection": false,
  "loraEmbeddingRank": 32,
  "trainingBatchSize": 2,
  "checkpointingSteps": 999999,
  "clipSegTemperature": 1,
  "inputImageFileType": "infer",
  "learningRateScheduler": "constant",
  "numberOfTrainingEpochs": 100,
  "learningRateWarmupSteps": 0,
  "textualInversionLearningRate": 0.0001
}

Output

The output of this action typically returns a URI to the trained model's artifacts:

https://assets.cognitiveactions.com/invocations/94590fdc-13f8-42b0-920e-7a012c15ce00/f741233f-b545-4387-bcc2-db2a3362ca72.tar

Conceptual Usage Example (Python)

Here’s how you might call this action using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "a2b65d6e-f934-40b3-9581-5d370aba0c06" # Action ID for Perform Custom Training with Realistic Vision XL 4.0

# Construct the input payload based on the action's requirements
payload = {
    "seed": 82,
    "verbose": true,
    "resolution": 1024,
    "inputImages": "https://replicate.delivery/pbxt/Jai9c4k3MPpZHVP6Ypro2J8viBe9wgzQ7hgvK5YE1cWm2yKS/image_dataset_1024.zip",
    "tokenString": "siduhc",
    "optimization": "AdamW",
    "captionPrefix": "a photo of siduhc blazer",
    "loraAlphaRank": 32,
    "cropBySalience": false,
    "useLoraTraining": true,
    "loraLearningRate": 0.0004,
    "objectClassToken": "blazer",
    "uNetLearningRate": 0.0001,
    "useFaceDetection": false,
    "loraEmbeddingRank": 32,
    "trainingBatchSize": 2,
    "checkpointingSteps": 999999,
    "clipSegTemperature": 1,
    "inputImageFileType": "infer",
    "learningRateScheduler": "constant",
    "numberOfTrainingEpochs": 100,
    "learningRateWarmupSteps": 0,
    "textualInversionLearningRate": 0.0001
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload} # Hypothetical structure
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # response body was not valid JSON
            print(f"Response body: {e.response.text}")

In this code snippet, replace the placeholder API key and endpoint with your actual values. Note that boolean values in the Python payload use True and False (unlike the lowercase true/false in the raw JSON example). The payload variable contains the structured input for the action, and the action ID is passed alongside it in the request body.

Conclusion

The alexgenovese/train-sdxl-lora Cognitive Actions present a powerful way to enhance your image datasets through custom training. By leveraging these actions, you can fine-tune your models effectively, harnessing the capabilities of Realistic Vision XL 4.0. As you explore the potential of these Cognitive Actions, consider various use cases such as product recognition, automated image captioning, and more. Start integrating today and take your image processing to the next level!