Creating Stunning Images from Text with Hunyuan-DiT Cognitive Actions

23 Apr 2025

In the rapidly evolving field of artificial intelligence, generating images from text descriptions has gained significant traction. The Hunyuan-DiT Cognitive Actions give developers an API for generating high-quality images from prompts written in English or Chinese, so applications can produce images tailored to what their users ask for.

Prerequisites

Before integrating the Hunyuan-DiT Cognitive Actions, make sure you have the following in place:

  • An API key for accessing the Cognitive Actions platform.
  • Basic knowledge of making API calls in Python (used in the example below) or another language of your choice.
  • Familiarity with JSON, since both input and output are JSON.

Requests are typically authenticated by passing your API key in a request header; the Python example later in this post uses an Authorization: Bearer header.
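Rather than hard-coding the key in source files, you can read it from an environment variable. The sketch below assumes the Bearer-token header used in this post's Python example; the helper name build_headers and the variable name COGNITIVE_ACTIONS_API_KEY are our own hypothetical choices, not something the platform requires.

```python
import os


def build_headers(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> dict:
    """Build request headers from an API key stored in an environment variable.

    Assumes Bearer-token authentication as in this post's Python example;
    the default variable name is a hypothetical convention.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an unauthenticated request
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Export the variable in your shell (for example, export COGNITIVE_ACTIONS_API_KEY=your-key) and call build_headers() wherever you would otherwise build the headers dictionary by hand.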

Cognitive Actions Overview

Generate Text-to-Image with Hunyuan-DiT

The Generate Text-to-Image with Hunyuan-DiT action uses Hunyuan-DiT, a state-of-the-art diffusion transformer model, to create images from text prompts. The action also supports multi-turn interactions, which makes it suitable for both casual and professional applications.

Input

The input schema for this action comprises several fields that guide the image generation process:

  • seed (optional): An integer seed for the random number generator. Reusing the same seed with the same settings should reproduce the same image; leave it blank to use a random seed.
  • size: A string that selects the output size. Options include:
    • square (default)
    • landscape
    • portrait
  • prompt: A string describing the image to generate, in English or Chinese.
  • sampler: A string that determines which sampling algorithm to use:
    • ddpm (default)
    • ddim
    • dpmms
  • enhancePrompt (optional): A boolean indicating whether to enhance the prompt with additional details (default is false).
  • guidanceScale: A number that sets the classifier-free guidance scale, from 1 to 20 (default is 6). Higher values make the image follow the prompt more closely.
  • inferenceSteps: An integer defining the number of denoising steps, with a range from 1 to 500 (default is 40).
  • negativePrompt: A string that lists features or details to exclude from the generated image.
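The enumerations and ranges in this schema can be checked on the client, so an invalid value fails immediately instead of costing a round trip. A minimal sketch: build_payload is a hypothetical helper, and it assumes (not confirmed by the schema) that omitting seed or an empty negativePrompt is treated as "use the default".

```python
from typing import Optional

# Allowed values and ranges as listed in the input schema
SIZES = {"square", "landscape", "portrait"}
SAMPLERS = {"ddpm", "ddim", "dpmms"}


def build_payload(
    prompt: str,
    *,
    size: str = "square",
    sampler: str = "ddpm",
    guidance_scale: float = 6,
    inference_steps: int = 40,
    negative_prompt: str = "",
    enhance_prompt: bool = False,
    seed: Optional[int] = None,
) -> dict:
    """Return an input payload for the Hunyuan-DiT action, checked against the documented schema."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    if size not in SIZES:
        raise ValueError(f"size must be one of {sorted(SIZES)}")
    if sampler not in SAMPLERS:
        raise ValueError(f"sampler must be one of {sorted(SAMPLERS)}")
    if not 1 <= guidance_scale <= 20:
        raise ValueError("guidanceScale must be between 1 and 20")
    if not 1 <= inference_steps <= 500:
        raise ValueError("inferenceSteps must be between 1 and 500")

    payload = {
        "size": size,
        "prompt": prompt,
        "sampler": sampler,
        "enhancePrompt": enhance_prompt,
        "guidanceScale": guidance_scale,
        "inferenceSteps": inference_steps,
    }
    # Assumption: omitted optional fields fall back to the service defaults
    if negative_prompt:
        payload["negativePrompt"] = negative_prompt
    if seed is not None:  # seed=0 is a valid seed, so compare against None
        payload["seed"] = seed
    return payload
```

For example, build_payload("a fox in a forest", sampler="ddim", inference_steps=20) returns a dictionary that can be passed as the inputs of a request like the one in the Python example.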

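Example Input (a Chinese prompt meaning "a clever fox walking through a broadleaf forest beside a small stream, realistic detail, photography", with a negative prompt that rules out wrong eyes, bad faces, disfigurement, bad art, deformation, extra limbs, blurry colors, blur, duplication, morbid features, and mutilation):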

{
  "size": "square",
  "prompt": "一只聪明的狐狸走在阔叶树林里, 旁边是一条小溪, 细节真实, 摄影",
  "sampler": "ddpm",
  "enhancePrompt": false,
  "guidanceScale": 6,
  "inferenceSteps": 20,
  "negativePrompt": "错误的眼睛,糟糕的人脸,毁容,糟糕的艺术,变形,多余的肢体,模糊的颜色,模糊,重复,病态,残缺"
}
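Because these prompts contain Chinese characters, it helps to know how Python's json module serializes them. By default json.dumps escapes non-ASCII characters as \uXXXX sequences (requests' json= argument uses this default), while ensure_ascii=False keeps them readable, which is handy for logging. Both forms are valid JSON and decode to the same payload. A small sketch:

```python
import json

# "一只聪明的狐狸" means "a clever fox"
payload = {"prompt": "一只聪明的狐狸", "size": "square"}

escaped = json.dumps(payload)                       # default: non-ASCII escaped as \uXXXX
readable = json.dumps(payload, ensure_ascii=False)  # characters kept as-is

print(escaped)
print(readable)

# Both strings are valid JSON and decode to the same payload
assert json.loads(escaped) == json.loads(readable) == payload
```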

Output

The action returns a URL string pointing to the generated image. For example:

https://assets.cognitiveactions.com/invocations/ffd1f57e-ffe5-44be-8363-42cb5a340c6c/c13c85ab-73cf-466b-a7b3-3c76e0c9ba4f.png
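Since the output is a URL, you will usually want to save the image locally. A minimal standard-library sketch, assuming the URL can be downloaded directly without extra authentication (the post does not say either way); filename_from_url and download_image are hypothetical helper names, and where the URL sits in the response JSON depends on the platform's actual response format.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def filename_from_url(url: str) -> str:
    """Use the last path segment of the image URL as the local file name."""
    name = PurePosixPath(urlparse(url).path).name
    if not name:
        raise ValueError(f"URL has no file name component: {url}")
    return name


def download_image(url: str, dest_dir: str = ".", timeout: float = 60) -> Path:
    """Download the generated image into dest_dir and return the saved path."""
    dest = Path(dest_dir) / filename_from_url(url)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as f:
        shutil.copyfileobj(response, f)  # stream to disk instead of reading it all into memory
    return dest
```

Calling download_image(image_url, "outputs") would save the example image as outputs/c13c85ab-73cf-466b-a7b3-3c76e0c9ba4f.png.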

Conceptual Usage Example (Python)

Here’s a conceptual Python code snippet to demonstrate how you might invoke the Generate Text-to-Image action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "10fb81ba-246c-4e0b-b44d-a563cc958049" # Action ID for Generate Text-to-Image with Hunyuan-DiT

# Construct the input payload based on the action's requirements
payload = {
    "size": "square",
    "prompt": "一只聪明的狐狸走在阔叶树林里, 旁边是一条小溪, 细节真实, 摄影",
    "sampler": "ddpm",
    "enhancePrompt": False,
    "guidanceScale": 6,
    "inferenceSteps": 20,
    "negativePrompt": "错误的眼睛,糟糕的人脸,毁容,糟糕的艺术,变形,多余的肢体,模糊的颜色,模糊,重复,病态,残缺"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=120,  # Image generation can be slow; avoid hanging indefinitely
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    # ensure_ascii=False keeps any Chinese text in the result readable
    print(json.dumps(result, indent=2, ensure_ascii=False))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Covers both json and simplejson decode errors
            print(f"Response body: {e.response.text}")

In this snippet, replace "YOUR_COGNITIVE_ACTIONS_API_KEY" with your actual API key. action_id holds the ID of the Generate Text-to-Image action, and payload follows the input schema described above. The endpoint URL and the request body structure are marked as hypothetical, so confirm both against the platform's documentation before use.

Conclusion

The Hunyuan-DiT Cognitive Actions let developers generate high-quality images from English or Chinese text prompts. With controls such as size, sampler, guidance scale, step count, and negative prompts, you can tune the output to your application, whether you're building a standalone creative project or adding image generation to a larger application.

Possible next steps include generating images for blog posts, social media, or personalized marketing materials.