Create Stunning Images from Text Prompts with NVIDIA SANA-Sprint Actions

22 Apr 2025

The NVIDIA SANA-Sprint 1.6b API offers a set of Cognitive Actions for generating high-quality images from text descriptions. The underlying model combines diffusion with consistency distillation, so images can be produced in very few sampling steps. These pre-built actions take the complexity out of image generation, allowing you to focus on crafting engaging content for your applications.

Prerequisites

Before diving into the integration of NVIDIA SANA-Sprint Cognitive Actions, ensure you meet the following prerequisites:

  • API Key: You will need an API key to authenticate your requests with the Cognitive Actions platform.
  • Basic Knowledge of JSON: Familiarity with JSON formatting is essential for constructing the input payload.

Authentication typically involves passing your API key in the request headers. This allows for secure access to the Cognitive Actions services.
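
As a minimal sketch of that idea, the helper below builds the request headers from an API key. The Bearer scheme mirrors the illustrative Python example later in this post, and the environment variable name COGNITIVE_ACTIONS_API_KEY is a hypothetical choice; neither is a confirmed part of the platform's specification.

```python
import os


def build_headers(api_key=None):
    """Return request headers carrying the API key.

    Assumption: the platform accepts a Bearer token in the Authorization header.
    """
    # Fall back to an environment variable (hypothetical name) so the key
    # does not have to be hard-coded in source files.
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError(
            "No API key provided; pass api_key or set COGNITIVE_ACTIONS_API_KEY."
        )
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of version control, which is usually preferable to pasting it into a script.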

Cognitive Actions Overview

Generate Image with Sana-Sprint

The Generate Image with Sana-Sprint action allows you to convert text prompts into stunning, high-resolution images. This action employs one-step diffusion and continuous-time consistency distillation to produce images that align closely with the provided descriptions.

  • Category: Text-to-Image

Input

The input schema for this action includes several fields that control the image generation process:

  • prompt (required): A string describing the desired content of your image.
    Example: "a tiny astronaut hatching from an egg on the moon"
  • seed (optional): An integer seed value for randomization. Use -1 for a random seed.
    Example: -1
  • width (optional): An integer specifying the width of the output image in pixels (between 256 and 4096).
    Example: 1024
  • height (optional): An integer specifying the height of the output image in pixels (between 256 and 4096).
    Example: 1024
  • cfgScale (optional): A number that determines adherence to the prompt, ranging from 1 to 20.
    Example: 4.5
  • outputFormat (optional): A string indicating the desired output format, with options for webp, jpg, and png.
    Example: "jpg"
  • outputQuality (optional): An integer for image quality, from 0 (lowest) to 100 (highest). This value is ignored for PNG output.
    Example: 80
  • samplingSteps (optional): An integer indicating the number of sampling steps for image generation (valid values are 1 to 4).
    Example: 2
  • intermediateSteps (optional): A number setting the intermediate timestep used when samplingSteps is 2. Recommended values range from 1.0 to 1.4.
    Example: 1.3
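
Checking a payload against the ranges listed above before sending it can catch mistakes early. The function below is a client-side sketch only: validate_inputs is a hypothetical helper, it assumes the documented bounds are inclusive, and the service may apply additional rules of its own.

```python
def validate_inputs(inputs):
    """Return a list of problems found in an input payload (empty if none)."""
    problems = []

    prompt = inputs.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt is required and must be a non-empty string")

    # (field, minimum, maximum) for the integer fields; bounds assumed inclusive
    int_ranges = [
        ("width", 256, 4096),
        ("height", 256, 4096),
        ("outputQuality", 0, 100),
        ("samplingSteps", 1, 4),
    ]
    for field, low, high in int_ranges:
        value = inputs.get(field)
        if value is None:
            continue  # optional field left out
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            problems.append(f"{field} must be an integer between {low} and {high}")

    cfg = inputs.get("cfgScale")
    if cfg is not None and (
        isinstance(cfg, bool) or not isinstance(cfg, (int, float)) or not 1 <= cfg <= 20
    ):
        problems.append("cfgScale must be a number between 1 and 20")

    fmt = inputs.get("outputFormat")
    if fmt is not None and fmt not in ("webp", "jpg", "png"):
        problems.append("outputFormat must be one of webp, jpg, png")

    seed = inputs.get("seed")
    if seed is not None and (isinstance(seed, bool) or not isinstance(seed, int)):
        problems.append("seed must be an integer (-1 for a random seed)")

    return problems
```

An empty list means the payload passed every check; otherwise each entry names the field that needs attention.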

Example Input:

{
  "seed": -1,
  "width": 1024,
  "height": 1024,
  "prompt": "a tiny astronaut hatching from an egg on the moon",
  "cfgScale": 4.5,
  "outputFormat": "jpg",
  "outputQuality": 80,
  "samplingSteps": 2,
  "intermediateSteps": 1.3
}

Output

When the action completes, it returns a URL pointing to the generated image. The file type matches the outputFormat you requested (jpg in the example below).

Example Output:

https://assets.cognitiveactions.com/invocations/9171bfe1-73d1-4d30-9158-7fe40b543c32/03fb1460-bbf8-4914-94ea-7123004c90d5.jpg
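
To keep a local copy of the result, the returned URL can be fetched like any other file. The sketch below uses only the Python standard library; filename_from_url and download_image are hypothetical helper names, and it assumes the asset URL can be fetched without extra authentication, which this post does not confirm.

```python
import os
import posixpath
import shutil
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_from_url(url):
    """Return the last path segment of the URL (the image file name)."""
    return posixpath.basename(urlparse(url).path)


def download_image(url, dest_dir="."):
    """Download the image at `url` into `dest_dir` and return the local path."""
    path = os.path.join(dest_dir, filename_from_url(url))
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urlopen(url, timeout=60) as response, open(path, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return path
```

Calling download_image with the URL from the action's output would write the file into the current directory under its original name.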

Conceptual Usage Example (Python)

Below is a conceptual Python code snippet to illustrate how you might call the Generate Image with Sana-Sprint action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "6fa91a28-f0ea-4d40-a6a5-9c10dac73b33"  # Action ID for Generate Image with Sana-Sprint

# Construct the input payload based on the action's requirements
payload = {
    "seed": -1,
    "width": 1024,
    "height": 1024,
    "prompt": "a tiny astronaut hatching from an egg on the moon",
    "cfgScale": 4.5,
    "outputFormat": "jpg",
    "outputQuality": 80,
    "samplingSteps": 2,
    "intermediateSteps": 1.3
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid hanging indefinitely if the service does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body was not valid JSON (covers every JSON decode error requests can raise)
            print(f"Response body: {e.response.text}")

In this example, replace "YOUR_COGNITIVE_ACTIONS_API_KEY" with your actual API key. The payload variable follows the action's input schema. The endpoint URL and the request body structure (action_id, inputs) are illustrative, so check them against your platform's documentation.
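
To compare settings, it can help to generate several payloads that differ in a single field. The sketch below varies samplingSteps across its documented range of 1 to 4 and includes intermediateSteps only when samplingSteps is 2, as the input schema describes. build_payload is a hypothetical helper, and reusing a fixed seed for a like-for-like comparison is an assumption about how the seed behaves.

```python
def build_payload(prompt, sampling_steps, intermediate_steps=1.3, **overrides):
    """Build an input payload for one samplingSteps setting."""
    payload = {"prompt": prompt, "samplingSteps": sampling_steps}
    # intermediateSteps is documented for the samplingSteps == 2 case only
    if sampling_steps == 2:
        payload["intermediateSteps"] = intermediate_steps
    # Any other schema fields (seed, width, cfgScale, ...) pass straight through
    payload.update(overrides)
    return payload


# One payload per documented samplingSteps value, sharing the same prompt and seed
variants = [
    build_payload("a tiny astronaut hatching from an egg on the moon", steps, seed=42)
    for steps in range(1, 5)
]
```

Each entry in variants can then be sent as the inputs of a separate request, using the same call pattern as the example above.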

Conclusion

The NVIDIA SANA-Sprint 1.6b Cognitive Actions provide an efficient way to generate high-quality images from text prompts, streamlining the creative process for developers. By leveraging these actions, you can enhance your applications with captivating visual content. Explore different prompts and settings to see the full potential of the image generation capabilities. Happy coding!