Transform Text into Stunning Videos with zsxkib/step-video-t2v Cognitive Actions

22 Apr 2025

In today's digital landscape, the ability to convert text into engaging video content is a game-changer for developers. The zsxkib/step-video-t2v API offers a powerful Cognitive Action that generates high-quality videos from descriptive text prompts. The underlying model is optimized for single-GPU execution, using techniques such as quantization to improve performance and reduce memory usage, so you can create compelling visual narratives with ease.

Prerequisites

Before you start using the Cognitive Actions from the zsxkib/step-video-t2v API, ensure you have the following:

  • An API key for the Cognitive Actions platform, which will be used for authentication.
  • Basic knowledge of making HTTP requests in your programming language of choice.
  • Familiarity with JSON, as you will be working with JSON payloads to specify input and handle output.

For authentication, you will typically pass your API key in the request headers.
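As a minimal sketch (the exact header scheme may vary by deployment), constructing Bearer-token headers in Python might look like this:

```python
# Hypothetical example: Cognitive Actions requests are assumed to
# authenticate with a Bearer token in the Authorization header.
API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"  # placeholder, not a real key

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```

These headers are then passed with every request, as shown in the full example later in this post.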

Cognitive Actions Overview

Generate High-Quality Video from Text

Description:
This action transforms descriptive text prompts into high-quality videos using advanced techniques like FP8 quantization. It is optimized for single GPU execution, making it efficient for generating vivid motion visuals from simple text descriptions.

Category: video-generation

Input

The input schema for this action requires the following fields:

  • fps (integer): The frames per second in the output video, ranging from 10 to 60 (default is 25).
  • seed (integer, optional): A random seed for video generation. Leaving it blank will use a random value.
  • prompt (string): The descriptive text that guides the video content. Example: "An astronaut discovers a stone monument on the moon."
  • quality (integer): Video quality on a scale from 0 to 10, with 10 being the highest (default is 5).
  • negativePrompt (string): Describes elements to avoid in the video, such as "low resolution, text."
  • numberOfFrames (integer): Total number of frames in the video, valid from 17 to 204 (default is 51).
  • numberOfInferenceSteps (integer): The number of inference steps for video generation, ranging from 1 to 100 (default is 30).
  • classifierFreeGuidanceScale (number): Strength of classifier-free guidance, valid from 1 to 20 (default is 9).

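Because the API documents hard ranges for most numeric fields, it can be useful to validate a payload client-side before sending it. The sketch below mirrors the schema above; the bounds are taken directly from the field descriptions, and the helper itself is illustrative rather than part of the API:

```python
# Valid ranges per the input schema documented above.
BOUNDS = {
    "fps": (10, 60),
    "quality": (0, 10),
    "numberOfFrames": (17, 204),
    "numberOfInferenceSteps": (1, 100),
    "classifierFreeGuidanceScale": (1, 20),
}

def validate_inputs(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if not payload.get("prompt"):
        problems.append("prompt is required and must be non-empty")
    for field, (lo, hi) in BOUNDS.items():
        value = payload.get(field)
        if value is not None and not (lo <= value <= hi):
            problems.append(f"{field}={value} outside [{lo}, {hi}]")
    return problems
```

For example, `validate_inputs({"prompt": "a cat", "fps": 25})` returns an empty list, while a payload with `"fps": 9` and no prompt would report both problems.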
Example Input:

{
  "fps": 25,
  "prompt": "An astronaut discovers a stone monument on the moon with the word 'stepfun' inscribed on it, glowing brightly",
  "quality": 5,
  "negativePrompt": "dark image, low resolution, bad hands, text, missing fingers, extra fingers, cropped, low quality, grainy, signature, watermark, username, blurry",
  "numberOfFrames": 51,
  "numberOfInferenceSteps": 30,
  "classifierFreeGuidanceScale": 9
}

Output

Upon successful execution, the action typically returns a URL pointing to the generated video.

Example Output:

https://assets.cognitiveactions.com/invocations/9b6487b1-31d7-4589-a5d3-a62329c903a4/a639aefc-a47e-4246-90c1-a585de9c7568.mp4
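Since the result is a plain URL, a common next step is saving the MP4 locally. A minimal sketch, assuming the returned URL is directly downloadable without additional authentication:

```python
import os
from urllib.parse import urlparse

import requests

def filename_from_url(url: str) -> str:
    """Derive a local filename from the last path segment of the video URL."""
    return os.path.basename(urlparse(url).path)

def download_video(url: str, dest_dir: str = ".") -> str:
    """Stream the generated MP4 to disk and return the local path."""
    path = os.path.join(dest_dir, filename_from_url(url))
    with requests.get(url, stream=True, timeout=120) as resp:
        resp.raise_for_status()
        with open(path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                f.write(chunk)
    return path
```

Streaming with `iter_content` avoids loading the whole video into memory, which matters for longer, higher-quality outputs.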

Conceptual Usage Example (Python)

Here's a conceptual Python code snippet demonstrating how to call this Cognitive Action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "926402c5-ea97-4b38-8b7e-84000aefe9b6"  # Action ID for Generate High-Quality Video from Text

# Construct the input payload based on the action's requirements
payload = {
    "fps": 25,
    "prompt": "An astronaut discovers a stone monument on the moon with the word 'stepfun' inscribed on it, glowing brightly",
    "quality": 5,
    "negativePrompt": "dark image, low resolution, bad hands, text, missing fingers, extra fingers, cropped, low quality, grainy, signature, watermark, username, blurry",
    "numberOfFrames": 51,
    "numberOfInferenceSteps": 30,
    "classifierFreeGuidanceScale": 9
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=300  # Video generation can take several minutes
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body was not valid JSON
            print(f"Response body: {e.response.text}")

In this example, replace the placeholder values with your actual API key, and verify the endpoint URL and request structure against the platform's documentation, as the values shown here are illustrative. The input payload is structured to match the schema of the Generate High-Quality Video from Text action.

Conclusion

The zsxkib/step-video-t2v Cognitive Actions provide an innovative way for developers to generate high-quality videos from text prompts, unlocking new possibilities for content creation. By utilizing advanced video generation techniques, you can create engaging visuals that captivate your audience. Explore the potential of these actions in your applications, and consider how they can enhance user experiences across various domains. Whether for marketing, education, or entertainment, the ability to turn text into video is a powerful tool in your development arsenal.