Create Stunning Videos from Text Prompts with Step Video T2v

25 Apr 2025

The Step Video T2v service gives developers an API that generates video from a plain text prompt. It is built on the StepVideo model, which is optimized for single-GPU performance and FP8 quantization to make generation faster and more efficient.

Typical use cases include marketing campaigns, educational content, social media posts, and creative storytelling. You supply a vivid description, and an API call returns a video that renders it.

Prerequisites

Before diving into the integration of Step Video T2v, ensure you have a Cognitive Actions API key and a basic understanding of making API calls.
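
One common way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named `COGNITIVE_ACTIONS_API_KEY`. That name and both helper functions are illustrative and not part of the service. The bearer-token header format mirrors the Python example later in this article.

```python
import os


def load_api_key(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before calling the API.")
    return key


def auth_headers(api_key: str) -> dict:
    """Build bearer-token headers for a JSON request."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing early with a clear message when the variable is missing tends to be easier to debug than an authorization error returned by the API.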

Generate Videos from Text Prompts

The Generate Videos from Text Prompts action is designed to create high-quality videos based on descriptive text inputs. This action addresses the challenge of video content creation by simplifying the process and allowing users to focus on their creative ideas rather than technical production details.

Input Requirements

The action accepts the following inputs. Each one is optional or has a default, so you only need to supply the values you want to change:

  • FPS (Frames Per Second): Default is 25. This determines the smoothness of the video playback.
  • Seed: (Optional) A random seed for video generation. Leaving it blank will use a random seed.
  • Prompt: The descriptive text that influences the video content. Default is "An astronaut on the moon."
  • Quality: A quality rating from 0 to 10, where 10 is the highest. Default is 5.
  • Number of Frames: Total frames in the output video. Default is 51. At the default 25 FPS, that is roughly 2 seconds of playback.
  • Negative Prompt Text: Text to specify undesirable features to avoid in the video. Default is "low resolution, text."
  • Number of Inference Steps: Affects video quality; more steps can lead to better results but require more processing time. Default is 30.
  • Classifier-Free Guidance Scale: A scale for enhancing adherence to the prompt. Default is 9.
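
The inputs above can be collected into a payload with a small helper. This is a minimal sketch, not an official client: the JSON key names follow this article's Python example, the `seed` key name is an assumption, and the validation rules simply restate the ranges listed above. The helper names `build_inputs` and `clip_duration_seconds` are hypothetical.

```python
from typing import Any, Dict, Optional

# Defaults copied from the list above. Key names follow this article's Python
# example; trailing punctuation in the quoted defaults is left out.
DEFAULT_INPUTS: Dict[str, Any] = {
    "fps": 25,
    "prompt": "An astronaut on the moon",
    "quality": 5,
    "numberOfFrames": 51,
    "negativePromptText": "low resolution, text",
    "numberOfInferenceSteps": 30,
    "classifierFreeGuidanceScale": 9,
}


def build_inputs(seed: Optional[int] = None, **overrides: Any) -> Dict[str, Any]:
    """Merge caller overrides into the defaults and run basic sanity checks."""
    unknown = set(overrides) - set(DEFAULT_INPUTS)
    if unknown:
        raise ValueError(f"Unknown input(s): {sorted(unknown)}")
    inputs = {**DEFAULT_INPUTS, **overrides}
    if not 0 <= inputs["quality"] <= 10:
        raise ValueError("quality must be between 0 and 10")
    for key in ("fps", "numberOfFrames", "numberOfInferenceSteps"):
        if inputs[key] <= 0:
            raise ValueError(f"{key} must be positive")
    if seed is not None:
        # "seed" is an assumed key name; it is omitted by default so the
        # service can pick a random seed.
        inputs["seed"] = seed
    return inputs


def clip_duration_seconds(inputs: Dict[str, Any]) -> float:
    """Playback length implied by the frame count and the frame rate."""
    return inputs["numberOfFrames"] / inputs["fps"]
```

For example, `build_inputs(prompt="A lighthouse at dusk", seed=42)` keeps every other default, and `clip_duration_seconds(build_inputs())` gives 51 / 25 = 2.04 seconds. Passing a fixed seed is the usual way to try for repeatable results, although whether the service guarantees identical output for the same seed is not stated here.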

Expected Output

The action returns a URL that points to the generated video (an MP4 file in the example below).

Example Output:
https://assets.cognitiveactions.com/invocations/3b787176-c884-405f-a8b8-f9e90b010d04/7a619394-174a-4381-b5d0-05acab267fed.mp4
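
Once you have the URL, you will usually want to save the file locally. The following sketch uses only the Python standard library and assumes the URL can be fetched without extra authentication, which this article does not confirm. The helper names `filename_from_url` and `download_video` are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def filename_from_url(url: str) -> str:
    """Use the last path segment of the URL as the local file name."""
    name = PurePosixPath(urlparse(url).path).name
    if not name:
        raise ValueError(f"No file name found in URL: {url}")
    return name


def download_video(url: str, dest_dir: str = ".", timeout: float = 60.0) -> Path:
    """Stream the video at `url` into `dest_dir` and return the saved path."""
    target = Path(dest_dir) / filename_from_url(url)
    target.parent.mkdir(parents=True, exist_ok=True)
    # copyfileobj copies in chunks, so the whole video is never held in memory.
    with urllib.request.urlopen(url, timeout=timeout) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Calling `download_video(video_url, dest_dir="videos")` saves the file under its original name. `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so wrap the call in a `try` block if the link may have expired.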

Use Cases for this Action

  • Marketing Campaigns: Create engaging promotional videos that capture product features and benefits directly from descriptive text.
  • Educational Content: Develop instructional videos by converting lesson outlines into visual aids for enhanced learning experiences.
  • Social Media Engagement: Generate eye-catching videos for social media posts that draw in viewers and encourage shares.
  • Creative Storytelling: Transform narratives or scripts into dynamic videos, providing a new medium for storytelling.

Example Request (Python)

The following Python snippet calls the action with the example inputs. As its comments note, the endpoint URL is hypothetical.

import requests
import json

# Replace with your actual Cognitive Actions API key and endpoint
# Ensure your environment securely handles the API key
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# This endpoint URL is hypothetical and should be documented for users
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

action_id = "93d64fc4-a0ca-467a-8209-6e021577779d" # Action ID for: Generate Videos from Text Prompts

# Construct the exact input payload based on the action's requirements
# This example uses the predefined example_input for this action:
payload = {
  "fps": 25,
  "prompt": "An astronaut discovers a stone monument on the moon with the word 'stepfun' inscribed on it, glowing brightly",
  "quality": 5,
  "numberOfFrames": 51,
  "negativePromptText": "dark image, low resolution, bad hands, text, missing fingers, extra fingers, cropped, low quality, grainy, signature, watermark, username, blurry",
  "numberOfInferenceSteps": 30,
  "classifierFreeGuidanceScale": 9
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other required headers for the Cognitive Actions API
}

# Prepare the request body for the hypothetical execution endpoint
request_body = {
    "action_id": action_id,
    "inputs": payload
}

print(f"--- Calling Cognitive Action: {action_id} ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body,
        timeout=600,  # assumed upper bound in seconds; generation can be slow, so adjust as needed
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # JSON decode errors subclass ValueError in requests and the stdlib
            print(f"Response body (non-JSON): {e.response.text}")
    print("------------------------------------------------")

Conclusion

The Step Video T2v service lets developers create videos from text prompts through a single API call, which removes much of the manual production work. Its use cases range from marketing to education. Start by experimenting with the prompt, frame count, and inference steps to see how each one affects the result, then integrate the API into your own applications.