Create Stunning Videos from Text with cjwbw/damo-text-to-video Cognitive Actions

22 Apr 2025

Generating video directly from a text description removes much of the manual work from multimedia content creation. The cjwbw/damo-text-to-video Cognitive Action gives developers access to a multi-stage text-to-video diffusion model that turns a written prompt into a short video clip. In this guide, we'll walk through the action's inputs and outputs and show how to call it from your application.

Prerequisites

Before diving into the integration of the Cognitive Actions, ensure you have the following:

  • An API key for accessing the Cognitive Actions platform.
  • Familiarity with making HTTP requests in your preferred programming language.
  • Basic understanding of JSON format for structuring your input and output data.

Authentication typically involves passing your API key in the headers of your requests.

Cognitive Actions Overview

Generate Video from Text

The Generate Video from Text action creates videos from English text descriptions. The underlying diffusion model starts from Gaussian noise and denoises it step by step into video frames that match the visual content described in your prompt.

Input

The action requires a structured input schema defined as follows:

  • fps: (Integer, optional) The frames per second for the output video. Default is 8 FPS.
  • seed: (Integer, optional) A random seed for generating variations. If not provided, a random seed will be used.
  • prompt: (String, required) Descriptive text input guiding the generation process. For example: "A panda eating bamboo on a rock."
  • numberOfFrames: (Integer, optional) Specifies the total number of frames in the output video. Default is 16 frames.
  • numberOfInferenceSteps: (Integer, optional) Indicates the number of denoising steps applied during inference, ranging from 1 to 500. Default is 50 steps.
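The schema above can be captured in a small helper that fills in the documented defaults and rejects out-of-range values before a request is sent. This is a minimal sketch: the function name `build_payload` and its snake_case arguments are hypothetical, and the positivity check on `fps` and `numberOfFrames` is an assumption, since the schema only states a range for `numberOfInferenceSteps`.

```python
from typing import Any, Dict, Optional


def build_payload(
    prompt: str,
    fps: int = 8,
    number_of_frames: int = 16,
    number_of_inference_steps: int = 50,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Build an input payload using the field names and defaults listed above."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required and must be a non-empty string")
    if not 1 <= number_of_inference_steps <= 500:
        raise ValueError("numberOfInferenceSteps must be between 1 and 500")
    # Assumption: the schema gives no bounds for fps or numberOfFrames,
    # so this sketch only requires them to be positive.
    if fps < 1 or number_of_frames < 1:
        raise ValueError("fps and numberOfFrames must be positive integers")

    payload: Dict[str, Any] = {
        "fps": fps,
        "prompt": prompt,
        "numberOfFrames": number_of_frames,
        "numberOfInferenceSteps": number_of_inference_steps,
    }
    # Leave the seed out when it is unset so the service can pick a random one.
    if seed is not None:
        payload["seed"] = seed
    return payload
```

Passing a fixed `seed` is useful when you want to compare the effect of other parameters on the same prompt.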

Example Input:

{
  "fps": 8,
  "prompt": "A panda eating bamboo on a rock.",
  "numberOfFrames": 50,
  "numberOfInferenceSteps": 50
}
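The frame count and frame rate together determine how long the clip plays. As a quick sanity check, assuming the service encodes exactly `numberOfFrames` frames at the requested `fps`, the duration is simply frames divided by frames per second. The helper name below is hypothetical.

```python
def clip_duration_seconds(number_of_frames: int, fps: int) -> float:
    """Playback length of a clip: total frames divided by frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return number_of_frames / fps


print(clip_duration_seconds(50, 8))  # 6.25 (the example input above)
print(clip_duration_seconds(16, 8))  # 2.0 (the documented defaults)
```

So the example input above would yield a clip of roughly six seconds, while the defaults give about two seconds.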

Output

Upon successful execution, the action returns a URL pointing to the generated video. The output will typically look like this:

Example Output:

https://assets.cognitiveactions.com/invocations/2c467727-85e2-48db-b696-e029eeef0b81/eb965f68-85c2-4ec1-8639-c23008392cb0.mp4
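Once you have the URL, you will usually want to save the file locally. The following is a minimal sketch using only the Python standard library. It assumes the URL can be fetched without extra authentication headers, which the description above does not confirm, and the helper names `filename_from_url` and `download_video` are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def filename_from_url(video_url: str, default: str = "output.mp4") -> str:
    """Use the last path segment of the URL as the local file name."""
    name = PurePosixPath(urlparse(video_url).path).name
    return name or default


def download_video(video_url: str, dest_dir: str = ".") -> Path:
    """Stream the video at video_url into dest_dir and return the saved path."""
    target = Path(dest_dir) / filename_from_url(video_url)
    # Copy in chunks so the whole file never has to sit in memory.
    with urllib.request.urlopen(video_url, timeout=60) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Calling `download_video(url)` with the example URL above would write `eb965f68-85c2-4ec1-8639-c23008392cb0.mp4` to the current directory.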

Conceptual Usage Example (Python)

Here’s how you might call the Generate Video from Text action using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "e9f894cb-675c-448b-9f00-6be8c177036e"  # Action ID for Generate Video from Text

# Construct the input payload based on the action's requirements
payload = {
    "fps": 8,
    "prompt": "A panda eating bamboo on a rock.",
    "numberOfFrames": 50,
    "numberOfInferenceSteps": 50
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=300  # Arbitrary limit so a stalled request cannot hang forever
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Raised when the error body is not valid JSON
            print(f"Response body: {e.response.text}")

In this code snippet:

  • Replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key.
  • The payload is structured according to the action's input schema, including the prompt and other parameters.
  • On success, the full JSON response is printed; the generated video URL is part of that response. On failure, the status code and response body are printed to help with debugging.
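Because the snippet above prints the whole response, you may want a helper that pulls out just the video URL. The exact response shape is not documented here, so this sketch makes no assumption about field names: it walks the parsed JSON and returns the first string that looks like an `.mp4` link. The function name `find_video_url` is hypothetical.

```python
from typing import Any, Optional


def find_video_url(result: Any) -> Optional[str]:
    """Return the first string in a parsed JSON result that looks like an .mp4 URL."""
    if isinstance(result, str):
        is_http = result.startswith(("http://", "https://"))
        return result if is_http and result.lower().endswith(".mp4") else None
    if isinstance(result, dict):
        children = list(result.values())
    elif isinstance(result, list):
        children = result
    else:
        return None  # numbers, booleans, and None cannot hold a URL
    for child in children:
        found = find_video_url(child)
        if found is not None:
            return found
    return None
```

With the earlier snippet, `find_video_url(result)` would return the link if one is present and `None` otherwise. Note that a URL carrying a query string after `.mp4` would not match this simple suffix check.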

Conclusion

The cjwbw/damo-text-to-video Cognitive Action lets you generate short videos from text descriptions with a single API call. By integrating it into your applications, you can add dynamic, prompt-driven video content without building a rendering pipeline of your own.

Try it in your next project and experiment with different prompts, frame counts, and inference steps to see what the model can produce.