Transforming Text and Images into 3D Point Clouds with cjwbw/point-e Actions

24 Apr 2025
Generating 3D point clouds from text or images is useful across 3D graphics and modeling workflows. The cjwbw/point-e Cognitive Actions convert a text prompt or an input image into a 3D point cloud. This post walks through integrating the action into your application, covering its inputs, its outputs, and a sample request.

Prerequisites

Before diving into the Cognitive Actions, make sure you have the following:

  • An API key for the Cognitive Actions platform, which you will use to authenticate your requests.
  • Basic knowledge of JSON and how to structure API calls.

To authenticate, you will typically pass your API key in the request headers, allowing you to access the Cognitive Actions functionalities securely.
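
As a minimal sketch of that step, the helper below builds the request headers. It assumes the key is sent as a Bearer token (the scheme used in the usage example later in this post) and that the key is stored in an environment variable named COGNITIVE_ACTIONS_API_KEY; both the helper name and the variable name are illustrative, not part of the platform.

```python
import os
from typing import Dict, Optional


def build_auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers carrying the API key as a Bearer token."""
    # Assumption: fall back to an environment variable (hypothetical name)
    # so the key stays out of source control.
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError(
            "No API key supplied and COGNITIVE_ACTIONS_API_KEY is not set"
        )
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of your repository and lets you use different keys per deployment.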

Cognitive Actions Overview

Generate 3D Point Clouds

The Generate 3D Point Clouds action allows developers to create 3D point clouds from either text prompts or images. This action supports two modes of generation: text2pointcloud and img2pointcloud. The output can be formatted as either an animation or a JSON file, making it versatile for various applications.

Input

The input for this action requires specific fields to be populated based on the following schema:

  • image (Optional): A URI of the input image, used for img2pointcloud generation. If a prompt is also provided, the image is ignored.
  • prompt (Required): A textual input prompt to guide the generation of the point cloud (e.g., "a red motorcycle").
  • outputFormat (Optional): Specifies the output format, which can be either animation or json_file. The default is animation.

Example Input:

{
  "prompt": "a red motorcycle",
  "outputFormat": "animation"
}
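
The field rules above can be sketched as a small payload builder. This is an illustration rather than part of the platform: the function name is hypothetical, and it assumes that an image-only request (no prompt) is how img2pointcloud is selected, which the schema above does not confirm.

```python
from typing import Dict, Optional

ALLOWED_OUTPUT_FORMATS = {"animation", "json_file"}


def build_point_cloud_input(
    prompt: Optional[str] = None,
    image: Optional[str] = None,
    output_format: str = "animation",
) -> Dict[str, str]:
    """Assemble the input payload from the fields described above."""
    if output_format not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError(
            f"outputFormat must be one of {sorted(ALLOWED_OUTPUT_FORMATS)}"
        )
    if not prompt and not image:
        raise ValueError("Provide a prompt or an image URI")

    payload = {"outputFormat": output_format}
    if prompt:
        # A prompt takes precedence, so the image is left out entirely.
        payload["prompt"] = prompt
    else:
        # Assumption: sending only an image selects img2pointcloud.
        payload["image"] = image
    return payload
```

Calling build_point_cloud_input(prompt="a red motorcycle") yields the same object as the example input above.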

Output

The action typically returns an output containing:

  • animation: A URL to the generated animation if the output format is set to animation.
  • json_file: A JSON object containing the point cloud data if the output format is set to json_file. It holds a list of point coordinates and a list of colors:
    {
      "coords": [[X, Y, Z], ...],
      "colors": [[R, G, B], ...]
    }

Example Output:

{
  "animation": "https://assets.cognitiveactions.com/invocations/46af6a21-5718-423f-87f0-77d2610e3a01/ed75def8-3b1f-4f4e-8dd6-0c94c50c5a1d.gif",
  "json_file": null
}
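
When you request json_file instead, the coords and colors lists can be converted into a format that common 3D viewers open. The sketch below writes ASCII PLY text from that structure. It assumes each color component is a float in the range 0 to 1, which this post does not specify, so check a real response before relying on the scaling; the function name is illustrative.

```python
from typing import Dict, List, Sequence


def point_cloud_to_ply(cloud: Dict[str, List[Sequence[float]]]) -> str:
    """Serialize {"coords": [...], "colors": [...]} as ASCII PLY text."""
    coords = cloud["coords"]
    colors = cloud["colors"]
    if len(coords) != len(colors):
        raise ValueError("coords and colors must have the same length")

    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(coords)}",
        "property float x",
        "property float y",
        "property float z",
        "property uchar red",
        "property uchar green",
        "property uchar blue",
        "end_header",
    ]
    rows = []
    for (x, y, z), (r, g, b) in zip(coords, colors):
        # Assumption: colors are floats in [0, 1]; clamp and scale to 0-255.
        r8, g8, b8 = (max(0, min(255, round(c * 255))) for c in (r, g, b))
        rows.append(f"{x} {y} {z} {r8} {g8} {b8}")
    return "\n".join(header + rows) + "\n"
```

Write the returned string to a file with a .ply extension to inspect the cloud in a viewer such as MeshLab or Blender.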

Conceptual Usage Example (Python)

Here’s how you might call the Generate 3D Point Clouds action using a hypothetical Cognitive Actions execution endpoint:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "9be4edac-86be-4aeb-9c59-9f0fdf0547cd" # Action ID for Generate 3D Point Clouds

# Construct the input payload based on the action's requirements
payload = {
    "prompt": "a red motorcycle",
    "outputFormat": "animation"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=120 # Fail instead of hanging forever; tune to expected generation time
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError: # Raised by response.json() when the body is not valid JSON
            print(f"Response body: {e.response.text}")

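In this example, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action ID and input payload follow the requirements of the Generate 3D Point Clouds action, while the endpoint URL and the request body structure are hypothetical and should be checked against the platform's documentation. The snippet shows how to send the request, print the parsed result, and report the status code and body when the call fails.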

Conclusion

The cjwbw/point-e Cognitive Actions let developers generate 3D point clouds from text and images. With the Generate 3D Point Clouds action, you can add 3D visualizations to your applications, returned either as an animation or as raw point data. As you explore these capabilities, consider use cases such as real-time rendering in games or interactive educational tools. Happy coding!