Unlock 3D Model Creation with the GRM Cognitive Actions

23 Apr 2025

Adding 3D reconstruction to an application lets users turn ordinary images into 3D assets. The camenduru/grm Cognitive Actions give developers access to the Large Gaussian Reconstruction Model (GRM) for generating 3D models from input images. This post walks through the Perform 3D Reconstruction Using GRM action: its required inputs, its expected outputs, and a conceptual example of how to call it from Python.

Prerequisites

Before diving into the integration process, ensure you have the following:

  1. An API key for accessing the Cognitive Actions platform.
  2. Basic understanding of making HTTP requests and handling JSON data.

To authenticate your requests, you will typically pass the API key in the request headers.
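As a minimal sketch of that step, the helper below builds the request headers. It assumes the Bearer token scheme used in the usage example later in this post; the function name `build_auth_headers` and the environment variable `COGNITIVE_ACTIONS_API_KEY` are illustrative choices, not part of the platform.

```python
import os


def build_auth_headers(api_key=None):
    """Return JSON request headers carrying the API key as a Bearer token.

    Hypothetical helper: the environment variable name is an assumption.
    """
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError(
            "No API key found: pass api_key or set COGNITIVE_ACTIONS_API_KEY"
        )
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control; passing it explicitly is handy in tests.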

Cognitive Actions Overview

Perform 3D Reconstruction Using GRM

This action uses GRM to reconstruct a 3D model from an input image. It belongs to the 3D Reconstruction category and suits developers who want to add 3D asset generation to their applications.

Input:

The input schema for this action consists of the following fields:

  • inputImage (string, required): A URI pointing to the input image that will be processed for 3D reconstruction.
    • Example: https://replicate.delivery/pbxt/Kf2I8ezAPJ9a6YZJUnDkoGq7urlPtjrA5hRS02D0knxS2KrW/dragon2.png
  • seed (integer, optional): A seed value to ensure reproducibility of the output. Defaults to 42.
    • Example: 21
  • model (string, optional): Specifies the version of the model to use. Defaults to Zero123++ v1.2.
    • Example: Zero123++ v1.2
  • fuseMesh (boolean, optional): Indicates whether to fuse the mesh. Defaults to true.
    • Example: true
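The schema above can be captured in a small builder that fills in the documented defaults and performs basic checks before a request is sent. This is only a sketch: `build_grm_inputs` is a hypothetical helper, and restricting `inputImage` to http(s) URIs is an assumption made here for illustration (the schema only says "URI").

```python
from urllib.parse import urlparse

# Defaults as listed in the input schema above.
DEFAULT_SEED = 42
DEFAULT_MODEL = "Zero123++ v1.2"


def build_grm_inputs(input_image, seed=DEFAULT_SEED, model=DEFAULT_MODEL,
                     fuse_mesh=True):
    """Assemble the inputs dict for the action (hypothetical helper)."""
    parsed = urlparse(input_image)
    # Assumption: only http(s) URIs are accepted.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"inputImage must be an http(s) URI, got {input_image!r}")
    # bool is a subclass of int in Python, so reject it explicitly for seed.
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError("seed must be an integer")
    if not isinstance(fuse_mesh, bool):
        raise TypeError("fuseMesh must be a boolean")
    return {
        "inputImage": input_image,
        "seed": seed,
        "model": model,
        "fuseMesh": fuse_mesh,
    }
```

Calling `build_grm_inputs("https://example.com/dragon2.png", seed=21)` yields a dict with the same four keys as the example input, with `model` and `fuseMesh` left at their defaults.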

Example Input:

{
  "seed": 21,
  "model": "Zero123++ v1.2",
  "fuseMesh": true,
  "inputImage": "https://replicate.delivery/pbxt/Kf2I8ezAPJ9a6YZJUnDkoGq7urlPtjrA5hRS02D0knxS2KrW/dragon2.png"
}

Output:

Upon successful execution, the action returns an array of URIs linking to the generated 3D model assets. The output can include several formats, such as .mp4, .glb, and .ply.

Example Output:

[
  "https://assets.cognitiveactions.com/invocations/5267704f-0f14-404a-ad9b-70e05d79411f/aa5215c4-aab4-468c-87e9-53a8f6bca2ac.mp4",
  "https://assets.cognitiveactions.com/invocations/5267704f-0f14-404a-ad9b-70e05d79411f/87631a7a-af8a-4e3c-8354-2ebc9e66c4c5.glb",
  "https://assets.cognitiveactions.com/invocations/5267704f-0f14-404a-ad9b-70e05d79411f/419f8c18-b4f2-4cc1-9b5f-3bfd6db3738a.ply",
  "https://assets.cognitiveactions.com/invocations/5267704f-0f14-404a-ad9b-70e05d79411f/8e1a0629-827c-4065-b106-370d713e5cc9.ply"
]
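Because the result is a flat list of URIs, it can help to group them by file extension before deciding what to do with each asset. The sketch below does this with the standard library only; `group_assets_by_extension` is a hypothetical helper name, and the URIs are the ones from the example output above.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from urllib.parse import urlparse


def group_assets_by_extension(uris):
    """Map each lowercase file extension (e.g. '.glb') to its URIs."""
    grouped = defaultdict(list)
    for uri in uris:
        # Use only the path component so query strings do not affect the suffix.
        ext = PurePosixPath(urlparse(uri).path).suffix.lower()
        grouped[ext].append(uri)
    return dict(grouped)


base = "https://assets.cognitiveactions.com/invocations/5267704f-0f14-404a-ad9b-70e05d79411f"
example_output = [
    f"{base}/aa5215c4-aab4-468c-87e9-53a8f6bca2ac.mp4",
    f"{base}/87631a7a-af8a-4e3c-8354-2ebc9e66c4c5.glb",
    f"{base}/419f8c18-b4f2-4cc1-9b5f-3bfd6db3738a.ply",
    f"{base}/8e1a0629-827c-4065-b106-370d713e5cc9.ply",
]

assets = group_assets_by_extension(example_output)
```

For the example output, `assets` has one entry under `.mp4`, one under `.glb`, and two under `.ply`.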

Conceptual Usage Example (Python):

Here’s how you might call the Cognitive Actions execution endpoint for the 3D reconstruction action in Python. The endpoint URL and the request body structure shown here are hypothetical:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "769703e8-a953-47aa-9266-cd25e49e4cb2"  # Action ID for Perform 3D Reconstruction Using GRM

# Construct the input payload based on the action's requirements
payload = {
    "seed": 21,
    "model": "Zero123++ v1.2",
    "fuseMesh": True,
    "inputImage": "https://replicate.delivery/pbxt/Kf2I8ezAPJ9a6YZJUnDkoGq7urlPtjrA5hRS02D0knxS2KrW/dragon2.png"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=300  # Illustrative value; without a timeout, requests can wait indefinitely
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON (JSON decode errors subclass ValueError)
            print(f"Response body: {e.response.text}")

In this snippet, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action ID and input payload match the example input above, while the endpoint URL and the request body structure are hypothetical and may differ on the actual platform.

Conclusion

The Perform 3D Reconstruction Using GRM Cognitive Action gives you a straightforward way to generate 3D model assets from an input image and bring them into your application. Possible use cases range from gaming to virtual reality applications. Experiment with different input images, seeds, and model settings to see how the results change.