Master Spatial Control in Your Applications with OminiControl Actions

Integrating advanced image processing capabilities into your applications can significantly enhance user experiences. The OminiControl Cognitive Actions, part of the chenxwh/ominicontrol-spatial API, give developers the tools to execute spatially aligned control operations with minimal effort. This minimal yet universal control framework for Diffusion Transformer models supports tasks such as edge-guided generation and in-painting, adding only about 0.1% extra parameters on top of the base model.
In this article, we will explore the Perform Spatial Control with OminiControl action, detailing how to leverage its capabilities to manipulate images dynamically.
Prerequisites
To get started with the OminiControl Cognitive Actions, you will need:
- An API key for the Cognitive Actions platform.
- Basic knowledge of JSON structure and HTTP requests.
Authentication generally involves passing your API key in the request headers, allowing you to securely access the Cognitive Actions.
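As a sketch of what that looks like in practice, the snippet below builds bearer-token headers for a request. The exact header name and token format depend on the Cognitive Actions platform, so treat this as an illustrative assumption rather than the definitive scheme:

```python
# Minimal sketch of bearer-token auth headers; the exact scheme
# (header name, token format) is an assumption and may differ
# on the Cognitive Actions platform.
def build_headers(api_key: str) -> dict:
    """Return HTTP headers carrying the API key as a bearer token."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

headers = build_headers("YOUR_COGNITIVE_ACTIONS_API_KEY")
```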
Cognitive Actions Overview
Perform Spatial Control with OminiControl
The Perform Spatial Control with OminiControl action is designed to execute various spatial control operations on images. This action can facilitate tasks such as in-painting and edge-guided generation, enabling developers to create visually compelling and contextually relevant images based on user-defined prompts.
Input
The input for this action follows the CompositeRequest schema, which includes the following required and optional fields:
- image (required): A valid URI pointing to the input image.
- seed (optional): A random seed value for consistent output. If left blank, a random seed is generated.
- model (optional): The selected task model. Options are "fill", "canny", "depth", "coloring", and "deblurring". Default is "fill".
- prompt (optional): A descriptive prompt guiding the model. The default prompt is "The Mona Lisa is wearing a white VR headset with 'Omini' written on it.".
- guidanceScale (optional): Adjusts the influence of the prompt on the output. Valid range is from 1 to 20, with a default of 7.5.
- numInferenceSteps (optional): Specifies the number of steps for the denoising process. Valid range is from 1 to 500, with a default of 50.
Example Input:
{
  "image": "https://replicate.delivery/pbxt/MF6fbaFCPPXLDUYYRQZhP0S93K3HQzzZDduHwxXQNAWiOgdb/masked.png",
  "model": "fill",
  "prompt": "The Mona Lisa is wearing a white VR headset with \"OMINI\" written on it.",
  "guidanceScale": 7.5,
  "numInferenceSteps": 50
}
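Because guidanceScale and numInferenceSteps have documented valid ranges, a small client-side check can catch bad values before a request is sent. Here is a hedged sketch of such a validator; the field names come from the schema above, but the helper itself is illustrative, not part of the API:

```python
def validate_inputs(payload: dict) -> list:
    """Check a CompositeRequest payload against the documented
    field constraints; return a list of problem descriptions
    (empty if the payload looks valid). Illustrative helper only."""
    errors = []
    if "image" not in payload:
        errors.append("image is required")
    model = payload.get("model", "fill")
    if model not in {"fill", "canny", "depth", "coloring", "deblurring"}:
        errors.append(f"unknown model: {model}")
    scale = payload.get("guidanceScale", 7.5)
    if not 1 <= scale <= 20:
        errors.append(f"guidanceScale {scale} outside valid range 1-20")
    steps = payload.get("numInferenceSteps", 50)
    if not 1 <= steps <= 500:
        errors.append(f"numInferenceSteps {steps} outside valid range 1-500")
    return errors
```

Running the validator before the POST keeps obviously invalid requests from consuming an API call.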
Output
The action typically returns a URI pointing to the processed image. For example:
Example Output:
https://assets.cognitiveactions.com/invocations/24161643-f0cc-4ff5-90c3-bcdd0707c562/ffe81a8a-9e24-47f6-b471-9c89853eca83.png
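Since the action returns a URI rather than raw image bytes, a typical follow-up step is downloading the file locally. A minimal sketch using requests is shown below; deriving the filename from the URI's last path segment is a convenience assumption, not something the API prescribes:

```python
import os
from urllib.parse import urlparse

import requests


def filename_from_uri(uri: str) -> str:
    """Derive a local filename from the last path segment of the URI."""
    return os.path.basename(urlparse(uri).path)


def download_image(uri: str, dest_dir: str = ".") -> str:
    """Stream the processed image to dest_dir and return the local path."""
    dest = os.path.join(dest_dir, filename_from_uri(uri))
    with requests.get(uri, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                f.write(chunk)
    return dest
```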
Conceptual Usage Example (Python)
Here’s how you can call the Perform Spatial Control with OminiControl action using Python. Note that the endpoint URL and request structure are illustrative and should be adapted to your specific implementation.
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "55eea9bd-d3cc-401b-b55a-e7d67afeee8b"  # Action ID for Perform Spatial Control with OminiControl

# Construct the input payload based on the action's requirements
payload = {
    "image": "https://replicate.delivery/pbxt/MF6fbaFCPPXLDUYYRQZhP0S93K3HQzzZDduHwxXQNAWiOgdb/masked.png",
    "model": "fill",
    "prompt": "The Mona Lisa is wearing a white VR headset with \"OMINI\" written on it.",
    "guidanceScale": 7.5,
    "numInferenceSteps": 50
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body: {e.response.text}")
In the code above, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key and point the request at the appropriate endpoint for your environment. The input payload is constructed from the required and optional fields described earlier.
Conclusion
The Perform Spatial Control with OminiControl action opens up a world of possibilities for image processing in your applications. By utilizing the flexibility of the OminiControl framework, developers can enhance their applications with sophisticated image manipulation capabilities. As a next step, consider experimenting with different model types and prompts to fully realize the potential of this action in your projects.