Effortlessly Infill Images Using SAM2 Cognitive Actions

24 Apr 2025

The "aaronhayes/sam2-infill-anything" spec in SAM2 Cognitive Actions lets developers replace objects in an image without drawing a mask by hand: one text prompt selects the region to mask, and a second prompt describes what to paint in its place. This guide walks through the "Infill with Automatic Mask Generation" action, covering its inputs, its output, and how to call it from your application.

Prerequisites

Before you dive into using Cognitive Actions, ensure you have the following:

  • An API key for the Cognitive Actions platform.
  • Basic knowledge of JSON and RESTful API concepts.
  • Familiarity with making HTTP requests in your preferred programming language.

Authentication typically involves passing your API key in the request headers, ensuring secure access to the Cognitive Actions services.
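
As a minimal sketch of the header-based authentication described above, the helper below reads the key from an environment variable rather than hardcoding it. The function name `build_auth_headers` and the variable name `COGNITIVE_ACTIONS_API_KEY` are illustrative assumptions, and the Bearer scheme simply mirrors the conceptual example later in this guide; check your platform documentation for the exact header it expects.

```python
import os


def build_auth_headers(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> dict:
    """Build request headers from an API key stored in an environment variable.

    The env var name and the Bearer scheme are assumptions for illustration.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Keeping the key out of source code makes it harder to leak through version control.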

Cognitive Actions Overview

Infill with Automatic Mask Generation

The "Infill with Automatic Mask Generation" action leverages SAM2 to create masks and inpaint objects within images automatically. This eliminates the need for manual mask creation, allowing developers to focus on higher-level application logic.

Input

The action accepts the following fields. Only image, infillPrompt, and maskPrompt are required; the rest fall back to their defaults when omitted:

  • image (string, required): The URL of the input image. Must be in a valid URI format.
  • infillPrompt (string, required): The text prompt used to generate content to infill the image.
  • maskPrompt (string, required): The text prompt used to create the SAM2 mask for image processing.
  • seed (integer, optional): Specifies a seed value for consistent results. Random if not specified.
  • denoise (number, optional, default: 0.85): Level of noise reduction applied to the image.
  • outputFormat (string, optional, default: "jpg"): The format for the output image (webp, jpg, png).
  • guidanceScale (number, optional, default: 9.5): Controls how closely the result follows the prompt. Higher values adhere more strictly; lower values allow more variation.
  • maskThreshold (number, optional, default: 0.5): Sensitivity of mask generation.
  • outputQuality (integer, optional, default: 95): Quality of the output image, ranging from 0 to 100.
  • inferenceSteps (integer, optional, default: 20): Number of steps for the inference process.
  • infillNegativePrompt (string, optional, default: "deformed, distorted..."): Excludes undesirable features during infill.

Here’s an example input JSON payload for this action:

{
  "image": "https://replicate.delivery/pbxt/MSDtQ6SQcoBe7skJ2iINnCISioSfSAoe7OyrVaUzkuI47a5Q/image.png",
  "denoise": 0.9,
  "maskPrompt": "rabbit",
  "infillPrompt": "A small cute baby grizzly bear",
  "outputFormat": "jpg",
  "guidanceScale": 8,
  "maskThreshold": 0.5,
  "outputQuality": 95,
  "inferenceSteps": 20,
  "infillNegativePrompt": "deformed, distorted, blurry, bad light, extra buildings, extra structures, buildings, overexposed, oversaturated, fake, animated, cartoon"
}
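
A payload like the one above can be assembled and checked on the client before it is sent. The sketch below is one way to do that, assuming only the constraints stated in the field list (three required fields, three output formats, quality between 0 and 100). The helper name `build_infill_payload` is hypothetical, and the service may enforce additional rules not covered here.

```python
from urllib.parse import urlparse

# Field names and allowed values are taken from the input list in this guide.
ALLOWED_FORMATS = {"webp", "jpg", "png"}
OPTIONAL_FIELDS = {
    "seed", "denoise", "outputFormat", "guidanceScale", "maskThreshold",
    "outputQuality", "inferenceSteps", "infillNegativePrompt",
}


def build_infill_payload(image, mask_prompt, infill_prompt, **options):
    """Assemble an input payload, validating the documented constraints."""
    parsed = urlparse(image)
    if not (parsed.scheme and parsed.netloc):
        raise ValueError("image must be an absolute URL")
    if not mask_prompt or not infill_prompt:
        raise ValueError("maskPrompt and infillPrompt are required")

    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if "outputFormat" in options and options["outputFormat"] not in ALLOWED_FORMATS:
        raise ValueError("outputFormat must be one of webp, jpg, png")
    if "outputQuality" in options and not 0 <= options["outputQuality"] <= 100:
        raise ValueError("outputQuality must be between 0 and 100")

    # Only send the options the caller set, so service defaults apply otherwise.
    payload = {
        "image": image,
        "maskPrompt": mask_prompt,
        "infillPrompt": infill_prompt,
    }
    payload.update(options)
    return payload
```

For example, `build_infill_payload(url, "rabbit", "A small cute baby grizzly bear", denoise=0.9)` returns a dictionary containing the three required fields plus `denoise`.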

Output

Upon successful execution, the action typically returns a list containing the URL of the processed image. Here’s an example of what the output might look like:

[
  "https://assets.cognitiveactions.com/invocations/873efea7-6f9d-4921-9f01-527b001e30bb/57efca25-6d3d-4853-a4d0-af835821c386.jpg"
]
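
If the result arrives in the list shape shown above, a small helper can pull out the image URL and derive a local filename from it. This is a sketch under that assumption: the function names are hypothetical, and the real response may wrap the list in additional fields, in which case you would pass the inner list.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def first_image_url(result):
    """Return the first URL from a result shaped like the example list."""
    if not isinstance(result, list) or not result:
        raise ValueError("expected a non-empty list of URLs")
    url = result[0]
    if not isinstance(url, str) or urlparse(url).scheme not in ("http", "https"):
        raise ValueError("first entry is not an http(s) URL")
    return url


def filename_from_url(url):
    """Derive a filename from the last path segment of the URL."""
    return PurePosixPath(urlparse(url).path).name
```

The returned URL can then be downloaded with any HTTP client and saved under the derived filename.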

Conceptual Usage Example (Python)

Below is a conceptual Python code snippet demonstrating how to invoke the "Infill with Automatic Mask Generation" action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "a8b98b3d-1010-4845-88e6-087a49dcb932"  # Action ID for Infill with Automatic Mask Generation

# Construct the input payload based on the action's requirements
payload = {
    "image": "https://replicate.delivery/pbxt/MSDtQ6SQcoBe7skJ2iINnCISioSfSAoe7OyrVaUzkuI47a5Q/image.png",
    "denoise": 0.9,
    "maskPrompt": "rabbit",
    "infillPrompt": "A small cute baby grizzly bear",
    "outputFormat": "jpg",
    "guidanceScale": 8,
    "maskThreshold": 0.5,
    "outputQuality": 95,
    "inferenceSteps": 20,
    "infillNegativePrompt": "deformed, distorted, blurry, bad light, extra buildings, extra structures, buildings, overexposed, oversaturated, fake, animated, cartoon"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=120,  # Avoid hanging indefinitely; adjust to your workload
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Raised when the body is not valid JSON
            print(f"Response body: {e.response.text}")

In this snippet, replace the placeholder API key with your actual key, and swap the hypothetical endpoint URL and request structure for the ones your platform documents. The action ID corresponds to the specific action you're invoking, and the payload is structured according to the input schema described above.
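
When comparing settings, it can help to hold the seed fixed and vary one field at a time, since the input list describes seed as giving consistent results. The sketch below generates such payload variants from a base payload. The helper name `sweep_payloads` and the seed value 1234 are illustrative assumptions; each variant would still need to be sent with a request like the one above.

```python
def sweep_payloads(base_payload, field, values, seed=1234):
    """Return one payload per value, all sharing the same seed.

    The base payload is copied, so the caller's dictionary is not modified.
    """
    variants = []
    for value in values:
        variant = dict(base_payload)  # shallow copy is enough for flat payloads
        variant["seed"] = seed
        variant[field] = value
        variants.append(variant)
    return variants


base = {
    "image": "https://replicate.delivery/pbxt/MSDtQ6SQcoBe7skJ2iINnCISioSfSAoe7OyrVaUzkuI47a5Q/image.png",
    "maskPrompt": "rabbit",
    "infillPrompt": "A small cute baby grizzly bear",
}

for variant in sweep_payloads(base, "guidanceScale", [6, 8, 10]):
    print(variant["seed"], variant["guidanceScale"])
```

Changing a single field per run makes it easier to attribute differences in the output to that field.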

Conclusion

The SAM2 Cognitive Actions, and the "Infill with Automatic Mask Generation" action in particular, give developers a way to automate mask creation and inpainting without building that pipeline themselves. Experiment with different prompts and settings to see which combinations best serve your use case. Happy coding!