Transforming Images with InstantID and IPAdapter: A Developer's Guide

24 Apr 2025

The InstantID and IPAdapter integration is available as a Cognitive Action that creates realistic images of people by transforming face photos. You steer the transformation with text prompts and tune it with numeric parameters, producing high-quality output for applications ranging from gaming to virtual reality.

Prerequisites

Before diving into the integration of InstantID and IPAdapter, ensure you have the following:

  • An API key for the Cognitive Actions platform.
  • A basic understanding of JSON format and API interactions.
  • Familiarity with a programming language, such as Python, for making API calls.

Authentication typically involves passing your API key in the request headers.
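
As a minimal sketch of header-based authentication, the helper below reads the key from an environment variable instead of hard-coding it. The function name `build_auth_headers` and the environment variable name are hypothetical, and the Bearer scheme is an assumption carried over from the conceptual example in this guide; check the platform documentation for the scheme it actually expects.

```python
import os


def build_auth_headers(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> dict:
    """Build request headers from an API key stored in the environment.

    The env var name and the Bearer scheme are assumptions, not documented API.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Keeping the key out of source code makes it harder to leak through version control.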

Cognitive Actions Overview

Transform Face Photos with InstantID and IPAdapter

Description: This action takes a face photo and generates realistic images from it using the InstantID and IPAdapter models. You can alter facial features through the prompt and fine-tune the result with the parameters below to produce high-quality, high-resolution output.

Category: Image Generation

Input

The input schema for this action is quite comprehensive, allowing for various customization options:

  • inputImageUri (string, required): The URI for an input image to be used as a reference.
    Example: "https://replicate.delivery/pbxt/LGizs2Ko3P4PJ1hbeQRnWG52t27QuBLq3826VcJJURhxNoYS/IMG_4467.jpeg"
  • prompt (string, required): Text-based description for image transformation.
    Example: "Cyberpunk character, neon lights, futuristic implants, urban dystopia, high contrast, young man"
  • negativePrompt (string, optional): Descriptive terms to avoid in the output.
    Example: "NSFW, nudity, painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured"
  • steps (integer, optional): Number of sampling steps (default is 30, max 50).
    Example: 30
  • width (integer, optional): Width of the generated output image (default is 1600 pixels).
    Example: 1600
  • height (integer, optional): Height of the generated output image (default is 1600 pixels).
    Example: 1600
  • guidanceScale (number, optional): How strongly the output follows the prompt; higher values adhere more closely, lower values allow more variation (default is 4.5).
    Example: 4.5
  • outputQuality (integer, optional): Quality of the output images (default is 80).
    Example: 80
  • Additional parameters like denoise, batchSize, scheduler, samplerName, outputFormat, and various weights for the InstantID and IPAdapter effects can also be specified for further customization.
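
The schema above can be sketched as a small payload builder that fills in the documented defaults and rejects obviously invalid values before any request is sent. The function name `build_inputs` is hypothetical, and the lower bound of 1 for steps is an assumption; the schema only states the maximum of 50.

```python
def build_inputs(input_image_uri: str, prompt: str, **overrides) -> dict:
    """Assemble an input payload, applying the defaults listed in the schema."""
    if not input_image_uri:
        raise ValueError("inputImageUri is required")
    if not prompt:
        raise ValueError("prompt is required")

    inputs = {
        "inputImageUri": input_image_uri,
        "prompt": prompt,
        "steps": 30,
        "width": 1600,
        "height": 1600,
        "guidanceScale": 4.5,
        "outputQuality": 80,
    }
    # Caller-supplied values (e.g. negativePrompt, steps) replace the defaults.
    inputs.update(overrides)

    steps = inputs["steps"]
    # The minimum of 1 is assumed; only the maximum of 50 is documented.
    if not isinstance(steps, int) or not 1 <= steps <= 50:
        raise ValueError("steps must be an integer between 1 and 50")
    return inputs
```

For example, `build_inputs("https://example.com/face.jpg", "young man, neon lights", steps=40)` returns a payload with 40 steps and every other listed default intact.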

Example Input:

{
  "steps": 30,
  "width": 1600,
  "height": 1600,
  "prompt": "Cyberpunk character, neon lights, futuristic implants, urban dystopia, high contrast, young man",
  "denoise": 1,
  "batchSize": 1,
  "scheduler": "karras",
  "samplerName": "ddpm",
  "outputFormat": "webp",
  "guidanceScale": 4.5,
  "inputImageUri": "https://replicate.delivery/pbxt/LGizs2Ko3P4PJ1hbeQRnWG52t27QuBLq3826VcJJURhxNoYS/IMG_4467.jpeg",
  "outputQuality": 80,
  "instantIdEndAt": 1,
  "ipAdapterEndAt": 1,
  "negativePrompt": "NSFW, nudity, painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured",
  "instantIdWeight": 0.8,
  "ipAdapterWeight": 0.5,
  "instantIdStartAt": 0,
  "ipAdapterStartAt": 0,
  "ipAdapterWeightType": "linear",
  "ipAdapterCombineEmbeds": "norm average",
  "ipAdapterEmbedsScaling": "V only"
}

Output

The output of this action is a JSON array containing the URL of each generated image. With batchSize set to 1, as in the example input, the array holds a single URL.

Example Output:

[
  "https://assets.cognitiveactions.com/invocations/9ca3c0f1-d985-4fef-b4e8-e61d668a7316/4312cd15-4a9a-4fde-9a0d-8b697d1fa3f5.webp"
]
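
Since the result is an array of URLs, a caller will usually want a file name and format for each entry before saving it. The sketch below does that with the standard library only; the function name `describe_outputs` is hypothetical, and it assumes each URL path ends in a file name with an extension, as in the example output.

```python
import posixpath
from urllib.parse import urlparse


def describe_outputs(result: list) -> list:
    """Return a (url, filename, extension) tuple for each URL in the output array."""
    described = []
    for url in result:
        # URL paths always use forward slashes, so posixpath is the right tool.
        filename = posixpath.basename(urlparse(url).path)
        extension = posixpath.splitext(filename)[1].lstrip(".")
        described.append((url, filename, extension))
    return described
```

The extension should match the outputFormat requested in the input, which gives a cheap sanity check before downloading.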

Conceptual Usage Example (Python)

Here’s a conceptual example of how you might call the Transform Face Photos action using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "b54fc132-9234-4737-8593-f3666c3eeb1e"  # Action ID for Transform Face Photos

# Construct the input payload based on the action's requirements
payload = {
    "steps": 30,
    "width": 1600,
    "height": 1600,
    "prompt": "Cyberpunk character, neon lights, futuristic implants, urban dystopia, high contrast, young man",
    "denoise": 1,
    "batchSize": 1,
    "scheduler": "karras",
    "samplerName": "ddpm",
    "outputFormat": "webp",
    "guidanceScale": 4.5,
    "inputImageUri": "https://replicate.delivery/pbxt/LGizs2Ko3P4PJ1hbeQRnWG52t27QuBLq3826VcJJURhxNoYS/IMG_4467.jpeg",
    "outputQuality": 80,
    "instantIdEndAt": 1,
    "ipAdapterEndAt": 1,
    "negativePrompt": "NSFW, nudity, painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured",
    "instantIdWeight": 0.8,
    "ipAdapterWeight": 0.5,
    "instantIdStartAt": 0,
    "ipAdapterStartAt": 0,
    "ipAdapterWeightType": "linear",
    "ipAdapterCombineEmbeds": "norm average",
    "ipAdapterEmbedsScaling": "V only"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=120,  # Fail instead of hanging forever; adjust to your needs
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON; covers every JSON decoder requests may use
            print(f"Response body: {e.response.text}")

This Python code demonstrates how to structure your input payload and make a POST request to the hypothetical Cognitive Actions execution endpoint.

Conclusion

The InstantID and IPAdapter Cognitive Action opens up exciting possibilities for developers looking to integrate advanced image generation capabilities into their applications. By leveraging the flexibility and power of this action, you can create unique and engaging content that stands out. Consider experimenting with different prompts and settings to discover the full potential of this tool in your projects!
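
One way to structure that experimentation is to generate a payload for every combination of the settings you want to compare. The sketch below is illustrative only: the helper name `payload_variants` is hypothetical, and the values in the grid are arbitrary examples, not recommended settings.

```python
from itertools import product


def payload_variants(base: dict, grid: dict) -> list:
    """Return one copy of `base` per combination of the values in `grid`."""
    keys = list(grid)
    variants = []
    for values in product(*(grid[key] for key in keys)):
        variant = dict(base)  # Shallow copy so `base` is left untouched
        variant.update(zip(keys, values))
        variants.append(variant)
    return variants


base = {
    "prompt": "Cyberpunk character, neon lights",
    "guidanceScale": 4.5,
    "instantIdWeight": 0.8,
}
# Illustrative values only; 2 x 3 settings yield 6 payloads.
grid = {"guidanceScale": [3.5, 4.5], "instantIdWeight": [0.6, 0.8, 1.0]}
variants = payload_variants(base, grid)
print(len(variants))  # 6
```

Each variant can then be sent as the inputs of a separate request, which makes it straightforward to compare the resulting images side by side.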