Generate Realistic Human Images with the zsxkib/instant-id Cognitive Actions

23 Apr 2025

The zsxkib/instant-id API generates realistic images of human faces while preserving the identity of a reference face. Through its Cognitive Actions, developers can use zero-shot, identity-preserving image generation without assembling extensive training datasets. The API exposes customization options for fidelity, text editability, and style integration, which makes it useful across a range of applications.

Prerequisites

To get started with the Cognitive Actions provided by the zsxkib/instant-id API, you'll need:

  • An API key for authentication purposes, which you can obtain by signing up on the platform.
  • A basic understanding of JSON, since both the input and the output use it.

Conceptually, authentication can be handled by including the API key in the headers of your requests when calling the action endpoints.
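
As a minimal sketch of that idea, the helper below reads the key from an environment variable and builds the request headers. The helper name build_auth_headers is hypothetical, and the environment variable name and Bearer scheme are assumptions that mirror the conceptual example later in this post, not confirmed platform details.

```python
import os


def build_auth_headers(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> dict:
    """Build request headers from an API key stored in the environment.

    The variable name and the Bearer scheme are assumptions for illustration.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Keeping the key out of source code this way makes it harder to commit it to version control by accident.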

Cognitive Actions Overview

Generate Realistic Human Images

Description: This action generates realistic images of a real person from an input face image, using zero-shot identity-preserving techniques. It supports various downstream tasks, so developers can tailor the output to specific requirements.

Category: image-generation

Input

The expected input for this action is structured as follows:

  • image (required): URI of the input face image to process.
  • seed (optional): Random seed for deterministic outputs.
  • prompt (optional): Descriptive prompt to guide image generation. Default is "a person".
  • scheduler (optional): Scheduling algorithm. Default is "EulerDiscreteScheduler".
  • numberOfOutputs (optional): Number of images to output, between 1 and 8. Default is 1.
  • outputImageFormat (optional): Format of the output images. Default is "webp".
  • outputImageQuality (optional): Quality of the output images, from 0 (lowest) to 100 (best). Default is 80.
  • referencePoseImage (optional): URI of a reference pose image.
  • inputNegativePrompt (optional): A prompt describing traits to avoid in the output.
  • Additional parameters for finer control over generation, such as base weights selection and ControlNet settings (see the example input below).
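
The documented ranges above can be checked on the client before a request is sent. The following is a minimal sketch: the function name build_payload and its argument names are hypothetical, and whether the API enforces the same limits server-side is not stated in the source.

```python
from typing import Optional


def build_payload(
    image: str,
    prompt: str = "a person",
    number_of_outputs: int = 1,
    output_image_quality: int = 80,
    seed: Optional[int] = None,
) -> dict:
    """Assemble an input payload, checking the documented ranges first."""
    if not image:
        raise ValueError("image is required")
    if not 1 <= number_of_outputs <= 8:
        raise ValueError("numberOfOutputs must be between 1 and 8")
    if not 0 <= output_image_quality <= 100:
        raise ValueError("outputImageQuality must be between 0 and 100")

    payload = {
        "image": image,
        "prompt": prompt,
        "numberOfOutputs": number_of_outputs,
        "outputImageQuality": output_image_quality,
    }
    # Only send a seed when the caller wants reproducible output.
    if seed is not None:
        payload["seed"] = seed
    return payload
```

Failing early with a clear message is usually easier to debug than a rejected request.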

Example Input:

{
  "image": "https://replicate.delivery/pbxt/KIIutO7jIleskKaWebhvurgBUlHR6M6KN7KHaMMWSt4OnVrF/musk_resize.jpeg",
  "prompt": "analog film photo of a man. faded film, desaturated, 35mm photo, grainy, vignette, vintage, Kodachrome, Lomography, stained, highly detailed, found footage, masterpiece, best quality",
  "scheduler": "EulerDiscreteScheduler",
  "numberOfOutputs": 1,
  "outputImageFormat": "webp",
  "outputImageQuality": 80,
  "referencePoseImage": "https://replicate.delivery/pbxt/KJmFdQRQVDXGDVdVXftLvFrrvgOPXXRXbzIVEyExPYYOFPyF/80048a6e6586759dbcb529e74a9042ca.jpeg",
  "inputNegativePrompt": "(lowres, low quality, worst quality:1.2), (text:1.2), watermark, painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured",
  "baseWeightsSelection": "protovision-xl-high-fidel",
  "enablePoseControlnet": true,
  "enhanceNonFaceRegion": true,
  "enableCannyControlnet": false,
  "enableDepthControlnet": false,
  "numberOfInferenceSteps": 30,
  "poseControlnetStrength": 0.4,
  "cannyControlnetStrength": 0.3,
  "depthControlnetStrength": 0.5,
  "faceDetectionInputWidth": 640,
  "faceDetectionInputHeight": 640,
  "imageAdapterStrengthScale": 0.8,
  "classifierFreeGuidanceScale": 5,
  "controlnetConditioningScale": 0.8,
  "enableLatentConsistencyModels": false,
  "latentConsistencyModelGuidanceScale": 1.5,
  "latentConsistencyModelNumInferenceSteps": 5
}

Output

The action typically returns a list of URIs, one for each generated image.

Example Output:

[
  "https://assets.cognitiveactions.com/invocations/ea753dc3-05b8-458c-9d53-bbcef1a1c9f3/18b9eb0c-e55c-41df-8c5a-39abc2c2afa5.webp"
]
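
Once the list of URIs comes back, a common next step is to save each image locally. The sketch below only derives a file name for each URI. The helper name output_filenames is hypothetical, and the fallback extension assumes the default "webp" output format.

```python
import posixpath
from urllib.parse import urlparse


def output_filenames(image_uris: list) -> list:
    """Derive a local file name for each returned image URI."""
    names = []
    for index, uri in enumerate(image_uris):
        # Use the last path segment of the URI as the file name.
        name = posixpath.basename(urlparse(uri).path)
        # Fall back to a numbered name if the URI has no usable last segment.
        names.append(name or f"output_{index}.webp")
    return names
```

For the example output above, this yields a single name, "18b9eb0c-e55c-41df-8c5a-39abc2c2afa5.webp". The images themselves could then be fetched with an ordinary HTTP GET, assuming the returned URIs can be downloaded without further authentication.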

Conceptual Usage Example (Python)

Here’s a conceptual example of how to invoke this action using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "f672a47c-cd83-4d55-9109-9e50e7e834de" # Action ID for Generate Realistic Human Images

# Construct the input payload based on the action's requirements
payload = {
    "image": "https://replicate.delivery/pbxt/KIIutO7jIleskKaWebhvurgBUlHR6M6KN7KHaMMWSt4OnVrF/musk_resize.jpeg",
    "prompt": "analog film photo of a man. faded film, desaturated, 35mm photo, grainy, vignette, vintage, Kodachrome, Lomography, stained, highly detailed, found footage, masterpiece, best quality",
    "scheduler": "EulerDiscreteScheduler",
    "numberOfOutputs": 1,
    "outputImageFormat": "webp",
    "outputImageQuality": 80
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=120 # Avoid waiting indefinitely if the server does not respond
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError: # Raised when the body is not valid JSON
            print(f"Response body: {e.response.text}")

This example sets up the API request with the action ID and a structured input payload, then prints either the result or the error details. The endpoint URL and request structure are hypothetical, so adjust them to match your actual implementation.

Conclusion

The zsxkib/instant-id Cognitive Actions generate realistic, identity-preserving human images from a single API call. Developers can build them into applications across diverse use cases, from gaming to virtual reality, and use the customization options described above to tune the results.