Enhance User Interactions with the Qwen-VL-Chat Cognitive Actions

21 Apr 2025

Conversational AI with multimodal capabilities lets an application answer questions about images as well as text. The Qwen-VL-Chat Cognitive Actions let developers integrate such an assistant into their applications. The assistant supports flexible interactions, including multi-round question answering and creative tasks, which makes it a practical option for developers who want to improve the user experience.

Prerequisites

Before using the Qwen-VL-Chat Cognitive Actions, make sure you have the following:

  • An API key for the Cognitive Actions platform.
  • Basic knowledge of making API calls and handling JSON data.

Authentication typically involves passing your API key in the request headers.
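As a minimal sketch of that header-based authentication, the helper below builds the headers from an API key. The Bearer scheme matches the conceptual example later in this article. The environment variable name COGNITIVE_ACTIONS_API_KEY and the function name build_headers are assumptions made for illustration, not documented parts of the platform.

```python
import os


def build_headers(api_key=None):
    """Return request headers that carry the API key as a Bearer token.

    The key is taken from the argument, or else from the (hypothetical)
    COGNITIVE_ACTIONS_API_KEY environment variable.
    """
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY", "")
    if not key:
        raise ValueError("No API key provided and COGNITIVE_ACTIONS_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control; passing it explicitly is convenient in tests.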

Cognitive Actions Overview

Interact with Qwen-VL-Chat AI Assistant

The Interact with Qwen-VL-Chat AI Assistant action gives you access to an AI assistant built on a multimodal large language model (LLM). This action is categorized under chatbots and is particularly useful for applications that need to analyze an image alongside a textual question.

Input

The input for this action is structured as follows:

  • image (required): A URI pointing to the image that the AI will analyze.
  • prompt (optional): A question or task that you want the AI to address based on the provided image. If not specified, it defaults to "What is the name of the movie in the poster?"
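The two fields above can be assembled with a small helper. This is a sketch under stated assumptions: the function name build_inputs is hypothetical, and the URI check (requiring a scheme such as https) is a client-side sanity check, not validation the platform is known to perform. When no prompt is given, the key is omitted so that the action's own default applies.

```python
from urllib.parse import urlparse


def build_inputs(image, prompt=None):
    """Build the input object for the action.

    image  -- required URI of the image to analyze
    prompt -- optional question; left out when None so the service default is used
    """
    if not isinstance(image, str) or not urlparse(image).scheme:
        raise ValueError("image must be a URI with a scheme, e.g. https://...")
    inputs = {"image": image}
    if prompt is not None:
        inputs["prompt"] = prompt
    return inputs
```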

Example Input:

{
  "image": "https://replicate.delivery/pbxt/JSwt0WCMKtolbjYYo6WYIE01Iemz3etQD6ugKxxeiVVlMgjF/Menu.jpeg",
  "prompt": "How much would I pay if I want to order two Salmon Burger and three Meat Lover's Pizza? Think carefully step by step."
}

Output

The output of this action is a textual response generated by the AI based on the provided image and prompt. The response typically includes a detailed answer to the question posed.

Example Output:

If you want to order two Salmon Burgers and three Meat Lover's Pizzas, the total cost would depend on the price of each item on the menu. 

Let's assume that the price of a Salmon Burger is $10 and the price of a Meat Lover's Pizza is $12. In this case, the total cost for two Salmon Burgers would be $20 and the total cost for three Meat Lover's Pizzas would be $36.

So, the total cost for two Salmon Burgers and three Meat Lover's Pizzas would be $56.
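Note that in this example output the model states that it is assuming the prices rather than reading them from the menu. The arithmetic itself can be checked with a few lines of Python; the prices below are the ones assumed in the response, not values confirmed from the image.

```python
# Prices assumed by the model in the example output (not read from the menu)
assumed_prices = {"Salmon Burger": 10, "Meat Lover's Pizza": 12}
order = {"Salmon Burger": 2, "Meat Lover's Pizza": 3}

# Cost per line item, then the overall total
line_totals = {item: qty * assumed_prices[item] for item, qty in order.items()}
total = sum(line_totals.values())

print(line_totals)  # {'Salmon Burger': 20, "Meat Lover's Pizza": 36}
print(total)        # 56
```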

Conceptual Usage Example (Python)

Here's a conceptual example of how you might call the Qwen-VL-Chat action using Python. This snippet demonstrates how to structure the input JSON payload correctly and make a request to the hypothetical Cognitive Actions execution endpoint.

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "44d94c0e-4b56-45bb-9a34-65f02d8c6713" # Action ID for Interact with Qwen-VL-Chat AI Assistant

# Construct the input payload based on the action's requirements
payload = {
    "image": "https://replicate.delivery/pbxt/JSwt0WCMKtolbjYYo6WYIE01Iemz3etQD6ugKxxeiVVlMgjF/Menu.jpeg",
    "prompt": "How much would I pay if I want to order two Salmon Burger and three Meat Lover's Pizza? Think carefully step by step."
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=30 # Avoid waiting indefinitely if the service does not respond
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError: # Body was not valid JSON (covers json and simplejson decode errors)
            print(f"Response body: {e.response.text}")

In this code snippet:

  • Replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key.
  • The action_id variable holds the ID for the Interact with Qwen-VL-Chat AI Assistant action.
  • The payload variable is structured according to the action's input schema.
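If you call the action from several places, the request can be wrapped in a function. The sketch below reuses the same hypothetical endpoint, action ID, and request structure as the snippet above. The function name ask_about_image and the post parameter are assumptions added for illustration; the post parameter lets you substitute a stub for requests.post in tests.

```python
# Hypothetical endpoint and action ID, copied from the example above
EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"
ACTION_ID = "44d94c0e-4b56-45bb-9a34-65f02d8c6713"


def ask_about_image(image, prompt, api_key, post=None, timeout=30):
    """Send one image/prompt pair to the action and return the parsed JSON body."""
    if post is None:
        import requests  # imported lazily so a stub can be used without it
        post = requests.post
    response = post(
        EXECUTE_URL,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        # Hypothetical request structure
        json={"action_id": ACTION_ID, "inputs": {"image": image, "prompt": prompt}},
        timeout=timeout,
    )
    response.raise_for_status()  # Raise for 4xx/5xx responses
    return response.json()
```

Because the shape of the response is not documented here, the function returns the parsed JSON unchanged and leaves it to the caller to extract the answer text.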

Conclusion

The Qwen-VL-Chat Cognitive Actions give you a way to integrate conversational AI with image understanding into your applications. With this multimodal assistant, developers can build dynamic, interactive user experiences. As you explore these actions, consider use cases ranging from customer support to creative content generation. Happy coding!