Effortlessly Extract Content from Visual Documents with Granite Vision

26 Apr 2025
The Granite Vision 3.2 2b service provides developers with powerful Cognitive Actions designed to streamline the extraction and understanding of content from visual documents. Leveraging a compact and efficient vision-language model, this service excels at parsing complex formats such as tables, charts, and infographics. By integrating these Cognitive Actions into your applications, you can enhance data accessibility, automate content processing, and improve user interactions, all while saving valuable time and resources.

Imagine a scenario where you need to analyze a series of visual reports filled with intricate data presentations. Manually extracting this information can be tedious and error-prone. With Granite Vision, you can automate this process, ensuring accuracy and efficiency. Whether you're building a data analytics tool, a document management system, or an educational platform, Granite Vision can simplify your workflow and elevate the user experience.

Before you get started, ensure you have a valid Cognitive Actions API key and a basic understanding of making API calls.

Extract Visual Document Content

The "Extract Visual Document Content" action is designed to help you efficiently extract and understand content from visual documents. This action addresses the challenge of interpreting various visual formats, enabling your application to seamlessly process complex information.

Input Requirements

To utilize this action, you will need to provide the following parameters:

  • topK: An integer representing the number of highest probability tokens to consider for generating the output. The default is 50.
  • topP: A probability threshold for generating the output, with a default value of 0.9.
  • prompt: The initial text input to prompt the model (default is an empty string).
  • maxTokens: The maximum number of tokens the model should generate, with a default of 512.
  • minTokens: The minimum number of tokens the model should generate, defaulting to 0.
  • temperature: A value between 0 and 1 that modulates the randomness of the output, with a default of 0.6.
  • systemPrompt: A system prompt that guides the model's behavior (default is "You are a helpful assistant.").
  • stopSequences: A comma-separated list of sequences that will halt generation when encountered.
  • presencePenalty: A penalty applied to previously generated tokens to reduce their likelihood of recurrence, defaulting to 0.
  • frequencyPenalty: A penalty applied based on the frequency of tokens in the generated text, also defaulting to 0.
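
The parameter list above maps directly onto a request payload. The sketch below assembles one from the documented defaults and lets callers override individual values; the helper name `build_inputs` is hypothetical, and the empty-string default for `stopSequences` is an assumption, since the list above does not state one.

```python
# Documented defaults for the "Extract Visual Document Content" action.
# The stopSequences default ("") is an assumption; the docs give no default.
DEFAULT_INPUTS = {
    "topK": 50,
    "topP": 0.9,
    "prompt": "",
    "maxTokens": 512,
    "minTokens": 0,
    "temperature": 0.6,
    "systemPrompt": "You are a helpful assistant.",
    "stopSequences": "",
    "presencePenalty": 0,
    "frequencyPenalty": 0,
}

def build_inputs(**overrides):
    """Merge caller overrides onto the documented defaults.

    Rejects parameter names that are not in the documented list, which
    catches typos like `max_tokens` before the request is sent.
    """
    unknown = set(overrides) - set(DEFAULT_INPUTS)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    return {**DEFAULT_INPUTS, **overrides}

# Example: override only the fields you care about.
inputs = build_inputs(prompt="Describe this image", temperature=0.2)
```

Every field you omit falls back to the default listed above, so a minimal call only needs the `prompt`.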

Expected Output

Upon successful execution, the action will return a sequence of tokens that represent the extracted content from the visual document, providing a clear understanding of its elements.
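
Since the exact response schema is not specified here, a defensive accessor like the sketch below can keep your application code independent of the precise shape. Both field names (`output.text` and `text`) are assumptions; check the Cognitive Actions API reference for the real schema.

```python
def extract_text(result: dict) -> str:
    """Pull the generated text out of an action response.

    The field names below are assumptions about the response shape,
    e.g. {"output": {"text": "..."}} or a flat {"text": "..."}.
    Returns an empty string when neither field is present.
    """
    output = result.get("output")
    if isinstance(output, dict):
        return output.get("text", "")
    return result.get("text", "")
```

Centralizing this lookup means a schema change only touches one function rather than every call site.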

Use Cases

  • Data Analysis: Quickly extract insights from visual reports or dashboards within business intelligence applications.
  • Document Management: Automate the processing of scanned documents or PDFs containing charts and tables.
  • Education Tools: Enhance learning platforms by providing students with automated descriptions and analyses of educational infographics.
  • Accessibility: Improve accessibility for visually impaired users by converting visual data into understandable text descriptions.

```python
import requests
import json

# Replace with your actual Cognitive Actions API key and endpoint
# Ensure your environment securely handles the API key
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# This endpoint URL is hypothetical and should be documented for users
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

action_id = "c877e84f-7e7e-4708-acc0-e98485deb74e" # Action ID for: Extract Visual Document Content

# Construct the exact input payload based on the action's requirements
# This example uses the predefined example_input for this action:
payload = {
  "topP": 0.9,
  "prompt": "Describe this image",
  "maxTokens": 512,
  "temperature": 0.6,
  "systemPrompt": "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other required headers for the Cognitive Actions API
}

# Prepare the request body for the hypothetical execution endpoint
request_body = {
    "action_id": action_id,
    "inputs": payload
}

print(f"--- Calling Cognitive Action: {action_id} ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body (non-JSON): {e.response.text}")
    print("------------------------------------------------")
```


In conclusion, the Granite Vision 3.2 2b service empowers developers to efficiently extract content from visual documents, enhancing data processing capabilities across various applications. By integrating the "Extract Visual Document Content" action, you can automate complex tasks, improve accuracy, and provide more valuable insights to your users. To get started, explore the action's parameters and consider how you can leverage this technology in your next project.