Streamline Receipt Processing with the Donut Cognitive Actions

22 Apr 2025

Extracting data from documents by hand is slow and error-prone, so automating it pays off quickly. The willywongi/donut API exposes document understanding through pre-built Cognitive Actions. This article covers one of them: an action that extracts structured data from receipt images, so that downstream workflows can consume the result without manual data entry.

Prerequisites

Before diving into the Donut Cognitive Actions, ensure you have the following:

  • API Key: You'll need an API key to authenticate your requests to the Cognitive Actions platform.
  • Setup: Familiarity with making HTTP requests and handling JSON data will be beneficial.
  • Authentication: Typically, authentication is handled by including your API key in the headers of your requests.
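
As a minimal sketch of the authentication step, the helper below builds the request headers from an API key kept in an environment variable instead of in source code. The Bearer scheme matches the usage example later in this article; the environment variable name and the build_headers helper are assumptions made for illustration, not part of the platform.

```python
import os


def build_headers(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> dict:
    """Build request headers from an API key stored in an environment variable.

    The variable name is a hypothetical choice for this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of version control and makes it easy to use different keys per environment.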

Cognitive Actions Overview

Extract Receipt Data with Donut

The Extract Receipt Data with Donut action extracts structured data from receipt images using the Donut 🍩 (Document Understanding Transformer) model. Donut is OCR-free: it maps the image directly to structured output rather than running a separate OCR step first.

Input

This action requires a single input parameter:

  • image (string, required): The URI of the input image to be processed. The URL must be accessible and point directly to the image file.

Example Input:

{
  "image": "https://replicate.delivery/pbxt/IgCzf30UdmaTYhtlaifqpVA7V7nQf7a8muE6AWie2Fm5bNv3/sample_image_cord_test_receipt_00004.png"
}
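
Because the action takes a single URL, it can help to reject obviously malformed values before sending a request. The sketch below is one way to do that; the build_payload helper is hypothetical, and restricting the scheme to http and https is an assumption. It only checks the shape of the URL, not whether the image is actually reachable.

```python
from urllib.parse import urlparse


def build_payload(image_url: str) -> dict:
    """Return the action's input payload after a basic URL sanity check.

    Assumption: only absolute http(s) URLs are accepted.
    """
    parsed = urlparse(image_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected an absolute http(s) URL, got: {image_url!r}")
    return {"image": image_url}
```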

Output

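The action returns a structured JSON object containing the data extracted from the receipt. In the example below, every value is a string, including counts and prices, and item names can contain recognition errors (for instance, "ICE BLAOKCOFFE").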

Example Output:

{
  "menu": [
    {"nm": "ICE BLAOKCOFFE", "cnt": "2", "price": "82,000"},
    {"nm": "AVOCADO COFFEE", "cnt": "1", "price": "61,000"},
    {"nm": "Oud CHINEN KATSU FF", "cnt": "1", "price": "51,000"}
  ],
  "sub_total": {
    "subtotal_price": "194,000",
    "discount_price": "19,400"
  },
  "total": {
    "total_price": "174,600",
    "cashprice": "200,000",
    "changeprice": "25,400"
  }
}
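
Since the amounts come back as strings, a consumer usually needs to convert them before doing arithmetic. The following sketch parses the example output above and cross-checks its amounts. It assumes that commas are thousands separators, that each item's price is the line total, and that all the fields shown above are present; real outputs may omit fields or contain misread digits, so treat a failed check as a signal for manual review. The function names are illustrative.

```python
def parse_amount(text: str) -> int:
    """Convert a string such as "82,000" to the integer 82000.

    Assumes commas are thousands separators, as in the example output.
    """
    return int(text.replace(",", ""))


def check_receipt(receipt: dict) -> dict:
    """Cross-check the extracted amounts against each other."""
    items_sum = sum(parse_amount(item["price"]) for item in receipt["menu"])
    subtotal = parse_amount(receipt["sub_total"]["subtotal_price"])
    # Treat a missing discount as zero (an assumption, not documented behavior).
    discount = parse_amount(receipt["sub_total"].get("discount_price", "0"))
    total = parse_amount(receipt["total"]["total_price"])
    cash = parse_amount(receipt["total"]["cashprice"])
    change = parse_amount(receipt["total"]["changeprice"])
    return {
        "items_match_subtotal": items_sum == subtotal,
        "total_matches": subtotal - discount == total,
        "change_matches": cash - total == change,
    }


# The example output from this article.
receipt = {
    "menu": [
        {"nm": "ICE BLAOKCOFFE", "cnt": "2", "price": "82,000"},
        {"nm": "AVOCADO COFFEE", "cnt": "1", "price": "61,000"},
        {"nm": "Oud CHINEN KATSU FF", "cnt": "1", "price": "51,000"},
    ],
    "sub_total": {"subtotal_price": "194,000", "discount_price": "19,400"},
    "total": {"total_price": "174,600", "cashprice": "200,000", "changeprice": "25,400"},
}

print(check_receipt(receipt))
# {'items_match_subtotal': True, 'total_matches': True, 'change_matches': True}
```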

Conceptual Usage Example (Python)

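Here is a conceptual example of how you might call the Extract Receipt Data action using Python. It shows how to structure the input payload; the endpoint URL and the shape of the request body are hypothetical, so check the platform documentation for the actual values.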

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "d83909cc-47a6-4fe7-a0df-7bcdec907132" # Action ID for Extract Receipt Data with Donut

# Construct the input payload based on the action's requirements
payload = {
    "image": "https://replicate.delivery/pbxt/IgCzf30UdmaTYhtlaifqpVA7V7nQf7a8muE6AWie2Fm5bNv3/sample_image_cord_test_receipt_00004.png"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=30 # Avoid hanging indefinitely if the server does not respond
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError: # response.json() raises a ValueError subclass on invalid JSON
            print(f"Response body: {e.response.text}")

In the code above:

  • Replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key.
  • The action_id is set to the ID for the Extract Receipt Data action.
  • The payload is structured based on the required input schema.

Conclusion

The Extract Receipt Data with Donut action turns a receipt image into structured JSON with a single request. Because it needs no separate OCR step, it can save time and reduce manual data entry errors in applications that process receipts.

As a next step, consider exploring additional actions within the Donut API to further enhance your document processing capabilities!