Transform Your Documents into LLM-Ready Markdown with cuuupid/markitdown Actions

23 Apr 2025

In natural language processing, documents often need to be converted into a format that large language models (LLMs) can consume easily. The cuuupid/markitdown API provides Cognitive Actions that convert a range of document types into Markdown for use in LLM applications. This article walks through the Convert Documents to LLM-Ready Markdown action: what it does, what input it expects, what it returns, and how to call it.

Prerequisites

Before you start using the Cognitive Actions, ensure you have the following:

  • An API key for the Cognitive Actions platform to authenticate your requests.
  • Basic knowledge of JSON and how to make HTTP requests in your preferred programming language.

To authenticate, pass your API key in the request headers. The Python example later in this article sends it as a Bearer token in the Authorization header.
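Here is a minimal sketch of building those headers. It assumes the key is stored in an environment variable named COGNITIVE_ACTIONS_API_KEY (a hypothetical name chosen for this example) and that the platform expects a Bearer token, as in the Python example later in this article:

```python
import os

# Hypothetical environment variable name; the platform does not prescribe one.
API_KEY_ENV_VAR = "COGNITIVE_ACTIONS_API_KEY"


def auth_headers(env_var: str = API_KEY_ENV_VAR) -> dict:
    """Build request headers with a Bearer token read from the environment.

    Assumes the platform accepts `Authorization: Bearer <key>`.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an unauthenticated request.
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source code and version control.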

Cognitive Actions Overview

Convert Documents to LLM-Ready Markdown

Purpose

This action converts a variety of document formats, including Office documents, PDFs, images, and audio files, into Markdown suitable for large language models. It also supports Optical Character Recognition (OCR), speech transcription, and EXIF metadata extraction, so text inside images, spoken content in audio, and image metadata can be captured in the output.

Input

The input is a JSON object that must contain a document field holding the document's URL. You can also optionally supply openAiApiKey, an OpenAI API key for features that depend on OpenAI services.
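The helper below is one way to assemble that input object. build_payload is a hypothetical name, and omitting openAiApiKey entirely when it is not supplied (rather than sending null) is an assumption about how the action treats absent optional fields:

```python
from typing import Optional


def build_payload(document_url: str, openai_api_key: Optional[str] = None) -> dict:
    """Assemble the action input: `document` is required, `openAiApiKey` is optional."""
    payload = {"document": document_url}
    # Include the optional key only when one was supplied, instead of sending null.
    if openai_api_key:
        payload["openAiApiKey"] = openai_api_key
    return payload
```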

Input Schema:

{
  "type": "object",
  "properties": {
    "document": {
      "type": "string",
      "format": "uri",
      "description": "A URL pointing to a document. Supports formats such as PDF, PPTX, DOCX, XLSX, PNG, JPG, MP3, WAV, HTML, CSV, JSON, and XML."
    },
    "openAiApiKey": {
      "type": "string",
      "format": "password",
      "description": "An optional API key for OpenAI services, used for authentication purposes."
    }
  },
  "required": ["document"]
}
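Before sending a request, you might check the payload against this schema locally. The sketch below uses only the standard library. check_input is a hypothetical helper, and two of its rules are assumptions rather than part of the schema: it expects an absolute http(s) URL, and it flags file extensions missing from the description's list. Because the description says "such as", treat the extension message as a warning rather than a hard rejection.

```python
import os
from urllib.parse import unquote, urlparse

# Extensions named in the schema's description of `document`. The description
# says "such as", so this set may not be exhaustive.
LISTED_EXTENSIONS = {
    ".pdf", ".pptx", ".docx", ".xlsx", ".png", ".jpg",
    ".mp3", ".wav", ".html", ".csv", ".json", ".xml",
}


def check_input(payload: dict) -> list:
    """Return a list of problems with `payload`; an empty list means it looks valid."""
    problems = []
    document = payload.get("document")
    if not isinstance(document, str):
        problems.append("'document' is required and must be a string")
        return problems

    parsed = urlparse(document)
    # Assumption: the action needs a fetchable web URL, not just any URI.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("'document' should be an absolute http(s) URL")

    # Decode percent-escapes (e.g. %2B) before looking at the file extension.
    ext = os.path.splitext(unquote(parsed.path))[1].lower()
    if ext not in LISTED_EXTENSIONS:
        problems.append(f"extension {ext or '(none)'} is not among the listed formats")

    key = payload.get("openAiApiKey")
    if key is not None and not isinstance(key, str):
        problems.append("'openAiApiKey' must be a string when provided")
    return problems
```

Running check_input on the example input shown next returns an empty list, since its percent-encoded file name still ends in .pdf.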

Example Input:

{
  "document": "https://replicate.delivery/pbxt/M9lE653pyLnXBk7P0VrmymcjqvQyXKsBBUgNkLz3YN2Y9wdw/Tradewinds%2BMarketplace%2BAnnouncement%2BRevision%2B6.pdf"
}

Output

Upon successful execution, this action returns a URL that points to the generated Markdown file.

Example Output:

https://assets.cognitiveactions.com/invocations/2a92dbd1-7aeb-4935-9c93-803cdc77d46b/316cb546-19dd-48de-a444-e49ed9465ac0.md
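Once you have the output URL, you can fetch the Markdown with a plain HTTP GET. This sketch assumes the URL is publicly readable without extra authentication; local_name_for and download_markdown are hypothetical helper names:

```python
import os
import urllib.request
from urllib.parse import unquote, urlparse


def local_name_for(markdown_url: str) -> str:
    """Derive a local file name from the last path segment of the output URL."""
    name = os.path.basename(unquote(urlparse(markdown_url).path))
    # Fall back to a generic name if the URL path ends in "/".
    return name or "output.md"


def download_markdown(markdown_url: str, dest_dir: str = ".", timeout: float = 30.0) -> str:
    """Fetch the generated Markdown file, save it under `dest_dir`, and return its path."""
    with urllib.request.urlopen(markdown_url, timeout=timeout) as response:
        data = response.read()
    path = os.path.join(dest_dir, local_name_for(markdown_url))
    # Write the raw bytes so the file keeps whatever encoding the service produced.
    with open(path, "wb") as f:
        f.write(data)
    return path
```

For the example output above, the file would be saved as 316cb546-19dd-48de-a444-e49ed9465ac0.md in the current directory.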

Conceptual Usage Example (Python)

Here’s how you might structure a request to use this action in Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "ceeb2e3c-3824-4858-bef6-95597319e19f" # Action ID for Convert Documents to LLM-Ready Markdown

# Construct the input payload based on the action's requirements
payload = {
    "document": "https://replicate.delivery/pbxt/M9lE653pyLnXBk7P0VrmymcjqvQyXKsBBUgNkLz3YN2Y9wdw/Tradewinds%2BMarketplace%2BAnnouncement%2BRevision%2B6.pdf"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical request body structure
        timeout=120,  # Arbitrary limit; OCR or transcription may be slow, but never wait forever
    )
    response.raise_for_status()  # Raise an exception for 4xx or 5xx status codes

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    # e.response is None for connection errors and timeouts
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Common base class of the JSON decode errors requests may raise
            print(f"Response body: {e.response.text}")

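In this example, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key and COGNITIVE_ACTIONS_EXECUTE_URL with the correct endpoint. Both the endpoint and the request body shape ({"action_id": ..., "inputs": ...}) are placeholders, so check the platform's documentation for the exact values. The action ID identifies Convert Documents to LLM-Ready Markdown, and the inputs follow the schema shown earlier.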
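Because the response shape of the execute endpoint is hypothetical, you may not know which key holds the output URL. One defensive option is to search the decoded JSON for the first http(s) string ending in .md. The sketch below does that; find_markdown_url is a hypothetical helper, and the .md suffix check is based only on the example output shown earlier:

```python
from typing import Any, Optional


def find_markdown_url(result: Any) -> Optional[str]:
    """Recursively search decoded JSON for the first http(s) URL ending in '.md'.

    Walks dicts and lists instead of assuming a particular response key.
    """
    if isinstance(result, str):
        # Ignore any query string (e.g. a signed-URL token) when checking the suffix.
        path_part = result.split("?", 1)[0]
        if result.startswith(("http://", "https://")) and path_part.endswith(".md"):
            return result
        return None
    if isinstance(result, dict):
        children = result.values()
    elif isinstance(result, list):
        children = result
    else:
        return None
    for child in children:
        found = find_markdown_url(child)
        if found is not None:
            return found
    return None
```

For example, call find_markdown_url(result) right after response.json() in the request code, then pass the returned URL to whatever download step you use.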

Conclusion

The cuuupid/markitdown Cognitive Actions offer a straightforward way to turn Office documents, PDFs, images, and audio into Markdown that large language models can consume. With a single API call per document, developers can feed heterogeneous source files into their LLM pipelines without handling each format themselves.