Transform Your Documents into LLM-Ready Markdown with cuuupid/markitdown

23 Apr 2025

Language models work best with clean, structured text, but source material usually arrives as Office documents, PDFs, images, or audio files. The cuuupid/markitdown Cognitive Actions convert these files into LLM-ready Markdown using Microsoft's MarkItDown tool, so you can skip manual formatting and focus on building your application.

Prerequisites

Before you can start using the Cognitive Actions, ensure you have the following:

  • An API key for the Cognitive Actions platform, which will be used for authenticating your requests.
  • Basic knowledge of RESTful API interactions and JSON formatting.

For authentication, you will typically pass your API key in a request header. The Python example later in this post sends it as a Bearer token in the Authorization header.
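As a minimal sketch of the header-based authentication described above, the helper below builds the request headers from an API key. The function name build_headers and the environment variable name COGNITIVE_ACTIONS_API_KEY are illustrative assumptions, not part of the platform's documented interface; the Bearer scheme mirrors the usage example later in this post.

```python
import os


def build_headers(api_key=None):
    """Return request headers that carry the API key as a Bearer token."""
    # Assumption: the key may live in an environment variable with this
    # (hypothetical) name, so it never has to be hard-coded in source.
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError("No API key provided")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of version control, which matters because the key grants access to your account.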

Cognitive Actions Overview

Convert to LLM-Ready Markdown

The Convert to LLM-Ready Markdown action uses Microsoft's MarkItDown tool to convert a variety of file types into Markdown suitable for large language models. It supports optical character recognition (OCR) for images and transcription for audio files, so a single action covers text, image, and audio sources.

Input:

  • Required Fields:
    • document: A URI link to the document you wish to convert. This can be in formats such as PDF, PPTX, DOCX, XLSX, PNG, JPG, MP3, WAV, HTML, CSV, JSON, XML, and more.
  • Optional Fields:
    • openAiApiKey: Your OpenAI API key for additional authentication. This should be kept secure.

Example Input:

{
  "document": "https://replicate.delivery/pbxt/M9lE653pyLnXBk7P0VrmymcjqvQyXKsBBUgNkLz3YN2Y9wdw/Tradewinds%2BMarketplace%2BAnnouncement%2BRevision%2B6.pdf"
}
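The input rules above (one required field, one optional field) can be sketched as a small payload builder. The helper name build_payload is hypothetical, and the check that the document is an http or https URL is an assumption made for illustration; the action itself may accept other URI schemes.

```python
from urllib.parse import urlparse


def build_payload(document_url, openai_api_key=None):
    """Build the action input: 'document' is required, 'openAiApiKey' optional."""
    parsed = urlparse(document_url)
    # Assumption: the document must be reachable over HTTP(S).
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid http(s) URL: {document_url!r}")

    payload = {"document": document_url}
    # Only include the optional key when the caller actually supplies one.
    if openai_api_key:
        payload["openAiApiKey"] = openai_api_key
    return payload
```

Validating the URL locally catches malformed input before a request is sent, and omitting the optional field when it is empty keeps the payload identical to the example input.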

Output: The action typically returns a URL pointing to the converted markdown file.

Example Output:

https://assets.cognitiveactions.com/invocations/571d3973-79a6-42c5-8ceb-32e4063e5969/7062caff-249a-40f3-9dfd-11b9c2921d6d.md
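Because the output is a URL rather than the Markdown itself, a follow-up step is needed to retrieve the text. The sketch below uses only the Python standard library and assumes the URL can be fetched without extra authentication and that the file is UTF-8 encoded; neither is confirmed here. The helper names markdown_filename and fetch_markdown are illustrative.

```python
import posixpath
from urllib.parse import urlparse
from urllib.request import urlopen


def markdown_filename(output_url):
    """Derive a local file name from the last path segment of the output URL."""
    name = posixpath.basename(urlparse(output_url).path)
    return name or "converted.md"  # fallback if the URL has no file segment


def fetch_markdown(output_url, timeout=30):
    """Download the converted Markdown and return it as a string."""
    # Assumptions: no extra authentication is required, content is UTF-8.
    with urlopen(output_url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

A typical flow would call fetch_markdown on the returned URL and write the text to the path given by markdown_filename, or pass the string straight into a prompt.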

Conceptual Usage Example (Python): Here’s how you might call the Convert to LLM-Ready Markdown action using Python. This snippet demonstrates how to structure the input JSON payload correctly and send a request to the Cognitive Actions API.

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "c460d32c-396a-483c-afce-8671713d7c7c" # Action ID for Convert to LLM-Ready Markdown

# Construct the input payload based on the action's requirements
payload = {
    "document": "https://replicate.delivery/pbxt/M9lE653pyLnXBk7P0VrmymcjqvQyXKsBBUgNkLz3YN2Y9wdw/Tradewinds%2BMarketplace%2BAnnouncement%2BRevision%2B6.pdf"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=60 # Avoid hanging indefinitely; the value is illustrative
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError: # JSON decode errors subclass ValueError in every requests version
            print(f"Response body: {e.response.text}")

In this code snippet, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action_id is set to the ID of the Convert to LLM-Ready Markdown action, and the payload contains the required document URI. The endpoint URL and request structure are illustrative and may vary based on specific implementation details.

Conclusion

The cuuupid/markitdown Cognitive Actions provide an efficient way to convert a wide array of document types into LLM-ready Markdown. Offloading conversion to this action saves the time otherwise spent on manual formatting and lets you focus on the application that consumes the output. As you explore these capabilities, consider integrating the action into your document-ingestion workflow to enhance your application's functionality and user experience.