Introducing a new file format for large mixed-content data

MCD is a simple file format for humans and AI agents, designed for documents that need text, tables, formulas, annotations and structured data in one source of truth.

MCD mixed-content document format illustration

Most document-heavy businesses do not lose information because they lack tools. They lose it because every tool creates another copy of reality. A renovation estimate lives in a PDF. Measurements move through a spreadsheet. Comments sit in email. Approvals happen in WhatsApp or Slack. An assistant reads one version, a client downloads another, and the document that was supposed to be the source of truth becomes only a snapshot.

To solve this, we have created MCD (Markdown CSV Document), a new, free, open-source file format built on a deceptively simple idea: make the document itself the structured workspace. One file should be comfortable for a human to read and precise enough for software and AI agents to parse without guesswork. The format does not try to invent another heavy enterprise platform or closed editor. It combines familiar building blocks: Markdown for text and formulas, CSV for meaningful tables, JSON for structure, schemas and annotations, and HTML/PDF-like rendering for visual presentation. MCD is a binary, ZIP-like package (comparable to PDF or DOCX) designed to look like a PDF for humans while remaining machine-readable for software, parsers and AI agents. (GitHub)
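To make the "one package, familiar building blocks" idea concrete, here is a minimal sketch in Python. It assumes a plain ZIP container and uses member names (`document.md`, `tables/rooms.csv`, `schemas/rooms.json`) that are purely illustrative; the real MCD layout is defined in the repository and may differ.

```python
import io
import json
import zipfile

# Hypothetical package members: Markdown for text and formulas,
# CSV for a table, JSON for the table's schema.
# These names are assumptions, not the actual MCD layout.
MEMBERS = {
    "document.md": "# Renovation estimate\n\nFloor area: $A = w \\times l$\n",
    "tables/rooms.csv": "room,width_m,length_m\nkitchen,3.2,4.0\n",
    "schemas/rooms.json": json.dumps(
        {
            "columns": {
                "room": {"type": "string", "label": "Room"},
                "width_m": {"type": "number", "unit": "m"},
                "length_m": {"type": "number", "unit": "m"},
            }
        }
    ),
}


def build_package() -> bytes:
    """Write all members into an in-memory ZIP archive and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in MEMBERS.items():
            archive.writestr(name, text)
    return buffer.getvalue()


def list_members(data: bytes) -> list[str]:
    """Return the member names stored in a packaged archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()
```

Calling `list_members(build_package())` returns the three member names in the order they were written. The point of the sketch is only that each kind of content keeps its native, well-understood form inside one file.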

The revolutionary part is this simplicity. A conventional PDF starts from a visual page: text and tables are placed on a canvas, and software later tries to reconstruct meaning from coordinates. MCD starts from semantic source data. Prose, headings, lists, formulas, table anchors, typed CSV tables, schemas, display rules and layout metadata are declared first. The readable page is then generated from that source. The document looks familiar to a person, but to a parser it is already organized.

That matters for AI-assisted work. AI agents are increasingly expected to review contracts, compare estimates, summarize reports, check budgets, extract requirements and reconcile changes. They can do that far better when the input is not an ambiguous page image or a flattened export. In an MCD document, the agent can receive Markdown blocks in order, tables at exact positions in the text, schemas with units and labels, annotations, and optional layout information. The format’s own design notes state that agents should consume native parser output: document stream, typed tables, schemas and optional layout maps. (GitHub)
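As an illustration of what "Markdown blocks in order, tables at exact positions" could look like to an agent, the sketch below walks a Markdown source and emits an ordered stream of blocks. The `{{table:NAME}}` anchor syntax, the in-memory `TABLES` and `SCHEMAS` dictionaries, and the output shape are all assumptions made for this example; they are not taken from the MCD parser.

```python
import csv
import io

# Hypothetical inputs: Markdown with a table anchor, one CSV table,
# and a schema that maps each column to a Python type.
MARKDOWN = "Intro paragraph.\n\n{{table:rooms}}\n\nClosing note."
TABLES = {"rooms": "room,width_m,length_m\nkitchen,3.2,4.0\nhall,1.5,5.0\n"}
SCHEMAS = {"rooms": {"room": str, "width_m": float, "length_m": float}}

ANCHOR_PREFIX = "{{table:"
ANCHOR_SUFFIX = "}}"


def typed_rows(name):
    """Parse a CSV table and cast each cell using the table's schema."""
    casts = SCHEMAS[name]
    reader = csv.DictReader(io.StringIO(TABLES[name]))
    return [{col: casts[col](val) for col, val in row.items()} for row in reader]


def document_stream(markdown):
    """Yield prose and table blocks in the order they appear in the source."""
    for block in markdown.split("\n\n"):
        block = block.strip()
        if block.startswith(ANCHOR_PREFIX) and block.endswith(ANCHOR_SUFFIX):
            name = block[len(ANCHOR_PREFIX):-len(ANCHOR_SUFFIX)]
            yield {"kind": "table", "name": name, "rows": typed_rows(name)}
        elif block:
            yield {"kind": "markdown", "text": block}


stream = list(document_stream(MARKDOWN))
```

Here `stream` holds a prose block, then the `rooms` table with numeric widths and lengths, then the closing prose block. An agent receiving this never has to guess where a table sits or whether `3.2` is a number.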

For industries with large mixed-content documents, the difference is practical. A manufacturing spec can keep prose, formulas, parts tables and review notes in one package. An advertising agency can keep a brief, budget table, production schedule, media plan and client comments together. A restaurant or events business can combine riders, menus, supplier sheets and cost calculations. A construction or renovation team can maintain estimates, measurements, change notes and approvals without scattering working context across email threads and spreadsheets.

Annotations are central to the model. Comments are not treated as disposable messages outside the file. They can live inside the package, attached to a page or line, with stable identifiers and machine-readable metadata. The CLI supports adding annotations and exporting annotation metadata, which makes review history available to both humans and automation. (GitHub)
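A rough sketch of an annotation record that is attached to a page and line, carries a stable identifier and exports as machine-readable metadata might look like this. The field names, the `ann-0001` identifier scheme and the `AnnotationStore` class are hypothetical; the actual MCD annotation schema and CLI output may be structured differently.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Annotation:
    # Hypothetical fields; not the actual MCD annotation schema.
    id: str
    page: int
    line: int
    author: str
    text: str


@dataclass
class AnnotationStore:
    items: list = field(default_factory=list)
    next_number: int = 1

    def add(self, page: int, line: int, author: str, text: str) -> Annotation:
        """Create an annotation whose id is assigned once and never reused."""
        annotation = Annotation(f"ann-{self.next_number:04d}", page, line, author, text)
        self.next_number += 1
        self.items.append(annotation)
        return annotation

    def export_json(self) -> str:
        """Serialize all annotations as JSON for tools and agents."""
        return json.dumps([asdict(a) for a in self.items], indent=2)


store = AnnotationStore()
first = store.add(page=2, line=14, author="client", text="Confirm tile quantity")
second = store.add(page=3, line=5, author="estimator", text="Updated after site visit")
```

Because each comment has its own identifier and a location inside the document, a reviewer or an automated check can refer to `first.id` later without relying on an email thread or chat history.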

An .mcd file is a compact package rather than a single opaque blob. It can be packed, unpacked, validated, rendered and extracted through a CLI. It supports extraction of Markdown, expanded Markdown with tables, JSON, tables, images, charts and annotations. The repository also documents language support across Rust, Python, TypeScript/JavaScript and PHP, plus a local-first browser viewer/editor. (GitHub)

The result is not a replacement for every PDF, spreadsheet or project-management tool. It is a new source-of-truth layer for workflows where documents carry real operational meaning. MCD makes a document readable like a polished page, inspectable like structured data, and usable by AI agents without forcing teams to abandon familiar formats. Its strength is that the concept is almost obvious once stated: one file, human-readable and agent-readable, with text, formulas, tables, layout and annotations preserved in their proper form.
