PDF - The Official Widely Used [Paper]
This article contrasts PDF ("digital paper") with XML ("structured data") as competing informational substrates. PDF excels at visual fidelity and legal acceptance but creates data silos, impedes automation, and resists dynamic reuse. XML, by separating content from presentation, enables machine-readability, interoperability, and future-proofing—treating documents as processable information, not static artifacts. Framed deterministically, the piece argues that public-sector digital sovereignty requires XML as the foundational layer, with PDF as one optional output—not the default. Value flows from structured, contestable data; opacity, even in "official" formats, introduces friction and fragility. For a European OS initiative, the architectural choice is ethical: prioritize informational integrity over visual convention.
![PDF - The Official Widely Used [Paper]](https://substackcdn.com/image/fetch/$s_!Qyg9!,w_320,h_213,c_fill,f_auto,q_auto:good,fl_progressive:steep,g_auto/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9ea003-287b-471a-8353-8a455bf8540b_2560x1707.jpeg)
It’s everywhere.
And it’s Official.
The name says it all.
Excellent point. You’ve moved from the underlying operating system to the very heart of digital work: document formats. This is a critical and often overlooked part of achieving true digital sovereignty and efficiency.
Your assessment is sharp: PDF is a “final product,” while XML is a “source material.” Let’s break down the strengths, weaknesses, and the strategic perspective for a public-sector OS initiative.
The Problem with PDF: The “Digital Paper”
As you said, the name says it all: Portable Document Format. Its primary goal was to preserve the visual appearance of a document across any platform.
· Strengths (Why it’s everywhere):
· Universality: It is the de facto standard for sharing and printing documents (standardized as ISO 32000), and free readers exist for virtually every platform.
· Fidelity: The document looks exactly as the creator intended.
· Legal Acceptance: Widely used for signed contracts and official documents.
· Weaknesses (Why it’s “weak and vulnerable” for the digital age):
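· Data Silos: A PDF typically stores text as positioned glyphs on a page rather than as logical structure such as fields, headings, or tables, so what comes out is often a “bag of words.” Extracting, reusing, and automatically processing structured data from it is notoriously difficult, which creates manual work and inefficiency.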
· Accessibility Challenges: While it can be made accessible, many PDFs are not, creating barriers for people with disabilities.
· Static Nature: It’s not designed for dynamic, living documents. Updating a PDF often means generating a completely new file.
· Security: PDFs can be encrypted, but they are also a common malware vector, either through embedded scripts or through exploits of reader vulnerabilities.
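To make the “bag of words” problem concrete, here is a minimal sketch. It assumes the text layer of a form has already been extracted from a PDF as one flat string; the sample strings, the `LABELS` list, and the `extract_fields` helper are all hypothetical and exist only for illustration. No PDF library is involved.

```python
import re

# Hypothetical text as it might come out of a PDF text layer:
# field boundaries are gone, only the words survive.
flattened = "Application Form Name Anna Schmidt Date of Birth 1985-03-12 City Berlin"

# Recovering fields means guessing where values start and end,
# here by splitting on labels we happen to know in advance.
LABELS = ["Name", "Date of Birth", "City"]


def extract_fields(text, labels):
    """Split flattened text on known labels and pair each label with what follows."""
    pattern = "(" + "|".join(re.escape(label) for label in labels) + ")"
    parts = re.split(pattern, text)
    # parts alternates: [preamble, label, value, label, value, ...]
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}


fields = extract_fields(flattened, LABELS)

# The heuristic breaks as soon as a value contains a label word.
ambiguous = "Name Anna Schmidt City Salt Lake City"
broken = extract_fields(ambiguous, LABELS)
```

The first string happens to work, but in the second one the city name itself contains the word “City”, so the guessed boundaries are wrong and the value is lost. This is the kind of manual cleanup work that flattened text tends to cause.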
The Power of XML: The “Structured Data”
XML (Extensible Markup Language) is fundamentally different. It’s not about presentation; it’s about meaning and structure.
· Strengths (Why it’s your favorite and should be favored strategically):
· Machine-Readable & Processable: Data in XML is explicitly structured. Software can parse it, validate it against a schema (e.g., XSD), transform it, and exchange it without manual intervention. This is the foundation of automation and interoperability.
· Separation of Content and Presentation: You can store the pure data in XML and then apply different stylesheets (e.g., XSLT to produce HTML, or XSL-FO to produce PDF) to render it as a PDF, a web page, or any other format. This is the “write once, publish anywhere” ideal.
· Future-Proof: As a plain-text format with an open specification, it remains readable and usable even if the software that produced it changes or disappears.
· Ideal for Forms and Reports: Government work is full of standardized forms, applications, and reports—all of which are perfect use cases for XML.
· Weaknesses (Why it’s not the default):
· Human Readability: It’s more verbose and less visually intuitive than a finished document for the average user.
· Tooling Dependency: Requires specific editors and processors to be used effectively, which can be a barrier to adoption.
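The “structured source, many outputs” idea can be sketched with nothing more than Python’s standard library. The form below, its element names, and the helper functions are hypothetical, and the two plain Python render functions merely stand in for real stylesheets such as XSLT. Treat it as an illustration of the principle, not as a recommended pipeline.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical application form; element names are illustrative only.
SOURCE = """<application id="A-1001">
  <applicant>
    <name>Anna Schmidt</name>
    <birthdate>1985-03-12</birthdate>
    <city>Salt Lake City</city>
  </applicant>
</application>"""

REQUIRED = ("name", "birthdate", "city")


def load_applicant(xml_text):
    """Parse the XML and return the applicant fields, failing loudly on gaps."""
    root = ET.fromstring(xml_text)
    applicant = root.find("applicant")
    if applicant is None:
        raise ValueError("missing <applicant> element")
    data = {"id": root.get("id", "")}
    for field in REQUIRED:
        value = applicant.findtext(field)
        if value is None:
            raise ValueError(f"missing <{field}> element")
        data[field] = value.strip()
    return data


def render_text(data):
    """One presentation: plain text, one field per line."""
    return "\n".join(f"{key}: {value}" for key, value in data.items())


def render_html(data):
    """Another presentation of the same data: an HTML table."""
    rows = "".join(
        f"<tr><th>{escape(key)}</th><td>{escape(value)}</td></tr>"
        for key, value in data.items()
    )
    return f"<table>{rows}</table>"


record = load_applicant(SOURCE)
print(render_text(record))
print(render_html(record))
```

Because each value sits inside its own element, a city called “Salt Lake City” is unambiguous, a missing field is caught at load time, and adding a third output format means writing one more render function without touching the source data.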
A Strategic Approach for a European Digital Ecosystem?
