
Turning Archives into Opportunity: Lessons from a Massive XML Conversion

  • Writer: Laurent Galichet
  • Jan 10

In 2011, the International Organization for Standardization (ISO) began converting more than 30,000 existing standards from disparate formats (Word documents, character PDFs, and scanned PDFs) into a unified XML format known as ISOSTS for long-term digital use. What sounds like dry technical work turned into a lesson in project management, quality assurance, and the surprises that can derail even the best-planned digital transformation.

The Goal: Standardise and Preserve

ISO’s goal was ambitious:

  • Create a central XML repository of all its standards so content could be more easily shared, reused, and published online.

  • Streamline production workflows.

  • Broaden access for members and stakeholders across industries.

The task seemed simple at first — but converting hundreds of thousands of pages of legacy material isn’t just a “file format” problem. It’s a project management challenge wrapped in technology and human process.

Building the Conversion Workflow

ISO issued a request for proposals and chose a partner to handle the XML conversion. A detailed contract was signed, including performance benchmarks:

  • Extremely high text accuracy (no more than 5 errors per 100,000 characters).

  • Tagging accuracy for XML structures (no more than 5 tag errors per 10,000 tag pairs).

  • Support for complex elements like mathematical equations and tables.
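
The two accuracy benchmarks above reduce to simple rate comparisons. Here is a minimal sketch in Python, assuming a QA sample in which errors have already been counted by hand. The function names and the input shape are hypothetical illustrations, not something taken from ISO's contract or tooling.

```python
from fractions import Fraction

# Thresholds as stated in the benchmarks above.
MAX_TEXT_ERROR_RATE = Fraction(5, 100_000)  # 5 errors per 100,000 characters
MAX_TAG_ERROR_RATE = Fraction(5, 10_000)    # 5 tag errors per 10,000 tag pairs


def within_threshold(errors: int, units: int, max_rate: Fraction) -> bool:
    """Return True if errors/units does not exceed max_rate.

    `units` is characters for text checks or tag pairs for tagging checks.
    Fractions avoid floating-point rounding right at the threshold.
    """
    if units <= 0:
        raise ValueError("units must be positive")
    return Fraction(errors, units) <= max_rate


def check_sample(chars: int, text_errors: int, tag_pairs: int, tag_errors: int) -> dict:
    """Evaluate one hand-checked QA sample (hypothetical input shape)."""
    return {
        "text_ok": within_threshold(text_errors, chars, MAX_TEXT_ERROR_RATE),
        "tag_ok": within_threshold(tag_errors, tag_pairs, MAX_TAG_ERROR_RATE),
    }


if __name__ == "__main__":
    # 12 errors in 250,000 characters passes; 3 tag errors in 4,000 pairs does not.
    print(check_sample(chars=250_000, text_errors=12, tag_pairs=4_000, tag_errors=3))
```

In this made-up sample the text passes (4.8 errors per 100,000 characters) while the tagging fails (7.5 errors per 10,000 tag pairs), which shows that the two benchmarks have to be tracked separately.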

Because of the content’s complexity — especially older scanned PDFs — the project required more than automation alone. A hybrid workflow evolved with human review integrated into the process.

The “Laughing Horse” Moment

Perhaps the most memorable example from the project — and its title source — happened when a vendor mistakenly replaced a scientific graph with a picture of a laughing horse during conversion. What began as internal humour quickly became a red flag: If human testing is too light or quality checks too limited, critical errors can slip through even the best automation systems.

This incident became a cautionary tale: a reminder that technology and trust must be balanced with rigorous validation. Automation can process large volumes of data quickly, but nothing replaces careful human oversight.
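
One way to narrow down where human reviewers should look is an automated pre-check on figures. The sketch below is an illustration only, not ISO's actual process. It assumes figures are meant to be carried over byte-for-byte, so a checksum recorded before conversion can be compared with the delivered file; `build_manifest` and `flag_for_review` are hypothetical names. Figures that are legitimately re-rendered during conversion would fail this check and still need a visual comparison by a person.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a figure's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(source_figures: dict[str, bytes]) -> dict[str, str]:
    """Record a checksum per source figure before handing files to a vendor."""
    return {name: sha256_of(data) for name, data in source_figures.items()}


def flag_for_review(manifest: dict[str, str], delivered: dict[str, bytes]) -> list[str]:
    """Return figure names that are missing or differ from the manifest.

    Anything returned here goes to a human reviewer; the check only
    narrows down where to look, it does not replace the review.
    """
    flagged = []
    for name, expected in sorted(manifest.items()):
        data = delivered.get(name)
        if data is None or sha256_of(data) != expected:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    source = {"fig1.png": b"graph-bytes", "fig2.png": b"table-bytes"}
    manifest = build_manifest(source)
    delivered = {"fig1.png": b"horse-bytes", "fig2.png": b"table-bytes"}
    print(flag_for_review(manifest, delivered))  # prints ['fig1.png']
```

Keeping only the manifest, rather than the source files themselves, means the check can run wherever the delivered batch lands.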

What the Project Taught Us

Here are the key lessons ISO learned over the course of this mammoth conversion:

1. Planning and Quality Matter as Much as Technology

Investing time in workflow design — including roles, checkpoints, and accuracy standards — made all the difference.

2. Human Review Can’t Be Replaced

Even the best automation tools require human checks, especially for complex content like scanned images, equations, and embedded figures.

3. Clear Communication Is Essential

Strong communication between project managers, vendors, and quality teams helped resolve issues faster and maintain standards.

4. Expect the Unexpected

From the “laughing horse” moment to underestimated page counts, surprises were constant. Plans must account for unknowns and buffer both time and resources.

Final Thought

Large-scale digital conversions aren’t just about updating file formats — they’re about preserving knowledge, building future-proof systems, and respecting the integrity of information that people rely on. Whether you’re tackling an internal content archive or preparing digital assets for the next decade, this case study from ISO shows that successful transformation rests on strong project discipline, smart technology use, and unwavering quality assurance.

 
 
 


© 2035 by Talking Business. Powered and secured by Wix
