Wednesday, December 6, 2017

RTF to PDF Conversion

A lot of medical documents are still created in RTF format (or can be readily accessed as RTF).  This is due to the use of Word as a tool in some transcription environments.  Converting these documents to a "standard" format is a bit challenging.

There are some tools that will convert RTF to HTML (or XHTML), but my goal was to convert these documents to PDF so that I could incorporate the content into the IHE Scanned Document (XDS-SD) format and the HL7 C-CDA Unstructured Document format.

I needed this quickly, so I started looking for open source libraries.  One of the ones I found was LibrePDF.  I had (from a former life) been familiar with the open source iText product, which would have been my first go-to.  Unfortunately, its current licensing model rules it out for many users (it's AGPL, essentially copyleft), even though prior versions had been LGPL or MPL.  It also dropped RTF support in later versions.  LibrePDF is a fork of the last MPL version of iText, and still has the RTF parsing tools.

Unfortunately, LibrePDF doesn't really provide a great deal of information on how to use the components, so here's a quick summary:

To get what you need, include the following two dependencies in your pom.xml:

    <dependency>
        <groupId>com.github.librepdf</groupId>
        <artifactId>openpdf</artifactId>
        <version>1.0.5</version>
    </dependency>
    <dependency>
        <groupId>com.github.librepdf</groupId>
        <artifactId>pdf-rtf</artifactId>
        <version>1.0.5</version>
    </dependency>

The first one grabs the LibrePDF core components.  The second grabs the PDF-RTF tools.  When you grab the libraries you will also get Bouncy Castle, which is used for decryption, encryption and signing.  You can ignore it unless you are going to be creating PDF files that require those capabilities.  For XDS-SD format PDF files, these features are not essential.

Having done that, you can now use this gist from Ajay Ramesh on GitHub to understand how RTF to PDF conversion is done.  You can comment out the line that reads:

   System.setProperty("os.name", "Windows 7");

This is no longer necessary, because the LibrePDF code doesn't have the problem that was reported on Stack Overflow for iText 4.2.1.
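
For those who would rather not follow the link, the conversion boils down to something like the sketch below.  This is my own minimal version, not the gist itself.  It assumes the 1.0.x libraries still use the com.lowagie.text package names inherited from iText, and that RtfParser accepts a null constructor argument as the older iText examples do.  The class name RtfToPdf is made up for illustration.

```java
import com.lowagie.text.Document;
import com.lowagie.text.pdf.PdfWriter;
import com.lowagie.text.rtf.parser.RtfParser;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper class; the name is not part of LibrePDF.
public class RtfToPdf {

    // Reads RTF from the input stream and writes a PDF to the output stream.
    public static void convert(InputStream rtf, OutputStream pdf) throws Exception {
        Document document = new Document();
        // Attach a PDF writer so that everything added to the document is rendered as PDF.
        PdfWriter.getInstance(document, pdf);
        document.open();
        // Assumption: passing null here mirrors the older iText usage pattern.
        RtfParser parser = new RtfParser(null);
        parser.convertRtfDocument(rtf, document);
        // Closing the document flushes the PDF content to the output stream.
        document.close();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: RtfToPdf input.rtf output.pdf");
            return;
        }
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = new FileOutputStream(args[1])) {
            convert(in, out);
        }
    }
}
```

Working with streams rather than file names makes it easy to keep the PDF in memory when the next step is to embed it in another document.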

Having generated your PDF, you can now wrap it inside a CDA document, or perhaps use a FHIR DocumentReference resource.
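
As a rough illustration of the CDA route, the PDF goes into the document body as base64 text inside a nonXMLBody element.  The sketch below builds only that body fragment and assumes you supply the CDA header yourself.  The class and method names (PdfBodyWrapper, wrapPdf) are made up for illustration, and building XML by string concatenation is only safe here because base64 output contains no XML special characters.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical helper class; the names are illustrative only.
public class PdfBodyWrapper {

    // Returns the <component> fragment of a CDA document carrying the PDF as base64 text.
    public static String wrapPdf(byte[] pdfBytes) {
        String b64 = Base64.getEncoder().encodeToString(pdfBytes);
        return "<component>\n"
             + "  <nonXMLBody>\n"
             + "    <text mediaType=\"application/pdf\" representation=\"B64\">"
             + b64
             + "</text>\n"
             + "  </nonXMLBody>\n"
             + "</component>";
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: PdfBodyWrapper input.pdf");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(wrapPdf(pdf));
    }
}
```

The fragment still needs to be placed inside a ClinicalDocument with the header elements that XDS-SD or the Unstructured Document template requires.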

     Keith


3 comments:

  1. Wouldn't it be easier to send it to the virtual printer "save to pdf"? You could probably do that in a command in the cmd.

    Replies
    1. That works too. This is a software solution that doesn't require special system configuration.

  2. I am still in love with this! You all did an awesome job! Love it.
    RTF to PDF Converter
