You see, attachments use EDI formats to exchange information. Many of these systems still run on "big iron", and translation between ASCII on one end and EBCDIC on the other, and vice versa, is common. This is a challenge because carriage return/newline pairs in one environment get translated to single characters in another, and so forth.
The X12 BIN segment has two parts: the binary data (BIN02) and a count of the length of that data (BIN01). The challenge is that whitespace gets changed from two characters to one, or vice versa, so when the system outputs the record in the new encoding, the length is off. This problem is clearly stated in the 275 transaction that contains a BIN segment:
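To make the mismatch concrete, here is a minimal sketch in Python. It assumes the BIN02 content is Base-64 text wrapped into lines, and it stands in for the EBCDIC/ASCII gateway with a simple CR/LF to LF replacement. The `wrap_base64` helper and the payload are hypothetical, made up for illustration.

```python
import base64

def wrap_base64(data: bytes, newline: bytes, width: int = 76) -> bytes:
    """Base-64 encode data and wrap it into lines ending with the given newline."""
    encoded = base64.b64encode(data)
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return newline.join(lines) + newline

payload = bytes(range(256)) * 2        # 512 octets of stand-in content
sent = wrap_base64(payload, b"\r\n")   # sender wraps lines with CR/LF pairs
bin01 = len(sent)                      # the count the sender states in BIN01

# Hypothetical gateway: every CR/LF pair becomes a single LF.
received = sent.replace(b"\r\n", b"\n")

# The stated count and the received length now differ by one per line break.
print(bin01, len(received), bin01 - len(received))
```

The payload itself is untouched. Only the line breaks changed, yet a strict length check on the receiving side would fail.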
"It has been noted that line constraints, transfer protocols ... may insert additional control characters ... If this occurs in BIN02, the senders stated count in BIN01 may no longer be equal to the received content of the data in BIN02."
This is a truly big challenge, because if the length of the data in BIN02 doesn't match the count in BIN01, business logic set up to detect errors could wind up rejecting the response to the request for an attachment.
There is an even more sinister problem in XML. There are three different valid ways to represent the character A, each with a different length: as the letter A, as a decimal character reference (&#65;), and as a hexadecimal character reference (&#x41;). As far as an XML processor is concerned, all three are the same. And then there are the different character encodings that an XML processor must support.
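A quick way to see this is to parse all three forms and compare. The sketch below uses Python's standard `xml.etree.ElementTree`. The `doc` element name is just a placeholder.

```python
import xml.etree.ElementTree as ET

# The same one-character content, written three different ways.
variants = ["<doc>A</doc>", "<doc>&#65;</doc>", "<doc>&#x41;</doc>"]

source_lengths = [len(v) for v in variants]
parsed_texts = [ET.fromstring(v).text for v in variants]

for length, text in zip(source_lengths, parsed_texts):
    # Different serialized lengths, identical parsed content.
    print(length, repr(text))
```

The serialized lengths differ, but the parser hands back the same text each time, so any length check made on the raw markup is fragile.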
This would seem to provide a challenge for creating digital signatures, but XMLDSIG addresses that with the Canonicalization Algorithm. The algorithm ensures that the digital signature is computed over the same content by ensuring a consistent octet stream is used for the computation.
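As a rough illustration of the idea, the sketch below canonicalizes two differently written but equivalent documents and hashes the result. One loud assumption: Python's standard library (3.8 and later) implements C14N 2.0 through `xml.etree.ElementTree.canonicalize`, which is not necessarily the canonicalization method a given XMLDSIG signature names. The principle is the same, but this is not a signature implementation.

```python
import hashlib
from xml.etree.ElementTree import canonicalize  # available in Python 3.8+

# Two serializations of the same content: different attribute order,
# different quoting and spacing, different character references.
first = '<doc b="2" a="1">&#65;</doc>'
second = "<doc a='1'  b='2'>&#x41;</doc>"

canon_first = canonicalize(first)
canon_second = canonicalize(second)

# Hash the canonical octet stream rather than the raw markup.
digest_first = hashlib.sha256(canon_first.encode("utf-8")).hexdigest()
digest_second = hashlib.sha256(canon_second.encode("utf-8")).hexdigest()

print(first != second, canon_first == canon_second, digest_first == digest_second)
```

The raw strings differ, but the canonical forms and their digests match, which is what lets a signature survive harmless reformatting.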
In X12, BIN01 serves a purpose similar to a digital signature: it is a "check" that the data has not been modified (though it doesn't support non-repudiation or any of the other cool features of XMLDSIG). Canonicalizing Base-64 is simple: ignore any whitespace.
Given that the specification notes the disparity, I would presume that receivers are expected to deal with it. So, give an inaccurate count to start with. Don't count the whitespace, because it doesn't matter. Better yet, use the octet length of the original CDA document. The challenge here is that senders and receivers would have to change the counting algorithm for the BIN segment to make it ignore whitespace. That might be hard, but the only other solution I can think of is to upgrade the operating systems and hardware. I haven't run into an EBCDIC to ASCII issue in years.
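Both counting options can be sketched in a few lines of Python. This assumes the BIN02 content is Base-64 text. The function names and the stand-in document are hypothetical, and nothing here comes from the X12 specification itself.

```python
import base64

def count_ignoring_whitespace(bin02: bytes) -> int:
    """Length of the Base-64 text with all ASCII whitespace removed."""
    return len(b"".join(bin02.split()))

def decoded_octet_length(bin02: bytes) -> int:
    """Octet length of the original document carried in the Base-64 text."""
    return len(base64.b64decode(b"".join(bin02.split())))

original = b"<ClinicalDocument/>" * 20         # stand-in for a CDA document
with_lf = base64.encodebytes(original)         # wrapped at 76 characters with LF
with_crlf = with_lf.replace(b"\n", b"\r\n")    # same text after a newline translation

# The raw lengths disagree, but both whitespace-insensitive counts are stable.
print(len(with_lf), len(with_crlf))
print(count_ignoring_whitespace(with_lf), count_ignoring_whitespace(with_crlf))
print(decoded_octet_length(with_lf), decoded_octet_length(with_crlf), len(original))
```

Either count survives a CR/LF translation. The decoded octet length has the added benefit of matching the size of the document the sender started with.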