Wednesday, January 8, 2025

HIPAA NPRM Summary

If you've been hiding under a rock and are just catching up after the holidays, there's a new HIPAA Security Rule out for review.

Printed out, the newly proposed HIPAA Security Rule is about 465 pages, of which 121 are footnotes and 38 are actual regulatory text, leaving 306 pages of prefatory/explanatory/justification material in front matter.  I'll skip straight to the regulatory text in my first read-through.

That brings us to Subpart C, which contains the remainder of the proposed HIPAA rule changes.

First, the definitions get an update ...

Definitions were added for the following key terms:

  • Deploy
  • Implement
  • Multifactor authentication
  • Risk
  • Technical Controls
  • Vulnerability

Some definitions were clarified, but not really functionally changed from the perspective of a reasonably educated person.

  • Administrative safeguards
  • Information System
  • Password
  • Physical Safeguards
  • Security or Security Measures
  • Security Incident
  • Workstation

With respect to "reasonably educated": that includes neither lawyers nor regulatory pedants.  Both are over-educated and so might actually care about the improved text in HIPAA.

Finally, three definitions were somewhat changed in HIPAA:

Access: adds "delete" and "transmit", and substitutes "component of an information system" for "system resource".

Malicious software: Now includes "firmware" with more description of the intent or impact of the software.

Technical Safeguards: Clarified and included technical controls as a type of safeguard.

§ 164.306 Security standards

The general rules are revised a bit, but mostly unchanged, EXCEPT:

  • (b)(2)(v) is added to require consideration of the effectiveness of the security measure, AND
  • (c) now requires compliance with both standards & implementation specifications, and (d) (the required vs. addressable distinction) is dropped [THIS IS A BIG CHANGE].

§ 164.308 Administrative safeguards 

is very little like its predecessor, although I imagine it includes all of the original requirements, plus a lot more.  I'm going to do a deeper review of the changes to HIPAA 45 CFR 164.308 later.

§ 164.310 Physical Safeguards

Mostly the same, but ADDS an annual maintenance requirement to each standard, whereby you must review & test policies & procedures at least annually.

It also adds implementation specs for workstation use & technology assets (a.k.a. devices).

§ 164.312 Technical Safeguards

adds a lot of new content and is going to require deeper analysis.

§ 164.314 Organizational requirements

I would say this is largely unchanged, except for the new requirement that any time an organization activates its contingency plan, it must notify the organization or group health plan it has a BAA with within 24 hours.

§ 164.316 Documentation requirements 

is largely unchanged but somewhat restructured.  The maintenance of documentation is strengthened from "as needed" to "at least annually".

§ 164.318 Transition 

was previously about compliance deadlines & remains so, but in the proposed rule the text gets more convoluted, dealing with existing renewals and deeming compliance based on existing contracts.  Get your lawyers to explain it; I'm not gonna.

§ 164.320 Severability

adds a clause that basically says:

If anything here is invalid or unenforceable, etc., it shall be interpreted to give it maximum effect, and if necessary will be held separate so as not to affect anything else we said you gotta do.

That's the end for now of my read of the changes in HIPAA.  There will be more once I've done deeper analysis of 308 and 312.



Friday, December 20, 2024

Flying

Last year I wanted to gift myself some flying lessons.  I finally managed to do that earlier this month, and enrolled in the Top Gun Experience at AeroVenture, a local flying school.  The airport is about 20 miles (as the crow flies) from my house, or about a 35-minute drive.  It's six hours of instruction, 3 on the ground and 3 in the air, so NOT enough for a private pilot license, but enough to get started if you wanted to.

I took my first class on Wednesday.  We spent about 40 minutes in the simulator and talked about flight physics.  This was the simulator setup: a few thousand dollars of equipment, plus a PC and monitor running the flight simulator application.

On Wednesday, I flew a Piper Warrior, which looks like this from the outside.


And like this on the inside:


Here I am in the student seat with head-gear.

Yesterday I flew a Cessna 172 Skyhawk:

These are the controls:

As part of my ground instruction, we planned the flight to my house, about 15 minutes (flight time) from the airport, and then took the plane out of the hangar.  Here's the overhead view (the instructor was flying while I took the picture).

I have one more flight, the Friday morning after Christmas.  This was an awesome experience, as evidenced by my ear-to-ear grin.


Tuesday, December 17, 2024

A Christmas present from ASTP

#HTI2 Part 2 just dropped. At 54 pages, Health Data, Technology, and Interoperability: Protecting Care Access is one of the shortest rules I've seen recently from ASTP/ONC.  You can find it in the FR at https://federalregister.gov/documents/2024/12/17/2024-29683/health-data-technology-and-interoperability-protecting-care-access or in PDF form here https://govinfo.gov/content/pkg/FR-2024-12-17/pdf/2024-29683.pdf

There are just two pages of regulatory text in the Protecting Care Access part of #HTI2.

A definition for "Reproductive Healthcare" is added to section 171 -- which explains in part the rationale for this rule.

And Information Blocking is updated with 171.206 to permit blocking where "persons seeking, obtaining, providing, or facilitating reproductive health care are at risk of being potentially exposed to legal action ..."

The text for it starts here https://federalregister.gov/d/2024-29683/p-amd-5

Protecting Care Access truly is a Christmas present.  Thank you ASTP!



Wednesday, December 11, 2024

HTI2 just dropped ... light reading for the holidays

The #HTI2 Final Rule just dropped. At 156 pages, this is light reading for a December final rule from the Assistant Secretary for Thwarting PTO.

You can read the final text PDF of Health Data, Technology, and Interoperability: Trusted Exchange Framework and Common Agreement (TEFCA) here, and on Monday 12/16, here.

The key question for anyone who read the original war and peace version would be, where did over 1000 pages go? The answer provided by ASTP is rather simple: "Comments received in response to other proposals from the HTI-2 Proposed Rule are beyond the scope of this final rule, are still being reviewed and considered, and may be the subject of subsequent final rules..."

So, next year ... maybe.

The key changes are:

  • Complete EHR and EHR Module terms have been removed from #HTI2
  • They finalized the TEFCA Manner Exception in subpart D of part 171 with no revisions.
  • They added 45 CFR part 172, which codifies provisions related to TEFCA

Changes to 170.315 (certification criteria) in #HTI2 are minimal, see https://public-inspection.federalregister.gov/2024-29163.pdf#page=126


Changes to Section 171 add a severability clause and reference definitions from Section 172 (new), see https://public-inspection.federalregister.gov/2024-29163.pdf#page=129



What they DID NOT do in HTI-2 was change what they wrote in HTI-1.

The final 27 pages of HTI-2 add section 172, TRUSTED EXCHANGE FRAMEWORK AND COMMON AGREEMENT, to regulation.  These regulations apply mainly to QHINs, not to Health IT providers that are NOT QHINs, so I'm not going to cover them in detail, but I will cover one topic: delegation.  ASTP is permitted under #HTI2 to delegate some of its responsibilities to the RCE (@sequoiaproject). See https://public-inspection.federalregister.gov/2024-29163.pdf#page=139

These include:

Subpart C—QHIN Onboarding and Designation Processes
172.300 Applicability.
172.301 Submission of QHIN application. 
172.302 Review of QHIN application. 
172.303 QHIN approval and Onboarding.
172.304 QHIN Designation.
172.305 Withdrawal of QHIN application. 
172.306 Denial of QHIN application. 
172.307 Re-application. 

Subpart D—Suspension
172.400 Applicability. 
172.401 QHIN suspensions.
172.402 Selective suspension of exchange between QHINs.

Subpart E—Termination 
172.501 QHIN self-termination.
172.503 Termination by mutual agreement.

So, like I said, a light rulemaking from ASTP this Christmas. Sounds like they have more work to do to earn the appellation: Assistant Secretary for Thwarting PTO.

Merry Christmas to all!

Sunday, October 20, 2024

A minor performance improvement

I work on a Spring Boot application using embedded Tomcat (several, in fact).  A rewrite of this application in the last year tripled its load capacity (that is NOT what this blog post is about).  That capacity is well more than needed to handle the entire load of the system with a single instance, but good systems grow, and there is also the need for excess surge capacity.

Pushing beyond that 3X limit, though, the application was having failures under more stressful loads.  I was well over the system design requirements (we run multiple instances of the application for redundancy), but it bugged me.  I had a few extra hours of unscheduled time to play with this, so I started a side project to finally track it down.

Shortly into the fourth load test (the one that goes just beyond that 3X load mark), the application was restarted by AWS after failing 3 consecutive health checks.  I'm supposed to get a log message about this shutdown, but didn't.  That turned out to be a signal propagation issue combined with a timing issue (my logger runs in a small separate background process).

The key problem wasn't a memory issue, a threading issue, or a CPU load issue.  Heap memory was at 1.4 GB, well under the 4 GB allocated heap and the 8 GB max.  Threads were running at 40-50, well below the worker-thread limit.  CPU load might hit 50% at application startup, but after that it was well below 20% at the time of the health-check failure.  Even the number of in-use TCP connections was well below any server-imposed limits.

As it turned out, the issue was not the number of in-use TCP connections, but rather the number of available TCP connections.  The root cause was keep-alives at both the application and transport layers.  HTTP/1.1 (the application protocol) supports the keep-alive header, which allows a single connection to be used for multiple requests, saving connection-setup time.  HTTP keep-alive prevents a socket from being closed after the response is sent back, so that it can be reused.  Lower on the network stack, TCP also supports keep-alive, which has a slightly different meaning: it ensures that an open socket that goes unused for a while still remains open on both sides (by having the client or server send keep-alive packets).
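
To make the two layers concrete, here's a minimal Java sketch (my own illustration, not code from the application; the host name is a placeholder).  TCP keep-alive is a socket option, while HTTP keep-alive is connection reuse that a client can opt out of per request:

    import java.net.HttpURLConnection;
    import java.net.Socket;
    import java.net.URL;

    public class KeepAliveDemo {
        public static void main(String[] args) throws Exception {
            // Transport layer: TCP keep-alive sends periodic probe packets on an
            // idle socket so both ends (and any middleboxes) keep it open.
            try (Socket socket = new Socket("example.com", 443)) {
                socket.setKeepAlive(true);
            }

            // Application layer: HTTP/1.1 reuses a connection for multiple
            // requests by default; sending "Connection: close" opts out, so the
            // socket is released as soon as the response has been read.
            HttpURLConnection conn =
                    (HttpURLConnection) new URL("https://example.com/").openConnection();
            conn.setRequestProperty("Connection", "close");
            conn.getResponseCode();
        }
    }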

The server I'm working on has some long-running requests that can take up to 10 minutes before a response is provided (the server is an intermediary talking to other, third-party services).  So long-running TCP keep-alives are essential for correct operation; otherwise the poor client drops the connection before ever getting a response. AWS load balancers have a default setting of 350 seconds for TCP keep-alive, but this can be changed.  I recently had to adjust those settings to support a 10-minute request time (though that wasn't the root cause, the default setting of 350 seconds was bad enough).

What it boiled down to was that I'd run out of available connections, not because connections were "active", but because they were being kept alive by a combination of application-layer (HTTP) and transport-layer (TCP) configuration.  I could not rely on the HTTP client NOT using application-layer keep-alives, and the system needed TCP keep-alives for one of its long-running services.  Given that an actual request takes 10 seconds to complete in the worst case, having a socket be unavailable for 5 minutes or longer is just too much.  That's 30 times as many connections tied up as actually necessary.

That system has a security requirement that every request/response to the server go through TLS negotiation, instead of reusing existing connections.  There was never ANY NEED or use for HTTP keep-alive. However, embedded Tomcat enables it by default in Spring Boot.  The simple solution was to set server.tomcat.max-keep-alive-requests = 1.
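
That property is all you need, but for reference, here's a sketch of the programmatic equivalent (my own, assuming a standard Spring Boot servlet app).  Setting max keep-alive requests to 1 tells Tomcat to close each connection after a single response:

    import org.apache.coyote.http11.AbstractHttp11Protocol;
    import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
    import org.springframework.boot.web.server.WebServerFactoryCustomizer;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class KeepAliveConfig {
        @Bean
        public WebServerFactoryCustomizer<TomcatServletWebServerFactory> disableHttpKeepAlive() {
            // Equivalent to server.tomcat.max-keep-alive-requests=1: Tomcat
            // closes the connection after one request/response exchange.
            return factory -> factory.addConnectorCustomizers(connector -> {
                if (connector.getProtocolHandler() instanceof AbstractHttp11Protocol<?> protocol) {
                    protocol.setMaxKeepAliveRequests(1);
                }
            });
        }
    }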

Initial testing confirmed that this resolved the capacity problem.  Continued testing shows another 10X improvement in capacity, for a combined 30X load-capacity gain, with CPU and memory still under 50% utilization. As a result of these efforts, I can cut CPU costs by a good bit in production by reducing the number of redundant tasks, and lower memory requirements by 50%.

My next minor annoyance (side project) is task startup time.

     Keith

P.S. I don't claim to be a network engineer, or a crypto expert, but given the amount of time I spend on networking and crypto in my day job, I might as well claim those skills.


Friday, October 18, 2024

Bouncy Castle FIPS (BCFIPS) 2.0 Upgrade

I've mentioned Bouncy Castle a few times in this blog over the past year.

The easiest major version upgrade I've ever had to execute was upgrading from BC-FIPS 1.X to 2.X.  New in Bouncy Castle 2.0 is certification under FIPS 140-3 instead of FIPS 140-2 (all new certifications follow NIST FIPS 140-3 requirements).  It also includes support for Java 21, as well as Java 17 and prior releases.  You can find the NIST certificate details here: 4743

Really, all I needed to do was update my pom.xml files.  Smoothest major upgrade ever.

Well, technically, I did have to do a couple of other things.

1. Download bc-fips-2.0.0.jar into my project so that I could use it in local calls to Java's keytool (I have to convert a jks store to bcfks format in my build process).

2. Add the jar files to my Docker image.  BC-FIPS (at least in 1.x versions) cannot be rolled up into an uber-jar for Spring Boot, given changes in the way jar URL handling happens.  This is because the module validation code in BC-FIPS has to be able to access the class data in the JAR file.

These are the file versions you need to change.

Old                      New
bc-fips-1.0.2.X.jar      bc-fips-2.0.0.jar
bcpkix-fips-1.0.7.jar    bcpkix-fips-2.0.7.jar
bctls-fips-1.0.19.jar    bctls-fips-2.0.19.jar
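
Not part of the upgrade itself, but a minimal smoke test (my own sketch, using the standard "BCFIPS" provider name) to confirm the 2.0 module is the one actually being loaded:

    import java.security.Security;
    import org.bouncycastle.jcajce.provider.BouncyCastleFipsProvider;

    public class BcFipsVersionCheck {
        public static void main(String[] args) {
            // Register the FIPS provider and print its version; after the
            // upgrade this should report 2.0 rather than 1.0.2.x.
            Security.addProvider(new BouncyCastleFipsProvider());
            System.out.println(Security.getProvider("BCFIPS").getVersionStr());
        }
    }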

    Keith


Wednesday, September 11, 2024

Python Requests Timing Out

Somehow it seems that every edge case for TLS or TCP communications winds up in my queue.  I don't do much Python development personally (almost nothing, in fact).  That's a language that fits into the "I can read or debug it" category, rather than "speak fluently and know the lexicon".  So, when an end user had a problem that they couldn't resolve in Python, it took me quite some time to figure it out.

In a nutshell, the problem was that uploads of a file and data using POST and MIME multipart/form-data in Python weren't working for files over a particular size, but were for smaller files.  I couldn't reproduce the problem on my local system or in my development environment, so this was very troubling.  We'd tested this with files of the size this user provides, and so were pretty certain it would work, but in actual practice, they couldn't make it work.

We'd had others upload using other platforms (Java, C#, bash scripts using curl, and browser-based forms), so we felt that there was nothing specifically wrong in the implementation, and we'd seen nothing like this problem.

Symptomatically, we'd see the entire request (which took a while to upload), and it would be processed (which also took a while); we also had a log of the outbound response, so we knew a 200 OK success response had been sent back, but the user wasn't getting it.  The read was timing out.  This very much looked to me like a firewall problem, and we had the networking folks check it out, but clearly the firewall on the service wasn't closing the socket prematurely.  The request takes about 30 minutes to upload and process, so it's not a great candidate for debugging via Wireshark, and as I said, the problem did not occur in my development or local environments.

I finally tracked this down yesterday.  It really was a networking issue, but not in the firewall.  Instead, it's a problem in the network gateway.  In the AWS environment, if the network gateway sees no activity on a connection for 350 seconds, the connection is dropped.  Large-file processing takes time on the order of the file size, and the large files the user was sending took about 10 minutes to process.  While the file was being sent there was activity, but once it was being processed, there were several minutes of silence, and so the connection was dropped.

Why did this work in Java, browsers, and with curl?  Well, because there's something called TCP keep-alive packets, which exist entirely to address this sort of issue.  Those application environments have keep-alive turned on sufficiently that it's NOT a problem.  But Python's requests does not.

The fix is to add these to your imports:

import grequests
from requests_toolbelt.adapters.socket_options import TCPKeepAliveAdapter

And this to the Session where you are doing long uploads:

    # Send TCP keep-alive probes after 120 seconds of idle time, then every
    # 30 seconds (up to 50 probes), so the connection stays visibly active
    # while the server is still processing.
    s = grequests.Session()
    keep_alive = TCPKeepAliveAdapter(idle=120, count=50, interval=30)
    s.mount('https://', keep_alive)

This finally resolved the problem.  You'll note I'm using grequests instead of requests.  The two are generally API compatible, and grequests solves another problem that I admittedly don't understand or have much time to debug.  In any case, what I've seen elsewhere indicates that if you've got connection timeout exceptions you cannot otherwise track down, and you are connecting to an AWS endpoint, and you know there's a long delay in processing, this might just be the solution you are looking for.