Welcome to the fourth and final installment of Creating an FHIR API with GCP. So far, we’ve covered a lot!
We discussed the differences between Google and Azure, landing on GCP as the best option for FHIR in Part 1. We began our implementation in Part 2, creating both the BigQuery resources and your FHIR repository resources. And finally, in Part 3, we tackled authentication methods and populating data in our FHIR repository.
This time, we’ll wrap everything up with a nice little bow. First, we’ll finish our implementation, and then, in the interest of transparency, I’ll share a limitation I found. Let’s dive in.
Configuration of the Pub/Sub Subscriber
In the last installment, we finished the creation of the Patient resource, so the next step is to create the subscriber for the Pub/Sub topic. Recall that Pub/Sub was enabled when the FHIR data store was created. Putting “Pub/Sub” into the search and clicking on the first result shows the topic that was created.
Click on the topic to see that no subscriptions exist.
Subscriptions can be created through the GCP Console, through the gcloud CLI or IaC tools like Terraform, or at run time through most SDKs. There are advantages and disadvantages to each approach.
Creation through the GCP Console is quick, though it requires manual work to promote to other environments and leaves room for human error. The SDK route allows the application to configure its own topology, at the cost of running the application under a service account with broader permissions than simply publishing or subscribing to messages. The ideal long-term approach is IaC, which has the most benefits with the fewest drawbacks.
For the purposes of this blog, the subscription will be created through the GCP Console. The subscription will be for subscribers interested in changes to Patient resources. After clicking on the Create Subscription button, the form includes a lot of possible configuration options.
The ID for the subscription will start with “patient-” to indicate that the subscriber is interested in changes to patients. The ID will end with the name of the FHIR data store.
The delivery type is Pull, since the consumer application will pull messages and react to each Patient resource change itself. The rest of the defaults around retention, expiration, and acknowledgment are fine as is.
The next major option on the form is the subscription filter. A filter lets the publisher send every message to the topic while each subscription decides which subset of those messages its subscribers receive. This matters most for chatty topics, as the cost to receive, unbox, and process a message can be non-trivial at scale.
Because this subscription is centered around Patient changes, the filter will include only Patient resources. Additional filtering can take place around the action (Create versus Update versus Delete).
One of the easiest, if somewhat rudimentary, ways to see what a subscription can filter on is to create the subscription with no filter, publish a message, and see what shows up.
A filter of attributes.resourceType = "Patient" fits the desired behavior for this subscription.
The next set of options deals with advanced Pub/Sub features like exactly-once delivery, message ordering, dead lettering, and retry policy. For the purposes of this blog post, these options will be skipped, though they should be explored and considered for production scenarios.
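For reference, the same subscription could also be provisioned from the gcloud CLI. This is a sketch only; the project and topic names here are placeholders for whatever your FHIR data store actually created.

```shell
# Placeholder project/topic names -- substitute your own.
# Pull delivery is the default when no --push-endpoint is supplied, and
# --message-filter mirrors the filter entered in the console form.
gcloud pubsub subscriptions create patient-fdsGCPDDEMO001 \
  --topic=projects/my-gcp-project/topics/fhir-topic \
  --message-filter='attributes.resourceType = "Patient"'
```

The same command, checked into source control, also addresses the promotion-to-other-environments concern raised above.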
Publish a few messages after the subscription is created. The screen will initially show no messages found yet.
Click on the Pull button, and these messages will begin to show up.
With confirmation that FHIR Patient changes flow from the publisher (the GCP FHIR repository) into the topic’s subscription, the consumer is the next piece of the puzzle.
Following the .NET Pub/Sub documentation from Google, the first step is to install the Google.Cloud.PubSub.V1 NuGet package. The second step in the documentation is to configure an environment variable pointing at the credentials used for authentication. In a development scenario, it is often difficult to configure environment variables consistently across every developer machine unless a homogenous development context is used (e.g. Docker, or a Visual Studio Code Dev Container).
There are a few different ways I’ve found to provide the GCP Pub/Sub subscriber with credentials. The simplest is to hand it the JSON key file that was downloaded previously. Under the covers, the SDK appears to extract the private key embedded in that JSON file. So, a possible production setup might be to store that private key in a secure location (e.g. Azure KeyVault Certificates), pull it out at run time, and provide it to the GCP SDK to generate the channel credentials.
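As a minimal sketch of the file-based approach, the client builder can be pointed directly at the key file instead of relying on the GOOGLE_APPLICATION_CREDENTIALS variable. The project, subscription, and file names below are assumptions:

```csharp
using Google.Cloud.PubSub.V1;

// Hypothetical names/path -- substitute your own project, subscription,
// and the location of the downloaded service account JSON key.
SubscriberClient subscriber = await new SubscriberClientBuilder
{
    SubscriptionName = SubscriptionName.FromProjectSubscription(
        "my-gcp-project", "patient-fdsGCPDDEMO001"),
    CredentialsPath = "service-account.json"
}.BuildAsync();
```

In a production setup, the key would be loaded from a secret store at run time rather than shipped on disk.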
With credentials in hand, everything is ready to come together in the subscriber. This is what the full setup looks like, along with a screenshot of a sample message:
To decode the message data, use the following.
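A sketch of the subscriber loop, including the decode step, might look like the following. The project and subscription names are placeholders, and the attribute lookup assumes the resourceType attribute observed earlier:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Google.Cloud.PubSub.V1;

// Hypothetical names -- substitute your own project and subscription.
var subscriptionName = SubscriptionName.FromProjectSubscription(
    "my-gcp-project", "patient-fdsGCPDDEMO001");

SubscriberClient subscriber = await SubscriberClient.CreateAsync(subscriptionName);

// StartAsync keeps pulling messages until StopAsync is called.
await subscriber.StartAsync((PubsubMessage message, CancellationToken _) =>
{
    // The payload is a byte string; ToStringUtf8 decodes it to text.
    string data = message.Data.ToStringUtf8();
    Console.WriteLine($"Received: {data}");

    // The resource type also arrives as a message attribute.
    message.Attributes.TryGetValue("resourceType", out string resourceType);
    Console.WriteLine($"Resource type: {resourceType}");

    return Task.FromResult(SubscriberClient.Reply.Ack);
});
```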
The messages published by the FHIR data store contain only a reference to the resource that changed (e.g.
datasets/dsGCPDDEMO001/fhirStores/fdsGCPDDEMO001/fhir/Patient/1000006); they do not include the actual definition of the Patient resource. The subscriber must then make a call out to fetch the resource. Helpfully, the Firely library is smart enough to behave the same whether it is passed the full resource locator or just the suffix (e.g. Patient/1000006).
Subscribing to the message, extracting the message data, and then sending it to the FHIR repository to fetch the resource looks like this:
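A hedged sketch of that fetch step, assuming placeholder project, location, and store names in the base URL:

```csharp
using System.Threading.Tasks;
using Google.Cloud.PubSub.V1;
using Hl7.Fhir.Model;
using Hl7.Fhir.Rest;

public static class PatientFetcher
{
    // Sketch only -- project, location, and store names are placeholders.
    private static readonly FhirClient Client = new FhirClient(
        "https://healthcare.googleapis.com/v1/projects/my-gcp-project/" +
        "locations/us-central1/datasets/dsGCPDDEMO001/fhirStores/fdsGCPDDEMO001/fhir");

    public static async Task<Patient> FetchChangedPatientAsync(PubsubMessage message)
    {
        // The message data is the resource locator published by the data store,
        // e.g. "datasets/.../fhir/Patient/1000006". As noted above, Firely
        // accepts either the full locator or the Patient/{id} suffix.
        string resourcePath = message.Data.ToStringUtf8();
        return await Client.ReadAsync<Patient>(resourcePath);
    }
}
```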
Querying the FHIR Repository with BigQuery
With the FHIR data set now holding data, the changes have been streamed to BigQuery, and the data can be queried. The most straightforward way of doing this is to search for “BigQuery,” navigate to the previously created dataset, create a new query, and run a simple SELECT against the resource table.
This will show the first 1,000 resources in the dataset. However, it is not very useful to see just the ID of the resources. It would be more helpful to be able to filter on the individual pieces of data inside and return those as discrete columns.
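As a sketch, with the project and dataset names assumed (the streaming integration creates one table per resource type, so Patient changes land in a Patient table):

```sql
-- Hypothetical project/dataset names -- substitute your own.
SELECT id
FROM `my-gcp-project.dsGCPDDEMO001.Patient`
LIMIT 1000
```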
Thankfully, BigQuery figures out the shape of the data as it is streamed into the store, so the Schema tab reveals the available fields.
Using the query editor’s auto-complete, it is easy to see which fields can be queried, their types, and so on.
These fields can also be used in the WHERE clause, just as a developer would expect from traditional SQL.
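For example, a filtered query might look like the following; the field names here are assumptions based on the streamed FHIR schema:

```sql
-- Hypothetical project/dataset names; gender and birthDate are fields
-- surfaced by the streamed Patient schema.
SELECT id, gender, birthDate
FROM `my-gcp-project.dsGCPDDEMO001.Patient`
WHERE gender = 'female'
  AND birthDate >= '1980-01-01'
ORDER BY birthDate
```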
This flexibility allows for applications to be built using all the strengths of a robust FHIR Repository. That data will then automatically be fed into a platform for analytics and reporting.
A Drawback: Consent Management API
One of the more interesting features of GCP’s FHIR offerings is its Consent Management API.
FHIR does not inherently offer a security or access model. Therefore, it is up to the application that uses the FHIR API to implement and enforce its own security.
This is further complicated by the BigQuery and Pub/Sub integrations. A brute-force implementation would be to stand up a different FHIR repository for every type of end-user role (e.g. researcher, behavioral health specialist, etc.). The Consent Management API attempts to bridge this gap by offering a single source of truth for whether a user can access a particular resource.
The Consent Management API is implemented as a separate data store. An administrator creates consents and attribute definitions tied to the underlying resources in the FHIR repository. The API can then be invoked with the role(s) of the currently logged-in user along with the resource being accessed.
The implementation then comes down to making two API requests for every GET of a resource: the first to fetch the actual resource, and the second to check access control. This can be implemented fairly easily in an API management layer or in application code.
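The application-code variant of that two-request pattern could be sketched like this. IConsentChecker is a hypothetical wrapper around the Consent Management API, not a real GCP type, and the endpoint shape assumes an ASP.NET Core minimal API:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Hl7.Fhir.Model;
using Hl7.Fhir.Rest;
using Microsoft.AspNetCore.Http;

// Hypothetical abstraction over the Consent Management API.
public interface IConsentChecker
{
    Task<bool> CanAccessAsync(IEnumerable<string> roles, string resourcePath);
}

public static class PatientEndpoint
{
    public static async Task<IResult> GetPatient(
        string id, IEnumerable<string> userRoles,
        FhirClient fhirClient, IConsentChecker consent)
    {
        // Call 1: fetch the resource itself.
        Patient patient = await fhirClient.ReadAsync<Patient>($"Patient/{id}");

        // Call 2: check whether the caller's roles may see it.
        bool allowed = await consent.CanAccessAsync(userRoles, $"Patient/{id}");

        return allowed ? Results.Ok(patient) : Results.Forbid();
    }
}
```

Pushing this into an API management layer instead keeps the enforcement in one place, at the cost of wrapping every route.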
The first major downside of this approach is that it takes two API calls rather than one, and the implementer must enforce this on every
GET. The second is that the BigQuery and Pub/Sub integrations need to be wrapped with calls to the API management layer, potentially slowing down two features that are supposed to be high throughput.
Having the consent policy evaluation take place at the point of data access, rather than as a required second call, would be a better implementation. Consider the factors of your own situation before choosing one approach over the other.
Wrapping Things Up
There are several options available for development teams creating their own FHIR repository. Of the ones offered by the major cloud vendors, GCP’s implementation is the most feature-rich and production-ready, which is one of the reasons we chose it over the other options.
The only downside I can see is that the consent policy implementation, which governs access control, sits alongside the FHIR API, BigQuery, and Pub/Sub features as a separate, second service rather than being evaluated natively within them. Once native evaluation is implemented, GCP’s FHIR offerings will be the de facto choice, at least until another vendor builds a better implementation.
If you have any questions about the above implementation or are curious to know more about our choice, please feel free to leave a comment or reach out to me directly. I’m always happy to help!
As always, if you enjoyed this post or found it helpful, subscribe to the Keyhole Dev Blog. We publish weekly.
Series Quick Links
- Part 1: Creating an FHIR API – Google or Azure?
- Part 2: Creating an FHIR API – Implementation Part A
- Part 3: Creating an FHIR API – Implementation Part B
- Part 4: Creating an FHIR API – Wrapping Things Up