
Friday, August 15, 2014

QA for fMRI, Part 3: Facility QA - what to measure, when, and why


As I mentioned in the introductory post to this series, Facility QA is likely what most people think of whenever QA is mentioned in an fMRI context. In short, it's the tests that you expect your facility technical staff to be doing to ensure that the scanner is working properly. Other tests may verify performance - I'll cover some examples in future posts on Study QA - but the idea with Facility QA is to catch and then diagnose any problems.

We can't just focus on stress tests, however. We will often need more than MRI-derived measures if we want to diagnose problems efficiently. We may need information that might seem tangential to the actual QA testing, but these ancillary measures provide context for interpreting the test data. A simple example? The weather outside your facility. Why should you care? We'll get to that.


An outline of the process

Let's outline the steps in a comprehensive Facility QA routine and then we can get into the details:

  • Select an RF coil to use for the measurements. 
  • Select an appropriate phantom.
  • Decide what to measure from the phantom.
  • Determine what other data to record at the time of the QA testing.
  • Establish a baseline.
  • Make periodic QA measurements.
  • Look for deviations from the baseline, and decide what sort of deviations warrant investigation.
  • Establish procedures for whenever deviations from "normal" occur.
  • Review the QA procedure's performance whenever events (failures, environment changes, upgrades) occur, and at least annually.

In this post I'll deal with the first six items on the list - setting up and measuring - and I'll cover analysis of the test results in subsequent posts.


Choose an RF coil

RF coils break often. They are handled multiple times a day, they get dropped, parts can wear with scanner vibration, etc. So it is especially important to think carefully before you commit to a receiver coil to use for your Facility QA. What characteristics are ideal? Well, stability is key, but this is at odds with frequent use if you have but a single head coil at your facility. If you have multiple coils and are able to reserve one for Facility QA then that is ideal. The coils in routine use can then be checked separately, via dedicated tests, once you're sure the rest of the scanner is operating as it should.

When selecting a coil you also want to think about its sensitivity to typical scanner instabilities. If you have an old, crappy coil that nobody uses for fMRI any longer, don't resort to making that the Facility QA coil just because it's used infrequently! You want a coil that is at least as sensitive to scanner problems as those coils in routine use.

I use the standard 12-channel RF coil that came with my system. I happen to have two of these beasts, however, so if there is ever any question as to the coil's performance I am in a position to make an immediate swap and do a coil-to-coil comparison. I also have a 32-channel head coil. I test this coil separately and don't use it to acquire scanner QA measurements, but that's just personal choice. I've found that the 32-channel coil breaks more often than the 12-channel coils, simply because it has five plugs versus just two plugs for the 12-channel coil.


Select a phantom

Here there is really no excuse not to have a phantom dedicated to Facility QA. This phantom should be used only for QA and only by technical staff. You might want to purchase a phantom for this purpose, or simply designate something you have on-hand and then lock it away.

What characteristics should the phantom have? In my experience it doesn't matter all that much provided it approximates the signal coming from a human head. It doesn't need to be a sphere, but it could be. I decided to use one of the vendor-supplied doped water bottles when I devised my Facility QA scheme, and it was for a very simple reason: it was what I had! Perhaps I could have ordered, say, a second FBIRN phantom but I simply wasn't that forward-thinking.

I did, however, take the precaution of having a dedicated holder built for the cylindrical bottle I use. This holder keeps the phantom in exactly the same orientation with respect to the magnet geometry, thereby assuring near-identical shimming for every Facility QA session. (Shim values are some of the ancillary information we'll record below.) Reproducible setup is arguably more important than the particular characteristics - shape, size, contents - of the phantom.

There may be some other considerations before you commit to your Facility QA phantom. Do you need to compare your scanner's performance with other scanners? Cross-validation may require a specific phantom. Also, do you need to measure the performance of anatomical scans, or can you (like me) focus almost exclusively on fMRI-type stability testing? You may even need two or more phantoms to run all the Facility QA tests you need.


Decide what MRI data to acquire

Here's my Facility QA protocol in a nutshell:

  • Localizer scan (15 sec)
  • 200 volumes of "maximum performance" EPI at TR=2000 ms (6 min 44 sec total scan)
  • 200 volumes of "maximum performance" EPI at TR=2000 ms (6 min 44 sec total scan)
  • 200 volumes of "typical fMRI" EPI at TR=2000 ms (6 min 44 sec total scan)
  • Various service mode QA checks (approx. 15 min) 

Including setting up the phantom and recording various ancillary data, the whole process takes about 45 minutes to perform. This allows a further 15 minutes to analyze the EPI data on the scanner, for a total one hour commitment.

I use two types of EPI acquisitions (see Note 1) in my Facility QA protocol: one which is (close to) "maximum performance" and one that is representative of a typical user's parameters for fMRI. There have been instances when a problem has shown up in the user scan and not in the maximum performance scans, most likely because the user scan is applied in a slightly different axial-oblique orientation that requires driving the imaging gradients differently.

The idea with the maximum performance scans is to kick the scanner where it hurts and listen for the squeal. The first time series is inspected for problems but isn't analyzed further. It's essentially a warm-up scan. I fully analyze the second scan, however, making several measurements that reflect the temporal stability of signal, ghosts and noise. More on those measurements in later posts.
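
To give a flavor of the sort of measurements I mean - this is a minimal offline sketch, not my actual on-scanner procedure - here is how signal, ghost and noise ROI statistics might be computed in Python, assuming the run has been converted from the scanner's mosaic format to a 4D NIfTI file (e.g. with dcm2niix). The file name and ROI positions are placeholders only.

```python
# Minimal sketch of time-series stability metrics from a phantom EPI run.
# File name and ROI locations are placeholders; in practice the ROIs would
# be defined once and reused for every QA session.
import numpy as np
import nibabel as nib

img = nib.load("facility_qa_epi_run2.nii.gz")   # placeholder file name
data = img.get_fdata()                          # 4D array: (x, y, z, volumes)

def roi_mean_timecourse(data, mask):
    """Mean signal inside a boolean ROI for every volume."""
    return data[mask].mean(axis=0)

# Placeholder ROIs on a middle slice: a central signal ROI, a region in the
# N/2 ghost, and a background (air) region well away from both.
signal_mask = np.zeros(data.shape[:3], dtype=bool)
signal_mask[24:40, 24:40, 20] = True
ghost_mask = np.zeros(data.shape[:3], dtype=bool)
ghost_mask[24:40, 2:10, 20] = True
noise_mask = np.zeros(data.shape[:3], dtype=bool)
noise_mask[2:10, 24:40, 20] = True

sig   = roi_mean_timecourse(data, signal_mask)
ghost = roi_mean_timecourse(data, ghost_mask)
noise = roi_mean_timecourse(data, noise_mask)

tsnr        = sig.mean() / sig.std(ddof=1)        # temporal SNR of the signal ROI
pct_fluct   = 100 * sig.std(ddof=1) / sig.mean()  # percent signal fluctuation
ghost_level = 100 * ghost.mean() / sig.mean()     # ghost intensity as % of signal
snr0        = sig.mean() / noise.std(ddof=1)      # crude image SNR estimate

print(f"tSNR {tsnr:.1f}, fluctuation {pct_fluct:.3f}%, "
      f"ghost {ghost_level:.2f}%, SNR {snr0:.1f}")
```

Whatever numbers you choose to compute, the important thing is that they are computed the same way every session so that trends are meaningful.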

Why only one warm-up acquisition of under 7 minutes? More warm-up scans could be warranted if you have the time. My scanner achieves a thermal steady state in about 15 minutes. But I also have very efficient water cooling, which means even a short delay between EPI runs, e.g. to re-shim, will cause a major departure from equilibrium. I determined that I could get sufficient stability in imaging signal after a 7-minute warm-up, so that's what I use. If you have reason to worry about the thermal stability of your passive shims in particular, then by all means warm up the scanner for 15-30 mins before running your QA. I've found it's not critical for my scanner and, as with all things QA, it's a tradeoff. More warm-up scans would take me over an hour for the entire Facility QA procedure.

The parameters for the "maximum performance" EPI acquisitions are shown in the protocol figures:


There are 40 descending slices acquired axially (perpendicular to the long axis of a doped water bottle) with the slice packet positioned at the bottle center. Provided the positioning is reproducible on the phantom, I don't think the particular slice position and orientation is as important as acquiring as many slices as possible in the TR of 2000 ms. We want to drive the scanner hard. The TE, at 20 ms, is comparatively low for fMRI but it permits a few more slices in the TR. I decided to use 2 mm slice thickness to drive the slice selection gradients about as hard as they ever get driven. But I decided to keep the matrix (64x64) and field-of-view (224x224 mm) at typical fMRI settings because, with the echo spacing set short, I could get a larger number of slices/TR than with higher in-plane resolution. It's just another one of the compromises.

A word about the echo spacing. My scanner will actually permit a minimum echo spacing of 0.43 ms for a 64x64 matrix over a 224 mm FOV. I test at 0.47 ms echo spacing, however, because I observed that my EPI data were too sensitive to electrical power instabilities at the shortest possible echo spacing (see Note 2). Backing off to 0.47 ms eliminates most of the acute power sensitivity yet maintains an aggressive duty cycle and permits me to disentangle other instabilities that could manifest in the ghosts. (Recall that the N/2 ghosts are exquisitely sensitive expressions of EPI quality, as covered here and here.)
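
To make the duty cycle concrete, here is the back-of-envelope timing arithmetic for these parameters. This is illustrative only; the real per-slice overheads for fat saturation, spoilers and gradient ramps are vendor-specific.

```python
# Back-of-envelope timing for the "maximum performance" EPI, illustrative only.
tr_ms        = 2000.0
n_slices     = 40
te_ms        = 20.0
matrix_pe    = 64        # phase-encode lines per slice
echo_spacing = 0.47      # ms

slice_budget = tr_ms / n_slices            # 50 ms available per slice
echo_train   = matrix_pe * echo_spacing    # ~30 ms of EPI readout per slice
readout_end  = te_ms + echo_train / 2.0    # echo train is centered on TE

print(f"per-slice budget : {slice_budget:.1f} ms")
print(f"EPI echo train   : {echo_train:.1f} ms")
print(f"readout finishes ~{readout_end:.1f} ms after excitation")
```

With a 50 ms budget per slice and the readout finishing around 35 ms after excitation, there isn't a lot of slack left for fat saturation and spoiling, which is precisely the point of a stress test.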

In the third and final EPI time series of my Facility QA protocol I test an EPI acquisition representative of a typical fMRI scan. As before, I use a standard (product) pulse sequence:


All pretty basic stuff. Thirty-three slices acquired at an angle reminiscent of AC-PC on a brain, and other parameters as appropriate for whole brain fMRI. As with the first "max performance," or warm-up, time series, I don't actually record anything from this time series. It is inspected for obvious problems only. I've found that analyzing the performance of the second "max performance" time series is generally sufficient to detect chronic problems. Intermittent problems, such as spiking, are addressed in separate, dedicated tests (see below).

Why don't I just do three acquisitions at maximum performance? I could, I suppose, but I prefer to have at least one look at the scanner performing as it does for a majority of my users' scans. It gives me an opportunity to assess the severity of a (potential) problem detected in the earlier time series, hence to make a decision on whether to take the scanner offline immediately, or whether to re-check at a later time and try to minimize the impact on the users.

What isn't tested in my protocol?

It should be clear that exhaustive testing of every parameter is impractical. In the protocol above I am using only two image orientations and two RF flip angles in total, for example. It is quite possible that gradient spiking will show up earliest in one specific image orientation because of the particular way the imaging gradients are being driven. Even testing all the cardinal prescriptions - coronal, axial, sagittal - would increase the total time considerably, yet there's no guarantee that spiking would always be caught (see Note 3).

As for the RF flip angle, if the RF amplifier develops a problem at high power settings and I test only at the relatively low powers used in EPI for fMRI, I may well miss a slowly degrading RF amp. I would hope to catch the degrading performance eventually in the measurements that I do make. Still, if you were especially worried about the stability of your RF amp you could add a high flip angle time series to the tests. You need to determine the priorities for your scanner based on its history and the way it gets used.

Some other things I'm not testing directly: gradient linearity, magnet homogeneity, eddy current compensation, mechanical resonances. Many of these factor into the EPI data that I do acquire, so I'm not completely blind to any of them. My Facility QA protocol is primarily aimed at temporal stability as it affects fMRI data. Your facility may require additional MRI-based tests, e.g. gradient linearity determined on an ADNI phantom. And, of course, your scanner should be getting routine QA performed by the vendor to ensure that it stays within the vendor's specification.


Ancillary data for Facility QA

Now let's shift to considering other data we might record, either because the data could reveal a problem directly or because it might help us diagnose a problem that manifests in the time series EPI data.

These are the fields presently recorded to my QA log:

Date & time of test - Self explanatory. Essential for proper interpretation!

Visual inspection of the penetration panel - Has someone connected an unauthorized device causing an RF noise problem, perhaps?

Visual inspection of the magnet room - Is anything out of place or otherwise obviously wrong?

System status prior to QA - Record whether the scanner was already on, or was started up prior to performing QA. Electronics can be funny that way.

Suite temperature and humidity - I have a desktop monitor that lives in the operator's room. Ideally I'd record in the magnet room with remote sensors, but measuring in the operator's room is a reasonable check on the suite's condition. A consistent temperature in the magnet room is important for general magnet stability. Humidity is critical for proper functioning of gradients in particular. Low humidity may cause spiking, but it can also increase the rate of component failures from static discharges. Furthermore, if you have an electrical equipment room that isn't at a near-constant temperature, e.g. because people go into it frequently, then you will want to measure the temperature of that room separately. RF amplifiers are often air-cooled, so changes in the surrounding air temperature tend to translate into RF amplifier instabilities.

Prevailing weather - I use the weather report from a nearby airfield. It gives barometric pressure, relative humidity, air temp and dew point, and the prevailing conditions (e.g. sunny, cloudy, rain, etc.). If you have a mini weather station of your own that's even better! Lest you think this information is overkill, more than one site has found that their magnet went out of specification when the sun was shining on the MRI suite. Extreme temperature may have direct effects, e.g. via passive shielding or building steelwork, or indirect effects, e.g. high electrical load for your building's air conditioning. Large, rapid changes in barometric pressure may affect magnetic field drift in some magnet designs, too.

(The following data may only be available via a service mode interface. Check with your local service engineer.)

Gradient coil ambient temp - Temperature of the gradient coil (or return cooling water) before commencing QA. The equilibrium temperature is a function of the cooling water temp to the gradient coil and should be consistent.

Gradient coil temps before/after each time series EPI acquisition - Useful to determine if you are generating excess heat, e.g. because of an increased resistance in a gradient circuit, or if the gradient water cooling has a problem, e.g. low pressure or flow rate.

Magnet temperatures - You may be able to record the temperatures of some of the various barriers between the liquid helium bath (at 4 K) and the MRI suite (290-293 K, or 17-20 C). Your scanner vendor is likely monitoring these numbers remotely, but it doesn't hurt to keep a check on things yourself, especially if your site is prone to periods of extreme vibration - earthquakes, passing freight trains - or you have just had someone accidentally stick a large ferrous object to the magnet. Internal magnet temps can be an early indication of a possible quench due to a softening vacuum shield, amongst other things.

Helium level - Another good indication of something going wrong inside the magnet, although with the refrigeration units (cold heads) on modern MRIs the helium level over time may actually be a better indication of the health of your helium recycling than of the magnet per se.

Linewidth - This is the post-shim water linewidth for your QA phantom. If the position of the phantom is reproducible in the magnet bore then the linewidth should be similarly reproducible.

Magnet center frequency (in MHz) - Together with the magnet temp(s) and helium level, relative stability of on-resonance frequency is a good indication of overall magnet health. Changes may occur with weather conditions or suite temperature, however, so be sure to consider all parameters together when performing diagnostics.

Room temp shim values - A phantom placed reliably in the magnet should yield reproducible shim values when an automated shimming routine is used. (Auto-shimming is the default on all modern scanners.) There are eight RT shims on my scanner: three linear shims (i.e. the gradients themselves), X, Y and Z, and five second-order shims, Z2, ZX, ZY, X2-Y2 and XY. Record them all. Changes in the RT shims may indicate that you have a problem with your phantom (a leak?) or the phantom holder, or they could be an indication that the passive shim trays  - thin strips of steel positioned between the magnet and the gradient set - are working loose due to vibration.

Service mode tests - I include the vendor's RF noise check and spike check routines because these are two relatively common problems and I prefer to diagnose them directly, not via EPI data, if at all possible. You may not have permission to run these tests, however. If not, you could either rely on analysis of the time series EPI data discussed above, or add further acquisitions designed to be maximally sensitive to spikes and RF noise (see Notes 3 and 4).

Additional RF coil tests - My 32-channel coil can be tested with a dedicated routine available under the service mode. I don't acquire any EPI data with this coil.

Service/maintenance work log - It is imperative to keep a record of any work performed on the scanner, and to refer to this log whenever you are interpreting your QA records.

Anything else? -  That's rather up to you. Electrical supply data can be very useful if you can get it. I can get minute-to-minute voltages for my (nominal) 480 V supply. I don't bother getting these reports for every Facility QA session we run, but I ask for them if I see anything strange in my test data.
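
One low-tech way to keep all of these ancillary fields together is a single log file with one row per QA session. Here is a minimal sketch; the field names and values are illustrative, not a prescription, and you should adapt them to whatever your site actually records.

```python
# Sketch of a per-session QA log record capturing the ancillary fields above.
import csv
from datetime import datetime

FIELDS = [
    "datetime", "operator", "system_on_arrival", "suite_temp_C", "suite_humidity_pct",
    "weather", "gradient_coil_temp_C", "magnet_temps", "helium_level_pct",
    "linewidth_Hz", "center_freq_MHz",
    "shim_X", "shim_Y", "shim_Z", "shim_Z2", "shim_ZX", "shim_ZY", "shim_X2Y2", "shim_XY",
    "rf_noise_check", "spike_check", "notes",
]

record = {
    "datetime": datetime.now().isoformat(timespec="minutes"),
    "operator": "XX",
    "system_on_arrival": True,
    "suite_temp_C": 20.5,
    "suite_humidity_pct": 45,
    "weather": "clear, 1016 hPa",
    # ... remaining fields filled in at the console ...
    "notes": "nothing unusual",
}

with open("facility_qa_log.csv", "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    if f.tell() == 0:        # write the header only for a brand new file
        writer.writeheader()
    writer.writerow(record)   # fields not supplied are left blank
```

A spreadsheet works just as well; what matters is that every session ends up in one place, in a consistent format, so you can plot anything against anything later on.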


Establishing your baseline

Having determined the data you'll record, it's time to define "normal" for your scanner and its environment. In my experience, six months of data allows me to characterize most of the variations. I want to know what the variance is, but I am also keen to know why it is the way it is.
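
Once a baseline exists, checking a new session against it can be as simple as comparing each metric to its baseline mean and standard deviation. Here is a minimal sketch, assuming the QA log from above with hypothetical metric columns (tsnr, pct_fluct, ghost_level and so on) appended to it; the date cutoff and the 3-sigma threshold are illustrative choices, not rules.

```python
# Sketch of a baseline check: flag any QA metric that strays more than
# 3 standard deviations from its baseline mean.
import pandas as pd

log = pd.read_csv("facility_qa_log.csv", parse_dates=["datetime"])

baseline = log[log["datetime"] < "2014-03-01"]   # e.g. the first six months
latest   = log.iloc[-1]                          # the most recent session

for col in ["tsnr", "pct_fluct", "ghost_level", "linewidth_Hz", "center_freq_MHz"]:
    mu, sd = baseline[col].mean(), baseline[col].std()
    z = (latest[col] - mu) / sd
    flag = "  <-- investigate" if abs(z) > 3 else ""
    print(f"{col:16s} baseline {mu:10.3f} +/- {sd:7.3f}   "
          f"latest {latest[col]:10.3f}   z = {z:+.1f}{flag}")
```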

There are no shortcuts to obtaining a baseline: you have to run your Facility QA as often as you can. If you have a new facility or a new scanner then you probably have a lot of scanner access; your routine users haven't started getting in your way yet. It should be feasible to run once a day, five days a week for at least the first several weeks, then you can begin to reduce the frequency until you are running once a week or thereabouts.

Recently fixed/upgraded scanners should be tested more frequently, in part to check that there are no residual problems but also to redefine the baseline in case it has shifted. More on interpreting the data in the next post.


When to run Facility QA

You have your baseline and you know what normal looks like for your scanner. To science! Except you now have to decide how often to check on your scanner's status to ensure that all remains well. Or, if you're a realist, to determine when something starts to go wrong.

Many people will prefer to have a fixed time in which to run QA. It may be necessary to fix the time slot because of scanner and/or personnel schedules. Is this a good thing? Not necessarily. Some degree of scatter may catch problems that vary with scanner usage, or with time of day. Say you decide to do your Facility QA on a Saturday morning because it's when you have plenty of time. That's fine, but if your building is barely occupied and the electrical load is significantly lower than during working hours Mon-Fri then you may miss an instability that affects midweek scans. So if you opt for a fixed slot for QA, first establish in your baseline measurements that the time of day and the day of the week are insignificant drivers of the variance in the measurements you're making.
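
One way to check this from your baseline log is simply to group the measurements by day of week and time of day and compare the group means with the within-group scatter. A sketch, using the same hypothetical log file and metric names as above:

```python
# Sketch: is day of week or time of day a meaningful driver of a QA metric?
import pandas as pd

log = pd.read_csv("facility_qa_log.csv", parse_dates=["datetime"])
log["weekday"] = log["datetime"].dt.day_name()
log["hour"]    = log["datetime"].dt.hour

# If group means differ by much more than the within-group standard deviation,
# the schedule matters and a fixed weekend slot could hide a midweek problem.
print(log.groupby("weekday")["pct_fluct"].agg(["mean", "std", "count"]))
print(log.groupby(pd.cut(log["hour"], bins=[0, 9, 13, 17, 24]))["pct_fluct"]
         .agg(["mean", "std", "count"]))
```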

If you find that time of day or day of week is a significant factor in your scanner's performance then you may wish to try to rectify the source(s) of the differences first, if this is possible. If not then fixing the day and time of your Facility QA may be required in order to work around the instabilities that are beyond your control. Remember, the point of Facility QA is not to show that the scanner's performance is constant 24/7/365, rather it is to catch changes (usually deterioration) in its performance under fixed test conditions.


Next post in this series: processing and interpreting your Facility QA data.


___________________________


Notes:

1.  If your facility, like mine, uses a customized pulse sequence for routine fMRI acquisitions, resist the temptation to use that sequence for Facility QA. Instead, use one of the product sequences (I use Siemens' ep2d_bold) and then set the parameters as close as you can to what you usually use in your everyday sequence. Why? Because the service engineer is going to want to know what sequence you used when you found the problem you're reporting. They will want to see the problem demonstrated on a standard sequence in case it's a coding issue. (Yes, it really does happen that physicists screw up their custom pulse sequences! ;-) So save yourself the extra time and effort and remain as close to "product" in your Facility QA as you can.

2.  At very short echo spacings the fraction of ramp-sampled data points approaches or exceeds the fraction of sampling that happens along the flat top of the readout gradients. Even tiny shifts of the gradient waveform relative to the ADC periods will yield intense, unstable N/2 ghosts. (See the section entitled "Excessive ramp sampling" in this post.) A common cause of mismatch is the electrical supply at the instant the gradients are commanded to do something. Now, my facility has pretty good electrical power stability these days, but it's not perfect. (I don't have a separate, dedicated power conditioner for the scanner.) So if the voltage on the nominal 480 V, 3-phase supply changes with load elsewhere in the building, these changes pass through to the gradient amplifiers and may be detectable as periodically "swirling" N/2 ghosts. It is actually quite difficult to tie these swirling ghosts to the electrical power stability because other instabilities may dominate, depending on your facility. For example, in my old facility my scanner had its own external chiller comprising two refrigeration pumps that cycled depending on the heat load in the gradient set. When running EPI flat out the pumps would cycle every 200-300 seconds, and this cycling was visible as a small fluctuation with the same period in the EPI signal. But now that I have a building chilled water supply rather than a separate chiller the water cooling is essentially constant (and highly efficient!), revealing the next highest level of instability underneath, which in my new facility is the voltage on the 480 V supply.
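(If you suspect this kind of slow, periodic instability, one way to look for it is the power spectrum of a ghost ROI time course from a long EPI run. A minimal sketch; the input file is a placeholder for whatever ROI time course you extract, and TR and thresholds should be set to match your own protocol:)

```python
# Sketch: look for slow periodic instabilities (e.g. chiller or mains-load
# cycling) in the N/2 ghost time course via its power spectrum.
import numpy as np

tr = 2.0                                          # seconds per volume
ghost = np.loadtxt("ghost_roi_timecourse.txt")    # placeholder, e.g. 200 values

# Remove the linear trend, then look at the power spectrum of what remains.
t = np.arange(len(ghost))
detrended = ghost - np.polyval(np.polyfit(t, ghost, 1), t)
power = np.abs(np.fft.rfft(detrended)) ** 2
freqs = np.fft.rfftfreq(len(detrended), d=tr)     # Hz

# A chiller cycling every 200-300 seconds shows up at roughly 0.003-0.005 Hz.
for f, p in sorted(zip(freqs[1:], power[1:]), key=lambda x: -x[1])[:5]:
    print(f"period {1/f:6.1f} s   fraction of power {p / power[1:].sum():.1%}")
```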

3.  Siemens offers a separate gradient "Spike check" routine that the service engineer can use. If you know the service mode password you can use it, too. I've found that the dedicated routine is hit and miss compared to EPI for detecting spikes, but the difference may simply be due to the amount of time spent testing. If an EPI time series is 6 minutes long there are many opportunities to catch spikes. The service mode spike check runs for only a few seconds (although it does sweep through all three gradients at many different amplitudes). Sometimes it takes many repetitions of the spike check to confirm spikes that I think I've detected in EPI.

4.  In addition to the spike check mentioned in note 3, the vendor will have an RF noise check that acquires periods of nothing, i.e. the receivers are simply opened to sample the environment in the absence of gradient and RF pulses. Different carrier frequencies and bandwidths are tested to span the full range used in MRI acquisitions. If you are unable to use dedicated routines for either spike or RF noise checking then don't despair: test EPI data can be used to check for significant problems. The process becomes heavily dependent on analysis, however, so I'll cover it in future posts on processing your QA data. In my opinion, for catching problems the dedicated routines are preferable for their specificity, sensitivity and speed. The EPI test data can then be analyzed to confirm that all is well, rather than as the primary way to detect problems. I see this as an overlap between Facility QA and Study QA, so I'll revisit it in later posts.
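(For the spike screen specifically, one crude approach to the EPI test data is to watch the background maximum per volume and flag any volume where it jumps far above the run's typical level. A sketch, with placeholder file name, background ROI and threshold:)

```python
# Sketch: crude spike screen from EPI test data when the service-mode spike
# check isn't available. Flags volumes whose background maximum jumps well
# above the run's typical background level. The threshold is illustrative.
import numpy as np
import nibabel as nib

data = nib.load("facility_qa_epi_run2.nii.gz").get_fdata()   # placeholder file

bg_mask = np.zeros(data.shape[:3], dtype=bool)
bg_mask[:8, :8, :] = True                  # a corner of every slice: air only

bg_max = data[bg_mask].max(axis=0)         # maximum background value per volume
med = np.median(bg_max)
mad = np.median(np.abs(bg_max - med))      # robust estimate of typical scatter
suspects = np.where(bg_max > med + 10 * mad)[0]

print("volumes with suspicious background maxima:", suspects)
```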




6 comments:

  1. Any chance you have an article on the measures you compute on the second scan, as you describe here? I noticed you never completed part 4. I'm an MRI tech at a busy research facility. We also have a Trio and I am looking to implement a more rigorous QA protocol. Thank you for your excellent blog.

    Replies
    1. I'm afraid not. I made the protocol up, and trained my techs to do the same thing. It's a quarter-sized ROI in a particular slice of the mosaic view, with a similar-sized ROI in the ghosts and one in background noise. Basically, all that really matters is that the procedure is easily repeatable.

      In the 4th post of the series, I was going to reference most of the fMRI QA articles that you'll find via PubMed, but to me these are "Study QA" because they are generally aimed at quantifying how well the scanner is performing, e.g. for multi-site data pooling, rather than trying to catch scanner issues as early as possible (my Facility QA).

      As for interpretation, I use my Facility QA by inspection, drawing on my experience. And if there's any question about how something is performing, I simply do more tests to either satisfy myself there's no problem, or flag it. All of which is why I've not yet got around to writing that last post in the series. It feels like trying to describe hitting a tennis ball!

      Let me know where you are in your QA planning, perhaps I can shoot a quick video of our QA analysis in action.

    2. Thank you for the quick reply. My current QA is: I have access to the Siemens battery of QA tests, and I run those regularly, including SNR, brightness, long term stability, RFNoise, and spike checks. Typically I check for changes or patterns in the values over time in addition to checking that they pass the manufacturer thresholds. I also regularly check the backgrounds of scans for typical RF patterns and spikes, as you have described in other articles. I have caught a few issues this way. We are also tracked as part of multi-site studies with monthly phantoms. But, as I said, I am looking for something more rigorous, especially as our system ages. I would rather establish my own baseline and thresholds now.

      The Siemens LTS seems to approximate what you describe, but the values do vary quite a bit from day to day and I would like to be able to create my own measurements. Your method is what I had envisioned for analysis. I can likely create a script for that. Are you using MATLAB for this? I am running the fBIRN right now and implemented a protocol as you describe with 3 fMRI runs. I plan to run this every Monday with our 12-channel (fBIRN does not fit in the 32, but I can still run the standard tests on that one - or I run the same protocol on a bottle), in addition to the other tests I have already been running.

      So where I am at is collecting my first custom baseline and now looking to create a script to automate the analysis so I can track this longitudinally. I don't have a custom holder for the fBIRN, but have marked it with indicators to aid in reproducibility.

    3. My protocol is based on EPI because most of what we do is fMRI, with supporting scans. Also, EPI tests the gradients and the overall system thermal stability pretty well. The RF isn't driven all that hard, so I sometimes miss issues with the RF amplifier which first show up in PCASL. So, my first suggestion is to think about which sort of scan(s) to add into your Siemens LTS mix, over and above their measures.

      We don't have scripts for anything, it's all done by hand and evaluated by eye. We just use a spreadsheet, then I can create plots of any time window I fancy. This is far from ideal, but I enjoy looking for problems and don't particularly trust any automated measure: either it misses subtle issues - an algorithm is only as good as the programmer has made it - or, if I set trigger points as low as I might like, I end up having to investigate normal performance anyway. The real time is in acquiring the data; the analysis is rapid by comparison. Scripts become essential, of course, if you are trying to standardize your QA for a reason other than simply catching problems, e.g. normalizing data over time. (Then they become Study QA scripts!)

      Another important point: select your phantom carefully. I have an FBIRN phantom but it is aging and starting to show some shrinkage. That means I don't have an easily replaced signal standard. Bad. Thus, I opted to use one of the Siemens blue bottles because I have several of them, nominally identical, and I can expect to be able to replace them as many times as needed with very small variation. This approach is in conflict with many of the published QA approaches out there that assume a phantom which approximates a head pretty well. I don't care about approximating anything, I simply want a fixed acquisition that will show changes in scanner performance. The bottles are perfect for this.

    4. Very good points and thank you again. I didn't realize the FBIRN could shrink. We do quite a bit of PCASL as well, so I will add that to my tests. The same kind of analysis? How would RF amp issues present in a PCASL test?

    5. My FBIRN phantom is now eight years old, and it's starting to show wrinkles. (Cracks and other interesting internal structure.) I don't mind for sequence testing; the features are useful. But not so good for long-term QA.

      The pseudo-continuous tagging is the most intricate RF scheme used on my scanner. (Highest SAR is probably a T2-FLAIR with SPACE readout.) So any time the control boards on the RF amp sag a little, we start to get failures of the PCASL sequence. I've not added PCASL to QA, but if I was to do so I would probably just run it on a standard phantom and assess the tag & control time series. (They should be identical.) It doesn't test the efficacy of the tag, but that's a slightly different question. So the evaluation of the PCASL data could be done as for my standard EPI tests, at least for EPI-readout PCASL. (I only just received a stack-of-spirals 3D PCASL sequence, still setting that up.)
