Posted by: Chris Cole | September 20, 2015

To score, or not to score…? Suicide risk assessment in ED

On a dark and brooding autumn night, you once again find yourself healing the injured, curing the sick, and reassuring the worried well aboard the good ship TCH ED. Pale, second-hand moonlight spills over the land, which you only notice because you look up on your way past the annex beds when a small whirlwind sneaks in via the ambulance entrance, redistributing a handful of fallen leaves in your direction as an intrepid ACTAS crew seek to deposit yet another patient in the department. You quietly wonder about the current odds they have running on how quickly they can push us to official overcrowding threshold tonight. As you contemplate the attractive ochre hue of the leaves, you overhear the ambos giving handover to the harried-looking triage nurse. The patient is apparently a 34 year old woman who has been brought in at the behest of concerned family, after finding she had taken a bunch of various tablets, and penned a suicide note. She has two young children, and as you wander past, surreptitiously opening the lemonade Icy Pole you just stole from the freezer behind triage, you notice numerous old transverse linear scars on her forearms. You’re pretty sure from the quick list of tablets she’s taken, recited by the ambos to the triage nurse, that she’s in no real medical danger.

  • What are the chances that she might do this again?
  • What are the chances that she’ll actually kill herself, on a subsequent occasion?
  • How might you decide if she’ll be safe to send home, or if she needs admission to hospital?

As is often our wont in emergency medicine, what we’d like to do here is perform some effective risk stratification, to decide on the best course of action, both with regard to what we do now in ED and, predominantly, what we need to do when it comes to disposition and follow-up. Ideally, this should take into account the needs and welfare of the patient, as well as taking a wider view encompassing resource utilisation and the subsequent opportunity cost to others (both within ED and at a hospital-wide level), along with staff safety and so on.

In the best of all possible worlds, our method of risk assessment would be evidence-based, applicable to the patient population we are actually dealing with, robust, and predictive of what is actually likely to happen to the patient in front of us. Do we live in such a world, you might ask? Does such a tool exist? As is often the case in our practice, your weapon of choice will boil down to a selection of gestalt, or a handful of varyingly useful scoring systems or clinical decision rules. Your armoury in this case consists more or less of:

  1. Clinician gestalt / using The Force
  2. SADPERSONS or Modified SADPERSONS score
  3. Manchester Self-Harm Rule
  4. ReACT Risk Assessment Tool
  5. One of a number of bespoke single-centre locally derived tools that are not widely used but have a certain hipster appeal*

( *the use of these is restricted to those who live in Braddon, ride a fixie and have ironically styled facial hair )


All of the above methodologies rest on the assumption that there are a bunch of risk factors for repeated self-harm and completed suicide. More specifically, that there are readily and reproducibly identifiable aspects of the patient’s history and presentation today that yield a consistent likelihood ratio (LR) that, along with a knowledge of the pre-test probability of those outcomes, helps us calculate a post-test or posterior probability of badness that is hopefully more accurately predictive than simply consulting the nearest magic 8-ball.

Lists of risk factors deemed important vary from country to country, textbook to textbook, and can often depend on who you ask, the prevailing winds, and the current phase of the moon. In order of increasing reliability, these risk factors are derived by:

  • Revelation from a deity
  • Vertical memetic transmission (med school, textbook, local psych folk consensus)
  • Logistic regression performed retrospectively on data from a very big sample representative of your ED patients
  • Prospective data from validation studies of a CDR (derived from the aforementioned logistic regression modelling) carried out on a very big sample representative of your ED patients

From amidst the nightmare that is the task of trying to identify the causative relationships amongst myriad interrelated and confounding variables that hint at, or veritably scream, correlation, some of the contenders for risk factors for repeated self-harm and subsequent completed suicide are…

From my medical school psychiatry textbook:

  • Being single / living alone
  • Being male
  • Depression
  • Insomnia (even in the absence of depression)
  • Substance abuse (EtOH and others)
  • Schizophrenia
  • Physical illness (especially if debilitating)
  • Family Hx of suicide
  • Previous attempts (50-80% of completed suicides have had a crack at it before)
  • Seriousness of previous attempt(s)
  • Recent bereavement
  • Unemployment or financial difficulties

From the current ACT Health SOP for Clinical Risk Assessment and Observation (MHAU, AMHU):

  • Suicidal now or Hx of previous suicide attempts or acts of self harm
  • Chronic suicidal thoughts, but with current intent
  • Aggression / violence
  • Delusions, particularly paranoid ones
  • Hallucinations, particularly voices telling them to harm
  • Hx of absconding
  • Poor adherence to medication programs
  • Substance abuse
  • Hx of inappropriate sexual behaviour
  • Cognitive impairment
  • Medical condition

Based on clinical assessment and using the above list as a guide, our mental health staff assign something called an At Risk Category (ARC) to the patient. They are numbered 1 to 4, but there are 5 of them because someone snuck a 2.5 in between 2 and 3 at some point. There are strong suspicions that attempts to renumber the lists 1-5 in all relevant documents using varyingly obsolete combinations of Microsoft Office and archaeologically significant versions of Internet Explorer most likely led to a nervous breakdown on the part of the administrative officer so tasked, and their subsequent admission to hospital themselves with an ARC of 2.5. The ARC level assigned to a patient is used to help determine who gets involved in the patient’s care, how urgently, and particularly the frequency and nature of observations required for that patient while in hospital. This is a little like the Australasian Triage Scale (ATS) categories as they relate to recommended waiting times, and is reproduced below:

ARC Level   Level of Risk    Description of Obs   Frequency of Obs
            Low              General              Once per shift
            Low – Medium     Intermittent         1 hour
            Medium           MHAU Obs             30 mins
            Medium – High    Close                15 mins
            High             Special              Constantly stared at




The SADPERSONS score is the perennial favourite that most of us grew up with at medical school, or learned to know and love early in our careers as junior doctors. It comes in two flavours: Original and Hot & Spicy…um… “Modified”. The original score comprises 10 items worth 1 point each, and produces 3 tiers of risk: Low (0-4), Medium (5-6) and High (7-10). The modified version has some slightly different criteria, introduces weighting for some items (some being worth 1 point, and some 2 points), and also gives you a Low / Medium / High risk answer at the end of the day. You can check it out in all its publicly edited glory here:

A study by Bolton et al published in 2012 in the Journal of Clinical Psychiatry had a look at the clinical performance of this scoring system (both original and modified) for risk-stratifying people who present to ED with self-harm, to see whether it could usefully predict which of them were likely to do it again. Their sample was every patient presenting with self-harm to two tertiary EDs in Manitoba, Canada, from 2009-2010. They managed to collect 4,019 patients with self-harm, 566 of them deemed to have had a bona fide crack at actually killing themselves, then followed them up at 6 months to see who’d had a repeat episode. Sensitivity of non-low-risk categorisation for the original and modified scores was 19.6% and 40% respectively, with PPVs of 5.3% and 7.4%. The Receiver Operating Characteristic (ROC) curve was almost a 45-degree straight line, with an area-under-the-curve (ROC-AUC) or c-statistic of a whopping 0.572. Keeping in mind that a coin toss yields an ROC-AUC of 0.500, this is not exactly a vote of confidence for the SADPERSONS risk assessment tool.


The Manchester Self-Harm Rule (MSHR) is a risk assessment tool derived in… wait for it… the UK, with the hope of providing something more clinically useful than the SADPERSONS score. It is a simple 4-item list, and a positive response to any of the 4 items flags you as non-low-risk. The 4 criteria are:

  • History of self-harm
  • Previous psychiatric Rx
  • Current psychiatric Rx
  • Benzodiazepine use in this attempt

Cooper and company embarked on this quest to build a better mouse trap, publishing in 2006 in the Annals of Emergency Medicine. They used data from patients presenting to 3 large EDs as their derivation cohort, and applied that to a validation cohort from 2 other EDs, involving a total of 9,086 patients. For repetition of self-harm (including completed suicide), the sensitivity was 94% and specificity 25%.

Realising that prospective validation data is perhaps a little more solid, the same group published a 6-month follow-up in 2007 in the Emergency Medicine Journal, and this time also included a comparison between performance of the MSHR and clinical gestalt. Not much changed in terms of the rule’s functionality. Sensitivity, specificity, PPV and NPV were all about the same as they were in the original study. Interestingly, clinician gestalt had higher specificity at 38%, compared to ~26% for the MSHR.


Not satisfied with the craptacular performance of the then-available risk assessment tools, Steeg and his mates (including the Cooper of Manchester fame) set out to Find A Better Way™, and in 2012 published a huge cohort study in Psychological Medicine, resulting in the ReACT Self-Harm Rule (RSHR). They collected data for 18,680 patients presenting with self-harm to 5 large EDs in the UK. Paying ancestral homage to their minor author (Cooper was listed last in the 2012 paper), they used a 4-question system as well:

  • Self harm in the last year
  • Live alone or homeless
  • Cutting involved
  • Current psychiatric Rx

This rule had a sensitivity of 95%, and specificity of 21% for predicting repeat self-harm or completed suicide in the next 12 months.


As well as the abovementioned scoring systems which have had their time on the red carpet of psychiatric academia, there are also a few other B-grade actors on the stage. Studies that some people in mental health might think look a bit familiar… they’ve seen them in something, yeah… but can’t quite remember their name, or what they’re doing now.

Bilen 2013, Emergency Medicine Journal – This was a Swedish study involving 1,524 consecutive patients in a large tertiary ED. The Swedish chefs concocted their own special blend of logistically regressed risk factors and in a veritable My Kitchen Rules of acute psychiatry, pitted it head to head against the MSHR. Following up to see who had repeat self-harm (including completed suicide) at 6 months, they found that in their cohort of patients, the MSHR had sensitivity and specificity of 89% and 21% respectively, while the Swedish secret sauce produced not terribly dissimilar numbers at 90% and 18%.

Tran 2014, BMC Psychiatry – These guys took a derivation cohort of 4,911 ED self-harm patients, measured the performance of clinician gestalt, cooked up a score based entirely on previous medical history (i.e. only data they could mine from a patient’s prior records; if they hadn’t presented to their hospital system before, their score could not be calculated) and then compared their performance on a validation cohort of 2,488 patients, looking for recurrence at 3 months follow-up. Clinical gestalt produced an underwhelming ROC-AUC of 0.58, and their medical records data mining algorithm managed a respectable c-statistic of 0.79.


Having toured the panoply of ostensibly potentially useful risk assessment tools at our disposal, I feel it’s time to add the eggs, or the Force, depending on your inclinations… that which binds it all together… and gives us some semblance of a pragmatic answer to the question: “So, what should I actually do with my patients in ED??”

In this instance, the special ingredient required to help pull something useful from the maelstrom is provided by the Australian Bureau of Statistics, in the form of data detailing the actual incidence of suicide in our population, enabling us to make reasonable estimations of pre-test probability, and thence gauge the utility of our risk assessment options.

In the most recent year for which complete statistics are available, 2012, the suicide rate in Australia, expressed in various ways (all per year) was:

Females    5.6 per 100,000    1 in 18,000    0.006 %
Males     16.8 per 100,000    1 in  6,000    0.017 %
Average   11.2 per 100,000    1 in  9,000    0.011 %


How does that compare to the risk of repeat episodes in patients who present to ED with self-harm? A systematic review of 90 studies conducted by Owens et al in 2002 and published in the British Journal of Psychiatry found the rate of repeated self-harm in the next twelve months was about 16%, while the rate of completed suicide in the same period was 2%. The 9-year mortality was around 7%. A more recent review by Carroll et al published in PLOS One in 2014 found similar figures with a one-year fatality risk of 1.6% and a 5-year mortality of 3.9%. Furthermore, they also found that these recurrence rates (both for self-harm in general, and completed suicide) were consistent across a broad time period. A total of 177 studies from 1976 to 2012 were included, with no real difference seen between the morbidity and mortality statistics from the 1980s versus the 2000s.

So we have the following pieces of information:

  • The background risk of suicide (per year) is 1 in 9,000
  • The risk of suicide (per year) in ED patients with self-harm is 1 in 50-60
  • This is effectively a LR+ = 180 simply by virtue of turning up in our ED
  • Those who score high-risk on a stratification tool have a LR+ = 1.2 – 1.3 (PPV 2 %)
  • Those who score low-risk on a stratification tool have a LR- = 0.25 (NPV 95 %)

In the context of a patient presenting to ED with self-harm:

  • Without scoring them, the 1-yr risk of suicide is 2 %
  • Scoring non-low, the 1-yr risk of suicide is 2 %
  • Scoring low risk, the 1-yr risk of suicide is 0.5 %
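
Those three figures come from turning the pre-test probability into odds, multiplying by the likelihood ratio, and turning the result back into a probability. Here is a minimal sketch of that arithmetic in Python, assuming the round numbers quoted in this post (2 % pre-test risk, an LR+ of 1.25 as the midpoint of 1.2 – 1.3, and an LR- of 0.25). The function and constant names are purely illustrative.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_test_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Assumed inputs: the round numbers quoted in this post.
PRE_TEST = 0.02      # 1-yr suicide risk after an ED self-harm presentation
LR_NON_LOW = 1.25    # midpoint of the quoted LR+ range of 1.2 to 1.3
LR_LOW = 0.25        # quoted LR- for a low-risk score

risk_non_low = post_test_probability(PRE_TEST, LR_NON_LOW)
risk_low = post_test_probability(PRE_TEST, LR_LOW)

print(f"Scoring non-low: {risk_non_low:.1%}")
print(f"Scoring low:     {risk_low:.1%}")
```

With these inputs a non-low score nudges the risk from 2 % to roughly 2.5 %, which rounds back to the 2 % quoted above, while a low score takes it to about 0.5 %.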


Keep in mind that of patients presenting to ED with self-harm, the vast majority will not score low-risk on these tools. A generous estimate is that around 10% will do so. Let’s imagine 6 patients a day present to ED with self-harm:

  • 2,000 patients present per year to ED with self-harm
  • 40 of them will kill themselves in the next year
  • 200 of them will score as “low-risk”
  • 1 of those 200 will still die in the next year, compared to 1 in 9,000 in Australia

With such incredibly bad specificity, and subsequently an extremely poor PPV (i.e. these scores are absolutely useless as a rule-in test), we’re really only looking for a tool’s utility in identifying those who are safe (or at least safer) than the rest to discharge; we want to use it as a rule-out test, a bit like an XDP. Let’s take a simplistic approach and imagine that we will admit everyone that scores high risk, and discharge those who score low risk, and that admitting a patient is 50% effective in preventing completed suicide for the coming year:

  • We see 2,000 patients
  • 200 of them score low risk and are sent home
  • 1,800 score high risk and are admitted
  • We use 1,800 hospital beds
  • 18 admitted patients go on to die
  • 1 discharged patient dies

Now imagine we just admit everyone:

  • We see 2,000 patients
  • We use 2,000 hospital beds
  • 20 patients die

So, implementing a risk stratification tool lets us save 200 hospital beds and perhaps* 1 life per year (that is, per 2,000 patients we see), but 5% of the deaths that do occur will be in the group we sent home.

Now imagine mental health are absolute legends, and 90% of those admitted who were going to kill themselves survive at least a year. Using a scoring system:

  • We see 2,000 patients
  • 200 of them score low risk and are sent home
  • 1,800 score high risk and are admitted
  • We use 1,800 hospital beds
  • 4 admitted patients go on to die
  • 1 discharged patient dies

Versus just admitting everyone:

  • We see 2,000 patients
  • We use 2,000 hospital beds
  • 4 patients die

We’re still saving 200 beds, but now 20% of the deaths that do occur will be in the bunch we sent home. Paradoxically, by implementing a scoring system as a rule-out test in this way, we have also managed to kill an extra patient per year!! Yep, really. In the immortal words of some beloved 1980s cultural icons… “I love this plan. I’m excited to be a part of it. Let’s do it!!” …or… maybe not.

“Bollocks!” I hear you cry. “There is”, you say, “an error in his maths!!”. Well, maybe, but if you don’t believe me, extend this little gedankenexperiment to examine the boundary condition of mental health inpatient admission being 100% effective for preventing completed suicide in the next year. If we admit everyone, no-one dies. If we employ a score and discharge the 200 low-risk punters, 1 of them still dies because there is a small but non-negligible risk of a false negative and we miscategorised the poor bugger. This is essentially the guy you discharged after a negative XDP who then died of the PE you missed.
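
If you would rather check the sums than take them on faith, the thought experiment is easy to sketch in Python. This is only a rough model under the same simplifications used above: 2,000 patients a year, 10 % scoring low-risk, a rounded 2 % risk in the admitted group, 0.5 % in the discharged group, and admission preventing a fixed fraction of deaths. The function name and parameters are made up for illustration, and the refinement described in the footnote (a slightly higher risk in the admitted group) is not modelled.

```python
def deaths_per_year(effectiveness: float,
                    patients: int = 2000,
                    low_risk_fraction: float = 0.10,
                    baseline_risk: float = 0.02,
                    low_risk_risk: float = 0.005) -> tuple[float, float]:
    """Return (deaths when using a score, deaths when admitting everyone).

    effectiveness is the assumed fraction of would-be suicides that an
    admission prevents over the following year.
    """
    low_risk = patients * low_risk_fraction
    admitted = patients - low_risk

    # Strategy 1: discharge the low scorers untreated, admit the rest.
    # As in the post, the admitted group keeps the rounded 2 % risk.
    with_score = (admitted * baseline_risk * (1 - effectiveness)
                  + low_risk * low_risk_risk)

    # Strategy 2: admit all-comers.
    admit_all = patients * baseline_risk * (1 - effectiveness)
    return with_score, admit_all


for eff in (0.5, 0.9, 1.0):
    scored, everyone = deaths_per_year(eff)
    print(f"effectiveness {eff:.0%}: "
          f"score {scored:.1f} deaths, admit-all {everyone:.1f} deaths")
```

Under these rounded assumptions the two strategies break even when admission is 75 % effective; above that, discharging the low scorers costs more lives than it saves.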

This illustrates why simple statistical characteristics of a test or a rule should never be considered in isolation. In the same way that sensitivity, specificity and LR inform our decision making, but are much, much more useful once we know the prevalence or pre-test probability (giving us the more clinically relevant PPV and NPV), it’s worth remembering that the ultimate utility of a test, or scoring system in this case, also depends on externalities that are often overlooked. In this particular case, while the basic characteristics of sensitivity and NPV look promising as a rule-out test, the fact is that the more effective the treatment for the pathology is, the worse the test performs at the end of the day. This would be of only academic interest, but in this instance, the tipping point of doing more harm than good (i.e. killing more patients than we would by simply declaring all-comers high-risk) falls well within the spectrum of reasonable real-world approximations for the externalities involved.


  • The population risk of suicide in the coming year is about 1 in 9,000
  • The risk in punters with self-harm seen in ED is 180x that, around 1 in 50 (2 %)
  • Scoring high risk on a stratification tool has zero predictive value; still 1 in 50 (2 %)
  • Scoring low risk lowers that to 1 in 200 (0.5 %)
  • Very few of our patients score low-risk, which means:
    • We save very few, none, or indeed lose more lives by using a scoring tool depending on how effective inpatient mental health treatment is
    • We reduce hospital bed/admission utilisation by around 10% vs an “admit all” strategy



  • The biggest risk factor for completed suicide in the next year is the fact the patient is in your ED now with self-harm or ideation. LR = 180
  • No scoring system can predict who is at higher risk than this baseline 1 in 50 risk
  • Using a scoring system can reduce inpatient bed use by a small amount but has minimal impact, and possible deleterious effect, on morbidity and mortality in this population


(*Footnote: By using a scoring tool to exclude the lower risk patients from admission, the admitted group necessarily have a higher risk of completed suicide than the undifferentiated all-comers group, with concomitantly more deaths in the admitted cohort. I rounded numbers to keep it neat here, but the stated benefit of saving 1 life per year or 2,000 patients is generous, as there will be more inpatient group deaths to offset that saving in the low-risk group. That is, even when the externalities (effectiveness of inpatient psychiatric intervention) are conducive to making a scoring tool look good, it still doesn’t.)

Early last year the folks at Boehringer-Ingelheim (who manufacture and sell tenecteplase) sponsored the STREAM trial, which I have previously discussed in detail here. Recapping the greatest hits:

  • n = 939 + 943 randomised interventional trial
  • Inclusion = Adults with STEMI presenting < 3 hrs but can’t get to PCI < 1 hr
  • Group 1 = tPA immediately and then delayed PCI (median 17 hrs)
  • Group 2 = PCI as soon as possible (median 3 hrs)
  • Outcome measure = cardiac death or MACE at 30 days
  • No difference in these outcomes between the two groups
  • 5 x rate of ICH in tPA group
  • 2 x rate of CABG required in tPA group
  • 36% of tPA group required immediate rescue PCI in order to achieve the “equal” outcomes

The one-year follow-up looks at mortality from cardiac and all causes a bit further down the track:

  • Follow-up was pretty damn good – 99.2% and 99.3% of patients in each group were included
  • All-cause mortality = 6.7% (tPA) vs 5.9% (PCI)
  • Cardiac mortality = 4.0% (tPA) vs 4.1% (PCI)
  • Altogether only 63 people died in the follow-up period
  • 42 of them died of non-cardiac causes – 25 (tPA) vs 17 (PCI)
  • In the 30-day to 1-year period, non-cardiac deaths were 13 (tPA) vs 7 (PCI)
  • 9 died of stroke or ICH (tPA) vs 4 (PCI)
  • 5% absolute increase in risk of death from ICH if over 75 yrs old & got tPA*

That last point is interesting. During the early stages of the trial, they noticed an excess mortality from ICH in patients > 75 yrs of age in the tPA group. After the first ~20% of patients were enrolled, they amended their protocol such that those people > 75 yrs old received a half-dose of tPA instead of the standard dose. The difference was about 5% (absolute mortality risk, not relative) and I have included the survival curves for “before and after” the protocol amendment below:


So if you’re over 75, have a STEMI and someone gives you tPA for it, you’re over twice as likely (10% vs 5%) to drop dead of an ICH if they don’t adjust the dose. Handy to know.

At the end of the day, mortality from cardiac causes was indeed pretty much exactly the same, though it’s worth remembering, again, that 36% of the tPA group required urgent rescue PCI to reach this equivalency of outcome.

It’s the mortality from non-cardiac causes that catches my attention, however. There is very little attention drawn to it in the paper; it is reported (briefly) in the results section, but the authors focus much more on the all-cause, overall mortality (which was closer to equal, though there was a trend for greater mortality in the tPA group) and cardiac mortality (which was the same). Far be it from me to suggest the presence of any impropriety or bias, but…


Overlooked as they are by the authors, pulling the actual numbers for non-cardiac deaths out of the paper is instructive:

  • Non-cardiac deaths in the tPA group = 25+13 = 38 / 944 patients = 4.03%
  • Non-cardiac deaths in the PCI group = 17 + 7 = 24 / 948 patients = 2.53 %

This equates to:

  • 40 deaths per 1,000 patients in the tPA group
  • 25 deaths per 1,000 patients in the PCI group

Given the power of this study, that is a significant difference. Statistically significant, and therefore likely to be “real”. And, subsequently and far more importantly, clinically significant / relevant. We hang our metaphorical hats on treatment effects and adverse effects of that magnitude (and smaller) every day (have a look at the size of the benefit margin for TXA in trauma, tPA in STEMI (where PCI is unavailable), aspirin in cardio/cerebrovascular disease, etc… you may or may not be surprised).


For a remote STEMI patient that can get to the cath lab within around 3 hours:

  • No additional benefit in thrombolysing them before/during transport
  • Thrombolysis has only a 74% chance of achieving reperfusion compared to PCI
  • If you do thrombolyse the old folk > 75 yrs, halve the dose or cop double the risk of lethal ICH (10% vs 5%)
  • Thrombolysing & delaying PCI results in 15 excess non-cardiac deaths per 1,000 patients treated at 1 year

The original (30 day follow-up) STREAM trial can be found in the NEJM here:

The latest (1 year follow-up) paper can be found in Circulation here:

Posted by: Chris Cole | July 24, 2014

SAH WARS – Episode V – The Red Cells Strike Back


A long time ago in an ED far, far away…


Welcome to another thrilling chapter of the SAH saga!  For those that haven’t been keeping up, here’s the story so far:

  • Clinical decision rules for SAH really suck.
  • Plain CT-brain is really good at finding SAHs.
  • We currently tend to do LPs on patients with headaches suspicious for SAH but who have a negative plain CT-brain.
  • The CSF is analysed for cells, and for xanthochromia.
  • The method used for detecting xanthochromia (visual inspection) is extremely unreliable.
  • Spectrophotometry for xanthochromia would be much better; our lab doesn’t do it, but will if we really push for it.
  • CT-angiogram (CTA) is really good at finding cerebral aneurysms.
  • CT + CTA leads to, at worst, no more extra or unnecessary DSAs (and subsequent harm) than CT + LP.
  • CT + LP misses at least 4 times as many actual SAHs as CT + CTA.

So, doing an LP for the purposes of trying to detect the presence or absence of xanthochromia, by visual inspection, is not supported by currently available evidence, and is actively derided and advised against both by reputable researchers in the field, and in official guidelines. (See my previous post for details). But, the minions in the laboratory don’t just wave the tube in front of a light and then shake their Magic 8-Ball to decide whether to report it as yellow or not… oh no… they also, quite helpfully, have a look at the CSF under the microscope, and promptly report a RBC and WBC count.

Dr HeWhoShallNotBeNamed (awesome ED boss) has pointed out that even if one is not a fan of xanthochromia, perhaps the RBC count in the CSF can assist in making or refuting the diagnosis of SAH. And, let’s face it, it’s pretty hard to argue with the logic of looking for blood in the subarachnoid space to answer the question “Is there blood in the subarachnoid space?”.

But, while we may not quite argue with that logic, we shall, I’m afraid, embark on a quest to at least sit down with it over a coffee and have a bit of a robust chat about the situation.

The primary dilemma here is that not all CSF red blood cells are created equal. Some of them were already there when we turned up, and some of them are interlopers, introduced by way of our need to jam a nice long needle through several layers of the patient’s back in order to get to their CSF. In terms of helping make the diagnosis of SAH, it’s the CNS RBCs with native title that we’re after, not the ones that rode in via our needle at the 11th hour. So how do we distinguish between a traumatic tap and a bona fide SAH?

Before delving into the murky waters of looking at the “falling cell count” method of comparing the number of RBCs in the first and last tubes, I want to take a simplistic approach to the situation, in order to outline the pertinent real-world numbers that will form the basis of a more nuanced assessment that takes into account evidence showing that Dr HeWhoShallNotBeNamed is right; the RBC analysis does make a difference to our diagnostic process. But is that difference big enough? Let’s find out…

Dichotomous RBC count approach

So what happens if we decide that < 5 RBCs in the CSF is negative for SAH, and anything greater than that is a possible SAH that requires further investigation? (This is the normal range / upper limit used in clinical laboratories):

  • 1,000 patients present to ED with a headache suspicious for SAH
  • 100 of them have a SAH (prevalence / pre-test probability in this population is ~10%)
  • 98 of them (at least) will have a +ve plain CT and head off down the DSA/neurosurgery rabbithole
  • 902 of them have a -ve plain CT and go on to have an LP
  • 2 of them have a SAH
  • 180 of them have a traumatic tap performed (typical rate of ~20%)

So out of the ~182 people with a positive LP (based on RBC count), 2 of them really have a SAH, and 180 of them have a false positive due to traumatic tap. This gives a PPV of 2/182 = 1.1%. What does that mean? It means you have to stab around 500 people in the back to find 1 SAH, and in the process you’ll misdiagnose 99 of them with a SAH and send them off for a DSA they don’t need.
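
For anyone who wants to play with the inputs, the same back-of-envelope sum can be sketched in Python. The assumptions are the simplistic ones above: plain CT finds 98% of SAHs and has no false positives, every CT-negative patient gets an LP, every missed SAH has a raised RBC count, and 20% of taps are traumatic. The function name and dictionary keys are invented for illustration.

```python
def lp_yield(patients: int = 1000,
             sah_prevalence: float = 0.10,
             ct_sensitivity: float = 0.98,
             traumatic_tap_rate: float = 0.20) -> dict[str, float]:
    """Rough yield of LP after a negative plain CT, using RBC count alone."""
    sah = patients * sah_prevalence              # true SAHs in the cohort
    ct_positive = sah * ct_sensitivity           # SAHs picked up on plain CT
    lp_done = patients - ct_positive             # CT-negative patients who get an LP
    true_pos = sah - ct_positive                 # SAHs missed by CT, found on LP
    false_pos = lp_done * traumatic_tap_rate     # traumatic taps read as positive

    return {
        "lp_done": lp_done,
        "ppv": true_pos / (true_pos + false_pos),
        "lps_per_sah_found": lp_done / true_pos,
        "false_pos_per_sah_found": false_pos / true_pos,
    }


result = lp_yield()
print(f"PPV of a positive LP: {result['ppv']:.1%}")
print(f"LPs per SAH found: {result['lps_per_sah_found']:.0f}")
print(f"False positives per SAH found: {result['false_pos_per_sah_found']:.0f}")
```

Run with the defaults, this gives a PPV of about 1.1%, roughly 450 LPs per SAH found and around 90 false positives per true positive, which is the same ballpark as the rounder figures quoted above.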

As advertised, this is an overly-simplistic approach, but serves to highlight the ballpark figures to be found when using this investigation in a real ED population. Unless we can do something miraculous to improve the situation by considering the specifics of the RBCs we find, this is very likely, a priori,  to be a fairly crappy test for our purposes.

So… are you feeling lucky?  Let’s look for that miracle…

Quantifying the RBC count and the clearance between CSF tubes

There are essentially two parameters considered to be potentially useful in determining whether the blood you found in the CSF represents a SAH, or a bad day with the needle:

  • Absolute RBC count (ARC)
  • Falling RBC count (FRC)

The rationale for the ARC is that one might sensibly expect the RBC count to be higher if a bloody great cerebral aneurysm has just gone KABOOM and pumped a reasonable quantity of blood into the subarachnoid space before anything started to clot, in comparison to the relatively small quantity of blood that might eke its way in along your needle tract.

The logic of the FRC as a discriminator is that the influx of RBCs in a traumatic tap should not be ongoing, and the initial flow of CSF should flush most of the blood out of your needle into the first tube, with subsequently lower RBC counts in the later tubes, particularly the last one (which may be the 3rd or 4th, depending on where you work and which kit you opened). Most commentators suggest a drop of 25 or 30% between the first and last tubes is indicative, or at least strongly suggestive, of a traumatic tap as the source of the RBCs.


The best analysis of the predictive power of the ARC comes from a 2013 study by Amanda Czuczman et al. By virtue of the wonderfully capitalistic private medical system which prevails in the Commonwealth of Massachusetts, the authors were able to retrospectively identify 4,496 consecutive ED patients who’d been billed for an LP, then sift through those to narrow it down to those presenting with headache to the ED who were investigated for SAH. This left them with 280 patients, 26 of whom had a SAH. They had a look at the RBC count in the last tube and found a relationship that is most succinctly tabulated as follows:

  • 0 < RBCs < 100              LR = 0
  • 100 < RBCs < 10,000      LR = 1.6
  • RBCs > 10,000               LR = 6.3

[ Czuczman A et al. 2013 Acad Emerg Med 20(3):247-56. ]
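
To get a feel for what those likelihood ratios do to an individual patient, here is a rough Python sketch that applies them to a pre-test probability. As an assumption for illustration only, it uses the study’s own prevalence (26 SAHs in 280 patients) as the pre-test probability; your own CT-negative population may well sit much lower. The names are hypothetical.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_test_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Assumption: pre-test probability set to the prevalence in the study cohort.
STUDY_PREVALENCE = 26 / 280

# Likelihood ratios by last-tube RBC count, as tabulated above.
ARC_LIKELIHOOD_RATIOS = {
    "< 100": 0.0,
    "100 to 10,000": 1.6,
    "> 10,000": 6.3,
}

post_test = {
    band: post_test_probability(STUDY_PREVALENCE, lr)
    for band, lr in ARC_LIKELIHOOD_RATIOS.items()
}

for band, prob in post_test.items():
    print(f"RBCs {band}: post-test probability of SAH {prob:.1%}")
```

Starting from about 9%, the middle band moves the probability to roughly 14% and the top band to roughly 39%, so even a very bloody last tube leaves plenty of room for doubt.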

Another group, led by Julie Gorchynski, looked at the same thing back in 2007 in sunny California. They identified 299 ED patients with suspicious headaches who were worked up for possible SAH, had negative plain CT scans, and copped an LP for their trouble. These comprised 288 cases which were finally determined to be traumatic taps, and 11 real SAHs.

Mean RBC counts for tube 1 were:

  • 6,763 for traumatic taps
  • 399,277 for SAH

Mean RBC counts for tube 4 were:

  • 443 for traumatic taps
  • 307,700 for SAH
That looks pretty handy as a discriminator, but keep in mind these are average statistics for the two groups. Looking at the individual breakdown, while a (very) strong trend remains, there is enough overlap to give one pause for thought. For example, 18% of SAHs had a RBC count < 5,000 in tube 4. No patients with traumatic tap had a RBC > 10,000 in this study, but 27% of SAHs had a RBC count < 10,000.
The moral of the story is that a very high RBC count (> 10,000) is probably useful as a rule-in test, but using that threshold as a cut-off to rule out SAH is not an option, as you’ll miss ~25% of them. Delving back into the data from the Czuczman 2013 paper, we also find that 16/194 = 8% of traumatic taps had RBC counts > 10,000, so even wielding the ARC as a “rule-in” test has distinct limitations. The Czuczman study did demonstrate a fairly convincing LR of 6.3 for >10,000 RBCs in the final tube, though, and the predictive power of finding a whole bunch of blood in the CSF cannot and should not be dismissed.


Traditionally it was taught that a drop of more than ~25% in the RBC count between the first and last tubes is strongly suggestive of a traumatic tap, while a more consistent RBC count supports the diagnosis of SAH. It should be noted that this is essentially a consensus opinion, and not based on any actual experimental evidence. Smaller case series exist, for example a paper by Heasley et al. in 2005 published in the American Journal of Neuroradiology reviewed 22 cases of CT-negative severe headaches, found 8 of them had a SAH, and 2 of those SAH patients had >25% clearing of RBCs between the first and last tubes. The two papers I have chosen to cite in this article are the first to properly investigate this question in a rigorous manner, hence their inclusion. The 2013 paper by Czuczman’s group is larger, more rigorous and both designed and reported in a more ED clinician-orientated manner, and consequently the most applicable to our clinical practice.

The Gorchynski group’s results are still informative, however. They found that the average fall in RBC between the first and last tubes was 82% in those with traumatic taps, versus 9% in those with SAHs. Again, it must be kept in mind that these are averages, the numbers involved are small, and essentially when trying to make generalised conclusions from data like this, we’re effectively attempting to fit a line or curve through a bunch of sparse, and sometimes very outlying, points. They did find that none of their 11 SAH patients had > 30% RBC clearance across tubes.

The Czuczman group found no useful predictive power in looking at the absolute drop in RBCs across tubes (the ROC curve is a 45 degree straight line, and a perfect example of a completely useless or “coin toss” test). The % drop across tubes, however, was quite predictive, though again not perfect. The sweet spot (maximum diagnostic performance for a dichotomous cut-off) was at 63%, giving an ROC AUC of 0.84 (not too bad at all). The likelihood ratio (LR) for SAH if you had < 63% RBC clearance across tubes was 3.6 (95% CI 2.7-4.7), and only about 0.1 if you had > 63% clearance. This becomes even more impressive, and useful, if one gets excited and combines the absolute number of RBCs with the % drop across tubes, yielding a LR of 24 (7-82) if the patient has both > 10,000 RBCs in the last tube, and a <63% drop in RBCs across tubes. This constitutes what pathologists refer to in private as a Pretty-Good-Test(tm).
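For anyone who wants to play along at home, the two Czuczman criteria can be sketched in a few lines of Python. This is only an illustration, not anything the authors published: the function names are made up, and the example inputs are simply the Gorchynski group means quoted above, borrowed as plausible-looking numbers.

```python
def percent_clearance(first_tube_rbc: float, last_tube_rbc: float) -> float:
    """Percentage fall in RBC count from the first tube to the last."""
    if first_tube_rbc <= 0:
        raise ValueError("first tube RBC count must be positive")
    return 100.0 * (first_tube_rbc - last_tube_rbc) / first_tube_rbc


def czuczman_criteria(first_tube_rbc: float, last_tube_rbc: float) -> dict:
    """Flag the two findings that pointed towards SAH in the Czuczman data.

    Thresholds are the ones quoted in the text: more than 10,000 RBCs in
    the last tube, and less than 63% clearance between first and last tubes.
    """
    clearance = percent_clearance(first_tube_rbc, last_tube_rbc)
    return {
        "clearance_pct": round(clearance, 1),
        "high_final_rbc": last_tube_rbc > 10_000,
        "poor_clearance": clearance < 63.0,
    }


# Illustrative inputs only: Gorchynski group means for tube 1 and tube 4.
traumatic = czuczman_criteria(6_763, 443)
sah = czuczman_criteria(399_277, 307_700)
print("traumatic tap means:", traumatic)
print("SAH means:          ", sah)
```

Note that the clearance of the group means (roughly 93% and 23%) is not the same thing as the mean of each patient's individual clearance (the 82% and 9% that Gorchynski reported), which is one more reason to be wary of reading too much into averages.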


Okay, so we have this apparently highly predictive test for SAH that requires an LP which, admit it, is kinda fun to do (“Hello everyone. My name is Chris, and I’m a procedure junkie”…). This is AWESOME! (Bonus points if you have seen The Lego Movie and can make it through this paragraph without singing “Everything Is Awesommmmme!” in your head). But alas, as usual, medicine (as with life) is not always as simple as we would like it to be. At the risk of inducing narcoleptic tendencies in those who have made it this far, I feel it is worth re-iterating some basic facts about what it is we do all day when we’re on the floor. We gather information about our patients, and use that information, often at a subconscious level, to modify and refine the probabilities we mentally assign to each of the differential diagnoses we are considering for the patient in question. We often don’t think about this process explicitly, or at least not with numbers attached, with the exception of certain diagnostic pathways, and the Wells/Geneva-XDP-scan decision flowchart is probably the best example of this. While I am not advocating replacing us all with computers implementing a Bayesian approach to diagnosis, there are times when our off-the-cuff generic vibe about what a test (or an examination finding) means can be very different to what it really does mean in real life. In such circumstances, sitting down and explicitly nutting out the numbers can be quite informative. This is one of those times.

Super-dooper important take-home point:

  • Sensitivity, specificity and LR (derived purely from the former two values) are intrinsic properties of the test
  • PPV and NPV are what we need to know about clinically, and they depend on all of the above plus the pre-test probability or prevalence
  • Sometimes an intrinsically crap test can still be a clinically useful discriminator
  • Sometimes an intrinsically good test can be a lousy discriminator when applied clinically
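A quick sketch of why that is, using the textbook definitions. The 90% sensitivity and specificity below are entirely hypothetical numbers picked to make the arithmetic obvious; they don't belong to any real test discussed in this post.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """LR+ and LR- depend only on the test itself."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple:
    """PPV and NPV depend on the test AND on the pre-test probability."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test: 90% sensitive, 90% specific.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.90)

# Same test, two different populations.
ppv_common, npv_common = predictive_values(0.90, 0.90, prevalence=0.10)
ppv_rare, npv_rare = predictive_values(0.90, 0.90, prevalence=0.002)

print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
print(f"prevalence 10%:  PPV = {ppv_common:.1%}, NPV = {npv_common:.1%}")
print(f"prevalence 0.2%: PPV = {ppv_rare:.1%}, NPV = {npv_rare:.1%}")
```

The likelihood ratios don't budge between the two populations, but the PPV falls from 50% to under 2% when the disease becomes rare.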
You may recall earlier I did some back of the envelope calculations with a simplified dichotomised RBC count discriminator for diagnosing a SAH on LP. It had a pretty bad PPV (~1%). Let’s see what happens when we apply analysis of RBC count in the last tube, and the % clearance across tubes to our ED population presenting with suspicious headaches, both before and after CT. I’m going to use the LRs from the Czuczman study, because they’re the best we have.
When the CT scanner is broken, or you’re in Cunnamulla…
  • 1,000 people present with a headache
  • 100 of them have a SAH
  • You LP everyone, coz that’s just how you roll
  • 20 of them will have a mixture of SAH RBCs + TT RBCs
  • 80 of them will have pure SAH RBCs and a clean tap
  • 180 of them have only TT RBCs from your dodgy tap
  • 720 of them have no RBCs at all


You apply the criteria of (a) >10,000 RBCs in the final tube, and (b) <63% clearance across tubes to the bloody taps:

  • A maximally positive result (>10,000 RBCs and <63% clearance) = 73% probability they really have a SAH*
  • A maximally negative result (neither of the above present) = 4% probability they have a SAH*
*you may subtract marks for not showing my working, but this post is long enough… you don’t want to see the maths… trust me.
So in the setting of no available imaging, an LP is a very very useful test to do, as it dramatically alters the probability that your patient has a SAH, and will clearly change your further investigation and management of the patient.
But what about when you’re rocking the TCH ED with a 64-slice scanner, and a backup one, just down the corridor??
  • 1,000 people present with a headache
  • 100 of them have a SAH
  • 98 of them have a +ve CT and your work here is done, grasshopper
  • 902 have a negative CT and you crack on and LP them
  • 2 of them actually have a SAH
  • 180 have a traumatic tap
Of the LPs you do on this group of 902 patients:
  • 2 will have RBCs due purely to SAH
  • 1 will have a mixture of SAH and traumatic RBCs
  • 180 will have only traumatic RBCs
  • 720 will have nothing on microscopy
And your mission, should you choose to accept it, is to tease out which of the 183 bloody CSF samples represent a real live SAH. Applying the diagnostic criteria to this group of patients yields:
  • A maximally positive result = 5% probability they really have a SAH
  • A maximally negative result = a problematic result due to the lower bounds of the statistics involved, but near enough to zero to be pretty much zero.
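For the curious, here is a minimal sketch of the sort of working involved, assuming the standard odds form of Bayes' theorem. The pre-test probabilities are an assumption on my part: taking the proportion of everyone who gets an LP that has a SAH (100/1,000 with no CT, 2/902 after a negative CT) reproduces the 73% and 5% figures for a maximally positive result. The LR for a maximally negative result isn't quoted above, so that half is left out.

```python
def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Convert probability to odds, apply the LR, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Czuczman: LR for >10,000 RBCs in the last tube AND <63% clearance.
LR_MAX_POSITIVE = 24

# Assumed pre-test probabilities (see lead-in): SAHs / patients LP'd.
no_ct = post_test_probability(100 / 1000, LR_MAX_POSITIVE)
after_negative_ct = post_test_probability(2 / 902, LR_MAX_POSITIVE)

print(f"No CT available:   {no_ct:.0%}")
print(f"After negative CT: {after_negative_ct:.0%}")
```

Same test, same LR, and the only thing that changed is how many SAHs were left in the pool before the needle went in.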

SUMMARY – Crunching the numbers for Real Life(tm)

If we LP everyone who has a negative plain CT-brain in ED:

  •  To find 1 SAH…
  • We send 20 people for a DSA, because they had RBCs that met the +ve criteria outlined above…
  • From 333 people who had any blood at all in their CSF.
  • We had to do 1,667 LPs to find them…
  • And we will cause 1 stroke in the process, because DSA has risks.
So, basically you have to ask yourself if you think it’s reasonable to do 1,667 LPs to find 1 SAH, to cause 1 stroke for every SAH you find, as well as causing an undetermined amount of morbidity (pain, infection, post-LP headache that might need a blood patch, etc). There is also the logistic consideration of tying up ED medical staff and other resources. Not all LPs are easy. Perhaps 50% of the time, a senior clinician will have to step in to do it, and one has to wonder if the resources allocated per diagnosis made or life saved is a reasonable deployment of already overstretched departmental assets.
Again I should stress that life is always more complex than we’d like it to be. If you are chasing or excluding other likely diagnoses, and you want CSF cultures, opening pressures and so on, then clearly an LP is the way to go. But if your only concern is finding a SAH, it is difficult to justify tying up significant departmental resources performing thousands of painful, invasive procedures, each laden with its own inherent risks, as well as consigning 19 healthy patients to a more invasive and much riskier procedure, to find that one person in a haystack.

SUMMARY OF THE SUMMARY (because we’re ED folk with short attention spans)

If no-one remembers anything else from the ramblings above, please take away this. As counter-intuitive as it may seem, it is, nonetheless, true:
  • A patient presenting to ED with a suspicious headache has around a 10% chance of having a SAH.
  • The same patient with a negative CT-brain, with an LP positive on both RBC criteria has around a 5% chance of SAH.
  • The patient with a negative CT alone has < 2% chance of a SAH.
  • The patient with a negative CT + CTA has < 0.01 % chance of SAH.

SUMMARY OF THE SUMMARY OF THE SUMMARY (for those ED folk who are < 1 hour post-coffee)

  •  Your clinical gestalt with no tests is better at predicting SAH than an LP in a CT-negative patient is.
  • CT + CTA is wayyyyy better than you are
While this topic could be said to almost rival thrombolysis of acute ischaemic stroke in the “Been Done To Death” stakes, I have nonetheless decided that it would be worthwhile to take a cook’s tour of the facts and evidence available around this contentious area of practice, and attempt to piece together what can at times be a seemingly disparate collection of apparently only loosely related jigsaw-like fragments, into some semblance of a unified, logical approach to the whole messy shebang.
Or something like that.
Firstly, a big shout out to Dr Archambeau and Dr Ghani for their recent insightful pieces on aspects of the same topic (links removed as they are to articles on a private departmental website and not accessible to the great unwashed masses).
While there will be some inevitable overlap, I’m mostly concerned with the specific issues surrounding the use of LP in diagnosing or excluding SAH, but if I tread on any toes, I offer my humblest apologies, and I promise I wasn’t setting out to create a demarcation dispute.

Some background pathophysiology / epidemiology

Around 1 in 40 adults have a cerebral circulation arterial aneurysm in their head:
  • 90 % of these are saccular or “Berry” type aneurysms
  • 10 % are fusiform
  • Distribution is roughly 30% PCOM, 25% ACOM, and 20% MCA

Around 1 in 10 adults we see in ED with headache and investigate for SAH actually have a SAH.


For folks with a SAH:


  • 85 % are aneurysmal SAH
  • 10% are perimesencephalic SAH
  • 5 % are other random stuff (AVM, neoplasm, inflammatory, acute malignant HTN (cocaine, etc.))
Perimesencephalic SAH is a very different beast to an aneurysmal SAH:
  • Basically, they do well.
  • One study suggests 3% lifetime risk of re-bleed (what this means when the bleed causes no morbidity anyway is unclear).
  • One 22 year longitudinal follow-up study showed no difference in mortality or life expectancy between PMSAH patients and the general population (i.e. their ongoing risk of badness is the same as the background population risk).
  • Some groups suggest a repeat angiogram to “find the missed aneurysm” but there is no evidence to back this viewpoint.

Imaging in suspected SAH

Vanilla CT

There is much acrimony and disgruntlement about the variously quoted sensitivities of plain, non-contrast CT-brain for detecting or excluding SAH. Aline has covered this in some detail, so I’m not going to delve into it to any great depth. Suffice to say the major differences in opinion stem largely from 4 factors:
  • Different generations of scanner have different diagnostic performance (i.e. newer / more-slices = better)
  • Your mileage varies depending on who you are scanning, and the inclusion/exclusion criteria vary somewhat between different (particularly older) studies. This is a function of variable pre-test probability.
  • Most studies use CSF analysis as the gold standard against which to judge CT. For reasons that will become apparent in but a few scrolls downwards, this is perhaps not the most well-founded of ideas.
  • For the borderline or subtle bleeds, the accuracy of the scan is highly dependent on who’s doing the reporting. There are several articles floating about from the Journal of the Blindingly Freaking Obvious (JBFO) demonstrating that experienced subspecialist neuroradiologists perform better than general radiologists, who perform better than emergency physicians (though, at the risk of being a bit parochial, we’re actually not that much worse) when diagnosing SAH on plain CT.
With modern 64-slice or better plain CT, the numbers are none too shabby:
Backes D et al. 2012 Stroke 2012;43(8):2115-9.
  • n = 137 with CSF spectrophotometry as the diagnostic gold standard
  • Sensitivity of plain CT < 6 hours = 98.5%
  • Captured all aneurysms + perimesencephalic bleeds but missed one AVM
  • Sensitivity > 6 hrs = 90%
  • Sensitivity < 6 hrs when only including those who presented with headache = 100%
Perry J et al. 2011 BMJ 2011;343:d4277
  • n = 3,132 patients presenting to ED with sudden onset (< 1 hr) headache who had a CT
  • Sensitivity 92.9%, NPV 99.4% for all-comers irrespective of time of onset / delay to scan
  • n = 953 had their CT at < 6hrs post-onset of headache with n = 121 of them having SAH
  • Sensitivity/specificity/NPV/PPV = 100%

CT with The Lot (CT Cerebral Angiogram)

You will all be familiar with the radiographer’s caveat of: “If it’s ok, we’ll leave the patient on the table until [Radiology Registrar du jour] has a look at the scan and decides if they want to do a CTA”. This is eminently sensible, as if the plain CT is positive for subarachnoid blood, it is a handy thing indeed to know where, and how big, that pesky aneurysm is. As noted above, however, 15% of the time we won’t find one, because they’re having a non-aneurysmal SAH (or the scan is a false positive, which is exceedingly rare, but does happen). But how good is CTA at finding aneurysms? As it turns out, pretty damn good:
Li Q et al. 2013 Acta Radiol 2013 Sep 24 (Epub preprint)
  • n = 118 patients with a positive DSA (digital subtraction angiogram)
  • There were 145 aneurysms in those 118 patients
  • CTA found 96% of all aneurysms, but found at least 1 aneurysm in every one of the 118 patients
  • That is, no patients were missed
  • “Missed” aneurysms were small and non-contributory
Lim LK et al. 2014 J Clin Neurosci 21(1):191-3.
  • n = 63 patients with SAH proven on CSF & DSA
  • NPV of CTA was 98%
Zhang H et al. 2012 J Neuroimaging Dec 10 (Epub preprint)
  • n = 84 aneurysms in 71 patients
  • 64-slice CTA had sensitivity 97.6%, NPV 95.1%
Wang H et al. 2013 Clin Radiol 2013;68(1):e15-20.
  • n = 54 aneurysms in 52 patients had 320-slice CTA + DSA
  • Sensitivity 96.3%, Specificity 100%


Digital Subtraction Angiography (DSA)

This is essentially the imaging-based gold standard for finding aneurysms. Like most other good old-fashioned interventional angiographic procedures, however, it is not without its risks, and they are not negligible. This is very important to keep in mind when considering your choice of first and second line investigations for SAH, because a positive result on any of them will consign your patient to a DSA:
Hankey GJ et al. 1990 Stroke 1990;21:209-222.
  • Risk of TIA/stroke 4%
  • Risk of permanent disabling CVA 1%
  • Mortality 0.06%

LP and CSF analysis in suspected SAH

So, you suspect the guy who just rocked up to ED with his “worst headache everrrrr, dude!” of sudden onset might just have a SAH. (By the way, some 70-80% of non-frequent-flyer headaches seen in ED will describe this as their worst headache ever; that’s why they came to ED. It’s not a particularly good discriminator). You mutter to yourself a bit, bite the bullet and order a non-contrast CT-brain. Which looks normal to you. And to the radiology registrar/consultant. Bollocks. Being a good, medicolegally defensive, guideline-adherent emergency clinician, you decide to push on and do an LP, to fully exclude SAH. Because everybody knows that CSF analysis is the absolute rootin-tootin’, uber-reliable gold standard way to definitively find or exclude 100% of SAHs.
Yep. And I have a bloody awesome collection of bridges to sell you. If you’d just step this way…
There are currently THREE ways one can analyse CSF to determine whether the patient has any subarachnoid blood in their head:
  1. Visual inspection for xanthochromia
  2. Spectrophotometric analysis for xanthochromia (i.e. measuring the extinction/absorption due to bilirubin in the CSF sample)
  3. Quantitative analysis of the ferritin concentration
Visual inspection involves holding the CSF tube that you sent to the lab up to a white light, or against a well-lit sheet of white paper, turning to someone else in the lab and asking “Hey, Bill! Waddya reckon…. it is yellow… ish?”. Sadly, intrepid reader, this is not an exaggeration. I shit you not. This is exactly what our laboratory (and almost every clinical chemistry laboratory in Australia) currently does when we send them a CSF sample from a patient with suspected SAH. Which, to be fair, would be totally cool, if visual inspection for xanthochromia was a reliable test. But it is not:
Marshman et al. 2014 Neurosurgery 74(4):395-9.
  • n = 26  mock CSF samples (clear stuff with known concentrations of bilirubin corresponding to range of values seen in real patients)
  • Clear samples misclassified 22% of the time
  • Yellow samples 29% of the time
  • 75% agreement in 46% of tests
  • 90% agreement in only 39% of tests
  • 88% of yellow specimens were called as clear
Dupont et al. 2008 Mayo Clin Proc. 2008;83(12):1326-31.
  • n = 152, retrospective records trawling
  • Visual xanthochromia in 18/152 (12%)
  • Aneurysms in 13/18 (72%)
  • 1/18 (5.6%) patient had an aneurysm and negative visual xanthochromia
  • 5/18 (28%) had a DSA when they didn’t have an aneurysm because their CSF was visually “positive” for xanthochromia
Arora et al. 2010 J Emerg Med. 2010;39(1):13-16.
  • n = 81 patients with headache suspicious for SAH
  • n = 19 with aneurysmal SAH
  • Sensitivity of visual inspection for SAH = 9/19 = 47%
Linn et al. 2005 J Neurol Neurosurg Psychiatry 2005;76:1452-54.
  • n = 101 clinicians + students evaluating set concentrations of bilirubin
  • Arbitrary cut-off of 0.05 AU used to represent a “true positive”
  • Choice of cut-off was retrospective.
  • 100/101 got it right when bilirubin > 0.05 AU
  • 69% reported 0.02 AU as colourless
  • Post-hoc setting of cut-off to give the best looking results (dodgy as hell)
Wallace AN et al. 2013 Stroke 44(6):1729-31.
  • n = 57 patients with negative CT, positive LP who then got DSA
  • Only 2 of them had aneurysms (3.5%)
  • Older review(s) suggest best case scenario 53% true positives
Using visual inspection to detect xanthochromia has problems in both directions. It has poor sensitivity and poor specificity, thus leading to missed SAHs, as well as over-investigation (DSA), with its attendant risks, due to false positives. In terms of diagnostic performance as an assay, it represents the worst of both worlds. So why the hell do perfectly smart and usually sensible chemical pathologists choose to use visual inspection for xanthochromia as their official assay? Maybe the guidelines of their peak professional bodies recommend it? Actually, no:
UK National Guidelines for analysis of CSF in suspected SAH
UK National External Quality Assessment Scheme for Immunochemistry Working Group
“…..Always use spectrophotometry in preference to visual inspection.”
Beetham 2009 Scand Clin Lab Invest 2009; 69(1):1-7   Review article.
“…..Visual inspection of the CSF supernatant fluid for xanthochromia is insensitive and should not be used on any account.”
To be fair to the lab, there are valid reasons for not wanting to do it properly (and let’s face it, that’s what it comes down to; visual inspection is an inaccurate cop-out and spectrophotometry is demonstrably superior) which essentially amount to a combination of two factors:
  1. Analytical interferences exist, caused by the overlapping absorption peaks of bilirubin and oxyhaemoglobin
  2. Deciding what method is best for compensating for that overlap, and choosing cut-offs and decision trees for most accurately reporting the clinical significance of the results obtained, given the clinical context of a suspected SAH, is tricky


The first part is straightforward; the peak absorption wavelength of oxygenated Hb in the CSF (whether due to traumatic tap or native SAH) unfortunately sits very damn close to the absorption peak for bilirubin, our analyte of interest. The second bit is somewhat more problematic, and while I could bore you absolutely senseless with the details, the bottom line is that while it is a non-trivial problem, analytic methods and calculations have been developed which are extremely robust and quite straightforward to implement with existing standard equipment. Even I can understand them. If you’re truly excited or desperate enough, please check out the current UK Guidelines for an exhaustive explanation, replete with exhilarating spectral diagrams and colourful flowcharts:
The TCH lab currently uses visual inspection for xanthochromia. The head of biochemistry has indicated he is amenable to (though far from enthusiastic about) developing and implementing a formal spectrophotometric method for CSF analysis to replace visual inspection. It is clear that this would only happen if ED clinicians really really want it, and even then the reporting flowchart/algorithm for determining what will be reported as “likely SAH / unlikely SAH / no freaking idea” will be largely up to ED (i.e. me) to develop. The official position is that biochemistry are aware that visual inspection sucks, and will start measuring it properly if pushed to it, but would rather not, as it will require a lot of fiddling to develop and adopt a robust methodology, and in conjunction with some ED clinicians (I may be the culprit here again) the biochemists optimistically wonder if we should be approaching the diagnosis of SAH in ED from a purely imaging-based perspective, anyway. (If anyone has an even bigger bee in their bonnet than I do about forcing the lab to adopt spectrophotometry for CSF analysis, let me know, but there is unlikely to be any significant action on it until the end of this year).
Quantitative CSF ferritin measurement is the funky new kid on the block, and while it’s not quite ready for prime time, the only thing holding it back is the lack of a big-arsed prospective validation dataset. It is precise, easy to measure with standard equipment, and not plagued by the analytical interferences of CSF bilirubin. This will definitely be a “Watch This Space” contender to replace bilirubin/xanthochromia assays in the next few years.
Petzold A et al. 2011 J Stroke Cerebrovasc Dis 20(6):488-93.
  • n = 14 (known SAH) + 44 controls (headache ?SAH & got LP)
  • Ferritin levels in controls 3.9 ng/mL for clean stabs and 9 ng/mL for traumatic taps
  • Ferritin levels in SAH = > 65ng/mL day 1 increasing to > 1,750ng/mL day 11
Petzold A et al. 2009 Neurocrit Care 11(3):398-402.
  • n = 2 patients with CT +ve SAH
  • Serial measurements of CSF bilirubin and CSF ferritin
  • Bilirubin fell to “normal” by the end of 1-2 weeks
  • Ferritin peaked at ~3,000 ng/mL at 2 weeks (ref range < 12 ng/mL)

Putting It All Together

So, someone wanders in to ED with a sudden onset thunderclap worst ever someone-just-dropped-an-axe-through-my-head sort of headache that started 2 hours ago, they have a GCS of 15 (NB: sunglasses don’t get you a score of 3 for eyes closed, and besides, if they’re the sort of person who comes to ED at night with a headache and wearing sunnies (pre-packed carpet bag, pillow and jim-jams are all optional extras) then they have no sinister intracranial pathology, anyway) and you find yourself wondering “Hey, I wonder if they have a SAH?”. You order a plain CT, and it’s normal. What to do now… ?
Received wisdom whispers sternly in your ear: “LP… Do the Ell-Peee…”
Cool. So you’re going to follow up a 98-100% sensitive test (plain CT) with an invasive, painful test, with the sensitivity/specificity of a coin-toss, because that’s the way we’ve always done it. What if there was another way?
Mr EBM saunters up to your other shoulder, kicks Received Wisdom in the nether bits and provocatively suggests: “Use your braaaain, padawan… Use your braaaain.”
If this patient has a SAH as the cause of their headache, we only care if it’s aneurysmal (85% of them, remember?). Because the 10% of them that are perimesencephalic are clinically irrelevant apart from needing to treat the headache with some analgesia, and the other 5% are caused by stuff that to a first approximation is simply unfixable. So if they have a suspicious headache, with a negative CT, would it make sense to look for an aneurysm, just in case? Let’s see what happens if we do a CTA, which has a NPV of ~98%. What happens if it’s positive? What if it’s negative?
If the CTA is negative, the combined pre-test probability of ~10%, the negative predictive value of the plain CT of 95% (I’m being very pessimistic here), and the negative predictive value of the CTA of ~98% all comes out in the wash to produce a post-test probability of a missed aneurysmal SAH that is wayyy less than 1 in 10,000. I don’t know about you, but I’d be fairly comfortable discharging someone who I thought had a risk of 1 in 10,000 that their headache was a tiny SAH missed by both scans.
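The back-of-the-envelope version of that sum can be sketched as follows. It rests on a simplifying assumption that should be stated loudly: the quoted 95% and 98% figures are treated as the fraction of real bleeds each scan catches, and the two misses are treated as independent, so the answer is the fraction of all presenting patients who have a SAH that slips past both scans. The function name is made up for illustration.

```python
def missed_by_both(pre_test: float, ct_detection: float,
                   cta_detection: float) -> float:
    """Fraction of all presenting patients with a SAH missed by both scans.

    Assumes the two misses are independent, which is a simplification.
    """
    return pre_test * (1 - ct_detection) * (1 - cta_detection)


# Deliberately pessimistic plain CT figure (95%), as in the text.
pessimistic = missed_by_both(0.10, ct_detection=0.95, cta_detection=0.98)

# Same sum with a 98% plain CT, the lower end of the modern-scanner figures.
modern_ct = missed_by_both(0.10, ct_detection=0.98, cta_detection=0.98)

print(f"Pessimistic CT: about 1 in {1 / pessimistic:,.0f}")
print(f"98% plain CT:   about 1 in {1 / modern_ct:,.0f}")
```

Even with the pessimistic input the answer sits at around 1 in 10,000, and any more realistic figure for the plain CT only pushes it lower.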
If the CTA is positive, there are two options:
  • The aneurysm is incidental (2.5% background chance) and a DSA is an unnecessary risk.
  • The aneurysm is the culprit (one of the 2% we may have missed on plain CT) and has bled but we can’t see the blood, and they should have a DSA and neurosurgical intervention as indicated.


Those who are not terribly keen on the CT + CTA approach to excluding SAH are (quite reasonably) concerned about the first situation above: How many unnecessary / avoidable / extra DSAs will we end up doing because the CTA finds an incidental, “innocent” aneurysm? This is a valid concern, but a very quick consideration of the relative risks involved, informed by an understanding (often lacking) of just how bad CSF analysis for SAH is, can offer an informative perspective. Let’s crunch a few numbers:


  • 100,000 patients present to ED with a suspicious headache, and cop a plain CT-brain
  • 9,800 have a positive CT and are appropriately shunted down the DSA / neurosurgical pathway
  • 90,200 have a negative CT (90,000 true negatives + the 200 that plain CT missed)
Now we bifurcate and either give them a CTA, or an LP…..
For the CTA crowd
  • 4,059 have a positive CTA (1,804 innocent aneurysms + 2,255 culprit pathological aneurysms)
  • They all get a DSA:
    • 162 have a TIA/CVA but probably do okay at the end of the day. Probably.
    • 41 have a permanently disabling CVA and we’ve ruined their life.
    • 2 of them drop dead on the spot due to the DSA.
  • Number of unnecessary / extra DSA-harmed patients (due to CTA finding an innocent aneurysm)
    • 72 TIA/CVAs
    • 18 disabling CVAs
    • 1 death


For the LP crowd
  • 3,517 have a positive LP
  • They all get a DSA:
    • 140 TIA/CVAs
    • 35 disabling CVAs
    • 2 deaths
  • Number of unnecessary / extra DSA-harmed patients (due to false positive LP)
    • 70 TIA/CVAs
    • 17 disabling CVAs
    • 1 death
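If you want to check the harm figures, they are just the number of DSAs multiplied by the Hankey 1990 complication rates quoted earlier (4% TIA/stroke, 1% permanent disabling CVA, 0.06% mortality). A minimal sketch, with hypothetical names and rounding to the nearest whole patient:

```python
# Per-procedure DSA risks as quoted from Hankey GJ et al. 1990.
DSA_RISK = {"TIA/CVA": 0.04, "disabling CVA": 0.01, "death": 0.0006}


def dsa_harms(n_dsa: int) -> dict:
    """Expected number of each complication, rounded to whole patients."""
    return {outcome: round(n_dsa * risk) for outcome, risk in DSA_RISK.items()}


cta_all = dsa_harms(4_059)       # everyone with a positive CTA
cta_innocent = dsa_harms(1_804)  # positive CTA due to an incidental aneurysm
lp_all = dsa_harms(3_517)        # everyone with a positive LP

print("CTA, all DSAs:      ", cta_all)
print("CTA, innocent only: ", cta_innocent)
print("LP, all DSAs:       ", lp_all)
```

Rounding to the nearest patient reproduces the CTA figures exactly; the LP arm comes out at 141 rather than 140 TIA/CVAs, which is a rounding difference and nothing more. The number of false positive LPs isn't stated above, so the "unnecessary" LP row isn't recalculated here.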


Within the margins of error involved with the figures employed, the number of patients harmed by “unnecessary” DSAs is the same, whether we choose a CTA or an LP as our second line investigation. What about the flip side of the coin: missing a real aneurysmal SAH?

For the CTA crowd

  • 90,200 CTAs performed
  • 170 of them have an aneurysmal bleed
  • 4 of them will have their lesion missed on CTA

For the LP crowd

  • 90,200 LPs performed
  • 170 of them have an aneurysmal bleed
  • 17 of them will have their lesion missed on LP*

(* NB: This is being very generous, as there is evidence the false negative rate can be considerably higher than 10%)
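The missed-bleed counts can be reconstructed the same way. The sketch below assumes the inputs implied by the text: 200 bleeds missed by plain CT, 85% of them aneurysmal, a 2% miss rate for CTA (from the ~98% figure used earlier) and a 10% false negative rate for LP (from the footnote), with fractions of a patient rounded up. Integer percentages are used to keep floating point rounding out of it.

```python
import math

CT_MISSED_SAH = 200        # bleeds among the 90,200 CT-negative patients
ANEURYSMAL_PERCENT = 85    # proportion of SAHs that are aneurysmal

aneurysmal_bleeds = CT_MISSED_SAH * ANEURYSMAL_PERCENT // 100


def missed(n_bleeds: int, miss_percent: int) -> int:
    """Bleeds missed at a given miss rate, rounded up to whole patients."""
    return math.ceil(n_bleeds * miss_percent / 100)


cta_missed = missed(aneurysmal_bleeds, 2)    # assumed 2% CTA miss rate
lp_missed = missed(aneurysmal_bleeds, 10)    # assumed 10% LP miss rate

print(f"Aneurysmal bleeds after negative CT: {aneurysmal_bleeds}")
print(f"Missed by CTA: {cta_missed}")
print(f"Missed by LP:  {lp_missed}")
print(f"Ratio LP/CTA:  {lp_missed / cta_missed:.2f}")
```

Under those assumptions the LP pathway misses a little over four times as many aneurysmal bleeds as the CTA pathway.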



  • CT alone at < 6 hours post-ictus is probably enough. Honest. Certainly sufficient to drop the probability of badness to < 2%.
  • [ CT + LP ] and [ CT + CTA ] cause the same amount of harm from subsequent overinvestigation (DSA)
  • [ CT + LP ] misses at least 4 times as many real aneurysmal SAHs as [ CT + CTA ]
  • LP with visual inspection is a long long way from being a gold standard test for SAH
  • LP with spectrophotometry is much, much better but still not perfect
  • LP with ferritin measurement as the primary discriminator will be even better, but isn’t here yet
  • The other aspects of CTA (extra radiation, contrast load, having to convince a radiologist to do it) and LP (risk of infection, post-LP headache, extra diagnostic information to make or exclude alternative diagnoses) must be considered when deciding which approach to take, and may/will vary on a case by case basis, and between clinicians.
Posted by: Chris Cole | June 3, 2014

PPI’s in Upper GI Bleeding

I thought it might be useful to take a brief look at the evidence surrounding the use of intravenous proton-pump inhibitors (PPIs) in patients presenting to ED with upper GI bleeding (UGIB).

The current situation:

  • We routinely give an IV bolus of PPI (pantoprazole 80 mg) + ongoing infusion (8 mg / hr) for 72 hrs (or until discharge)
  • If we don’t do it, the gastroenterology team will ask for it
  • If we still don’t do it, they’ll order/start it
  • The rational pathophysiologic basis for this is that reduced stomach acid = less irritation of delicate bleeding tissues = a good thing

As with many interventions, however, what sounds like it should be true, what we want to be true, and what actually is true can be very different things. So what sayeth the evidence?

The folks at NNT think it’s all a load of rubbish:

The guys at LITFL thought it was pretty reasonable standard practice back in 2010:

…but have since linked to a few articles by others suggesting that maybe PPIs aren’t all they were cracked up to be in UGIB after all.

A quick look at the most informative papers I came across (there are others but these are the Greatest Hits):


N Engl J Med. 2000 Aug 3;343(5):310-6.

Effect of intravenous omeprazole on recurrent bleeding after endoscopic treatment of bleeding peptic ulcers.

  • This paper is the “NINDS stroke tPA trial” of the UGIB world & why we started using PPIs in UGIB
  • n = 120 + 120
  • Rx with PPI bolus + infusion x 72hrs after endoscopic haemostasis vs placebo
  • Primary endpoint re-bleeding within 30 days: 6.7% vs 22.5% (Rx vs placebo)
  • No difference in mortality, or number requiring surgery
  • No difference in re-bleeding while in hospital
  • No difference in length of stay
  • No difference in ulcer healing at 8 weeks (in fact there was a trend toward better healing in placebo group)
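
For a sense of scale, the headline re-bleeding figures above can be converted into an absolute risk reduction (ARR) and a number needed to treat (NNT). The sketch below is a minimal Python illustration using only the 6.7% vs 22.5% rates quoted above. The function name is illustrative, and the result says nothing about the endpoints where no difference was found.

```python
import math

def arr_and_nnt(control_rate, treatment_rate):
    """Return (absolute risk reduction, number needed to treat)."""
    arr = control_rate - treatment_rate
    if arr <= 0:
        raise ValueError("no risk reduction, so NNT is undefined")
    # NNT is conventionally rounded up to the next whole patient
    return arr, math.ceil(1 / arr)

# Re-bleeding within 30 days: 22.5% on placebo vs 6.7% on PPI
arr, nnt = arr_and_nnt(control_rate=0.225, treatment_rate=0.067)
print(f"ARR = {arr:.1%}, NNT = {nnt}")
```

An NNT like this applies only to the re-bleeding endpoint. The same arithmetic on the mortality or surgery figures would have no risk reduction to work with.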



Health Technol Assess. 2007 Dec;11(51):iii-iv, 1-164.

Systematic reviews of the clinical effectiveness and cost-effectiveness of proton pump inhibitors in acute upper gastrointestinal bleeding.


  • Included all papers up to 2006
  • PPI after endoscopy reduces re-bleeding
  • No impact on mortality
  • PPI before endoscopy reduces “stigmata of recent haemorrhage” at endoscopy (fails to move me)
  • PPI before endoscopy has no impact on clinically relevant outcomes


  • 6 x RCTs included
  • n = 2,223
  • Might reduce the number of patients with stigmata of recent haemorrhage at endoscopy (how exciting)
  • Reduced need for injection during endoscopy (probably a good thing, but…)
  • No evidence for reduction in mortality, re-bleeding or need for surgery


So, while your mileage may vary depending on which RCT or systematic review you choose to hang your hat on (and to be fair they do all have their methodologic foibles which are not worth delving into in great detail here), given the currently available evidence, it is difficult to dispute that:


  • Giving IV PPI probably makes endoscopy easier for the endoscopist (less blood, better view, less injecting)
  • Giving IV PPI might or might not reduce re-bleeding (but the relevance of this is highly questionable)
  • Giving IV PPI doesn’t save people from needing surgery
  • Giving IV PPI has no impact on long-term (> 2 months) ulcer healing
  • Giving IV PPI has no impact on mortality (and this is consistent across all RCTs to date)


But, I hear you ask, if you’re going to acquiesce and give IV PPIs in ED for UGIB, how should you do it? Do we really need to tie up a dedicated IV line for an ongoing PPI infusion? Well, probably not, as it turns out:


Am J Gastroenterol. 2008 Dec;103(12):3011-8. doi: 10.1111/j.1572-0241.2008.02149.x.

High- versus low-dose proton pump inhibitors after endoscopic hemostasis in patients with peptic ulcer bleeding: a multicentre, randomized study.


  • n = 238 + 236
  • Patients got either a PPI 80 mg bolus + 8 mg/hr infusion x 72hrs, or a 40 mg bolus once daily x 3 days
  • No difference in re-bleeding (11.8% vs 8.1%)
  • No difference in transfusion requirements (1.7 vs 1.5 units PRBCs)
  • Shorter length of stay for low-dose bolus group (37% vs 47% <5 days)
  • No difference in mortality


Take Home Message

  • PPIs for UGIB in ED probably make no significant difference to patient outcomes
  • Improving the endoscopist’s view (the only benefit) is a fairly soft indication for giving a drug, but not entirely unreasonable
  • If you’re going to give a PPI, a single IV bolus is fine… There is no need to tie up an IV line for an infusion


Chris Cole – April 2014

Posted by: Chris Cole | May 5, 2014

Misuse of Triage Category to target ED co-payments

Unless you’ve been hiding under a very large rock for the past month or, like me, tend to catch the news only in brief snippets via the internet, you are most likely aware of the 31 March release of the National Commission of Audit  report. This is essentially a collection of reviews and recommendations made by a government appointed think-tank on wide-ranging issues affecting the ongoing management of Australia as a nation, with a heavy emphasis on economics, models of federal / state government funding and co-ordination, and with a particular focus on expenditure constraints.
The report(s) can be found at:
Health care expenditure and options to contain it are one of the areas addressed in some detail in the report. The relevant sections are 7.3 (healthcare in general) and 7.4 (the PBS) and can be found here:
One widely publicised recommendation is the introduction of a co-payment of $15 for patients seeing a GP. Measures would be introduced to prevent GPs from waiving or circumventing this. This has been enthusiastically criticised by the AMA and RACGP, and will not be discussed further here.
A second recommendation, perhaps not so widely known as yet, is the introduction of a co-payment for patients attending public hospital Emergency Departments (EDs). No exact figure has been attached to this in the report, however it is recommended that this co-payment should be “…at levels higher than those proposed for out-of-hospital-services.”
This recommendation, by itself, is concerning enough. However, the report goes on to suggest the manner in which we should select which patients are liable for an ED co-payment, and which patients should be seen for free. The recommended discriminator is the Australasian Triage Scale (ATS) category assigned to the patient:  Category 1, 2 & 3 patients should be seen without charge, while category 4 & 5 patients should be liable for the co-payment. The stated aim of this approach is to charge those patients who could have seen their GP instead of presenting to ED (and specifically, to discourage patients who might “inappropriately” avoid seeing a GP and paying the new GP co-payment, from turning up to a free ED instead).
The relevant section of the report is quoted verbatim below for your reading pleasure:

By introducing co-payments for services that are currently covered by bulk billing there is a risk of cost shifting, as some patients may seek out free treatment in the emergency room of public hospitals for services that would more appropriately be treated by a general practitioner. To address this issue, State governments should consider introducing equivalent co-payments for certain emergency room settings.

A possible co-payment structure for emergency rooms could be based on the hospital triage categorisation system. Emergency room patients are currently triaged on the basis of the speed with which they need medical attention. Triage categories one, two and three relate to patients who present with critical, life-threatening or potentially life threatening conditions. Co-payment arrangements would not apply in these cases.

Triage categories four and five relate to less urgent conditions that in many cases could be more effectively treated in a General Practitioner setting. State governments could consider introducing co‑payments for triage categories four and five, at levels higher than those proposed for out-of-hospital services.

A payment structure along these lines would retain free emergency room care for those in genuine need while providing price signals that direct patients to access the most cost effective treatment setting.

There would also be a need to ensure that the co-payment provides a price signal as actually intended. In this light, consumers would not be able to insure against the co-payment. Similarly, medical practitioners who wish to bulk bill should not be able to waive the co‑payment. The Government will need to ensure the co-operation and compliance of insurers and doctors in the implementation of these arrangements.

While the wider ethical issue of charging Australian ED patients anything to access what is meant to be free universal healthcare is certainly worth addressing, it is fodder for another time. Here, I wish to examine more specifically the proposed use of the ATS as a dichotomising tool for deciding which patients should have to pay. The ATS was never designed, nor validated, as a tool for defining which patients have “GP” versus “ED” presentations. Its role is purely to sort people based on the urgency with which they should receive full assessment and treatment, in the context of the perceived likely time-course of any threat to life, limb or health. The Australasian College for Emergency Medicine (ACEM) guidelines on the implementation of the ATS open with:

“Triage is an essential function in emergency departments (EDs), where many patients may present simultaneously. It aims to ensure that patients are treated in the order of their clinical urgency which refers to the need for time-critical intervention. Clinical urgency is not synonymous with complexity or severity.”


The final sentence, above, cannot be stressed enough. While this statement is self-evident to any emergency medicine clinician, the layperson’s understanding of the ATS lends itself to the incorrect conclusion that a lower triage category equates to “less severe” or “perhaps doesn’t need to be at the hospital”. Unfortunately, in a political climate of seemingly intentional misunderstanding of the factors that impede efficient function of an ED, the government’s ongoing obsession with the idea that large numbers of “GP” patients somehow account for the majority of ED overcrowding, long waiting times and poorer patient outcomes, makes it all too easy to latch on to what superficially appears to the uninformed to be a simple and ready metric of who is, and is not, a bona fide ED patient. It is unfortunate that those in a position to formulate, and potentially implement, policy in such matters lack the subject matter expertise (or the willingness to seek appropriate expert advice) to enable them to make properly informed decisions.

So, let us suppose that the Commission of Audit recommendations are adopted as policy, and we are instructed to extract a co-payment of as yet undetermined magnitude from every ATS Cat 4 & 5 patient presenting to ED. The goal is to discourage inappropriate use of ED, and in a perfect system, to charge only those patients who present to ED when they could reasonably be expected to have sought medical care from a GP instead. One might wonder, of all the Cat 4 & 5 patients presenting to a typical Australian tertiary ED, how many of them could have, or should have, gone to their GP instead? This is inherently a quite subjective clinical judgement call, but we could probably agree on a couple of overarching ground rules without too much dissent. Firstly, on what basis do we decide whether the patient should reasonably have thought they required the services of an ED? Wielding the hugely informative power of the retrospectoscope is grossly unfair; we cannot use the final diagnosis, or disposition, to decide who needed to be in ED. The patient and, indeed, the triage nurse have only the patient’s presenting complaint, their symptoms, to go on, and this is a more fair and robust starting point.

Raven et al. took an interest in the disparity or non-predictive relationship between presenting complaint and final diagnosis and disposition in some 34,942 ED patients, publishing their work in JAMA in 2013. Among their findings, only 6% of patients (across all triage categories) were deemed “GP patients” despite their presenting complaints being shared by some 88% of all ED patients, including those requiring admission and/or urgent intervention. The take home point here is that there is exceptionally poor concordance between the patient’s presenting complaint, and their final diagnosis. Therefore, using their final diagnosis to retrospectively determine whether they should have come to ED in the first place is not a tenable or sensible way to discriminate between “GP” and “ED” patients and, by extension, who should be billed for a co-payment.

So, if we are going to try to filter out the “GP” patients, we should do it based on the information available at triage, with no knowledge of their future clinical journey or ultimate diagnosis. In the spirit of scientific inquiry, I have reviewed the triage information (and only the triage information) for every ATS Cat 4 and Cat 5 patient presenting to TCH ED over a single 24 hour period (midnight to midnight) on a weekday in 2014. Based on the recorded triage information (age, presenting complaint, triage nursing note) I have made an admittedly subjective (but clinically and experientially reasonably informed) decision as to whether each of those patients could reasonably have sought assistance from their GP, or whether presentation at ED was appropriate. This judgement was made without consideration of their final outcome, or the fact that their presenting complaint was unlikely to represent a diagnosis requiring hospital resources. If it was something that could herald important pathology or require intervention, and could not be reasonably discerned by a layperson as being safe / non-urgent, then I put it in the “appropriate for ED” bucket (e.g. abdominal pain in a child, acute mental health presentations, chest pain). Complaints that frequently present to ED, but could easily be managed in the community (e.g. minor peripheral trauma that happened 2 or 3 days ago that may need an xray), and chronic complaints with no real acute component were tossed in the “GP” bucket. *

Preliminary results:

  • n = 125 patients (all Cat 4 & 5 presentations for a 24 hour period on a weekday)
  • 58 % presented to ED between 0800-1600hrs, 42 % “after-hours”
  • 29 % had presentations I deemed could be reasonably managed via their GP
  • 71 % had presentations I deemed required ED assessment +/- management
  • 67 % of “GP” patients presented during business hours of 0800-1600hrs
  • 33 % of “GP” patients presented outside of those hours
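
With a single day's sample, it helps to see how wide the uncertainty around these percentages is. The sketch below is a hedged illustration, not part of the original analysis. It computes a Wilson score 95% interval for the “GP” and “ED” proportions, assuming the 29% figure corresponds to roughly 36 of the 125 patients (a count back-calculated from the percentages, not reported above).

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# ASSUMED counts: 29% of 125 is roughly 36 "GP" patients, leaving 89 "ED" patients
n_total, n_gp = 125, 36
gp_low, gp_high = wilson_interval(n_gp, n_total)
ed_low, ed_high = wilson_interval(n_total - n_gp, n_total)
print(f"GP: {gp_low:.0%} to {gp_high:.0%}; ED: {ed_low:.0%} to {ed_high:.0%}")
```

The interval captures sampling error only. It does not account for the subjectivity of a single reviewer's “GP” versus “ED” judgement.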
* Clearly there are huge limitations on any conclusions drawn from this data. Defining rigid criteria by which one can categorise “GP” versus “ED” patients is fraught with difficulty. In an effort to iron out the difference in clinical gestalt in making such assessments, my intention is to recruit a number of FACEMs and a number of GPs to independently review the same data sets, pool their responses (calculating kappa values of inter-observer variability) and examine the results. While there will be some variability, I suspect such an analysis would very likely still find that the overwhelming majority of Cat 4 & 5 patients (71% in my limited analysis) are not “GP” patients.
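
As a rough illustration of the kappa calculation mentioned above, here is a minimal Python sketch of Cohen's kappa for two reviewers. The ratings are invented purely for demonstration and are not study data. With more than two reviewers, a multi-rater statistic such as Fleiss' kappa would be the more usual choice.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal label frequencies
    expected = sum(count_a[label] * count_b[label] for label in count_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1 - expected)

# HYPOTHETICAL ratings of the same 10 triage records by one FACEM and one GP
facem = ["ED", "ED", "GP", "ED", "GP", "ED", "ED", "GP", "ED", "ED"]
gp    = ["ED", "GP", "GP", "ED", "GP", "ED", "ED", "ED", "ED", "ED"]
kappa = cohens_kappa(facem, gp)
print(f"kappa = {kappa:.2f}")
```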
Thus, even if one accepts that we should be charging “GP” patients to use ED resources, if we are forced to implement a co-payment using the inappropriate tool of the ATS as a discriminator, we will almost certainly be inappropriately billing the vast majority of patients who are required to pay the fee.
Posted by: Chris Cole | December 31, 2013

South Australian Dept of Health illegally billing ED patients

SA Health is currently unethically and illegally charging rural patients for ED care.

Creatively, some political tosspot decided to dichotomise the ATS triage system such that Cat 1 & 2s are seen for free and Cat 3, 4 & 5s are privately billed by the doctor providing the service and have to chase Medicare for partial reimbursement later.

This is in direct contravention of the National Healthcare Agreement, and represents a version of state/federal healthcare cost shifting not seen in any other Australian state or territory. (Snaps for boldly going where no other state health dept has, illegally, gone before, guys!).

In the spirit of laziness, rather than re-hashing it all over again, below is the body of a letter to Jack Snelling, Minister for Health in SA:


Dear Mr Snelling,

I am writing to bring to your attention an interesting and concerning
fact about the delivery of rural healthcare in South Australia that
has only recently come to my attention.

In contravention of the National Healthcare Agreement, and federal
legislation, SA Health has been insisting that rural GPs who provide
after-hours emergency medical services at state public hospitals
privately bill some patients for public hospital emergency department
care.

Those patients deemed suitable by your department to be privately
billed comprise those who are not admitted as inpatients to the
hospital, and whose Australasian Triage Scale (ATS) category is
assessed as being 3, 4 or 5. These patients are being privately billed
by the GP providing the service, and must then seek partial
reimbursement via Medicare.

State public hospitals, including rural hospitals, are mandated,
required and expected to provide emergency medical and nursing care
for all Australians, with no cost to the patient at point of care
delivery. This has been the case for some decades now. Your
department's current policy and practice in forcing emergency
department patients to pay for emergency medical care that your
department is already funded to provide, is blatant state / federal
cost-shifting. It is also grossly unethical. It does not occur in any
other state or territory in Australia. In the context of the National
Healthcare Agreement, it could also be construed as illegal.

There is a provision in the National Healthcare Agreement (paragraph
G.21) which allows that: "Eligible patients may obtain non-admitted
patient services as private patients where they request treatment by
their own GP, either as part of continuing care or by prior
arrangement with the doctor".

The salient point here is that the intent of this provision was to
facilitate obtaining care from one's own local GP in certain
circumstances. And then only if the patient specifically wants and
requests this mode of care provision. It was not intended to be
applied as a blanket billing model for a majority of emergency
department patients. Its utilisation as such by your department is
quite frankly an abuse of the system, and clearly not in keeping with
either the intent or, indeed, the specific and exact wording and
requirements of the agreement.

Furthermore, your department's ad hoc adoption of a dichotomous
application of the triage system to determine who should be billed
(category 3, 4 and 5 patients) and who should be treated free of charge
(category 1 and 2 patients) has no basis in reality or evidence, and
betrays a lack of understanding of the purpose and application of the
Australasian Triage Scale in the practice of emergency medicine. I
would refer you to the Australasian College for Emergency Medicine
for a comprehensive definition, but in brief: a
patient's assigned ATS category (1 to 5) does NOT reflect whether or
not they require emergency department, high acuity, complex or
hospital-level care.

This post is long enough without any further preamble, but basically this is a cut & paste from a response I made to a discussion on Dr Casey Parker’s exceptionally awesome medical education website & blog at Broome Docs. This follows some days and weeks (and months really) of acute on chronic back and forth between several luminaries of the resuscitative care world (like Casey, Scott Weingart, Minh Le Cong, Anand Senthi, Ryan Radecki, Seth Trueger, etc…) and interested spectators (like me) on the topic of how extensively we need to investigate patients presenting to ED with symptoms and signs that might represent a pulmonary embolism (a blood clot jammed in the arteries in your lungs).


The short short version is that PE is relatively common, the test for finding one is harmful in itself (but we don’t know how harmful, exactly), PE can kill you or leave you crippled (but little ones don’t), we’ve never proven that treating all of them (especially the little ones) actually saves lives, and we do indeed treat them all (with blood thinning medications), even the little ones. We use various scoring systems (based on things like heart rate, oxygen levels, whether you have cancer, etc.) to guesstimate the chances that you might have a PE, and the idea is that if the chance you will be harmed by the PE you might have is greater than the chance you will be harmed by the test we use to look for it, then we crack on and do the test. Some of the numbers used to estimate those chances are pretty flexible and poorly defined, though, so there is a lot of angst surrounding just how we should go about rationing out the tests and the treatment in order to avoid doing more harm than good. Oh, and one more thing… the term “clinical gestalt” basically means a gut feeling on the part of the doctor. An informed, educated gut feeling, but nonetheless it still amounts to essentially deciding something based on feeling a disturbance in The Force.


So, we continue on into the slightly disjointed responses I made to a few points that have come up:

1.  Delineate / separate the two distinct questions of (a) current PE vs. (b) risk of next PE

I’ve just listened to Scott’s response, and he made this point beautifully. I could not agree more. When we see a patient in ED with possible PE, we should break our inquiry into two questions. Does the patient have a PE right now? If so, is it causing enough trouble for me to care about any imminent threat to their well-being? (<– I’m claiming artistic license here and calling that one question, by the way; given my sample size of 3 things, 1 = 2 is well within the boundaries of a 95% CI). And secondly, even if I’m happy the PE I think is in their lungs is not a major problem today, if there is a PE there, what is the risk of the _next_ PE being haemodynamically or mortally significant, and what should I do about it?

Frustratingly for the “Avoid CTPA” camp (and in the interests of full disclosure, I’m pitching my own tent firmly on their patch of turf), the answer to the first question does somewhat inform the answer to the second one. What I would love to know, is just how much does one’s risk of a fatal/crippling/“bad” PE in a given time period (say, the next 12 months) increase if one has a haemodynamically irrelevant or subsegmental PE today? Even answering the posterior, converse question would be handy: for those patients who have a “bad” PE, how many of them have a sentinel/warning smaller PE that was symptomatic enough to bring them to hospital?

If the answer to both of those sub-questions is “not very much”, then the more relaxed our quest to answer the first big question can afford to be. If I know that missing a subsegmental PE in a patient who is low-risk according to clinical decision rules will only result in say a 0.1% absolute risk increase for “bad” PE in the next 12 months, I will be much happier not ordering that CTPA. If, on the other hand, I know that 10% of “bad” PEs are preceded by an ED-presentation-inducing sentinel subsegmental PE, or that there’s an annual increased absolute risk of “bad PE” of say 5%, then there is a more pressing need to find or exclude the current possible PE.
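
The trade-off described above can be put into a few lines of arithmetic. The sketch below is an illustration under loudly stated assumptions. The 0.1% and 5% risk increases are the two scenarios from the paragraph above, the 20% prior is borrowed from the rough incidence of PE quoted elsewhere in this post, and `TEST_HARM` is a purely hypothetical placeholder, because the real harm of CTPA plus anticoagulation is exactly what we don't know.

```python
def expected_harm_if_not_scanned(p_pe, bad_pe_risk_increase):
    """Chance of a later "bad" PE attributable to a small PE we chose not to look for."""
    return p_pe * bad_pe_risk_increase

P_PE = 0.20        # ASSUMED prior: ~20% of "?PE" work-ups turn out to have a PE
TEST_HARM = 0.005  # HYPOTHETICAL placeholder for the harm of CTPA + anticoagulation

decisions = {}
for label, risk_increase in [("0.1% scenario", 0.001), ("5% scenario", 0.05)]:
    harm = expected_harm_if_not_scanned(P_PE, risk_increase)
    # Scan only when the harm of missing the PE exceeds the harm of looking for it
    decisions[label] = "scan" if harm > TEST_HARM else "don't scan"
    print(f"{label}: expected harm from not scanning = {harm:.2%} -> {decisions[label]}")
```

The decision flips entirely on two numbers nobody has measured well. Change `TEST_HARM` or the risk increase and the answer changes with it.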

As we all know, it’s bloody easy to confidently exclude a clinically significant current PE using nothing more exotic than vital signs, ultrasound, ECG and Dr Weingart’s patented Looks Like Shit ™ or LLS score. (I’d love to do a prospective trial of the LLS score for massive/submassive/lysable PE, by the way). Unfortunately, until we get a better handle on how a non-haemodynamically significant PE affects the attributable risk of future “bad” PE, we just don’t have enough information to make a fully informed decision regarding how hard to look for that small current PE.

It being nearly 3am, and running as I am on dark mint chocolate and tea, I hereby invoke the conceptual model of “Serial Schrödinger’s Cats”:

– The existence of the current small PE is represented by the mortal state of the cat in box #1. The probability of the atom decaying, the poison being released and killing the cat is ~20% (i.e. roughly the incidence of PE in those presenting to ED with what we think is ?PE)

– Doing the CTPA opens the box, collapses the probability waveform, and tells us if there’s a PE (i.e. if the cat is now, to paraphrase Monty Python, an ex-cat).

– The occurrence of a “bad” PE in the next 12 months is represented by the state of the cat in box #2.

– The discovery of a dead cat in box #1 cascades to alter the probability waveform of box #2. (We presume that the probability of a dead cat #2 was close to zero to begin with, but the reality of a dead cat # 1 puts his counterpart at greater risk).

– We NEVER OPEN the second box, unless the patient presents with a lethal PE some time after we missed the initial one, or they have a massive PE whilst already anticoagulated. This is a consequence of _everybody_ being anticoagulated immediately if we find a dead cat in box # 1.

– All of our efforts and discussions to this point therefore revolve around finding a way to AVOID OPENING BOX # 1.

A bit like Gene Hackman’s advice to Tom Cruise in “The Firm” when handing him a sealed envelope containing his new job offer and salary details, “A good attorney wouldn’t have to ask what’s in the envelope”, we strive to divine as accurately as possible the state of health of cat #1 without cheating and peeking in the box. But the _only reason_ we care about cat #1 is the effect his or her demise has on cat #2. However, anticoagulation destroys the quantum entanglement that links the two cats, and prevents us from determining the conditional probabilities upon which to base our estimates of how cat # 1 is doing, without looking in the box.

The point is: We need to start opening box #2, to confidently determine when it’s safe not to open box #1.

*(There is a more complex model, involving a 3rd cat representing the harms of CTPA + anticoagulation, but… we’re just not going there tonight… )

2. All gestalts are not created equal.

Yes, gestalt (or a combination of gestalt + clinical decision rule (CDR)) outperforms CDRs alone. But I can’t help but think that we should simply abandon the idea of clinical gestalt being a separate entity. For what is clinical gestalt? It is a set of factors or criteria, present or notably absent in our patient’s history and examination findings, which alters our notional pre-test probability of the patient having the pathology of interest, in this case a PE. What is a CDR or risk-stratification-tool (such as Wells or Geneva)? I submit that they are precisely the same thing, with one notable qualitative difference: Wells, Geneva, etc. are an explicitly defined and consistent subset of all of the myriad variables we might consider when estimating that pre-test probability. Clinical gestalt is much more a moving feast; it is a highly user-dependent, non-uniform and undefined collection of some of those same, and some different, variables found in the CDRs.

This is important, because even though overall when averaged across many physicians and many patients, gestalt might come out looking pretty good in the wash, there is likely to be much higher physician to physician and patient to patient variability in the accuracy of the assigned pre-test probability than there is with consistent application of a rigidly defined set of criteria. Thus, by employing gestalt, there will be a larger number of “outlying” patients who are either under or over-investigated and/or treated which by definition causes greater overall harm.

This is one of the reasons I favour Geneva over Wells (when used in this context), as it seems a bit recursive and self-referential to include, essentially, “Do I reckon this is probably a PE?” as one of the highest scoring components of a CDR which I am employing specifically to help me estimate the probability that this might be a PE. Geneva, at least, is more objective and consequently more reproducible across different observers/scorers.

The point is that all of these estimations use criteria that significantly overlap. So, I fret that the logic in saying things like:

“Gestalt has better predictive power than CDR 1 or CDR 2, and gestalt + CDR 1 is even better!”

…when that essentially means, for discrete criteria {a,b,c…n} :

(a+b+c+d+e+f+x+y+z) is better than  (a+b+c+d)  or  (a+c+d+f+g+h)
(a+b+c+d+e+f+x+y+z) + (a+b+c+d)   is better than  (a+c+d+f+g+h)

…is a wee bit flawed, and we can probably do better. I admit that real world effects of synergism and the fact that the addition of risk factors is not a zero-sum game mandate that we must somewhat laboriously conduct large prospective trials of each subset of criteria that takes our fancy, rather than just applying Boolean logic to simplify the problem (though the idea does have some merit), but I suggest that there must be some way of more rigidly quantifying what it is that makes up our individual clinical gestalt guesstimations, other than waving our hands vaguely and ascribing it to “using the force”.
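
The overlap in the criteria lists above can be made explicit with ordinary set operations. This is a minimal Python sketch that treats each letter as one discrete criterion, exactly as written in the expressions above. It shows only which criteria overlap and, per the caveat about synergism, says nothing about how well any combination actually predicts PE.

```python
# Each letter stands for one discrete criterion, as in the expressions above
gestalt = set("abcdefxyz")
cdr_1 = set("abcd")
cdr_2 = set("acdfgh")

# "Gestalt + CDR 1" adds no criteria that gestalt did not already contain
combined = gestalt | cdr_1
cdr_1_is_subset = cdr_1 <= gestalt
adds_nothing_new = combined == gestalt

shared_with_cdr_2 = sorted(gestalt & cdr_2)  # criteria gestalt and CDR 2 both use
unique_to_cdr_2 = sorted(cdr_2 - gestalt)    # criteria only CDR 2 uses

print(cdr_1_is_subset, adds_nothing_new)
print(shared_with_cdr_2, unique_to_cdr_2)
```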

If and when we can get a handle on a well-defined set of variables that in reality are what we actually use (even if not explicitly or “out loud”) to tip us over that critical “Hey, you know what, I reckon this just might be a PE… aww, crap!” point, we will be one step closer to a more rational evidence-based approach to estimating pre-test probability in a less volatile and noisy manner, so that practice will be more consistent and fewer patients will be unnecessarily harmed by “gestalt variation”.

3.  Let’s do an RCT.  No… really.

Sick of a lack of high quality evidence undermining your lovingly crafted prognostic algorithms? Tired of the unquestioned acceptance of dogma from a bygone era bursting your bubble of hope for a more objective predictive model? Take heart, for you are not alone!

Is anyone interested in seriously looking into the possibility of setting up a prospective RCT for this? Ethics approval is clearly a major issue given the ingrained nature of universal anticoagulation, but I suspect it is not unthinkable that a strong enough case can be made, on solid evidential grounds (or predominantly the lack of evidence, really), to crack on and actually learn something that will probably change practice.

Posted by: Chris Cole | April 12, 2013

The world of #FOAMED

[Image: Foam Venn]

Does anyone else ever feel there is a certain degree of over-representation of particular topics in the #FOAMED corner of the Twittersphere? 🙂

The recent media frenzy regarding chiropractors partaking of their CPD (continuing professional development) by attending lectures given by known anti-vaccination campaigners has sparked renewed interest in their particular brand of fraud quackery… “complementary health” services, from those with meaningful qualifications who actually understand what “science”, “evidence-based” and “honest” mean, and by whom professional ethics and responsibility are considered important and necessary.

In response to the flurry of popular media reporting, individual chiropractors and representatives of their professional associations and governing bodies have come forth to clarify (or in some cases to seemingly actively avoid clarifying) their positions and it was in this context that I stumbled across some nuggets of wisdom from a gentleman by the name of Tony Croke, who is a board member of the Chiropractors’ Association of Australia (CAA). Perusing Mr Croke’s blog, I came across an article which I thought was probably worth mentioning, given its abject failure to coincide with reality.

The blog article, entitled “it’s not just babies who need safety pins”, concerns the safety and efficacy of chiropractic treatment for children, and can be found here:

Let’s start with Mr Croke’s statement about safety:

“Yes, chiropractic care is super-safe for kids.  In fact, there hasn’t been a serious negative outcome reported anywhere in the world since 1992”

Without worrying about the fact that chiropractors have nothing like the level of internal and external auditing, quality control and adverse incident reporting infrastructure that medicine has (making adverse reactions to chiropractic intervention less likely to be reported in the literature in the first place), or that they treat far fewer patients than doctors do, or that the adverse effect rate for medicine is balanced by the fact that real medicine offers proven tangible benefits in exchange for the risk of those adverse effects, let’s just take that statement at face value. Nothing since 1992, eh? Hmmm…

  • A systematic review (2007) of adverse events in paediatric patients treated with spinal manipulation found 14 specifically reported cases of significant adverse events directly caused by chiropractic treatment, 9 of them disastrous (e.g. subarachnoid haemorrhage, paraplegia) and 20 further cases where provision of chiropractic caused delay to real medical care and subsequent adverse outcomes:

If you’re going to assert absolute crap in a public forum, at least take five minutes on Google or PubMed and confirm that it’s not blatantly obvious crap.

Mr Croke then goes on to discuss why performing chiropractic techniques on kids is a good thing:

“A chiropractic textbook published in 1927 used the famous analogy of the safety pin to explain how a brain and body can become partially disconnected by subluxation.  This disconnection opens up the possibility of reduced effectiveness of nerve system function and “dis-ease”, a state of abnormal physiology.

Chiropractic restores the connection between brain and body, helping the body to control and coordinate all its functions in the best way possible.  And that’s a good thing for any body to have at any age.”

Ignoring the fact that the purported pathophysiologic basis of chiropractic (subluxation theory) has absolutely zero evidence for its existence anywhere except in the imagination of chiropractors, let’s ponder whether there is any real world evidence of efficacy for chiropractic in treating children for… well… anything:

Chiropractors in Australia are now a regulated group, and fall under the auspices of AHPRA. Any real doctor spouting incorrect or misleading facts or advice in a public forum where they are asserting their position of knowledge and authority as healthcare professionals would (and should) be taken to task by AHPRA. Why is it that non-evidence-based peddlers of misinformation that is to the detriment of public health can get away with it without a second glance? It presumably represents tacit admission by the authorities that they know chiropractic is unsupported bullshit, and that they assume the public must know this as well and therefore don’t need to be protected from it, but if that is the case, why pretend to be “regulating” their industry at all? And if AHPRA truly thinks chiropractors are bona fide healthcare providers, what excuse is there for not holding them to the same standards as other practitioners who practise real medicine?
