The results of this review suggest that low back pain screening instruments may perform better in predicting poor disability and absenteeism outcomes than in predicting poor pain outcomes. However, the evidence presented is not sufficiently robust to support firm conclusions. The use of search terms for prognostic/predictive studies, and the inclusion of an English language limit, may have resulted in relevant studies being missed by the searches. Weaknesses in the analysis methods and risk of bias assessments limit the usefulness of the reported results.
Overall summary: High risk of bias in the review
The use of search terms for prognostic/predictive studies, and the inclusion of an English language limit, may have resulted in relevant studies being missed by the searches. Weaknesses in the analysis methods and risk of bias assessments limit the usefulness of the reported results.
|A. Did the interpretation of findings address all of the concerns identified in Domains 1 to 4?||Probably no|
|B. Was the relevance of identified studies to the review's research question appropriately considered?||Probably yes|
|C. Did the reviewers avoid emphasizing results on the basis of their statistical significance?||Probably yes|
|Risk of bias in the review||High|
|Number of studies||18|
|Number of participants||5,834|
|Last search date||June 2016|
|Objective||To evaluate the performance of low back pain screening instruments for determining risk of poor outcome in adults with low back pain of less than three months' duration.|
|Population||Adult (>18 years of age) patients with recent onset acute (zero to six weeks) and subacute (six weeks to three months) low back pain, with or without leg pain.|
|Interventions||Prognostic screening instruments: the STarT Back Tool (SBT), the Orebro Musculoskeletal Pain Screening Questionnaire (OMPSQ), the Vermont Disability Prediction Questionnaire (VDPQ), the Back Disability Risk Questionnaire (BDRQ), the Absenteeism Screening Questionnaire (ASQ), the Chronic Pain Risk Score (CPRS), and the Hancock Clinical Prediction Rule (HCPR).|
|Outcome||Discrimination of the screening tool/instrument, measured by the area under the receiver operating characteristic curve (AUC).|
|Study design||Prospective cohort studies with follow-up outcomes at a minimum of 12 weeks. Retrospective cohort studies, analyses of a single arm of a randomised controlled trial, and case series reports were excluded.|
|Reference standard||Poor outcome, measured at 3 to 12 months, and defined as: pain (numerical rating scale [NRS] score of 3 or more, or pain index score >16); disability (Oswestry Disability Index, Roland Morris Disability Questionnaire, or Spine Functional Index score of 30% or more); absenteeism measures.|
|PP factor||Prognostic screening instruments: the SBT, the OMPSQ, the VDPQ, the BDRQ, the ASQ, the CPRS, and the HCPR.|
The pooled analysis of five studies assessing the Keele STarT Back Tool (SBT) indicated poor performance for predicting a poor pain outcome (NRS pain score ≥3 at 3 or 6 months), with a pooled area under the curve (AUC) of 0.59 (95% confidence interval (CI) 0.55 to 0.63, n = 1,153). For predicting a poor disability outcome (ODI or RMDQ score ≥30% at 3 or 6 months), the pooled AUC was 0.74 (95% CI 0.66 to 0.82, three studies, n = 821).
Pooled analysis of four of the seven studies investigating the Orebro Musculoskeletal Pain Screening Questionnaire (OMPSQ) indicated poor performance for predicting poor pain outcomes (NRS pain score ≥3 at 3 or 6 months): AUC 0.69 (95% CI 0.62 to 0.76, n = 360). For predicting disability (SFI, RMDQ or ODI score ≥30% at 3 or 6 months), the pooled AUC was 0.75 (95% CI 0.69 to 0.82, three studies, n = 512), and for six-month absenteeism (>28 days) it was 0.83 (95% CI 0.75 to 0.90, three studies, n = 243).
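Readers comparing these pooled estimates can recover an approximate standard error (SE) from each reported 95% CI. The sketch below assumes each interval is a symmetric normal-approximation interval (estimate ± 1.96 × SE). The review summary does not state how the intervals were computed, so treat the derived values as approximations. The class and field names are illustrative only.

```python
from dataclasses import dataclass

Z_95 = 1.96  # two-sided 95% normal quantile (assumes normal-approximation CIs)


@dataclass
class PooledAUC:
    label: str
    auc: float
    lower: float
    upper: float

    def standard_error(self) -> float:
        """Approximate SE, assuming the CI is estimate +/- 1.96 * SE."""
        return (self.upper - self.lower) / (2 * Z_95)

    def excludes_chance(self) -> bool:
        """True if the whole 95% CI lies above AUC = 0.5 (chance discrimination)."""
        return self.lower > 0.5


# Pooled estimates as reported in the review summary.
REPORTED = [
    PooledAUC("SBT, pain", 0.59, 0.55, 0.63),
    PooledAUC("SBT, disability", 0.74, 0.66, 0.82),
    PooledAUC("OMPSQ, pain", 0.69, 0.62, 0.76),
    PooledAUC("OMPSQ, disability", 0.75, 0.69, 0.82),
    PooledAUC("OMPSQ, absenteeism", 0.83, 0.75, 0.90),
]

if __name__ == "__main__":
    for est in REPORTED:
        print(f"{est.label:20s} AUC={est.auc:.2f} SE~{est.standard_error():.3f} "
              f"CI above 0.5: {est.excludes_chance()}")
```

On this approximation the SBT pain estimate is the most precise (SE about 0.02). However, its lower bound of 0.55 sits only just above chance. This is the sense in which the summary describes its performance as poor.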
The research objective was clearly stated and appropriate inclusion criteria were defined. A study protocol was registered on PROSPERO (CRD42015015778). No restrictions based on study characteristics or source of information were reported.
|1.1 Did the review adhere to pre-defined objectives and eligibility criteria?||Probably yes|
|1.2 Were the eligibility criteria appropriate for the review question?||Probably yes|
|1.3 Were eligibility criteria unambiguous?||Probably yes|
|1.4 Were all restrictions in eligibility criteria based on study characteristics appropriate (e.g. date, sample size, study quality, outcomes measured)?||Probably yes|
|1.5 Were any restrictions in eligibility criteria based on sources of information appropriate (e.g. publication status or format, language, availability of data)?||Probably yes|
|Concerns regarding specification of study eligibility criteria||Low|
MEDLINE, EMBASE, CINAHL, PsycINFO, PEDro, Web of Science, SciVerse SCOPUS, and Cochrane Central Register of Controlled Trials were searched to identify all relevant studies. The reference lists of all included articles and relevant review articles were searched to locate any additional studies. The full search strategy was reported. It included terms for prognostic/predictive studies and an English language limit, which may have resulted in relevant studies being missed. Two review authors independently performed the study selection, and any disagreements were resolved through discussion.
|2.1 Did the search include an appropriate range of databases/electronic sources for published and unpublished reports?||Yes|
|2.2 Were methods additional to database searching used to identify relevant reports?||Probably yes|
|2.3 Were the terms and structure of the search strategy likely to retrieve as many eligible studies as possible?||Probably no|
|2.4 Were restrictions based on date, publication format, or language appropriate?||No|
|2.5 Were efforts made to minimise error in selection of studies?||Yes|
|Concerns regarding methods used to identify and/or select studies||High|
Two reviewers independently extracted relevant data using a standardised spreadsheet. Sufficient general study characteristics appear to have been extracted to allow interpretation of the results. However, the use of an overall measure of discrimination (area under the receiver operating characteristic curve, AUC) represents a loss of information. Reporting paired statistics, such as sensitivity and specificity, would have been preferable. Two review authors independently assessed the methodological quality of the included studies using the Quality In Prognosis Studies (QUIPS) tool. Disagreements in ratings were discussed and, if not resolved, referred to a third review author. The QUIPS tool is intended to assess risk of bias in prognostic factor studies (studies that aim to identify potential predictors). It is therefore not the most appropriate tool for assessing risk of bias in studies evaluating existing screening instruments. Since the included studies were analysed as predictive accuracy studies, QUADAS-2 would have been a more appropriate risk of bias tool.
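The sketch below illustrates the information lost by reporting only the AUC. It first computes the AUC as a concordance probability, which is equivalent to the Mann-Whitney U statistic divided by the number of poor/good outcome pairs. It then computes the sensitivity and specificity at three cut-offs. The scores are invented for illustration and are not taken from any included study. The function names are hypothetical.

```python
from itertools import product


def concordance_auc(poor: list[float], good: list[float]) -> float:
    """AUC as the probability that a randomly chosen poor-outcome patient
    scores higher than a randomly chosen good-outcome patient (ties count 0.5).
    Equals the Mann-Whitney U statistic divided by n_poor * n_good."""
    wins = 0.0
    for p, g in product(poor, good):
        if p > g:
            wins += 1.0
        elif p == g:
            wins += 0.5
    return wins / (len(poor) * len(good))


def sens_spec(poor: list[float], good: list[float], cutoff: float) -> tuple[float, float]:
    """Sensitivity and specificity when scores >= cutoff are classed 'high risk'."""
    sensitivity = sum(s >= cutoff for s in poor) / len(poor)
    specificity = sum(s < cutoff for s in good) / len(good)
    return sensitivity, specificity


# Hypothetical screening scores (not data from the review).
POOR = [7, 6, 5, 5, 3]
GOOD = [6, 4, 3, 2, 2, 1]

if __name__ == "__main__":
    print(f"AUC = {concordance_auc(POOR, GOOD):.3f}")
    for cutoff in (3, 4, 6):
        se, sp = sens_spec(POOR, GOOD, cutoff)
        print(f"cut-off >= {cutoff}: sensitivity {se:.2f}, specificity {sp:.2f}")
```

Here a single AUC of about 0.83 corresponds to sensitivity/specificity pairs ranging from 1.00/0.50 to 0.40/0.83, depending on the cut-off. A clinician applying a screening tool acts on one such pair, not on the AUC.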
|3.1 Were efforts made to minimise error in data collection?||Yes|
|3.2 Were sufficient study characteristics considered for both review authors and readers to be able to interpret the results?||Probably no|
|3.3 Were all relevant study results collected for use in the synthesis?||Probably yes|
|3.4 Was risk of bias (or methodological quality) formally assessed using appropriate criteria?||Probably no|
|3.5 Were efforts made to minimise error in risk of bias assessment?||Yes|
|Concerns regarding methods used to collect data and appraise studies||High|
The synthesis included all eligible studies. Studies using different outcome measures and different follow-up durations were inappropriately pooled. Summary estimates were generated by simple pooling of the area under the ROC curve (AUC). This method is not generally recommended because it takes no account of the trade-off between sensitivity and specificity. A post-hoc sensitivity analysis was undertaken to explore the influence of study variation. The quality of the individual studies was considered in the synthesis of findings, but the risk of bias tool used was not appropriate for this review.
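The summary does not specify the pooling model beyond "simple pooling" of AUCs. The sketch below is a generic illustration only. It pools study-level AUCs by inverse-variance weighting and adds a DerSimonian-Laird random-effects estimate, with Cochran's Q and I² to quantify between-study variation (the concern raised in item 4.4). Because it still pools a single summary measure, it shares the limitation noted above of ignoring the sensitivity/specificity trade-off. The input values are hypothetical, and the function name is illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class PoolResult:
    fixed: float
    random: float
    random_ci: tuple[float, float]
    q: float
    tau2: float
    i2: float


def pool_auc(aucs: list[float], ses: list[float]) -> PoolResult:
    """Inverse-variance pooling of study AUCs with a DerSimonian-Laird
    random-effects estimate and the I^2 heterogeneity statistic."""
    if len(aucs) != len(ses) or len(aucs) < 2:
        raise ValueError("need at least two studies with matching SEs")
    w = [1.0 / se**2 for se in ses]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, aucs)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, aucs))
    df = len(aucs) - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (se**2 + tau2) for se in ses]
    random = sum(wi * y for wi, y in zip(w_re, aucs)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    ci = (random - 1.96 * se_re, random + 1.96 * se_re)
    return PoolResult(fixed, random, ci, q, tau2, i2)


if __name__ == "__main__":
    # Hypothetical per-study AUCs and standard errors (not taken from the review).
    res = pool_auc([0.62, 0.55, 0.71, 0.58], [0.04, 0.03, 0.05, 0.035])
    print(f"fixed {res.fixed:.3f}, random {res.random:.3f}, "
          f"95% CI {res.random_ci[0]:.3f}-{res.random_ci[1]:.3f}, I^2 {res.i2:.0%}")
```

A large I² indicates that much of the spread in study AUCs reflects genuine between-study differences rather than sampling error. In that situation, pooling different outcome measures and follow-up durations is most questionable.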
|4.1 Did the synthesis include all studies that it should?||Probably yes|
|4.2 Were all pre-defined analyses reported or departures explained?||Probably yes|
|4.3 Was the synthesis appropriate given the degree of similarity in the research questions, study designs and outcomes across included studies?||Probably no|
|4.4 Was between-study variation minimal or addressed in the synthesis?||No|
|4.5 Were the findings robust, e.g. as demonstrated through funnel plot or sensitivity analyses?||Probably no|
|4.6 Were biases in primary studies minimal or addressed in the synthesis?||Probably no|
|Concerns regarding synthesis and findings||High|
Background: Delivering efficient and effective healthcare is crucial for a condition as burdensome as low back pain (LBP). Stratified care strategies may be worthwhile, but rely on early and accurate patient screening using a valid and reliable instrument. The purpose of this study was to evaluate the performance of LBP screening instruments for determining risk of poor outcome in adults with LBP of less than 3 months duration.
Methods: Medline, Embase, CINAHL, PsycINFO, PEDro, Web of Science, SciVerse SCOPUS, and Cochrane Central Register of Controlled Trials were searched from June 2014 to March 2016. Prospective cohort studies involving patients with acute and subacute LBP were included. Studies administered a prognostic screening instrument at inception and reported outcomes at least 12 weeks after screening. Two independent reviewers extracted relevant data using a standardised spreadsheet. We defined poor outcome for pain to be > 3 on an 11-point numeric rating scale and poor outcome for disability to be scores of > 30% disabled (on the study authors' chosen disability outcome measure).
Results: We identified 18 eligible studies investigating seven instruments. Five studies investigated the STarT Back Tool: performance for discriminating pain outcomes at follow-up was 'non-informative' (pooled AUC = 0.59 (0.55-0.63), n = 1153) and 'acceptable' for discriminating disability outcomes (pooled AUC = 0.74 (0.66-0.82), n = 821). Seven studies investigated the Orebro Musculoskeletal Pain Screening Questionnaire: performance was 'poor' for discriminating pain outcomes (pooled AUC = 0.69 (0.62-0.76), n = 360), 'acceptable' for disability outcomes (pooled AUC = 0.75 (0.69-0.82), n = 512), and 'excellent' for absenteeism outcomes (pooled AUC = 0.83 (0.75-0.90), n = 243). Two studies investigated the Vermont Disability Prediction Questionnaire and four further instruments were investigated in single studies only.
Conclusions: LBP screening instruments administered in primary care perform poorly at assigning higher risk scores to individuals who develop chronic pain than to those who do not. Risks of a poor disability outcome and prolonged absenteeism are likely to be estimated with greater accuracy. It is important that clinicians who use screening tools to obtain prognostic information consider the potential for misclassification of patient risk and its consequences for care decisions based on screening. However, it needs to be acknowledged that the outcomes on which we evaluated these screening instruments in some cases had a different threshold, outcome, and time period than those they were designed to predict. Systematic review registration: PROSPERO international prospective register of systematic reviews registration number CRD42015015778. Copyright © 2017 The Author(s).