What is GRADE?
Since 2000, the Grading of Recommendations, Assessment, Development and Evaluations (GRADE) working group has worked to develop a systematic and explicit approach to making judgements about quality of evidence and strength of recommendations. The GRADE approach seeks to address many of the perceived shortcomings of existing models of evidence grading. Crucially, evidence is evaluated not study by study, but across studies for specific clinical outcomes. The methods developed by the GRADE working group take into account methodological flaws within the component studies, the consistency of results across different studies, how generalisable the research results are to the wider patient base, and how effective the treatments have been shown to be. Each treatment comparison is given one of four GRADE scores reflecting the quality of the evidence: high, moderate, low, or very low. The approach taken by the GRADE working group is widely seen as the most effective method of linking evaluations of the quality of evidence to clinical recommendations. Our approach to grading evidence is based on the work of the GRADE working group. Taken together with our existing intervention categorisations, we believe that our approach will give clinicians a clear view of the evidence relating to key treatment interventions.
How have we chosen the outcomes to report in our reviews?
At Clinical Evidence we have chosen clinical outcomes that matter to patients with that particular condition, meaning those outcomes that patients themselves are aware of, such as symptom severity, quality of life, disability, and survival. We are less interested in proxy outcomes such as blood lipid concentrations or ovulation rates. In each review, the outcomes we have chosen to report on are clearly listed in the outcomes section and, where possible, we have described how these might be assessed. Any individual study we report in our reviews may present data on a range of outcomes, some of which may be different to our designated outcomes of interest. However, at Clinical Evidence we only report and evaluate evidence on our pre-specified outcomes of interest.
How have we come to the GRADE scores for each comparison?
We have developed a pragmatic approach that has enabled us to apply the principles of GRADE to our systematic reviews in a reproducible and appropriate way. In each review, we present a GRADE table that identifies on what basis judgements of evidence quality are made. The GRADE table specifies what outcomes have been GRADED. For each chosen outcome of interest, we list what studies have contributed data on that outcome, the treatment comparison involved (for example, drug A versus placebo, drug A versus drug B), the number of people included in the comparison, and we report an overall GRADE score (from 4 to 0) based on our assessment of the overall quality of evidence on that outcome. The table specifies why points may have been deducted or added to attain the final GRADE score.
- Type of evidence: We allocate four points to evidence largely based on RCTs, and two points to evidence based on observational studies. Because Clinical Evidence reviews usually have a systematic search that is restricted to RCTs and systematic reviews of RCTs only, we have almost certainly omitted observational evidence of relevance to the GRADE comparisons. Where we have found no RCTs or systematic reviews reporting on an outcome of interest, we have reported that we found no clinically important results from RCTs on that outcome, rather than search for and report observational studies which are at high risk of bias, and therefore may not yield clinically robust results about the effects of interventions. This applies to most of our reviews. In Clinical Evidence reviews, we only systematically search for observational studies and base categorisations on these data, where RCTs are unethical or are unlikely to be performed (for example, in our Sudden Infant Death [SIDS] review). In the small number of reviews in which this applies, we have calculated a GRADE score based on observational evidence. We have not performed a GRADE analysis on statements based on expert opinion / consensus only.
- Quality points: We have grouped issues such as sparse data, follow-up, withdrawals, blinding, allocation concealment, incomplete reporting of results, and other quality issues into one quality category, and have allowed deduction of up to three points for quality flaws. We describe comparisons with fewer than 200 participants in total as sparse data. Whilst a definition of sparse data should be based on event rates, many of our outcomes are presented as continuous data, which do not easily lend themselves to conversion into event rates. Therefore, for pragmatic reasons, we have chosen trial size as a proxy. Where there is sparse data on an outcome of interest, we deduct a quality point. This is based on the participants actually included in an analysis rather than the total number of people in a study, as the number analysed may be less than the total trial size.
- Consistency: We include heterogeneous studies, with different end-points and populations, providing they all evaluate the outcome in question and compare the same interventions. We deduct up to one point for inconsistent results among studies (for example, for statistical heterogeneity or conflicting results among studies), and add up to one point for evidence of a dose response or if adjustment for confounders would have increased the effect size.
- Directness: We deduct up to two points for issues which may limit the generalisability of the reported results to our specified population of interest. Such issues may include, for example, a restricted population in trials, the inclusion of too broad a population in trials, or the use of co-interventions in addition to our intervention of interest.
- Effect size: Because we rarely report a single meta-analysis for the outcome and comparison in question, and because many of our outcomes are expressed as continuous data, we have had to modify the GRADE recommendation to add one point for a relative risk or odds ratio of 2 or more, and to add two points for a relative risk or odds ratio of 5 or more. We look at all effect sizes for the comparison in question, reported in individual RCTs or meta-analyses, and add one point if they are all greater than 2 (or less than 0.5) or two points if they are all greater than 5 (or less than 0.2) and are statistically significant. If one or more of the reported effect sizes falls between 0.5 and 2, or if the results are not statistically significant, no points are added.
- Strength of recommendation: We already categorise all interventions included in our reviews according to their likely effectiveness. We feel that this categorisation adequately reflects the strength of recommendation.
- Cost-effectiveness assessment: Clinical Evidence does not include data on cost-effectiveness as this varies internationally and over time. Therefore, we have not included cost-effectiveness data in our evaluation of the evidence, or in our categorisations.
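The sparse data and effect size rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the tooling used for Clinical Evidence reviews: the function names, the `(ratio, significant)` input format, and the use of strict inequalities at the thresholds are assumptions made for the example. It also does not check whether all estimates point in the same direction, which in practice would be considered under consistency.

```python
from typing import Sequence, Tuple

# Threshold taken from the text: comparisons with fewer than 200
# participants actually analysed are described as sparse data.
SPARSE_DATA_THRESHOLD = 200


def is_sparse(participants_analysed: int) -> bool:
    """Return True when the number of people analysed is below the threshold."""
    return participants_analysed < SPARSE_DATA_THRESHOLD


def effect_size_points(estimates: Sequence[Tuple[float, bool]]) -> int:
    """Return the points added for effect size (0, 1, or 2).

    Each estimate is a hypothetical (ratio, significant) pair, where ratio
    is an OR, RR, or HR reported by one RCT or meta-analysis.
    """
    # No estimates, or any non-significant result: no points are added.
    if not estimates or not all(significant for _, significant in estimates):
        return 0
    # Two points only if every estimate is more than 5 or less than 0.2.
    if all(ratio > 5 or ratio < 0.2 for ratio, _ in estimates):
        return 2
    # One point only if every estimate is more than 2 or less than 0.5.
    if all(ratio > 2 or ratio < 0.5 for ratio, _ in estimates):
        return 1
    return 0
```

For instance, `effect_size_points([(2.5, True), (1.8, True)])` returns 0, because one of the two estimates falls between 0.5 and 2 even though both are statistically significant.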
The scoring system used for Clinical Evidence reviews
For each comparison, points are initially allocated for the type of evidence. For example, four points are allocated for evidence based on RCTs, and two points for observational evidence. Points are then subtracted or added depending on issues relating to quality, consistency, directness, or effect size, to arrive at a final GRADE score. The scoring system is listed below.
| Criterion | Based on | Score | Meaning |
| --- | --- | --- | --- |
| Type of evidence | Initial score based on the type of evidence | +4 | RCTs/SR of RCTs, +/- other types of evidence |
| | | +2 | Observational evidence (e.g. cohort, case control) |
| Quality | Blinding and allocation process; follow-up and withdrawals; sparse data; other methodological concerns (e.g. incomplete reporting, subjective outcomes) | 0 | No problems |
| | | -1 | Problem with 1 element |
| | | -2 | Problem with 2 elements |
| | | -3 | Problem with 3 or more elements |
| Consistency | Degree of consistency of effect between or within studies | +1 | Evidence of dose response across or within studies (or inconsistency across studies is explained by a dose response); also up to one point added if adjustment for confounders would have increased the effect size |
| | | 0 | All / most studies show similar results |
| | | -1 | Lack of agreement between studies (e.g. statistical heterogeneity between RCTs, conflicting results) |
| Directness | The generalisability of population and outcomes from each study to our population of interest | 0 | Population and outcomes broadly generalisable |
| | | -1 | Problem with 1 element |
| | | -2 | Problem with 2 or more elements |
| Effect size | The reported OR/RR/HR for comparison | 0 | Not all effect sizes more than 2 or less than 0.5 and significant; or if OR/RR/HR not significant |
| | | +1 | Effect size more than 2 or less than 0.5 for all studies/meta-analyses included in comparison and significant |
| | | +2 | Effect size more than 5 or less than 0.2 for all studies/meta-analyses included in comparison and significant |
The GRADE score: We use four categories of quality of evidence based on the overall GRADE score for each comparison: high (four or more points overall), moderate (three points), low (two points), and very low (one point or fewer).
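As a rough illustration of how the points combine, the scoring scheme above can be expressed as a short Python sketch. The function and parameter names are hypothetical and chosen for this example only. The sketch assumes the per-heading judgements (number of problem elements, consistency adjustment, effect size points) have already been made by a reviewer; it only performs the arithmetic and the final categorisation.

```python
# Initial points by type of evidence, as given in the scoring table.
INITIAL_POINTS = {"rct": 4, "observational": 2}


def grade_score(evidence_type: str,
                quality_problems: int = 0,
                consistency: int = 0,
                directness_problems: int = 0,
                effect_size: int = 0) -> int:
    """Combine the per-heading judgements into an overall GRADE score.

    quality_problems and directness_problems are counts of elements with
    a problem; consistency is -1, 0, or +1; effect_size is 0, 1, or 2.
    """
    if evidence_type not in INITIAL_POINTS:
        raise ValueError("evidence_type must be 'rct' or 'observational'")
    if consistency not in (-1, 0, 1):
        raise ValueError("consistency must be -1, 0, or +1")
    if effect_size not in (0, 1, 2):
        raise ValueError("effect_size must be 0, 1, or 2")
    if quality_problems < 0 or directness_problems < 0:
        raise ValueError("problem counts cannot be negative")

    score = INITIAL_POINTS[evidence_type]
    score -= min(quality_problems, 3)     # at most three points for quality
    score += consistency
    score -= min(directness_problems, 2)  # at most two points for directness
    score += effect_size
    return score


def grade_category(score: int) -> str:
    """Map an overall score to one of the four quality categories."""
    if score >= 4:
        return "high"
    if score == 3:
        return "moderate"
    if score == 2:
        return "low"
    return "very low"
```

Under these assumptions, RCT evidence with one quality problem (for example, sparse data) and no other adjustments scores 4 - 1 = 3, so `grade_category(grade_score("rct", quality_problems=1))` gives "moderate".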
An important point on the interpretation of the Clinical Evidence GRADE scores
The final categorisation of evidence based on the GRADE score above (high, moderate, low, or very low quality) does not necessarily relate to the overall methodological quality of any individual RCT or review. Rather, it relates to the quality of evidence on a specific outcome in our designated population of interest. For example, an individual RCT may include children and adults. If our question in Clinical Evidence is on adults only, we may deduct a directness point for the inclusion of children, as children are outside our specified population of interest (adults only). As a result, the Clinical Evidence GRADE score only relates to the quality of evidence on our chosen outcome of interest in our designated population of interest.
Further detail on how we come to the GRADE score for each comparison
At Clinical Evidence we do not necessarily report on all possible parameters of a study, such as whether an RCT was single or double blinded, or the precise method of randomisation used. Rather, following a critical appraisal of each study, we selectively highlight any important methodological or other issues which we feel may affect the interpretation of the results or the weight that might be placed on them. These issues may vary in importance from trial to trial. For example, an unblinded assessment of an absolute outcome such as mortality may cause less methodological concern than an unblinded assessment of a subjective outcome such as a patient's satisfaction with treatment. Hence, it is impossible to be totally comprehensive in listing all the issues which may potentially affect our assessment of evidence quality. However, some examples of issues which may or may not affect our scoring of evidence, with the headings we would report these under, include:
- Quality: sparse data; baseline differences between groups; incomplete reporting of results (for example, no absolute results, no reporting of test statistic); flaws in randomisation; flaws in blinding; flaws in analysis; no intention to treat analysis; inconsistency between interventions; problems with cluster methodology; uncertainty regarding included population; uncertainty about accuracy of diagnosis; inclusion of non-randomised data in an analysis; poor methods in general; poor follow up; uncertainty regarding statistical significance of a result; uncertainty regarding clinical relevance of a result; subjective assessment of outcomes; subgroup analysis; post-hoc analysis.
- Consistency: statistical heterogeneity between studies; conflicting results between studies; lack of agreement between studies; different results for different outcomes; different results for different subgroups; different results for different endpoints; evidence of dose response.
- Directness: recruitment issues decreasing generalisability; narrow included population; restricted population; inclusion of different disease states; inclusion of people outside our group of interest; exclusion of selected participants (for example, non-responders); high or low dose of drugs; use of composite outcomes; use of co-interventions; no direct comparisons between groups; small number of comparisons reported; different inclusion criteria between studies; differences in regimens between studies; unclear measurement of outcomes; unclear definition of outcomes; short follow up; clinical heterogeneity between studies.
- Effect size: whether OR/RR/HR significant; OR/RR/HR more than 2 or less than 0.5 in all studies / meta-analyses and significant; effect size more than 5 or less than 0.2 in all studies or meta-analyses and significant.






