Observer Agreement in the Angiographic Assessment of Arteriovenous Malformations of the Brain
Purpose— We aimed to determine intraobserver and interobserver agreement in the characterization of brain arteriovenous malformation (AVM) angioarchitecture on intra-arterial digital subtraction angiograms.
Methods— Five experienced interventional neuroradiologists independently reviewed 40 anonymized angiograms obtained at the time of first-ever AVM diagnosis. The allocation of the films to observers was balanced for AVM size and complexity. Every observer was compared with himself and all the others by distributing the films in 2 batches 3 months apart. The observers used standard forms to collect both quantitative and categorized qualitative angiographic data. To measure agreement we used the kappa statistic (κ) for nominal data, weighted κ for ordinal and discrete interval data, and Bland & Altman analysis for continuous data.
Results— Intraobserver agreement was generally moderate to substantial, with 95% confidence intervals ranging from fair to almost perfect. However, for every characteristic, interobserver agreement was less than intraobserver agreement. Interobserver agreement was generally slight to moderate, with 95% confidence intervals ranging from less than chance to almost perfect.
Conclusion— This study demonstrates the need for robust and generalizable definitions of AVM angioarchitecture and methods of nidus size measurement, with proof of good intraobserver and interobserver agreement, for future efforts to understand the prognosis and best treatment of AVMs.
The vascular anatomy (angioarchitecture) of brain arteriovenous malformations (AVMs) demonstrated on intra-arterial digital subtraction angiograms is thought to influence their prognosis1 and is therefore often used to decide whether and how to treat an AVM.2,3 However, accurate description of angioarchitecture is inevitably affected by the inherent complexity of AVMs, neuroradiologists’ personal interpretations, and human error. These sources of observer variation are likely to have been compounded by inconsistencies in angioarchitecture terminology used in the literature, varying perceptions of what is abnormal, and (until very recently) the lack of structured definitions intended for widespread use.4
A recent systematic review revealed no published studies on MEDLINE and Embase of observer agreement in the angiographic assessment of AVMs.1 We therefore sought to evaluate observer agreement using an approach that reflected day-to-day practice, minimized bias, and would not impose too great a burden on observers despite being powered adequately.
See Editorial Comment, page 1508
Materials and Methods
To be representative of everyday practice, we used 40 angiograms obtained at the time of first-ever AVM diagnosis from 1 year of a prospective population-based cohort of adults with a first-in-a-lifetime diagnosis of AVM (Scottish Intracranial Vascular Malformation Study [SIVMS], www.dcn.ed.ac.uk/ivm, accessed February 10, 2002). We used identical copies of the entire run of the 4-vessel angiogram that led to the first-ever AVM diagnosis. The angiograms were performed at the 4 Neuroscience Centers in Scotland; they included all the anteroposterior, lateral, and oblique views and vascular territories necessary to visualize the AVM and only omitted normal vascular territories and frames from superselective vascular catheterization performed before embolization. Because of the different facilities for angiography, the spectrum of experience of the radiologists, and the variable resolution of copy films, the 40 angiograms inevitably varied in quality. To explore the influence of film quality, the nonparticipating study neuroradiologists (J.J.B. and R.J.S.) rated the quality of the films before starting the study (12 were “excellent,” 18 were “good,” 8 were “average,” and 2 were “poor” but none were “terrible”). Axial CT or MR imaging from the presentation that led to diagnosis was included to aid localization. Size markers on CT and MR imaging were obscured to force observers to estimate nidus dimensions from the angiograms.
The 5 observers were practicing consultant interventional neuroradiologists in the UK (Table 1). They worked in separate cities in the UK and had never been involved in the management of any of the patients in this study. They interpreted the angiograms without knowledge of the original findings or each other’s results. The observers were not presented with definitions of any of the angioarchitectural features under investigation, and—as often occurs in everyday practice—they used the diameter of the genu of the petrous portion of the internal carotid artery (5 mm) as a reference for sizing a nidus, either using calipers or a customized scale on paper.5
The 40 angiograms were distributed and reviewed in 2 batches between January and May 2001. Every angiogram was reviewed by 2 of the 5 observers for the interobserver study, and 38 of the 40 angiograms were reviewed by the same observer on 2 separate occasions for the intraobserver study. The angiograms were divided among the observers so that each neuroradiologist was allocated a similar spectrum of AVMs according to crude indicators of their nidus diameter (2 large [>6 cm], 14 medium, 24 small [<3 cm]) and vascular complexity (18 simple [≤2 feeders or draining veins] and 22 complex [≥3 feeders or draining veins]).
A standard data collection form was distributed with each angiogram. The following data were collected (forced categories are described in parentheses): depth (deep, superficial); nidus diameter (mm) in each of 3 dimensions (anteroposterior, transverse, and vertical); number of feeding arteries; feeding artery angiopathy (yes, no), and if present whether abnormally dilated, stenosed, or both dilated and stenosed6; angiogenesis (yes, no)3; collateral supply (yes, no), and if present whether dural, pial/leptomeningeal, or both dural and leptomeningeal7; nidus border (compact/diffuse)8; discernible fistula in the nidus (yes, no)9; number of draining veins/nidus compartments10; Spetzler-Martin surgical grade, calculated by an observer summing scores from the scale provided for AVM nidus size, pattern of venous drainage, and eloquence of adjacent brain (although constituent scores on each of these 3 items were not collected)11; venous varices (yes, no)12; venous ectasia (yes, no)12; venous stenosis (yes, no)13; aneurysm(s) (yes, no), and aneurysm type if identified (feeding artery, nidal, pseudo-aneurysm,14 remote).15
All data were double-punched to ensure accuracy of data entry. Because the same observer reviewed some scans on 2 separate occasions for the intraobserver agreement study, only data from the earliest date that an observer reviewed a particular scan were used to evaluate interobserver agreement.
The primary outcomes in this study were observer variation quantified by the kappa statistic (κ) for nominal data (eg, dichotomous yes/no answers),16 the weighted κ statistic for ranked ordinal data (eg, Spetzler-Martin grade) and discrete interval data (eg, number of feeders),17 and Bland & Altman analysis for continuous data (eg, nidus dimensions in millimeters).18,19 Percentage agreement between observers is not a good measure because, unlike κ (Figure 1), it does not discriminate between actual agreement and agreement that arises as a result of chance.
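The contrast between percentage agreement and κ can be illustrated with a minimal Python sketch. The function name `cohen_kappa` and the yes/no ratings below are hypothetical and chosen only for illustration; they are not the study's data, and the study itself used SPSS rather than this code.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two observers rating the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: proportion of items given the same category.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the two observers' marginal proportions,
    # summed over categories.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_chance = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical ratings of 20 angiograms for one yes/no feature:
# both "yes" on 2, both "no" on 14, and 4 disagreements.
observer_1 = ["yes"] * 4 + ["no"] * 16
observer_2 = ["yes"] * 2 + ["no"] * 2 + ["yes"] * 2 + ["no"] * 14

percent_agreement = sum(a == b for a, b in zip(observer_1, observer_2)) / 20
print(f"percentage agreement = {percent_agreement:.2f}")
print(f"kappa = {cohen_kappa(observer_1, observer_2):.3f}")
```

In this made-up example the observers agree on 80% of films, yet most of that agreement is expected by chance because "no" is the dominant answer, so κ is much lower than the raw percentage suggests.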
The raw data were recoded for some variables to enable a simple assessment of observer agreement using the κ statistic, although we also examined cross tabulations of the raw data and Bland & Altman plots for these variables where appropriate (eg, nidus dimensions, see below). For example, nidus size in each of the 3 dimensions was recoded into a dichotomous variable according to whether an observer found the size to be ≥30 mm or <30 mm (the size threshold for stereotactic radiotherapy). The numbers of feeding arteries and draining veins were recoded as 1, 2, or ≥3 because different radiologists had different thresholds for deeming there to be “multiple” vessels when they were too numerous to count, which made Bland & Altman analysis of these variables impossible.
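The recoding described above might be sketched in Python as follows. The function names and category labels are hypothetical, and treating a free-text answer of "multiple" as belonging to the ≥3 category is an assumption made for illustration rather than a rule stated in the study.

```python
def recode_nidus_size(diameter_mm, threshold_mm=30):
    """Dichotomize a nidus dimension at the 30-mm threshold."""
    return ">=30 mm" if diameter_mm >= threshold_mm else "<30 mm"


def recode_vessel_count(raw):
    """Recode a feeder or draining-vein count as '1', '2', or '>=3'.

    Assumption: an answer of 'multiple' (too numerous to count) is
    placed in the '>=3' category.
    """
    if isinstance(raw, str) and raw.strip().lower() == "multiple":
        return ">=3"
    count = int(raw)
    if count < 1:
        raise ValueError("vessel count must be at least 1")
    return str(count) if count <= 2 else ">=3"


# Hypothetical raw answers from one data collection form.
print(recode_nidus_size(28), recode_nidus_size(30))
print([recode_vessel_count(x) for x in (1, 2, 5, "multiple")])
```

Collapsing the counts this way lets κ be computed even when observers use different thresholds for "multiple," at the cost of hiding variation within the ≥3 category.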
All analyses were performed in Statistical Product for the Social Sciences (SPSS) version 10.0.5 except confidence intervals for κ, which were calculated using Confidence Interval Analysis software,20 and weighted κ tests and their confidence intervals, which were calculated in Statistical Analysis Software (SAS) version 8.
Sample Size Calculation
The study was designed with 87% power to detect a greater than fair agreement (κ=0.4) at the P=0.05 level of significance, assuming the level of agreement for the characteristics would be substantial (κ=0.7).21
Every film was carefully anonymized to protect patient confidentiality and comply with UK Data Protection legislation. This study is covered under the approval given for SIVMS by the Multicentre Research Ethics Committee for Scotland (MREC/98/0/48).
Results
Complete responses were received from all 5 observers, and the median time between an observer reporting the 2 batches of angiograms was 5 (range, 4 to 6) months. The interpretation of interobserver agreement requires a prior appreciation of intraobserver agreement because between-observer variation is inevitably affected by the extent of within-observer variation.
Figure 2 demonstrates that for every characteristic, intraobserver agreement was greater than interobserver agreement. Intraobserver agreement was moderate to substantial, with 95% confidence intervals ranging from fair to almost perfect. Interobserver agreement was slight to moderate, with 95% confidence intervals ranging from less than chance to almost perfect.
Intraobserver agreement about whether the diameter of an AVM nidus was ≥30 mm or <30 mm ranged from substantial to almost perfect, whereas interobserver agreement was somewhat worse. Plotting the raw, continuous nidus size data on Bland & Altman plots18 reveals a tendency for both intraobserver and interobserver variation to increase as nidus size increases, especially above 20 mm. Using the transverse nidus dimension as an example (Figure 3), the greater scatter about the mean for interobserver as opposed to intraobserver comparisons of raw continuous data explains why κ=0.36 for interobserver comparisons as opposed to κ=0.78 for intraobserver comparisons using categorical data with a 30-mm size threshold (Figure 2).
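For readers unfamiliar with Bland & Altman analysis, the following Python sketch shows the quantities that such a plot displays: the difference between paired measurements against their mean, the mean difference (bias), and the 95% limits of agreement (bias ± 1.96 SD of the differences). The function name `bland_altman` and the nidus sizes are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def bland_altman(measure_a, measure_b):
    """Return the points and summary lines of a Bland & Altman plot."""
    if len(measure_a) != len(measure_b) or len(measure_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    # x axis: mean of each pair; y axis: difference within each pair.
    means = [(a + b) / 2 for a, b in zip(measure_a, measure_b)]
    diffs = [a - b for a, b in zip(measure_a, measure_b)]
    bias = mean(diffs)   # systematic difference between the two readings
    sd = stdev(diffs)    # sample standard deviation of the differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower_limit": bias - 1.96 * sd,
        "upper_limit": bias + 1.96 * sd,
    }


# Hypothetical transverse nidus dimensions (mm) from two observers.
observer_1_mm = [10, 20, 30, 40, 50]
observer_2_mm = [12, 18, 33, 36, 56]

result = bland_altman(observer_1_mm, observer_2_mm)
print(f"bias = {result['bias']:.2f} mm")
print(f"limits of agreement = {result['lower_limit']:.2f} "
      f"to {result['upper_limit']:.2f} mm")
```

Plotting `diffs` against `means` would show whether the scatter widens as nidus size increases, which is the pattern described for Figure 3.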
Interobserver agreement was greatest for characteristics such as determining whether nidus diameter was ≥30 mm or <30 mm (eg, vertical dimension κ=0.62 [95% CI, 0.37 to 0.88]) and whether there were venous varices or not (κ=0.56 [95% CI, 0.31 to 0.81]). In general there was the greatest overall agreement between observers for AVMs with simple angioarchitecture lacking many of the features of interest (Figure 4). Interobserver agreement was worst for characteristics such as venous stenosis (κ=0.14 [95% CI, −0.33 to 0.60]), angiogenesis (κ=0.18 [95% CI, −0.13 to 0.49]), and the type of nidus border (κ=0.22 [95% CI, −0.20 to 0.64])—in other words, AVMs with more complex angioarchitecture (Figure 5). Alarmingly, interobserver agreement was only moderate for the variables with the greatest importance in routine practice: Spetzler-Martin grade, which influences predictions of morbidity from surgery (weighted κ=0.47 [95% CI, 0.30 to 0.64]), and the presence of aneurysms, which are thought to confer a greater risk of subsequent hemorrhage and may also influence management decisions (κ=0.40 [95% CI, 0.11 to 0.68]).
We asked observers to assess film quality with the intention of exploring whether it influenced observer agreement. However, intraobserver agreement about film quality was κ=0.43 (95% CI, 0.20 to 0.67), and interobserver agreement was κ=0.19 (95% CI, −0.04 to 0.42), making stratification of kappas by quality for each angioarchitectural feature susceptible to observer variation in the determination of quality. Moreover, the small number of observations in each quality category resulted in even wider confidence intervals around the kappa estimates, and we found no consistent trend toward a better level of agreement for higher quality films.
Discussion
This is the first study of its kind. However, there has been an abstract reporting agreement about AVM size and morphology using World Wide Web–based Joint Photographic Experts Group (JPEG) format magnetic resonance and angiographic images.22 That study’s response rate was 63%, only 2 of 19 participants were neuroradiologists, and the images used were in a different format from ours.
Our study is a pragmatic effort to understand both intraobserver and interobserver agreement in neuroradiologists’ interpretations of AVM angioarchitecture in day-to-day decision making. The study sample reflected everyday practice by being drawn from a population-based cohort. The angiograms were of adequate quality, although it will always be difficult to reflect the dynamic nature of angiography in hard copy format. Data collection was complete. Our main findings are that there was greater intraobserver than interobserver agreement, and agreement ranged from less than chance to almost perfect. These findings should be regarded as a baseline measure of observer agreement for future studies; the results should be considered in the light of important statistical caveats common to all studies using κ, and they have some implications for routine practice.
Bias, Confounding, and Chance
We avoided bias by using angiograms that the study neuroradiologists had never seen before, by anonymizing films, and by leaving a median of 5 months between an observer re-reviewing the same angiogram (to lessen recognition effects). By distributing angiograms evenly according to image quality and AVM complexity, we sought to minimize confounding. Chance effects were minimized by ensuring the study was adequately powered to detect a difference from only fair agreement (κ=0.4), assuming agreement would be substantial (κ=0.7) for any characteristic. Indeed we confirmed our suspicion that intraobserver agreement was of this order for most angioarchitectural features (Figure 2).
To establish the extent of observer agreement with greater precision, a larger study will be required. This could be achieved by increasing the number of observers and/or angiograms, which would also enable an analysis of the bias of any individual observer (see below). A greater number of angiograms for which there are 2 observer comparisons would certainly narrow the 95% confidence intervals around estimates of κ (Figure 2), but having more than 2 observers per angiogram would complicate the statistical analysis.
Although there are several well-rehearsed caveats to the use of κ,23 and experts debate whether the intraclass correlation coefficient is a better measure,24 κ is nevertheless the most frequently used index of agreement. κ relies on both the subjects under study and the observers being independent, and on the categories in the scale being independent, mutually exclusive, and exhaustive.16 These assumptions held for this study, although varying beliefs about angioarchitecture between research groups (a form of global observer variation) will affect whether readers perceive overlap between categories for some characteristics.
The greater the number of scale categories, the lower κ will inevitably be, so agreement will tend to appear better with a dichotomous scale.23 For example, κ for interobserver agreement falls from 0.29 to 0.19 when subdividing angiopathy into more than yes/no categories, and it falls from 0.41 to 0.30 when subdividing collateral supply into more than yes/no categories. Conversely, using only 3 categories for the numbers of feeding and draining vessels masked variation; we would have used Bland & Altman analysis were it not for different thresholds between observers for declaring vessels “multiple.”
An artifact of κ is that it is affected in complex ways by both the prevalence of abnormality among the subjects used and observer bias.23,25 Firstly, most characteristics in this study were unevenly distributed (the most extreme being venous stenosis and nidus border), although the distribution of some features such as aneurysms was more even (Table 2). The nature of our sample means this probably reflects the population distribution of these abnormalities. But when marginal totals are unbalanced, or expected levels of agreement are high because of a high underlying prevalence, κ is fragile and examination of the influences of prevalence effects will be essential when comparing studies.25 Secondly, agreement is only one aspect of variation between observers, the other being biases between them (eg, a tendency for one observer to systematically overestimate nidus size).23 We sought to limit the influence of such bias by ensuring each neuroradiologist was observer 1 or observer 2 a comparable number of times in the interobserver study, and each neuroradiologist was equally represented in the intraobserver study.
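The prevalence effect can be made concrete with a short Python sketch. The two 2×2 tables below are hypothetical (they are not the study's Table 2), and `kappa_from_table` is an illustrative helper: both tables have the same observed agreement, but κ differs because the marginal totals are balanced in one and unbalanced in the other.

```python
def kappa_from_table(table):
    """Unweighted kappa from a square table of counts.

    table[i][j] is the number of subjects placed in category i by
    observer 1 and in category j by observer 2.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical counts for a yes/no feature in 100 angiograms.
# Rows: observer 1 (yes, no); columns: observer 2 (yes, no).
balanced = [[40, 10], [10, 40]]    # feature present in about half
unbalanced = [[5, 10], [10, 75]]   # feature rarely present

for name, table in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(f"{name}: observed agreement = 0.80, "
          f"kappa = {kappa_from_table(table):.2f}")
```

With identical observed agreement, κ is markedly lower for the table with unbalanced marginal totals, which is why prevalence needs to be examined before κ values are compared across studies.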
When there are several ordinal scale categories (eg, Spetzler-Martin grade) between which large disagreements are more serious but would be treated as equally serious by κ, the weighted κ can be used when the relative seriousness of disagreements is specified.17 In this study we allocated weights evenly (eg, weights of 1, 0.75, 0.5, 0.25, and 0 for Spetzler-Martin grades 1 to 5), but whether and how uneven weighting should be used to reflect clinically important thresholds (such as ≤2 and ≥3 on the Spetzler-Martin scale) is debatable.
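A minimal Python sketch of weighted κ with evenly spaced (linear) agreement weights follows. For 5 categories the weight for a disagreement of d grades is 1 − d/4, which reproduces the weights 1, 0.75, 0.5, 0.25, and 0 mentioned above. The function names and the grades are hypothetical, and this is not the SAS routine used in the study.

```python
def linear_weights(k):
    """Agreement weights: 1 on the diagonal, falling evenly to 0."""
    return [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]


def weighted_kappa(grades_a, grades_b, categories):
    """Linearly weighted kappa for two observers on an ordinal scale."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("need two equal-length, non-empty grade lists")
    k, n = len(categories), len(grades_a)
    index = {c: i for i, c in enumerate(categories)}
    # Cross-tabulate the paired grades.
    table = [[0] * k for _ in range(k)]
    for a, b in zip(grades_a, grades_b):
        table[index[a]][index[b]] += 1
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]
    w = linear_weights(k)
    # Weighted observed and chance-expected agreement.
    p_obs = sum(w[i][j] * table[i][j]
                for i in range(k) for j in range(k)) / n
    p_exp = sum(w[i][j] * row[i] * col[j]
                for i in range(k) for j in range(k)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)


spetzler_martin = [1, 2, 3, 4, 5]
# Hypothetical grades assigned by two observers to 8 AVMs.
observer_1 = [1, 2, 2, 3, 3, 4, 5, 1]
observer_2 = [1, 2, 3, 3, 4, 4, 4, 2]

print(linear_weights(5)[0])
print(f"weighted kappa = "
      f"{weighted_kappa(observer_1, observer_2, spetzler_martin):.2f}")
```

Uneven weights that penalize crossing a clinically important threshold more heavily could be substituted by replacing `linear_weights`, although, as noted above, how such weights should be chosen is debatable.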
Implications for Routine Practice
The moderate to substantial intraobserver agreement suggests that experienced interventional neuroradiologists are individually consistent, but the poorer interobserver agreement shows that assessment and interpretation differ between them. This argues for caution in interpreting prognosis and basing treatment decisions on angioarchitectural features with less than adequate interobserver agreement.
Evaluate Nidus Size With Standardized Calibration Markers
There are several barriers to accurately sizing an AVM on an angiogram, which are reflected by the scatter of size estimates in Figure 3. The definition of a nidus as the area toward which multiple feeding arteries converge and from which enlarged veins drain is somewhat arbitrary.26 This is especially problematic when nidus morphology is diffuse (Figure 5)8 and when the AVM is a simple fistula (Figure 4, which elicited nidus dimensions of 0 to 20 mm in this study). The nidus is often not imaged in its entirety when catheterizing single vascular territories during angiography, making the maximum linear diameter in any dimension difficult to gauge.27 It is therefore hard to imagine how proposed nidus volume calculations,27,28 dependent on further assumptions about the shape of the nidus, can be accurate. Moreover, magnification or minification by both film projection and digital subtraction imaging distorts the abnormal vessels. Without reference markers, the widespread use of the diameter of the genu of the petrous portion of the internal carotid artery (as in this study) can be inaccurate; for example, the artery is larger than 5 mm when it feeds ipsilateral, large, high-flow brain AVMs. Potential solutions to these sources of measurement error include consistent angiographic magnification factors, widespread use of simple calibration markers such as rulers, washers, and coins (which were not in frequent use a decade ago29 and do not seem to be now), and magnification/minification rulers.30 The routine use of standardized calibration markers on angiograms is therefore essential to reduce variation in which patients are considered (and then further evaluated with stereotactic sizing31) for stereotactic radiotherapy.
Perform Angiography That Is Adequate to Characterize Subtle Angioarchitecture
The reliable characterization of angioarchitectural features of interest usually requires imaging of suspected abnormalities in at least 2 planes (eg, identification of aneurysms and venous stenosis).4 This should be the required standard of routine clinical practice. Superselective angiography further assists expert neuroradiologists in their interpretation of arterial supply (eg, the number of feeders) and the nidus (eg, whether there is a single fistula within it). But the balance between the risks and benefits of the routine use of superselective angiography has not been established, and there are no data on whether it carries greater risks than routine catheter angiography.
The results of this study argue that the further development of acceptable radiological definitions and future studies of observer agreement should be prioritized in certain areas. Agreement about nidus size should be re-evaluated when standardized calibration markers are in widespread use because of its importance in determining who is eligible for stereotactic radiotherapy.
The Spetzler-Martin grading system is in widespread use to predict morbidity from surgical excision, but the observed levels of intraobserver (weighted κ=0.63 [95% CI, 0.48 to 0.79]) and interobserver (weighted κ=0.47 [95% CI, 0.30 to 0.64]) agreement are cause for some concern. In addition to nidus size, the other 2 components of the Spetzler-Martin grading system (eloquence of adjacent brain and pattern of venous drainage) should be explored. It will be interesting to discover which of the 3 components is most responsible for less than perfect observer agreement in the use of this scale. In this study, observers assumed left hemisphere dominance for the determination of eloquence. To assist with determination of eloquence, future studies might benefit from a brief clinical history accompanying each angiogram. However, this benefit might be offset by indirectly encouraging overinterpretation of some angioarchitectural features in cases with a more severe presentation, thereby introducing bias.
Since a worse prognosis for the first occurrence of hemorrhage seems to be conferred by the identification of aneurysms in conjunction with unruptured AVMs1 (especially those in the nidus, not thought to be pseudoaneurysms14), efforts should be made to understand why interobserver agreement about the very presence of aneurysms was only κ=0.40 (95% CI, 0.11 to 0.68). This is likely to be only partly explained by neuroradiologists in this study reviewing hard copies of the angiograms, rather than performing them, and superselective studies not being available. It will be important to further evaluate agreement about the presence of aneurysms (distinct from infundibula), their number, their locations, and whether they should be treated.
In this study individual observers’ thresholds for declaring feeding or draining vessels multiple seemed to differ. Clearly, the ability to correctly define feeding vessel anatomy is to some extent dependent on the use of superselective angiography—neuroradiologists do occasionally discover unsuspected feeders only at the time of embolization. Therefore, the only way for us to interpret agreement about feeding and draining vessel anatomy was to group the raw, continuous data about numbers of vessels into 3 simple ordinal categories (1, 2, or ≥3 vessels), although some authors have reservations about this statistical approach.24 Although there are no clear data about the absolute number of feeding or draining vessels carrying particular prognostic importance, future studies might profit from assessing the agreement about the original vascular territories of feeding vessels, because this may determine outcome.32
Particular emphasis should be placed on developing internationally agreed-upon definitions4 for the characteristics above and for those features that had the greatest interobserver variation in this study (angiopathy, angiogenesis, collateral supply, nidus border, discernible fistula, and venous stenosis). Thereafter, using our study as a baseline measure of observer agreement before publication of the Joint Writing Group’s definitions,4 it will be important to reassess observer agreement in larger studies, among neuroradiologists from different countries, and using emerging techniques (such as MR and digital angiography, image manipulation, and reconstruction). Furthermore, research workers have a similar need for clear definitions of clinical events affecting people with AVMs. These should focus on the nature of the clinical event (eg, transient, persistent, and progressive focal neurological deficits in the absence of a seizure or hemorrhage), its timing in relation to the inception point of first presentation (eg, first occurrence, first recurrence, second recurrence), and whether it is attributable to an AVM or not (degree of certainty).
In conclusion, it is important to further develop internationally agreed-upon, generalizable definitions of AVM clinical features, angioarchitecture, and methods of nidus size measurement and to prove that they have good intraobserver and interobserver agreement. These are necessary for future efforts to understand the prognosis and best treatment of AVMs.
The members of the AVM Observer Agreement Study Group were as follows: Andy Clifton, Department of Neuroradiology, Atkinson Morley’s Hospital, London; Anil Gholkar, Neuro X ray, Newcastle General Hospital, Newcastle-upon-Tyne; Shawn Halpin, Department of Radiology, University of Wales College of Medicine, Cardiff; John Millar, Department of Radiology, Wessex Neurological Centre, Southampton General Hospital, Southampton; and Andy Molyneux, Department of Radiology, Radcliffe Infirmary, Oxford.
Rustam Al-Shahi was funded by a United Kingdom Medical Research Council clinical training fellowship. This study was supported by the Chief Scientist Office of the Scottish Executive Health Department (CZB/4/35), the Stroke Association (TSA/04/01), and a Small Project Grant from the University of Edinburgh. We are very grateful to our collaborators and those participating in the Scottish Intracranial Vascular Malformation Study (SIVMS) who made this study possible.
- Received December 10, 2001.
- Revision received February 11, 2002.
- Accepted March 11, 2002.
- Al-Shahi R, Warlow C. A systematic review of the frequency and prognosis of arteriovenous malformations of the brain in adults. Brain. 2001; 124: 1900–1926.
- Joint Writing Group of the Technology Assessment Committee American Society of Interventional and Therapeutic Neuroradiology; Joint Section on Cerebrovascular Neurosurgery a Section of the American Association of Neurological Surgeons and Congress of Neurological Surgeons; Section of Stroke and the Section of Interventional Neurology of the American Academy of Neurology. Reporting terminology for brain arteriovenous malformation clinical and radiographic features for use in clinical trials. Stroke. 2001; 32: 1430–1442.
- Pile-Spellman J, Baker KF, Liszczak TM. High-flow angiopathy: cerebral blood vessel changes in experimental chronic arteriovenous fistula. AJNR Am J Neuroradiol. 1986; 7: 811–815.
- Garcia-Monaco R, Rodesch G, Alvarez H, Iizuka Y, Hui F, Lasjaunias P. Pseudoaneurysms within ruptured intracranial arteriovenous malformations: diagnosis and early endovascular management. AJNR Am J Neuroradiol. 1993; 14: 315–321.
- Altman DG, Machin D, Bryant TN, Gardner MJ, eds. Statistics with Confidence: Confidence Intervals and Statistical Guidelines. 2nd ed. London: BMJ Books; 2000.
- Stapf C, Hofmeister C, Pile-Spellman J, Young WL, Mohr JP. The feasibility of an Internet web-based, international study on brain arteriovenous malformations (the AVM world study). Stroke. 2000; 31: 322. Abstract.
- Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ. 1992; 304: 1491–1494.
- Maclure M, Willett WC. Misinterpretation and misuse of the kappa statistic. Am J Epidemiol. 1987; 126: 161–169.
- Doppman JL. The nidus concept of spinal cord arteriovenous malformations: a surgical recommendation based upon angiographic observations. Br J Radiol. 1971; 44: 758–763.
- Forbes G, Fox AJ, Huston J III, Wiebers DO, Torner J. Interobserver variability in angiographic measurement and morphologic characterization of intracranial aneurysms: a report from the International Study of Unruptured Intracranial Aneurysms. AJNR Am J Neuroradiol. 1996; 17: 1407–1415.
- Elisevich K, Cunningham IA, Assis L. Size estimation and magnification error in radiographic imaging: implications for classification of arteriovenous malformations. AJNR Am J Neuroradiol. 1995; 16: 531–538.
- Stapf C, Mohr JP, Sciacca RR, Hartmann A, Aagaard BD, Pile-Spellman J, Mast H. Incident hemorrhage risk of brain arteriovenous malformations located in the arterial borderzones. Stroke. 2000; 31: 2365–2368.
- Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical Epidemiology. A Basic Science for Clinical Medicine. 2nd ed. Boston, Mass: Little, Brown and Company; 1991.
Reliability of Angiographic Assessment of Brain Arteriovenous Malformations
Management of brain arteriovenous malformations (AVM) continues to evolve as evidence-based strategies strengthen clinical practice. With careful analyses of observational data derived from tertiary referral centers, information is emerging to create guidelines for prognostication and treatment of brain AVM. A 2001 American Heart Association scientific statement on brain AVM management highlights the need for further studies, not only to address end points such as subsequent hemorrhage and functional status but also to improve risk prediction models and validate them in different populations.1
In the attempt to meet these needs, we must also be mindful of the limitations of currently available brain AVM data. Most of the data have been derived from referral populations. There are currently scant data on brain AVM patients from well-defined populations.2,3 Furthermore, the quality of the measures we use to describe brain AVM is unknown. In our efforts to create prediction rules, it is imperative that we first establish the integrity of those variables involved, so that we may produce optimal rules. It is in this setting that the importance of properly evaluating the quality of brain AVM data becomes evident. Once we know that those brain AVM variables can be measured reliably, we may then begin to assess their clinical value by examining whether they are associated with conditions or outcomes of interest.
Catheter angiography provides data on morphology and hemodynamics, which are likely to be important for evaluating natural history and are critical in defining treatment options. Little is known about the reliability of brain AVM data derived from angiography. Aside from a preliminary report,4 Al-Shahi and colleagues have conducted the only comprehensive study on the reliability of angiographic data in brain AVM patients. Reliability indicates the consistency of repeated measurements. It encompasses agreement across different reviewers and reproducibility over time. This study evaluated interobserver agreement among 5 interventional radiologist reviewers, as well as intraobserver agreement within each individual reviewer from repeated reviews over a 3-month period, using 40 anonymized angiograms. This study is noteworthy for its methodology. The distribution of cases was balanced for brain AVM size and complexity. Standardized forms were used in data collection, although standardized definitions were not provided. The authors used sophisticated measures to assess agreement in the analyses.
Interobserver agreement (κ) ranged from 0.14 to 0.62 and was generally worse for characteristics associated with more complex angioarchitecture of the brain AVM. Of notable concern was low interobserver agreement involving the measurement of Spetzler-Martin grade (weighted κ=0.47) and the presence of aneurysms (κ=0.40), 2 variables that arguably bear the greatest importance in patient management decisions. However, it would be premature at this time to consider these estimates as evidence of the low reliability inherent in the measurement of these variables. Some modifications in study methodology may allow for a better measure of data reliability.
In future studies, the use of real-time high-quality angiographic digital images or animation will eventually supplant printed static images for communicating angiographic information. A reliability estimate obtained using high-quality angiographic media may be of apparently limited generalizability because it may not reflect the quality of angiograms seen in routine practice. However, this approach is nonetheless necessary to determine the extent to which reliability can be improved by increased angiographic quality and whether this improvement is significant enough to be of clinical importance. In addition, the use of standardized definitions and scales5,6 and stratification by angioarchitectural complexity would improve reliability.
Al-Shahi and colleagues should be commended for their efforts. The article underscores the importance of measurement issues in brain AVM studies. Once it is determined that brain AVM variables can be measured with a high degree of reliability, associations with clinically significant variables should be more obvious, and prediction models and guidelines should improve. This information will ultimately help to construct ethical and efficient clinical trials to compare treatment modalities.
The AVM Observer Agreement Study Group participants are listed in the Appendix.
- Ogilvy CS, Stieg PE, Awad I, Brown RD, Jr, Kondziolka D, Rosenwasser R, Young WL, Hademenos G. Recommendations for the management of intracranial arteriovenous malformations: a statement for healthcare professionals from a special writing group of the Stroke Council, American Stroke Association. Circulation. 2001; 103: 2644–2657.
- Brown RD Jr, Wiebers DO, Torner JC, O’Fallon WM. Incidence and prevalence of intracranial vascular malformations in Olmsted County, Minnesota, 1965 to 1992. Neurology. 1996; 46: 949–952.
- Stapf C, Hofmeister C, Mast H, Pile-Spellman J, Young WL, Mohr JP. The feasibility of an internet web-based, international study on brain arteriovenous malformations (The AVM World Study). Stroke. 2000; 31: 322. Abstract.
- Joint Writing Group of the Technology Assessment Committee, American Society of Interventional and Therapeutic Neuroradiology; Joint Section on Cerebrovascular Neurosurgery, a section of American Association of Neurological Surgeons and Congress of Neurological Surgeons; and Section of Stroke and the Section of Interventional Neurology of the American Academy of Neurology. Reporting terminology for brain arteriovenous malformation: clinical and radiographic features for use in clinical trials. Stroke. 2001; 32: 1430–1442.
- Imbesi SG, Knox K, Kerber CW. Reproducibility analysis of a new objective method for measuring arteriovenous malformation nidus size at angiography. AJNR Am J Neuroradiol. 2002; 23: 412–415.