Psychometric properties of the Zephyr bioharness device: a systematic review

Background Technological development and improvements in Wearable Physiological Monitoring devices have facilitated the wireless and continuous field-based monitoring/capturing of physiologic measures in healthy, clinical or athletic populations. These devices have many applications for prevention and rehabilitation of musculoskeletal disorders, assuming reliable and valid data are collected. The purpose of this study was to appraise the quality and synthesize findings from published studies on psychometric properties of heart rate measurements taken with the Zephyr Bioharness device. Methods We searched the Embase, Medline, PsycInfo, PubMed and Google Scholar databases to identify articles. Articles were appraised for quality using a structured clinical measurement specific appraisal tool. Two raters evaluated the quality and conducted data extraction. We extracted data on the reliability (intra-class correlation coefficients and standard error of measurement) and validity measures (Pearson/Spearman’s correlation coefficients) along with mean differences. Agreement parameters were summarised by the average biases and 95% limits of agreement. Results A total of ten studies were included; quality ratings ranged from 54 to 92%. The intra-class correlation coefficients reported ranged from 0.85–0.98. The construct validity coefficients, compared against gold standard calibrations or other commercially used devices, ranged from 0.74–0.99 and 0.67–0.98 respectively. Zephyr Bioharness agreement error ranged from −4.81 (under-estimation) to 3.00 (over-estimation) beats per minute, with varying 95% limits of agreement, when compared with gold standard measures. Conclusion Good to excellent quality evidence from ten studies suggested that the Zephyr Bioharness device can provide reliable and valid measurements of heart rate across multiple contexts, and that it displayed good agreement with gold standard comparators, supporting criterion validity.
Electronic supplementary material The online version of this article (10.1186/s13102-018-0094-4) contains supplementary material, which is available to authorized users.


Background
Technological development and improvements in Wearable Physiological Monitoring (WPM) devices have facilitated the wireless, long range and continuous field-based monitoring/capturing of physiologic measures in healthy, clinical or athletic populations [1][2][3]. Numerous WPM devices have been introduced to the market [4,5] with a range of capabilities and target audiences.
The Zephyr Bioharness™ (Zephyr Technology Corporation, Annapolis, MD, US) is a wireless chest-based wearable device, capable of real-time and long-distance recording of various physiological parameters, including heart rate, respiratory rate, core temperature, activity levels and posture [6]. The device can capture data for 26 h, includes a BioModule, weighs 85 g and fits on the chest at the lower sternum for both men and women [6]. The BioModule is snapped into an adjustable belt. The belt (chest strap) contains skin conductive electrodes that capture heart rate by recording cardiac electrical impulses, and produces an output in beats per minute. Heart rate monitoring offers several advantages. Calculating the percentage of maximum heart rate is a commonly used approach to monitor exercise intensity [7]. Among athletes (soccer players), heart rate monitoring during submaximal exercise has been shown to be highly predictive of improvements in physical performance (i.e. maximal aerobic speed) [8]. In addition, during steady-state exercise, the linear relationship between heart rate and the rate of oxygen consumption has been shown to be an effective method to assess internal training load [9]. This linear relationship can also be used to estimate maximal oxygen uptake (VO2max) [10]. Furthermore, in both trained individuals and athletes, monitoring of heart rate recovery has been suggested as a potential marker to evaluate training status, which is, in turn, used to optimize training programs [11]. It has also been proposed that heart rate measures can be used to provide an estimate of energy expenditure, providing an easier and inexpensive alternative [12].
It is important for a device to be reliable (provide consistent scores in stable conditions), valid (provide true scores) and responsive (able to detect change over time) if it is to be used to assess/support performance or decision-making [13,14]. Reliability is measured in both relative and absolute terms. Relative reliability, expressed as a correlation coefficient, reflects the ability of a device to differentiate between participants, whereas absolute reliability expresses the measurement error in the same unit as the original measurement [13]. In order for a device to be useful (reliable), its relative reliability coefficient needs to be sufficiently large and its absolute measurement error sufficiently small [13]. A device can be reliable but not valid [13]. Validity can be assessed in a variety of ways, but ideally is established by comparing devices to an established "gold standard" criterion measure, with criterion validity established when a new device can provide the same measurement as the standard [15]. In addition, neither the reliability nor the validity measurement properties of a device can be used to detect change over time (improvements or deteriorations). The responsiveness parameters of a device describe its ability to assess change over time [13,14].
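The link between relative and absolute reliability can be illustrated with the commonly used relation SEM = SD × sqrt(1 − ICC). This relation is not stated in the text above and is offered here only as a minimal sketch; the function name and the input values are hypothetical and do not come from any included study.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """Return SEM = SD * sqrt(1 - ICC), in the units of the measurement (here bpm).

    Assumption: this textbook relation is used for illustration; the included
    studies may have derived their SEM values differently.
    """
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if icc > 1.0:
        raise ValueError("icc cannot exceed 1")
    return sd * math.sqrt(1.0 - icc)


# Hypothetical between-participant SD (bpm) and ICC, for illustration only.
sem = standard_error_of_measurement(sd=10.0, icc=0.91)
print(f"SEM = {sem:.2f} bpm")
```

With a fixed between-participant spread, a larger ICC yields a smaller SEM, which is why a large relative reliability coefficient and a small absolute error tend to appear together.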
Individual measurement studies often address some domains of measurement, but do not provide comprehensive assessments [16][17][18]. Systematic reviews of measurement studies allow for one to understand the measurement properties across a variety of contexts, populations and measurement purposes. By using a structured clinical measurement specific appraisal tool, we are able to focus on higher quality research when synthesizing measurement research [16][17][18].
Considering the widespread application of heart rate monitoring, and the need to synthesize the accumulating evidence on the measurement properties of the Zephyr Bioharness device, the aims of this systematic review were to synthesize and critically appraise the measurement studies in which a Zephyr Bioharness device was used to measure heart rate.

Search
To identify articles on the psychometric properties of the Zephyr Bioharness device, we searched the Embase, Medline, PsycInfo, PubMed and Google Scholar databases for the period January 2010 to January 2017, using the following keywords: (Zephyr Bioharness OR ZB) AND (heart rate OR psychometric properties OR measurement properties OR reliability OR minimal detectable change OR validity OR responsiveness OR minimal clinical important difference OR agreement). Further articles were also identified by examining the reference list of each selected study. We were specifically interested in the Zephyr Bioharness device, which was introduced to the market in 2010, so we limited our search to 2010 onward because we did not expect any publications prior to that year.

Selection of studies
At the first stage, two authors independently identified and screened titles and abstracts. Studies that had used the device to monitor physiological measures only, without reporting psychometric properties, were considered irrelevant. An article was accepted if it met the following eligibility criteria:
Inclusion Criteria:
1. Purpose of the study states assessing reliability or validity or responsiveness or agreement parameters of the Zephyr Bioharness heart rate variable in a healthy or clinical population.
2. Articles published in English.
Exclusion Criteria:
1. No data on the psychometric properties of the Zephyr Bioharness heart rate variable.
2. Studies that had used the Zephyr Bioharness device to monitor physiological responses only.

Data extraction
The primary author (G. N.) and secondary author (P. B.) conducted the data extractions. For reliability measures, Standard Error of Measurement (SEM), intra-class correlation coefficient (ICC), mean differences and confidence intervals were extracted [16][17][18]. These were interpreted using a common benchmark where ICC < 0.40 indicates poor, 0.40 ≤ ICC < 0.75 indicates fair to good and ICC ≥ 0.75 indicates excellent reliability [19]. For construct validity, where these devices were compared against a reference standard, Pearson's/Spearman's correlation coefficients and mean difference data were extracted [16][17][18]. The strength of the correlation was determined from its absolute value using the guide suggested by Evans [20]: 0.00-0.19 "very weak", 0.20-0.39 "weak", 0.40-0.59 "moderate", 0.60-0.79 "strong", 0.80-1.00 "very strong". To assess levels of agreement, agreement bias along with 95% Limits of Agreement (LoA) were extracted. This approach evaluates whether there is a discrepancy (bias) between two different devices measuring the same construct [21].
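The interpretation benchmarks described above can be sketched as two small lookup functions. This is an illustrative sketch only: the function names are hypothetical, and treating the Evans bands as contiguous (so that, for example, 0.195 counts as "very weak") is an assumption made here to handle values that fall between the two-decimal bands.

```python
def interpret_icc(icc: float) -> str:
    """Classify an ICC using the benchmark cited in the text [19]."""
    if icc < 0.40:
        return "poor"
    if icc < 0.75:
        return "fair to good"
    return "excellent"


def interpret_correlation(r: float) -> str:
    """Classify correlation strength from |r| using the Evans guide [20].

    Assumption: bands are treated as contiguous, with each band starting
    at its stated lower bound (0.20, 0.40, 0.60, 0.80).
    """
    strength = abs(r)
    if strength > 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if strength < 0.20:
        return "very weak"
    if strength < 0.40:
        return "weak"
    if strength < 0.60:
        return "moderate"
    if strength < 0.80:
        return "strong"
    return "very strong"


# The lowest ICC and the lowest validity coefficient reported in this review.
print(interpret_icc(0.85))
print(interpret_correlation(0.67))
```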

Quality appraisal
The articles were appraised by the first (G. N.) and second (P. B.) authors for quality using a structured clinical measurement specific appraisal tool [16][17][18]. This quality tool has previously demonstrated high reliability in evaluating the quality of clinical measurement studies for musculoskeletal outcome measures [18]. The evaluation criteria included: 1) Thorough literature review to define the research question; 2) Specific inclusion/exclusion criteria; 3) Specific hypotheses; 4) Appropriate scope of psychometric properties; 5) Sample size; 6) Follow-up; 7) The authors referenced specific procedures for administration, scoring, and interpretation of procedures; 8) Measurement techniques were standardized; 9) Data were presented for each hypothesis; 10) Appropriate statistics-point estimates; 11) Appropriate statistical error estimates; and 12) Valid conclusions and recommendations [16][17][18] (Additional file 1). An article's total quality score was calculated by summing the scores for each item, dividing by the number of items and multiplying by 100% [16][17][18]. Appraised papers with a quality score of 0%-30% were rated as Poor, 31%-50% as Fair, 51%-70% as Good, 71%-90% as Very Good, and > 90% as Excellent [16,17]. When individual appraisals varied, we used the below consensus procedures:

Results
A total of 147 studies were identified from the search in the databases [Embase (n = 29), Medline (n = 19), PsycInfo (n = 1), PubMed (n = 58) and Google Scholar (n = 40)], of which 61 studies were considered relevant. All 61 studies were retrieved and assessed for eligibility, and a total of 10 studies were included in this review (Fig. 1). Table 1 displays the summary of the studies addressing the psychometrics of the Zephyr Bioharness device. The quality of the studies ranged from 54 to 92%, with 80% of articles reaching or exceeding a score of 67% on the quality rating (Fig. 1 & Table 2). The most common flaws noted were 1) lack of specific hypotheses, 2) not considering an appropriate scope of psychometric properties/lack of specific inclusion or exclusion criteria, and 3) lack of a sample size calculation/justification.
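To relate the reported percentages to the quality bands defined in the Quality appraisal section, the mapping can be sketched as below. The function name is hypothetical, and assigning fractional scores that fall between two stated bands (for example 30.5%) to the higher band is an assumption made for illustration.

```python
def quality_band(score_percent: float) -> str:
    """Map a total quality score (0-100%) to the rating bands used in this review.

    Assumption: the stated bands (0-30, 31-50, 51-70, 71-90, >90) are treated
    as contiguous, so a score above one band's upper bound joins the next band.
    """
    if not 0.0 <= score_percent <= 100.0:
        raise ValueError("score_percent must lie between 0 and 100")
    if score_percent <= 30:
        return "Poor"
    if score_percent <= 50:
        return "Fair"
    if score_percent <= 70:
        return "Good"
    if score_percent <= 90:
        return "Very Good"
    return "Excellent"


# The lowest and highest quality scores reported for the included studies.
for score in (54, 92):
    print(score, quality_band(score))
```

Under this mapping the reported range of 54 to 92% spans the Good to Excellent bands.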

Zephyr bioharness heart rate reliability
We located four studies that examined the test-retest reliability of the Zephyr Bioharness (Table 3) during different physical activities, including rest and recovery phases, unstructured mobility (vacuuming and sweeping), and structured running/walking, cycling and submaximal activity [1,2,22,23]. The populations studied included young, healthy, recreationally active males and females across various age groups as well as older patients with atrial fibrillation [1,2,22,23]. Overall, the ZB heart rate variable displayed excellent reliability properties. This included a SEM ranging from 2.11-5.90 beats per minute and excellent test-retest reliability coefficients ≥0.85 [1,2,22,23].

Zephyr bioharness heart rate agreement
We identified two studies that assessed the pair-wise agreement between the ZB heart rate measure and the Polar T31 device [1,24], and six studies that assessed the pair-wise agreement between the ZB heart rate measure and the gold standard criterion measure (ECG) (Table 5) [22,25][26][27][28][29]. Three studies reported heart rate biases of ≤3.00 beats per minute with 95% limits of agreement of (−3.10 to 2.42) in pairwise device comparison of ZB at rest, recovery phases or during various activities against ECG [27][28][29]. Furthermore, the inter-device agreement between ZB and Polar T31 heart rate measures yielded agreement biases of ≤3.05 with 95% limits of agreement of (−79.20 to 79.20) during treadmill walk/run testing protocols [1,24].

Discussion
After synthesizing ten studies addressing the measurement properties of the Zephyr Bioharness device, we conclude that there is good to excellent quality evidence supporting the reliability and validity of this device. This review suggests that the Zephyr Bioharness device can provide reliable and valid measurements of heart rate across multiple contexts, and that it might be useful for prevention or rehabilitation applications where field-based monitoring of heart rate is required in low-risk patient populations. The use of the device in high-risk populations was not studied.
With regard to ZB reliability parameters, four studies were identified [1,2,22,23]. The included studies reported sufficiently large relative reliability scores and sufficiently small absolute reliability measures. All four identified studies reported excellent ICC ≥ 0.85 (SEM ≤ 5.90) during various physical activities, and excellent ICC ≥ 0.90 (SEM ≤ 3.51) at resting and recovery phases [1,2,22,23]. Validity coefficients quantify the linear relationship between two measures/devices [15]. However, the coefficients do not provide information regarding the extent of systematic error (lack of agreement) between two devices. Since it is very rare to obtain two identical findings while assessing the same construct using two different devices, reporting of the magnitude of the agreement is necessary [15]. Reporting of individual agreement in terms of 95% limits of agreement (LoA), put forward by Bland and Altman, is important to assess agreement parameters and whether the devices can be used interchangeably [15].
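The distinction between correlation and agreement can be made concrete with a small sketch of the Bland and Altman calculation: the bias is the mean of the paired differences, and the 95% LoA are the bias ± 1.96 standard deviations of those differences. The function names and the paired readings below are hypothetical and are not data from any included study.

```python
import math
from statistics import mean, stdev


def bland_altman(device, reference):
    """Return (bias, lower LoA, upper LoA) for paired readings.

    Differences are taken as device minus reference, so a negative bias
    means the device under-estimates relative to the reference.
    """
    if len(device) != len(reference) or len(device) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [d - r for d, r in zip(device, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def pearson_r(x, y):
    """Plain Pearson correlation coefficient, written out for transparency."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


# Hypothetical paired heart rate readings in beats per minute.
reference = [60.0, 80.0, 100.0, 120.0, 140.0]
device = [56.0, 74.0, 95.0, 116.0, 134.0]

bias, lower, upper = bland_altman(device, reference)
r = pearson_r(device, reference)
print(f"r = {r:.3f}, bias = {bias:.2f} bpm, 95% LoA = ({lower:.2f}, {upper:.2f})")
```

In this made-up example the two series are almost perfectly correlated, yet the device reads about 5 beats per minute low on average, which is the kind of systematic error that a correlation coefficient alone would not reveal.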
In this review, the validity of the ZB heart rate variable against Polar T31 (ZB vs. Polar T31) and against the gold standard criterion measure (ZB vs. ECG) yielded similar, strong to very strong correlation coefficients. However, the pairwise agreement parameters between ZB vs. Polar T31 (two studies) and ZB vs. ECG (six studies) varied. The Johnstone et al. [1] and Johnstone et al. [24] studies, were [...] [29] studies rated at "Good", reported (−0.21 to 0.14) and (−3.01 to 0.70) 95% LoA between ZB vs ECG respectively. It is important to note that there are no thresholds to help categorize 95% LoA as excellent or poor; however, narrower 95% LoA between ZB vs ECG are suggestive of better agreement and possible interchangeable use. In contrast, three studies, Kim et al. [25], Rawstorn et al. [22] and Gatti et al. [26], reported somewhat wider 95% LoA in pair-wise device comparisons between ZB vs ECG. However, these studies had lower methodological scores [22,25,26]. Therefore, studies with higher methodological quality scores that assessed ZB vs ECG agreement displayed narrower 95% LoA than studies with lower methodological scores. Potential benefits of wearable technologies might include enhanced safety, better targeting of exercise to capability, and better motivation and adherence. They might also allow for better progression of exercise interventions. While future studies might need to focus on the validity and utility of these devices in health promotion, monitoring or rehabilitation, the measurement studies to date are supportive of testing such applications.
The findings of our review must be considered in light of potential methodological limitations. A variety of critical appraisal tools are available and the classification of quality varies across instruments. The Zephyr BioHarness measures a variety of physiological indicators other than heart rate, and we did not assess the reliability or validity of these other measurements. Finally, better measurement is the first step in the clinical process, and the downstream effects of using the Zephyr device need to be more fully investigated.