- Research article
Psychometric properties of the Zephyr bioharness device: a systematic review
BMC Sports Science, Medicine and Rehabilitation volume 10, Article number: 6 (2018)
Technological development and improvements in Wearable Physiological Monitoring devices have facilitated the wireless and continuous field-based monitoring and capture of physiologic measures in healthy, clinical or athletic populations. These devices have many applications for prevention and rehabilitation of musculoskeletal disorders, provided that the data collected are reliable and valid. The purpose of this study was to appraise the quality of, and synthesize findings from, published studies on the psychometric properties of heart rate measurements taken with the Zephyr Bioharness device.
We searched the Embase, Medline, PsycInfo, PubMed and Google Scholar databases to identify articles. Articles were appraised for quality using a structured clinical measurement specific appraisal tool. Two raters evaluated the quality and conducted data extraction. We extracted data on reliability (intra-class correlation coefficients and standard error of measurement) and validity (Pearson’s/Spearman’s correlation coefficients), along with mean differences. Agreement parameters were summarised by the average biases and 95% limits of agreement.
A total of ten studies were included; quality ratings ranged from 54 to 92%. The reported intra-class correlation coefficients ranged from 0.85 to 0.98. Construct validity coefficients ranged from 0.74 to 0.99 against gold standard calibrations and from 0.67 to 0.98 against other commercially used devices. Zephyr Bioharness agreement error ranged from −4.81 (under-estimation) to 3.00 (over-estimation) beats per minute, with varying 95% limits of agreement, when compared with gold standard measures.
Good to excellent quality evidence from ten studies suggested that the Zephyr Bioharness device can provide reliable and valid measurements of heart rate across multiple contexts, and that it displayed good agreements vs. gold standard comparators – supporting criterion validity.
Technological development and improvements in Wearable Physiological Monitoring (WPM) devices have facilitated the wireless, long-range and continuous field-based monitoring and capture of physiologic measures in healthy, clinical or athletic populations [1,2,3]. Numerous WPM devices have been introduced to the market [4, 5] with a range of capabilities and target audiences.
The Zephyr Bioharness™ (Zephyr Technology Corporation, Annapolis, MD, US) is a wireless chest-based wearable device capable of real-time and long-distance recording of various physiological parameters, including heart rate, respiratory rate, core temperature, activity levels and posture. The device can capture data for 26 h, includes a BioModule, weighs 85 g and fits on the chest at the lower sternum for both men and women. The BioModule is snapped into an adjustable belt. The belt (chest strap) contains skin conductive electrodes that capture heart rate by recording cardiac electrical impulses, and produces an output in beats per minute. Heart rate monitoring offers several advantages. Calculating the percentage of maximum heart rate is a commonly used approach to monitoring exercise intensity. Among athletes (soccer players), heart rate monitoring during submaximal exercise has been shown to be highly predictive of improvements in physical performance (i.e. maximal aerobic speed). In addition, during steady-state exercise, the linear relationship between heart rate and the rate of oxygen consumption has been shown to be an effective means of assessing internal training load. This linear relationship can also be used to estimate maximal oxygen uptake (VO2max). Furthermore, in both trained individuals and athletes, monitoring of heart rate recovery has been suggested as a potential marker of training status, which is, in turn, used to optimize training programs. It has also been proposed that heart rate measures can provide an estimate of energy expenditure, offering an easier and less expensive alternative.
It is important for a device to be reliable (provide consistent scores in stable conditions), valid (provide true scores) and responsive (able to detect change over time) if it is to be used to assess or support performance or decision-making [13, 14]. Reliability is measured in both relative and absolute terms. Relative reliability, expressed as a correlation coefficient, reflects the ability of a device to differentiate between participants, whereas absolute reliability expresses the measurement error in the same unit as the original measurement. For a device to be useful (reliable), its relative reliability needs to be sufficiently large and its absolute reliability (measurement error) sufficiently small. A device can be reliable but not valid. Validity can be assessed in a variety of ways, but ideally is established by comparing a device to an established “gold standard” criterion measure, with criterion validity established when a new device provides the same measurement as the standard. In addition, neither the reliability nor the validity of a device indicates whether it can detect change over time (improvements or deteriorations). That ability is addressed by the responsiveness parameters of a device [13, 14].
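The link between relative and absolute reliability can be illustrated with the conventional relationship SEM = SD × √(1 − ICC). The sketch below is only an illustration: the formula is the commonly used one rather than one stated by the included studies, and the function name and example values are hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """Return the SEM in the unit of the original measurement.

    Assumes the conventional formula SEM = SD * sqrt(1 - ICC) and an
    ICC between 0 and 1 (an assumption of this sketch).
    """
    if sd < 0:
        raise ValueError("SD must be non-negative")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


# Hypothetical values: between-participant SD of 10 beats per minute, ICC of 0.91
sem_bpm = standard_error_of_measurement(10.0, 0.91)
print(f"SEM = {sem_bpm:.2f} beats per minute")
```

For a fixed SD, a higher ICC gives a smaller SEM, which is why a large relative reliability and a small absolute measurement error tend to go together.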
Individual measurement studies often address some domains of measurement, but do not provide comprehensive assessments [16,17,18]. Systematic reviews of measurement studies allow for one to understand the measurement properties across a variety of contexts, populations and measurement purposes. By using a structured clinical measurement specific appraisal tool, we are able to focus on higher quality research when synthesizing measurement research [16,17,18].
Considering the widespread application of heart rate and the need to synthesize the accumulating evidence on the measurement properties of the Zephyr Bioharness device, the aims of this systematic review were to synthesize and critically appraise the measurement studies in which a Zephyr Bioharness device was used to measure heart rate.
To identify articles on the psychometric properties of the Zephyr Bioharness device, we searched the Embase, Medline, PsycInfo, PubMed and Google Scholar databases between January 2010 and January 2017, using the following keywords: (Zephyr Bioharness OR ZB) AND (heart rate OR psychometric properties OR measurement properties OR reliability OR minimal detectable change OR validity OR responsiveness OR minimal clinical important difference OR agreement). Further articles were identified by examining the reference list of each selected study. We were specifically interested in the Zephyr Bioharness device, which was introduced to the market in 2010, so we started our search at that year because we did not expect any publications before then.
Selection of studies
At the first stage, two authors independently identified and screened titles and abstracts. Studies that had used the device only to monitor physiological measures, without reporting psychometric properties, were considered irrelevant. An article was accepted if it met the following inclusion criteria:
The stated purpose of the study was to assess the reliability, validity, responsiveness or agreement parameters of the Zephyr Bioharness heart rate variable in a healthy or clinical population.
The article was published in English.
An article was excluded for either of the following reasons:
No data on the psychometric properties of the Zephyr Bioharness heart rate variable.
The study used the Zephyr Bioharness device to monitor physiological responses only.
The primary author (G. N.) and secondary author (P. B.) conducted the data extraction. For reliability measures, the Standard Error of Measurement (SEM), intra-class correlation coefficient (ICC), mean differences and confidence intervals were extracted [16,17,18]. These were interpreted using a common benchmark in which ICC < 0.40 indicates poor, 0.40 ≤ ICC < 0.75 indicates fair to good and ICC ≥ 0.75 indicates excellent reliability. For construct validity, where the device was compared against a reference standard, Pearson’s/Spearman’s correlation coefficients and mean difference data were extracted [16,17,18]. The strength of the absolute value of each correlation was interpreted using the guide suggested by Evans, as follows: 0.00–0.19 “very weak”, 0.20–0.39 “weak”, 0.40–0.59 “moderate”, 0.60–0.79 “strong”, 0.80–1.00 “very strong”. To assess levels of agreement, the agreement bias along with the 95% Limits of Agreement (LoA) were extracted. This evaluates whether there is a discrepancy (bias) between two different devices measuring the same construct.
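The interpretation benchmarks above can be written down as two small lookup functions. This is a minimal sketch: the function names are hypothetical, and the correlation bands are treated as half-open intervals (for example, 0.195 falls in "very weak"), which is an assumption made here to cover values between the published band edges.

```python
def classify_icc(icc: float) -> str:
    """Label an ICC using the benchmark: <0.40 poor, 0.40-<0.75 fair to good, >=0.75 excellent."""
    if icc < 0.40:
        return "poor"
    if icc < 0.75:
        return "fair to good"
    return "excellent"


def classify_correlation(r: float) -> str:
    """Label the absolute value of a correlation coefficient using the Evans bands."""
    strength = abs(r)
    if strength > 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if strength < 0.20:
        return "very weak"
    if strength < 0.40:
        return "weak"
    if strength < 0.60:
        return "moderate"
    if strength < 0.80:
        return "strong"
    return "very strong"


# Coefficients of the size reported in this review
print(classify_icc(0.85))          # excellent
print(classify_correlation(0.67))  # strong
print(classify_correlation(0.99))  # very strong
```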
The articles were appraised for quality by the first (G. N.) and second (P. B.) authors using a structured clinical measurement specific appraisal tool [16,17,18]. This quality tool has previously demonstrated high reliability in evaluating the quality of clinical measurement studies for musculoskeletal outcome measures. The evaluation criteria included: 1) Thorough literature review to define the research question; 2) Specific inclusion/exclusion criteria; 3) Specific hypotheses; 4) Appropriate scope of psychometric properties; 5) Sample size; 6) Follow-up; 7) The authors referenced specific procedures for administration, scoring, and interpretation of procedures; 8) Measurement techniques were standardized; 9) Data were presented for each hypothesis; 10) Appropriate statistics-point estimates; 11) Appropriate statistical error estimates; and 12) Valid conclusions and recommendations [16,17,18] (Additional file 1). An article’s total quality score was calculated by summing the scores for each item, dividing by the number of items and multiplying by 100% [16,17,18]. Quality scores of 0%–30% were rated Poor, 31%–50% Fair, 51%–70% Good, 71%–90% Very Good, and > 90% Excellent [16, 17]. When individual appraisals varied, we used the following consensus procedures:
We first identified the variations in individual appraisals.
To resolve scoring discrepancies based on factual content, the original articles were reassessed.
To resolve scoring discrepancies based on extent of compliance with the item, the raters (G.N.) and (P.B.) consulted the third author/instrument developer (J.M.).
The first and the second authors further discussed their understanding of how well articles complied with each item of the appraisal tool. The most common source of score discrepancies was oversight.
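The quality score calculation and rating bands described above can be sketched as follows. The per-item maximum of 2 points (`max_item_score`) is an assumption introduced here so that the result stays within 0–100%; the text only states that item scores are summed, divided by the number of items and multiplied by 100%. The function names are hypothetical, and scores falling between the published band edges (for example 30.5%) are assigned to the higher band.

```python
from typing import Sequence


def quality_score(item_scores: Sequence[float], max_item_score: float = 2.0) -> float:
    """Return the total quality score as a percentage of the maximum attainable score.

    max_item_score is an assumed per-item maximum, not a value given in the text.
    """
    if not item_scores:
        raise ValueError("at least one item score is required")
    return 100.0 * sum(item_scores) / (len(item_scores) * max_item_score)


def quality_category(percent: float) -> str:
    """Map a percentage quality score to the rating bands used in this review."""
    if percent <= 30:
        return "Poor"
    if percent <= 50:
        return "Fair"
    if percent <= 70:
        return "Good"
    if percent <= 90:
        return "Very Good"
    return "Excellent"


# Hypothetical appraisal: 12 items, 11 scored 2 and one scored 0
scores = [2] * 11 + [0]
percent = quality_score(scores)
print(f"{percent:.0f}% -> {quality_category(percent)}")
```

The lowest and highest ratings in this review (54% and 92%) fall in the Good and Excellent bands under this mapping.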
A total of 147 studies were identified from the database search [Embase (n = 29), Medline (n = 19), PsycInfo (n = 1), PubMed (n = 58) and Google Scholar (n = 40)], of which 61 studies were considered relevant. All 61 studies were retrieved and assessed for eligibility, and a total of 10 studies were included in this review (Fig. 1). Table 1 displays the summary of the studies addressing the psychometrics of the Zephyr Bioharness device. The quality of the studies ranged from 54 to 92%, with 80% of articles reaching or exceeding a score of 67% on the quality rating (Fig. 1 & Table 2). The most common flaws noted were 1) lack of specific hypotheses, 2) not considering an appropriate scope of psychometric properties or lack of specific inclusion or exclusion criteria, and 3) lack of a sample size calculation or justification.
Zephyr bioharness heart rate reliability
We located four studies that examined the test-retest reliability of the Zephyr Bioharness [Table 3] during different physical activities, including rest and recovery phases, unstructured mobility (vacuuming and sweeping), and structured running/walking, cycling and submaximal activity [1, 2, 22, 23]. The populations studied included young, healthy, recreationally active males and females across various age groups, as well as older patients with atrial fibrillation [1, 2, 22, 23].
Overall, the ZB heart rate variable displayed excellent reliability properties. This included a SEM ranging from 2.11 to 5.90 beats per minute and excellent test-retest reliability coefficients of ≥0.85 [1, 2, 22, 23].
Zephyr bioharness heart rate validity
We identified two studies that assessed the validity of the ZB heart rate variable against another commercially used device (Polar T31) [1, 24], and six studies that assessed validity against the gold standard criterion measure (ECG) [Table 4] [22, 25,26,27,28,29]. Validity measures were established at rest, during physical activity, and during recovery phases, in both healthy, recreationally active males and females and older patients with atrial fibrillation [1, 22, 24,25,26,27,28,29].
In summary, the ZB displayed strong to very strong correlations of ≥0.67 during physical activity phases when compared with the Polar T31 device [1, 24]. In addition, when compared with the gold standard criterion measure (ECG), the device demonstrated very strong correlations of ≥0.87 at rest [25,26,27], strong to very strong correlations of ≥0.74 during various activities [22, 25,26,27,28,29] and very strong correlations of ≥0.99 throughout recovery.
Zephyr bioharness heart rate agreement
We identified two studies that assessed the pair-wise agreement between the ZB heart rate measure and the Polar T31 device [1, 24], and six studies that assessed the pair-wise agreement between the ZB heart rate measure and the gold standard criterion measure (ECG) [Table 5] [22, 25,26,27,28,29].
Three studies reported heart rate biases of ≤3.00 beats per minute, with 95% limits of agreement of −3.10 to 2.42, in pair-wise comparisons of the ZB against ECG at rest, during recovery phases or during various activities [27,28,29]. Furthermore, the inter-device agreement between ZB and Polar T31 heart rate measures yielded agreement biases of ≤3.05, with 95% limits of agreement of −79.20 to 79.20, during treadmill walk/run testing protocols [1, 24].
Overall, the ZB heart rate variable displayed better agreement (i.e. narrower limits of agreement) with ECG than with the Polar T31 device, supporting criterion validity and suggesting possible interchangeable use [1, 22, 24,25,26,27,28,29].
After synthesizing ten studies addressing the measurement properties of the Zephyr Bioharness device, we conclude that there is good to excellent quality evidence supporting the reliability and validity of this device. This review suggests that the Zephyr Bioharness device can provide reliable and valid measurements of heart rate across multiple contexts, and that it might be useful for prevention or rehabilitation applications where field-based monitoring of heart rate is required in low risk patient populations. The use of the devices in high-risk populations was not studied.
With regard to ZB reliability parameters, four studies were identified [1, 2, 22, 23]. The included studies reported sufficiently large relative reliability scores and sufficiently small absolute reliability measures. All four studies reported excellent ICCs of ≥ 0.85 (SEM ≤ 5.90) during various physical activities, and excellent ICCs of ≥ 0.90 (SEM ≤ 3.51) at rest and during recovery phases [1, 2, 22, 23].
Validity coefficients quantify the linear relationship between two measures or devices. However, the coefficients do not provide information regarding the extent of systematic error (lack of agreement) between two devices. Since it is very rare to obtain two identical findings when assessing the same construct using two different devices, reporting the magnitude of the agreement is necessary. Reporting individual agreement in terms of 95% limits of agreement (LoA), as put forward by Bland and Altman, is important for assessing agreement parameters and whether the devices can be used interchangeably. In this review, the validity of the ZB heart rate variable against the Polar T31 (ZB vs. Polar T31) and against the gold standard criterion measure (ZB vs. ECG) yielded similar, strong to very strong correlation coefficients. However, the pair-wise agreement parameters for ZB vs. Polar T31 (two studies) and ZB vs. ECG (six studies) varied. The two Johnstone et al. studies were both rated “Very good”. However, they did not establish ZB agreement parameters against a gold standard criterion (ECG) and instead compared the ZB against the Polar T31, nor did they provide any literature on the measurement properties of the Polar T31 [1, 24]. Both studies reported wide 95% LoA. The lack of comparison against a gold standard criterion measure, and the paucity of reports on the measurement properties of the Polar T31, could have contributed to such wide 95% LoA. With regard to ZB vs. ECG comparisons, the Flanagan et al. study, rated “Very good”, reported 95% LoA of −2.84 to 2.42. Similarly, the Dolezal et al. and Smith et al. studies, rated “Good”, reported 95% LoA of −0.21 to 0.14 and −3.01 to 0.70, respectively, between ZB and ECG. It is important to note that there are no thresholds to help categorize 95% LoA as excellent or poor; however, narrower 95% LoA between ZB and ECG suggest better agreement and possible interchangeable use. In contrast, three studies (Kim et al., Rawstorn et al. and Gatti et al.) reported somewhat wider 95% LoA in pair-wise comparisons between ZB and ECG. However, these studies had lower methodological scores [22, 25, 26]. Therefore, studies with higher methodological quality scores that assessed ZB vs. ECG agreement displayed narrower 95% LoA than studies with lower methodological scores.
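The agreement bias and 95% limits of agreement discussed above follow the Bland and Altman approach: the bias is the mean of the paired differences, and the limits are the bias ± 1.96 standard deviations of those differences. The sketch below assumes paired heart rate readings in beats per minute; the function name and the sample readings are hypothetical and are not data from any included study.

```python
import statistics
from typing import Sequence, Tuple


def bland_altman(device: Sequence[float], reference: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower LoA, upper LoA) for paired measurements.

    Differences are computed as device minus reference, so a negative bias
    means the device under-estimates relative to the reference.
    """
    if len(device) != len(reference):
        raise ValueError("both series must have the same number of readings")
    if len(device) < 2:
        raise ValueError("at least two paired readings are required")
    differences = [d - r for d, r in zip(device, reference)]
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired readings (beats per minute)
zb_hr = [60, 72, 85, 101, 120]
ecg_hr = [61, 71, 86, 100, 122]
bias, lower, upper = bland_altman(zb_hr, ecg_hr)
print(f"bias = {bias:.2f}, 95% LoA = ({lower:.2f}, {upper:.2f})")
```

Two devices can correlate very strongly and still show wide limits of agreement, which is why the review reports agreement separately from the validity coefficients.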
Potential benefits of wearable technologies might include enhanced safety, better targeting of exercise to capability, and better motivation and adherence. They might also allow for better progression of exercise interventions. While future studies might need to focus on the validity and utility of these devices in health promotion, monitoring or rehabilitation, the measurement studies to date are supportive of testing such applications.
The findings of our review must be considered in light of potential methodological limitations. A variety of critical appraisal tools are available, and the classification of quality varies across instruments. The Zephyr BioHarness measures a variety of physiological indicators other than heart rate, and we did not assess the reliability or validity of these other measurements. Finally, better measurement is the first step in the clinical process, and the downstream effects of using the Zephyr device need to be more fully investigated.
Good to excellent quality evidence from ten studies suggested that the Zephyr Bioharness device can provide reliable and valid measurements of heart rate across multiple contexts, and that it displayed good agreements with gold standard measurements.
CID: Clinically important difference
ICC: Intra-class correlation coefficient
LoA: Limits of agreement
MCID: Minimal clinically important difference
MID: Minimal important difference
r: Pearson’s correlation coefficients
rs: Spearman’s correlation coefficients
SEM: Standard error of measurement
WPM: Wearable physiological monitoring
1. Johnstone JA, Ford PA, Hughes G, Watson T, ACS M, Garrett AT. Field based reliability and validity of the bioharness™ multivariable monitoring device. J Sports Sci Med. 2012;11:643–52.
2. Johnstone JA, Ford PA, Hughes G, Watson T, Garrett AT. Bioharness™ multivariable monitoring device: part II: reliability. J Sports Sci Med. 2012b;11(3):409–17.
3. Bianchi W, Freyer-Dugas A, Hsieh YH, Saheed M, Hill P, Lindauer C, Terzis A, Rothman RE. Revitalizing a vital sign: improving detection of Tachypnea at primary triage. Ann Emerg Med. 2013;61(1):37–43.
4. Yang C, Hsu Y. A review of Accelerometry-based wearable motion detectors for physical activity monitoring. Sensors. 2010;10:7772–88.
5. Collier R, Randolph AB. Wearable Technologies for Healthcare Innovation. Hilton Head Island: Proceedings of the Southern Association for Information Systems Conference; 2015.
6. Zephyr Technology. BioHarness 3.0 User Manual. 2012. https://www.zephyranywhere.com/resources/omnisense-software. Accessed Feb 2018.
7. Borresen J, Lambert MI. Quantifying training load: a comparison of subjective and objective methods. Int J Sports Physiol Perform. 2008;3:16–30.
8. Buchheit M, Simpson MB, Al Haddad H, Bourdon PC, Mendez-Villanueva A. Monitoring changes in physical performance with heart rate measures in young soccer players. Eur J Appl Physiol. 2012;112:711–23.
9. Hopkins WG. Quantification of training in competitive sports. Methods and applications. Sports Med. 1991;12:161–83.
10. Astrand PO, Ryhming I. Nomogram for calculations of aerobic capacity from pulse rate during submaximal work. J Appl Physiol. 1954;7:218.
11. Daanen HAM, Lamberts RP, Kallen VL, Jin A, Van Meeteren NLU. A systematic review on heart-rate recovery to monitor changes in training status in athletes. Int J Sports Physiol Perform. 2012;7:251–60.
12. Brage S, Westgate K, Franks PW, Stegle O, Wright A, Ekelund U, Wareham NJ. Estimation of free-living energy expenditure by heart rate and movement sensing: a doubly-Labelled water study. PLoS One. 2015;10(9):e0137206.
13. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. 2nd ed. New York: Oxford University Press; 1995.
14. De Bruin AF, Diederiks JPM, De Witte LP, Stevens FCJ, Philipsen H. Assessing the responsiveness of a functional status measure: the sickness impact profile versus the SIP68. J Clin Epidemiol. 1997;50(5):529–40.
15. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;327(8476):307–10.
16. MacDermid JC. Critical appraisal of study quality for psychometric articles, evaluation form. In: Law M, MacDermid JC, editors. Evidence-based rehabilitation. Thorofare: Slack Inc; 2008. p. 387–8.
17. MacDermid JC. Critical appraisal of study quality for psychometric articles, interpretation guide. In: Law M, MacDermid JC, editors. Evidence-based rehabilitation. Thorofare: Slack Inc; 2008. p. 389–92.
18. Roy JS, Desmeules F, MacDermid JC. Psychometric properties of presenteeism scales for musculoskeletal disorders: a systematic review. J Rehabil Med. 2011;43(1):23–31(9).
19. Rosner B. Fundamentals of biostatistics. 6th ed. Boston: Duxbury Press; 2005.
20. Evans JD. Straightforward statistics for the behavioral sciences. Pacific Grove: Brooks/Cole Publishing; 1996.
21. Bunce C. Correlation, agreement, and bland–Altman analysis: statistical analysis of method comparison studies. Am J Ophthalmol. 2009;148(1):4–6.
22. Rawstorn JC, Gant N, Warren I, Doughty RN, Lever N, Poppe KK, Maddison R. Measurement and data transmission: validity of a multi-biosensor system for real-time remote exercise monitoring among cardiac patients. JMIR Rehabil Assist Technol. 2015;2(1):e2.
23. Nazari G, MacDermid JC, Kathryn SE, Richardson J, Tang A. Reliability of Zephyr bioharness and Fitbit charge measures of heart rate and activity at rest, during the modified Canadian aerobic fitness test and recovery. J Strength Cond Res. 2017; https://doi.org/10.1519/JSC.0000000000001842.
24. Johnstone JA, Ford PA, Hughes G, Watson T, Garrett AT. Bioharness™ multivariable monitoring device: part I: validity. J Sports Sci Med. 2012a;11(3):400–8.
25. Kim JH, Roberge R, Powell JB, Shafer AB, Williams WJ. Measurement accuracy of heart rate and respiratory rate during graded exercise and sustained exercise in the heat using the Zephyr bioharness™. Int J Sports Med. 2013;34:497–501.
26. Gatti UC, Schneider S, Migliaccio GC. Physiological condition monitoring of construction workers. Autom Constr. 2014;44:227–33.
27. Flanagan SD, Comstock BA, Dupont WH, Sterczala AR, Looney DP, Dombrowski DH, et al. Concurrent validity of the armour39 heart rate monitor strap. J Strength Cond Res. 2014;28(3):870–3.
28. Dolezal BA, Boland DM, Carney J, Abrazado M, Smith DL, Cooper CB. Validation of heart rate derived from a physiological status monitor-embedded compression shirt against criterion ECG. J Occup Environ Hyg. 2014;11:833–9.
29. Smith DL, Haller JM, Dolezal BA, Cooper CB, Fehling PC. Evaluation of a wearable physiological status monitor during simulated fire fighting activities. J Occup Environ Hyg. 2014;11:427–33.
This work was supported by an operating grant from the Ontario Ministry of Labour - Grant Number #13-R-027. The funding body had no role in design, in the collection, analysis, and interpretation of data, in the writing of the manuscript, or in the decision to submit the manuscript for publication.
No funding was obtained for this study.
Availability of data and materials
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Description of data: Quality Appraisal of a Clinical Measurement Study Tool and Interpretation Guide. (DOCX 38 kb)