A systematic review investigating measurement properties of physiological tests in rugby

Background: This systematic review had two objectives. The first was to provide an overview of the physiological characteristics commonly evaluated in rugby and the corresponding tests used to measure each construct. The second was to evaluate the measurement properties of all identified tests per physiological construct, with the ultimate purpose of identifying the tests with the strongest level of evidence per construct.
Methods: The review was conducted in two stages. In both stages, the electronic databases EBSCOhost, Medline and Scopus were searched for full-text articles. Stage 1 included studies examining physiological characteristics in rugby. Stage 2 included studies evaluating the measurement properties of all tests identified in Stage 1, either in rugby or in related sports such as Australian Rules football and Soccer. Two independent reviewers screened titles and abstracts for relevant articles in both stages.
Results: Seventy studies met the inclusion criteria for Stage 1. The studies described 63 tests assessing speed (8), agility/change of direction speed (7), upper-body muscular endurance (8), upper-body muscular power (6), upper-body muscular strength (5), anaerobic endurance (4), maximal aerobic power (4), lower-body muscular power (3), prolonged high-intensity intermittent running ability/endurance (5), lower-body muscular strength (5), repeated high-intensity exercise performance (3), repeated-sprint ability (2), repeated-effort ability (1), maximal aerobic speed (1) and abdominal endurance (1). Stage 2 identified 20 studies describing the measurement properties of 21 different tests. Only moderate evidence was found for the reliability of the 30–15 Intermittent Fitness Test. There was limited evidence for the reliability and/or validity of the 5 m, 10 m and 20 m speed tests, 505 test, modified 505 test, L run test, Sergeant Jump test and bench press repetitions-to-fatigue tests. There was no information from high-quality studies on the measurement properties of any of the other tests identified in Stage 1.
Conclusion: A number of physiological characteristics are evaluated in rugby, and each physiological construct has multiple tests for its measurement. However, there is a paucity of information from high-quality studies on the measurement properties of these tests. This raises questions about the usefulness and applicability of these tests in rugby and creates a need for high-quality future studies evaluating their measurement properties.
Trial registrations: PROSPERO CRD 42015029747.
Electronic supplementary material: The online version of this article (10.1186/s13102-017-0081-1) contains supplementary material, which is available to authorized users.


Background
Rugby (either rugby union or league) is a popular sport played professionally or otherwise at both junior and senior levels worldwide [1]. It is generally considered a physical sport characterised by multiple high-intensity activities interspersed with low-intensity activities [2–5]. The players engage in physically demanding contests such as tackles, rucks and mauls with the primary objective of gaining possession of the ball [6]. These contests require players to possess a wide range of physiological characteristics such as strength, power and endurance, which allow them to be stronger and more fatigue-resistant [7–10].
Numerous studies have provided scientific evidence on the physiological characteristics of rugby players. This work has been driven by the need to understand the physiological factors that differentiate between playing levels (talent identification) and the physiological characteristics associated with optimal performance [1,2,7,10–18]. For example, Gabbett and Seibold [15] postulated that lower-body power, upper-body strength-endurance, and prolonged high-intensity intermittent running ability discriminated players for team selection in semi-professional rugby league (RL). Smart et al. [17] found correlations between speed, repeated-sprint ability and game performance statistics such as tackle breaks and tries scored in rugby union (RU). Furthermore, Till et al. [18] compared longitudinal changes in physical qualities with career attainment status and found that advanced physical qualities such as absolute strength during adolescence contributed significantly to the attainment of professional status in rugby. All these findings suggest an important relationship between physiological characteristics and future career success, physical performance and team selection [15,17,18].
Today, physiological profiling of rugby players has become an integral aspect of the contemporary sport. It allows coaches to identify "competent" players whose enhanced physiological capacities enable them to withstand the high-intensity demands of the sport and to win trophies for team, club or country [6,7]. This forms the hallmark of talent identification programmes. Secondly, understanding the physiological qualities needed in rugby may inform the training and development of future professional players [18]. With the surge in physiological profiling and the proliferation of talent identification and development programmes for young rugby players [18], there is a need to identify and use physical tests with known measurement properties (reliability, validity and responsiveness). A scoping review of the literature showed that multiple tests are available for measuring the same physiological characteristic. For example, agility is a fundamental physiological characteristic required for optimal performance by rugby players. The construct has been evaluated using different tests such as the 'L' run, Illinois agility run test, agility 505 test, modified 505 test and change of direction speed test [6,10,16,18–22]. To understand the basis for selecting tests, it may be important to have an overview of all the tests that measure a specific physiological construct and to evaluate systematically the measurement properties of the identified tests, in order to identify the test(s) with the strongest level of evidence per construct. This information may help explain, in terms of measurement properties, why particular tests are selected to measure a specific physiological characteristic. To our knowledge, no systematic review has been conducted to provide such information.
Therefore, this systematic review was conducted with the aim of addressing the following research questions:
1. What physiological characteristics of rugby players are evaluated in the literature and which tests are used to measure each identified characteristic?
2. What is known about the measurement properties (reliability, validity and responsiveness) of each identified physiological test in the sport of rugby? If there is no information on the measurement properties of a test in rugby, is there any evidence available from other intermittent, collision team sports closely related to rugby, such as Australian Rules football, American football or Soccer? Where multiple tests measure the same construct, which test(s) has the strongest level of evidence in terms of the measurement properties?

Stage 1: Methods
This systematic review was registered on PROSPERO with the registration number CRD 42015029747 [21]. The review was organised in two stages. Stage 1 presents an overview of the physiological characteristics commonly evaluated in rugby and the corresponding tests. Stage 2 presents an overview of the measurement properties of the identified physiological tests. Each stage was written in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines by Moher et al. [23].

Literature search
A literature search was conducted using the following databases: Scopus, Medline via EBSCOhost and via PubMed, Academic Search Premier via EBSCOhost, CINAHL (Cumulative Index of Nursing and Allied Health) via EBSCOhost and Africa-Wide Information via EBSCOhost. The review included studies published in the last 20 years between January 1, 1995, and December 31, 2016. In addition, the reference lists of selected articles were hand-searched to augment the literature.

Selection criteria for the studies

Sports context
There are two major variants of rugby, namely, RU and RL. Although RU differs significantly from RL in team sizes, scoring and in certain situations of tackling and when the ball goes out, there are striking similarities in game duration, field size, player positions, and goal posts [24]. There are also similarities in the physical demands and physiological responses elicited during game play as both sports are predominantly aerobic in nature interspersed with high-intensity efforts [5,24]. The objective in both is to get the ball over the opposition's goal line by carrying, passing, kicking and grounding the ball. Therefore, because of the resemblance we included studies on RU and RL. However, studies on the sport of rugby "sevens" were excluded.

Physiological characteristics
Rugby requires a blend of physiological characteristics for players to cope with the demands of the game [1]. Included studies had to report on at least one physiological characteristic, operationally defined as a measure that assesses speed, repeated-sprint ability, prolonged high-intensity intermittent running ability, agility, muscular strength, power and endurance, or maximal aerobic capacity. In addition, to be included, studies had to report the name of the test used to measure the physiological construct and include a detailed, reproducible description of the test procedure. No restriction on study design was applied during study selection. However, editorials, book chapters, poster and oral conference abstracts, unpublished theses, dissertations, and case studies were excluded. Studies published in languages other than English were also excluded.

Participants
Since rugby is played competitively at junior and senior levels worldwide, studies included in this review had to involve male rugby participants aged 10 years and above (adolescents to adults) from any country. Studies involving rugby participants living with disabilities were excluded.

Search strategy
The search strategy was developed in consultation with an expert librarian in systematic reviews from University of Cape Town (UCT) libraries. The search strategy (see Additional file 1, designed for Medline via PubMed) consisted of a combination of the following search themes connected with the Boolean operator AND:
i. Construct-related general search terms: physical characteristics OR physiological characteristics.
ii. Construct-related specific search terms: speed OR agility OR flexibility.
iii. Target population-related search terms: adult OR adolescent OR youth.
iv. Sport-related search terms: rugby OR rugby union OR rugby league.
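
The way the four search themes combine can be illustrated with a minimal Python sketch. It assumes only the example terms quoted above; the helper name `build_query` is hypothetical, and the real strategy in Additional file 1 may use database-specific field tags or truncation that are not reproduced here.

```python
# Hypothetical helper: the term lists mirror the four search themes quoted in
# the text. Database-specific syntax (field tags, truncation) is NOT modelled.
SEARCH_THEMES = {
    "construct_general": ["physical characteristics", "physiological characteristics"],
    "construct_specific": ["speed", "agility", "flexibility"],
    "population": ["adult", "adolescent", "youth"],
    "sport": ["rugby", "rugby union", "rugby league"],
}


def build_query(themes):
    """OR the terms within each theme, then AND the themes together."""
    blocks = []
    for terms in themes.values():
        # Quote multi-word terms so they are treated as phrases.
        quoted = [f'"{term}"' if " " in term else term for term in terms]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)


query = build_query(SEARCH_THEMES)
print(query)
```

Keeping each theme as a separate list makes it straightforward to swap one theme for another, as is done for the related-sports search in Stage 2.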

Selection of articles
The selection process was conducted stepwise based on recommendations for performing systematic reviews by van Tulder et al. [25] and Reimers et al. [26]. The first author (MC) ran the search strategy across all databases. Two reviewers (JD and EB) independently reviewed the search results in two steps. The first step involved applying the inclusion criteria to select potentially relevant articles from titles. The abstracts of studies with titles considered relevant were retrieved for further inspection in the second step [26]. If the abstract fulfilled the eligibility criteria or contained insufficient information for a selection decision, both reviewers retrieved the full text to assess eligibility further [26]. Disagreements were first discussed between the two reviewers at the end of the selection process. Where disagreement persisted, a third reviewer (TM) intervened until consensus was reached. In addition, all retrieved articles were reviewed again against the inclusion criteria by the lead investigator (MC).

Data extraction
Data extraction was performed independently by two people (TM and JD). Extracted data were documented in a Microsoft Excel data extraction form. The following data were captured for the first objective: publication details of the study (first author, year of publication), the name(s) of the physiological characteristic examined in the study (captured as originally described by the authors) and the name of the corresponding test(s) used to measure the physiological characteristics, as described in the study. To enable the description of studies, additional information on sport context, age of participants, country, target population, study design and sample size was also extracted. The primary author (MC) acted as the data verifier, assessing the exhaustiveness and accuracy of the data extracted from the included articles. Discrepancies identified by the verifier were communicated to the two data extractors and disagreements were resolved by mutual consensus.

Results: Stage 1
Since Stage 1 results were used to inform the methods and selection criteria for studies in the second stage of the systematic review, the results for Stage 1 are presented here. The electronic searches revealed 23,976 studies and, after initial selection based on title and abstract, 1909 studies were potentially eligible (Fig. 1). After full-text evaluation, 70 studies were included. The majority of the studies did not meet the inclusion criteria because they did not report on physiological characteristics (Fig. 1).

Description of included studies
The general characteristics of the 70 included studies are shown in Table 1 [24,29].

Physiological characteristics and the corresponding tests
Table 2 provides an overview of physiological characteristics, corresponding tests used to measure each construct in rugby and the absolute number of studies that used a specific physiological test. This review identified 15 physiological characteristics commonly evaluated among rugby players. These include speed, repeated-sprint and effort ability, repeated high-intensity exercise performance, prolonged high-intensity intermittent running ability/endurance, anaerobic endurance, maximal aerobic power and speed, agility, lower-body muscular power and strength, upper-body muscular strength and power, upper-body muscular endurance and abdominal endurance. However, there were no studies evaluating muscle flexibility of the rugby players that met the inclusion criteria. The majority of these physiological characteristics had multiple tests for measurement. Overall, the 70 studies included in the review described 63 physiological tests: speed (8), upper-body muscular endurance (8), agility/change of direction speed (7), upper-body muscular power (6), upper-body muscular strength (5), prolonged high-intensity intermittent running ability/endurance (5), lower-body muscular strength (5), anaerobic endurance (4), maximal aerobic power (4), lower-body muscular power (3), repeated high-intensity exercise performance (3), repeated-sprint ability (2), repeated-effort ability (1), maximal aerobic speed (1) and abdominal endurance (1). Table 3 summarises the procedures for administering each physiological test identified.

Table 2 (fragment): Bench press repetitions-to-fatigue at 60% 1RM [81]; 1RM Bench press repetitions-to-fatigue at 60 kg

Table 3 (recovered rows; columns: test, procedure, outcome measure, references)

[…] Outcome: Maximal aerobic speed (MAS), anaerobic speed reserve (ASR) [53,59]

Anaerobic endurance
Triple 120 m shuttle (T120S) test. Procedure: Players perform 3 sets of 120 m shuttle sequences. Outcome: Time taken to complete the 120 m shuttle, maximum heart rate, blood lactate, rating of perceived exertion [70]
Wingate 60 (w60) cycle test. Procedure: Each player performs a 60 s all-out maximal effort on a cycle ergometer according to the Wingate protocol. Outcome: Maximal heart rate, blood lactate, rating of perceived exertion [70]
300 m shuttle run test. Procedure: Players sprint maximally between two lines, 15 times, for a total distance of 300 m. Outcome: Total time to complete the run (s) [51]
400 m sprint test (Metabolic Fitness Index for Team Sports). Procedure: Players run maximally an entire lap of the track for 400 m. Outcome: Time to complete the run (s) [42]

Agility/change of direction speed
Illinois Agility test. Procedure: Players start lying prone on the starting line. On a signal the players stand up and accelerate towards and around the cones set up. They sprint for 9 m, return to the starting line, then swerve in and out of the four cones, completing two 9 m sprints to finish the agility course. Outcome: […]

Lower body muscular strength
One repetition maximum back squat (1RM BS). Procedure: Using an Olympic bar and free weights, players back squat until the top of the thigh is parallel with the ground and return to a standing position to record one repetition maximum. Outcome: Maximum weight lifted (kg) [5,17,18,38,55,56,69,77,80]
Isometric squat on force plate. Procedure: Players stand on a force plate with the bar of a Smith Machine resting on the upper trapezius at a height which results in an angle of 135 degrees knee flexion. Outcome: Peak force generated (N) [75]
1RM box squat. Procedure: Players use a self-selected foot position, lower themselves to a sitting position briefly on the box and then return to a standing position. Outcome: One repetition maximum (kg) [13,42]
3RM full squat exercise. Procedure: Players perform this with a free-weight Olympic-style barbell. Players lower their body until the thighs are past parallel with the floor and then fully extend the hip and knee joints. Outcome: Maximum weight lifted (kg) [15]

Upper body muscular strength
One repetition maximum bench press (1RM BP). Procedure: Players lie supine with feet flat on the floor and hips and shoulders in contact with the bench, lower the bar to touch the chest and push the bar until the elbows are locked out. Outcome: Maximum weight lifted (kg) [5,7,17,27,38,42,55,56,58,69,78,80]
3RM bench press. Procedure: The test is performed as above at three repetition maximum. Outcome: Maximum weight lifted (kg) [15,60]
1RM chin up test. Procedure: Players use a reverse underhand grip (palms facing towards the face). Players are instructed to start from a stationary position with arms fully extended and complete a repetition with the chin moving over the bar. Outcome: One repetition maximum (kg) [17,42]
Push-Up test. Procedure: Players begin prone, with hands on the floor, thumbs shoulder width apart and elbows fully extended. Players are instructed to descend to the tester's fist placed on the floor below the player's sternum and then ascend until the elbows are straight. Outcome: The number of push-ups in one minute (n) [27]
1RM Prone row. Procedure: Participants lie face down on a bench with the bench height determined by the player's reach when the arms are fully extended. Participants have to pull the barbell towards the bench and […] Outcome: Maximum weight lifted (kg) [18]

[…] Outcome: Time taken to complete 20 full push ups (s) [36]
20 s chin up test. Procedure: Players assume a hanging position on the bar, hands shoulder width apart with supinated grip and arms extended. Players raise the body until the chin touches the top of the bar with the head in neutral position. Outcome: Maximum number of chin-ups in 20 s [36]
Overhead ball throw test. Procedure: Players stand with one foot aligned with a line marked on the ground, facing the throwing direction, with a 3 kg medicine ball held in both hands behind the head. Each player is required to plant the front foot with the toe behind the line and to throw the medicine ball overhead as far as possible. Outcome: Maximum distance thrown (m) [73]
Chest throw test. Procedure: Players throw a 2 kg medicine ball horizontally as far as possible while seated with the back against the wall. Outcome: Maximum distance thrown (m) [41,43–48,57,66]
Bench throw test. Procedure: Players use a self-selected hand position, lower the bar to a self-selected depth of approximately 90 degrees at the elbow and then throw or propel the bar vertically as explosively as possible. Outcome: Maximum weight thrown (kg) [13]

Upper body muscular endurance
60 s push up test. Procedure: Players assume a prone position; the body is lowered until the elbows are at 90 degrees, followed by a return to the starting position with arms fully extended. Outcome: Maximum number of push-ups in 60 s [36]
60 s chin up test. Procedure: Players assume a hanging position on the bar, hands shoulder width apart with supinated grip and arms extended. Players raise the body until the chin touches the top of the bar with the head in neutral position. Outcome: Maximum number of chin-ups in 60 s [36]
Bench press repetitions-to-fatigue (BP RTF). Procedure: Players perform as many bench press repetitions as possible until fatigue at two markedly different resistances of 60 kg and 102.5 kg. Outcome: Number of repetitions (n) [81]
Bench press repetitions-to-fatigue at 60% 1RM. Procedure: Players perform as many bench press repetitions as possible until fatigue with a resistance of 60% of their one repetition maximum bench press. Outcome: Number of repetitions at 60% 1RM BP [81]
Pull up test. Procedure: Using an underhand grip, and the hands 10–15 cm apart, players start in the hanging position and ascend to a position with the […] Outcome: Maximal number of completed pull-ups [7]
Body mass bench press with repetition. Procedure: Using the player's body mass as resistance for as many repetitions as possible until fatigue. Outcome: Number of repetitions (n) [15]
30 s Plyometric push up. […]

Repeated sprint and effort ability
There were seven (10.0%) studies that evaluated repeated-sprint abilities of rugby players. However, only two tests were commonly used in these studies to evaluate the construct. The Repeated 20 m Sprint test was used in five of the seven studies [16,29,49–51]. The test involves players performing 10 or 12 maximal effort sprints over a 20 m distance with each sprint performed on a 20 or 30 s cycle [16,29,49–51]. In addition, there were two studies that evaluated the repeated sprint abilities of rugby participants using the Rugby-Specific Repeated Speed (RS2) test [17,52]. The Repeated-Effort Ability test was used in one study to investigate the physiological characteristic of repeated-effort ability in rugby players [51]. The protocol comprises 12 × 20 m sprints and tackles, with each sprint commencing every 20 s and the tackle performed after each 20 m sprint [51].

Repeated high-intensity exercise performance
The ability to perform repeated high-intensity exercises by rugby players was assessed using specifically developed Repeated High-Intensity Exercise (RHIE) tests. Three tests were used in a study by Austin et al. [24] and were modified for RU backline players, RU forward players and RL forward players.

Maximal aerobic power and speed
Of the 70 studies, 32 ( … [53,59]. The test involves performing 30 s shuttle runs at a pace governed by a pre-recorded beep, interspersed with 15 s periods of passive recovery. The test begins at 8 km/h and the running speed increases by 0.5 km/h at each successive running shuttle [53].
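
The incremental speed protocol described above can be written as a short Python sketch. It assumes that the 0.5 km/h increment is applied once per cycle of one 30 s run plus one 15 s recovery; the function names are hypothetical and the sketch is illustrative rather than an official test specification.

```python
START_SPEED_KMH = 8.0   # starting speed stated in the text
INCREMENT_KMH = 0.5     # speed increase per successive shuttle, as stated
RUN_S = 30              # duration of each shuttle run (s)
RECOVERY_S = 15         # duration of each passive recovery (s)


def stage_speed(stage):
    """Running speed in km/h for a 1-indexed stage."""
    if stage < 1:
        raise ValueError("stage must be 1 or greater")
    return START_SPEED_KMH + INCREMENT_KMH * (stage - 1)


def elapsed_at_stage_start(stage):
    """Seconds elapsed when a stage begins (assumed: one run + one recovery per stage)."""
    if stage < 1:
        raise ValueError("stage must be 1 or greater")
    return (stage - 1) * (RUN_S + RECOVERY_S)


for stage in (1, 2, 5):
    print(stage, stage_speed(stage), elapsed_at_stage_start(stage))
```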

Anaerobic endurance
Three (4.28%) studies assessed the anaerobic endurance of rugby players. One study compared results of rugby players on two tests of anaerobic endurance: the Triple 120 m (T120S) test and the Wingate 60 (w60) cycle test [70]. Other tests, each used in a single study, included the 300 m Shuttle Run test [71] and the 400 m Sprint test [42].

Change of direction speed/agility
The change of direction speed/agility of rugby players was measured in a number of studies. It was the third most commonly measured physiological characteristic in the included studies. In total, 33 … [18,38,39,41,43–48,53,55–57,59,60,62,63,66,67,69,75,76]. The difference between the two vertical jump tests is that in the CMJ participants stand with their hands positioned on the hips and usually jump as high as possible from a jump mat [18]. The Jump Squat (JS) test was used in five studies [13,75,77–79].
Of the 70 studies, 14 (20.0%) assessed lower-body muscular strength of rugby players. The most frequently used test was the One Repetition Maximum Back Squat (1RM BS), which was used in nine of the fourteen studies [5,17,18,38,55,56,69,77,80]. Using an Olympic bar or free weights, players are instructed to back squat until the top of the thigh is parallel with the ground and return to a standing position to record 1RM [5,17,38,55,56,69,77,80]. In addition, the 1RM Box Squat [13,42] and the 3RM Back Squat [15,60] were each used in two studies.

Upper-body and abdominal muscular endurance
Of the included studies, only five (7.14%) assessed upper-body muscular endurance. One study utilised two tests: the 60 s Push-Up and Chin-Up tests [36]. Another study used the Bench Press Repetitions-to-Fatigue test at 60 kg, 102.5 kg and at 60% of 1RM [81]. Other tests, each used in a single study, included the Pull-Up test [7], the body mass Bench Press with repetition test [15] and the 30 s Plyometric push-up test [58]. Abdominal endurance was evaluated in one study and was assessed using the 60 s Sit-Up test [58].

Stage 2: Methods
Stage 1 allowed us to identify tests commonly used for the measurement of the physiological characteristics of speed, repeated-sprint and effort ability, repeated high-intensity exercise performance, prolonged high-intensity intermittent running ability/endurance, maximal aerobic power and speed, anaerobic endurance, change of direction speed/agility, lower- and upper-body muscular strength, power, and abdominal endurance. The second stage of the systematic review was conducted to provide evidence on the measurement properties of each physiological test identified in Stage 1. The ultimate aim, however, was to identify, through best evidence synthesis, one test per physiological construct with the strongest level of evidence on measurement properties.

Literature search, search strategy and eligibility criteria
The electronic databases used for the literature search in Stage 1 were also used for Stage 2. Initially, we searched specifically for full-text studies with the primary purpose of investigating the measurement properties (reliability, validity and responsiveness) of the previously identified physiological tests in male rugby participants. This was done to determine which physiological tests have been validated in the population of interest to the researcher (MC) for his future studies using rugby participants [21,82]. However, where no satisfactory information was found on the measurement properties of certain physiological tests in rugby studies, it was pre-planned that we would search for evidence from clinimetric studies on related, intermittent, collision team sports such as Australian Rules football (AFL), American football, Gaelic football and Soccer. Included studies from related sports had to describe the test procedure in a similar way to the rugby-related studies. Studies in which, according to the researcher (MC), the test procedure had undergone major adjustments between sports were excluded. A search strategy proposed by Terwee et al. [83] guided the selection of keywords (see Additional file 2). The strategy for searching clinimetric studies in rugby and related sports consisted of a combination of the following search themes (i, ii, iii, iv) and (i, ii, iv, v), respectively, connected with the Boolean operator AND: i.

Data extraction
The selection process for the identified articles was conducted as described previously in Stage 1. Subsequently, data extraction was conducted independently by two people (SO and TM). All extracted data were entered into Microsoft Excel and given to two other independent assessors (JD and TM) to verify their accuracy. The following data were extracted: publication details (first author, year of publication), title, purpose of the study, age of the participants, country, sport context, physiological construct evaluated, test(s) used to measure the construct, and the measurement properties assessed (reliability, validity and responsiveness). For the measurement properties, the following data were extracted: type of reliability or validity, interval period for test-retest and inter-rater studies, sample size and the results obtained for each physiological test.

Quality assessment of the clinimetric studies and measurement properties
The Consensus-based Standards for the Selection of health Measurement Instruments (COSMIN) checklist was used to evaluate the methodological quality of the included studies. Briefly, the COSMIN evaluates nine measurement property items (internal consistency, reliability, measurement error, content validity, construct validity (i.e. structural validity, hypothesis testing, cross-cultural validity), criterion validity and responsiveness) (Table 4). It also provides standardised information for evaluating the quality of each item based on design requirements and statistical methods [84,85]. The COSMIN scoring system per measurement property is based on a 4-point rating scale (poor to excellent) and the overall rating for the methodological quality of each study is obtained by taking the lowest score [83,84]. Two reviewers (JD and TM) with prior COSMIN experience evaluated the methodological quality of each study included in Stage 2. It was pre-planned that disagreements would be resolved by discussion with a third person (CT) until consensus was reached. In addition to the methodological quality assessment with the COSMIN, the checklist of quality criteria for rating measurement properties given by Terwee et al. [86] was used to rate each measurement property in the included articles as 'positive', 'negative' or 'questionable' depending on the results reported for the property (Table 4). Studies with "poor" methodological quality were not analysed for the quality of the results on the measurement properties.
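
The "lowest score" rule for the overall methodological quality rating can be sketched in a few lines of Python. The four labels are the quality levels named in this review (poor, fair, good, excellent); the function name `overall_quality` is a hypothetical illustration, not part of the COSMIN checklist itself.

```python
# Ordered from lowest to highest methodological quality.
COSMIN_SCALE = ["poor", "fair", "good", "excellent"]


def overall_quality(item_ratings):
    """Return the lowest rating among the items scored for one measurement property."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # list.index raises ValueError for any label outside the scale.
    return min(item_ratings, key=COSMIN_SCALE.index)


print(overall_quality(["excellent", "good", "fair", "good"]))
```

Under this rule a single weakly scored item determines the overall rating, however well the remaining items score.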

Best evidence synthesis: levels of evidence
To help synthesise results from numerous studies on the same physiological construct, a "best evidence synthesis" was performed by the primary author (MC). The best evidence synthesis rating was determined based on the number of studies that investigated the measurement property, the overall COSMIN score, and the rating and consistency of the measurement property result (positive, indeterminate, and negative) [87]. The possible levels of evidence are "strong" (consistent findings in multiple studies of good methodological quality or in one study of excellent methodological quality), "moderate" (consistent findings in multiple studies of fair methodological quality or in one study of good methodological quality), "limited" (only one study of fair methodological quality), "conflicting" (conflicting findings) and "unknown" (only studies of poor methodological quality or no studies) [87].
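
The levels of evidence defined above can be expressed as a small decision function. This is a minimal sketch under stated assumptions: each study is a (quality, rating) pair, studies of poor quality are ignored, and mixed cases that the definitions do not cover (for example one good plus one fair study) are resolved by the highest applicable rule. The function name and data layout are hypothetical.

```python
VALID_QUALITIES = {"poor", "fair", "good", "excellent"}


def level_of_evidence(studies):
    """Classify evidence for one measurement property of one test.

    studies: list of (quality, rating) tuples, e.g. ("good", "+").
    """
    for quality, _ in studies:
        if quality not in VALID_QUALITIES:
            raise ValueError(f"unknown quality label: {quality}")

    # Poor-quality studies do not contribute to the synthesis.
    usable = [(q, r) for q, r in studies if q != "poor"]
    if not usable:
        return "unknown"

    # More than one distinct rating means the findings are not consistent.
    if len({rating for _, rating in usable}) > 1:
        return "conflicting"

    qualities = [q for q, _ in usable]
    n_good = qualities.count("good")
    n_fair = qualities.count("fair")

    if "excellent" in qualities or n_good >= 2:
        return "strong"
    if n_good == 1 or n_fair >= 2:
        return "moderate"
    return "limited"  # exactly one study of fair quality


print(level_of_evidence([("fair", "+")]))
```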

Results: Stage 2
Characteristics of included studies
Figure 2 shows a flow chart for the selection of the studies. Of 824 studies identified from the electronic databases, 20 met the inclusion criteria. The majority of the studies did not meet the inclusion criteria because they did not report on measurement properties. The general characteristics of the included studies and the measurement properties evaluated in each study are summarised in Table 5. The studies were conducted in Australia (n = 9); Denmark, Brazil and Belgium (n = 2 each); and Norway, Ireland, Iran, Italy and Croatia (n = 1 each). The age of the participants in the included studies ranged from 12 to 36 years.
Of the 63 tests identified in Stage 1, only 21 had their measurement properties described, across 20 studies. The tests were the 5 m, 10 m, 20 m and 30 m Speed tests (speed), 20 m Repeated-Sprint test (repeated sprinting ability), Repeated-Effort test (repeated effort ability), three Repeated High-Intensity Exercise tests (repeated high-intensity exercise performance), Yo-Yo IRT1 and 2 (prolonged high-intensity running ability), T120S (anaerobic endurance), 505 test (agility), Modified 505 test (agility), L run (agility), Change of Direction Speed test (agility), Sergeant Jump test (lower-body muscular power), and three Bench Press Repetition-to-Fatigue tests (upper-body strength-endurance).
Of the 21 tests, 18 were studied for their measurement properties in rugby. The Yo-Yo Intermittent Recovery Level 1 and 2 tests and the Sergeant Jump test had their measurement properties derived from related sports (soccer and Australian Rules football). For all the other tests identified in Stage 1, there was no evidence on measurement properties in either rugby or related sports. Moreover, none of the 21 tests identified in Stage 2 had all of its measurement properties investigated. Seven studies investigated both the reliability and validity of one or more physiological tests [6,19,74,88-91].

Measurement properties and methodological quality assessments
Tables 6 and 7 provide an overview of the measurement properties for the identified physiological tests and the COSMIN rating of methodological quality for the studies per measurement property. Table 8 shows the rating of the quality of the results on the measurement properties, based on the quality criteria checklist for measurement properties given by Terwee et al. [86]. The results on the measurement properties for the physiological tests derived from studies of "poor" methodological quality were excluded from the rating.

Quality criteria for measurement properties (table fragment; ratings: (+) positive, (?) indeterminate, (−) negative, (0) no information)

Construct validity
The extent to which scores on a particular questionnaire relate to other measures in a manner that is consistent with theoretically derived hypotheses concerning the concepts that are being measured. (+) Specific hypotheses were formulated AND at least 75% of the results are in accordance with these hypotheses; (?) Doubtful design or method (e.g., no hypotheses); (−) Less than 75% of hypotheses were confirmed, despite adequate design and methods; (0) No information found on construct validity.

Criterion validity (predictive or concurrent)
The extent to which scores on a particular questionnaire relate to a gold standard. (+) Correlation with standard ≥0.70 OR no statistically significant differences between the two tests found OR sensitivity and specificity ≥0.70 OR convincing arguments that gold standard is "gold" AND correlation with gold standard >0.70; (?) No convincing arguments that gold standard is "gold" OR doubtful design or method; (−) Correlation with standard <0.70 or AUC <0.70 OR statistically significant differences between outcome measures and gold standard OR sensitivity or specificity <0.70.

Responsiveness
The ability of a questionnaire to detect clinically important changes over time.

Floor and ceiling effects
The number of respondents who achieved the lowest or highest possible score. (+) ≤15% of the respondents achieved the highest or lowest possible score; (?) Doubtful design or method; (−) >15% achieved the highest or lowest possible score despite adequate designs and methods; (0) No information found on interpretation.

Interpretability
The degree to which one can assign qualitative meaning to quantitative scores. (+) Mean and SD scores presented of at least four relevant subgroups of patients and MIC defined; (?) Doubtful design or method OR less than four subgroups OR no MIC defined; (0) No information found on interpretation.
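The quality criteria quoted in this section can be read as simple decision rules. The sketch below, offered only as an illustration, encodes two of them: the 75% rule for construct validity and the 0.70 correlation threshold for criterion validity. The function names and parameters are hypothetical, and the criterion-validity rule is deliberately simplified to the correlation case (it ignores the sensitivity, specificity and AUC alternatives).

```python
def rate_construct_validity(n_confirmed, n_hypotheses, design_adequate=True):
    """Rate construct validity from the share of confirmed hypotheses."""
    # No hypotheses or a doubtful design gives an indeterminate rating.
    if n_hypotheses == 0 or not design_adequate:
        return "?"
    return "+" if n_confirmed / n_hypotheses >= 0.75 else "-"

def rate_criterion_validity(correlation, gold_standard_convincing=True,
                            design_adequate=True):
    """Rate criterion validity from the correlation with the gold standard."""
    # Without a convincing gold standard the result cannot be rated.
    if not gold_standard_convincing or not design_adequate:
        return "?"
    # Assumption: the threshold is taken as >= 0.70.
    return "+" if correlation >= 0.70 else "-"

# A single known-group hypothesis that was not confirmed (as for the 505 test)
print(rate_construct_validity(n_confirmed=0, n_hypotheses=1))
# A high correlation against a comparator that is not a convincing gold standard
print(rate_criterion_validity(0.85, gold_standard_convincing=False))
```

The second call illustrates why a high correlation alone does not earn a positive rating: when the comparator is not convincingly "gold", the result stays indeterminate.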

Yo-Yo intermittent recovery level 1 (Yo-Yo IR1) test
Of the 20 studies included in the review, seven investigated at least one measurement property of the Yo-Yo IR1 test (Table 5). Validity was the most commonly studied measurement property, with six studies evaluating at least one type of validity [88,89,92-95]. There was evidence on the known-group [88,92,93], convergent [89,94,95] and criterion validity [89] of the Yo-Yo IR1 test. However, all six studies were rated "poor" on methodological quality, mainly because of the inadequate sample sizes used in the validity analysis. Reliability was the second most commonly studied measurement property, with four studies evaluating test-retest reliability (Table 5) [88,89,94,96]. The test-retest intervals ranged from within one week to eight days [88,89,94,96]. On methodological quality, all the studies investigating the reliability of the Yo-Yo IR1 were rated "poor". In all these studies, the sample size had the lowest score and therefore determined the total score for the study. Responsiveness of the Yo-Yo IR1 test was also investigated, but only in two studies of "poor" methodological quality [94,95].

Yo-Yo intermittent recovery level 2 (Yo-Yo IR2) test
Of the 20 studies included in the review, four provided evidence on at least one measurement property of the Yo-Yo IR2 test (Table 5) [91,94,97,98]. Validity and reliability were the most commonly studied measurement properties of the test [91,94,97,98]. Three studies evaluated the test-retest reliability of the Yo-Yo IR2 with a seven-day interval between the assessments [91,94,98]. However, all three studies were rated "poor" on methodological quality, mainly because of the small sample sizes used for the reliability analysis. Four studies investigated the validity of the Yo-Yo IR2 test (Table 5) [91,94,97,98]. Two studies each provided evidence on the convergent [94,97] and criterion [97,98] validity of the test. In addition, single studies investigated the known-group validity [97] and concurrent validity [91] of the test. All the studies were, [...]

5 m sprint test
[...] (Tables 6 and 8) [19]. The same study provided evidence on the construct validity of the test (Table 7). A positive rating for the known-group validity was found for the 5 m sprint test, as specific hypotheses were formulated and at least 75% of the results were in accordance with these hypotheses (Table 8). There was no evidence found on the responsiveness of the test.

10 m sprint test
Three different studies investigated the measurement properties of the 10 m sprint test (Table 5) [6,19,55]. Reliability was the most commonly studied measurement property. All three studies had test-retest reliability evidence for the 10 m sprint test, with an interval of two to seven days between the assessments [6,19,99]. However, two of the studies were rated "poor" on methodological quality [6,99]. In one "fair" study, a positive rating for the test-retest reliability (ICC = 0.87) of the 10 m sprint test was found [19]. Validity of the 10 m sprint test was assessed in two studies [6,19]. The most common type of validity studied was construct validity (known-group validity). One study was rated as "poor" on methodological quality [6]. In that study, a positive rating of construct validity was found for the 10 m sprint test. There was no evidence found on the responsiveness of the test.

20 m sprint test
Only one "fair" study investigated the measurement properties (reliability and validity) of the 20 m sprint test (Table 5) [19]. The 20 m sprint test was found to have a positive rating for the test-retest reliability (Tables 6 and 8) [19]. The same study provided evidence on the construct validity of the test (Table 7). A positive rating for the known-group validity was found for the 20 m sprint test, as specific hypotheses were formulated and at least 75% of the results were in accordance with these hypotheses (Table 8). There was no evidence found on the responsiveness of the test.

30 m sprint test
Test-retest reliability evidence for the 30 m sprint test was provided by one study rated "poor" on methodological quality [6]. The study used a sample of 11 participants to establish the reliability of the test, with three days between the test-retest assessments. In the same study, the 30 m sprint test was also assessed for its known-group validity [6]. However, the study was also rated "poor" on quality for the construct validity. There was no evidence found on the responsiveness of the test.

Repeated-sprint ability (RSA) test
One study assessed the test-retest reliability of the repeated-sprint ability test, with assessments conducted seven days apart (Tables 5 and 6) [51]. The study was rated as being of "poor" methodological quality, mainly because of the small sample size used in the reliability analysis. There was no evidence on validity or responsiveness found for the test.

Repeated-effort ability (REA) test
One study assessed the test-retest reliability of the repeated-effort ability test, with assessments conducted seven days apart [51]. The study was rated as being of "poor" methodological quality, mainly because of the small sample size used in the reliability analysis. There was no evidence on validity found for the test.

Repeated high-intensity exercise (RHIE) tests
One study evaluated the test-retest reliability of three different repeated high-intensity exercise tests, namely, the repeated high-intensity exercise backs test, repeated high-intensity exercise rugby union forward test, and the repeated high-intensity exercise rugby league forward test [24]. The quality of the study was, however, rated "poor" mainly because of the small sample size per reliability analysis utilised for each test. There was no information on the validity or responsiveness of any of these tests in the literature.

30-15 intermittent fitness test (30-15 IFT)
One study assessed the test-retest reliability of the 30-15 Intermittent Fitness test, with nine days separating the two assessments [68]. For the reliability of the primary outcome, maximal intermittent running velocity (V_IFT), the study was rated as being of "good" methodological quality. A positive rating (ICC = 0.89) for the test-retest reliability was reported for the test. Validity of the test was assessed in one study (Tables 5 and 7) [95]. The study was, however, rated "poor" on quality for the convergent validity of the 30-15 Intermittent Fitness test [95].
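Test-retest reliability is summarised throughout this section by an intraclass correlation coefficient (ICC). The included studies' ICC models are not stated here, so the following is only a minimal sketch of how two common single-measurement forms, ICC(2,1) and ICC(3,1), can be computed from a two-way ANOVA decomposition. The function name `icc_two_way` and the sprint times are hypothetical and are not data from any included study.

```python
def icc_two_way(scores):
    """Compute ICC(2,1) and ICC(3,1) from a subjects-by-sessions table.

    scores: list of rows, one per participant, each with k session scores.
    """
    n = len(scores)        # participants
    k = len(scores[0])     # sessions (or raters)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # ICC(3,1): consistency, sessions treated as fixed.
    icc31 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    # ICC(2,1): absolute agreement, sessions treated as random.
    icc21 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    return {"ICC(2,1)": icc21, "ICC(3,1)": icc31}

# Hypothetical 10 m sprint times (s) for six players on two test days.
sprint_times = [
    [1.78, 1.80],
    [1.85, 1.83],
    [1.92, 1.95],
    [1.70, 1.73],
    [1.88, 1.86],
    [1.81, 1.84],
]
print(icc_two_way(sprint_times))
```

The two forms diverge when there is a systematic shift between sessions (for example, a learning effect), which is one reason the ICC model matters when comparing values across studies.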

Triple 120-m shuttle test (T120S)
One study examined the test-retest reliability of the Triple 120 m shuttle test for anaerobic endurance using a four-day interval between assessments [70]. The same study also evaluated the criterion validity of the test against the Wingate 60s (W60) cycle test. The study used a small sample of 12 rugby league players for both the reliability and the validity analyses and was rated "poor" on methodological quality. No information was found on the responsiveness of the test.

Agility/change of direction speed tests

505 test
One study examined both the test-retest reliability (over two days) and the construct validity of the 505 test [19]. The study was rated "fair" on methodological quality and a positive rating (ICC = 0.90) was reported for the test-retest reliability. For the construct validity, a negative rating was found for the 505 test, as the test showed an unexpectedly marginal effect size (ES = 0.28), with no significant difference between groups in test performance. No information on responsiveness was found for the test.

Modified 505 test
Reliability of the Modified 505 test was investigated in one study [19]. The study was rated "fair" on methodological quality because of the large sample size. A positive rating (ICC = 0.92) for the test-retest reliability was found for the test. The same study investigated the construct validity of the test and was also rated "fair" on methodological quality for validity. A negative rating of construct validity (known-group validity) was found for the Modified 505 test, as there was no significant difference between groups (ES = 0.32). Therefore, less than 75% of the results were in accordance with the hypotheses. No information was found on the responsiveness of the test.

L run test
One study examined both the test-retest reliability (over two days) and the construct validity of the L run [19]. The study was rated "fair" on methodological quality and a positive rating (ICC = 0.95) was reported for the test-retest reliability. For the construct validity, a negative rating was found for the L run test, as the test showed an unexpectedly marginal effect size (ES = 0.28). There was no information found on the responsiveness of the test.

Change of direction speed test
Two studies reported on the reliability of the change of direction speed test [6,74]. The test-retest interval ranged from three to seven days. The same studies provided evidence on the construct validity (known-group validity) of the test [6,74]. However, the two studies were rated "poor" on methodological quality for both reliability and validity. There was no information found on the responsiveness of the test.

Sergeant (vertical) jump test
For the Sargent Jump test, only one study was found evaluating the inter- and intra-rater reliability of the test [90]. Intra-rater reliability was assessed with testing sessions separated by two hours, whilst inter-rater reliability assessments were separated by two days. The study was rated "fair" on methodological quality. A positive rating for intra-rater reliability (ICC = 0.99) and inter-rater reliability (ICC = 1.00) was reported for the test. The same study evaluated the validity of the Sergeant Jump test and showed positive criterion validity against the Jump Platform (JP) test using 45 soccer participants. The study was rated "fair" on quality for criterion validity. There was no information found on the responsiveness of the test.

Bench press repetitions-to-fatigue tests
One study examined the construct validity of three different upper-body strength-endurance tests, namely, bench press repetitions-to-fatigue at 60% of one repetition maximum (BP RTF 60% 1RM), bench press repetitions-to-fatigue at 60 kg (BP RTF 60) and bench press repetitions-to-fatigue at 102.5 kg (BP RTF 102.5) [81]. For the BP RTF 60 and 102.5, the study was rated "fair" on methodological quality because of the adequate sample size (n = 38). A positive rating of construct validity was found for the two tests. However, for the construct validity of the BP RTF 60% 1RM test, the study was rated "poor". There was no information on the reliability or responsiveness of the three tests in measuring upper-body strength-endurance.

Best evidence synthesis: level of evidence
A summary of the best evidence synthesis is presented in Table 9. The synthesis was derived from the ratings of the methodological quality of the studies and the results on the measurement properties of the tests. Only studies with "fair" to "good" methodological quality were used to determine the level of evidence per test for each studied measurement property.

Discussion
The aim of the present systematic review was two-fold. Firstly, we systematically reviewed 70 studies in Stage 1 to identify physiological characteristics evaluated in rugby and the corresponding tests used to measure each construct. Thereafter, 20 studies were systematically reviewed in Stage 2 to provide an overview on the measurement properties of the physiological tests identified in the studies. Most of the included studies from stage 1 were from Australia, United Kingdom, New Zealand, and South Africa. This probably reflects the popularity of the sport of rugby in these respective countries. The fact that there were an almost equal number of adult and adolescent rugby studies indicates that rugby is extensively studied in junior and senior players. It is also possible to speculate that the sport is equally popular among junior and senior players.
One of the most important findings that emerged from Stage 1 was that a number of physiological characteristics are commonly investigated among rugby players. Fifteen physiological characteristics were identified. This breadth probably confirms the wide interest researchers have in physiological characteristics. The interest could be linked with suggestions that success in rugby is highly dependent on physiological characteristics [75]. With increased professionalism and competition, there has been extensive investment in research towards establishing the physical qualities important for successful performance in professional rugby. Moreover, this breadth of physiological characteristics under investigation potentially highlights the physical nature of the sport and the diversity of attributes needed to meet the physical demands of the game. It is well established that rugby is a physical sport requiring participants to partake in challenging physical collisions such as scrummaging, tackling, aggressive mauling and rucking, which require optimal muscular strength, power and endurance [5]. This gives a rationale for the preponderance of studies investigating lower and upper body muscular power [15, 16, 30-36, 40, 49, 61, 64, 73], lower and upper body muscular strength [5,7,18,27,38,42,55,56,69,78,80] and muscular endurance [7,15,36,81]. In addition, rugby players variably cover 5000 to 7000 m during match play and engage intermittently in high-intensity efforts, which require exceptional agility, anaerobic and aerobic capacity, speed, repeated sprinting and effort ability, and the generation of high levels of concentric and eccentric force [53,75]. This also provides justification for the numerous studies investigating attributes such as speed, agility, prolonged high-intensity intermittent running ability, repeated sprint ability and explosive lower leg power [7, 16, 19, 30-38, 40, 49, 51, 53, 70, 72, 76].
Stage 1 findings also showed that almost all physiological characteristics had multiple tests for measurement. For example, this review showed that change of direction speed/agility is often evaluated using the 505, modified 505, Illinois Agility test and change of direction speed test, among other tests. However, it was surprising to discover that none of the tests identified in Stage 1 had all the measurement properties (reliability, validity and responsiveness) investigated using rugby participants. In addition, of the 63 tests identified in Stage 1, only 21 had information on at least one of the measurement properties from rugby and related sports. This suggests that there is limited reporting in the literature of the measurement properties for tests commonly used in rugby. This was particularly evident for the property of responsiveness. These findings raise questions about the rationale for the selection of tests by researchers in the field of rugby. For example, speed was the most commonly studied physiological characteristic in the included studies. It was frequently measured over linear distances varying between 5 m and 60 m (Table 2). The most commonly tested sprinting distances were, however, 10 m, 20 m and 40 m. Professional rugby studies have provided evidence that players seldom sprint distances greater than 40 m in a single bout [100]. This probably justifies the predominant use of the 10 m, 20 m and 40 m sprint tests in assessing rugby players in the literature [30-40]. In addition, straight-line sprinting is reported to be broken down into three phases: acceleration, attainment of maximal speed, and maintenance of maximal speed [101]. This also possibly justifies the use of more than one sprinting distance for assessing speed, as these distinct qualities of speed should be evaluated separately.
Although there could be many reasons why researchers prefer a specific test over others, the literature generally recommends the use of feasible, reliable, valid and responsive tests [102]. This review found that there is a dearth of high-quality studies (according to the COSMIN scoring system) investigating the measurement properties of speed tests using rugby participants. Best evidence synthesis showed only limited evidence for the test-retest reliability and the known-group validity of the 5 m, 10 m and 20 m sprint tests.
Repeated-sprint ability has also been reported to be extremely important in rugby given the high-intensity and intermittent nature of the sport [100]. This review showed that the construct is commonly measured using the Repeated 20 m sprint test and the Rugby-Specific Repeated Speed test. There were no high-quality studies found investigating the measurement properties of these tests in rugby. Only one study of "poor" methodological quality was found evaluating the test-retest reliability of the repeated 20 m sprint test using 12 rugby participants [51]. Caution is therefore needed when adopting or using these tests in future studies of rugby players. High-quality future studies may need to explore the measurement properties of these tests. Repeated-sprint ability tests have been reported to underestimate the repeated high-intensity exercise demands of rugby [24]. To overcome the shortcomings of the repeated 20 m sprint test, Austin et al. [24] assessed the reliability of three repeated high-intensity exercise tests specifically developed for backline players, RU forward players and RL forward players. The study was, however, rated as being of "poor" methodological quality because of the small sample size per reliability analysis of each test and the short interval (2 days) between the test-retest assessments.
There is a dearth of high-quality studies investigating the measurement properties of the Yo-Yo intermittent recovery (Level 1 and 2) tests in rugby. This is despite the popularity of the tests in assessing prolonged high-intensity intermittent running ability/endurance and maximal aerobic power among rugby players [15,24,53-56,69]. This creates a need for future studies to specifically evaluate the measurement properties of the tests using rugby participants. Much of the information on the measurement properties of these tests reported in rugby studies is referenced from validation studies conducted using participants from other sports. There are multiple studies providing evidence on the measurement properties (reliability, validity and responsiveness) of the tests in other related intermittent sports such as soccer and Australian Rules football [88,89,91-98]. However, no high-quality studies were found evaluating the measurement properties of the tests according to the COSMIN guidelines. All the studies included in this review assessing the measurement properties of the tests showed "poor" methodological quality. The major drawbacks in all these studies were mainly related to inadequate sample sizes and the lack of a clear description of the expected hypotheses. There were also no studies evaluating the measurement properties of other tests of prolonged high-intensity intermittent running ability such as the repeated 12 s sprint shuttle speed tests.
Four tests were identified for estimating the maximal aerobic power of rugby players: the Multistage Fitness test, the Yo-Yo intermittent recovery level 1 test, the 30-15 Intermittent Fitness test (30-15 IFT) and the 1500 m run. The Multistage Fitness test was commonly used in a number of studies [7, 8, 10, 16, 27, 30-37, 40, 49, 50, 61-64]. However, there is a paucity of information on the measurement properties of tests for maximal aerobic power in rugby or related sports. Only one study of "good" methodological quality assessed the reliability and the usefulness of the 30-15 Intermittent Fitness test in rugby participants [68]. Best evidence synthesis showed moderate evidence to support the test-retest reliability of the 30-15 Intermittent Fitness test. There were no high-quality studies providing evidence on the measurement properties of tests identified for measuring anaerobic endurance such as the T120 s, Wingate 60 cycle, 300 m Shuttle Run and the 400 m Sprint tests. Holloway et al. [70] evaluated the validity of the T120 s test against the Wingate 60 cycle test. According to the COSMIN guidelines, the study was rated as being of "poor" methodological quality because it had only 12 participants.
A number of studies evaluated the agility/change of direction speed of rugby players. The tests commonly used included the 505 test, Modified 505 test, Illinois Agility test, Change of Direction Speed test and Agility test [6,16,19,32,34,35,40,53,74,77]. There were no high-quality studies evaluating the measurement properties of these tests in rugby. This is despite the importance of agility as a physiological skill in the sport of rugby. There was only one study of "fair" methodological quality according to the COSMIN guidelines that evaluated the measurement properties of the 505 test, modified 505 test and the L run test. The study showed a positive rating for the test-retest reliability of these three agility tests. However, there was a negative rating for the known-group validity of these tests. These findings are consistent with the best evidence synthesis results, which indicate limited evidence on the reliability and construct validity of these tests in assessing the agility of rugby players. There is still a need for further high-quality studies evaluating the measurement properties of these tests in rugby players.
Lower-body muscular power was the second most commonly studied physiological characteristic among rugby players in the studies included in this review. Although three tests estimating lower-body muscular power were identified in the included studies, we found no studies evaluating the measurement properties of any of the three tests in rugby. Evidence on the measurement properties was found in one "fair" study evaluating the intra/inter-rater reliability and criterion validity of the Vertical Jump test among soccer players. A positive rating was found for the intra/inter-rater reliability of the test. Evidence on criterion validity was found to be questionable (Table 8), as there was no convincing argument that the gold standard test used was "gold". Overall, best evidence synthesis indicates a limited level of evidence for the inter/intra-rater reliability and criterion validity of the Sergeant (vertical) jump test.
There were also no clinimetric studies found testing the measurement properties of tests for lower-body muscular strength, upper-body muscular strength and upper-body muscular power. However, one study of "fair" methodological quality provided evidence on the known-group validity of two tests of upper-body muscular endurance (bench press repetitions-to-fatigue tests at 60 kg and 102.5 kg). Best evidence synthesis indicates that there is limited evidence to support the validity of these two tests in evaluating upper-body strength-endurance.

Limitations
The results of this review paper should be interpreted with the understanding of a number of important limitations. Currently, there are no published reviews investigating measurement properties of performance-based tests measuring physiological characteristics in rugby. This renders comparisons with other review studies impossible. However, it suffices to suggest that these results expose a research gap on high-quality studies evaluating measurement properties for physiological tests commonly used in rugby. Although it could also be a major strength for this review, the inclusion criteria only considered full-text peer reviewed articles and completely excluded grey literature. This publication bias likely threatens internal validity of results obtained on measurement properties for this review as unpublished studies are more likely to report negative or unfavourable results. Although the COSMIN has been developed for the evaluation of measurement properties and has been generally used in the literature for that purpose, the guidelines appear well-suited and more applicable for appraising the quality of questionnaire-based studies. In the context of performance-based tests such as used in rugby, the applicability of the COSMIN as a quality rating tool for the studies on measurement properties still requires careful consideration.