Random forest algorithm to identify factors associated with sports-related dental injuries in 6 to 13-year-old athlete children in Hamadan, Iran-2018 -a cross-sectional study
BMC Sports Science, Medicine and Rehabilitation volume 12, Article number: 69 (2020)
Traumatic dental injuries are among the most important health problems, with major physical, aesthetic, psychological, social, functional and therapeutic consequences that adversely affect the quality of life of children and adolescents. Recently, methods based on machine learning algorithms have given researchers more powerful tools for making more accurate predictions in different domains and for evaluating the factors affecting different phenomena more reliably than traditional regression models. This study investigates the performance of random forest (RF) in identifying factors associated with sports-related dental injuries. The accuracy of the RF model in predicting sports-related dental injuries was also compared with that of logistic regression, its traditional competitor.
This cross-sectional study included 356 athlete children aged 6 to 13 years in Hamadan, Iran. Random forest and logistic regression models were constructed using sports-related dental injury as the response variable and age, sex, parents’ education, child’s birth order, type of sports activity, duration of sports activity, awareness of the mouthguard, and mouthguard use as input variables. A self-reported questionnaire was used to obtain the information.
Fifty-five (15.4%) subjects had experienced a sports-related dental injury. The mean age of children with sports injuries was significantly higher than that of children without injury (p = 0.006). The prevalence of injury was significantly higher in boys (p = 0.008). Children with illiterate mothers were more likely to be injured than children with educated mothers (p = 0.045). Awareness of the mouthguard and its use during exercise were associated with a significantly lower prevalence of injury (p < 0.001).
The random forest model had a higher prediction accuracy (89.3%) for sports-related dental injuries than logistic regression (84.2%). The relative importance of variables based on RF showed that mouthguard use and mouthguard awareness contributed most to the prediction of sports-related dental injuries, followed by sex and age.
Using predictive models such as RF can minimize the inaccurate predictions that arise from high complexity and interactions between variables. This helps to identify more accurately the factors associated with sports-related dental injury among the general population of children.
Traumatic dental injuries are one of the most important oral health problems in children and adolescents. In addition to their physical aspect, they affect psychosocial development through aesthetic concerns. These injuries can impair oral functions such as chewing and speech through severe dental or periodontal damage such as tooth fracture, loosening, and direct erosion. Trauma to the anterior teeth, with its major aesthetic, psychological, social, functional and therapeutic problems, therefore adversely affects quality of life. Part of the annual cost of sports is spent on treating sports-related dental injuries [1,2,3,4,5]. Most dental injuries in children are caused by their inability to identify traumatic situations. Traumatic dental injuries can occur not only during competitions but also during training and exercise sessions [6,7,8]. Almost 40% of dental injuries occur during sports activities.
Studies in different countries report different rates of tooth injury in children. However, a recent meta-analysis put the worldwide prevalence of dental injuries in children and adolescents at 17.5%, with the prevalence in boys twice that in girls.
Increasing violence, access to potentially risky recreational facilities, driving accidents, and greater participation of children in sports activities have dramatically increased dental trauma, making it an emerging public health problem.
Crashes, fights, sports, accidents, and collisions with objects or people can also cause tooth damage. Home, school, and the street are the places where tooth damage most often occurs, most notably enamel fractures and dentin fractures without pulp exposure [10,11,12,13,14,15,16,17].
Considerable research has also been done on the pathogenic, predisposing and risk factors for such injuries. Based on the available evidence, these factors can be broadly categorized into anatomical and social-behavioral factors. Anatomical factors that increase the risk of anterior tooth injuries include maxillary incisor overjet and inadequate lip coverage of the anterior teeth [11, 12]. Social-behavioral predictors include sex, an adverse social-psychological environment, problematic behavior, increased participation in sports and recreational activities, and accidents [9, 11].
Therefore, identifying the factors associated with the prevalence of sports-related dental injuries in children is an important step in preventing them and will promote the oral health of future athletes.
Most previous research identifying the factors associated with sports-related dental injuries has relied on descriptive statistics and classical models such as logistic regression. In recent years, however, methods based on machine learning algorithms, which can account for non-linear relationships, have given researchers more powerful tools for making more accurate predictions in different domains and for evaluating the factors affecting different phenomena more reliably. Several supervised learning algorithms model such relationships and provide acceptable classification models [17, 18].
Because of their simplicity, decision-tree algorithms such as random forest (RF) are more popular than other machine learning algorithms in many fields. Decision trees are constructed by sequentially splitting the data into distinct groups, and the purpose of each split is to increase the distance between the resulting groups. Decision tree methods differ, among other things, in how this distance is measured. RF is a tree-based supervised learning method for classification and regression whose results are simple for the user to understand and interpret. The RF method can also produce prediction rules: logical statements of the form if (conditions) then (prediction), which are easy to use in decision making [20, 21].
Given these promising features, this study investigates the performance of RF in identifying factors associated with sports-related dental injuries in 6 to 13-year-old athlete children in Hamadan, west of Iran. RF is used to predict sports-related dental injuries and to identify the relative importance of the variables in that prediction. The accuracy of the RF model is also compared with that of the logistic regression model as its traditional competitor.
Ethical approval and consent to participate
The study was approved by the research ethics committee of Hamadan University of Medical Sciences under code IR.UMSHA.REC.1397.728.
This cross-sectional study used multi-stage cluster sampling to randomly select 356 athlete children aged 6–13 years who were active in sports clubs in Hamadan city (west of Iran) and had more than 1 year of sports experience.
The sample size was calculated based on a sampling error of 0.05, a significance level of 5%, a prevalence of dental injuries of 20%, and a design effect of 1.5. With a response rate of 70%, 356 questionnaires were finally used for analysis.
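The text does not state which formula was used. As a rough illustration only, the sketch below assumes the common single-proportion formula n = z²·p(1 − p)/d², multiplied by the design effect. The function name and the choice of z = 1.96 for the 5% significance level are assumptions, and the result need not match the number of questionnaires finally analysed.

```python
import math

def cluster_sample_size(p, d, z=1.96, design_effect=1.0):
    """Sample size for estimating a proportion, inflated for cluster sampling.

    Assumed formula (not stated in the source): n = z^2 * p * (1 - p) / d^2,
    multiplied by the design effect and rounded up.
    """
    n_simple = z ** 2 * p * (1 - p) / d ** 2
    return math.ceil(n_simple * design_effect)

# Inputs quoted in the text: prevalence 20%, sampling error 0.05, design effect 1.5.
n_simple_random = cluster_sample_size(p=0.20, d=0.05)
n_cluster = cluster_sample_size(p=0.20, d=0.05, design_effect=1.5)
print(n_simple_random, n_cluster)
```

Any further adjustment for the 70% response rate is left out because the text does not say how it was applied.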
All the clubs in the city were included in the sampling frame, and the clubs served as clusters. After random selection of clubs, athlete children were randomly selected within each sport. Children who belonged to multiple sports clubs were excluded from this study. A letter was sent to all parents or guardians of the selected children explaining the purpose, characteristics, and importance of the study. All athlete children whose parent or guardian provided informed consent on their behalf were included. Eligible participants were identified and information was collected from June to October 2018.
A self-reported questionnaire was used to obtain information on sports-related dental injury. The questionnaire was designed based on similar studies and literature reviews [9, 13, 22, 23] and was divided into three sections. The first section consisted of questions on age, sex, parental education, child’s birth order, type and history of exercise activity, duration of exercise activity per week and per day, and enjoyment of playing. The second section included questions about the history of dental injury, the time of injury, the type of dental injury, and the time of referral to medical centers. The third section included questions about the athlete’s awareness and use of oral protective equipment such as mouthguards. Face and content validity were used to assess the validity of the developed questionnaire, and Cronbach’s alpha coefficient was used to assess internal consistency. After its reliability and validity were confirmed, the questionnaire was sent to parents.
The parents’ response to the question “Has your child ever had a tooth injury during exercise” was used to assess the prevalence of dental injuries during exercise. Type of activity was divided by degree of contact into non-contact sport (gymnastics), limited-contact sport (football, volleyball and basketball), semi-contact sport (karate and taekwondo), and full-contact sport (wrestling, boxing, and judo). The type of tooth injury was divided into crown fracture, mobility, and complete tooth extraction so that parents could understand the categories.
Descriptive and bivariate analysis
Frequencies and percentages were used to summarize categorical study variables, and means and standard deviations were computed for continuous variables. The univariate association of dental injury with categorical variables was analyzed with the Chi-square test. The significance level was set at 0.05. The analysis was performed using SPSS 21 software.
The RF algorithm is a recursive partitioning method that grows a large number of trees and then aggregates their results. First, bootstrap data sets are created by resampling the training data. Then, for each bootstrap sample, RF constructs an unpruned tree as follows: at each node of the tree, a number of predictors are randomly selected, and the best split among those selected predictors is chosen. The classification error rate of the RF, the so-called out-of-bag (OOB) error, is estimated from the samples left out of each bootstrap sample. Finally, the final classification is obtained by combining the outputs of all trees [19, 20].
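The procedure above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the analysis used in the study: the data are synthetic, each "tree" is a single-split stump rather than an unpruned tree, and all function names are hypothetical. What it does show is the bookkeeping: bootstrap resampling, a random subset of predictors per split, and an OOB error computed only from trees that did not see a given row.

```python
import random
from collections import Counter

def make_data(n, rng):
    """Synthetic data (hypothetical): 4 binary predictors, label driven by predictor 0."""
    X, y = [], []
    for _ in range(n):
        row = [rng.randint(0, 1) for _ in range(4)]
        p_yes = 0.9 if row[0] == 1 else 0.1
        X.append(row)
        y.append(1 if rng.random() < p_yes else 0)
    return X, y

def fit_stump(X, y, rows, candidates):
    """One-split tree: among the candidate predictors, keep the split with fewest errors."""
    overall = Counter(y[i] for i in rows).most_common(1)[0][0]
    best = None
    for f in candidates:
        rule, errors = {}, 0
        for value in (0, 1):
            labels = [y[i] for i in rows if X[i][f] == value]
            # Predict the branch majority (overall majority if the branch is empty).
            rule[value] = Counter(labels).most_common(1)[0][0] if labels else overall
            errors += sum(1 for label in labels if label != rule[value])
        if best is None or errors < best[0]:
            best = (errors, f, rule)
    return best[1], best[2]

def oob_error(X, y, n_trees, mtry, rng):
    """Grow stumps on bootstrap samples; score each row only with trees that did not see it."""
    n, p = len(X), len(X[0])
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        bootstrap = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        in_bag = set(bootstrap)
        candidates = rng.sample(range(p), mtry)           # random subset of predictors
        f, rule = fit_stump(X, y, bootstrap, candidates)
        for i in range(n):
            if i not in in_bag:                           # row i is out-of-bag for this tree
                votes[i][rule[X[i][f]]] += 1
    scored = [i for i in range(n) if votes[i]]
    wrong = sum(1 for i in scored if votes[i].most_common(1)[0][0] != y[i])
    return wrong / len(scored)

rng = random.Random(0)
X, y = make_data(356, rng)
print(round(oob_error(X, y, n_trees=300, mtry=2, rng=rng), 3))
```

Because the synthetic labels disagree with predictor 0 about 10% of the time, the OOB error should settle near that noise level rather than near zero.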
In this study, RF was constructed using sports-related dental injury as the response variable (with two class labels: yes and no) and age, sex, parents’ education, child’s birth order, type of sports activity, duration of sports activity, awareness of the mouthguard, and mouthguard use as predictor variables.
Variable importance is one of the main outputs of RF. It describes the relationship between a given variable and the classification result. In this study, the permutation importance index was used to assess variable importance. It is calculated from the change in prediction error that occurs when the OOB data for that variable are randomly permuted while all other variables are left unchanged. The calculations are performed tree by tree as the RF is built. Randomly permuting the values of an important variable leads to greater changes in prediction performance than permuting an unimportant one [19, 20].
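The idea of permutation importance can be illustrated with a simplified sketch. It departs from RF in two labeled ways: a single fixed classifier stands in for the forest, and the permutation is applied to one evaluation set rather than tree by tree on OOB data. The data, the `model` function and the helper names are all hypothetical.

```python
import random

def accuracy(model, X, y):
    """Share of rows whose predicted label equals the observed label."""
    return sum(1 for row, label in zip(X, y) if model(row) == label) / len(y)

def permutation_importance(model, X, y, column, rng, repeats=20):
    """Mean drop in accuracy when one column is shuffled and the others are left unchanged."""
    baseline = accuracy(model, X, y)
    drops = []
    for _ in range(repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
        drops.append(baseline - accuracy(model, X_perm, y))
    return sum(drops) / repeats

# Synthetic data: the label follows column 0 about 90% of the time; columns 1 and 2 are noise.
rng = random.Random(1)
X = [[rng.randint(0, 1) for _ in range(3)] for _ in range(500)]
y = [row[0] if rng.random() < 0.9 else 1 - row[0] for row in X]

def model(row):
    """Stand-in classifier that uses only column 0."""
    return row[0]

importances = [permutation_importance(model, X, y, c, rng) for c in range(3)]
print([round(v, 3) for v in importances])
```

Shuffling the column the classifier relies on produces a large drop in accuracy, while shuffling a column it ignores produces none, which is the contrast the mean-decrease-in-accuracy measure is built on.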
We used default parameters for RF: the number of trees (ntree) equal to 1000 and the number of variables analyzed at each node to find the best split equal to mtry = √p, where p is the total number of variables in the problem. Statistical analyses were performed using the R packages randomForest and caret.
A logistic regression model was also used to evaluate the association of different factors with sports-related dental injuries. The independent and dependent variables in the logistic regression model were the same as in the random forest model. The results are presented as odds ratios with 95% confidence intervals.
The predictive performance of the random forest and logistic regression models was evaluated by constructing the confusion matrix, and accuracy was computed for each model.
Characteristics of the subjects according to sports-related dental injury are presented in Table 1. Of the 356 participating children, 55 (15.4%) had experienced a sports-related dental injury and 301 (84.6%) had no history of one. The mean age of children with sports injury (11.31 ± 1.61 years) was significantly higher than that of children without injury (10.61 ± 2.14 years) (p = 0.006). According to the univariate analysis based on the Chi-square test, the prevalence of injury was significantly higher in boys (20.1%) than in girls (9.9%) (p = 0.008). The mother’s level of education was significantly associated with the prevalence of sports-related dental injury (p = 0.045). Injury was more common in first-born children than in other children, although this difference was not significant (p = 0.407). Among the children with sports-related dental injuries, 36.4% (n = 20) had crown fracture, 58% (n = 32) had mobility and 5.6% (n = 3) had avulsion. There was no significant difference in the prevalence of injury by experience or by duration of exercise per week and per day (p > 0.05). Awareness of the mouthguard and its use during exercise were associated with a significantly lower prevalence of injury (p < 0.001). Only 7.7% of those who knew about the mouthguard had been injured, whereas 23.7% of those who were unaware of it had suffered a dental injury (p < 0.001). The prevalence of injury was significantly lower among mouthguard users (7.8%) than among non-users (17.6%) (p < 0.001).
Based on the results of the multiple logistic regression model presented in Table 2, increasing age significantly increased the odds of injury: each one-year increase in age multiplied the odds of injury by approximately 1.3 (95%CI: 1.04–1.55, p = 0.021). The odds of injury in boys were 2.3 times those in girls (95%CI: 1.05–5.04, p = 0.037). Children with no awareness of the mouthguard had 5.44 times the odds of a dental injury compared with those who were aware of it (95%CI: 2.51–11.8, p < 0.001). Also, the odds of injury among those who did not use a mouthguard were approximately 9 times those among children who used one during exercise (95%CI: 3.22–21.6, p < 0.001).
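As a reading aid, the relationship between an odds ratio, its confidence interval and its p-value can be worked through numerically. The sketch below assumes the intervals are Wald intervals, symmetric on the log-odds scale (the text does not say how they were computed), and the helper name is hypothetical. Applied to the interval reported for sex (1.05–5.04), it gives an odds ratio of about 2.30 and a p-value of about 0.037, in line with the reported values.

```python
import math

def wald_from_ci(lower, upper, z_crit=1.96):
    """Recover the log-odds estimate, its standard error and a two-sided p-value
    from a 95% confidence interval for an odds ratio.

    Assumes a Wald interval: exp(beta - z_crit * se) to exp(beta + z_crit * se).
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    beta = (log_lo + log_hi) / 2              # midpoint on the log scale
    se = (log_hi - log_lo) / (2 * z_crit)     # half-width divided by 1.96
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return math.exp(beta), se, z, p_value

# Interval reported in the text for boys versus girls.
odds_ratio, se, z, p_value = wald_from_ci(1.05, 5.04)
print(round(odds_ratio, 2), round(p_value, 3))
```

The same function can be applied to the other reported intervals; small differences from the published figures are expected because the interval limits are rounded.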
The performance of the multiple logistic regression and random forest models in predicting sports-related dental injury was also evaluated. The confusion matrix and the accuracy of each model are provided in Table 3. The random forest classification model had a higher prediction accuracy (89.3%) for sports-related dental injuries than the logistic regression model (84.2%). However, both models were less accurate in predicting those who were injured than those who were not.
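To make the quantities in this comparison concrete, the sketch below computes overall accuracy together with the accuracy within each class (sensitivity for injured children, specificity for uninjured children) from a 2 × 2 confusion matrix. The row totals follow the reported sample (55 injured, 301 not injured), but the split within each row is hypothetical and is not taken from Table 3. The function name is also an assumption.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from the four cells of a confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # accuracy among the truly injured
        "specificity": tn / (tn + fp),   # accuracy among the truly uninjured
    }

# HYPOTHETICAL cell counts: only the row totals (55 and 301) come from the text.
metrics = classification_metrics(tp=25, fn=30, fp=11, tn=290)

# Reference point computed from the reported counts: the share of the larger class,
# i.e. the accuracy of always predicting "no injury".
majority_baseline = 301 / (55 + 301)

print({name: round(value, 3) for name, value in metrics.items()})
print(round(majority_baseline, 3))
```

With classes this unbalanced, overall accuracy is dominated by the uninjured group, so the per-class figures and the majority-class share are useful context when reading an accuracy value.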
The relative importance of each variable based on the random forest model, in terms of mean decrease in accuracy, is presented in Fig. 1. Mouthguard use and mouthguard awareness contributed most to the prediction of sports-related dental injuries, followed by sex and age.
In this study, the prevalence of sports-related dental injury and the factors affecting it were evaluated using logistic regression and random forest models. Both models showed good prediction performance in terms of accuracy, although the random forest model was more accurate than the regression model. The variable importance based on the random forest model indicates that mouthguard use and mouthguard awareness have higher relative importance than other variables, followed by sex and age. These findings are consistent with the variables identified as significant in the multiple logistic regression model.
Our results show that the prevalence of sports-related dental injuries was 15.4%; a similar pattern was seen in a similarly aged cohort of athletes from Japan, with 13.3% prevalence. In the study by Rouhani et al. of 80 professional contact-sport athletes aged 20–30 years in northeastern Iran, 26.2% of athletes had experienced some type of dental injury. However, the prevalence of dental injury in the study by Paiva et al. of 12-year-old Brazilian children was 34.9%. In the study by Singh et al. of high school students aged 8–16 in northern India, 32% of girls and 29% of boys had a sports injury.
According to the literature, males are at greater risk of sports-related dental injury. Boys are usually more active and engage in more vigorous physical activities such as contact sports, fights, and rougher games, and they use toys and equipment with a higher risk potential without adequate protection. In the present study, the prevalence of injury in boys was twice that in girls.
Without mouthguards, the risk of injury is 1.6–1.9 times higher, and several review studies have shown that mouthguards are effective in reducing soft and hard tissue injuries [7, 26]. Mouthguards distribute the force of blows to the mouth and reduce the damage. The results of the present study confirm this: the use of a mouthguard reduced the risk of injury approximately 2.5-fold.
Intensity and frequency of contact are major contributors to these injuries. The risk of dental injuries is higher in direct-contact sports such as boxing, soccer, basketball, and hockey. The results of this study also showed that the chance of injury was significantly higher in contact sports than in non-contact sports.
The most common types of traumatic injury to the teeth are enamel fracture, followed by enamel and dentin fracture [15, 16]. In this study, crown fracture, at 36.4%, was one of the most common types of injury.
Although some researchers have reported that high school students with lower socioeconomic status are more likely to sustain sports-related dental injuries, the findings of various studies in this area are inconsistent. In the present study, children with illiterate mothers were more likely to be injured than children with educated mothers. This may be because these children are unfamiliar with mouthguards, or because they play more contact sports.
With the development of machine learning models that are more predictive than conventional regression models, the need to use such models in a variety of contexts, including predicting and identifying factors affecting sports-related dental injury, has increased. Nowadays, random forest has been successfully applied for prediction and classification purposes in many scientific realms.
Although RF has produced valuable results in many scientific fields [27,28,29], it is still rarely applied in sports dentistry and related areas. Very few studies have used decision tree-based algorithms even in dentistry in general. For example, Dima et al. applied the decision tree algorithm to investigate the effect of parental oral health on the experience of dental caries in children; their model had an accuracy of 93.33%.
As mentioned, studies of sports-related dental injuries mainly use descriptive statistics, statistical tests such as chi-square, and logistic regression models to analyze the results and identify factors affecting dental injury. However, none of these studies reported the predictive ability of the regression model, so the performance of those models cannot be compared with that of the random forest model in the present study.
One limitation of the present study, a secondary analysis of data from a study that assessed the prevalence of dental injuries and mouthguard use in children, is the lack of access to other important information such as social-behavioral and anatomical factors. Also, because the study was cross-sectional, reverse causality between exercise-related dental injury and the study variables could not be ruled out. In addition, the answers to the self-reported questions may have been affected by recall bias.
Using predictive models such as random forest can minimize the inaccurate predictions that arise from high complexity and interactions between variables. Such algorithms can be used to identify children at risk for sports-related dental injuries. This helps to identify more accurately the factors associated with sports-related dental injury among the general population of children.
Increased awareness, the existence of laws to force the use of oral protective equipment in high-risk sports, and encouraging athletes to use oral protective equipment regularly can reduce the occurrence of dental injuries. Children, and especially their parents, should be informed about the risks of dental injuries and their aftermath and the benefits of using the proper type of oral protection.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Sgan-Cohen HD, Megnagi G, Jacobi Y. Dental trauma and its association with anatomic, behavioral and social variables among fifth and sixth grade school children in Jerusalem. Community Dent Oral Epidemiol. 2005;33:174–80.
Schildknecht S, Krastl G, Kuhl S, Filippi A. Dental injury and its prevention in Swiss rugby. Dent Traumatol. 2012;28(6):465–9.
Marcenes W, Murray S. Social deprivation and traumatic dental injuries among 14-year-old school children in Newham, London. Dent Traumatol. 2001;17:17–21.
Damé-Teixeira N, Alves LS, Susin C, Maltz M. Traumatic dental injury among 12-year-old south Brazilian schoolchildren: prevalence, severity, and risk indicators. Dent Traumatol. 2013;29:52–8.
Petrović M, Kuhl S, Slaj M, Connert T, Filippi A. Dental and general trauma in team handball. Swiss Dent J. 2016;126:682–6.
Frujeri MDLV, Frujeri JAJ, Bezerra ACB, Cortes MIDSJ, Costa ED. Socio-economic indicators and predisposing factors associated with traumatic dental injuries in schoolchildren at Brasília, Brazil: a cross-sectional, population-based study. BMC Oral Health. 2014;14:91.
Azami-Aghdash S, Ebadifard Azar F, Pournaghi Azar F, Rezapour A, Moradi-Joo M, Moosavi A, et al. Prevalence, etiology, and types of dental trauma in children and adolescents: systematic review and meta-analysis. Med J Islam Repub Iran. 2015;29(234):1–13.
Bemelmanns P, Pfeiffer P. Incidence of dental, mouth, and jaw injuries and the efficacy of mouthguards in top ranking athletes. Sportverletz Sportschaden. 2000;14(4):139–43.
Tsuchiya SH, Tsuchiya M, Momma H, Sekiguchi T, Kuroki K, Kanazawa K, et al. Factors associated with sports-related dental injuries among young athletes: a cross-sectional study in Miyagi prefecture. BMC Oral Health. 2017;7:168.
Garbin CA, Guimaraese Queiroz AP, Rovida TA, Garbin AJ. Occurrence of traumatic dental injury in cases of domestic violence. Braz Dent J. 2012;23(1):72–6.
Tuna EB, Ozel E. Factors affecting sports-related orofacial injuries and the importance of mouthguards. Sports Med. 2014;44(6):777–83.
Traebert J. Accidents, sports, and physical leisure activities are the most frequent causes of traumatic dental injury and the rate of pulp necrosis is high following its occurrence in Pilsen, The Czech Republic. J Evid Based Dent Pract. 2011;11(2):102–4.
Galic T, Kuncic D, Poklepovic Pericic T, et al. Knowledge and attitudes about sports-related dental injuries and mouthguard use in young athletes in four different contact sports—water polo, karate, taekwondo and handball. Dent Traumatol. 2018;34:175–81.
Thoren H, Numminen L, Snall J, Kormi E, Lindqvist C, Iizuka T, et al. Occurrence and types of dental injuries among patients with maxillofacial fractures. Int J Oral Maxillofac Surg. 2010;39(8):774–8.
Young EJ, Macias CR, Stephens L. Common dental injury management in athletes. Sports Health. 2015;7(3):250–5. https://doi.org/10.1177/1941738113486077.
Marchiori EC, Santos SE, Asprino L, de Moraes M, Moreira RW. Occurrence of dental avulsion and associated injuries in patients with facial trauma over a 9-year period. Oral Maxillofac Surg. 2012;7:7.
Farhadian M, Salemi F, Saati S, Nafisi N. Dental age estimation using the pulp-to-tooth ratio in canines by neural networks. Imaging Sci Dent. 2019;49:19.
Farhadian M, Shokouhi P, Torkzaban P. A decision support system based on support vector machine for diagnosis of periodontal disease. BMC Res Notes. 2020;13(1):1–6.
Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction, Springer Series in Statistics. 2nd ed; 2009.
Liaw A, Wiener M. Classification and regression by random forest. R News. 2002;2:18–22.
Haiyan H. Mining patterns in disease classification forests. J Biomed Inform. 2010;43(5):820–7.
Rouhani A, Ghoddusi J, Rahmandost MR, Akbari M. Prevalence of traumatic dental injuries among contact sport practitioners in Northeast of Iran in 2012. JDMT. 2016;5(2):82–5. https://doi.org/10.22038/jdmt.2016.6618.
Paiva PCP, Paiva HND, Filho PMDO, Côrtes MIDS. Prevalence and risk factors associated with traumatic dental injury among 12-year-old school children in Montes Claros, MG, Brazil. Cien Saude Colet. 2015;20(4) https://doi.org/10.1590/1413-81232015204.00752014.
Mojarad F, Farhadian M, Torkaman S. The prevalence of sports-related dental injuries and the rate of awareness of mouthguard use among child athletes. J Pediatr Res. 2020;7(4):358–64. https://doi.org/10.4274/jpr.galenos.2020.92678.
Singh G, Garg S, Damle SG, Dhindsa A, Kaur A, Singla S. A study of sports related occurrence of traumatic orodental injuries and associated risk factors in high school students in North India. Asian J Sports Med. 2014;5(3):e22766.
Biagi R, Cardarelli F, Butti AC, Salvato A. Sports-related dental injuries: Knowledge of first aid and mouthguard use in a sample of Italian children and youngsters. Eur J Paediatr Dent. 2010;11(2):66.
Menze BH, Kelm BM, Masuch R, Himmelreich U, Bachert P, Petrich W, et al. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinformatics. 2009;10:213. https://doi.org/10.1186/1471-2105-10-213.
Calle ML, Urrea V, Boulesteix AL, Malats N. AUC-RF: a new strategy for genomic profiling with random forest. Hum Hered. 2011;72:121–32. https://doi.org/10.1159/000330778.
Chen X, Wang M, Zhang H. The use of classification trees for bioinformatics. Wiley Interdiscip Rev Data Min Knowl Discov. 2011;1:55–63. https://doi.org/10.1002/widm.14.
Dima S, Wang KJ, Chen KH, Huang YK, Chang WJ, Lee SY, et al. Decision tree approach to the impact of parents’ oral health on dental caries experience in children: a cross-sectional study. Int J Environ Res Public Health. 2018;15(4):692. https://doi.org/10.3390/ijerph15040692.
The authors wish to express their sincere gratitude to Vice Chancellor of Research of Hamadan University of Medical Sciences.
This study was supported by Vice-Chancellor of Research and Technology of Hamadan University of Medical Sciences, Contractor No. 9710186074. The funder had no role in the study other than providing financial support.
Ethics approval and consent to participate
This study was approved by the Ethics Committee of Hamadan University of Medical Sciences under code IR.UMSHA.REC.1397.728. All participants were informed of the purpose of the study, and written informed consent was obtained from a parent of every participant included in the study.
Consent for publication
There are no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Farhadian, M., Torkaman, S. & Mojarad, F. Random forest algorithm to identify factors associated with sports-related dental injuries in 6 to 13-year-old athlete children in Hamadan, Iran-2018 -a cross-sectional study. BMC Sports Sci Med Rehabil 12, 69 (2020). https://doi.org/10.1186/s13102-020-00217-5
- Sports-related dental injuries
- Random Forest
- Logistic regression