A recent study published in Scientific Reports suggests that machine learning (ML) models can predict the risk of preterm birth with an accuracy of 82%. Researchers explored how various ML algorithms could be used to predict preterm birth risk in a cohort of 50 pregnant women. Despite extensive research into the underlying causes of preterm birth, a definitive biological marker has remained elusive. The study represents a promising step toward more accurate predictions and timely interventions for high-risk pregnancies.
Preterm birth, defined as the delivery of a baby before 37 weeks of gestation, is a leading cause of neonatal morbidity and mortality. The World Health Organization (WHO) estimates that 1 in 10 babies are born prematurely each year. Preterm births increase the likelihood of complications, including respiratory issues, feeding problems, cerebral palsy, and even death. Although various risk factors such as maternal smoking, alcohol consumption, stress, and pollution exposure have been linked to preterm birth, the multifaceted nature of these influences has made it challenging to identify a single, reliable risk determinant.
Given the significance of preterm birth as a health concern, the study aimed to evaluate how effectively different ML models predict preterm birth risk and how they could support early intervention. More accurate prediction would allow clinicians to better assess risk and provide timely medical care to prevent adverse outcomes.
Background
Preterm birth has become increasingly common in recent decades, and research continues to uncover contributing factors, including maternal lifestyle, genetics, and environmental influences. However, the complexity of these factors has made it difficult to develop a single predictive model. Clinicians currently use risk assessment tools that combine several factors, but these tools often have limited accuracy.
Machine learning models have become an important component of clinical decision support systems because they can detect patterns and correlations that traditional statistical methods may miss. Their capacity to process diverse data types, such as ultrasound images, electronic health records (EHRs), and electrohysterogram signals, makes them especially useful in preventive medicine. This study sought to improve the performance of ML models for preterm birth prediction by tuning their hyperparameters.
Study Design
The study, conducted at Dr. Antoni Biziel University Hospital in Bydgoszcz, Poland, focused on a cohort of 50 pregnant women, including 28 women who had experienced pregnancy-related complications and 22 women who had no such issues. Researchers utilized medical records, health evaluations, gynecological assessments, blood tests, and medical questionnaires to compile comprehensive participant data.
The study tested five ML algorithms: XGBoost, logistic regression, CatBoost, decision trees, and support vector machines (SVMs). Hyperparameter optimization was performed using the Optuna framework to maximize performance across four key metrics: accuracy, recall, precision, and F1 score. The study also employed chi-squared tests and Welch’s unpaired t-tests to assess the statistical significance of the models’ performances.
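For readers unfamiliar with the two statistical tests mentioned, the sketch below shows what each one computes. It is a minimal, standard-library Python illustration, not the researchers' code: the function names `welch_t` and `chi_squared_2x2` are hypothetical, the chi-squared version assumes a 2x2 table with no Yates continuity correction, and the input numbers are toy values rather than study data. In practice one would typically call SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` and `scipy.stats.chi2_contingency` instead.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's unpaired t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, Welch's version does not assume that the two
    groups share the same variance.
    """
    se2_a = variance(a) / len(a)  # squared standard error of group a's mean
    se2_b = variance(b) / len(b)  # squared standard error of group b's mean
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 contingency table.

    No continuity correction is applied. With one degree of freedom, the
    p-value equals erfc(sqrt(stat / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy numbers for illustration only (not data from the study).
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t:.3f}, df = {df:.2f}")  # t = -1.897, df = 5.88
stat, p = chi_squared_2x2([[10, 20], [20, 10]])
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

The Welch test compares the means of a continuous variable between two groups, while the chi-squared test checks whether two categorical variables are independent.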
Key Findings
Among the five models tested, the linear SVM with optimized hyperparameters performed best, achieving 82% accuracy, 86% recall, 83% precision, and an F1 score of 84%. Logistic regression came second, with 80% accuracy, 82% recall, and 82% precision. Although both are relatively simple models, they performed well in predicting preterm birth risk.
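All four reported metrics derive from a model's confusion matrix: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). Because the F1 score is the harmonic mean of precision and recall, the published figures can be sanity-checked against each other. The minimal Python sketch below does this; the function names and the confusion-matrix counts are hypothetical, chosen only for illustration, and the check assumes the published percentages are rounded values computed on the same evaluation data.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that are correct
    precision = tp / (tp + fp)  # of cases flagged positive, share that truly are positive
    recall = tp / (tp + fn)     # of truly positive cases, share that were flagged
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Reported linear SVM figures: precision 0.83, recall 0.86
print(round(f1_score(0.83, 0.86), 4))  # 0.8447, consistent with the reported 84% F1
# Logistic regression: precision and recall are both 0.82, so F1 is also 0.82
print(round(f1_score(0.82, 0.82), 4))  # 0.82
# Hypothetical counts for a small test set (not from the study)
print(classification_metrics(tp=12, fp=3, fn=2, tn=8))
```

Because it is a harmonic mean, F1 is pulled toward the lower of precision and recall, so a model cannot score well simply by maximizing one at the expense of the other. If metrics are averaged across cross-validation folds, the averaged F1 need not exactly match the F1 computed from averaged precision and recall.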
In contrast, more complex algorithms such as XGBoost and CatBoost performed worse, likely because of the study's small sample size: with only 50 participants, these higher-capacity models struggled to generalize beyond the training data. Tree-based models such as decision trees and random forests also gave underwhelming results, partly because of their difficulty handling the large number of features in the dataset.
A feature importance analysis identified several factors that contributed strongly to predicting preterm birth risk. Notably, C-reactive protein (CRP) levels, parity (the number of previous births), hematocrit (HCT), and platelet count (PLT) emerged as important biological predictors. Socioeconomic factors such as education level were also associated with preterm birth risk, suggesting that a combination of biological, social, and behavioral factors contributes to the condition.
Conclusions
This study demonstrates that linear SVMs, along with logistic regression models, offer strong predictive power for preterm birth risk, with accuracy rates of 82% and 80%, respectively. These findings suggest that simpler models may outperform more complex algorithms when working with small datasets. Moreover, the study highlights the importance of considering a broad range of factors, including both biological and socioeconomic determinants, in the prediction of preterm birth.
Despite the promising results, the study’s small sample size (n = 50) limits the ability to generalize the findings. Researchers recommend conducting larger-scale studies with more diverse datasets to validate these findings and further enhance predictive accuracy. Future research should also explore incorporating earlier pregnancy screenings to improve the timing and effectiveness of interventions.