By Cy Seeley, Rohan Madhur, Alexander Mollohan, and Jonathan David
Abstract
The NFL draft remains an unpredictable process, with teams evaluating a combination of athletic performance, college statistics, and positional value to determine player selections. This study applies machine learning techniques to predict an athlete’s draft position using NFL Combine performance metrics. Employing classification and regression models, including Random Forest, XGBoost, and Neural Networks, we analyze key factors influencing draft outcomes. While machine learning provides valuable insights, results indicate that athletic metrics alone cannot fully predict draft success due to team-specific preferences and intangible qualities. Future improvements include integrating scouting reports, college performance data, and advanced modeling techniques to enhance predictive accuracy.
Introduction
The NFL draft is the primary method for teams to acquire new talent, yet predicting player success at the professional level remains challenging. While traditional scouting relies on subjective evaluations, machine learning offers a data-driven approach to identifying patterns in player performance. This study leverages NFL Combine data—metrics such as the 40-yard dash, bench press, and vertical jump—to predict draft placement. By using classification and regression models, we assess which factors most influence draft outcomes. Our findings highlight the benefits and limitations of data-driven forecasting, revealing that while machine learning can provide meaningful insights, external factors beyond measurable athleticism significantly impact draft decisions.
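To make the classification and regression framings concrete, the sketch below shows one way a single draft result could be encoded as a regression target (overall pick number), a multi-class label (round), and a binary label (drafted or not). The function names, the fixed 32 picks per round, and the seven-round cap are illustrative assumptions, not details taken from the paper; real rounds vary in length because of compensatory picks.

```python
from typing import Optional

PICKS_PER_ROUND = 32  # assumption: ignores compensatory picks
NUM_ROUNDS = 7        # assumption: seven-round draft


def round_label(pick: Optional[int]) -> int:
    """Map an overall pick number to a round (1-7); 0 means undrafted."""
    if pick is None:
        return 0
    if pick < 1:
        raise ValueError("pick numbers start at 1")
    # Picks 1-32 fall in round 1, 33-64 in round 2, and so on, capped at round 7.
    return min((pick - 1) // PICKS_PER_ROUND + 1, NUM_ROUNDS)


def drafted_label(pick: Optional[int]) -> int:
    """Binary target: 1 if the player was selected, 0 otherwise."""
    return 0 if pick is None else 1


def encode_targets(pick: Optional[int]) -> dict:
    """Bundle the three hypothetical targets for one player."""
    return {
        "pick": pick,                    # regression target (None if undrafted)
        "round": round_label(pick),      # multi-class target
        "drafted": drafted_label(pick),  # binary target
    }
```

Under these assumptions, `encode_targets(33)` places the player in round 2 and marks him as drafted, while `encode_targets(None)` yields round 0 and an undrafted label.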
Key Findings
- Principal Component Analysis (PCA) revealed that the first five components account for 96.74% of the dataset’s variance, with the first two capturing 73%.
- Feature selection identified the 40-yard dash as the most impactful predictor, with faster sprint times correlating strongly with earlier draft selections.
- Random Forest and XGBoost models achieved Mean Absolute Errors (MAEs) of ~80 picks, meaning predictions missed the actual selection by roughly two to three rounds on average (at about 32 picks per round).
- Classification models, including Random Forest and Neural Networks, struggled, with accuracies ranging from 29.72% to 36.99%.
- A binary model distinguishing drafted vs. undrafted players performed better, with 71.87% accuracy, highlighting clearer patterns among drafted athletes.
- The models exhibited bias toward later rounds due to class imbalances, making early-round predictions particularly challenging.
- Results suggest that NFL Combine metrics alone provide a useful but incomplete picture, as team strategies, scouting evaluations, and player intangibles play a crucial role in draft decisions.
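The quantities quoted in the findings above come down to simple arithmetic. The sketch below is a minimal, standard-library illustration under stated assumptions: the helper names are hypothetical, any inputs shown are made up rather than taken from the study's data, and a round is assumed to be 32 picks.

```python
from collections import Counter
from itertools import accumulate

PICKS_PER_ROUND = 32  # assumption: ignores compensatory picks


def cumulative_explained_variance(eigenvalues):
    """Cumulative share of total variance captured by the leading components.

    `eigenvalues` are the per-component variances from a PCA fit.
    """
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    ordered = sorted(eigenvalues, reverse=True)  # largest component first
    return [running / total for running in accumulate(ordered)]


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted pick numbers."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mae_in_rounds(mae_picks):
    """Express an MAE measured in picks as a number of rounds."""
    return mae_picks / PICKS_PER_ROUND


def majority_class_accuracy(labels):
    """Accuracy of a baseline that always predicts the most common class."""
    if not labels:
        raise ValueError("labels must be non-empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)
```

With the 32-pick assumption, `mae_in_rounds(80)` evaluates to 2.5. Comparing a classifier's accuracy against `majority_class_accuracy` on the same labels is one way to judge whether a reported figure reflects learned signal or mostly class imbalance.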
Conclusion
This study demonstrates the potential of machine learning in predicting NFL draft outcomes using Combine performance data. While models captured key athletic indicators and provided reasonable estimates, their predictive accuracy remained limited due to missing qualitative factors such as scouting reports, team preferences, and injury histories. The findings reinforce that physical metrics are an essential component of draft evaluations but cannot fully account for the complexity of team decision-making. Future work should incorporate additional data sources, such as natural language processing of scouting reports and advanced deep learning techniques, to improve predictive capabilities. By bridging quantitative analysis with qualitative insights, machine learning could become a powerful tool in draft strategy and player evaluation.
Link to Paper