Sport Analytics: Using Open Source Logistic Regression Software to Classify Upcoming Play Type in the NFL

Robert E. Baker, Ted Kwartler


The purpose of this study was to use data analytics as a means to classify National Football League offensive play types. The open source software R was employed to create a logistic regression based on data for the Cleveland Browns and Pittsburgh Steelers from 13 recent seasons. The regression is based on all first, second, and third downs within regulation play, totaling 26,310 data points. The initial algorithms classify each play as a rush or a pass for each offense. As revealed by the differing coefficients of the independent variables, each team takes a slightly different approach to play selection in response to in-game situations. Isolating each attribute within the regression makes it possible to identify the factors driving play selection. Further examination that controls for changes in head coach, offensive coordinators, player personnel, and other factors such as weather could improve precision, because these may influence play type. Logistic regression shows promise as an in-game aid for determining opponent behavior. Specifically, Cleveland's offensive play selection algorithm was correct for 66.4% of plays versus 66.9% for Pittsburgh. Use of open source software and logistic regression of NFL play selection could be beneficial in aiding future game decisions. Further research is recommended to explore possible improvement of the algorithm's accuracy.
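
To make the rush-or-pass classification concrete, the following is a minimal sketch of a logistic regression fitted by gradient descent. It is written in Python rather than the R used in the study, and it is not the authors' model. The two predictors (`down`, `yards_to_go`), the synthetic play generator, and every coefficient in it are assumptions chosen only for illustration, since the abstract does not list the independent variables.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pass_probability(w, x):
    """Modeled P(pass) for one play; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit coefficients by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = pass_probability(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def accuracy(w, X, y):
    """Share of plays where the 0.5-threshold prediction matches the label."""
    hits = sum((pass_probability(w, x) >= 0.5) == (yi == 1)
               for x, yi in zip(X, y))
    return hits / len(y)


def make_synthetic_plays(n, seed=0):
    """Hypothetical data: NOT the study's play-by-play records."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        down = rng.randint(1, 3)            # first, second, or third down
        yards_to_go = rng.randint(1, 20)    # assumed predictor
        # Assumed "true" tendency: longer distance and later downs favor a pass.
        true_p = sigmoid(-2.0 + 0.6 * down + 0.15 * yards_to_go)
        X.append([down / 3.0, yards_to_go / 10.0])  # rescaled features
        y.append(1 if rng.random() < true_p else 0)  # 1 = pass, 0 = rush
    return X, y


X, y = make_synthetic_plays(1000)
weights = fit_logistic(X, y)
train_accuracy = accuracy(weights, X, y)
print("coefficients (intercept, down, yards_to_go):", weights)
print("training accuracy:", round(train_accuracy, 3))
```

Fitting one such model per team and comparing the resulting coefficients would mirror the team-by-team comparison described above. The accuracy printed here comes from synthetic data and has no bearing on the 66.4% and 66.9% figures reported for Cleveland and Pittsburgh.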


Keywords: sport analytics; sport management; data mining; NFL; regression


Copyright (c) 2018 Sagamore Publishing LLC