Predicting Men’s College Basketball Games

For a little over a month now I have been working on an algorithm to predict winners and losers in college basketball. I started by looking at which stats would help me figure out which team would win. I read many articles and blog posts about what good teams do to win basketball games, and I settled on the four factors of basketball by Dean Oliver. Those four factors are shooting (40%), turnovers (25%), rebounding (20%), and free throws (15%).

Using those factors, I started looking at basketball stats. I started out with Effective Field Goal Percentage (eFG%) for shooting. The formula for eFG% is (FG + 0.5 * 3P) / FGA, which accounts for a 3-pointer being worth more than a 2-pointer. While doing some research I started reading about TS% (True Shooting Percentage). True shooting percentage takes into account 2-point field goals, 3-point field goals, and free throws, so I decided to use TS% instead of eFG%.
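To make the difference between the two shooting stats concrete, here is a small sketch. The eFG% formula is the one above; the TS% formula, PTS / (2 * (FGA + 0.44 * FTA)), is the commonly published one and is my assumption about what gets calculated here. The function names and the season totals are made up for illustration.

```python
def efg_pct(fg, fg3, fga):
    """Effective field goal percentage: (FG + 0.5 * 3P) / FGA."""
    return (fg + 0.5 * fg3) / fga


def ts_pct(pts, fga, fta):
    """True shooting percentage: PTS / (2 * (FGA + 0.44 * FTA))."""
    return pts / (2 * (fga + 0.44 * fta))


# Made-up season totals for one team (not real data)
fg, fg3, fga, ft, fta = 800, 250, 1800, 400, 550
pts = 2 * (fg - fg3) + 3 * fg3 + ft  # 2-pointers + 3-pointers + free throws

print("eFG%:", round(efg_pct(fg, fg3, fga), 3))
print("TS%: ", round(ts_pct(pts, fga, fta), 3))
```

Because TS% starts from total points, free throws are included in it, which eFG% leaves out.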

TOV% (Turnover Percentage) estimates how many times a team will turn the ball over per 100 plays. The formula for TOV% is 100 * TOV / (FGA + 0.44 * FTA + TOV).
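As a quick sketch, the TOV% formula translates to Python like this. The function name and the totals are made up for illustration.

```python
def tov_pct(tov, fga, fta):
    """Turnovers per 100 plays: 100 * TOV / (FGA + 0.44 * FTA + TOV)."""
    return 100 * tov / (fga + 0.44 * fta + tov)


# Made-up season totals (not real data)
print(round(tov_pct(tov=400, fga=1800, fta=550), 1))
```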

ORB% and DRB% are Offensive and Defensive Rebound Percentage. For this model I used DRB%. While writing this I am wondering how much of a difference it would make if I used TRB% (Total Rebound Percentage) instead. Defensive Rebound Percentage is an estimate of the percentage of available defensive rebounds a team gets. The formula is 100 * (DRB * (Tm MP / 5)) / (MP * (Tm DRB + Opp ORB)).
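The formula above includes minutes played, which matters when it is applied to a single player. For a whole team, my assumption is that MP equals Tm MP / 5, so the minutes cancel and the formula reduces to 100 * Tm DRB / (Tm DRB + Opp ORB). The sketch below checks that both forms agree on made-up totals. The function names are hypothetical.

```python
def drb_pct(drb, mp, tm_mp, tm_drb, opp_orb):
    """Formula as written above, with minutes played included."""
    return 100 * (drb * (tm_mp / 5)) / (mp * (tm_drb + opp_orb))


def team_drb_pct(tm_drb, opp_orb):
    """Team-level form, assuming MP == Tm MP / 5 so the minutes cancel."""
    return 100 * tm_drb / (tm_drb + opp_orb)


# Made-up totals: 30 games x 40 minutes x 5 players on the floor
tm_mp = 30 * 40 * 5
full = drb_pct(drb=750, mp=tm_mp / 5, tm_mp=tm_mp, tm_drb=750, opp_orb=250)
short = team_drb_pct(tm_drb=750, opp_orb=250)
print(full, short)
```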

Lastly I calculate FTR (Free Throw Rate), which is a measure of how often a team gets to the line and how often they actually make free throws. Again, while typing this out I noticed something: my formula for free throw rate may be off. I was using FT/FGA. I think the formula should be FTA/FGA. I will update and post if there are any major changes.
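To see how much the choice between the two formulas matters, here is a small comparison on made-up numbers. Both teams get to the line equally often, but one makes more of its free throws. The function names are hypothetical.

```python
def ft_per_fga(ft, fga):
    """Free throws made per field goal attempt (FT/FGA)."""
    return ft / fga


def fta_per_fga(fta, fga):
    """Free throw attempts per field goal attempt (FTA/FGA)."""
    return fta / fga


# Made-up totals: same attempts, different makes
teams = {
    "Team A": {"ft": 400, "fta": 550, "fga": 1800},
    "Team B": {"ft": 330, "fta": 550, "fga": 1800},
}

for name, t in teams.items():
    made_rate = ft_per_fga(t["ft"], t["fga"])
    attempt_rate = fta_per_fga(t["fta"], t["fga"])
    print(name, round(made_rate, 3), round(attempt_rate, 3))
```

FTA/FGA is identical for both teams because it only measures getting to the line. FT/FGA separates them because it also reflects how many free throws were made.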

For my data analysis I use Python and the pandas library. I will post the notebook on Kaggle for anyone who may be interested.

I use Python to go out to Sports Reference and download the latest stats. I then clean up the data and calculate TS%, TOV%, DRB%, and FTR for each team. There are sites that already calculate and post these stats, but my problem was importing those stats. So I just use the raw data and calculate them myself.
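A minimal sketch of the calculation step in pandas might look like the following. The download and cleanup are left out, and the column names and numbers are made up rather than taken from the actual data. It also assumes the standard TS% formula, the team-level form of DRB%, and FT/FGA for FTR.

```python
import pandas as pd

# Made-up raw season totals standing in for the downloaded data
raw = pd.DataFrame({
    "team": ["Team A", "Team B"],
    "pts": [2250, 2100],
    "fga": [1800, 1750],
    "ft": [400, 330],
    "fta": [550, 500],
    "tov": [400, 350],
    "drb": [750, 800],
    "opp_orb": [250, 300],
})


def add_factor_columns(df):
    """Return a copy of df with TS%, TOV%, DRB%, and FTR columns added."""
    out = df.copy()
    plays = out["fga"] + 0.44 * out["fta"]
    out["ts_pct"] = out["pts"] / (2 * plays)
    out["tov_pct"] = 100 * out["tov"] / (plays + out["tov"])
    out["drb_pct"] = 100 * out["drb"] / (out["drb"] + out["opp_orb"])
    out["ftr"] = out["ft"] / out["fga"]  # FT/FGA; swap in fta for FTA/FGA
    return out


stats = add_factor_columns(raw)
print(stats[["team", "ts_pct", "tov_pct", "drb_pct", "ftr"]].round(3))
```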

Once the stats have been calculated, I calculate what I call a c_score: I assign each stat a weighted value and add TS%, DRB%, and FTR while subtracting TOV%.
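Here is a rough sketch of what a c_score calculation could look like. The weights are an assumption: they reuse the four-factor percentages (40/25/20/15), which may not match the weights actually used. It also assumes every stat is expressed on the same 0 to 100 scale.

```python
# Assumed weights, borrowed from the four-factor percentages
WEIGHTS = {"ts": 0.40, "tov": 0.25, "drb": 0.20, "ftr": 0.15}


def c_score(ts_pct, tov_pct, drb_pct, ftr, weights=WEIGHTS):
    """Weighted sum: add TS%, DRB%, and FTR, subtract TOV%."""
    return (
        weights["ts"] * ts_pct
        + weights["drb"] * drb_pct
        + weights["ftr"] * ftr
        - weights["tov"] * tov_pct
    )


# Made-up stats, all on a 0-100 scale
score = c_score(ts_pct=55.1, tov_pct=16.4, drb_pct=75.0, ftr=22.2)
print(round(score, 2))
```

Turnovers are the only factor that hurts a team, which is why that term is subtracted.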

Each team is assigned a c_score. If a team has a higher c_score than another team, I predict that team will win. I currently predict 69% correct for the top 20 teams in the current AP poll. That figure comes from predicting the winner of every game each of those teams has played to this point. For example, for all the games NC State has played so far this season, my model has predicted the winner correctly 52% of the time.
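The prediction rule and the percent-correct figure can be sketched like this. The team names, scores, and results are made up, and the function names are hypothetical.

```python
def predict_winner(team_a, team_b, scores):
    """Pick the team with the higher c_score (a tie goes to team_b)."""
    return team_a if scores[team_a] > scores[team_b] else team_b


def accuracy(games, scores):
    """Share of games where the predicted winner matches the actual winner."""
    correct = sum(
        predict_winner(a, b, scores) == winner for a, b, winner in games
    )
    return correct / len(games)


# Made-up c_scores and results: (team_a, team_b, actual_winner)
scores = {"Team A": 36.3, "Team B": 34.1, "Team C": 31.8}
games = [
    ("Team A", "Team B", "Team A"),
    ("Team A", "Team C", "Team C"),  # upset, so the model misses this one
    ("Team B", "Team C", "Team B"),
    ("Team C", "Team A", "Team A"),
]

print(f"{accuracy(games, scores):.0%} correct")
```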

Team             Correct
NC State         52%  (Model predicted Syracuse to win. The Orange’s top player got hurt in the 5 mins and did not play the rest of the game.)
UNC              39%
Penn State       35%
Colorado         75%
Marquette        71%
Butler           72%
Houston          76%
Illinois         88%
Duke             83%
Gonzaga          85%
Baylor           45%
Kansas           55%
Dayton           86%
FSU              50%
Maryland         59%
Villanova        86%
Auburn           86%
Seton Hall       55%
West Virginia    59%
Oregon           57%
Kentucky         78%
Michigan St      75%
Iowa             75%
LSU              74%

The model does not take into account injuries, home vs away or other factors.

The model has a low of 35% with Penn State and a high of 88% with Illinois. My goal is to have a c_score calculated for all the teams in the NCAA tournament bracket using data through the conference tournaments.

If you have any questions or comments please let me know.
