Group Members

Xuehan Yang, xy2517
Yucong Gao, yg2834
Hao Zheng, hz2772
Pei Hsin Lin, pl2811

Tentative Project Title

Hey Tom Thibodeau, this is exactly what New York has been waiting for.

Motivation

As Knicks fans, we are witnessing what may be the biggest shift ever in how the game is played on the court: the “small ball era”. Teams increasingly favor smaller, faster players over traditional big men to speed up play and improve shooting efficiency. Last season, the Knicks returned to the playoffs for the first time in eight years, which brought great joy to the fans. To help sustain that success, we want to identify the key variables that contribute to winning a game, so that the Knicks can maintain their existing strengths and address their weaknesses. The most obvious feature of the “small ball era” is the rise in three-point shot attempts, so our analysis focuses on three-point-related variables along with other factors. Given the large volume of data published by the NBA, we believe the project is feasible and practical.

Intended Final Product

Our intended final product is a winning strategy for the New York Knicks, based on an analysis of the relationship between winning games and influencing factors such as the proportion of three-point attempts, game location (home or road), and the number of fouls per game.

Data Source

Our datasets mainly come from the following two websites:

https://www.basketball-reference.com/
https://www.nba.com/stats/

These websites contain up-to-date NBA box scores for each team in each game, along with player shooting logs, which we can leverage in our data analysis. As there is no direct API for the datasets we need, we plan to use web scraping to collect the data.
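A minimal sketch of the parsing half of a scraper is shown here, using only the Python standard library. It assumes the box scores arrive as a plain HTML table; the markup in `SAMPLE_HTML`, the column names (`team`, `fg3a`, `fga`, `win`), and the values are made up for illustration and do not reflect the real page structure of either website, which would need to be inspected first.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_box_scores(html):
    """Return one dict per data row, keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical markup standing in for a downloaded page (not real data).
SAMPLE_HTML = """
<table>
  <tr><th>team</th><th>fg3a</th><th>fga</th><th>win</th></tr>
  <tr><td>NYK</td><td>30</td><td>88</td><td>1</td></tr>
  <tr><td>NYK</td><td>25</td><td>90</td><td>0</td></tr>
</table>
"""

games = parse_box_scores(SAMPLE_HTML)
print(games)
```

All values come back as strings, so numeric columns would still need to be converted during the data cleaning step.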

Analysis/Visualizations

Intended Data Analysis

Our analysis has three parts, all based on data from the six NBA regular seasons from 2015-2016 through 2020-2021.

The first analysis develops regression models to quantify the relationship between winning a game and potential factors, chosen mainly for their connection to three-point shooting. The dependent variable is the game outcome, coded 1 for a win and 0 for a loss. The potential independent variables are game location (home or road), the proportion of shot attempts that are three-pointers, three-point shooting rate, the number of assists (AST), the number of turnovers (TOV), and possessions.
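Because the outcome is coded 0/1, one natural candidate is logistic regression. The sketch below fits one by batch gradient descent in plain Python. It is only an illustration under stated assumptions: the data are synthetic, generated from coefficients chosen arbitrarily for the demo (not NBA results), only two of the proposed predictors are included, and the function names are hypothetical. In practice a statistics package would likely replace the hand-written fit.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def standardize(values):
    """Rescale a list to mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def fit_logistic(X, y, lr=1.0, epochs=300):
    """Fit weights (intercept first) by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Synthetic games: the "true" coefficients below are arbitrary demo values.
random.seed(0)
home = [random.randint(0, 1) for _ in range(2000)]          # 1 = home game
rate = [random.uniform(0.20, 0.50) for _ in range(2000)]    # 3PA / FGA
win = [
    1 if random.random() < sigmoid(-0.5 + 1.0 * h + 6.0 * (r - 0.35)) else 0
    for h, r in zip(home, rate)
]

# Standardizing the attempt rate keeps gradient descent well conditioned.
X = [[h, rz] for h, rz in zip(home, standardize(rate))]
intercept, b_home, b_rate = fit_logistic(X, win)
print(f"home: {b_home:.2f}, attempt rate (per SD): {b_rate:.2f}")
```

A positive fitted coefficient means the factor is associated with higher odds of winning, holding the other predictor fixed.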

The second analysis is descriptive and focuses on how to improve three-point shooting. It considers variables such as the shooter's position on the team (for example, PG or SG), shot location on the court (corner or not), number of dribbles before the shot, time on the court, and the position of the passer.
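As a rough sketch of this kind of descriptive summary, the snippet below computes three-point shooting rate grouped by one shot-log variable at a time. The field names (`zone`, `dribbles`, `made`) and the handful of rows are hypothetical placeholders, not real shot data.

```python
from collections import defaultdict


def three_point_pct_by(shots, key):
    """Return {group: made / attempted} for shots grouped by `key`."""
    made = defaultdict(int)
    attempts = defaultdict(int)
    for shot in shots:
        group = shot[key]
        attempts[group] += 1
        made[group] += shot["made"]
    return {g: made[g] / attempts[g] for g in attempts}


# Hypothetical three-point attempts (made = 1, missed = 0).
shots = [
    {"zone": "corner", "dribbles": 0, "made": 1},
    {"zone": "corner", "dribbles": 0, "made": 0},
    {"zone": "corner", "dribbles": 2, "made": 1},
    {"zone": "above_break", "dribbles": 0, "made": 1},
    {"zone": "above_break", "dribbles": 3, "made": 0},
    {"zone": "above_break", "dribbles": 1, "made": 0},
    {"zone": "above_break", "dribbles": 0, "made": 0},
]

by_zone = three_point_pct_by(shots, "zone")
by_dribbles = three_point_pct_by(shots, "dribbles")
print(by_zone)
print(by_dribbles)
```

The same function can be reused for any categorical column, such as the shooter's position or the passer's position.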

The third analysis offers suggestions for a specific NBA team, the New York Knicks. The advice mainly covers in-game three-point shooting strategy and three-point training strategy, based on the results of the first two analyses.

Intended Data Visualization

  • A line plot of the trend in average team three-point attempts over time
  • A parallel coordinates plot of the three-point attempt rate over time, broken down by playoff teams and other teams
  • A hex plot showing a team's three-point shooting rate from different positions on the court
  • A panel plot showing differences in characteristics between playoff teams and other teams
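The trend plots need one aggregated value per season and team group. A possible way to prepare those values is sketched below, taking the three-point attempt rate to be three-point attempts divided by total field goal attempts. The record layout and all numbers are invented for illustration.

```python
from collections import defaultdict


def mean_attempt_rate(team_seasons):
    """Average 3PA / FGA for each (season, playoff flag) group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in team_seasons:
        key = (row["season"], row["playoff"])
        sums[key] += row["fg3a"] / row["fga"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical team-season totals (not real statistics).
team_seasons = [
    {"season": "2019-20", "playoff": True, "fg3a": 3000, "fga": 7500},
    {"season": "2019-20", "playoff": True, "fg3a": 2400, "fga": 8000},
    {"season": "2019-20", "playoff": False, "fg3a": 2000, "fga": 8000},
    {"season": "2020-21", "playoff": False, "fg3a": 2100, "fga": 7000},
]

trend = mean_attempt_rate(team_seasons)
for (season, playoff), value in sorted(trend.items()):
    print(season, "playoff" if playoff else "other", round(value, 3))
```

Each key of `trend` would then become one point on a line, with one line per team group.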

Anticipated Challenges

  • The official NBA website offers no direct API for shot log data, so working out how to scrape the data is a major challenge.

  • Because the research involves specific players, factors such as player condition and the in-arena crowd may affect player performance and game results, which may reduce prediction accuracy.

  • We need to choose a suitable statistical regression model to fit the data and address possible problems such as multicollinearity, heteroscedasticity, and autocorrelation.
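For the multicollinearity concern, one simple first check is the correlation between pairs of predictors. The sketch below computes the Pearson correlation and, for the special case of exactly two predictors, the variance inflation factor 1 / (1 - r²). The predictor values are made up, and a full analysis with more predictors would need the general VIF from an auxiliary regression.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def vif_two_predictors(x, y):
    """Variance inflation factor when the model has only these two predictors."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


# Hypothetical per-game values for two candidate predictors.
attempt_rate = [0.30, 0.32, 0.34, 0.36, 0.38]
assists = [22, 21, 24, 23, 25]

r = pearson_r(attempt_rate, assists)
vif = vif_two_predictors(attempt_rate, assists)
print(round(r, 3), round(vif, 3))
```

A VIF close to 1 suggests little overlap between the two predictors, while large values indicate that their coefficients will be hard to separate.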

Timeline

Date            Work                                      Due
11/08 - 11/13   Complete proposal                         11/13 by 1:00 pm
11/14           Meeting: make analysis plan               NA
11/15 - 11/16   Data collection: scraping from the web    NA
11/17           Meeting: work division                    NA
11/18 - 11/24   Tidy and clean data                       NA
11/25           Meeting: webpage design                   NA
11/26 - 12/1    Data visualization and detailed analysis  NA
12/2 - 12/9     Complete final report                     12/11 by 4:00 pm
12/7 - 12/10    Complete final webpage                    12/11 by 4:00 pm