This project predicts the outcomes of football matches in the J1 League using historical match data. Implemented in a Jupyter Notebook, it employs a Random Forest classifier to forecast future match results and provides insights through data visualization.
The project began with defining key features and setting objectives. I aimed to build a reliable predictive model that forecasts results from teams' historical performance data. To collect the data, I scraped the J1 League section of fbref.com with Beautiful Soup. This involved sending HTTP requests to the relevant URLs, parsing the HTML to locate the tables of match statistics, and extracting key data points such as match results, goals scored, and various team metrics. I stored these in a structured format for further analysis.
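The parsing step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the sample HTML, the table id `matches`, the column names, and the helper `parse_match_table` are all assumptions made for the example, and the real fbref.com markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for a scraped results table.
# The real page structure and column names may differ.
SAMPLE_HTML = """
<table id="matches">
  <thead>
    <tr><th>Date</th><th>Opponent</th><th>Result</th><th>GF</th><th>GA</th></tr>
  </thead>
  <tbody>
    <tr><td>2023-02-18</td><td>Team B</td><td>W</td><td>2</td><td>1</td></tr>
    <tr><td>2023-02-25</td><td>Team C</td><td>D</td><td>0</td><td>0</td></tr>
  </tbody>
</table>
"""


def parse_match_table(html, table_id="matches"):
    """Return one dict per table row, keyed by the header text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []  # page layout changed or the table is missing
    headers = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]
    rows = []
    for tr in table.find("tbody").find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        rows.append(dict(zip(headers, cells)))
    return rows


matches = parse_match_table(SAMPLE_HTML)
for match in matches:
    print(match)
```

In a live run, the HTML string would come from the body of an HTTP response instead of an inline sample. Every value is extracted as text, so numeric columns such as goals need converting before analysis.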
Once the data was collected, I performed exploratory data analysis (EDA) to understand patterns and correlations within the data, which informed my feature selection for the model. During this process, I implemented several strategies to enhance data quality. I addressed missing values by either imputing them or removing incomplete records. Additionally, I created new features that captured further insights, such as the average goals scored by a team over the last few matches, which helped improve predictive accuracy. Normalization of certain features ensured that no single metric disproportionately influenced the model's predictions.
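A rolling-average feature of this kind could look like the sketch below. It assumes pandas, a three-match window, min-max scaling, and the column names `team`, `date`, and `goals_for`; none of these details are confirmed by the project description, and the sample data is made up.

```python
import pandas as pd

# Hypothetical sample data; teams, dates, and scores are illustrative only.
df = pd.DataFrame({
    "team": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "date": pd.to_datetime([
        "2023-02-18", "2023-02-25", "2023-03-04", "2023-03-11",
        "2023-02-18", "2023-02-25", "2023-03-04", "2023-03-11",
    ]),
    "goals_for": [2, 0, 1, 3, 1, 1, 4, 0],
})


def add_rolling_goals(frame, window=3):
    """Add each team's average goals over its previous `window` matches."""
    out = frame.sort_values(["team", "date"]).copy()
    # shift(1) drops the current match from the window, so the feature
    # only uses information available before kick-off.
    out["goals_for_rolling"] = out.groupby("team")["goals_for"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=window).mean()
    )
    return out


features = add_rolling_goals(df)

# Remove rows that lack a full history (one way of handling missing values).
features = features.dropna(subset=["goals_for_rolling"]).copy()

# Min-max scaling to [0, 1]; the scaling method is an assumption.
col = features["goals_for_rolling"]
features["goals_for_rolling_scaled"] = (col - col.min()) / (col.max() - col.min())

print(features)
```

Excluding the current match from the window matters here: a rolling mean that includes the match being predicted would leak the result into the features.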
As the sole developer, I was responsible for all aspects of the project, including data collection, processing, model training, and evaluation. My goal was to create a solution that fans and analysts could easily use to gain insights into J1 League matches. Users can enter details about an upcoming match and receive a prediction from the model trained on historical data. I developed the prediction logic to present results in an easily interpretable format.
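As a rough illustration of this prediction step, the sketch below trains a Random Forest on a few rows and prints outcome probabilities for one fixture. It assumes scikit-learn's `RandomForestClassifier`; the two features, the training rows, the outcome labels, and the helper `predict_match` are hypothetical and not taken from the notebook.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training rows: [home rolling goals, away rolling goals].
X_train = [
    [2.0, 0.5], [1.8, 0.7],   # home side in better form
    [0.4, 2.1], [0.6, 1.9],   # away side in better form
    [1.0, 1.0], [1.1, 0.9],   # evenly matched
]
y_train = ["home_win", "home_win", "away_win", "away_win", "draw", "draw"]

# random_state is fixed only to make the sketch reproducible.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)


def predict_match(model, home_team, away_team, match_features):
    """Return the most likely outcome and a readable probability report."""
    probabilities = model.predict_proba([match_features])[0]
    by_outcome = {str(c): float(p) for c, p in zip(model.classes_, probabilities)}
    best = max(by_outcome, key=by_outcome.get)
    lines = [f"{home_team} vs {away_team}: most likely {best}"]
    for outcome, p in sorted(by_outcome.items(), key=lambda kv: -kv[1]):
        lines.append(f"  {outcome}: {p:.0%}")
    return best, by_outcome, "\n".join(lines)


best, by_outcome, report = predict_match(model, "Team A", "Team B", [1.9, 0.6])
print(report)
```

Reporting class probabilities alongside the single predicted label gives users a sense of how confident the model is in each outcome.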
(Example of the scraped data)
The GitHub repository can be found here: Football Result Predictor.