Machine Learning Kaggle Competition Part Three: Optimization
Getting the most out of a machine learning model
How best to describe a Kaggle contest? It’s a machine learning education disguised as a competition! Although there are valid criticisms of Kaggle, overall it’s a great community that provides interesting problems, thousands of data scientists willing to share their knowledge, and an ideal environment for exploring new ideas. As evidence, I never would have learned about the Gradient Boosting Machine, or about automated model optimization, one of the topics of this article, were it not for the Kaggle Home Credit contest.
In this article, part three of a series (Part One: Getting Started and Part Two: Improving) documenting my work for this contest, we will focus on a crucial aspect of the machine learning pipeline: model optimization through hyperparameter tuning. In the second article, we decided on the Gradient Boosting Machine as our model of choice, and now we have to get the most out of it through optimization. We’ll do this primarily with two methods: random search and automated tuning with Bayesian optimization.
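To make the first of those methods concrete, here is a minimal sketch of random search over Gradient Boosting Machine hyperparameters. It uses scikit-learn's RandomizedSearchCV with LightGBM's sklearn wrapper and a synthetic dataset as a stand-in for the Home Credit features; the notebooks implement the search (and the later Bayesian optimization) in their own way, so treat the parameter names and ranges here as illustrative assumptions rather than the article's final settings.

```python
# Minimal sketch of random search over GBM hyperparameters.
# Assumptions: synthetic data stands in for the Home Credit features,
# and the search is done with RandomizedSearchCV rather than the
# hand-rolled loop used in the competition notebooks.
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the real training data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

# Distributions to sample hyperparameter values from.
param_distributions = {
    "num_leaves": np.arange(20, 150),
    "learning_rate": np.logspace(-3, 0, 100),
    "subsample": np.linspace(0.5, 1.0, 20),
    "colsample_bytree": np.linspace(0.5, 1.0, 20),
    "reg_alpha": np.linspace(0.0, 1.0, 20),
    "reg_lambda": np.linspace(0.0, 1.0, 20),
}

search = RandomizedSearchCV(
    estimator=LGBMClassifier(n_estimators=200, random_state=42),
    param_distributions=param_distributions,
    n_iter=25,          # number of random hyperparameter sets to evaluate
    scoring="roc_auc",  # the competition metric
    cv=3,
    random_state=42,
    n_jobs=-1,
)
search.fit(X, y)

print("Best CV ROC AUC:", search.best_score_)
print("Best hyperparameters:", search.best_params_)
```

The key difference from Bayesian optimization, which we'll turn to later, is that each of the 25 candidate hyperparameter sets here is drawn independently at random, with no use of the scores from previous evaluations.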
All the work presented here is available to run on Kaggle in the following notebooks. The article itself highlights the key ideas, but the code details are all in the notebooks (which are free to run with nothing to install!).