A Complete Machine Learning Project Walk Through In Python Part Two

Published on May 17, 2018

Categories: machine learning, python, project

Model Selection, Hyperparameter Tuning, and Evaluation

Assembling all the machine learning pieces needed to solve a problem can be a daunting task. In this series of articles, we are walking through implementing a machine learning workflow using a real-world dataset to see how the individual techniques come together.

In the first post, we cleaned and structured the data, performed an exploratory data analysis, developed a set of features to use in our model, and established a baseline against which we can measure performance. In this article, we will look at how to implement and compare several machine learning models in Python, perform hyperparameter tuning to optimize the best model, and evaluate the final model on the test set.
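The steps named above (measure against a baseline, tune a hyperparameter, then score once on the test set) can be sketched in a few lines. This is only an illustration under stated assumptions: the data is synthetic, the model is a toy nearest-neighbors regressor written with the standard library, and mean absolute error and the median baseline are assumed choices rather than anything confirmed by this excerpt. It is not the code from the project notebook.

```python
import random
import statistics


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def cross_val_mae(xs, ys, k, folds=4):
    """Average validation MAE over simple interleaved folds."""
    scores = []
    for fold in range(folds):
        trn = [i for i in range(len(xs)) if i % folds != fold]
        val = [i for i in range(len(xs)) if i % folds == fold]
        trn_x = [xs[i] for i in trn]
        trn_y = [ys[i] for i in trn]
        preds = [knn_predict(trn_x, trn_y, xs[i], k) for i in val]
        scores.append(mae([ys[i] for i in val], preds))
    return sum(scores) / folds


# Synthetic data (an assumption for illustration, not the project's dataset).
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 * x + rng.gauss(0, 1.0) for x in xs]
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

# Baseline: always guess the median of the training targets.
baseline_guess = statistics.median(train_y)
baseline_mae = mae(test_y, [baseline_guess] * len(test_y))

# Hyperparameter tuning: pick k by cross-validation on the training set only.
candidate_ks = [1, 3, 5, 10, 20]
cv_scores = {k: cross_val_mae(train_x, train_y, k) for k in candidate_ks}
best_k = min(cv_scores, key=cv_scores.get)

# Final evaluation: touch the test set once, with the chosen hyperparameter.
test_preds = [knn_predict(train_x, train_y, x, best_k) for x in test_x]
model_mae = mae(test_y, test_preds)

print(f"baseline MAE: {baseline_mae:.2f}")
print(f"best k: {best_k}, test MAE: {model_mae:.2f}")
```

The point of the structure is that the test set plays no part in choosing `best_k`; it is only used for the final comparison against the baseline.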

The full code for this project is on GitHub and the second notebook corresponding to this article is here. Feel free to use, share, and modify the code in any way you want!

A Complete Machine Learning Walk Through In Python Part One

Published on May 16, 2018

Categories: machine learning, python, project

Putting the machine learning pieces together

Reading through a data science book or taking a course, it can feel like you have the individual pieces, but don’t quite know how to put them together. Taking the next step and solving a complete machine learning problem can be daunting, but persevering and completing a first project will give you the confidence to tackle any data science problem. This series of articles will walk through a complete machine learning solution with a real-world dataset to let you see how all the pieces come together.

We’ll follow the general machine learning workflow step-by-step:

Data cleaning and formatting
Exploratory data analysis
Feature engineering and selection
Compare several machine learning models on a performance metric
Perform hyperparameter tuning on the best model
Evaluate the best model on the testing set
Interpret the model results
Draw conclusions and document work

If Your Files Are Saved Only On Your Laptop They Might As Well Not Exist

Published on April 30, 2018

Categories: thoughts, computer, security

How to avert computer catastrophes

Last week, as I was working on one of my three final graduate course projects, my laptop decided it was a good time to give out. I spent a futile 15 minutes resetting the battery and holding down the power button trying to get a response, but to no avail: my laptop was done for good.

At this point a year ago, I would have been sobbing uncontrollably, my semester wrecked in the final week. However, this time, I set down my laptop, walked to the school library, logged onto a computer, downloaded my files from Google Drive where they had been synced up until the minute my laptop went dark, and was working on my final projects within 30 minutes. All in all, thanks to automatic back-ups, instead of losing an entire semester, I lost two lines of one report.

This near-tragedy illustrates two points that anyone who does any work on a computer must keep in mind:

You will have a complete computer failure sometime soon
This can be either a soul-crushing loss or no big deal depending on the safeguards you have in place

Web Scraping, Regular Expressions, And Data Visualization Doing It All In Python

Published on April 28, 2018

Categories: python, web, project

A Small Real-World Project for Learning Three Invaluable Data Science Skills

As with most interesting projects, this one started with a simple question asked half-seriously: how much tuition do I pay for five minutes of my college president’s time? After a chance pleasant discussion with the president of my school (CWRU), I wondered just how much my conversation had cost me.

My search led to this article, which, along with my president’s salary, had this table showing the salaries of private college presidents in Ohio:

While I could have found the answer for my president (SPOILER ALERT: it’s $48 per five minutes) and been satisfied, I wanted to take the idea further using this table. I had been looking for a chance to practice web scraping and regular expressions in Python and decided this was a great short project.
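As a rough sketch of the regular-expression half of such a project, the snippet below pulls a dollar amount out of a line of text and converts an annual salary into the cost of a few minutes. The sample rows, the helper names `parse_salary` and `cost_of_time`, and the 2,000 working hours per year are all assumptions made for illustration; they are not the article's data or its method.

```python
import re

# Hypothetical rows standing in for scraped table text (not the real table).
rows = [
    "Example College A | Jane Doe | $1,250,000",
    "Example College B | John Roe | $480,500",
    "Example College C | no salary reported",
]

# A dollar sign followed by digits, optional thousands separators and cents.
SALARY_PATTERN = re.compile(r"\$(\d[\d,]*(?:\.\d+)?)")


def parse_salary(text):
    """Return the first dollar amount in text as a float, or None if absent."""
    match = SALARY_PATTERN.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def cost_of_time(annual_salary, minutes, hours_per_year=2000):
    """Cost of `minutes` of someone's time, assuming `hours_per_year` worked."""
    per_minute = annual_salary / (hours_per_year * 60)
    return per_minute * minutes


for row in rows:
    salary = parse_salary(row)
    if salary is None:
        print(f"{row!r}: no salary found")
    else:
        print(f"{row!r}: ${cost_of_time(salary, 5):.2f} per five minutes")
```

The figure produced this way depends entirely on the assumed number of working hours, so it should be read as an order-of-magnitude estimate.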

Bayesian Linear Regression In Python Using Machine Learning To Predict Student Grades Part 2

Published on April 20, 2018

Categories: bayesian, modeling, project

Implementing a Model, Interpreting Results, and Making Predictions

In Part One of this Bayesian Machine Learning project, we outlined our problem, performed a full exploratory data analysis, selected our features, and established benchmarks. Here we will implement Bayesian Linear Regression in Python to build a model. After we have trained our model, we will interpret the model parameters and use the model to make predictions. The entire code for this project is available as a Jupyter Notebook on GitHub and I encourage anyone to check it out!

As a reminder, we are working on a supervised, regression machine learning problem. Using a dataset of student grades, we want to build a model that can predict a student’s final score from the student’s personal and academic characteristics. The final dataset after feature selection is:

We have 6 features (explanatory variables) that we use to predict the target (response variable), in this case the grade. There are 474 students in the training set and 159 in the test set. To get a sense of the variable distributions (and because I really enjoy this plot) here is a Pairs plot of the variables showing scatter plots, histograms, density plots, and correlation coefficients.
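The training and test sizes quoted above are consistent with a 75/25 split of 633 students in which the test size is rounded up. That is an assumption about how the split was made, not something stated in the text. A minimal standard-library sketch of such a split, with a hypothetical helper `train_test_indices`:

```python
import math
import random


def train_test_indices(n_samples, test_fraction=0.25, seed=42):
    """Shuffle row indices and hold out a fraction of them for testing.

    The test size is rounded up, which is an assumed convention here.
    """
    n_test = math.ceil(n_samples * test_fraction)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


n_students = 474 + 159  # total implied by the sizes quoted in the text
train_idx, test_idx = train_test_indices(n_students)
print(len(train_idx), len(test_idx))  # 474 159
```

Fixing the seed keeps the split reproducible, so the same students end up in the test set on every run.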