My Weaknesses as a Data Scientist

Published on October 26, 2018

Categories: data science, thoughts

Without recognizing our weak points, we’ll never be able to overcome them

If modern job interviews have taught us anything, it’s that the correct answer to the question “What’s your biggest weakness?” is “I work too hard.” Clearly, it’d be ludicrous to actually talk about our weaknesses, right? Why would we want to mention what we can’t yet do? While job applications and LinkedIn profile pages don’t encourage us to disclose our weak points, if we never admit our deficiencies, then we can’t take the steps to address them.

The path to getting better in an endeavor is simple:

1. Determine where you are now: identify weaknesses
2. Figure out where you want to be: make a plan to get there
3. Execute on the plan: take one small action at a time

We rarely get past the first step: especially in technical fields, we keep our heads down and continue working, using the skills we already have rather than attaining new ones that would make our jobs easier or open us up to new opportunities. Self-reflection, which means evaluating ourselves objectively, may seem like a foreign concept, but being able to take a step back and figure out what we could do better or more efficiently is critical to advancing in any field.

With that in mind, I’ve tried to take an objective look at where I am now and identified three areas to work on to become a better data scientist:

Software engineering
Scaling data science
Deep learning

The Power of I Don’t Know

Published on October 17, 2018

Categories: thoughts

Intellectual humility is not a weakness but a strength

The phrase “I don’t know” has almost disappeared from our discourse. From the job applicant who must claim to have mastered 100 different skills to politicians who need to have a confident opinion on every news event, the modern world does not encourage people to admit when they lack knowledge or skills. However, by refusing to acknowledge our ignorance, we limit our chances for personal improvement. Saying “I don’t know” — practicing intellectual humility — and adopting a growth mindset are powerful means for becoming smarter and more skilled individuals.

The Dangers of Certainty

When we are young, we refuse to say we don’t know something because we’re ignorant of our own ignorance. We simply have no conception that the world extends beyond our sphere of knowledge, a misconception that should be, but often isn’t, corrected in school. Although we are naturally curious (think about the endless “why?” questions asked by children), school teaches us there is a set of certain facts about the world to memorize, focusing on the limited amount of current human knowledge rather than discussing how we continually discover new knowledge that reveals our past beliefs to be incorrect. Moreover, we’re taught not to question these facts, and if we don’t know one of them, the best response is to guess!

Simpson’s Paradox: How to Prove Opposite Arguments with the Same Data

Published on October 12, 2018

Categories: statistics, data science

Understanding a statistical phenomenon and the importance of asking why

Imagine you and your partner are trying to find the perfect restaurant for a pleasant dinner. Knowing this process can lead to hours of arguments, you seek out the oracle of modern life: online reviews. Doing so, you find that your choice, Carlo’s Restaurant, is recommended by a higher percentage of both men and women than your partner’s selection, Sophia’s Restaurant. However, just as you are about to declare victory, your partner, using the same data, triumphantly states that since Sophia’s is recommended by a higher percentage of all users, it is the clear winner.

What is going on? Who’s lying here? Has the review site got the calculations wrong? In fact, both you and your partner are right and you have unknowingly entered the world of Simpson’s Paradox, where a restaurant can be both better and worse than its competitor, exercise can lower and increase the risk of disease, and the same dataset can be used to prove two opposing arguments. Instead of going out to dinner, perhaps you and your partner should spend the evening discussing this fascinating statistical phenomenon.
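
To see how both claims can hold at once, here is a small sketch in Python. The review counts are made up for illustration (they are not taken from any real review site); they are chosen so that Carlo’s wins within each group (50% vs. 40% among men, 90% vs. 84% among women) while Sophia’s wins overall (about 77% vs. 54%), because most of Sophia’s reviews come from the group that rates both restaurants highly.

```python
# Hypothetical review counts: (recommendations, total reviews) per group.
# These numbers are invented purely to illustrate the paradox.
reviews = {
    "Carlo's": {"men": (180, 360), "women": (36, 40)},
    "Sophia's": {"men": (20, 50), "women": (210, 250)},
}


def rate(recommended, total):
    """Fraction of reviewers who recommend the restaurant."""
    return recommended / total


def overall_rate(groups):
    """Recommendation rate after pooling all groups together."""
    recommended = sum(r for r, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return recommended / total


for name, groups in reviews.items():
    per_group = ", ".join(
        f"{group}: {rate(*counts):.0%}" for group, counts in groups.items()
    )
    print(f"{name:9s} {per_group}, overall: {overall_rate(groups):.0%}")
```

The reversal comes from the unequal group sizes: pooling the data weights each restaurant’s rate by who happened to review it, which is why asking how the data was generated matters as much as the percentages themselves.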

Building a Recommendation System Using Neural Network Embeddings

Published on October 4, 2018

Categories: deep learning, neural networks, books

How to use deep learning and Wikipedia to create a book recommendation system

Deep learning can do some incredible things, but often the uses are obscured in academic papers or require computing resources available only to large corporations. Nonetheless, there are applications of deep learning that can be done on a personal computer with no advanced degree required. In this article, we will see how to use neural network embeddings to create a book recommendation system using all Wikipedia articles on books.

Our recommendation system will be built on the idea that books which link to similar Wikipedia pages are similar to one another. We can represent this similarity and hence make recommendations by learning embeddings of books and Wikipedia links using a neural network. The end result is an effective recommendation system and a practical application of deep learning.

Most Similar Books to Stephen Hawking’s A Brief History of Time
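
Once embeddings have been learned, making a recommendation amounts to finding the nearest vectors. The sketch below assumes cosine similarity as the closeness measure and uses tiny hand-written 3-dimensional vectors with placeholder titles; real embeddings would be learned by the network and have many more dimensions. The names `book_embeddings`, `cosine_similarity`, and `most_similar` are illustrative, not taken from the article’s code.

```python
import math

# Hypothetical embeddings with placeholder titles. In the real system these
# vectors would be learned by the neural network, not written by hand.
book_embeddings = {
    "physics_book_a": [0.9, 0.1, 0.0],
    "physics_book_b": [0.8, 0.2, 0.1],
    "novel_a": [0.0, 0.9, 0.4],
    "novel_b": [0.1, 0.8, 0.5],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(title, embeddings, n=2):
    """Return the n titles whose embeddings are closest to the query title."""
    query = embeddings[title]
    scores = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]


for other, score in most_similar("physics_book_a", book_embeddings):
    print(f"{other}: {score:.3f}")
```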

Neural Network Embeddings Explained

Published on October 1, 2018

Categories: deep learning, embeddings

How deep learning can represent War and Peace as a vector

Applications of neural networks have expanded significantly in recent years from image segmentation to natural language processing to time-series forecasting. One notably successful use of deep learning is embedding, a method used to represent discrete variables as continuous vectors. This technique has found practical applications with word embeddings for machine translation and entity embeddings for categorical variables.
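
As a rough illustration of what "representing discrete variables as continuous vectors" means, the sketch below treats an embedding as a lookup table: each book gets an integer index, and that index selects one row of a matrix. The matrix here is randomly initialized and the dimension of 4 is an arbitrary choice; in practice a neural network would adjust the values during training. The third title is added only to fill out the example.

```python
import random

# A tiny vocabulary of discrete items.
books = ["War and Peace", "A Brief History of Time", "Anna Karenina"]
book_index = {title: i for i, title in enumerate(books)}

EMBEDDING_DIM = 4  # arbitrary size chosen for illustration
rng = random.Random(0)

# One row per book. Random starting values; training would update them.
embedding_matrix = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)] for _ in books
]


def one_hot(title):
    """Sparse representation: all zeros except a 1 at the book's index."""
    vector = [0.0] * len(books)
    vector[book_index[title]] = 1.0
    return vector


def embed(title):
    """Dense representation: look up the book's row in the matrix."""
    return embedding_matrix[book_index[title]]


def one_hot_times_matrix(title):
    """Multiplying the one-hot vector by the matrix selects the same row."""
    encoded = one_hot(title)
    return [
        sum(encoded[i] * embedding_matrix[i][d] for i in range(len(books)))
        for d in range(EMBEDDING_DIM)
    ]


print(one_hot("War and Peace"))
print(embed("War and Peace"))
```

The one-hot vector grows with the number of books and treats every pair of books as equally different, while the embedding keeps a fixed, small dimension in which distances can carry meaning after training.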

In this article, I’ll explain what neural network embeddings are, why we want to use them, and how they are learned. We’ll go through these concepts in the context of a real problem I’m working on: representing all the books on Wikipedia as vectors to create a book recommendation system.

Neural Network Embedding of all books on Wikipedia. (From Jupyter Notebook on GitHub).