Learn By Sharing

Why I’m ditching the library to write a data science blog

Traditional education is simple: sit down, shut up, and listen to the teacher. After class, go to the library and read the same words over and over, trying to figure out abstract topics that have little meaning in our daily lives. Even as a graduate student, I am still routinely lectured at and expected to spend large portions of my time outside of class studying alone. While this might work fine for subjects that require simple regurgitation of information on a test (looking at you, history), it is entirely unsuited to modern technical topics such as data science.

With that in mind, here’s a radical proposal: rather than hitting the books when you want to understand a concept, you should hit your blog and try to explain it clearly to others. The idea is simple: if you can’t teach a topic to someone else, then you truly don’t get it yourself.


When I began grad classes, I decided to take a new approach to education. Instead of sitting passively in class, I aimed to ask at least one question every lecture. This small adjustment had a profound impact on my engagement in class. I focused my questions on how to implement the concepts we covered, which were often presented without any practical examples. This active participation made it easier to concentrate in class and to apply topics to problems both in my research and on assignments.

Outside of class, I spent less time studying alone and more time in the lab implementing data science techniques. I also made an effort to talk with other students about what we covered in class. In the process, I was trying to understand the topics not by memorizing them, but by explaining them to others. Informed by these discussions, my colleagues and I would try to use the techniques on our own problems. Whether we failed or succeeded, we would come back for more debates, creating a productive feedback loop. Fortunately, I am in a lab with students and professors smarter than me (it’s a good idea to never be the smartest person in the room), and every day I learn something new by seeing it done in practice.

With even more data science and artificial intelligence (AI) grad classes this semester, I need to step up my sharing game. My goal is to write at least one blog post each week explaining a topic covered in class. I don’t have as much time to develop cool side applications such as the stock exploration or weight tracker Python tools, but I can take the time normally spent reviewing class material and instead write about what I have learned. This serves both to test whether I actually understand the material and to benefit others!

Communities are best served when information, at least non-harmful information, is freely shared. Some people think that because they worked hard to learn what they know, others must do the same, so they refuse to divulge anything that would make learning easier for anyone else. I ardently disagree with this view: just because we pay tens of thousands of dollars for an education does not mean we should keep it to ourselves. Instead, I believe in the democratization of education and in helping others to learn from my (many) mistakes and (limited but growing) experience. In technical fields, particularly data science, the internet has expanded access to information, and it is now possible for anyone to learn and practice cutting-edge techniques. Formal institutions no longer have a monopoly on knowledge, and I want to play a small part in lowering the barriers to these exciting new fields.

Libraries may own books, but nobody owns knowledge

Trying to explain concepts helps us understand them better ourselves. It takes genuine understanding, not memorization, to translate a topic for a general audience. We have all experienced the situation where we study a topic exhaustively and think we understand it completely, only to go blank when we have to apply it in a basic situation. Moreover, the most successful individuals in a field tend not to be the smartest, but those who can best communicate their findings and show why they are relevant. Neil deGrasse Tyson is one of the best-known astrophysicists in the world, not because he publishes the most brilliant papers, but because he translates tough concepts for a wide audience. Clear written and spoken communication skills are a major advantage that cannot be taught in a classroom!


These once-a-week posts will usually be about data science and machine learning, with a focus on real-world examples and metaphors. A good indicator of my aim is this correlation vs causation post. While metaphors can oversimplify concepts, my intent is to provide a high-level framework for learning them. It’s useful to have the basic ideas down before diving into the details. The specifics can be filled in by applying the ideas to solve problems (or maybe from a book if you prefer that route). If you can’t wait for my posts, I suggest checking out the Data Skeptic podcast, which does a great job of distilling data science topics for a general audience. Better yet, start your own blog! Writing is the best way to think through your ideas, and sharing knowledge benefits everyone in the community.

As always, I welcome feedback and constructive criticism. I can be reached on Twitter at @koehrsen_will.