Please Steal My Articles

A personal license for improving the data science community

When Time Warner’s CEO heard Game of Thrones was the most pirated TV show in the world, he said this was “better than an Emmy”. The show’s director made similar comments, saying that the illicit downloads created a “buzz” around the show. Instead of pursuing the perpetrators to the full extent of the law, HBO took a soft stance and let the downloads continue.

While this may seem counterintuitive, the thinking went that the more people who found the show through any means, the more talk there would be about it, leading to a greater number of paying customers. Rather than spending fortunes trying to stop the inevitable, HBO accepted the piracy as a positive. While there are certainly other factors at play, this decision seems wise in light of the record viewership Game of Thrones would attain.

When I found out my articles were being stolen, I took a similar view: if someone reads my articles and learns something, regardless of whether they know I wrote the article, the world is a better place.

Maybe this view strikes you as naive and you think I sound idealistic. My response is we could use more idealistic people who do things not for personal gain, but to enrich the lives of their fellow humans. Sure, I could complain about my work being stolen, but I would rather you spread it far and wide, with or without my name attached!


Information Should be Free

The Google Chrome extension Sci-Hub Links is one of my favorite tools. If you end up on a locked journal article with a Digital Object Identifier (DOI), click on the extension and magically get transported to the PDF.

A magic tool for accessing articles behind paywalls.

This access is provided through Sci-Hub, a quasi-legal repository of paywalled journal articles and books that aims “to remove all barriers in the way of science.” Whether or not you think this is legal, you can’t argue with the idea that there should not be arbitrary barriers to advancing human progress.

What I love about data science is how accessible it is: there are no insurmountable impediments, such as a need for a college degree, that prevent you from entering and contributing to the field. You can learn everything you need online and take free courses teaching cutting-edge techniques. This is an unalloyed good for data science, and it is part of my responsibility as a practicing data scientist to ensure it becomes even more accessible.

While I could have continued on an academic path and written articles read by a dozen people at most, I decided this would not be satisfying because I wouldn’t be able to help the general public get into data science. I’m not denigrating academics, and basic research certainly benefits the public once it filters down into accessible tools (such as Scikit-Learn for machine learning or Keras for deep learning). Nonetheless, I think academics should do more to make their work understandable to the public. Science is most effective when it can change people’s minds and that always comes down to communication.

Neil deGrasse Tyson is the best-known physicist in the world, not because he does the most advanced work, but because he has made it his career to communicate physics findings to a wide audience. The general public clearly has a desire to behold the wonders of physics (look at the popularity of Cosmos) but doesn’t want to spend hours trying to decipher esoteric papers.

In much the same way, there is an insatiable hunger to learn data science, but people don’t have the time to read through all the journal articles (most of which will be obsolete in a few years anyway) or the resources to attend college. Fortunately, there are many data science communicators working to tear down those barriers.

The biggest hurdle to getting started in most technical fields is access; however, in data science, everyone has access to the most up-to-date methods and technology (such as TPUs through Google Colab). The next hurdle is the vocabulary: it’s easy to get discouraged when you can’t understand what anyone is talking about. In my opinion, there is no need to use a long word when several shorter ones can better explain the concept. Using longer words doesn’t make you sound smarter; it just means whatever you say will be inaccessible to most people. Sure, you may be technically right and feel a brief moment of superiority, but if no one understands you, they will quickly stop listening.

Ultimately, I view it as my responsibility to encourage and help as many people as I can to learn data science. I believe a field only grows stronger the more varied the people who can participate. Data science holds an immense amount of power in a data-driven world, but if that power is wielded by a small group of people, the benefits will be limited. Again, this may be my idealism, but this is the foundational belief that gets me out of bed in the morning.


What You Can Do With My Articles

The short answer: anything. The slightly longer answer: I would prefer you didn’t run my articles with ads (thanks Medium, the subscription model is much better!) but even if you do, I’m not going to stop you. A credit or a link to the original article is nice, but again, it doesn’t really matter if you forget. You can copy and paste my articles anywhere (if you want to print a book version, that would be neat) and translate them into any language you speak!

Fundamentally, I don’t believe I really own my articles. Sure, I spend dozens of hours writing the code, making Jupyter Notebooks, writing articles, and going through the painful process of editing, but the ideas I’m using are gathered from many different people and I can’t claim sole responsibility.

Maybe physics proceeds by standing on the shoulders of giants, but data science advances on the backs of tens of thousands of anonymous contributors — on GitHub, Stack Overflow, Medium, and everywhere else people are motivated to contribute for the good of the whole.

As Thomas Jefferson put it, “Knowledge is like a candle. Even as it lights a new candle, the strength of the original flame is not diminished.” If someone uses my articles without my consent, I’m no worse off as a result, and the data science community is better off. If we are being honest, my articles should read: “written by the data science community for the data science community.” Sure, it’s nice to get recognition, but it’s even better to know that your work is helping people even if they don’t know you wrote it.

Definitely an idealist!


As always, I welcome criticism and constructive feedback. I can be reached on Twitter @koehrsen_will or through my personal website at willk.online.