The Poisson Distribution And Poisson Process Explained

A straightforward walk-through of a useful statistical concept

A tragedy of statistics in most schools is how dull it’s made. Teachers spend hours wading through derivations, equations, and theorems, and, when you finally get to the best part — applying concepts to actual numbers — it’s with irrelevant, unimaginative examples like rolling dice. This is a shame as stats can be enjoyable if you skip the derivations (which you’ll likely never need) and focus on using the ideas to solve interesting problems.

In this article, we’ll cover the Poisson process and the Poisson distribution, two important probability concepts. After highlighting only the relevant theory, we’ll work through a real-world example, showing equations and graphs to put the ideas in a proper context.

Read More

The Myth Of Us Vs Them

Why the “Developed and Developing world” distinction no longer works: Reality Project Episode 4

Of all the myths spread by the media, perhaps none is more detrimental than Us vs Them: the idea that the world is divided into two groups, one good and one evil, and that all events can be viewed as a struggle between the two. This two-sided view of the world takes advantage of our natural tendency to form tribes and is applied in many situations, from political parties to economic systems to countries, in the form of “developed vs. developing world.” In the last case, this takes the form: developed = rich with low birth rates (assumed to be us) vs. developing = poor with high birth rates (them).

A binary outlook may make for compelling news (conflicts and tribalism are sure ways to get our attention), but it’s false. All of human life, from the personal (income, height, sexuality) to the international (systems of government, economic systems, national wealth), exists not in two states but along a continuum. Moreover, we dehumanize people by splitting them into two groups, letting our instincts and cognitive biases do the thinking for us instead of using our rationality. In this article, the fourth episode of the Reality Project (an effort dedicated to becoming less wrong about the world with data), we’ll see why a separation between developing and developed countries no longer applies and look at factful ways to view nations instead.

Read More

The Reality Of Global Nuclear Weapons And How Russian Nukes Turned On Your Lights

India’s Polar Satellite Launch Vehicle (not a nuclear weapon!) launches in 2018 (Source)

Exploring the data on the decline in worldwide nuclear stockpiles and the most intriguing government program you’ve never heard of: Reality Project Episode 3

During a warm Boston summer in 2018, I was just settling onto the lawn at the Hatch Memorial Shell for a performance of my favorite symphony, Holst’s The Planets, when I saw a tent set up by the Union of Concerned Scientists. As someone generally concerned about the state of science, I wandered over and, as I got closer, was drawn to a crowd gathered around a man discussing the threat of nuclear weapons. Having just finished Enlightenment Now by Steven Pinker, I was hoping to hear more positive news about the massive reductions in nuclear weapon stockpiles that occurred over the past 30 years.

Instead, the man’s speech, as Pinker tells us to expect from academics, was entirely negative. The gist was that human folly led us to create weapons which could wipe our species off the face of the Earth and we were in grave danger. Perplexed, when the man paused to take a breath, I raised my hand and asked if he knew how many nukes there were worldwide and how this compared to numbers in the past. Confidently he replied: “There’s more now than ever before although I don’t know the exact number.”

At this point, buoyed by the confidence (and arrogance) that comes with possessing data someone else doesn’t, I pressed my factual advantage for all it’s worth, stating: “In 1985, there were approximately 70,000 nuclear weapons in the world, and today, in 2018, there are fewer than 15,000. That represents a reduction of nearly 80%, and what’s more, there are 4 fewer countries with nuclear weapons today.” Surprised, the man asked for a fact check, and after an acceptable source was consulted, he acknowledged the optimistic numbers were correct. While my intention was not to defang the man, this was the unintended effect, and the crowd slowly began to disperse, the gusto gone from the man’s proclamations.
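For readers who want to check the "nearly 80%" figure themselves, here is a minimal sketch of the arithmetic in Python. It assumes the rounded counts quoted above (about 70,000 in 1985 and an upper bound of 15,000 in 2018); the function and variable names are illustrative only.

```python
# Rounded figures quoted in the text (assumptions, not exact counts).
warheads_1985 = 70_000
warheads_2018 = 15_000  # stated as an upper bound for 2018


def percent_reduction(before, after):
    """Return the percentage decrease going from `before` to `after`."""
    return (before - after) / before * 100


reduction = percent_reduction(warheads_1985, warheads_2018)
print(f"Reduction: at least {reduction:.1f}%")
```

Because the 2018 figure is an upper bound, the computed value is a lower bound on the reduction, which is consistent with calling it nearly 80%.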

Although I had inadvertently cost the man his entire audience, he agreed to have a discussion with me (after I apologized for the intellectual ambush) and we had a fruitful debate with both of us making concessions: I agreed that 0 nuclear weapons was optimal (although not realistic at the moment) and he said he would reframe his message to emphasize the progress we’ve made in nuclear weapons reduction. Later, as I sat listening to the sounds of Holst’s magnificent work, I thought about what this experience had taught me:

  1. Even experts are seriously wrong in their view of the world.
  2. People assume the worst of humanity in the absence of data.
  3. Don’t act superior to someone when you are correcting them. Remember that you were ignorant as well before you read the statistics.

In this article, we are going to examine the data concerning nuclear weapons around the world. The basic stats above are correct — the number of nuclear weapons and the number of countries holding them have both declined drastically in the past 30 years. However, there are also points about which we should rightly worry, including tension between the US and Russia which threatens arms control treaties and the possession of nuclear weapons by unstable states. In addition to the numbers, we’ll also learn about Megatons to Megawatts, the most intriguing government program you’ve probably never heard of, in which Russian bombs literally turned on your lights.

Read More

A Non Technical Reading List For Data Science

Books that will make you a better data scientist without delving into the technical details

Contrary to what some data scientists may like to believe, we can never reduce the world to mere numbers and algorithms. When it comes down to it, decisions are made by humans, and being an effective data scientist means understanding both people and data.

Consider the following real-life example:

When OPower, a software company, wanted to get people to use less energy, they provided customers with plenty of stats about their electricity usage and cost. However, the data alone were not enough to get people to change. In addition, OPower needed to take advantage of behavioral science, namely, studies showing people were driven to reduce energy when they received smiley emoticons on their bills showing how they compared to their neighbors!

The simple intervention of putting a 😃 on people’s electricity bills when they used less than their neighbors, and a 😦 when they could do better, ended up reducing electricity consumption by 2–3%, in the process saving millions of dollars and preventing emissions of millions of pounds of CO2 🏆! To a data scientist, this may be a shock (you mean people don’t respond to pure data?!), but this was no surprise to the chief science officer of OPower, Robert Cialdini, a former psychology professor who wrote a book about human behavior. The takeaway is you can have any data you want, but you still need an understanding of how humans work to effect real change.

The most effective visualization isn’t a bar chart, it’s a smiley face.

In our daily work and formal education as data scientists, it’s difficult to get a glimpse into the workings of humans or to take a step back and think about the social implications of our work. Therefore, it’s critical to read not only technical articles and textbooks but also to branch out into works that look at how people make choices and how data can be used to improve these choices.

In this article, I’ll highlight 6 books that are non-technical — in the sense that they don’t delve into the math and algorithms — but critical reads for data scientists. These books are necessary for anyone who wants to accomplish the objective of data science: enable better real-world decisions through data.

The 6 books are listed here with brief reviews and takeaways following:

  1. The Signal and the Noise: Why So Many Predictions Fail — but Some Don’t by Nate Silver
  2. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil
  3. (Tie) Algorithms to Live By: The Computer Science of Human Decisions by Brian Christian and Tom Griffiths and How Not to be Wrong: The Power of Mathematical Thinking by Jordan Ellenberg
  4. Thinking, Fast and Slow by Daniel Kahneman
  5. (Dark horse) The Black Swan: The Impact of the Highly Improbable by Nassim Nicholas Taleb
Read More

The Next Level Of Data Visualization In Python

How to make great-looking, fully-interactive plots with a single line of Python

The sunk-cost fallacy is one of many cognitive biases: the tendency to keep devoting time and resources to a lost cause because we have already spent (sunk) so much time in the pursuit. The sunk-cost fallacy applies to staying in bad jobs longer than we should, slaving away at a project even when it’s clear it won’t work, and yes, continuing to use a tedious, outdated plotting library (matplotlib) when more efficient, interactive, and better-looking alternatives exist.

Over the past few months, I’ve realized the only reason I use matplotlib is the hundreds of hours I’ve sunk into learning the convoluted syntax. This complexity leads to hours of frustration on Stack Overflow figuring out how to format dates or add a second y-axis. Fortunately, this is a great time for Python plotting, and after exploring the options, a clear winner in terms of ease of use, documentation, and functionality is the plotly Python library. In this article, we’ll dive right into plotly, learning how to make better plots in less time, often with one line of code.

All of the code for this article is available on GitHub. The charts are all interactive and can be viewed on NBViewer here.

Example of plotly figures (source)

Read More