Exploratory Data Analysis With R (No Code)

Examining the Doctor’s Appointment No-Show Dataset

Author’s Note: The following exploratory data analysis project was completed as part of the Udacity Data Analyst Nanodegree that I finished in May 2017. All code for this project can be found on my GitHub repository for the class. I highly recommend the course to anyone interested in data analysis (that is, anyone who wants to make sense of the massive amounts of data generated in our modern world) as well as to those who want to learn basic programming skills in an applied setting. This version of the Exploratory Data Analysis project has all the code removed for readability. The version with all the R code included is also on Medium.

Doctor’s appointment no-shows are a serious issue in the public health care field. Missed appointments are associated with poorer patient outcomes and cost the health care system in the US nearly $200 each. Therefore, it comes as no surprise that reducing the rate of no-shows has become a priority in the United States and around the world. Numerous studies have been undertaken to determine the most effective means of reducing rates of absenteeism, with varying degrees of success. The first step to solving the problem of missed appointments is identifying why a patient skips a scheduled visit in the first place. What trends are there among patients with higher absence rates? Are there demographic indicators or perhaps time-variant relationships hiding in the data? Ultimately, it was these questions that drove my exploratory data analysis. I was curious about the reasons behind missed appointments and wanted to examine the data to identify any trends present. I chose this problem because I believe it is an excellent example of how data science and analysis can reveal relationships that can be applied in the real world to the benefit of society.

I wanted to choose a dataset that was both relatable and could be used to make smarter decisions. Therefore, I decided to work with the medical appointment no-show data available on Kaggle. This dataset is drawn from 300,000 primary physician visits in Brazil across 2014 and 2015. Information about each appointment was recorded automatically when the patient scheduled the visit, and the patient was then marked as having either attended or missed the appointment. The appointment information included demographic data, time data, and conditions concerning the reason for the visit.

Read More

Facial Recognition Using Google’s Convolutional Neural Network

Labeled Faces in the Wild Dataset

Training the Inception-v3 Neural Network for a New Task

In a previous post, we saw how we could use Google’s pre-trained Inception Convolutional Neural Network to perform image recognition without the need to build and train our own CNN. The Inception V3 model has achieved 78.0% top-1 and 93.9% top-5 accuracy on the ImageNet test dataset containing 1000 image classes. Inception V3 achieved such impressive results (rivaling or besting those of humans) by using a very deep architecture, incorporating inception modules, and training on 1.2 million images. However, this model is limited to identifying only the 1000 image classes on which it was trained. If we want to classify different objects or perform slightly different image-related tasks (such as facial verification), then we will need to train the parameters (the connection weights and biases) of at least one layer of the network. The theory behind this approach is that the lower layers of the convolutional neural network are already very good at identifying the lower-level features that differentiate images in general (such as shapes, colors, or textures), and only the top layers distinguish the specific, higher-level features of each class (the number of appendages, or the eyes on a human face). Training the entire network on a reasonably sized new dataset is infeasible on a personal laptop, but if we limit the size of the dataset and use a consumer-grade GPU or a Google Cloud GPU compute engine, we can train the last layer of the network in a reasonable amount of time. We probably will not achieve record results on our task, but we can at least see the principles involved in adapting an existing model to a new dataset. This is generally the approach used in industry (embodying the DRY, Don’t Repeat Yourself, programming principle) and can achieve impressive results in a much shorter time frame than developing and training an entirely new CNN.
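To make the last-layer retraining idea concrete, here is a minimal sketch using TensorFlow’s Keras API rather than the code from the original post. It freezes the pre-trained Inception V3 convolutional base and trains only a new softmax classification layer on top; the faces/train directory, the number of classes, and the training settings are hypothetical placeholders for whatever labeled face images you have on hand.

```python
# Minimal transfer-learning sketch (assumptions: TensorFlow 2.x installed,
# images arranged in faces/train/<person_name>/*.jpg, 10 classes).
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input

NUM_CLASSES = 10          # hypothetical number of people to recognize
IMAGE_SIZE = (299, 299)   # Inception V3's expected input resolution

# Load labeled images from per-class subdirectories (assumed layout).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "faces/train", image_size=IMAGE_SIZE, batch_size=32)
# Scale pixel values the way Inception V3 expects.
train_ds = train_ds.map(lambda x, y: (preprocess_input(x), y))

# Pre-trained convolutional base: weights stay frozen and act as a
# fixed feature extractor for the new task.
base = InceptionV3(weights="imagenet", include_top=False,
                   input_shape=IMAGE_SIZE + (3,))
base.trainable = False

# New top: global pooling plus one softmax layer, the only trainable part.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```

Because only the final dense layer’s weights are updated, each training epoch is cheap enough to run on a single consumer-grade GPU, which is exactly the trade-off described above.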

Inception Module: The Building Block of the Inception CNN

Read More

Exploratory Data Analysis With R

Examining the Doctor’s Appointment No-Show Dataset

Author’s Note: The following exploratory data analysis project was completed as part of the Udacity Data Analyst Nanodegree that I finished in May 2017. All code for this project can be found on my GitHub repository for the class. I highly recommend the course to anyone interested in data analysis (that is, anyone who wants to make sense of the massive amounts of data generated in our modern world) as well as to those who want to learn basic programming skills in an applied setting.

Abstract

Doctor’s appointment no-shows are a serious issue in the public health care field. Missed appointments are associated with poorer patient outcomes and cost the health care system in the US nearly $200 each. Therefore, it comes as no surprise that reducing the rate of no-shows has become a priority in the United States and around the world. Numerous studies have been undertaken to determine the most effective means of reducing rates of absenteeism, with varying degrees of success. The first step to solving the problem of missed appointments is identifying why a patient skips a scheduled visit in the first place. What trends are there among patients with higher absence rates? Are there demographic indicators or perhaps time-variant relationships hiding in the data? Ultimately, it was these questions that drove my exploratory data analysis. I was curious about the reasons behind missed appointments and wanted to examine the data to identify any trends present. I chose this problem because I believe it is an excellent example of how data science and analysis can reveal relationships that can be applied in the real world to the benefit of society.

Read More

The Ascent Of Humanity

A Review of Sapiens by Yuval Harari

One-sentence summary: The history of humanity is best viewed as three revolutions: the cognitive revolution beginning 70,000 years ago characterized by the development of language; the agricultural revolution that began 12,000 years ago and led to the first permanent settlements enabled by large-scale cooperation; and the scientific revolution which commenced around 1600 when the modern ideals of humanism, liberalism and democracy were first adopted and technological progress began its exponential path.

It’s perfectly acceptable if you graduated from high school with no desire to ever pick up a history book again. The endless listing of names and dates typical of the American history curriculum is incredibly effective at driving any enthusiasm for studying the past out of students. My knowledge of history post-high school consisted of a jumbled mix of (entirely American) names and events (Betsy Ross wrote the Constitution, right?). The few times I have gone so far as to begin a history book since then, I have felt my mind shut down at the first mention of a name-date-event combination. I enjoy books with bold ideas, those that examine trends and try to explain movements, rather than those that get mired in the endless details of who exactly did what when. Some history books seem like they are on the verge of stepping back and looking at the big picture only to zoom right back in and lose the forest in a thicket of trees. I like the sound of studying the past to learn from our mistakes, but when the past is presented in list form, it can be pretty hard to take away anything relevant. Therefore, I was excited, if somewhat skeptical, when I heard about Sapiens: A Brief History of Humankind, a book with an idea no less grand than the entire story of humanity, from our first upright steps on the African savannah 2 million years ago to this very day (and even slightly into the future). The fact that the book seeks to explain the central driving themes of human progress gave me hope that this would be a history book that eschewed traditional formats.

Read More

Make An Effort, Not An Excuse

Badwater Ultramarathon: The Ultimate in No Excuses

Overcoming the “I would, but…” response

There must be a universal law prohibiting the discussion of New Year’s Resolutions after the first three weeks of January. By that point, even the most determined of us have lost our resolve and our collective shame renders the subject taboo. Naturally, this means that I decided the best time to discuss my goals for the year was in the middle of summer. For such an extremely data-driven person — there isn’t a product on my desk that didn’t get a 4.5 customer rating or above — it may be surprising that I forgo specific quantifiable resolutions. That is, I don’t outline a certain number of tasks that I need to finish by a certain date. Instead, I like to think of yearly themes. If this sounds a little abstract, and too much like some self-help magazine headline (“The year of holistic harmony”), stick with me. The theory behind a yearly principle is straightforward; it is not the infrequent, large decisions that dictate the course of our lives, but the countless everyday choices we make without a second thought. Where we go to college might matter in the long run, but our options for college were dictated by thousands of smaller decisions leading up to the decisive moment, such as whether or not we decided to study for that extra hour, or if we took the few minutes to fill out a scholarship form. Keeping a yearly theme in mind means we can make these daily decisions not by mere habit, but in the context of a guiding precept.

To give this concept some firm backing, consider some past yearly themes: “efficiency” (don’t work more, work more effectively); “relax” (this test, project, game, etc. will not dictate the rest of your life); and “failure” (accept mistakes as a chance to learn). If the yearly theme is learning from failures rather than trying to avoid them, then we can implement this by being more willing to try new things or take on difficult challenges that carry a chance we won’t succeed. Likewise, a yearly theme of efficiency means I might lock my computer down from every program except Word (or in reality Sublime Text 3 or Microsoft Visual Studio) for 4 hours a day and get all my work done in that period instead of spending 10 hours a day on my computer where half of that time is spent on various distraction-delivery platforms. We don’t need to repeat the mantra over and over, or use it for every single decision (trying to pick out a breakfast cereal that agrees with the theme of relaxation could be a real conundrum), but by keeping an overarching principle in mind, we can subtly influence the everyday decisions that over time determine the course of our lives.

Read More