Data Science Renee

"Welcome to my messy workshop, let me show you behind the scenes!"

This is my blog / portfolio. Here are some of my favorite posts in recent memory:

Modeling cognitive impairment using NHANES data

In this blog post, I build a machine learning model to predict possible cases of cognitive impairment / dementia in a population of individuals over the age of 60. The data for this model come from the 2013-2014 cycle of NHANES (the National Health and Nutrition Examination Survey), a nationally representative, cross-sectional survey of health in the US. As an outcome measure, I create a composite index of cognition by combining scores from the Animal Fluency and Digit Symbol Substitution tasks.
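The excerpt doesn't say exactly how the two task scores are combined, so this is only a sketch of one common approach: standardize each task's scores to z-scores and average them, so neither task dominates just because it is scored on a larger scale. It assumes the two score lists are aligned by participant and contain no missing values, and it ignores NHANES survey weights. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscores(values: Sequence[float]) -> List[float]:
    """Standardize scores to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("scores have zero variance; cannot standardize")
    return [(v - mu) / sd for v in values]


def composite_cognition(animal_fluency: Sequence[float],
                        digit_symbol: Sequence[float]) -> List[float]:
    """Equal-weight average of the two standardized task scores, one value per participant.

    Hypothetical helper: assumes both lists are aligned by participant,
    contain no missing values, and are unweighted (no survey weights).
    """
    if len(animal_fluency) != len(digit_symbol):
        raise ValueError("score lists must be aligned by participant")
    z_af = zscores(animal_fluency)
    z_dss = zscores(digit_symbol)
    return [(a + d) / 2 for a, d in zip(z_af, z_dss)]
```

Equal weighting is the simplest choice; a lower composite means poorer performance on both tasks relative to the rest of the sample.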

How many people live near a landmark / point-of-interest?

It can be useful to know how many people live near a landmark / point-of-interest (POI). For example, a location is often considered “walkable” if you can reach it on foot in 10 minutes or less. Counting the people who live near a POI is one way to estimate how many people are within walking distance of it, assuming they walk there from home. In this post, I start with a POI, “Times Square, NYC”, and use the Census API to find out how many people live within the census tract that contains it (a tract is one of the smallest geographic units for which the Census provides population estimates).
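Once the tract containing the POI is known, the population lookup is a single Census Data API query. The sketch below assumes the 2019 ACS 5-year endpoint and the `B01003_001E` (total population) variable; the tract code itself is left as a placeholder, since finding it (for example with the Census Geocoder) is a separate step not shown here. The helper names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: 2019 ACS 5-year estimates; other vintages with tract-level data work the same way.
ACS_BASE = "https://api.census.gov/data/2019/acs/acs5"
TOTAL_POP = "B01003_001E"  # ACS variable: total population (estimate)


def tract_population_url(state: str, county: str, tract: str) -> str:
    """Build a Census API query for the total population of one census tract."""
    params = {
        "get": f"NAME,{TOTAL_POP}",
        "for": f"tract:{tract}",
        "in": f"state:{state} county:{county}",
    }
    # Keep ':' and ',' readable; the space between state and county becomes %20.
    return f"{ACS_BASE}?{urlencode(params, safe=':,', quote_via=quote)}"


def parse_population(payload: str) -> int:
    """The API returns a JSON array of rows; the first row holds the column names."""
    rows = json.loads(payload)
    header, first_row = rows[0], rows[1]
    return int(first_row[header.index(TOTAL_POP)])


def tract_population(state: str, county: str, tract: str) -> int:
    """Fetch and parse the tract population (makes a network call)."""
    with urlopen(tract_population_url(state, county, tract)) as resp:
        return parse_population(resp.read().decode("utf-8"))


# Usage: Times Square is in New York County (Manhattan), state FIPS "36", county FIPS "061".
# "TRACT_CODE" is a placeholder for the six-digit tract code from the geocoding step.
# print(tract_population("36", "061", "TRACT_CODE"))
```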

Identifying pneumonia from chest x-rays using EfficientNet

In my last post, I used EfficientNet to identify plant diseases. I was surprised at how well this pre-trained model worked with so few modifications, and I was curious how well an approach like this might generalize to other image classification problems. In this post, I use a similar approach to identify childhood pneumonia in chest x-ray images from the Chest X-Ray Images (Pneumonia) dataset on Kaggle. With this approach, I achieved 97% accuracy, 97% precision, and 97% recall.
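Accuracy, precision, and recall all come from the same binary confusion matrix but answer different questions. A minimal sketch, assuming pneumonia is treated as the positive class (the excerpt doesn't say which class was positive):

```python
from typing import Dict


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a metric is undefined (e.g. no positive predictions).
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision and recall from binary confusion-matrix counts.

    Assumption: pneumonia is the positive class.
    """
    return {
        # Share of all x-rays classified correctly.
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        # Of the x-rays flagged as pneumonia, the share that really were pneumonia.
        "precision": _ratio(tp, tp + fp),
        # Of the actual pneumonia cases, the share the model caught.
        "recall": _ratio(tp, tp + fn),
    }
```

Because precision and recall depend on which class counts as positive, it's worth reporting that choice alongside the numbers.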

Identifying plant diseases with EfficientNet

As I continue to practice using TensorFlow for image recognition tasks, I thought I would experiment with the Plant Pathology dataset on Kaggle. Like MNIST, this is an image recognition challenge, but in contrast to the simplicity of MNIST, it requires making “fine-grained” visual discriminations: the images are larger and in RGB color, and the features that distinguish the classes are smaller and more nuanced. I ran into a few obstacles because the task was so compute-intensive.
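A quick back-of-the-envelope comparison shows why larger RGB images cost so much more than MNIST digits. This sketch assumes images are resized to EfficientNetB0's default 224×224 input; the resolution actually used in the post isn't stated, and larger EfficientNet variants expect larger inputs.

```python
def input_values(height: int, width: int, channels: int) -> int:
    """Number of raw input values (pixels x channels) the network sees per image."""
    return height * width * channels


mnist = input_values(28, 28, 1)        # grayscale MNIST digit
effnet_b0 = input_values(224, 224, 3)  # RGB image at EfficientNetB0's default 224x224 resolution
print(mnist, effnet_b0, effnet_b0 // mnist)  # 784 150528 192
```

Each image carries 192 times as many input values, before counting the extra activations those values produce inside the network.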

MNIST digit recognition using a convolutional neural net (CNN)

I’ve been working my way through the TensorFlow in Practice Specialization on Coursera, where I’m learning how to use neural networks to solve problems like image recognition. I decided to take a break from the course and apply what I’ve learned so far to one of the Kaggle competitions. MNIST is a database of 70,000 images of handwritten digits (60,000 for training and 10,000 for testing), and the usual goal is to train a model that can recognize which digit each image shows.
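The core operation of a CNN's convolutional layers is simple: slide a small kernel over the image and take a weighted sum at each position. A minimal pure-Python sketch, assuming "valid" padding (the Keras `Conv2D` default), so a 28×28 MNIST image and a 3×3 kernel produce a 26×26 feature map. Like deep learning libraries, it technically computes cross-correlation (the kernel is not flipped). The hand-picked edge-detection kernel is hypothetical; in a real CNN the kernel weights are learned during training.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` without padding and return the weighted sums."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw patch whose top-left corner is (i, j).
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


# Hypothetical vertical-edge kernel and a tiny 5x5 image that is dark on the left, bright on the right.
edge_kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
image = [[0, 0, 0, 1, 1] for _ in range(5)]
print(conv2d_valid(image, edge_kernel))  # [[0, -3, -3], [0, -3, -3], [0, -3, -3]]
```

The kernel responds only where its window overlaps the dark-to-bright boundary and outputs zero over the flat region.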

Why White Men win at work

I’ve been collaborating with Matt Wallaert (@mattwallaert) on a mini research study to better understand why White Men win at work. Past research has found that, compared to Women and People of Color, White Men are more likely to take risks at work and to reap the rewards of “failing upward.” But they’re not more talented; they’re just more confident. What responsibility do organizations and leaders have to make everyone feel safe taking risks and feel that it’s OK to fail sometimes?

An interactive ML tool for predicting heart disease

In a previous blog post ("Modeling the UCI Heart Disease dataset"), I trained a model to predict the presence of heart disease. So I have a model; now what? Machine learning models like this can be put to work generating predictions on new inputs, and they’re also great for simulations. Let’s say we wanted to know the likelihood of heart disease for a 60-year-old male with a serum cholesterol of 244 mg/dl and a resting blood pressure of 88 mm Hg.
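The simulation idea can be sketched as: take one patient's inputs, vary a single feature, and record the model's predicted probability each time. Because the trained model from the earlier post isn't shown here, the sketch uses a HYPOTHETICAL logistic stand-in with made-up weights (not fitted to any data) purely so the code runs; any function that maps a feature dict to a probability could be swapped in. Feature names follow the UCI dataset's `age`, `sex`, `chol`, and `trestbps` columns, with `sex = 1` meaning male.

```python
import math
from typing import Callable, Dict, Iterable, List, Tuple

Features = Dict[str, float]


def placeholder_predict_proba(x: Features) -> float:
    """HYPOTHETICAL stand-in for a trained model: a logistic function with
    made-up weights, used only so this sketch runs. Not fitted to any data."""
    weights = {"age": 0.05, "sex": 0.8, "chol": 0.004, "trestbps": 0.01}
    bias = -6.0
    z = bias + sum(w * x[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def sweep(predict_proba: Callable[[Features], float], base_case: Features,
          feature: str, values: Iterable[float]) -> List[Tuple[float, float]]:
    """Vary one feature of `base_case` and record the predicted probability for each value."""
    results = []
    for v in values:
        case = dict(base_case)  # copy so the original patient record is not mutated
        case[feature] = v
        results.append((v, predict_proba(case)))
    return results


# The patient described above: 60-year-old male, cholesterol 244, resting blood pressure 88.
patient = {"age": 60, "sex": 1, "chol": 244, "trestbps": 88}
print(placeholder_predict_proba(patient))
for chol, p in sweep(placeholder_predict_proba, patient, "chol", [200, 244, 300]):
    print(chol, round(p, 3))
```

Keeping the model behind a plain `predict_proba`-style function means the same sweep works for whatever model is actually trained.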