Hello! "Welcome to my messy workshop. Let me show you behind the scenes!" (Data Science Renee)
This is my blog and portfolio. Here are some of my favorite recent posts:
In this blog post, I build a machine learning model to predict possible cases of cognitive impairment or dementia in a population of individuals aged 60 and over. The data for this model come from the 2013-2014 cycle of NHANES (National Health and Nutrition Examination Survey), a nationally representative, cross-sectional survey of health in the US. As the outcome measure, I create a composite index of cognition by combining scores from the Animal Fluency and Digit Symbol Substitution tasks.
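One common way to combine two tests scored on different scales is to standardize each one (z-scores) and average them. The sketch below assumes that approach; the post may weight or combine the tasks differently. The function names are hypothetical and the scores are made up, not real NHANES records.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize scores to mean 0 and SD 1 (population SD)."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


def composite_cognition(animal_fluency, digit_symbol):
    """Average the two z-scored task scores for each participant.

    Both lists must be aligned by participant; higher means better cognition.
    (Hypothetical helper: an equal-weight z-score average is an assumption.)
    """
    if len(animal_fluency) != len(digit_symbol):
        raise ValueError("score lists must have the same length")
    za = z_scores(animal_fluency)
    zd = z_scores(digit_symbol)
    return [(a + d) / 2 for a, d in zip(za, zd)]


# Hypothetical scores for five participants (not real NHANES data).
animal = [12, 18, 20, 9, 16]
dsst = [30, 52, 61, 22, 45]
composite = composite_cognition(animal, dsst)
print([round(c, 2) for c in composite])
```

Participants with low composite scores are the candidates a model like this would flag; where to put the cut-off is a separate modeling decision.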
The Pareto Principle says that “for many events, roughly 80% of the effects come from 20% of the causes”. What if you could get 80% of the enjoyment from a TV show by watching only the top 20% of its episodes?
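Picking "the top 20% of episodes" is a ranking-and-cutoff problem. The sketch below assumes per-episode ratings are a reasonable proxy for enjoyment (an assumption; the ratings here are invented) and returns the selected episodes in airing order so they can be watched in sequence. The function name is hypothetical.

```python
def top_share_episodes(ratings, share=0.2):
    """Return 1-based episode numbers of the top `share` of episodes by rating,
    sorted back into airing order."""
    if not ratings:
        return []
    # Round to the nearest whole episode, but always keep at least one.
    n_top = max(1, round(len(ratings) * share))
    ranked = sorted(range(len(ratings)), key=lambda i: ratings[i], reverse=True)
    return sorted(i + 1 for i in ranked[:n_top])


# Hypothetical per-episode ratings for a 10-episode season.
ratings = [7.9, 8.4, 9.6, 7.2, 8.1, 9.8, 7.5, 8.0, 9.1, 7.7]
print(top_share_episodes(ratings))
```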
It can be useful to know how many people live near a landmark or point of interest (POI). For example, a location is often considered “walkable” if you can reach it on foot in 10 minutes or less, so counting the people who live near a POI is one way to estimate how many people are within walking distance of it from home. In this post, I start with a single POI, “Times Square, NYC”, and use the Census API to find out how many people live in the census tract that contains it (a tract is one of the smallest geographic subdivisions for which the Census provides population estimates).
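Once the tract containing the POI is known, its population can be requested by FIPS codes. The sketch below assumes you already have the tract's 11-digit GEOID (2-digit state, 3-digit county, 6-digit tract) and builds an ACS 5-year query for the total-population variable `B01003_001E`. The survey year, the repeated `in` predicates, and the helper names are assumptions to check against the Census Data API docs, and the example GEOID is hypothetical (not necessarily the Times Square tract). No network call is made here.

```python
from urllib.parse import quote, urlencode

# Assumption: ACS 5-year endpoint; swap in the vintage you actually need.
ACS_BASE = "https://api.census.gov/data/{year}/acs/acs5"


def parse_tract_geoid(geoid):
    """Split an 11-digit census tract GEOID into (state, county, tract) FIPS codes."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"not an 11-digit tract GEOID: {geoid!r}")
    return geoid[:2], geoid[2:5], geoid[5:]


def tract_population_url(geoid, year=2019):
    """Build a query URL for total population (B01003_001E) in one tract."""
    state, county, tract = parse_tract_geoid(geoid)
    params = [
        ("get", "NAME,B01003_001E"),
        ("for", f"tract:{tract}"),
        ("in", f"state:{state}"),
        ("in", f"county:{county}"),
    ]
    # Keep ':' and ',' readable instead of percent-encoding them.
    return ACS_BASE.format(year=year) + "?" + urlencode(params, safe=":,", quote_via=quote)


def population_from_response(rows):
    """Read the population from a JSON response whose first row is the header."""
    header, first = rows[0], rows[1]
    return int(first[header.index("B01003_001E")])


# Hypothetical tract GEOID in New York County (state 36, county 061).
print(tract_population_url("36061011300"))
```

The response can be fetched with any HTTP client and passed, once decoded from JSON, to `population_from_response`.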
In my last post, I used EfficientNet to identify plant diseases. I was surprised at how well this pre-trained model worked with so few modifications, and I was curious how well this approach would generalize to other image classification problems. In this post, I use a similar approach to identify childhood pneumonia from chest X-ray images, using the Chest X-Ray Images (Pneumonia) dataset on Kaggle. With this approach, I achieved 97% accuracy, 97% precision, and 97% recall.
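Accuracy, precision, and recall all come from the same four confusion-matrix counts. The sketch below computes them with pneumonia as the positive class; the counts are hypothetical and are not the post's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, and recall for a binary classifier.

    tp/fp/tn/fn are confusion-matrix counts with pneumonia as the positive class.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # of images predicted as pneumonia, share that truly are
    recall = tp / (tp + fn)     # of true pneumonia cases, share the model caught
    return accuracy, precision, recall


# Hypothetical counts for illustration only.
acc, prec, rec = classification_metrics(tp=90, fp=10, tn=80, fn=20)
print(round(acc, 3), round(prec, 3), round(rec, 3))
```

For a screening task, recall is worth watching closely, since every false negative is a missed case.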
As I continue to practice using TensorFlow for image recognition tasks, I thought I would experiment with the Plant Pathology dataset on Kaggle. Like MNIST, this is an image recognition challenge, but in contrast to the simplicity of MNIST, it requires making “fine-grained” visual discriminations: the images are larger and in RGB color, and the features are smaller and more nuanced. I ran into a few challenges because the task was so compute-intensive.
I’ve been working my way through the TensorFlow in Practice Specialization on Coursera, learning how to use neural networks to solve problems like image recognition. I decided to take a break from the course and apply what I’ve learned so far to one of the Kaggle competitions. MNIST is a database of 70,000 handwritten digits (60,000 training and 10,000 test images), and the usual goal is to train a model that recognizes which digit each image shows.
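Before a network can train on the Kaggle version of MNIST, each flat CSV row has to become a 28x28 image. The sketch below assumes the Digit Recognizer `train.csv` layout (a `label` column followed by 784 pixel values from 0 to 255) and uses only the standard library; with NumPy this would usually be a `reshape` plus a division by 255. The helper name is hypothetical.

```python
IMG_SIZE = 28


def row_to_image(row):
    """Split a Kaggle-style MNIST row into (label, 28x28 image scaled to [0, 1]).

    Assumes the row is [label, pixel0, ..., pixel783]; test.csv rows have no label.
    """
    label, pixels = int(row[0]), [int(p) for p in row[1:]]
    if len(pixels) != IMG_SIZE * IMG_SIZE:
        raise ValueError(f"expected {IMG_SIZE * IMG_SIZE} pixels, got {len(pixels)}")
    scaled = [p / 255.0 for p in pixels]  # normalize intensities to [0, 1]
    # Pixels are stored row by row, so slice every 28 values into one image row.
    image = [scaled[r * IMG_SIZE:(r + 1) * IMG_SIZE] for r in range(IMG_SIZE)]
    return label, image


# Hypothetical row: digit 7 with only the bottom-right pixel fully "inked".
label, image = row_to_image([7] + [0] * 783 + [255])
```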
I’ve been working on a mini research study in collaboration with Matt Wallaert (@mattwallaert) to better understand why White Men win at work. Past research has found that, compared to Women and People of Color, White Men are more likely to take risks at work and reap the rewards of “failing upward.” But they’re not more talented; they’re just more confident. What responsibility do organizations and leaders have to make everyone feel safe taking risks and feel that it’s OK to fail sometimes?
In a previous post ("Modeling the UCI Heart Disease dataset"), I trained a model to predict the presence of heart disease. So I have a model; now what? Machine learning models like this can generate predictions for new inputs, and they’re great for running simulations, too. Let’s say we wanted to know the likelihood of heart disease for a 60-year-old male with a cholesterol value of 244 mg/dl and a resting blood pressure of 88 mm Hg.
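For a logistic regression model, a prediction is the sigmoid of an intercept plus a weighted sum of the inputs. The sketch below scores the example patient and runs a small simulation that varies cholesterol. The coefficients and intercept are made up for illustration (they are not the fitted values from the earlier post), and only four inputs are included; a real model would use all of its predictors, with categorical ones such as chest pain type encoded appropriately.

```python
import math


def predict_probability(features, coefficients, intercept):
    """Logistic-regression probability: sigmoid(intercept + sum(w_i * x_i))."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# HYPOTHETICAL coefficients for illustration only; feature names follow the
# UCI columns (sex: 1 = male, chol in mg/dl, trestbps in mm Hg).
coefficients = {"age": 0.03, "sex": 1.2, "chol": 0.004, "trestbps": 0.015}
intercept = -7.0

patient = {"age": 60, "sex": 1, "chol": 244, "trestbps": 88}
p = predict_probability(patient, coefficients, intercept)
print(round(p, 3))

# Simulation: vary cholesterol while holding the other inputs fixed.
simulated = {
    chol: predict_probability({**patient, "chol": chol}, coefficients, intercept)
    for chol in (200, 244, 300)
}
for chol, prob in simulated.items():
    print(chol, round(prob, 3))
```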
Does the growth in COVID-19 cases have anything to do with Big 5 personality traits? To answer this question, I compute country-level aggregates of Big 5 test scores and a country-level measure of the “growth” in coronavirus cases over time, using data current as of March 20, 2020.
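The analysis has three parts: average the trait scores within each country, summarize each country's case growth as one number, and correlate the two. The sketch below assumes growth is measured as the average daily exponential growth rate between two cumulative counts; the post may define growth differently. All inputs and helper names are hypothetical.

```python
import math
from collections import defaultdict


def country_means(responses):
    """Average an individual-level trait score within each country."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for country, score in responses:
        totals[country] += score
        counts[country] += 1
    return {c: totals[c] / counts[c] for c in totals}


def daily_growth_rate(cases_start, cases_end, days):
    """Average exponential growth rate per day between two cumulative counts."""
    return (math.log(cases_end) - math.log(cases_start)) / days


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical inputs: (country, trait score) responses, and
# (first-day cases, last-day cases, days elapsed) per country.
responses = [("A", 3.1), ("A", 3.5), ("B", 2.8), ("B", 3.0), ("C", 3.9), ("C", 3.7)]
cases = {"A": (10, 400, 14), "B": (10, 150, 14), "C": (10, 900, 14)}

traits = country_means(responses)
countries = sorted(traits)
growth = {c: daily_growth_rate(*cases[c]) for c in countries}
r = pearson_r([traits[c] for c in countries], [growth[c] for c in countries])
print(round(r, 3))
```

With real data there are far more countries than this; a correlation over only a handful of points, as here, is very fragile.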
Using logistic regression, I trained a machine learning model to predict heart disease from 303 observations and 14 attributes (e.g., age, sex, chest pain type, resting ECG).
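Training logistic regression means finding weights that minimize log-loss. The dependency-free sketch below does this with batch gradient descent on a tiny, made-up dataset of two standardized features; it is not the post's code, and a library such as scikit-learn's `LogisticRegression` (which applies L2 regularization by default) is the usual choice in practice. The learning rate and epoch count are arbitrary assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on mean log-loss.

    X: list of feature rows (already scaled); y: list of 0/1 labels.
    Returns (weights, bias).
    """
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, target in zip(X, y):
            # Gradient of log-loss w.r.t. the linear score is (prediction - label).
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - target
            for j, xj in enumerate(row):
                grad_w[j] += error * xj
            grad_b += error
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Class labels from predicted probabilities."""
    return [int(sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) >= threshold) for row in X]


# Tiny hypothetical dataset (not UCI data).
X = [[-1.5, -1.0], [-1.0, -0.5], [-0.5, -1.2], [0.5, 1.0], [1.0, 0.4], [1.4, 1.3]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(predict(X, w, b))
```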