This weekend I decided to dive into the Categorical Feature Encoding Challenge II that Kaggle is hosting so I could learn some things about using categorical features in machine learning models. For this challenge, the dataset contained only categorical features, spanning several kinds: binary, nominal, and ordinal. On top of that, some of the features had low cardinality, while others had high cardinality. (In the context of machine learning, “cardinality” refers to the number of distinct values a feature can take.)
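To make that concrete, here is a quick sketch of checking cardinality with pandas; the column names and values are made up for illustration, not the actual challenge columns:

```python
import pandas as pd

# Toy frame standing in for the challenge data (hypothetical values).
df = pd.DataFrame({
    "bin_0": [0, 1, 0, 1],                     # binary: 2 possible values
    "nom_0": ["red", "blue", "red", "green"],  # nominal, low cardinality
    "ord_0": [1, 2, 3, 1],                     # ordinal
})

# Cardinality = number of distinct values per feature.
print(df.nunique())
```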
We’re starting to use SMS at work to communicate with customers, and we needed a tool for sending SMS messages and checking the history of calls and messages to the phone number we send from. With that in mind, I coded a simple frontend to the Twilio API. The frontend is a page with four navigation tabs: send a single SMS, send SMS in bulk, check the status of sent messages, and fetch the inbound call/SMS log.
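The frontend code itself isn’t shown here, but the tabs boil down to a couple of calls like these against Twilio’s Python helper library; the credentials and phone numbers below are placeholders:

```python
from twilio.rest import Client

# Placeholder credentials from the Twilio console.
client = Client("ACCOUNT_SID", "AUTH_TOKEN")

# "Send a single SMS" tab: one messages.create() call.
message = client.messages.create(
    to="+15551234567",     # placeholder recipient
    from_="+15557654321",  # placeholder sending number
    body="Hello from the SMS tool!",
)
print(message.sid, message.status)

# "Inbound log" tab: list messages sent *to* our number.
for msg in client.messages.list(to="+15557654321", limit=20):
    print(msg.from_, msg.date_sent, msg.body)
```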
Continuing to practice my Python skills, I decided to try modeling the Telco Customer Churn dataset from Kaggle. Churn is when customers end their relationship with a company (e.g., by cancelling their subscription to a service). Since companies want to retain customers, understanding and preventing churn is naturally an important goal.
I wanted to try out some of what I’ve learned with Python for data science. I thought: Why not try it on the Kaggle Titanic challenge?
I was inspired to participate in Kaggle’s 2019 Data Science Bowl. In this post I link to some of my exploratory analysis and predictive models.
I made an algorithmic trader in R based on the “moving average crossover” technical indicator and backtested it on TSLA stock. It did not perform well, but it was an interesting challenge that involved some time-series analysis.
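The post’s implementation is in R, but the core signal is compact enough to sketch in Python with pandas; the window lengths here are arbitrary choices, not necessarily the ones from the post:

```python
import pandas as pd

def crossover_signals(close: pd.Series, short: int = 20, long: int = 50) -> pd.Series:
    """+1 while the short moving average is above the long one, else -1."""
    fast = close.rolling(short).mean()
    slow = close.rolling(long).mean()
    # During the warm-up period the averages are NaN, so the comparison
    # is False and the signal defaults to -1 (flat/short).
    return (fast > slow).astype(int).replace(0, -1)

# Naive backtest sketch on hypothetical daily closes:
# signal = crossover_signals(tsla_close)
# strategy_returns = tsla_close.pct_change() * signal.shift(1)
```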
Using body measurement data from the National Health and Nutrition Examination Survey (NHANES), I created a model that predicts Gildan t-shirt sizes from height and weight.
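The model itself isn’t described in this teaser, but the shape of the problem is simple: two inputs in, one size label out. A minimal sketch of that idea with scikit-learn, using made-up rows in place of the NHANES data:

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up (height_cm, weight_kg) -> size rows, standing in
# for the NHANES-derived training data.
X = [[165, 60], [170, 68], [175, 75], [185, 95], [190, 105]]
y = ["S", "M", "M", "L", "XL"]

model = DecisionTreeClassifier().fit(X, y)
print(model.predict([[180, 85]]))  # size prediction for a new person
```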
A mini-tutorial on web scraping with R.
I wrote an R package for the Nutritionix API to do nutrition analysis on foods and recipes.
An analysis of the data science job market, based on scraped LinkedIn data.