Introduction
This week, I’m taking a break from the HDB resale flat pricing model and turning my attention to a hotter topic: mobile phone plans. Some of my colleagues noted an arbitrage opportunity in Singtel’s offer for the iPhone XS / XS Max on its Combo 12 plan. The idea...
HDB Resale Flat Dataset - Feature Engineering III: Categorical Features
Introduction
Thus far, I have written about ways to transform and encode numeric features, and to geocode and develop clusters of resale flats. These two steps generated new categorical features. In this post, I will demonstrate several ways to encode categorical features for our machine learning model.
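As a rough illustration of what encoding a categorical feature can look like, here is a minimal sketch of two common schemes, one-hot and ordinal encoding. This excerpt does not say which encodings the post uses, so both the choice of schemes and the `flat_types` values below are assumptions made purely for illustration.

```python
# Hypothetical categorical column; the values are illustrative only
# and are not taken from the HDB resale flat dataset.
flat_types = ["3 ROOM", "4 ROOM", "5 ROOM", "4 ROOM"]


def one_hot_encode(values):
    """Return (categories, rows); each row has a 1 in its category's position."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


def ordinal_encode(values, order):
    """Map each value to its rank in an explicitly supplied order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[value] for value in values]


categories, one_hot_rows = one_hot_encode(flat_types)
# The ordering passed here is an assumption: it treats larger flats as higher ranks.
ordinal_codes = ordinal_encode(flat_types, ["3 ROOM", "4 ROOM", "5 ROOM"])
```

One-hot encoding makes no assumption about order but adds one column per category, while ordinal encoding keeps a single column and only makes sense when the categories have a natural ranking.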
HDB Resale Flat Dataset - Feature Engineering II: Geocoding
Introduction
In my second post on HDB resale flat prices, I attempted to price resale flats in Jurong West by creating clusters of flats. The big idea behind this methodology was the realisation that it is impossible to explicitly account for all qualitative reasons why home buyers would want to...
HDB Resale Flat Dataset - Feature Engineering I: Numeric Features
Introduction
In this post, I demonstrate two broad techniques for engineering numeric features in the HDB resale flat dataset: data transformation and binning. The objective of engineering numeric features is to convert them into a form that facilitates machine learning.
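To make the two techniques concrete, here is a minimal sketch of a log transformation and fixed-edge binning. The `floor_areas` values and the `bin_edges` cut points are hypothetical, and the log transform is only one possible transformation; the post itself may use different ones.

```python
import math
from bisect import bisect_right

# Hypothetical floor areas in square metres (illustrative values only).
floor_areas = [45.0, 67.0, 93.0, 110.0, 146.0]

# Data transformation: a natural log compresses large values and
# reduces right skew. The inputs must be strictly positive.
log_areas = [math.log(area) for area in floor_areas]

# Binning: assumed cut points that split the range into four bins:
# below 60, 60 to under 90, 90 to under 120, and 120 or above.
bin_edges = [60.0, 90.0, 120.0]


def assign_bin(value, edges):
    """Return the index of the bin that `value` falls into."""
    return bisect_right(edges, value)


area_bins = [assign_bin(area, bin_edges) for area in floor_areas]
```

The transformed values keep the original ordering, whereas binning deliberately discards within-bin differences in exchange for a coarser, more robust feature.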
(Re-)Exploring HDB Resale Flat Data in 17 Graphs
A New Approach
About a year ago, I published my first post on data&stuff. I applied econometric techniques to develop three least squares regression models to explain HDB resale flat prices. A year on, I’m revisiting the expanded dataset (which now includes an additional year of data) with new skills and...
Preventing Training Data Contamination
Contamination?
Lately, I’ve been working a lot on the Kaggle Titanic competition. To pick up good data science practices, I’ve been taking the time to work through many issues in my machine learning pipeline. One issue that I discovered in the pipeline was contamination. Contamination occurs when you include information...
A Fresh Start!
WordPress to GitHub, R to Python.
It’s been a while since I’ve posted on my data&stuff (WordPress) blog. The reason is that I’ve come to the realisation that while R is extremely useful, Python is the future. I spent the past few weeks picking up Python: I did several online courses and I’m currently working on...