A few weeks ago we hosted our Data Science Hackathon, which has become an annual Geolytix tradition; this was its 4th iteration. Previous hackathons focussed on regression, segmentation and analogue models, and this year the goal was for each team to build a binary classification model.
The Goal
The main aim of the day is for everyone to try something new in a no-pressure environment, and hopefully learn something along the way! As for the exercise, this year’s goal was to accurately predict the results of direct marketing campaigns for a Portuguese banking institution (a bit different to our usual spatial questions), using a classification approach of each team’s choosing. A feature matrix was provided, though some feature engineering was required and a few red herrings were thrown into the mix. The features included details on the customers themselves, wider market conditions, responses to previous campaigns and other factors that might help determine how likely an existing customer is to purchase a particular product.
A host of different techniques were used, and it was great to see so much variety, with each team trying something new! Below are some of the approaches:
- Logistic regression
- Various boosting models (e.g. AdaBoost, CatBoost, LightGBM)
- Neural networks
- Direct outputs from GPT-4o
- TabNet
All models were then applied to the same hold-out set. Impressively, every team managed to build a working model in just a single day, including visualisations of the feature contributions behind the predictions for 3 specific hold-out records.
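For the simplest model on the list, logistic regression, per-record contributions can be sketched without any extra tooling: multiply each coefficient by how far the record's feature value sits from the training mean. Those contributions, added to the log-odds of an "average" record, reconstruct the model's own log-odds, and this coincides with SHAP values for a linear model when features are treated as independent. This is a minimal sketch only: the feature names, coefficients, intercept and means are hypothetical, and boosting models or neural networks would need a different attribution method (such as SHAP) rather than this shortcut.

```python
import math

# HYPOTHETICAL fitted logistic regression: the feature names, coefficients,
# intercept and training means below are invented for illustration only.
COEFFICIENTS = {"prev_success": 1.2, "num_contacts": -0.3, "rate_index": -0.8}
INTERCEPT = -2.0
TRAIN_MEANS = {"prev_success": 0.1, "num_contacts": 2.5, "rate_index": 1.0}


def sigmoid(z: float) -> float:
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(record: dict[str, float]) -> float:
    """Model output on the log-odds scale for one record."""
    return INTERCEPT + sum(coef * record[name] for name, coef in COEFFICIENTS.items())


def baseline_log_odds() -> float:
    """Log-odds of an 'average' record sitting exactly at the training means."""
    return log_odds(TRAIN_MEANS)


def contributions(record: dict[str, float]) -> dict[str, float]:
    """Per-feature push on the log-odds, relative to the average record."""
    return {
        name: coef * (record[name] - TRAIN_MEANS[name])
        for name, coef in COEFFICIENTS.items()
    }


# Explain one (made-up) hold-out record, largest contributions first.
example = {"prev_success": 1.0, "num_contacts": 1.0, "rate_index": 0.5}
for name, value in sorted(contributions(example).items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>14}: {value:+.2f} log-odds")

# The baseline plus the contributions reconstructs the model's own log-odds.
total = baseline_log_odds() + sum(contributions(example).values())
print(f"P(subscribe) = {sigmoid(total):.3f} (model: {sigmoid(log_odds(example)):.3f})")
```

Sorting by absolute value gives the ordering a bar-style contribution chart would typically use, with positive values pushing the prediction towards "yes" and negative values towards "no".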
We presented the approaches, results and learnings back at our monthly Data Science Forum. Discussion points included which metrics were most suitable for measuring the success of our models. This is an especially important question for a problem like this, where finding the customers who say “yes” matters more than avoiding calls that end in a “no”: calls tend to be short (so a wasted call is low-risk), while a customer subscribing to a product can be profitable for the bank. In our sample only ~1 in 10 customers subscribed, so the classes were heavily imbalanced towards not subscribing, and a model that simply predicted “no” for everyone would already be ~90% accurate. With this in mind, a combination of recall, F1 score and the Matthews correlation coefficient (MCC) was judged the most appropriate measure of model quality in our case.
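To see why accuracy alone can mislead on imbalanced data like this, here is a minimal sketch that computes the metrics from a confusion matrix using only the standard library. The `ConfusionMatrix` class is a hypothetical helper, and the counts are invented to mirror a ~1 in 10 subscription rate; they are not the hackathon results.

```python
import math
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # predicted "yes", customer subscribed
    fp: int  # predicted "yes", customer did not subscribe (a wasted call)
    fn: int  # predicted "no", customer subscribed (a missed sale)
    tn: int  # predicted "no", customer did not subscribe

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def precision(self) -> float:
        predicted_yes = self.tp + self.fp
        return self.tp / predicted_yes if predicted_yes else 0.0

    def recall(self) -> float:
        # Share of actual subscribers that the model finds
        actual_yes = self.tp + self.fn
        return self.tp / actual_yes if actual_yes else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0

    def mcc(self) -> float:
        # Matthews correlation coefficient, using the common convention of
        # returning 0 when any row or column of the matrix is empty
        denom = math.sqrt(
            (self.tp + self.fp) * (self.tp + self.fn)
            * (self.tn + self.fp) * (self.tn + self.fn)
        )
        return (self.tp * self.tn - self.fp * self.fn) / denom if denom else 0.0


# Invented counts for 1,000 customers, 100 of whom subscribe (~1 in 10).
always_no = ConfusionMatrix(tp=0, fp=0, fn=100, tn=900)
candidate = ConfusionMatrix(tp=70, fp=130, fn=30, tn=770)

for label, cm in [("always no", always_no), ("candidate", candidate)]:
    print(
        f"{label:<10} accuracy={cm.accuracy():.2f} recall={cm.recall():.2f} "
        f"f1={cm.f1():.2f} mcc={cm.mcc():.2f}"
    )
```

With these made-up counts the "always no" baseline has the higher accuracy (0.90 vs 0.84) but scores 0 on recall, F1 and MCC, while the candidate scores 0.70, 0.47 and 0.42. Recall captures how many subscribers are found, while F1 and MCC also penalise a model that finds them by calling everyone.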
Next year will be our fifth hackathon, so ideas on a postcard for our next challenge!
Whilst this is a little different from our usual geospatial work, if any of it sounds up your street, we’re also currently hiring for a Data Scientist position. Please see here for details: https://geolytix.com/blog/could-you-be-our-next-data-scientist/

Author: Danny Hart, Head of Data Science at GEOLYTIX
Main Image: Photo by Matt Ridley on Unsplash