The reality of data science
It’s Friday evening; another week comes to a close and the weekend can start. Yet somehow, I can’t fully disconnect. That’s usually what happens after a long week filled with focus time, deep brainstorming sessions and lots of data. I like weeks like this: I got into this field to work on complex problems and spend my days solving them. But that doesn’t make them any less exhausting.
What strikes me the most on this Friday evening is the contrast between how people talk about data science on the Internet and how the job truly is on a day-to-day basis. When I finished my PhD and looked to pivot to data science, I spent a lot of time reading about the field and learning all about the technical aspects (statistics, machine learning, deep learning, etc.). In terms of raw knowledge, I learnt a lot during that time, which gave me the confidence that I could tackle any problem. And I wasn’t wrong: so far, that knowledge has served me well and has gotten me quite far on a number of topics I have encountered in my day-to-day work. Yet there is an entire aspect to this job you just don’t learn from blog articles: the business side.
As a data scientist, your value depends on the impact of your analyses and models on business outcomes, whether that impact is direct (e.g., a customer-facing predictive model) or as an enabler of data-driven decisions (e.g., enhancing planning through forecasting). The area where most of us technical contributors struggle is communication. Communicating your results to stakeholders on the business side can be quite difficult. It sometimes feels like speaking a different language, yet your entire career depends on doing it well. This is true everywhere, including (especially?) at large companies like IKEA. But that’s not what got me today. I’m quite lucky to have a large team of fellow data scientists and engineers with whom I can debate and discuss technical results, so that we can refine our message before it reaches our stakeholders. This really makes my life easier and is a great way to continuously improve that skill.
No, the reason I’m writing this blog post on a Friday evening is that sometimes, an analysis that’s been running for weeks ends up yielding a disappointing result. If I’ve learnt anything during my time in academia, it’s that negative results are as important as positive ones. Yet in a business setting, it’s all too easy to think you must have good news constantly. This post is about how that just isn’t true. Let me give you some more context.
I’ve been working for about a year now on a tool to disaggregate high-level revenue forecasts into forecasted basket demand (whose aggregate matches the high-level forecast). If we only converted a revenue forecast into a forecast of item demand, this would be your usual reconciliation problem for hierarchical forecasts. But our tool adds an extra layer of sophistication: by their very nature, baskets encode correlations between items sold together in a given area, which is valuable information. So our data product is both a data pipeline and a complex optimization algorithm. And it works well!
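To make that baseline concrete, here is a minimal sketch of what a naive top-down disaggregation could look like: the aggregate forecast is split across items in proportion to their historical revenue shares, so the result sums back to the high-level number. The names and numbers are hypothetical, and this is only an illustration, not our actual pipeline or algorithm.

```python
# Minimal sketch of a naive top-down split (hypothetical names, toy data).
import pandas as pd

def naive_disaggregate(revenue_forecast: float,
                       historical_item_revenue: pd.Series) -> pd.Series:
    """Split an aggregate revenue forecast over items.

    historical_item_revenue: revenue per item over some reference period,
    indexed by item id. The output sums back to revenue_forecast, which is
    the reconciliation property mentioned above.
    """
    shares = historical_item_revenue / historical_item_revenue.sum()
    return revenue_forecast * shares

# Example usage with made-up numbers.
history = pd.Series({"item_a": 120.0, "item_b": 60.0, "item_c": 20.0})
print(naive_disaggregate(1_000.0, history))
```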
Over the past month, we’ve been investigating using this algorithm for a specific use case, relying on secondary and tertiary basket properties obtained from our disaggregation. The end goal was to compare this output to the naive reconciliation method I mentioned before. So far, this has been some of the most interesting work I’ve done at IKEA, data science-wise. It involved reconciling various sources of data (generated by different models, queried from different systems, etc.) and comparing model performance across multiple dimensions, all the while keeping an eye on the business requirements. Phrased this way, it doesn’t sound too hard: just join some tables, make a few plots and find out what happened, right? Alas, life is often a lot more fickle than that. What I ended up doing for the better part of two weeks was deep diving into the data to figure out where tiny discrepancies were coming from: why are there no sales originating from this unit? Why are the total sales from one source so different from another?1 It takes quite some investigative skill to explain all these data quality issues, yet dealing with them is paramount to the overall quality of the analysis. Don’t get me wrong, I actually love this part. Surmounting all these data issues is often the best way to truly understand your domain and gather novel insights. And that’s what I’m being paid for!
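For a flavour of what that kind of check can look like, here is a minimal sketch (with made-up table and column names) that flags units whose total sales differ between two sources by more than a relative tolerance. The real investigation obviously goes much further than this.

```python
# Minimal sketch: compare total sales per unit between two sources and flag
# large relative gaps. Table and column names are hypothetical.
import pandas as pd

def compare_sources(source_a: pd.DataFrame, source_b: pd.DataFrame,
                    tolerance: float = 0.01) -> pd.DataFrame:
    """Return units whose totals differ by more than `tolerance` (relative)."""
    totals_a = source_a.groupby("unit")["sales"].sum().rename("sales_a")
    totals_b = source_b.groupby("unit")["sales"].sum().rename("sales_b")
    merged = pd.concat([totals_a, totals_b], axis=1).fillna(0.0)
    denominator = merged[["sales_a", "sales_b"]].max(axis=1).replace(0.0, 1.0)
    merged["rel_diff"] = (merged["sales_a"] - merged["sales_b"]).abs() / denominator
    return merged[merged["rel_diff"] > tolerance]
```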
This brings me to what I wanted to discuss. Today, after more than two weeks of analysis, I had to conclude that our nice, sophisticated method underperforms against the naive baseline method.2 That’s a bummer, and tonight I feel bummed out. Not because I necessarily care what method we end up using (spoiler alert: I don’t, as long as it works), but because my teammates and I have invested a lot of time and energy in this project, and sometimes you just need a win. You just want all that work to amount to something that will be used by others and bring value. Alas, for this use case, it didn’t go that way.
So what’s next?
As I said above, tonight I’m bummed, and that’s okay. But Monday will come, and I will dust myself off and keep going. Because there’s some upside to this:
- This wasn’t the main use case for our product, only a tangential one. So not all is lost!
- We learnt a lot about our model during this analysis and fixed some critical bugs.
- The simplest model won. That’s actually good news! Simple models are cheap and easy to explain.
- Diving deep into the data made it clear what works and what doesn’t, so we know what to work on next.
I said before that in a business setting, it feels like you must always go to your stakeholders with good news. I mean, otherwise, how am I bringing value? Well, I was hired for my expertise and my time. That doesn’t mean every idea will automatically work out, because there is a certain degree of uncertainty in the research part of data science. What’s important is to learn from what didn’t work, to find some value even in a failed experiment or a disappointing model performance. That’s the essence of research, and learning how to do this is how you can drive meaningful business impact.
Footnotes
1. One of the harshest lessons I learnt in the past 16 months is that companies, no matter how large, will be very chaotic in how they handle data. That is true even for something as crucial as what and how much we are selling…
2. To be fair, simple methods often tend to work surprisingly well, which is why they should always be investigated first!