Most of my programming career has been focused on keeping things simple and eschewing premature abstractions, a philosophy aptly summarized by: “duplication is far cheaper than the wrong abstraction.”

Existing code exerts a powerful influence. Its very presence argues that it is both correct and necessary. We know that code represents effort expended, and we are very motivated to preserve the value of this effort. And, unfortunately, the sad truth is that the more complicated and incomprehensible the code, i.e. the deeper the investment in creating it, the more we feel pressure to retain it (the “sunk cost fallacy”). It’s as if our unconscious tells us “Goodness, that’s so confusing, it must have taken ages to get right. Surely it’s really, really important. It would be a sin to let all that effort go to waste.”

I’m not a data scientist, but I’ve been trained as one (as part of my computer science degree) and I’ve worked in a bunch of related fields. I wholly endorse this take by Vicki Boykis on the state of data science and what you need to be successful at it:

1. Learn SQL

2. Learn a programming language extremely well and learn programming concepts

3. Learn how to work in the cloud