Before We Begin
First, let me go ahead and say – by no means am I a machine learning engineer. I’ve done a few projects with machine learning before and want to get better at it, which is why I’ve wanted to do this tutorial for a while. So please don’t take this tutorial as completely correct, but rather as a learning tool for you and me both. I repeat, I am not a machine learning engineer – but from job descriptions and research, this is what I think a machine learning engineer would need to know. Let’s get started.
Breaking Down the Job Requirements Above
This particular posting was for a senior-level machine learning engineer. However, I thought it would provide a good goal for us to work towards throughout these tutorials.
I’m going to go through these requirements and share what I think when I read them – not necessarily with the mindset of a recruiter. If you happen to be a recruiter for these types of positions, please let me know and I’ll gladly update this post.
4-7 Years software development experience with highly scalable systems involving machine learning and big data
On a side note, I hate positions that have hard requirements for the number of years someone spends in the industry. I get that recruiters need a way to narrow down applicant fields immediately, but I believe applicants should be judged by their abilities and not the time in the field. Just my two cents.
Anyways, to me this says you need machine learning and big data experience – of course, right? “Highly scalable” also says that we should know how to arrive at results whether we’re processing as little as 50 MB of data or 300 GB+. We should have techniques in mind for both small data sets and much larger ones.
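To make the small-versus-large distinction a bit more concrete, here is a minimal sketch in plain Python. It assumes a hypothetical CSV with a numeric column called `value` (the column name and the sample data are made up for illustration). The first function loads everything into a list, which is fine for small files. The second keeps only a running total and a count, so memory use stays flat however big the file gets.

```python
import csv
import io


def mean_in_memory(csv_file, column="value"):
    """Fine for small files: load every value into a list, then average."""
    values = [float(row[column]) for row in csv.DictReader(csv_file)]
    return sum(values) / len(values)


def mean_streaming(csv_file, column="value"):
    """Better for huge files: keep only a running total and a count."""
    total = 0.0
    count = 0
    for row in csv.DictReader(csv_file):
        total += float(row[column])
        count += 1
    return total / count


# Tiny stand-in for a real file (hypothetical data).
# With a real file you would pass open("data.csv", newline="") instead.
sample = "id,value\n1,10\n2,20\n3,30\n"

print(mean_in_memory(io.StringIO(sample)))
print(mean_streaming(io.StringIO(sample)))
```

Both functions give the same answer. The difference only shows up once the file no longer fits comfortably in memory. Neither one handles an empty file, so treat this as a sketch rather than something production-ready.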
Expertise with data analysis languages
Sorry, friends of R and Scala, but Python is the bread and butter of my blog. So we’ll assume that we can get by with just Python. That said, learning a new coding language isn’t so bad once we already know one. We’ll brush up on Python and see how to use it with machine learning packages.
Experience with Hadoop and Spark – or equivalent cluster.
Don’t worry – I had to look these up too, so we’ll get through this together at some point in this tutorial. Apache Hadoop is a collection of open-source utilities that facilitates using a network of computers to work through massive data loads. Seems pretty powerful if we’re able to figure it out.
Apache Spark seems to be comparable, but it is built for speed: it keeps intermediate results in memory where it can, while Hadoop’s MapReduce writes them to disk between steps, which is slower but holds up well on very large batch jobs.
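I haven’t set up a real cluster yet, so the following is not Hadoop or Spark code. It is a toy sketch in plain Python of the map, shuffle, and reduce steps that Hadoop’s MapReduce model is built around, using the classic word count example. The function names are my own, and each string in `chunks` stands in for a piece of data that would live on a different machine.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_pairs):
    """Group counts by word, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for word, count in mapped_pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Add up the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


# Each string stands in for a chunk stored on a different machine (made-up data).
chunks = ["big data is big", "data is everywhere"]

mapped = []
for chunk in chunks:  # on a real cluster, the chunks would be mapped in parallel
    mapped.extend(map_phase(chunk))

word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```

The point is that the map step only ever looks at one chunk, which is what lets the work be spread across many computers. Here it all runs in a single loop on one machine.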
Experience with Cloud Computing
No-brainer, right? If we’re going to be heading up a project involving a huge amount of data, we’re going to need the security and scalability of a cloud platform. I’m far from knowledgeable about every service in AWS, Azure, or Google Cloud, but I’m going to gamble on AWS for this tutorial. With new services constantly being added to Amazon Web Services, I feel our time in this tutorial could best be spent on this platform.
Experience with Source Control
Tough decision here, but… We’re using Git. No question.
But in all seriousness, we’re going to need to understand and be good with Git in any developer job, in any context. I think Git should be the second thing you learn programming-wise – right after your programming language of choice.
Advanced Degree (MS)
Sorry Indeed Job board, I’m not going back to school any time soon. So, we’ll just have to be extraordinary with everything else listed.
I may be a little biased due to being a self-taught developer – but I think the skills are a lot more important than the degree.
But hey, I’m always up for going back to college if it’s paid for by the company. Right?
Let’s Start Learning
Since I’m learning these as we go too, I’ll use this space as an ongoing reference to the tutorials covering what I’ve learned along the way – coming back and editing as new posts go up. If you land here and we haven’t gotten to the tutorial you need, then beat me to making it and send me the link!
The X’s below represent tutorials we haven’t quite made it to yet, while checkmarks represent completed ones.