Unveiling Common Misconceptions of “Democratizing” Data Science/Machine Learning
Published 2020-06-15 in Data Science

Are you aspiring to be a Data Scientist or a Machine Learning expert? Then allow me to cut through the hype & clear up some of the misconceptions you might already be entangled in.

Cover image: A man facepalming while seated on a couch. Photo by Nik Shuliahin on Unsplash.

Around four years ago, it was a specific YouTube video, “MarI/O - Machine Learning for Video Games”, that piqued my interest in Artificial Intelligence & Machine Learning. As an avid gamer with an academic background in Economics, I thought to myself, “Oh, I already have half of the skills required to make Mario do stuff like this on his own.”

You see, that was the first misconception I had about Machine Learning (or Data Science in general). Little did I know what Reinforcement Learning was, where & how it was used. But did I care? Nope. All I thought about was creating my own ML model.

The lesson here? Misconceptions arise from half-baked knowledge & a lack of curiosity to dig further down the rabbit hole. [1]

Regardless, I taught myself to code & now I provide my expertise in Computer Vision & ML to my clients.

What You Can Expect From This Article

Looking at recent trends, AI is definitely on the rise, and it would be unwise to miss out on the abundant employment opportunities it currently offers. Luckily, it’s easier now than ever before to dive into the field of Machine Learning or Data Science: with thousands of online resources available, you can learn ML all by yourself.

But with ease of access to learning resources comes a major caveat.

You learned to code, and you just figured out what an SVM is & how to tune its hyper-parameters to achieve great results. All well & good, but now ask yourself: do you have any clue how to apply your newfound knowledge & skills to real-world problems?

If you’re stumped for an answer, well then, give the rest of the article a quick read.

Through this article, I hope to clear up certain misconceptions you might have before you dive into the field of Machine Learning.
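To make the SVM & hyper-parameter example concrete, here is a minimal, dependency-free sketch of what a one-line library call usually hides: a linear SVM trained by stochastic subgradient descent on the regularized hinge loss, plus a small grid search over its regularization strength `lam` and learning rate `lr`. The synthetic two-cluster data, the function names and the grid values are illustrative assumptions, not anything from the article; in real projects you would reach for a tested library such as scikit-learn instead.

```python
import random

def make_blobs(n_per_class, seed):
    """Illustrative data: two Gaussian clusters at (-2, -2) and (2, 2), labelled -1 and +1."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((-1, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y

def train_linear_svm(X, y, lam, lr, epochs=50, seed=0):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w.x_i + b)))
    with stochastic subgradient descent. Labels must be -1 or +1."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            if margin < 1:
                # Hinge term is active: step along both the regulariser and the data term.
                w = [wj - lr * (lam * wj - y[i] * xj) for wj, xj in zip(w, X[i])]
                b += lr * y[i]
            else:
                # Only the regulariser contributes: shrink the weights.
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def accuracy(w, b, X, y):
    predictions = [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1 for x in X]
    return sum(p == t for p, t in zip(predictions, y)) / len(y)

def grid_search(X_train, y_train, X_val, y_val, lams, lrs):
    """Try every (lam, lr) pair and keep the one with the best validation accuracy."""
    best = None
    for lam in lams:
        for lr in lrs:
            w, b = train_linear_svm(X_train, y_train, lam, lr)
            score = accuracy(w, b, X_val, y_val)
            if best is None or score > best[0]:
                best = (score, lam, lr)
    return best

X_train, y_train = make_blobs(100, seed=1)
X_val, y_val = make_blobs(50, seed=2)
best_score, best_lam, best_lr = grid_search(
    X_train, y_train, X_val, y_val, lams=[0.001, 0.01, 0.1], lrs=[0.01, 0.1]
)
print(f"val accuracy={best_score:.2f}, lam={best_lam}, lr={best_lr}")
```

Knowing why `margin < 1` triggers the data term, or what `lam` trades off, is exactly the kind of understanding a tuned score alone does not demonstrate.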

Watch & Learn From the Humble Giants

Andrew Ng & Jeremy Howard, two of the pioneers in the field of Machine Learning, have been humble enough to share their knowledge without any monetary compensation in return. Their motive behind doing so? Just to make the world a better place by helping the community thrive & enabling them to be more employable sometime in the near future.

Years ago, when I wrote my first print("Hello World") statement in Python, their resources were invaluable in building the foundation for the aspirations I have right now. But was that enough to actually be employable in the industry?

Not at all.

By nature, online learning resources such as boot camps, MOOCs, and video tutorials are marketed as “Teach Yourself To Code”. True to their word, that’s exactly what you invest your time & money in. You learn how to code, not why to code in a very specific manner or style.

This creates a problem of its own: students take these courses hoping they’ll be enough to land a job in real life. But if that were the case, the 3M+ students who take a Udemy course alone would be more than enough to fill the vacant ML jobs in the industry.

So, obviously, MOOCs or other self-paced online courses won’t cut it for you.

Here’s why.

The Pitfalls of Democratizing Machine Learning Educational Resources

The field of Machine Learning, or Data Science in general, is dominated primarily by academics. The trend appears to be changing gradually for the better, thanks to certain academics who’re willing to share their knowledge with the community.

This philanthropic approach has enabled hundreds of thousands of people around the globe to seek employment opportunities in the field. We can also see how fast the field is advancing, in part because these academics decided to share their knowledge with the rest of the world.

Put simply, democratizing is good for the community as a whole.

But “Democratize Data Science & Machine Learning” is a buzzword right now anyway. Everyone wants to create a tutorial or two for the community. So why point it out specifically, you might wonder?

Well, because I don’t want the community to be misguided.

To illustrate, let me give you an example of a real-life incident described by another writer, Rahul Agarwal.

In his article “Don’t Democratize Data Science”, Rahul Agarwal described an interview with a candidate who was experienced & perhaps self-taught too. [2]

Quoting from his article:

…He explained the higher-level concepts well enough that I decided to dig a little deeper into his mathematical understanding of the techniques he had applied in his projects. And that was where things changed…

His account of the interview is a prime example of someone who relied solely on skimming through tutorials, building one project after another & assembling a sound portfolio. Suffice it to say, just learning how to code isn’t going to take you far enough.

What can we learn from this context?

You might be an all-star programmer but if you fail to understand basic underlying concepts, you won’t progress far enough to be employable.

Why is that?

You see, many businesses operate on a shoestring budget. Besides, production environments are volatile, susceptible not just to market competition but to the whims of consumers as well. This puts businesses in an unfavourable position, forcing them to pay attention to the tiniest details of the product.

Anything from speeding up the data pipeline to making inferences in milliseconds can make the difference between a huge profit and bankruptcy. Having an eagle eye for such precision is what Rahul tried to convey through his article.
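“Inferences in milliseconds” only means something once you measure it. Here is a minimal sketch, using only Python’s standard library, that reports the median and an approximate 95th-percentile latency of any prediction function. `toy_predict` and its weights are hypothetical stand-ins for a real model, and a single-process timing loop like this ignores network, serialization and batching costs that matter in production.

```python
import statistics
import time

def measure_latency_ms(predict_fn, inputs, warmup=10):
    """Return (median, p95) latency in milliseconds of predict_fn over inputs."""
    # Warm-up calls let caches and lazy initialisation settle before timing.
    for x in inputs[:warmup]:
        predict_fn(x)
    timings = []
    for x in inputs:
        start = time.perf_counter()
        predict_fn(x)
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    # Nearest-rank style approximation of the 95th percentile.
    p95 = timings[min(len(timings) - 1, int(0.95 * len(timings)))]
    return statistics.median(timings), p95

# Hypothetical stand-in for a trained model: a linear scorer over 100 features.
WEIGHTS = [0.5] * 100

def toy_predict(features):
    return sum(w * f for w, f in zip(WEIGHTS, features)) > 0

inputs = [[i * 0.01] * 100 for i in range(1000)]
median_ms, p95_ms = measure_latency_ms(toy_predict, inputs)
print(f"median={median_ms:.4f} ms, p95={p95_ms:.4f} ms")
```

Tail latency (p95 or p99) is usually what users notice, which is why the sketch reports it alongside the median.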

But You Don’t Always Have To Have an Eagle Eye

Offering a contrasting view to Rahul’s is Caleb Kaiser, another expert in the field of Machine Learning & an entrepreneur. He advocates making ML software available to those who need it most, especially developers who aren’t Data Scientists. His article “Deep Learning Isn’t Hard Anymore” is a wonderful report on production-capable ML software.

In the context of this write-up, here’s an excerpt from his article:

…people within the community develop libraries and projects that abstract common utilities away until the tooling is capable and stable enough to be used in production.

*At this stage, the engineers using it to build software are not concerned about sending HTTP requests or connecting to databases — all of that is abstracted away — and are solely focused on building their product.* [3]

The key phrase here — “all of that is abstracted away”.

As an entrepreneur, he’s answerable not just to his employees but also to his financial backers. For him, figuring out the underlying concepts behind the inner workings of a product could be a waste of time. Considering his position in the company, any time he spends learning something unrelated to his business is money lost for the company. He could just hire an expert in the craft to do it instead.

My point being, it’s fine to lack knowledge of the underlying concepts of Machine Learning if your priority is, say, running a company. But if you’re looking for employment as a Machine Learning expert, well, you gotta do what you’re expected to do.

AutoML Isn’t The Evil Genius, Ready To Take Away Your Job

I really don’t understand the logic behind speaking out against automation. Why get your hands dirty when the job could be automated efficiently, freeing up your time to work on something else? Regardless, I’d guess that protests against automation go back to when mankind first started innovating technology.

Image: The Leader of the Luddites, a hand-coloured etching (1812). [4]

Read up on the Luddites, the early 19th-century English textile workers who destroyed the machinery they feared would replace them, and on the modern Neo-Luddism movement named after them. Heck, there’s even a term in Economics for the fear of technological unemployment: the Luddite Fallacy.

Speaking more about Luddism would be out of scope for this article, so that’s a topic for another day.

Anyway, let me direct your attention to Rahul’s article once again. Here’s something he mentioned that I can’t bring myself to agree with:

*…The availability of such packages has led a lot of people to think that data science could be fully automated, eliminating the need for data scientists altogether. Or, if the processes can’t be automated, these tools will allow anyone to become a data scientist*. [2]

I believe he’s concerned about the advent of AutoML software like MindsDB, a concern I find completely baseless.

MindsDB is an amazing piece of open-source AutoML software, by the way, so do check it out.

The misconception that anyone can create Neural Nets using AutoML software arose after Sundar Pichai announced Google’s AutoML to a global audience. In his pitch, he stated that AutoML would enable our community to design Neural Nets, a skill previously held only by a few people with a Ph.D.

And I quote him:

Today, designing neural nets is extremely time-intensive, and requires an expertise that limits its use to a smaller community of scientists and engineers. That’s why we’ve created an approach called AutoML, showing that it’s possible for neural nets to design neural nets. We hope AutoML will take an ability that a few PhDs have today and will make it possible in three to five years for hundreds of thousands of developers to design new neural nets for their particular needs. [5]

Did you notice that he says AutoML is meant for developers? Think about what that means.

To give an analogy, suppose I own a company whose app makes your picture look older using Generative Adversarial Networks (GANs). There’s one position available to develop the app further, and I could hire either someone with a Ph.D. or an experienced software developer. Sure, the Ph.D. holder would have a deeper understanding of how exactly a GAN works underneath, while the developer would prefer using a framework like Keras to abstract away building the GAN & focus on perfecting how the app works.

If I want my company to stay profitable, should I hire someone with a Ph.D. or a developer?

Without a doubt, I would place my bets on the developer to deliver me a production-capable product on time, considering the constraints I’m under.

Personal opinion aside, abstractions in technology are unavoidable. At some point, repetitive tasks like data augmentation & data cleaning NEED to be abstracted away to save developer time.
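As a small illustration of that kind of abstraction, here is a sketch of a reusable data-cleaning pipeline in plain Python: each step is an ordinary function over a list of records, and `make_pipeline` hides the order in which they run, so a caller only sees `clean(rows)`. The step names and sample records are hypothetical; libraries such as pandas or scikit-learn’s `Pipeline` offer far more complete versions of the same idea.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]
Step = Callable[[List[Record]], List[Record]]

def strip_whitespace(rows: List[Record]) -> List[Record]:
    """Trim leading/trailing spaces from every string value."""
    return [{k: v.strip() if isinstance(v, str) else v for k, v in row.items()} for row in rows]

def drop_missing(rows: List[Record]) -> List[Record]:
    """Discard records with any None or empty-string value."""
    return [row for row in rows if all(v not in (None, "") for v in row.values())]

def drop_duplicates(rows: List[Record]) -> List[Record]:
    """Keep the first occurrence of each identical record, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def make_pipeline(*steps: Step) -> Step:
    """Compose cleaning steps so callers only see a single function."""
    def run(rows: List[Record]) -> List[Record]:
        for step in steps:
            rows = step(rows)
        return rows
    return run

clean = make_pipeline(strip_whitespace, drop_missing, drop_duplicates)

raw = [
    {"name": " Ada ", "city": "London"},
    {"name": "Ada", "city": "London "},
    {"name": "Alan", "city": ""},
    {"name": None, "city": "Paris"},
]
print(clean(raw))  # [{'name': 'Ada', 'city': 'London'}]
```

Step order matters here: stripping whitespace first is what lets the two “Ada” records be recognised as duplicates.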

And Finally, Wrapping Up

The field of Data Science & ML is extremely broad with specific requirements even within its subfields! There’s no way an individual can become a true master of Data Science in one lifetime. But fortunately, being a jack of all trades & master of none in the field of Data Science can be good. You just need to apply the right skill to the right situation.

We’re living at an amazing point in time.

The advancements made in the field of AI & ML can at times be overwhelming, no doubt. Many of us aspire to make the most of this opportunity financially. You might be happy with just a full-time job, while your friend might want to start their own ML venture. As I’ve mentioned previously, each of your approaches will be different.

So know what you want to do in the future & figure out the right approach for your aspirations.

Besides, we should remind ourselves from time to time of the benefits such fast-paced developments have brought to us as a community. Change is good; we just need to know how to mould ourselves to the ever-changing technological environment out there.

References

[1] Imarticus Nirmal, 10 Common Misconceptions About Machine Learning, Data Science Association (2018)

[2] Rahul Agarwal, Don’t Democratize Data Science, Towards Data Science (2020)

[3] Caleb Kaiser, Deep Learning Isn’t Hard Anymore, Towards Data Science (2020)

[4] The Leader of the Luddites (hand-coloured etching), Luddite, Wikipedia (1812)

[5] Rachel Thomas, Google’s AutoML: Cutting Through the Hype, fast.ai (2018)