Computer, compute to the last digit the value of pi.
Welcome to Mobify's data guide! We have put together a list of readings that will be useful for getting started with any data set.
This is an open-source guide intended to gather feedback from people who have worked with data teams. At Mobify, we work closely with people from a wide variety of backgrounds.
We hope that opening up some of our onboarding materials will give you a taste of our style of work, and help candidates preparing for interviews or data hackathons.
We group the resources into three types:
- Articles: expect around 10-15 minutes of reading time
- Tutorials: expect at least half a day of exercises
- Advanced references (optional readings): reading time varies
At a minimum, we recommend reading the articles and completing the following:
- the Python + Pandas tutorial
- setting up your environment by following Setting up your data dojo, then running some practice exercises
This is a curated list of what we consider the minimal set of resources for getting started on data challenges.
- Getting started
- Data Science 101
- Engineering tools 101
- Setting up your data dojo
- Think about the problem
See CONTRIBUTING.md for contribution guidelines.
So you would like to work with data, eh? There are many great resources to get you started on that path. We recommend the following articles:
- Quora's answer: How can I become a data scientist?
  - Gives a good overview of the background and readings that will be helpful
  - We dive into a few of these articles in the following sections
- Applying the Scientific Method to Software Engineering
  - A good article explaining the intersection between academia and real-world engineering
If you don't come from a statistics or machine learning background, these are good starting points:
- Statistics for hackers: a basic list of readings covering the statistics knowledge you will need.
- Machine Learning for hackers: good coverage of various aspects of machine learning.
- Scikit-learn estimator map: our go-to place for picking the right model.
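A "hacker" approach to statistics often means replacing textbook formulas with simulation. As one illustration (our own example, not necessarily what the linked reading covers), here is a minimal permutation test in plain Python: it shuffles the pooled data many times to see how often a difference in means at least as large as the observed one appears by chance. The helper name `permutation_test` and the sample numbers are hypothetical.

```python
import random
from statistics import mean


def permutation_test(a, b, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in means of a and b.

    Hypothetical helper: shuffling the pooled values simulates the null
    hypothesis that both groups come from the same distribution.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        # Split the shuffled pool back into groups of the original sizes
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    return count / n_shuffles


# Made-up example data: page load times (seconds) for two variants
variant_a = [1.2, 1.4, 1.1, 1.3, 1.5, 1.2]
variant_b = [1.6, 1.8, 1.7, 1.5, 1.9, 1.7]
print(permutation_test(variant_a, variant_b))
```

A small p-value says the observed gap would rarely arise from shuffling alone; the appeal is that the same loop works for medians or any other statistic without a new formula.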
Learning to code is an important step in becoming data literate. There are 3 main engineering tools we use.
At Mobify, we are a Python shop, so our analysis focuses on Python + Pandas. Below are some of our favourite tutorials to get started:
- DataQuest/Data scientist is a good introduction to Python and Pandas.
- (advanced) Pandas with Seaborn is a simple article on how to make various Seaborn plots for data visualization.
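To give a taste of where these tutorials lead, here is a minimal Pandas sketch (our own illustration, not taken from the tutorials): it groups a small, made-up table of page views and computes the average load time and view count per page. The column names and numbers are hypothetical.

```python
import pandas as pd

# Made-up example data: one row per page view
views = pd.DataFrame({
    "page": ["home", "product", "home", "cart", "product", "home"],
    "load_time_s": [1.2, 2.0, 1.4, 0.9, 1.8, 1.0],
})

# Average load time and number of views per page, slowest pages first
summary = (
    views.groupby("page")["load_time_s"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

The split-apply-combine pattern (`groupby`, then an aggregation) covers a large share of day-to-day analysis questions.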
SQL is used everywhere.
- The Codecademy SQL course is our favourite tutorial.
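If you want to practise SQL without installing a database server, Python's built-in `sqlite3` module is one option (our suggestion, not part of the course). The table name `page_views` and its data are hypothetical; the query is a typical first aggregation exercise.

```python
import sqlite3

# In-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, load_time_s REAL)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("home", 1.2), ("product", 2.0), ("home", 1.4),
     ("cart", 0.9), ("product", 1.8), ("home", 1.0)],
)

# Count views and average load time per page, slowest pages first
rows = conn.execute(
    """
    SELECT page, COUNT(*) AS views, AVG(load_time_s) AS avg_load
    FROM page_views
    GROUP BY page
    ORDER BY avg_load DESC
    """
).fetchall()
for page, n_views, avg_load in rows:
    print(f"{page}: {n_views} views, {avg_load:.2f}s average")
conn.close()
```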
Being comfortable with the command line will help a great deal in your work. We recommend the Codecademy command line course for this.
Git solves two big communication challenges of working as a team:
- Coordinating how multiple people work on the same piece of code, each on their own computer. For example, our branching strategy helps us organize code.
- Code review and pull requests on GitHub. For example, see a pull request on this repo.
The Codecademy git course is our recommended way to learn git.
So, are you ready to get started? One thing we have found to correlate with how well interview candidates perform is how comfortable they are with the environment they will use during the interview. Here are a few tips.
Also, see the Disclaimer: Mobify is a Python shop, so our data dojo is likely to be Python-focused. Our tool of choice is Jupyter Notebook.
Data Science workbench is a great way to get started. It gives you a hosted version of the notebook, and its onboarding is useful.
If you want to set up a self-hosted version of Jupyter, check out this tutorial.
Shortcut keys for Jupyter will make you a Jupyter pro.
Most of us take pride in diving into our problems and presenting our solutions. Over time, we have learned a few tools for aligning colleagues and fellow hackers with our thinking. Here are a few:
If I had an hour to solve a problem I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions - Albert Einstein
Learning to work on the right problem is a surprisingly difficult skill. Here are a few tips:
- Whiteboarding and canvasing are great ways to open our minds. Learn more in Introducing the Deep Learning Canvas, a variation on the Startup Canvas. You can print it out, or grab a whiteboard and draw it.
- Data Design sprint: keep an open mind. We also enjoy a minimal version, The 25-Minute Design Sprint, which we find helpful to adjust and adapt to our needs.
I'm not a great programmer; I'm just a good programmer with great habits - Kent Beck
Writing a readable notebook and explaining the results is a great habit. Clean code in Jupyter notebooks is our go-to guide on how to create a clean notebook.
We would like you to keep your analysis reproducible.
Reproducibility is important because it is the only thing that an investigator can guarantee about a study. -- Roger Peng
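One small habit that helps: fix random seeds and record the versions your results depend on. Below is a minimal sketch using only the standard library; `run_experiment` is a hypothetical stand-in for any analysis step that uses randomness.

```python
import platform
import random
import sys


def run_experiment(seed):
    """Hypothetical analysis step that depends on randomness."""
    rng = random.Random(seed)  # local RNG, independent of global random state
    sample = [rng.random() for _ in range(5)]
    return sum(sample) / len(sample)


# Same seed, same result: anyone re-running the notebook gets our numbers
first = run_experiment(seed=42)
second = run_experiment(seed=42)
print(first == second)  # prints True

# Record the environment alongside the results
print(f"Python {sys.version.split()[0]} on {platform.platform()}")
```

Using a dedicated `random.Random(seed)` instance rather than seeding the global generator keeps one cell's randomness from silently affecting another's.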
We are an engineering-focused data shop, and we are opinionated towards easy-to-get-started tools that work well with our stack (e.g. Python, Jupyter Notebook). This is what we have found works well for us.
We have no affiliation to any of the companies mentioned in this list.