Mobify Data Guide

                                                .ⁿ─
                  `:-.                        Γ     ¼  
                 -yyyyys+:`                  ╛       ╕
                /y╔▓▓yyyyy`  +o+/:-.        á         ╕
              `o▄▓▓▓▓▓▓▄y-  :hhhhhhhs      ╒           -     
             `s╙▀▀░▓▓░▀▀y+   yhhhhhhhy    ╒                ,.   
         `-/oyyyyyy▓▓yyo   :hhhhhhhhs                    ⌂╞    ½    
      -/oyyyyyyyyyy▓▓yo`   shhhhhhhhs    ┘              /  k     ½   
    +yyyyyyyyyyyyyy▓▓+    `hhhhhhhhho   ╛              ⌂     ½     ï   
    +yyyyyyyyyyyyyy▓▓     .hhhhhhhhho  ┘              ;        ½     -   
    `yyyyyyyyys+/. ▓▓      yhhhhhhhhhs`              ;           Y     ╚
     -yso+/-.`     ▓▓      .yhhhhhhhhhy.            ⌂              ╘      \,   
             ```...▓▓`      `shhhhhhhhhy.         .                   -      ⁿ.
       -://++++++++▓▓++/-`    /yhhhhhhhhy.      .                        ^-     ~,   
       /+++++++++++▓▓+++++/.   `ohhhhhhy+`   .⌐                              ⁿ,   ▓▄
      ░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓░▓▓▓▓
       `+++++++++++▓▓++++++++++-`  -`                                             ▓▀
        `......-...▓▓+++++++++++/`
                   ▓▓`-/+++++++:`
                   ▓▓    `.:/+:`

Computer, compute to the last digit the value of pi.

Welcome to Mobify's data guide! We have put together a list of readings to help you get started working with any data set.

🤔 Why this guide?

This is an open-source guide intended to gather feedback from people who have worked with data teams. At Mobify, we work closely with talented people from a wide variety of backgrounds.

🔥 🤔 😎 🕐 🚀 💭 🍾 😈 ⚖ 💕

We hope that opening up some of our onboarding materials gives you a taste of our style of work and helps candidates prepare for interviews or data hackathons.

🔖 Legend

We denote each type of resource with an emoji: 📜 🏛 📚

📜 Articles - expect around 10-15 minutes of reading time
🏛 Tutorials - expect at least a half-day exercise
📚 Advanced references (optional readings) - reading time varies

What happens if I am preparing for an interview/hackathon tomorrow?

We recommend you at least go through the articles and then:

Take the Python + Pandas tutorial
Set up your environment following Setting up your data dojo and run some practice exercises

🕐 Content of this guide

This is a list of selected resources that we think form the minimal set needed to bootstrap your work on data challenges.

Getting started
Data Science 101
Engineering tools 101
Setting up your data dojo
Think about the problem

See CONTRIBUTING.md for contribution guidelines.

💭 Getting started

So you would like to work on data, eh? There are many great resources to get you started on the path to working with data. We recommend a few of these articles:

📜 Quora's answer - How can I become a data scientist?
- Gives a good overview of background and readings that would be helpful
- We dive into a few of these articles in the following sections
📜 Applying the Scientific Method to Software Engineering
- A good article explaining the intersection between academia and real-world engineering

🚀 Data Science 101

If you come from a non-statistics or non-machine-learning background, this is a good starting point.

📚 Statistics for hackers - a basic list of readings covering the statistics knowledge required.
🏛 Machine Learning for hackers - gives good coverage of various aspects of machine learning.
📜 Scikit-learn estimator map - my go-to place for picking the right model to use.

🚅 Engineering tools 101

Learning to code is an important step in becoming data literate. There are four main engineering tools we use.

Python + Pandas

At Mobify, we are a Python shop, so we focus our analysis on Python + Pandas. Below are some of our favourite tutorials to get started:

🏛 DataQuest/Data scientist is a good onboarding path for Python and Pandas.
(advanced) 📜 Pandas with Seaborn is a short article on how to make various Seaborn plots for data visualization.
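
To give a feel for what working in Pandas looks like, here is a minimal sketch of a group-and-aggregate step. The `visits` table, its column names, and its values are made up for illustration and do not come from any Mobify data set.

```python
import pandas as pd

# Hypothetical data: page views per device type (values invented for this example).
visits = pd.DataFrame(
    {
        "device": ["mobile", "desktop", "mobile", "tablet", "desktop", "mobile"],
        "page_views": [3, 5, 2, 4, 1, 6],
    }
)

# Split-apply-combine: total and mean page views per device,
# sorted so the device with the most page views comes first.
summary = (
    visits.groupby("device")["page_views"]
    .agg(["sum", "mean"])
    .sort_values("sum", ascending=False)
)

print(summary)
```

The same pattern (group, aggregate, sort) covers a large share of everyday exploratory questions, which is why the tutorials above spend time on it.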

SQL

SQL is used everywhere.

The 🏛 Codecademy SQL course is our favourite tutorial.
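
If you want to practise SQL without installing a database server, one option is Python's built-in `sqlite3` module. The sketch below assumes a hypothetical `orders` table with invented rows; it is only meant to show the shape of a typical aggregate query.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")

# Hypothetical rows, invented for this example.
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 12.5)],
)

# Total spend per customer, largest total first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

print(rows)
```

Using `?` placeholders rather than string formatting keeps the values separate from the query text, which is a good habit to build early.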

Command line

Being comfortable with the command line will help a great deal in your work. We recommend taking the 🏛 Codecademy command line course for this.

Git

Git solves two big communication challenges of working as a team:

Resolving how multiple people work on the same piece of code, each on their own computer. For example, we have a 📚 branching strategy which helps us organize code.
📜 Code review and 📜 pull requests on GitHub. For example, see a 📚 pull request on this repo.

The 🏛 Codecademy Git course is our recommended way to learn Git.

🚀 Setting up your data dojo

So, are you ready to get started? One thing we have found to correlate with how well interview candidates perform is how comfortable they are with the environment they will use during the interview. Here are a few tips.

Also, see the Disclaimer: Mobify is a Python shop, so our data dojo is likely to be Python focused. Our tool of choice is Jupyter notebook.

Hosted version

📚 Data Science workbench is a great way to get started. It gives you a hosted version of the notebook, and we found its onboarding useful.

Local setup (Advanced)

If you want to set up a self-hosted version of Jupyter, you might want to check out 🏛 this tutorial.

Getting familiar with Jupyter notebook

📜 Shortcut keys for Jupyter will make you a Jupyter pro.

😎 Think about the problem

Most of us are proud of diving into our problems and presenting our solutions. Over time, we have learned a few tools for aligning colleagues and fellow hackers with our thinking. Here are a few:

Focus on the right problem to work on

If I had an hour to solve a problem I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions ― Albert Einstein

It is a surprisingly difficult skill to learn how to work on the right problem. Here are a few tips:

📜 Focus on the problem and learn from customers
Whiteboarding and canvasing are great ways to open our minds. More at 📜 Introducing the Deep Learning Canvas, a variation on the Startup Canvas. You can print it out, or grab a whiteboard and draw it out.
Data 📜 Design sprint - Keep an open mind. We also enjoy a minimal version of this, 📜 The 25-Minute Design Sprint, which we find helpful to adjust and adapt.

Communicating the results

I'm not a great programmer; I'm just a good programmer with great habits - Kent Beck

Writing a readable notebook and explaining the result is a great habit. 📜 Clean code in Jupyter notebooks is our go-to guide on how to create a clean notebook.

We would also like you to keep your analysis reproducible.

Reproducibility is important because it is the only thing that an investigator can guarantee about a study. -- Roger Peng
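
One small habit that supports reproducibility is fixing the seed of any random step, so that rerunning a notebook gives the same result. The sketch below is an illustration only: `sample_rows` is a hypothetical helper, and the default seed of 42 is an arbitrary choice.

```python
import random


def sample_rows(rows, k, seed=42):
    """Return k rows chosen at random, deterministically for a given seed.

    Hypothetical helper for illustration; the default seed is arbitrary.
    """
    rng = random.Random(seed)  # local generator; global random state is untouched
    return rng.sample(rows, k)


rows = list(range(100))
first = sample_rows(rows, 5)
second = sample_rows(rows, 5)

# Same seed and same input give the same sample on every run.
print(first == second)
```

Passing the seed as a parameter, rather than calling `random.seed` globally, makes it explicit which steps are randomised and lets a reviewer rerun them with a different seed.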

😈 Disclaimer

We are a data shop with an engineering focus, and we are opinionated towards selecting easy-to-get-started tools that work well with our stack (e.g. Python, Jupyter Notebook). This is the approach we have found works well for us.

We have no affiliation to any of the companies mentioned in this list.
