From 75ed35125a0e928549ef4e17228a80437028edbc Mon Sep 17 00:00:00 2001 From: Mohamed NIANG <45575893+Niangmohamed@users.noreply.github.com> Date: Sun, 31 Jan 2021 16:51:09 +0100 Subject: [PATCH] add te'chnical test for data scientist --- Technical_Test_For_Data_Scientist.ipynb | 876 ++++++++++++++++++++++++ 1 file changed, 876 insertions(+) create mode 100644 Technical_Test_For_Data_Scientist.ipynb diff --git a/Technical_Test_For_Data_Scientist.ipynb b/Technical_Test_For_Data_Scientist.ipynb new file mode 100644 index 0000000..b788811 --- /dev/null +++ b/Technical_Test_For_Data_Scientist.ipynb @@ -0,0 +1,876 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.7" + }, + "colab": { + "name": "Technical-Test-For-Data-Scientist.ipynb", + "provenance": [], + "collapsed_sections": [] + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "6bvGGOTC-hPS" + }, + "source": [ + "# Technical Tests for Data Scientists \n", + " By Mohamed Niang, Data Scientist / ML Engineer" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NRtbjdm9-hPb" + }, + "source": [ + "Presentation \n", + "\n", + "This notebook is a compilation of technical test exercises and answers for data scientists and Computer Scientist." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "McwCnKck-hPc" + }, + "source": [ + "## Exercise 0" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ORsrpzNz-hPc" + }, + "source": [ + "QUESTION\n", + "\n", + "Our client is a French bank with international subsidiaries.\n", + "\n", + "This client consults us to help him with the following problem: the bank is losing customers in its French market.\n", + "\n", + "Following a Data Science approach, formulate an answer to the following 3 questions:\n", + "\n", + "1.\tWhat are the questions to ask our client to understand his problem?\n", + "2.\tWhat data would you need to retrieve from the customer to conduct one or more analysis(s) on his problem?\n", + "3.\tDescribe the technical approach you would propose to our client to solve his problem. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oB3bxlAK-hPd" + }, + "source": [ + "ANSWERS\n", + "\n", + "1. The questions that can be asked to the client to understand the problem are :\n", + " - Characteristics of clients who have already churned (left the bank);\n", + " - the behaviors (predictors) of these clients before the churn occurred.\n", + "\n", + "\n", + "2. We need labeled client data consisting of their behaviors (predictors) and a binary churn response variable (yes or no).\n", + "\n", + "\n", + "3. The technical approach to solving the problem is the use of supervised learning methods to solve the problem (using the XGBoost model or logistic regression)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rP6QHlsG-hPd" + }, + "source": [ + "## Exercise 1" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CAiBrvUO-hPe" + }, + "source": [ + "Array Challenge\n", + "\n", + "Have the function ArrayChallenge(arr) read the array of numbers stored in arr which will contain a sliding window size, N, as the first element in the array and the rest will be a list of numbers. Your program should return the Moving Median for each element based on the element and its N-1 predecessors, where N is the sliding window size. The final output should be a string with the moving median corresponding to each entry in the original array separated by commas.\n", + "\n", + "Note that for the first few elements (until the window size is reached), the median is computed on a smaller number of entries. For example: if arr is [3, 1, 3, 5, 10, 6, 4, 3, 1] then your program should output \"1,2,3,5,6,6,4,3\"" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "AShK5KY0-hPe" + }, + "source": [ + "import pandas as pd\n", + "\n", + "def ArrayChallenge(arr):\n", + " \n", + " # code goes here\n", + " window = arr[0]\n", + " arg = pd.Series(arr[1:]) \n", + " opt = arg.rolling(window, min_periods=1).median().round().astype(int)\n", + " s = pd.Series(opt, dtype=\"string\")\n", + " res = s.astype(str).str.cat(sep=',')\n", + " \n", + " return res" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "MS0IylWX-hPf", + "outputId": "f7b2a5bb-fa3f-4993-a885-7eeb2b6210ef" + }, + "source": [ + "# keep this function call here \n", + "arr = [3,1,3,5,10,6,4,3,1]\n", + "print(ArrayChallenge(arr))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "1,2,3,5,6,6,4,3\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EY6XlU16-hPh" + }, + "source": [ + "## Exercise 2" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "P4JxMuLL-hPh" + }, + "source": [ + "Array Challenge\n", + "\n", + "Have the function ArrayChallenge(arr) take the array of numbers stored in arr and return the string true if any combination of numbers in the array (excluding the largest number) can be added up to equal the largest number in the array, otherwise return the string false. For example: if arr contains [4, 6, 23, 10, 1, 3] the output should return true because 4 + 6 + 10 + 3 = 23. The array will not be empty, will not contain all the same elements, and may contain negative numbers." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "L22RlIqM-hPi" + }, + "source": [ + "def ArrayChallenge(arr): \n", + " \n", + " # code goes here\n", + " largest_number = arr[0]\n", + "\n", + " for i in range(0, len(arr)): \n", + " if(arr[i] > largest_number): \n", + " largest_number = arr[i]\n", + "\n", + " sum = 0\n", + "\n", + " for elem in arr:\n", + " if elem != largest_number:\n", + " sum = sum + elem\n", + "\n", + " res = sum >= largest_number\n", + "\n", + " res = str(res).lower()\n", + "\n", + " return res\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "gza_kFGm-hPi", + "outputId": "b66b8830-a5e4-4376-86ee-9431caa0f27c" + }, + "source": [ + "# keep this function call here \n", + "arr = [4,6,23,10,1,3]\n", + "print(ArrayChallenge(arr))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "true\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aLqSdn5I-hPi" + }, + "source": [ + "# Exercise 3" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1rnW5qsY-hPj" + }, + "source": [ + "Array Challenge\n", + "\n", + "Have the function ArrayChallenge(arr) take the array of non-negative integers stored in arr, and determine the largest amount of water that can be trapped. The numbers in the array represent the height of a building (where the width of each building is 1) and if you imagine it raining, water will be trapped between the two tallest buildings. For example: if arr is [3, 0, 2, 0, 4] then this array of building heights looks like the following picture if we draw it out:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qX_rzVeC-hPj" + }, + "source": [ + "![Image 1](https://media.geeksforgeeks.org/wp-content/uploads/20200429012307/Untitled-Diagram811.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "z8QwFEIA-hPj" + }, + "source": [ + "Now if you imagine it rains and water gets trapped in this picture, then it'll look like the following (represent water):" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6U2VuwUg-hPk" + }, + "source": [ + "![Image 1](https://media.geeksforgeeks.org/wp-content/uploads/20200429012756/Untitled-Diagram821.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v8t8UFBW-hPk" + }, + "source": [ + "This is the most water that can be trapped in this picture, and if you calculate the area you get 7, so your program should return 7." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "7WLsblnX-hPk" + }, + "source": [ + "def ArrayChallenge(arr):\n", + " \n", + " # code goes here\n", + " n = len(arr)\n", + " area = 0\n", + "\n", + " for i in range(1,n-1):\n", + " left = arr[i]\n", + " for j in range(i):\n", + " left = max(left,arr[j])\n", + "\n", + " right = arr[i]\n", + " for j in range(i+1,n):\n", + " right = max(right,arr[j])\n", + "\n", + " area = area + (min(left,right) - arr[i]) \n", + "\n", + " return area" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "EMP05_h7-hPk", + "outputId": "5f9a67da-1a97-4408-bd0c-3270f602affc" + }, + "source": [ + "# keep this function call here \n", + "arr = [3,0,2,0,4]\n", + "print(ArrayChallenge(arr))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "7\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mU7bLTWd-hPl" + }, + "source": [ + "# Exercise 4" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "K9UWKoFY-hPl" + }, + "source": [ + "Back-end Challenge\n", + "\n", + "In the Python file, write a program to perform a GET request on the route https://coderbyte.com/api/challenges/json/age-counting which contains a data key and the value is a string which contains items in the format: key=STRING, age=INTEGER. Your goal is to count how many items exist that have an age equal to or greater than 50, and print this final value." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XrQLwhmY-hPl" + }, + "source": [ + "import requests\n", + "\n", + "r = requests.get('https://coderbyte.com/api/challenges/json/age-counting')\n", + "data = r.json()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "mDusWaW_-hPm", + "outputId": "8f1cc2be-4ed9-4e10-c4fb-a5b359b016bd" + }, + "source": [ + "count = 0\n", + "\n", + "for i,value in enumerate(data[\"data\"].split(\",\"),0):\n", + " if(i%2==1) and int(value.strip()[4:]) >= 50:\n", + " count = count + 1\n", + "\n", + "print(count)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "128\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BWrM5kgE-q2A" + }, + "source": [ + "# Exercise 5" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vkidOici--MI" + }, + "source": [ + "![Image 0](https://www.codingame.com/work/servlet/fileservlet?id=29331119030579)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "naMg-vGYJVDA" + }, + "source": [ + "En prenant en compte le fait que le facteur de réplication est de 3 et suivant le schéma ci-dessous, ce facteur est-il respecté ?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CYfmU5nE_WAK" + }, + "source": [ + "1. Oui.\r\n", + "\r\n", + "2. **Oui, mais les données dans le schéma montrent que nous avons un phénomène de sur-réplication.**\r\n", + "\r\n", + "3. Non, la donnée est sous-répliquée.\r\n", + "\r\n", + "4. Cassandra ne gère pas les facteurs de réplication." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NC1dun62Dzyk" + }, + "source": [ + "# Exercise 6" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U_hGW31rD4kq" + }, + "source": [ + "Parmis les propositions suivantes, lesquelles permettent de calculer le carré de la variable val?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "K_2dAS_aEONT" + }, + "source": [ + "1. **val*val**\r\n", + "\r\n", + "2. **pow(val,2)**\r\n", + "\r\n", + "3. val**2\r\n", + "\r\n", + "4. Aucune des propositions ci-dessus" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DLUrrWJyGNlh" + }, + "source": [ + "# Exercise 7" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6TTnw-wyGP_6" + }, + "source": [ + "```Python\r\n", + "\r\n", + "Soit le code Python 3 suivant :\r\n", + "\r\n", + "class Animal:\r\n", + " pass\r\n", + "\r\n", + "class Dog(Animal):\r\n", + " def cry():\r\n", + " print('woof')\r\n", + "\r\n", + "Parmi les propositions ci-contre, cochez toutes celles qui sont valides.\r\n", + "\r\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qEsX6jqdHMaR" + }, + "source": [ + "1. Animal hérite de Dog\r\n", + "\r\n", + "2. **Dog hérite de Animal**\r\n", + "\r\n", + "3. **Dog hérite de object**\r\n", + "\r\n", + "4. Aucune des propositions ci-dessus" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "htUHuTBWMT8l" + }, + "source": [ + "# Exercise 8" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1_1GU621MVHW" + }, + "source": [ + "Quel type de retour pour le filtrage collaboratif permet d'éviter le traitement des données clairsemées (sparse data) ?\r\n", + "\r\n", + "Une ou plusieurs réponses attendues." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aJ3iQQgMMbhO" + }, + "source": [ + "1. Tout type de retour. Il n’est pas possible d’avoir une représentation de données en matrice sparse\r\n", + "\r\n", + "2. Aucun type de retour. Il n’est pas possible d’avoir une représentation de données en matrice dense\r\n", + "\r\n", + "3. **Retour implicite**\r\n", + "\r\n", + "4. Retour explicite" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ElcsMV6KM_Cm" + }, + "source": [ + "# Exercise 9" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CFA0NlPKNGAc" + }, + "source": [ + "En tant qu'hyperparamètre de l’algorithme du gradient, quelle est l'influence du learning rate sur cet algorithme ?\r\n", + "\r\n", + " Plusieurs réponses attendues." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8p5jSLc_29_b" + }, + "source": [ + "1. **learning rate ralentit la convergence si sa valeur est trop petite**\r\n", + "\r\n", + "2. learning rate ralentit la convergence si sa valeur est trop grande\r\n", + "\r\n", + "3. learning rate augmente les risques de non-convergence si sa valeur est trop petite\r\n", + "\r\n", + "4. **learning rate augmente les risques de non-convergence si sa valeur est trop grande**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "B6hoD8dH40_V" + }, + "source": [ + "# Exercise 10" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ag_gcUxx43qH" + }, + "source": [ + "If you know the design pattern used in this piece of code, type its name in the text field (1 word only)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HLOYth955Ljh" + }, + "source": [ + "![Image 1](https://www.codingame.com/work/fileservlet?id=990709721)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "W7zopjsd5Tt-" + }, + "source": [ + "Réponses attendues\r\n", + "\r\n", + "1. Singleton\r\n", + "\r\n", + "2. Singleton Pattern" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gpw_j8b35fI0" + }, + "source": [ + "# Exercise 11" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MZeFRMFa5m0E" + }, + "source": [ + "In this exercise, you have to analyze records of temperature to find the closest to zero." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UWRDhT6h5qEA" + }, + "source": [ + "![Image 2](https://www.codingame.com/work/fileservlet?id=1728588048)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "orUofJiN50hD" + }, + "source": [ + "Implement the method closestToZero to return the temperature closer to zero which belongs to the array ts.\r\n", + "\r\n", + "1. If ts is empty, return 0 (zero).\r\n", + "\r\n", + "2. If two numbers are as close to zero, consider the positive number as the closest to zero (eg. if ts contains -5 and 5, return 5).\r\n", + "\r\n", + "Input:\r\n", + "\r\n", + "1. Temperatures are always expressed with floating point numbers ranging from -273 to 5526.\r\n", + "2. ts is never null." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N614W8W66cDH" + }, + "source": [ + "```Java\r\n", + "\r\n", + "// Java code below\r\n", + "\r\n", + "import java.math.*;\r\n", + "\r\n", + "class Solution {\r\n", + "\r\n", + " static double closestToZero(double[] ts) {\r\n", + " if (ts.length == 0)\r\n", + " return 0;\r\n", + "\r\n", + " double closestToZero = ts[0];\r\n", + " double absClosest = Math.abs(closestToZero);\r\n", + "\r\n", + " for (int i = 0; i < ts.length; i++) {\r\n", + " double absValue = Math.abs(ts[i]);\r\n", + "\r\n", + " if (absValue < absClosest || absValue == absClosest && ts[i] > 0) {\r\n", + " closestToZero = ts[i];\r\n", + " absClosest = absValue;\r\n", + " }\r\n", + " }\r\n", + "\r\n", + " return closestToZero;\r\n", + " }\r\n", + "}\r\n", + "\r\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4E6wFm2M7Aoc" + }, + "source": [ + "# Exercise 12" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7ONfVocT7DOT" + }, + "source": [ + "![Image 3](https://www.codingame.com/servlet/fileservlet?id=1495214999799)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xt3o8FA19zY-" + }, + "source": [ + "Modify the query to select only the products having a price greater than 100. Results should be ordered by price in descending order.\r\n", + " \r\n", + "Output exactly the PRODUCT_ID, NAME and PRICE columns as in the example below.\r\n", + " \r\n", + "Example of output:\r\n", + "\r\n", + " ---------------------------------------------\r\n", + " | PRODUCT_ID | NAME | PRICE |\r\n", + " ---------------------------------------------\r\n", + " | 14 | ProForm 6.0 RT | 499.99 |\r\n", + " | 12 | GreenWorks 25022 | 172.73 |\r\n", + " | 0 | Kindle Fire 7 | 119.00 |\r\n", + " | 15 | Weider 190 RX | 100.90 |\r\n", + " ---------------------------------------------\r\n", + "\r\n", + "Throughout the test, the SQL syntax to be used is the ANSI syntax. For example:\r\n", + "\r\n", + "1. for inner joins use the \"INNER JOIN ... ON ...\" keywords and not the \"USING\" syntax\r\n", + "\r\n", + "2. for outer joins, use the \"LEFT OUTER JOIN\" keywords and not the specific Oracle \"a=b(*)\" syntax or Sybase \"a *= b\" syntax" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N3W0bu8z_TVq" + }, + "source": [ + "```SQL\r\n", + "\r\n", + "-- SQL request(s) below\r\n", + "\r\n", + "SELECT PRODUCT_ID, NAME, PRICE\r\n", + "FROM product\r\n", + "WHERE price > 100\r\n", + "ORDER BY price DESC\r\n", + "\r\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ANU_IFHraCEx" + }, + "source": [ + "# Exercise 13" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9xoSg3H7aVUA" + }, + "source": [ + "You are working on a feature branch with a few local commits that isn't quite ready for a full merge. But you want to push a single commit from the feature branch to master.\r\n", + "\r\n", + "Which command would you use?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IQXyjiSvadX_" + }, + "source": [ + "1. git merge\r\n", + "2. **git cherry-pick**\r\n", + "3. git squash\r\n", + "4. git stash" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7rg8Kjs1aptf" + }, + "source": [ + "# Exercise 14" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "S6ES21Aba4Wl" + }, + "source": [ + "Write the command line used to create a new branch new_branch from the current branch and switch to the newly created branch." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tmhN4eQfa7_L" + }, + "source": [ + "Réponses attendues\r\n", + "\r\n", + "1. git checkout -b new_branch\r\n", + "\r\n", + "2. git checkout --branch new_branch" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kuzntRQpbDH2" + }, + "source": [ + "# Exercise 15" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ppF8ePb2bSUH" + }, + "source": [ + "You need to save one or more images from your local docker storage to a tar archive.\r\n", + " \r\n", + "Which command is suitable for this?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_XZeKE90bW3a" + }, + "source": [ + "1. **docker image save**\r\n", + "2. docker image get\r\n", + "3. docker image tar\r\n", + "4. docker image import\r\n", + "5. docker image push\r\n", + "6. docker image export" + ] + } + ] +} \ No newline at end of file