Machine Learning Algorithm Data Exchanges
Yannick Warnier edited this page Oct 28, 2021 · 8 revisions
Due to the huge amount of (non-intrusive) data we expect to collect through this project, we rely on Machine Learning (ML) algorithms to provide final users with useful recommendations about the best training they can follow to obtain the skills required for their dream job.
This document describes the information we expect to provide to the ML algorithm and what we expect in return.
There are two topics, one major and one minor, for which we expect insightful results from the ML algorithm: training recommendations and job opportunity matching.

For each training, we can provide the ML algorithm with the following information.

Info | Description (optional) |
---|---|
Training ID | Unique ID from the SILKC system |
Training name | Name of the training (we are not sure this is necessary) |
Skills required | Array of skills (English names or URIs) required to enter the training (the user MUST have these to qualify) |
Skills acquired | Array of skills (English names or URIs) to be acquired by following the training. |
Location | Latitude and longitude of the training (its relevance depends on whether the training is online) |
Online | Boolean value indicating whether the training is exclusively online (true). Defaults to false. |
Cost | Float value. At this point, a higher price lowers the preference for this training, but this might change in the future into a more precise indicator based on the prices of other trainings that require or provide similar skills. |
Duration in hours | Integer value. At this point, we consider a longer duration to lower the preference for this training. This might change in the future. |
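
The training attributes above could be exchanged along the following lines. This is a minimal Python sketch, not the SILKC format: the class name `TrainingRecord`, the snake_case field names, the integer type of the training ID and the scaling constants in `naive_preference` are all assumptions made for illustration. It encodes the two rules stated in the table: a user qualifies only if they hold every required skill, and a higher cost or longer duration lowers the preference.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Set


@dataclass
class TrainingRecord:
    """One training as it might be sent to the ML algorithm (hypothetical format)."""

    training_id: int                  # assumed integer; the page only says "unique ID"
    skills_required: List[str]        # English names or URIs
    skills_acquired: List[str]        # English names or URIs
    latitude: Optional[float] = None  # may be irrelevant for online training
    longitude: Optional[float] = None
    online: bool = False              # True only if exclusively online
    cost: float = 0.0
    duration_hours: int = 0
    name: Optional[str] = None        # possibly unnecessary for the ML side

    def user_qualifies(self, user_skills: Set[str]) -> bool:
        # The user MUST already hold every required skill.
        return set(self.skills_required) <= set(user_skills)

    def naive_preference(self, cost_scale: float = 100.0, hours_scale: float = 40.0) -> float:
        # Hypothetical heuristic in (0, 1]: decreases as cost or duration grows.
        return 1.0 / (1.0 + self.cost / cost_scale) / (1.0 + self.duration_hours / hours_scale)

    def to_json(self) -> str:
        # Serialize all fields; None values become JSON null.
        return json.dumps(asdict(self))
```

The preference function is deliberately simple, in line with the table's note that cost and duration are, for now, only coarse indicators.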
Based on the information we collect from the final users, we can provide the ML algorithm with the following information.
Info | Description (optional) |
---|---|
User ID | An integer |
Year of birth | An integer year |
Coordinates of residence | As latitude, longitude coordinates |
Acceptable commute distance | (up_to_distance field) The number of kilometers from his/her residence that the user is willing to travel for work or training |
Skills acquired through jobs | Array of skill references (either English names or URIs) the user has acquired through work. We consider this list very reliable, as previous work experience can usually be verified relatively easily. |
Skills acquired through training | Array of skill references (either English names or URIs) the user has acquired through training. We consider this list reliable, as previous training can usually be proven, although not as easily as work experience. |
Skills personally reported as acquired | Array of skill references (either English names or URIs) that the user reports, himself/herself, as acquired through other means. As these skills are not backed by previously reported training or work experience, we consider this list less reliable than the others. |
Previous occupations | Array of occupations (either English names or URIs) the user has held. |
Current occupation | Array of occupations (either English names or URIs) the user currently holds. For now, the array contains a single item (we consider only one current occupation). |
Training(s) followed | Array of internal training IDs from the SILKC application. |
Score given to followed training | Array of followed trainings (by ID), each with a score from 1 to 5 expressing the user's preference for one training over another. We don't consider this preference a strong differentiator for now, but want to include it because it could become a stronger one in the future, once there is a large amount of data. |
Dream occupation | The English name or URI of the dream occupation (the final goal) of the user |
Dream occupation skills | Array of skills (either English names or URIs) of the dream occupation. This element could be omitted if the ML side stores and maintains its own mapping between occupations and skills. |
Job openings | Details of job openings in the dream occupation: an array of job openings that match the dream occupation or its skills list. Each opening should include its location, so that its distance from the user's residence can be calculated. |
Professional experience | Number of years since this person started his/her professional life. We assume this will become a differentiator for recommended training over time, but not at first. |
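
To illustrate how the three skill lists and the dream occupation skills might relate, the sketch below gives each skill the reliability weight of its most trustworthy source and measures what is still missing for the dream occupation. The weights (1.0, 0.8, 0.5) and the functions `skill_confidence`, `skill_gap` and `gap_covered` are hypothetical; they only mirror the ordering stated above (work, then training, then self-reported) and are not a substitute for the ML model.

```python
from typing import Dict, Iterable

# Hypothetical reliability weights; only their ordering comes from the table above.
RELIABILITY = {"job": 1.0, "training": 0.8, "self_reported": 0.5}


def skill_confidence(
    from_jobs: Iterable[str],
    from_training: Iterable[str],
    self_reported: Iterable[str],
) -> Dict[str, float]:
    """Map each skill to the weight of its most reliable source."""
    confidence: Dict[str, float] = {}
    for source, skills in (
        ("job", from_jobs),
        ("training", from_training),
        ("self_reported", self_reported),
    ):
        for skill in skills:
            # Keep the highest weight when a skill appears in several lists.
            confidence[skill] = max(confidence.get(skill, 0.0), RELIABILITY[source])
    return confidence


def skill_gap(dream_skills: Iterable[str], confidence: Dict[str, float]) -> Dict[str, float]:
    """Missing amount per dream-occupation skill (0 = reliably held, 1 = absent)."""
    return {skill: 1.0 - confidence.get(skill, 0.0) for skill in dream_skills}


def gap_covered(skills_acquired: Iterable[str], gap: Dict[str, float]) -> float:
    """Hypothetical score: total missing weight a training's acquired skills would fill."""
    return sum(gap.get(skill, 0.0) for skill in skills_acquired)
```

A training whose "Skills acquired" list covers more of the weighted gap would score higher under this toy heuristic; how the real model weighs these inputs remains up to the ML side.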

In return, we expect the ML algorithm to provide the following information.

Info | Description (optional) |
---|---|
User ID | Integer value |
Recommended training | List (array) of recommended training sessions (by Training ID), each accompanied by a confidence level |
Number of vacancies | The number of currently open vacancies for the user's dream occupation, broken down by distance from the user |
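
The distance breakdown of vacancies could be computed from the job opening locations and the user's residence coordinates. The sketch below assumes great-circle (haversine) distance with a mean Earth radius of 6371 km and cumulative distance bands; the band limits and the response classes `Recommendation` and `MLResponse` are hypothetical, since this page does not define a response format.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def vacancies_by_distance(
    residence: Tuple[float, float],
    openings: Sequence[Tuple[float, float]],
    bands_km: Sequence[float] = (10, 25, 50, 100),
) -> Dict[float, int]:
    """Cumulative count of openings lying within each distance band of the residence."""
    distances = [haversine_km(*residence, *opening) for opening in openings]
    return {band: sum(d <= band for d in distances) for band in bands_km}


@dataclass
class Recommendation:
    training_id: int
    confidence: float  # assumed to lie in [0, 1]


@dataclass
class MLResponse:
    user_id: int
    recommended_training: List[Recommendation]
    vacancies_within_km: Dict[float, int]
```

The user's up_to_distance value could be included as one of the bands, so that the response directly reports the vacancies within the user's acceptable commute.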
Info | Description (optional) |
---|---|
User ID | |
Year of birth | |
City and country of residence | |
Acceptable commute distance | (up_to_distance field) The number of kilometers from his/her residence that the user is willing to travel for work or training |
Skills acquired through jobs | |
Skills acquired through training | |
Skills personally reported as acquired | |
Previous occupations | |
Current occupation | |
Training(s) followed | Training details provided separately? |
Score given to followed training (expressing preference) | |
Dream occupation | Including the skills required for that occupation |
Professional experience | Number of years since this person started his/her professional life |