Machine Learning Algorithm Data Exchanges

Yannick Warnier edited this page Oct 28, 2021 · 8 revisions

Due to the huge amount of (non-intrusive) data we expect to collect through this project, we rely on Machine Learning (ML) algorithms to provide useful recommendations to the final users regarding the best training they can follow to obtain the skills required to get into their dream job.

This document describes the information we expect to provide to the ML algorithm and what we expect in return.

There are two topics, one major and one minor, for which we expect insightful results from the ML algorithm: training recommendations and job opportunity matching.

Training recommendations

Feeding training data to ML

| Info | Description (optional) |
| --- | --- |
| Training ID | Unique ID from the SILKC system |
| Training name | Name of the training (not sure this is necessary) |
| Skills required | Array of skills (English names or URIs) required to enter the training (the user MUST have those to qualify) |
| Skills acquired | Array of skills (English names or URIs) to be acquired by following the training |
| Location | Latitude and longitude of the training (affected by online training circumstances) |
| Online | Boolean value indicating whether the training is exclusively online (true). False by default. |
| Cost | Float value. At this point, a higher price lowers the preference for this training, but this might change in the future to a more precise indicator based on the price of other trainings with similar skills acquired or required. |
| Duration in hours | Integer value. At this point, a higher duration lowers the preference for this training. This might change in the future. |
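
As an illustration of the training fields listed above, here is a minimal Python sketch of one record and of the "user MUST have the required skills" rule. The class name, the attribute names, the `qualifies` helper, the integer type of the ID and the sample values are all assumptions made for this example; the source does not define a wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Set


@dataclass
class Training:
    """One training record (attribute names are hypothetical)."""
    training_id: int            # unique ID from the SILKC system (int is assumed)
    name: str
    skills_required: List[str]  # English names or URIs
    skills_acquired: List[str]  # English names or URIs
    latitude: float
    longitude: float
    cost: float
    duration_hours: int
    online: bool = False        # True only if the training is exclusively online


def qualifies(user_skills: Set[str], training: Training) -> bool:
    """Return True when the user holds every skill the training requires."""
    return set(training.skills_required) <= user_skills


# Sample values are invented for the example.
training = Training(
    training_id=42,
    name="Intro to data analysis",
    skills_required=["spreadsheets"],
    skills_acquired=["statistics", "data visualisation"],
    latitude=50.85,
    longitude=4.35,
    cost=250.0,
    duration_hours=40,
)

# Serialise the record, for instance to send it to the ML component.
payload = json.dumps(asdict(training))
```

Keeping `online` as the only field with a default mirrors the "False by default" note; every other field has to be supplied explicitly.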

Feeding user data to ML

Based on the information we collect from the final users, we can provide the ML algorithm with the following information.

| Info | Description (optional) |
| --- | --- |
| User ID | An integer |
| Year of birth | An integer year |
| Coordinates of residency | As latitude, longitude coordinates |
| Acceptable commute distance (up_to_distance field) | The number of kilometers from the user's residency that the user is willing to travel for work or training |
| Skills acquired through jobs | Array of skill references (provided either as English name or URI) the user has acquired through work. This list is considered very reliable, as previous work experience can usually be verified relatively easily. |
| Skills acquired through training | Array of skill references (provided either as English name or URI) the user has acquired through training. This list is considered reliable, as previous training experience can usually be proven, although it is not as easily verified as work experience. |
| Skills personally reported as acquired | Array of skill references (provided either as English name or URI) the user has acquired through other means and reports himself/herself as acquired. Because these skills are not correlated with previously reported training or work experience, we consider this list less reliable than the other two. |
| Previous occupations | Array of occupations (provided either as English name or URI) the user has had. |
| Current occupation | Array of occupations (provided either as English name or URI) the user has at the moment. For now, there is only one item in the array (we consider only one current occupation). |
| Training(s) followed | Array of internal training IDs from the SILKC application. |
| Score given to followed training | Array of followed trainings (by ID), each with a score (1 to 5) expressing the user's preference for that training. We don't consider this preference a very strong differentiator for now, but include it so that it can become a stronger one once a large amount of data is available. |
| Dream occupation | The English name or URI of the dream occupation (the final goal) of the user |
| Dream occupation skills | Array of skills (either English names or URIs) of the dream occupation. This element could be skipped if we otherwise store and maintain, in the ML, a match between occupation and skills. |
| Job openings | Array of job openings that match the dream occupation or the dream occupation's skills list. Each entry should also contain the location of the job opening, so that its distance from the user's residency can be calculated. |
| Professional experience | Number of years since this person started his/her professional life. We assume this will act as a differentiator for recommended training over time, but not really at the beginning. |
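
The three skill lists above come with different levels of reliability (jobs, then training, then self-reported). One possible way to carry that ranking into a single structure is sketched below. The numeric weights and the function name are placeholders invented for this example; the source only states the ordering, not any values.

```python
from typing import Dict, Iterable

# Hypothetical weights: only the ordering jobs > training > self-reported
# comes from the description above; the numbers are placeholders.
RELIABILITY = {"jobs": 1.0, "training": 0.8, "self_reported": 0.5}


def skill_confidence(
    from_jobs: Iterable[str],
    from_training: Iterable[str],
    self_reported: Iterable[str],
) -> Dict[str, float]:
    """Map each skill to the weight of its most reliable source."""
    confidence: Dict[str, float] = {}
    sources = (
        ("jobs", from_jobs),
        ("training", from_training),
        ("self_reported", self_reported),
    )
    for source, skills in sources:
        for skill in skills:
            # A skill reported by several sources keeps its highest weight.
            confidence[skill] = max(confidence.get(skill, 0.0), RELIABILITY[source])
    return confidence


# Sample skills are invented for the example.
scores = skill_confidence(
    from_jobs=["welding"],
    from_training=["welding", "blueprint reading"],
    self_reported=["team leadership"],
)
```

Here `welding` appears in two lists and keeps the weight of the more reliable one. Whether the ML component should receive the lists separately or pre-weighted like this is left open by the source.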

Obtaining insightful data back from ML

| Info | Description (optional) |
| --- | --- |
| User ID | Integer value |
| Recommended training | List (array) of recommended training sessions (by Training ID), each accompanied by a level of confidence |
| # of vacancies | The number of vacancies currently open for this dream occupation, by distance from the user |
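
To make "by distance from the user" concrete, the sketch below counts vacancies inside cumulative distance bands around the user's residency, using the haversine great-circle formula. The band limits, the function names, the response keys and the assumption that confidence lies between 0 and 1 are all illustrative; the source does not specify how distances are bucketed or how the response is encoded.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def vacancies_by_distance(
    residency: Tuple[float, float],
    openings: Iterable[Tuple[float, float]],
    bands_km: Sequence[int] = (10, 50, 100),  # hypothetical band limits
) -> Dict[int, int]:
    """Count openings within each band; bands are cumulative."""
    counts = {band: 0 for band in bands_km}
    for lat, lon in openings:
        distance = haversine_km(residency[0], residency[1], lat, lon)
        for band in bands_km:
            if distance <= band:
                counts[band] += 1
    return counts


# Invented coordinates: openings roughly 5.6 km, 44 km and 222 km away.
counts = vacancies_by_distance((0.0, 0.0), [(0.0, 0.05), (0.0, 0.4), (0.0, 2.0)])

# One possible shape for the data returned by the ML component.
response = {
    "user_id": 7,
    "recommended_training": [
        {"training_id": 42, "confidence": 0.83},  # confidence assumed in [0, 1]
    ],
    "vacancies": counts,
}
```

The same distance function could be compared against the user's `up_to_distance` value to filter out trainings or openings that are too far away.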

Job vacancies

Feeding data to ML

| Info | Description (optional) |
| --- | --- |
| User ID | |
| Year of birth | |
| City and country of residency | |
| Acceptable commute distance (up_to_distance field) | The number of kilometers from the user's residency that the user is willing to travel for work or training |
| Skills acquired through jobs | |
| Skills acquired through training | |
| Skills personally reported as acquired | |
| Previous occupations | |
| Current occupation | |
| Training(s) followed | Training details provided separately? |
| Score given to followed training | Expressing preference |
| Dream occupation | Including skills required for that occupation |
| Professional experience | Number of years since this person started his/her professional life |
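
A user record for job vacancy matching could be serialised along the following lines. This is only a sketch: apart from `up_to_distance`, every key name is hypothetical, and both the choice to drop fields the user has not filled in and the treatment of the user ID as the only mandatory field are assumptions, not rules stated in this document.

```python
import json
from typing import Any, Dict


def build_vacancy_request(user: Dict[str, Any]) -> str:
    """Serialise a user record to JSON, dropping fields left empty (None)."""
    # Assumption: the user ID is the only field that must always be present.
    if user.get("user_id") is None:
        raise ValueError("user_id is required")
    cleaned = {key: value for key, value in user.items() if value is not None}
    return json.dumps(cleaned, sort_keys=True)


# Sample values are invented for the example.
user = {
    "user_id": 7,
    "year_of_birth": 1990,
    "residency": "Brussels, Belgium",
    "up_to_distance": 30,  # kilometers
    "skills_from_jobs": ["welding"],
    "current_occupation": ["welder"],
    "dream_occupation": {
        "name": "welding inspector",
        "skills": ["quality control"],
    },
    "professional_experience_years": None,  # not provided by the user
}

request_body = build_vacancy_request(user)
```

Nesting the required skills under the dream occupation follows the "including skills required for that occupation" note; they could equally be sent as a separate array, as in the training recommendation input.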

Obtaining insightful data back from ML