Machine Learning Algorithm Data Exchanges

Because of the large amount of (non-intrusive) data we expect to collect through this project, we rely on Machine Learning (ML) algorithms to recommend to end users the training best suited to acquiring the skills required for their dream job.

This document describes the information we expect to provide to the ML algorithm and what we expect in return.

We expect insightful results from the ML algorithm on two topics, one major and one minor: training recommendations and job opportunity matching.

Training recommendations

Feeding training data to ML

Info	Description (optional)
Training ID	Unique ID from the SILKC system
Training name	Name of the training (may not be necessary)
Skills required	Array of skills (English names or URIs) required in order to enter the training (the user MUST have those to qualify)
Skills acquired	Array of skills (English names or URIs) to be acquired by following the training.
Location	Latitude and longitude of the training location (its relevance depends on whether the training is online)
Online	Boolean value indicating if the training is exclusively online (true). False by default.
Cost	Float value. At this point, a higher price lowers the preference for this training, but this might change in the future into a more precise indicator based on the price of other trainings with similar acquired or required skills.
Duration in hours	Integer value. At this point, a longer duration lowers the preference for this training. This might change in the future.
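
The training table above could be represented as a simple record. The following is a minimal sketch, not the SILKC data model: the class name, the field names, the integer type of the training ID and the two helper functions are assumptions made for illustration. It shows the one hard rule stated in the table, namely that a user must hold every required skill to qualify.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class TrainingRecord:
    """One row of the training table (hypothetical field names)."""
    training_id: int                  # assumed integer; the table only says "unique ID"
    name: str
    skills_required: List[str]        # English names or URIs
    skills_acquired: List[str]        # English names or URIs
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    online: bool = False              # False by default, as in the table
    cost: float = 0.0
    duration_hours: int = 0


def qualifies(user_skills: Set[str], training: TrainingRecord) -> bool:
    """The user qualifies only if they hold every required skill."""
    return set(training.skills_required) <= set(user_skills)


def skills_gained(user_skills: Set[str], training: TrainingRecord) -> Set[str]:
    """Skills the training would add to what the user already has."""
    return set(training.skills_acquired) - set(user_skills)
```

With an illustrative training that requires `"skill A"` and teaches `"skill B"`, `qualifies({"skill A"}, training)` is true and `skills_gained({"skill A"}, training)` is `{"skill B"}`.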

Feeding user data to ML

Based on the information we collect from end users, we can provide the ML algorithm with the following information.

Info	Description (optional)
User ID	An integer
Year of birth	An integer year
Coordinates of residency	As latitude, longitude coordinates
Acceptable commute distance	(`up_to_distance` field) The number of kilometers from his/her residency that the user is willing to travel for work or training
Skills acquired through jobs	Array of skill references (provided either as English names or URIs) the user has acquired through work. This list is considered very reliable, as previous work experience can usually be verified relatively easily.
Skills acquired through training	Array of skill references (provided either as English names or URIs) the user has acquired through training. This list is considered reliable, as previous training experience can usually be proven, although it is not as easily verified as work experience.
Skills personally reported as acquired	Array of skill references (provided either as English names or URIs) that the user reports having acquired through other means. Because these skills are not backed by reported training or work experience, we consider this list less reliable than the other two.
Previous occupations	Array of occupations (provided either as English names or URIs) the user has had.
Current occupation	Array of occupations (provided either as English names or URIs) the user has at the moment. For now, the array contains only one item (we consider only one current occupation).
Training(s) followed	Array of internal training IDs from the SILKC application.
Score given to followed training	Array of followed trainings (by ID), each with a score (1 to 5) expressing the user's preference for that training. We do not consider this preference a strong differentiator for now, but we include it so that it can become a stronger one once a large amount of data is available.
Dream occupation	The English name or URI of the dream occupation (the final goal) of the user
Dream occupation skills	Array of skills (either English names or URIs) of the dream occupation. This element could be skipped if we otherwise store and maintain, in the ML, a match between occupation and skills.
Job openings	Array of job openings that match the dream occupation or the dream occupation's skills list. Each entry should also contain the location of the job opening, so that its distance from the user's residency can be calculated.
Professional experience	Number of years since this person started his/her professional life. We assume this will act as a differentiator, over time, for recommended training, but not really at the beginning.
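
Two ideas in the table above lend themselves to a short sketch: checking whether a training or job location falls within the user's `up_to_distance`, and merging the three skill lists according to their stated reliability. The code below is only an illustration under assumptions. The great-circle (haversine) distance is one possible way to compare coordinates, and the numeric reliability weights are hypothetical placeholders; the source only says that job skills are the most reliable and self-reported skills the least.

```python
import math
from typing import Dict, Iterable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

Coordinates = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coordinates, b: Coordinates) -> float:
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_commute(residency: Coordinates, place: Coordinates, up_to_distance: float) -> bool:
    """True if `place` lies within the user's acceptable commute distance (km)."""
    return haversine_km(residency, place) <= up_to_distance


# Hypothetical weights: only their ordering comes from the table above.
RELIABILITY = {"jobs": 1.0, "training": 0.8, "self_reported": 0.5}


def skill_confidence(jobs: Iterable[str], training: Iterable[str],
                     self_reported: Iterable[str]) -> Dict[str, float]:
    """Map each skill to the weight of its most reliable source."""
    scores: Dict[str, float] = {}
    sources = (("jobs", jobs), ("training", training), ("self_reported", self_reported))
    for source, skills in sources:
        for skill in skills:
            scores[skill] = max(scores.get(skill, 0.0), RELIABILITY[source])
    return scores
```

A skill that appears in several lists keeps the weight of its most reliable source, so a skill confirmed by work experience is not downgraded because the user also reported it personally.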

Obtaining insightful data back from ML

Info	Description (optional)
User ID	Integer value
Recommended training	List (array) of recommended training sessions (by Training ID), each accompanied by a level of confidence
# of vacancies	The number of currently open vacancies for this dream occupation, by distance from the user
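
The shape of the returned data could look like the sketch below. Everything here is an assumption made for illustration: the class and field names, a confidence expressed as a float between 0 and 1, and the reading of "by distance from the user" as counts per distance band in kilometers, with arbitrary example bands.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Sequence


@dataclass
class Recommendation:
    training_id: int
    confidence: float  # assumed to lie in [0, 1]


@dataclass
class MLResponse:
    user_id: int
    recommended_training: List[Recommendation]
    vacancies_by_distance: Dict[int, int]  # band upper bound (km) -> number of vacancies


def count_vacancies_by_distance(distances_km: Iterable[float],
                                bands_km: Sequence[int] = (10, 25, 50, 100)) -> Dict[int, int]:
    """Count each vacancy in the smallest band that contains it.

    Vacancies farther away than the largest band are ignored.
    """
    ordered = sorted(bands_km)
    counts = {band: 0 for band in ordered}
    for distance in distances_km:
        for band in ordered:
            if distance <= band:
                counts[band] += 1
                break
    return counts


def rank(recommendations: Iterable[Recommendation]) -> List[Recommendation]:
    """Order recommendations from most to least confident."""
    return sorted(recommendations, key=lambda r: r.confidence, reverse=True)
```

The bands are not cumulative: a vacancy 12 km away is counted only in the 25 km band, not also in the 50 km and 100 km bands.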

Job vacancies

Feeding data to ML

Info	Description (optional)
User ID
Year of birth
City and country of residency
Acceptable commute distance	(`up_to_distance` field) The number of kilometers from his/her residency that the user is willing to travel for work or training
Skills acquired through jobs
Skills acquired through training
Skills personally reported as acquired
Previous occupations
Current occupation
Training(s) followed	Training details provided separately?
Score given to followed training (expressing preference)
Dream occupation	{including skills required for that occupation}
Professional experience	Number of years since this person started his/her professional life
