The matching procedure for profiles
How does the user find her peers?
The core service of openmediate is to supply any user with a list of other users that are similar or relevant to her. This similarity concerns the symptoms and other features of the user's constitution. In the following the main aspects of this list-generating mechanism are described.
1. Representing the constitution of the users
When users create their openmediate profile, they are asked to fill out a structured form (the medical data editor) whose items characterise the user's medical constitution. The form is designed according to the requirements in the ticket Create Medical Profile. An example would be an item corresponding to "pain", which would consist of questions about the location and the tissue where the pain occurs, its intensity, the type of pain, and its frequency or the conditions under which it appears. Once an item is filled out, it generates one or several medical findings that encode the information entered by the user. A finding could be an array. In the "pain" example the user's finding could look like this:
type of pain | location | tissue | trigger | intensity |
---|---|---|---|---|
stinging | right shoulder | joint | when moved | "can hardly move, but sleep o.k." |
The entries of the array are referred to as "particles" of the finding. After the form is completed, the user is associated with a "profile" of such findings, which represents all aspects of the user's constitution that she considers relevant for her well-being.
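One possible in-memory representation of findings and profiles is sketched below. The class names `Finding` and `Profile`, the `particle` helper and the field names are hypothetical and chosen for illustration only; the actual data model is not specified here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Finding:
    """One medical finding: a type plus its ordered particles (assumed layout)."""
    kind: str          # type of finding, e.g. "pain"
    particles: tuple   # ordered (particle name, value) pairs

    def particle(self, name):
        """Return the value of the named particle, or None if it is absent."""
        return dict(self.particles).get(name)


@dataclass
class Profile:
    """All findings a user entered through the form."""
    user_id: str
    findings: list = field(default_factory=list)


# The "pain" example from the table above, encoded as a finding.
pain = Finding("pain", (
    ("type of pain", "stinging"),
    ("location", "right shoulder"),
    ("tissue", "joint"),
    ("trigger", "when moved"),
    ("intensity", "can hardly move, but sleep o.k."),
))
profile = Profile("user1", [pain])
```

Storing particles as ordered pairs keeps the column order of the form, while `particle("tissue")` still allows lookup by name.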
2. How to find "interesting" profiles
Assume for now that the user has such a "profile" of findings. The core mechanism that generates a list of potentially interesting other users for her combines the following three heuristics:
a) Given a tool that associates to each pair of users (to be precise, to their pair of sets of findings) a numerical value, called the general score, this heuristic orders all users in descending order of the value they score with the user for whom the list is generated. The first m members of this list are considered for presentation to the user.
b) Consider a set of "established illnesses", each characterised by a set of findings and equipped with an illness-specific score function and the list of users, ordered by their illness-specific score for the respective illness. Based on the user's illness-specific score for each such "established illness" a certain number of members of the respective illness's list of users - that additionally have a high general score and/or illness-specific score with the user - is considered for presentation to the user.
c) Given a method to find more extreme cases of the user's condition, a list of such cases is considered for presentation to the user.
The resulting three lists of candidate users are recombined in a way that reflects how well each of the heuristics performed. The resulting ordered list is presented to the user. One way of deciding on the proportion of each heuristic's candidate users is, for example, to increase the proportion of users from method (b) if the user can be well associated with one or two established illnesses. The three heuristics are explained in more detail in the following.
2.1 The general score
The general score function consists of two parts. The first part is a list of marginal score functions that allow different types of findings to be compared. The second part is a list of weights for aggregating the marginal scores into the general score. Consider the following example:
set(user1) = (a1, b1, c1)
set(user2) = (a2, b2, d2)
Each letter a,b,... corresponds to a type of finding.
Given the marginal score functions and the weights, the general score is computed as follows:
gscore(user1,user2) = waa mscore(a1,a2) + wab mscore(a1,b2) + wad mscore(a1,d2) + ... + wcd mscore(c1,d2)
mscore(.,.) represents the marginal score of two findings, and waa, wab, etc. denote the positive weights. mscore(.,.) is assumed to select the appropriate comparison for the types of the two findings it is given.
An important challenge is to find a score function that also addresses the time dynamics of the users' medical constitutions.
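The aggregation of marginal scores into the general score can be sketched as follows. This is a minimal illustration, not the project's implementation: findings are reduced to a dictionary from finding type to value, `toy_mscore` is a placeholder marginal score, and all weight values are made up.

```python
def gscore(findings1, findings2, mscore, weights):
    """Weighted sum of marginal scores over all pairs of findings.

    findings1, findings2: dicts mapping finding type -> finding value (assumed layout)
    mscore: function (type1, value1, type2, value2) -> float
    weights: dict mapping (type1, type2) -> positive weight; missing pairs count as 0
    """
    total = 0.0
    for kind1, value1 in findings1.items():
        for kind2, value2 in findings2.items():
            w = weights.get((kind1, kind2), 0.0)
            if w:
                total += w * mscore(kind1, value1, kind2, value2)
    return total


def toy_mscore(kind1, value1, kind2, value2):
    """Placeholder marginal score: 1 for identical findings of the same type, else 0."""
    return 1.0 if (kind1 == kind2 and value1 == value2) else 0.0


# set(user1) = (a1, b1, c1) and set(user2) = (a2, b2, d2) with made-up values
user1 = {"a": "x", "b": "y", "c": "z"}
user2 = {"a": "x", "b": "q", "d": "z"}
weights = {("a", "a"): 2.0, ("b", "b"): 1.0, ("c", "d"): 0.5}

result = gscore(user1, user2, toy_mscore, weights)
```

With the placeholder marginal score only the pair (a1, a2) matches, so only the weight waa contributes to `result`.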
2.1.2 The marginal scores
There are several distinct comparisons possible. We illustrate the idea with the "pain" example. The marginal score could then look like this:
mscore_pain(pain1, pain2) = w_tpain * score_typeofpain(type1, type2) + w_location * score_location(location1, location2) + w_tissue * score_tissue(tissue1, tissue2) + ...
Most of these particle-specific scores could be trivial in the sense that they equal 1 if and only if the compared particles are equal. For the location, a 2-dimensional map of the body could be used. The score in that case could be
score_location(location1,location2)= -d(location1,location2)/(1+d(location1,location2))
where d(location1,location2) denotes the Euclidean distance between the coordinates. Medically more similar types of pain should receive a higher score. Similarly, for the triggers of pain (always there, only when moved, only when strained, always but strain matters, less when moved) it seems reasonable to implement the function such that
score_trigger(only when moved, only when strained) > score_trigger(only when moved, less when moved)
The scores for "trigger" and "type of pain" could be represented by a symmetric matrix whose entry in row i and column j encodes the particle-specific score if the respective particle equals value_i for one user and value_j for the other.
The particle-specific weights such as w_tpain or w_location again have to be chosen wisely. This could be guided by medical insight and calibrated in a similar way as the weights of the general score.
What needs to be done:
- Specify for each type of finding a meaningful comparison that leads to a numerical quantification of similarity.
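As an illustration of the marginal score for "pain", the sketch below combines a trivial equality score, the location score from the formula above, and a symmetric lookup for triggers. The numerical trigger values 0.6 and 0.1, the unit weights and the coordinates are placeholders, not medically calibrated values; the dictionary keys are likewise assumptions.

```python
import math


def score_equal(x, y):
    """Trivial particle score: 1 if the particles are equal, 0 otherwise."""
    return 1.0 if x == y else 0.0


def score_location(p, q):
    """-d/(1+d), with d the Euclidean distance on a 2-dimensional body map."""
    d = math.dist(p, q)
    return -d / (1.0 + d)


# Symmetric score "matrix" for triggers, stored by unordered pair (placeholder values).
TRIGGER_SCORES = {
    frozenset(["only when moved", "only when strained"]): 0.6,
    frozenset(["only when moved", "less when moved"]): 0.1,
}


def score_trigger(t1, t2):
    """Look up the trigger score; equal triggers score 1, unlisted pairs score 0."""
    if t1 == t2:
        return 1.0
    return TRIGGER_SCORES.get(frozenset([t1, t2]), 0.0)


def mscore_pain(pain1, pain2, w):
    """Weighted sum of the particle-specific scores of two pain findings."""
    return (w["type"] * score_equal(pain1["type"], pain2["type"])
            + w["location"] * score_location(pain1["location"], pain2["location"])
            + w["tissue"] * score_equal(pain1["tissue"], pain2["tissue"])
            + w["trigger"] * score_trigger(pain1["trigger"], pain2["trigger"]))


pain1 = {"type": "stinging", "location": (0.0, 0.0),
         "tissue": "joint", "trigger": "only when moved"}
pain2 = {"type": "stinging", "location": (3.0, 4.0),
         "tissue": "joint", "trigger": "only when strained"}
unit_weights = {"type": 1.0, "location": 1.0, "tissue": 1.0, "trigger": 1.0}

value = mscore_pain(pain1, pain2, unit_weights)
```

Keying the trigger scores by `frozenset` makes the lookup symmetric by construction, which mirrors the symmetric matrix described above.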
The weights used for the aggregation of the marginal scores are a central object of the general score and decisive for its performance. There are several ways of calibrating them.
The first is to use medical expertise to assess which kinds of differences are more or less relevant when comparing patients' constitutions.
The second is a sanity test based on the "established illnesses". One such test could be to alternate between performing a k-means cluster analysis, based on the topology induced by the general score, and adapting the weights such that the "established illnesses" are reproduced "well enough" as cluster centers. k would be equal to the number of "established illnesses" or slightly larger.
The third is to regularly adapt the weights such that the general score of users that declare each other useful (via a tag and/or via fruitful interaction) increases, while that of "uninteresting" but highly ranked proposed users decreases. This could be done weight by weight. A weight wab is increased if the marginal score of cooperating users user1 and user2 is larger than that between, say, user1 and user3, while user3 is ranked higher in user1's list of proposed users. In the same spirit, a weight wab is decreased if the marginal score of cooperating users user1 and user2 is smaller than that between user1 and user3. Having medical experts monitor this adaptive calibration might improve the quality of such changes.
What needs to be done:
- Find a meaningful set of initial values for the weights
- Implement and carry out one or many of the above calibrations (using bootstrapped versions of the established illnesses)
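The third, feedback-based calibration could be sketched as below. The multiplicative update, the step size and the lower bound that keeps weights positive are assumptions made for this illustration; the text above only fixes the direction of each change. The function name `update_weights` and the example values are hypothetical.

```python
def update_weights(weights, mscores_useful, mscores_unhelpful, step=0.05, floor=1e-6):
    """One weight-by-weight adaptation step (assumed multiplicative rule).

    weights:           dict (type1, type2) -> current positive weight
    mscores_useful:    marginal scores between user1 and a cooperating user2
    mscores_unhelpful: marginal scores between user1 and a higher-ranked but
                       "uninteresting" user3
    """
    updated = {}
    for key, w in weights.items():
        useful = mscores_useful.get(key, 0.0)
        unhelpful = mscores_unhelpful.get(key, 0.0)
        if useful > unhelpful:
            w = w * (1.0 + step)   # this comparison favours the useful peer
        elif useful < unhelpful:
            w = w * (1.0 - step)   # this comparison favours the unhelpful peer
        updated[key] = max(w, floor)  # keep the weight positive
    return updated


weights = {("a", "a"): 1.0, ("a", "b"): 1.0, ("b", "b"): 1.0}
useful = {("a", "a"): 0.9, ("a", "b"): 0.2, ("b", "b"): 0.5}
unhelpful = {("a", "a"): 0.4, ("a", "b"): 0.7, ("b", "b"): 0.5}

new_weights = update_weights(weights, useful, unhelpful)
```

In this example the weight for (a, a) grows, the weight for (a, b) shrinks, and the weight for (b, b) stays unchanged because both marginal scores are equal.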
There is a lot of medical knowledge on illnesses and their symptoms. One store for this knowledge is the ICD. The aim is to incorporate at least some of this knowledge into the matching procedure. For this purpose a "complete" list of clearly diagnosable illnesses is translated into a set of profiles that characterise different stereotypical versions of these illnesses.
Each of these sets is combined with a specific numerical comparison that is designed to quantify how similar a user's set is to the "stereotype profiles" of the respective illness. The stereotypes of an illness could be different variants of the symptoms of the same illness or differently treated developments. The user's profile is compared with each stereotype profile in terms of the illness-specific score. If the maximum of the resulting scores is the largest over all "established illnesses", the user is associated with that illness.
The illness-specific scores can be more complex than the general score function as they incorporate what is known about the illness and about how to diagnose it reliably.
What needs to be done:
- Translate ICD illnesses into lists of stereotypical profiles
- Specify illness-specific functions to compare user profiles with the illnesses
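The association of a user with an "established illness" (maximum over stereotypes, then the largest maximum over all illnesses) can be sketched as follows. The illness names, the stereotype profiles and the set-overlap score `overlap_score` are purely hypothetical stand-ins; real illness-specific scores would encode diagnostic knowledge.

```python
def overlap_score(profile, stereotype):
    """Placeholder illness-specific score: share of findings the two sets have in common."""
    union = profile | stereotype
    return len(profile & stereotype) / len(union) if union else 0.0


def associate_illness(profile, illnesses):
    """Return (illness name, score) for the illness with the best-matching stereotype.

    illnesses: dict name -> (list of stereotype profiles, illness-specific score function)
    """
    best_name, best_score = None, float("-inf")
    for name, (stereotypes, score_fn) in illnesses.items():
        # The illness-specific score of the user is the maximum over all stereotypes.
        score = max(score_fn(profile, stereotype) for stereotype in stereotypes)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Made-up findings and illnesses, for illustration only.
user_profile = {"cough", "fever", "fatigue"}
illnesses = {
    "illness_A": ([{"cough", "fever"},
                   {"cough", "fever", "fatigue", "headache"}], overlap_score),
    "illness_B": ([{"rash", "itching"}], overlap_score),
}

name, score = associate_illness(user_profile, illnesses)
```

Each illness carries its own score function in this layout, so a more elaborate comparison can be plugged in per illness without changing `associate_illness`.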
It can be very fruitful for the user to get in contact with people who potentially suffer from the same illness but with a higher degree of intensity. The third heuristic identifies such users. This could be done using the "established illnesses" the user is associated with. Each such "established illness" could also be equipped with a method of identifying users that suffer from a more intense manifestation of the condition concerned. Alternatively, some findings could have features that represent intensities, and one could design a method that finds users who share the same findings as the user but at a higher intensity.
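The second variant, finding users with the same findings at a higher intensity, might look like the sketch below. It assumes that every finding carries a numeric intensity, which is a simplification: the "pain" example above records intensity as free text. The function name and the ranking by total excess intensity are assumptions as well.

```python
def more_intense_cases(user_findings, candidates):
    """Return ids of candidates who share all of the user's findings at equal or
    higher intensity, with at least one strictly higher, most intense first.

    user_findings: dict finding type -> numeric intensity (assumed encoding)
    candidates:    dict user id -> dict of the same shape
    """
    ranked = []
    for cand_id, cand in candidates.items():
        if not set(user_findings) <= set(cand):
            continue  # candidate lacks one of the user's findings
        if any(cand[k] < user_findings[k] for k in user_findings):
            continue  # candidate is milder in at least one finding
        excess = sum(cand[k] - user_findings[k] for k in user_findings)
        if excess > 0:
            ranked.append((cand_id, excess))
    ranked.sort(key=lambda item: -item[1])
    return [cand_id for cand_id, _ in ranked]


user = {"pain": 3, "stiffness": 2}
others = {
    "u2": {"pain": 5, "stiffness": 2},
    "u3": {"pain": 4, "stiffness": 4, "fatigue": 1},
    "u4": {"pain": 2, "stiffness": 5},   # milder pain, excluded
    "u5": {"pain": 5},                   # lacks stiffness, excluded
}

extreme = more_intense_cases(user, others)
```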
2.4 Recombining the heuristics
A method is needed that integrates the suggested profiles of the three heuristics (general score, established illnesses, archetypes). This method should favour suggestions of whichever heuristic worked well. For instance, more suggestions based on established illnesses could be included if the respective profile is very similar to one or two of those registered illnesses. For this, comparable schemes for assessing the accuracy or success of the heuristics are needed. They could yield numerical values that determine the respective proportions of suggested profiles.
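One way to turn such numerical performance values into proportions is sketched below. It assumes each heuristic delivers an ordered candidate list and one non-negative performance value; the interleaving rule (always draw from the heuristic that is furthest below its target share) is an assumption made for this illustration, as are all names and values.

```python
def recombine(candidates, performance, n):
    """Merge ordered candidate lists into one list of at most n distinct users.

    candidates:  dict heuristic name -> ordered list of user ids
    performance: dict heuristic name -> non-negative performance value
    """
    total = sum(performance.values())
    share = {h: performance[h] / total for h in candidates}
    queues = {h: list(users) for h, users in candidates.items()}
    taken = {h: 0 for h in candidates}
    result = []
    while len(result) < n and any(queues.values()):
        # Pick the heuristic furthest below its target share that still has candidates.
        h = max((h for h in queues if queues[h]),
                key=lambda h: share[h] * (len(result) + 1) - taken[h])
        user = queues[h].pop(0)
        if user not in result:  # skip users already suggested by another heuristic
            result.append(user)
            taken[h] += 1
    return result


lists = {
    "general": ["u1", "u2", "u3", "u4"],
    "illness": ["u5", "u2", "u6"],
    "extreme": ["u7", "u8"],
}
perf = {"general": 0.5, "illness": 0.25, "extreme": 0.25}

suggested = recombine(lists, perf, 4)
```

With these made-up values the general score contributes two of the four suggestions and the other two heuristics one each, matching the 0.5 / 0.25 / 0.25 shares.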