Collaborative filtering RS

Ishani Kathuria edited this page Dec 24, 2022 · 2 revisions

Collaborative filtering (CF) uses similarities between users and items simultaneously to provide recommendations. In CF, we find users similar to the target user and recommend what those users like. This type of recommender system does not use the features of an item to recommend it; instead, it groups users with similar tastes and recommends to each user according to the preferences of their group. It relies on the observation that if person X and person Y react similarly to some items, they are likely to share opinions on other items too.

User-based recommendation

Here, we look for users who have rated various items in the same way as the target user, and then estimate the rating of the missing item from the ratings those users gave it.
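
The idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes ratings are stored as a nested dict, uses Pearson correlation as the similarity measure, and keeps only positively correlated neighbours. The names `pearson` and `predict_user_based` and the toy data are made up for this example.

```python
import math

def pearson(a, b):
    """Pearson correlation over the items both users have rated."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = math.sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * \
          math.sqrt(sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0

def predict_user_based(ratings, user, item):
    """Estimate the missing rating from similar users who rated the item."""
    target = ratings[user]
    user_mean = sum(target.values()) / len(target)
    num = den = 0.0
    for other, profile in ratings.items():
        if other == user or item not in profile:
            continue
        sim = pearson(target, profile)
        if sim <= 0:
            continue  # assumption: ignore dissimilar users
        other_mean = sum(profile.values()) / len(profile)
        # Weight the neighbour's deviation from their own mean by similarity
        num += sim * (profile[item] - other_mean)
        den += sim
    return user_mean + num / den if den else user_mean

# Toy data: alice has not rated item "D"
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 5, "B": 3, "C": 4, "D": 5},
    "carol": {"A": 1, "B": 5, "C": 2, "D": 1},
}
prediction = predict_user_based(ratings, "alice", "D")
```

In the toy data, bob rates exactly like alice while carol rates in the opposite direction, so only bob contributes to the estimate.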

Item-based recommendation

Here, we explore the relationship between pairs of items (users who bought Y also bought Z). We estimate the missing rating from the ratings the same user gave to other items. Item-based CF was invented and first used by Amazon in 1998.
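
A rough sketch of the item-based variant follows. It assumes the same nested-dict rating store, uses cosine similarity computed over co-rating users only (a simplification), and predicts a similarity-weighted average of the user's own ratings. All function names and data are illustrative.

```python
import math

def item_vector(ratings, item):
    """Map each user who rated `item` to their rating."""
    return {u: p[item] for u, p in ratings.items() if item in p}

def cosine(x, y):
    """Cosine similarity over the users who rated both items."""
    common = [u for u in x if u in y]
    if not common:
        return 0.0
    num = sum(x[u] * y[u] for u in common)
    den = math.sqrt(sum(x[u] ** 2 for u in common)) * \
          math.sqrt(sum(y[u] ** 2 for u in common))
    return num / den if den else 0.0

def predict_item_based(ratings, user, item):
    """Estimate the missing rating from the user's ratings of similar items."""
    target_vec = item_vector(ratings, item)
    num = den = 0.0
    for other_item, rating in ratings[user].items():
        if other_item == item:
            continue
        sim = cosine(target_vec, item_vector(ratings, other_item))
        num += sim * rating
        den += abs(sim)
    return num / den if den else None

ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 5, "B": 3, "C": 4, "D": 5},
    "carol": {"A": 1, "B": 5, "C": 2, "D": 1},
}
prediction = predict_item_based(ratings, "alice", "D")
```

Because the prediction is a weighted average of alice's own ratings, it always falls between her lowest and highest rating.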

Attacks on collaborative recommender systems

  • Collaborative recommender systems are vulnerable to malicious users who seek to bias their output, causing them to recommend (or not recommend) particular items.
  • Each of the separate identities assumed by the attacker is referred to as an attack profile.
  • The best attack against a system is the one that yields the biggest impact for the least amount of effort.
  • A product push attack promotes the target item so that it is recommended more often.
  • A product nuke attack demotes the target item so that it is recommended less often.
  • A high-knowledge attack is one that requires the attacker to have detailed knowledge of the rating distribution in a recommender system’s database. Some attacks, for example, require that the attacker know the mean rating and standard deviation for every item.
  • A low-knowledge attack is one that requires system-independent knowledge such as might be obtained by consulting public information sources.
  • An attacker that has more detailed knowledge of the precise algorithm in use would be able to produce an informed attack.

Random Attack (BASIC)

  • Filler items are assigned random ratings distributed around the overall mean, and the target item is assigned a prespecified rating.
  • The target item is assigned the maximum rating $(r_{max})$ for a push attack or the minimum rating $(r_{min})$ for a nuke attack.
  • Minimal knowledge required but not very effective.
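
For robustness testing, a random attack profile can be simulated as below. This is a sketch under stated assumptions: a 1 to 5 rating scale, filler ratings drawn from a normal distribution around the global mean and rounded to the scale, and a hypothetical helper name `random_attack_profile`.

```python
import random

def random_attack_profile(all_items, target, global_mean, global_std,
                          n_filler, r_min=1, r_max=5, push=True, rng=None):
    """Build one simulated random-attack profile as an item -> rating dict."""
    rng = rng or random.Random()
    candidates = [i for i in all_items if i != target]
    fillers = rng.sample(candidates, min(n_filler, len(candidates)))
    profile = {}
    for item in fillers:
        # Random rating around the overall mean, clipped to the rating scale
        noisy = round(rng.gauss(global_mean, global_std))
        profile[item] = min(r_max, max(r_min, noisy))
    # Push attacks give the target r_max, nuke attacks give it r_min
    profile[target] = r_max if push else r_min
    return profile

items = ["A", "B", "C", "D", "E", "F", "T"]
push_profile = random_attack_profile(items, "T", 3.6, 1.1, 4,
                                     rng=random.Random(0))
nuke_profile = random_attack_profile(items, "T", 3.6, 1.1, 4,
                                     push=False, rng=random.Random(0))
```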

Average Attack (BASIC)

  • More powerful than the random attack; uses the individual mean for each item rather than the global mean.
  • Each rating assigned to a filler item corresponds to the mean rating for that item.
  • Can also be used as a nuke attack by assigning $r_{min}$ instead of $r_{max}$ to the target item.
  • The difference between the average and random attacks lies in how ratings are computed for the filler items in the profile.
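
The difference from the random attack can be seen in a small simulation. The sketch below assumes the attacker somehow knows every item's mean (computed here from a toy rating store) and simply assigns the rounded item mean to each filler item. The helper name `average_attack_profile` is made up for illustration.

```python
import random
import statistics

def average_attack_profile(ratings, target, n_filler,
                           r_min=1, r_max=5, push=True, rng=None):
    """Build one simulated average-attack profile from per-item means."""
    rng = rng or random.Random()
    by_item = {}
    for profile in ratings.values():
        for item, rating in profile.items():
            by_item.setdefault(item, []).append(rating)
    candidates = sorted(i for i in by_item if i != target)
    attack = {}
    for item in rng.sample(candidates, min(n_filler, len(candidates))):
        # Filler rating follows the individual item mean, not the global mean
        item_mean = statistics.mean(by_item[item])
        attack[item] = min(r_max, max(r_min, round(item_mean)))
    attack[target] = r_max if push else r_min
    return attack

ratings = {
    "u1": {"A": 4, "B": 2, "T": 3},
    "u2": {"A": 4, "B": 2, "C": 5},
    "u3": {"A": 4, "C": 5},
}
push_profile = average_attack_profile(ratings, "T", 3, rng=random.Random(0))
nuke_profile = average_attack_profile(ratings, "T", 3, push=False,
                                      rng=random.Random(0))
```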

Bandwagon (LOW KNOWLEDGE)

  • Builds attack profiles containing items that have high visibility (popular items).
  • Such profiles have a good probability of being similar to a large number of users.
  • Does not require any system-specific data.
  • Uses selected items that are likely to have been rated by a large number of users in the database.
  • The selected items are assigned the maximum rating value, together with the target item.
  • Ratings for the filler items are determined randomly, in the same manner as in the random attack.
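
A bandwagon profile can be simulated as follows. The sketch assumes the list of popular items comes from public knowledge (passed in as `popular_items`), a 1 to 5 scale, and normally distributed filler ratings as in the random attack. The function name and parameters are hypothetical.

```python
import random

def bandwagon_profile(popular_items, other_items, target, n_filler,
                      global_mean, global_std, r_min=1, r_max=5, rng=None):
    """Build one simulated bandwagon-attack profile."""
    rng = rng or random.Random()
    # Selected (popular) items get the maximum rating
    attack = {item: r_max for item in popular_items if item != target}
    pool = [i for i in other_items if i != target and i not in attack]
    for item in rng.sample(pool, min(n_filler, len(pool))):
        # Filler items are rated randomly around the global mean
        noisy = round(rng.gauss(global_mean, global_std))
        attack[item] = min(r_max, max(r_min, noisy))
    # The target item is pushed with the maximum rating
    attack[target] = r_max
    return attack

profile = bandwagon_profile(["P1", "P2"], ["A", "B", "C", "D"], "T",
                            n_filler=3, global_mean=3.6, global_std=1.1,
                            rng=random.Random(1))
```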

Segment (LOW KNOWLEDGE)

  • Pushes an item to a targeted group of users with known or easily predicted preferences.
  • Example: the producer of a horror movie might want to get the movie recommended to viewers who have liked other horror movies.
  • The attacker determines a set of segment items that are likely to be preferred by the intended target audience.
  • These segment items are assigned the maximum rating value, together with the target item.
  • The filler items are given the minimum rating.
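
The rating assignment for a segment attack is simple enough to sketch directly. The item names and the helper `segment_attack_profile` are illustrative, and a 1 to 5 scale is assumed.

```python
def segment_attack_profile(segment_items, filler_items, target,
                           r_min=1, r_max=5):
    """Build one simulated segment-attack profile."""
    # Filler items get the minimum rating
    attack = {item: r_min for item in filler_items if item != target}
    # Segment items (liked by the intended audience) get the maximum rating
    attack.update({item: r_max for item in segment_items if item != target})
    # The target item is pushed together with the segment items
    attack[target] = r_max
    return attack

profile = segment_attack_profile(
    segment_items=["horror_1", "horror_2"],
    filler_items=["comedy_1", "drama_1"],
    target="new_horror_movie",
)
```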

Love/Hate attack (NUKE)

  • The target item is given the minimum rating value, $r_{min}$.
  • Items in the filler set are given the maximum rating value, $r_{max}$.
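
A minimal simulation of a love/hate profile, assuming a 1 to 5 scale and randomly chosen filler items (the helper name `love_hate_profile` is made up):

```python
import random

def love_hate_profile(all_items, target, n_filler,
                      r_min=1, r_max=5, rng=None):
    """Build one simulated love/hate (nuke) profile."""
    rng = rng or random.Random()
    candidates = [i for i in all_items if i != target]
    fillers = rng.sample(candidates, min(n_filler, len(candidates)))
    # Every filler item is "loved", the target item is "hated"
    attack = {item: r_max for item in fillers}
    attack[target] = r_min
    return attack

profile = love_hate_profile(["A", "B", "C", "D", "T"], "T", 3,
                            rng=random.Random(0))
```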

Reverse Bandwagon (NUKE)

  • Selected items are those that tend to be rated poorly by many users.
  • These items are assigned low ratings together with the target item.
  • Target item is associated with widely disliked items, increasing the probability that the system will generate low predicted ratings for that item.
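
One way to simulate this is sketched below. The selection rule is an assumption: an item counts as widely disliked when it has at least `min_count` ratings, and the lowest-rated such items are picked first. The "low ratings" are taken to be $r_{min}$ on a 1 to 5 scale. The helper name and the item statistics are illustrative.

```python
def reverse_bandwagon_profile(item_stats, target, n_selected,
                              min_count=100, r_min=1):
    """Build one simulated reverse-bandwagon profile.

    item_stats maps item -> (rating_count, mean_rating).
    """
    # Keep only items rated by many users
    candidates = [i for i, (count, _) in item_stats.items()
                  if i != target and count >= min_count]
    # Lowest mean rating first
    candidates.sort(key=lambda i: item_stats[i][1])
    # Selected items and the target all receive the low rating
    attack = {item: r_min for item in candidates[:n_selected]}
    attack[target] = r_min
    return attack

item_stats = {
    "A": (500, 1.8),
    "B": (3, 1.2),    # poorly rated, but by too few users
    "C": (800, 2.1),
    "D": (900, 4.6),
}
profile = reverse_bandwagon_profile(item_stats, "T", n_selected=2)
```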

Popular (INFORMED)

  • Attack profiles are constructed using popular items from the domain under attack.
  • Rates the selected popular items either $r_{min}+1$ or $r_{min}$, according to whether the average rating for the item is higher or lower.
  • Can also be used for nuke attacks, by assigning $r_{max}$ and $r_{max}-1$ to the more- and less-liked selected items, respectively.
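
The rating rule can be sketched as follows. The notes above do not say what an item's average rating is compared against, so the `threshold` parameter here is purely an assumption for illustration, as are the helper name and the 1 to 5 scale.

```python
def popular_attack_profile(popular_item_means, target, threshold,
                           r_min=1, r_max=5, push=True):
    """Build one simulated popular-attack profile.

    popular_item_means maps each popular item to its average rating.
    threshold is a hypothetical cut-off between more- and less-liked items.
    """
    attack = {}
    for item, mean in popular_item_means.items():
        if item == target:
            continue
        more_liked = mean >= threshold
        if push:
            attack[item] = r_min + 1 if more_liked else r_min
        else:
            attack[item] = r_max if more_liked else r_max - 1
    attack[target] = r_max if push else r_min
    return attack

means = {"A": 4.5, "B": 2.0}
push_profile = popular_attack_profile(means, "T", threshold=3.5)
nuke_profile = popular_attack_profile(means, "T", threshold=3.5, push=False)
```

In both variants the more-liked item receives the higher of the two ratings, so the profile still correlates positively with genuine users while sitting far from the rating given to the target.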

Probe Attack (INFORMED)

  • Obtains items and their ratings from the system itself.
  • The attacker creates a seed profile and uses it to generate recommendations from the system.
  • These recommendations are generated from the neighbouring users, so the recommended items are guaranteed to have been rated by at least some of these users, and the predicted ratings will be well correlated with these users’ opinions.
  • This gives the attacker a way to learn incrementally about the system’s rating distribution.
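
The incremental loop can be illustrated with a toy stand-in for the target system. Everything here is hypothetical: `recommend(profile, n)` is an assumed callable that returns up to `n` pairs of item and predicted rating for items not yet in the profile, and `toy_recommend` merely fakes such a system from a fixed catalogue.

```python
def probe_attack(recommend, seed_profile, rounds=3, per_round=2):
    """Grow a profile from the system's own recommendations."""
    profile = dict(seed_profile)
    for _ in range(rounds):
        suggestions = recommend(profile, per_round)
        if not suggestions:
            break
        for item, predicted in suggestions:
            # Adopt the system's predicted rating, rounded to the scale
            profile[item] = round(predicted)
    return profile

# Toy stand-in for the system under test: item -> predicted rating
catalogue = {"A": 4.2, "B": 3.6, "C": 1.8, "D": 4.9, "E": 2.4}

def toy_recommend(profile, n):
    unseen = [(i, r) for i, r in catalogue.items() if i not in profile]
    unseen.sort(key=lambda pair: pair[1], reverse=True)
    return unseen[:n]

learned = probe_attack(toy_recommend, {"A": 4}, rounds=2, per_round=2)
```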

Countermeasures for attacks

  • Use model-based or hybrid algorithms
    • More robust
    • Comparable accuracy
    • Less vulnerable
  • Increase profile injection cost
    • Using captchas
    • Using low-cost manual insertion
  • Use statistical attack detection methods
    • Detect groups of users who collaborate to push/nuke items
    • Monitor how the ratings for an item develop over time
      • Changes in average rating
      • Changes in rating entropy
      • Time-dependent metrics (bulk ratings)
    • Use ML methods to detect fake profiles
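
The per-item monitoring ideas above (average rating, rating entropy, bulk ratings) can be sketched with the standard library. The window size and the two thresholds are arbitrary assumptions for illustration, not recommended values, and the function names are made up.

```python
import math
from collections import Counter

def rating_entropy(ratings):
    """Shannon entropy (in bits) of a list of ratings."""
    total = len(ratings)
    counts = Counter(ratings)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def window_stats(timestamped_ratings, window):
    """Group (timestamp, rating) pairs for one item into time windows.

    Returns (window_index, count, mean, entropy) tuples in time order.
    """
    buckets = {}
    for ts, rating in timestamped_ratings:
        buckets.setdefault(ts // window, []).append(rating)
    return [(k, len(v), sum(v) / len(v), rating_entropy(v))
            for k, v in sorted(buckets.items())]

def flag_windows(stats, max_mean_jump=1.0, max_count_ratio=3.0):
    """Flag windows with a sharp change in mean or a burst of ratings."""
    flagged = []
    for prev, cur in zip(stats, stats[1:]):
        mean_jump = abs(cur[2] - prev[2])
        bulk = cur[1] > max_count_ratio * prev[1]
        if mean_jump > max_mean_jump or bulk:
            flagged.append(cur[0])
    return flagged

# Two quiet windows, then a burst of twenty maximum ratings
history = [(0, 3), (1, 4), (2, 2), (3, 4),
           (10, 3), (11, 4), (12, 3), (13, 2)]
history += [(20 + i % 10, 5) for i in range(20)]

stats = window_stats(history, window=10)
suspicious = flag_windows(stats)
```

In the toy history, the third window combines a jump in the average rating, a drop in entropy to zero, and a burst in volume, which is the kind of pattern a push attack tends to leave behind.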