Personalized Recommendations of Manga #1220

Open
1 task done
peachblacky opened this issue Dec 19, 2024 · 10 comments

Comments

@peachblacky

peachblacky commented Dec 19, 2024

Describe your suggested feature

As far as I can see from the source code, the Recommendations block in the app currently just suggests a bunch of random manga to scroll through.
I think it would be a great idea to use a more thoughtful algorithm for this, for example a linear recommendation model (like EASE or SANSA).
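
For context, here is a minimal sketch of what EASE computes, assuming a small binary user-by-manga interaction matrix. The function names and the toy data are illustrative only and are not taken from Kotatsu. EASE has a closed-form solution: with G = XᵀX + λI and P = G⁻¹, the item-to-item weights are B[i][j] = -P[i][j] / P[j][j] with a zero diagonal, and a user's scores are their interaction row multiplied by B. The matrix inversion is written out in plain Python here only to keep the sketch dependency-free; a real implementation would likely use a linear algebra library.

```python
def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(matrix)
    # Augment with the identity: [A | I] is reduced to [I | A^-1].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ease_weights(interactions, l2=1.0):
    """Closed-form EASE item-to-item weights for a binary user x item matrix."""
    n_items = len(interactions[0])
    # Gram matrix X^T X with L2 regularisation added on the diagonal.
    gram = [[sum(user[i] * user[j] for user in interactions) + (l2 if i == j else 0.0)
             for j in range(n_items)] for i in range(n_items)]
    p = invert(gram)
    # B[i][j] = -P[i][j] / P[j][j], with the diagonal forced to zero.
    return [[0.0 if i == j else -p[i][j] / p[j][j] for j in range(n_items)]
            for i in range(n_items)]


def recommend(user_row, weights, top_k=3):
    """Rank items the user has not interacted with by their EASE score."""
    n_items = len(user_row)
    scores = [sum(user_row[i] * weights[i][j] for i in range(n_items))
              for j in range(n_items)]
    unseen = [j for j in range(n_items) if not user_row[j]]
    return sorted(unseen, key=lambda j: scores[j], reverse=True)[:top_k]


# Toy data (hypothetical): rows are users, columns are manga, 1 = interacted.
INTERACTIONS = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
WEIGHTS = ease_weights(INTERACTIONS, l2=1.0)
```

In this toy data, manga 0 and 1 are always read together, so a user who has only read manga 0 gets manga 1 ranked first.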

Things to take into account

I currently work as a Recommendation System Engineer, and I think I could try to tackle this problem and build good recommendations that help people find new manga.
But to do this, a couple of things should be considered:

Computational resources. Are we constrained to the app itself, or does the team have some sort of server/cluster where we could set up services to train models and run an API for them?
Does the app currently collect any statistics about users? For example, which manga a user has interacted with, when that happened, etc.
It would be great if someone could discuss these points with me :-)

Acknowledgements

  • This is not a duplicate of an existing issue. Please look through the list of open issues before creating a new one.
@Koitharu
Member

I guess using AI for recommendation is overengineering: this is a secondary functionality that will require a lot of resources. But any ideas are welcome

@MariusAlbrecht
Contributor

MariusAlbrecht commented Dec 21, 2024

> For example, which manga a user has interacted with, when that happened, etc.

I don't want this app to track my every move and then send that data somewhere on the web.

@peachblacky
Author

peachblacky commented Dec 21, 2024

> > For example, which manga a user has interacted with, when that happened, etc.
>
> I don't want this app to track my every move and then send that data somewhere on the web.

User data is always anonymized during statistics collection, especially when the app is open-source... So nothing will be leaked; all the anonymization is easy to do.

Additionally, we can just ask users for permission to collect their data for statistics.
Everybody will still get recommendations, but models will only be trained on data from users who have given their consent.

@peachblacky
Author

peachblacky commented Dec 21, 2024

> I guess using AI for recommendation is overengineering: this is a secondary functionality that will require a lot of resources. But any ideas are welcome

Advanced models could indeed lead to serious resource demands.
But "AI" covers a very wide variety of options in terms of resource "weight".
There are some simple statistical models (I think some of them can even be implemented on the edge, i.e. on the device itself).
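
As one illustration of how light such a model can be, here is a sketch of item-to-item co-occurrence counting. This is just an example of a simple statistical model, not something proposed elsewhere in this thread; it assumes a list of libraries (sets of titles) is available from somewhere, for instance from users who opted in, and all names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(libraries):
    """Count how often two titles appear together in the same library."""
    counts = Counter()
    for titles in libraries:
        for a, b in combinations(sorted(set(titles)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def similar_titles(seed_titles, counts, top_k=3):
    """Suggest titles that most often co-occur with the ones already read."""
    seen = set(seed_titles)
    scores = Counter()
    for (a, b), n in counts.items():
        if a in seen and b not in seen:
            scores[b] += n
    # Highest count first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [title for title, _ in ranked[:top_k]]


# Hypothetical libraries; placeholder names instead of real titles.
LIBRARIES = [["A", "B"], ["A", "B", "C"], ["A", "C"], ["D"]]
COUNTS = build_cooccurrence(LIBRARIES)
```

The resulting table is just a dictionary of pair counts, so looking up suggestions is cheap enough to run inside the app.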

I think that since there is already a "Recommendations" block in the app, it should at least contain a thoughtful list of titles rather than the random, duplicated entries that can be found there now.
In my opinion, the current Recommendations section is practically useless for the user, who still has to sort through all the irrelevant entries manually...

I think such functionality would greatly encourage users to explore more manga and use Kotatsu more :)

@MariusAlbrecht
Contributor

> User data is always anonymized during statistics collection, especially when the app is open-source... So nothing will be leaked; all the anonymization is easy to do.

I'd be careful here, anonymising data can be very hard. Large sets of usage data can, depending on the circumstances, quite easily identify an individual even when the data doesn't contain anything that directly identifies said individual.
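
To make that risk concrete, here is a small sketch that measures how many records in a dataset are unique purely by the set of titles they contain, with no user ID at all. The data and function names are hypothetical; this is only an illustration of the re-identification concern, not an audit method.

```python
from collections import Counter


def fingerprint_counts(histories):
    """Count how many users share each exact set of titles."""
    return Counter(frozenset(history) for history in histories)


def unique_fraction(histories):
    """Share of users whose exact set of titles appears only once."""
    counts = fingerprint_counts(histories)
    unique = sum(1 for history in histories if counts[frozenset(history)] == 1)
    return unique / len(histories)


def smallest_group(histories):
    """Size of the smallest group of users with identical histories (the k in k-anonymity)."""
    return min(fingerprint_counts(histories).values())


# Hypothetical "anonymous" reading histories without any user identifier.
HISTORIES = [
    ["title_a", "title_b"],
    ["title_b", "title_a"],
    ["title_a", "title_c", "title_rare"],
    ["title_b"],
]
```

Here half of the users are already unique by their reading list alone, so anyone who knows what a specific person reads could pick out that person's record.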

> Additionally, we can just ask users for permission to collect their data for statistics.
> Everybody will still get recommendations, but models will only be trained on data from users who have given their consent.

That, I'm happy with. Earlier it sounded like the model should be deployed on a central server with clients sending requests (including their personal usage data) to that server to get recommendations.

@MariusAlbrecht
Contributor

MariusAlbrecht commented Dec 22, 2024

Maybe we could also just rely on the recommendations provided by some sources instead of coming up with our own?
I could, for example, imagine the following scheme to not be great but "good enough":

  • somehow figure out a couple of topics (genres, pieces of media, whatever) the user likes. For example, look at all favourites and get the top 5 genres and look at tracker entries and get the top 5 highest ratings.
  • somehow figure out sources which could provide reasonable recommendations for those topics. For example, use the top 5 most-used sources which provide recommendations
  • get the recommendations for the previously determined topics from the previously determined sources
  • do some filtering on the results. For example, rank mangas which occurred multiple times higher and consider ratings on the source site
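
The steps above could be sketched roughly as follows. This is only an illustration under assumptions: each source is modelled as a hypothetical function that takes a genre and returns a list of recommended manga, which is a stand-in and not the actual parser API, and the tracker-rating part of the first step is left out for brevity.

```python
from collections import Counter


def top_genres(favourites, n=5):
    """Step 1: the genres that occur most often among the user's favourites."""
    counts = Counter(genre for manga in favourites for genre in manga["genres"])
    return [genre for genre, _ in counts.most_common(n)]


def rank_candidates(batches, known_ids):
    """Step 4: rank by how often a manga was suggested, then by source rating."""
    hits = Counter()
    best_rating = {}
    for batch in batches:
        for manga in batch:
            manga_id = manga["id"]
            if manga_id in known_ids:
                continue  # the user already has this one
            hits[manga_id] += 1
            rating = manga.get("rating", 0.0)
            best_rating[manga_id] = max(best_rating.get(manga_id, 0.0), rating)
    return sorted(hits, key=lambda i: (-hits[i], -best_rating[i], i))


def recommend(favourites, sources, n_genres=5):
    """Steps 2 and 3: ask every chosen source for every top genre, then rank."""
    genres = top_genres(favourites, n_genres)
    known_ids = {manga["id"] for manga in favourites}
    batches = [fetch(genre) for fetch in sources for genre in genres]
    return rank_candidates(batches, known_ids)


# Hypothetical favourites and two fake sources for demonstration.
FAVOURITES = [
    {"id": "f1", "genres": ["action", "drama"]},
    {"id": "f2", "genres": ["action"]},
]


def source_a(genre):
    data = {
        "action": [{"id": "m1", "rating": 4.0}, {"id": "f1", "rating": 5.0}],
        "drama": [{"id": "m2", "rating": 4.5}],
    }
    return data.get(genre, [])


def source_b(genre):
    data = {"action": [{"id": "m1", "rating": 3.0}, {"id": "m3", "rating": 4.8}]}
    return data.get(genre, [])
```

With these fake sources, `recommend(FAVOURITES, [source_a, source_b])` puts `m1` first because both sources suggested it, and then orders the remaining candidates by rating.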

@peachblacky
Author

> > User data is always anonymized during statistics collection, especially when the app is open-source... So nothing will be leaked; all the anonymization is easy to do.
>
> I'd be careful here, anonymising data can be very hard. Large sets of usage data can, depending on the circumstances, quite easily identify an individual even when the data doesn't contain anything that directly identifies said individual.
>
> > Additionally, we can just ask users for permission to collect their data for statistics.
> > Everybody will still get recommendations, but models will only be trained on data from users who have given their consent.
>
> That, I'm happy with. Earlier it sounded like the model should be deployed on a central server with clients sending requests (including their personal usage data) to that server to get recommendations.

Well, regarding the second point: we do need to store some information about user behaviour to produce recommendations, but it might be data collection on a much smaller scale.
It would be best to just ask the user for permission to collect data and then show the recommender block.

@peachblacky
Author

> Maybe we could also just rely on the recommendations provided by some sources instead of coming up with our own? I could, for example, imagine the following scheme to not be great but "good enough":
>
>   • somehow figure out a couple of topics (genres, pieces of media, whatever) the user likes. For example, look at all favourites and get the top 5 genres and look at tracker entries and get the top 5 highest ratings.
>   • somehow figure out sources which could provide reasonable recommendations for those topics. For example, use the top 5 most-used sources which provide recommendations
>   • get the recommendations for the previously determined topics from the previously determined sources
>   • do some filtering on the results. For example, rank mangas which occurred multiple times higher and consider ratings on the source site

Sounds interesting, but we need to hear from people who have worked with the sources to understand how complex such a solution would be.

Perhaps we will still need to store some statistics, at least locally.

@MariusAlbrecht
Contributor

MariusAlbrecht commented Dec 24, 2024

The majority of usage statistics are already being stored locally. The only exception I can think of is the ratings the user leaves with the tracking services.

We also are already getting recommendations from the sources. That might even mean that we don't need any new functionality in the parsers (which is important as that'd be a lot of work)

@Caellian Caellian mentioned this issue Feb 3, 2025
1 task
@Caellian

Caellian commented Feb 3, 2025

> Computational resources. Are we constrained to the app itself, or does the team have some sort of server/cluster where we could set up services to train models and run an API for them?

Any backend requirements can (and likely will) break the functionality in the future. Any free tier isn't enough to cover the metrics that would need to be collected (besides maybe gradually building a model over time). Hosting user information is probably out of the picture as well.

Even with the cheapest hosting, assuming the total cost ends up being as low as 5 USD/mo (which it won't, due to the number of users), there's no guarantee the owner could or would want to keep paying for it. The moment they stop, the recommendation feature reverts to the old behavior (or breaks).

So basically what @MariusAlbrecht said: something that recommends based on history/favorites by weighing most read topics and showing random top results of those.
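
A rough sketch of that idea, runnable entirely on the device: weight each genre by how much the user has read in it, then draw random entries from each genre's top list with probability proportional to that weight. The data shapes (`history` as pairs of genres and chapters read, `top_by_genre` as a mapping from genre to top titles) are assumptions for illustration and do not reflect Kotatsu's actual data model.

```python
import random
from collections import Counter


def genre_weights(history):
    """Weight each genre by the number of chapters read in manga of that genre."""
    weights = Counter()
    for genres, chapters_read in history:
        for genre in genres:
            weights[genre] += chapters_read
    return weights


def pick_recommendations(history, top_by_genre, count=5, rng=None):
    """Draw random top results, favouring the user's most-read genres."""
    rng = rng or random.Random()
    weights = genre_weights(history)
    # Only genres the user has actually read and that have candidates.
    pool = {g: list(top_by_genre[g]) for g, w in weights.items()
            if w > 0 and top_by_genre.get(g)}
    picks = []
    while len(picks) < count and any(pool.values()):
        available = [g for g in pool if pool[g]]
        genre = rng.choices(available, weights=[weights[g] for g in available])[0]
        # Remove the drawn title so every loop iteration shrinks the pool.
        title = pool[genre].pop(rng.randrange(len(pool[genre])))
        if title not in picks:
            picks.append(title)
    return picks


# Hypothetical reading history and per-genre top lists.
HISTORY = [(["action", "fantasy"], 30), (["romance"], 2)]
TOP_BY_GENRE = {
    "action": ["a1", "a2"],
    "fantasy": ["a2", "f1"],
    "romance": ["r1"],
    "horror": ["h1"],
}
```

Because nothing leaves the device and no model is trained, this needs no backend, which addresses the hosting concern above.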
