Better Caching System #15

Open
thehowl opened this issue Apr 22, 2018 · 5 comments

@thehowl
Member

thehowl commented Apr 22, 2018

We need to update our caching system because osu! asked us to make fewer requests. We therefore need a better way to cache and serve beatmaps.

I'm mostly creating this issue to document the plan for the Ripple beatmap mirror and how we're going to solve the problem. I also want to clarify the process I have in mind, so that I can then write the code.

Problem

We need to limit requests to osu! as much as possible: at most 10,000 requests per month for ranked beatmaps (in a first phase; we'd then need to gradually scale that down to ~2,000) and 600 requests per month for unranked beatmaps. The unranked allowance works out to about 20 beatmaps per day, or roughly one every 1h12m. If possible, we should also gradually scale down the unranked allowance, as unranked downloads are very expensive for osu! to serve.
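
A quick sanity check of that arithmetic, as a small Python sketch. It assumes a flat 30-day month, and the helper name `allowance` is illustrative only.

```python
from datetime import timedelta

DAYS_PER_MONTH = 30  # assumption: a flat 30-day month, matching the estimate above


def allowance(per_month: int) -> tuple[float, timedelta]:
    """Convert a monthly request budget into requests/day and the average gap between requests."""
    per_day = per_month / DAYS_PER_MONTH
    return per_day, timedelta(days=1) / per_day


for label, budget in [("ranked, first phase", 10_000), ("ranked, target", 2_000), ("unranked", 600)]:
    per_day, gap = allowance(budget)
    print(f"{label:<20} {budget:>6}/month  {per_day:7.2f}/day  one every {gap}")
```

For the unranked budget this gives 20 requests/day, one every 1:12:00, which matches the figure above.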

Solution

Beatmaps are served on three levels:

  • Server cache (fastest): we have allocated 150 GB of SSD space on the server for the mirror. Whenever possible, we serve data directly from the server's own cache.
  • Wasabi Hot Storage: we cache the beatmaps we download from osu! using Wasabi. I estimate that we will use about 1 TB of storage by keeping only beatmaps downloaded in the previous 6 months, and we will apply some rate limiting (3000 requests/day per IP, i.e. roughly one beatmap every ~30 seconds, and at most 5 uncached unranked beatmaps per day).
  • osu!: if a beatmap is available neither on the server nor on Wasabi, we download it directly from osu!, as we always have.
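
The lookup order above can be sketched as a simple fall-through. This is a hypothetical Python illustration, not CheeseGull's code: `Tier`, `MemoryTier`, `fetch` and the `origin` callback are made-up names, in-memory dicts stand in for the SSD cache and Wasabi, and size limits, the 6-month retention window and rate limiting are left out.

```python
from typing import Callable, Optional, Protocol


class Tier(Protocol):
    """A cache level that can look up and store beatmapset archives by ID."""

    def get(self, set_id: int) -> Optional[bytes]: ...
    def put(self, set_id: int, data: bytes) -> None: ...


class MemoryTier:
    """Hypothetical in-memory stand-in for the SSD cache or a Wasabi bucket."""

    def __init__(self) -> None:
        self.blobs: dict[int, bytes] = {}

    def get(self, set_id: int) -> Optional[bytes]:
        return self.blobs.get(set_id)

    def put(self, set_id: int, data: bytes) -> None:
        self.blobs[set_id] = data


def fetch(set_id: int, tiers: list[Tier], origin: Callable[[int], bytes]) -> bytes:
    """Try each cache tier from fastest to slowest; fall back to osu! only on a full miss.

    Whatever is found is copied into every faster tier that missed, so the next
    request for the same set is served from closer to the user.
    """
    for i, tier in enumerate(tiers):
        data = tier.get(set_id)
        if data is not None:
            for faster in tiers[:i]:
                faster.put(set_id, data)
            return data
    data = origin(set_id)  # the only call that counts against the osu! budget
    for tier in tiers:
        tier.put(set_id, data)
    return data
```

Copying hits back into faster tiers keeps popular sets on the SSD at the cost of extra writes; the 150 GB cache would still need its own eviction policy.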

CheeseGull will keep all of its discovery code, but will additionally remove stale beatmaps from its own cache and from Wasabi whenever they're discovered to be stale.
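
One way to sketch the stale-removal step, assuming discovery yields each beatmapset's last-update time and each tier records when it stored a set. `EvictableTier`, `DictTier` and `purge_stale` are hypothetical names; this is not CheeseGull's existing discovery code.

```python
from datetime import datetime
from typing import Optional, Protocol


class EvictableTier(Protocol):
    """A cache level that records when each beatmapset was stored and can drop it."""

    def cached_at(self, set_id: int) -> Optional[datetime]: ...
    def delete(self, set_id: int) -> None: ...


class DictTier:
    """Hypothetical stand-in for the SSD cache or Wasabi: set ID -> time it was cached."""

    def __init__(self, entries: dict[int, datetime]) -> None:
        self.entries = dict(entries)

    def cached_at(self, set_id: int) -> Optional[datetime]:
        return self.entries.get(set_id)

    def delete(self, set_id: int) -> None:
        self.entries.pop(set_id, None)


def purge_stale(last_updates: dict[int, datetime], tiers: list[EvictableTier]) -> set[int]:
    """Remove every cached copy that predates the set's latest update seen by discovery.

    Returns the IDs of sets evicted from at least one tier; a later request for
    them misses that tier and eventually fetches the new version.
    """
    evicted: set[int] = set()
    for set_id, last_update in last_updates.items():
        for tier in tiers:
            stored = tier.cached_at(set_id)
            if stored is not None and stored < last_update:
                tier.delete(set_id)
                evicted.add(set_id)
    return evicted
```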

@thehowl
Member Author

thehowl commented Apr 22, 2018

Another approach: Decentralization

tl;dr: we can't because there's no verification mechanism. This is mostly just a braindump.


Another way to solve this would be decentralization, whether through a protocol of our own, IPFS, BitTorrent, or anything else. We could form a sort of "alliance" of those who rely on the mirror (mostly other Ripple-based osu! private servers), where everyone agrees to take part in the decentralization effort by offering storage space or asking their users to. This might seem a bit odd, but it would solve quite a few issues:

  • New mirrors would no longer have to rely on clever hacks like ours (scraping, etc.); they could just join the network and pull everything they need
  • The space problem (if everything is decentralized, storage probably won't be an issue, since other people would help us carry the load)
  • If we also built an online beatmap search system, we could serve downloads over WebTorrent, similarly to PeerTube, lowering bandwidth requirements

The Verification Problem

There is a problem, though: how can you know that a beatmap file is actually the same as the one osu! serves? Right now, the only answer is to download the beatmap from the osu! website and compare the contents. There is no checksum or signature for the .osz file, so verifying it would bring us right back to the original problem of reducing requests to osu!.

There is a way to verify .osu files: the osu! API provides a file_hash for each beatmap, which we can use to check them. That solves a big part of the problem, except that a beatmapset is not just .osu files: it must also contain a song and a background, and it may contain hitsounds, a skin and a video. How do we verify all of those?
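
As a sketch of the .osu check, assuming the hash the API reports for each difficulty is the hex MD5 of its .osu file (the v1 `get_beatmaps` response names this field `file_md5`). `verify_osu_files` is a hypothetical helper that compares the set of .osu hashes inside an .osz against the set from the API. Everything else in the archive passes unchecked, which is exactly the gap described here.

```python
import hashlib
import io
import zipfile


def verify_osu_files(osz: bytes, api_hashes: set[str]) -> bool:
    """Return True if the .osu files in the archive match the API's hashes exactly.

    Only .osu entries are hashed; audio, backgrounds, hitsounds, skins and video
    are ignored because there is no reference hash to compare them with.
    """
    with zipfile.ZipFile(io.BytesIO(osz)) as archive:
        found = {
            hashlib.md5(archive.read(name)).hexdigest()
            for name in archive.namelist()
            if name.lower().endswith(".osu")
        }
    return found == {h.lower() for h in api_hashes}
```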

As a note: we could ask on the osu-api issue tracker for an .osz checksum to be added, but as anyone who has been around the osu! development community long enough knows, asking for something on osu-api is like shouting into a void.

So we have no means of properly verifying the file contents.

The reason we need this is that I wouldn't want Ripple to be the single point of contact with osu!. I'd like other servers (or even users!) to be able to join in and add more beatmaps, either verified against the osu! API or, even better, with the beatmap file PGP-signed (or signed with some other mechanism) by osu!, ideally together with the beatmap's metadata.

We don't want to trust other servers when they say "Hey, look, this comes from osu!" without any verification. Sure, we could unzip everything, check the .osu files, and cross-check all the beatmap information we can obtain via the osu! API, but there would be no way to check the song file or any of the other files I listed above.

Thus

The approach in the OP remains the best solution, at least until some form of verification is in place. Sigh.

thehowl added this to the v2.2 milestone Apr 22, 2018
@MaxKruse

From a gameplay standpoint, it is not necessary to verify:

  • Hitsounds
  • Backgrounds/Storyboards
  • Song File

Users can change these however they like without affecting the submission of any scores they set.
Nothing currently stops people from changing these files, since doing so does not affect score submission, and it actually happens quite a lot: some people delete song files because they dislike them for whatever reason, change the BG of every map to grey (rrtyui flashbacks), or swap the custom hitsounds provided with the mapset for others.

Therefore, we don't need to verify these. It is completely fine to check only the file_hash of each .osu file.

@thehowl
Member Author

thehowl commented Apr 23, 2018

Yes, I sort of considered that. It is still important, though: we could discard everything else, but we'd still need to check the music file. Mixing up music files is really not an option.

Besides, if all we could guarantee to be correct were beatmap files, then we could just serve data from osu.ppy.sh/osu/:id, with a random picture and a random mp3 file.

@thehowl
Member Author

thehowl commented May 13, 2018

For future reference and for those reading along, here is the counterargument I made while talking with @ilyt on Discord:

there's quite a few issues with that, first of all peppy won't be happy with it because you'd be downloading unranked beatmaps too. he said that if I wanted to do that I'd have to pay $500 upfront for the costs that they have
ilyt - Today at 21:32
ahh ok
it'd only be occasionally downloading them (with the exception of the start)
Howl - Today at 21:33
i'm not sure if downloading beatmaps straight after they are updated is ok for him (in the sense of whether they have them on amazon or on their download servers)
but i think they probably don't go on the download servers straight after being updated, instead they are just placed on amazon until a user requests them
ilyt - Today at 21:33
ahh
Howl - Today at 21:33
which would probably mean hosting a mirror is an ongoing cost for them, which will probably become an ongoing cost for you
i don't know which one of the two they do. i tried to ask, but I never got a response
ilyt - Today at 21:34
hmm
because that's really the only feasible way of doing it (in my head)
is having them all stored on your own server and then deleting upon ranking
Howl - Today at 21:36
the rest of your idea is basically what i explained in the first two posts, mostly the first though, just with more technical detail on where to store the data so that we don't have to get an expensive server with ~4 TB of space (at least)
If I could host at home or have the server which I can modify on my own, that probably wouldn't be an issue since I can just get the server and place a large disk in it, then there's no cost after that
ilyt - Today at 21:36
hmm
it really is only your first answer that would work
Howl - Today at 21:38
but with hosting providers they generally end up increasing other specs apart from storage, which winds up being huge per-month costs when really it's just storage that you need
