Refactored cache system and updated dependencies (#9)
Valentin Porchet authored Jan 23, 2023
1 parent 907adb6 commit be707a0
Showing 39 changed files with 451 additions and 722 deletions.
25 changes: 10 additions & 15 deletions README.md
@@ -12,7 +12,6 @@
## 👷 W.I.P. 👷

- Various improvements on caching system
- Translations for specific heroes pages (will be available using a query parameter)
- Additional data about gamemodes and maps

@@ -37,9 +36,9 @@ If you want to use the API, and you have the possibility to host your own instance

### API Cache and Parser Cache

OverFast API introduces a very specific cache system, stored on a **Redis** server, and divided into two parts :
OverFast API includes a cache stored on a **Redis** server, divided into two parts :
* **API Cache** : a very high-level cache, linking URIs (cache keys) to raw JSON data. On the first request, if a cached value is available, the JSON data is returned as-is by the **nginx** server. The cached values are stored with an arbitrary TTL (time to live) parameter depending on the called route.
* **Parser Cache** : a specific cache for the parser system of the OverFast API. When an HTML Blizzard page is parsed, a hash of the HTML content and the parsing result (as a JSON string) are stored, in order to minimize the heavy parsing process if the page hasn't changed since the last API call. There is no TTL on this cache.
* **Parser Cache** : a specific cache for the parser system of the OverFast API. When an HTML Blizzard page is parsed, the parsing result (JSON object) is stored, in order to minimize calls to Blizzard when handling requests with filters. The value is refreshed in the background before it expires.
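
A minimal sketch of the read path, assuming the `CacheManager` methods introduced in this commit (the wiring around them is illustrative, not the actual request-handler code, and the Parser Cache key shown is a simplified assumption):

```python
import json

from overfastapi.common.cache_manager import CacheManager

cache_manager = CacheManager()


def get_cached_response(uri: str) -> dict | list | None:
    """Two-level lookup sketch: API Cache first, then Parser Cache."""
    # API Cache: URI -> raw JSON string, returned as-is on a hit
    raw = cache_manager.get_api_cache(uri)
    if raw is not None:
        return json.loads(raw)
    # Parser Cache: parsed Blizzard data, keyed by parser class name and
    # Blizzard URL (hypothetical key format shown here)
    return cache_manager.get_parser_cache(f"HeroesParser-https://overwatch.blizzard.com{uri}")
```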

Here is the list of all TTL values configured for API Cache :
* Heroes list : 1 day
@@ -49,11 +48,13 @@ Here is the list of all TTL values configured for API Cache :
* Players career : 1 hour
* Players search : 1 hour
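
For instance, storing a freshly built response with its route's TTL, reusing the `cache_manager` from the sketch above and the `update_api_cache` method from `overfastapi/common/cache_manager.py` (shown later in this commit); the payload is a placeholder:

```python
# Cache the heroes list for 1 day (86400 seconds), matching the table above.
heroes_data = [{"key": "ana", "name": "Ana"}]  # placeholder payload
cache_manager.update_api_cache("/heroes", heroes_data, expire=86400)
```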

### Background cache refresh
### Refresh-Ahead cache system

In order to reduce the number of requests to Blizzard that API users can trigger, I introduced a specific cache refresh system. The main idea is to update the API Cache in the background (server-side) when needed, just before its expiration. For example, if a user requests their player career page, the first call will be slow (2-3s in total), but all subsequent calls will be very fast, thanks to this system.
In order to reduce the number of requests to Blizzard triggered by API users, a Refresh-Ahead cache system has been implemented.

I know that this system is rudimentary, and could be memory-consuming if a lot of pages get cached. I just did it as a personal exercise, and it will have to be improved if the API user base grows faster than the memory available on my server.
When a user requests their player career page, the first call will be slow (2-3s in total), as the API is retrieving data from Blizzard. The computed data is then stored in the Parser Cache (which will be refreshed in the background), and the final data is stored in the API Cache (only created when a user makes a request).

Thanks to this system, subsequent user requests on the same career page will be very fast.
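
Condensed to its core loop, the refresh job looks like this (a sketch with error handling omitted; the full version is `overfastapi/commands/check_and_update_cache.py`, included in this commit):

```python
# Refresh Parser Cache entries shortly before they expire, so that
# user-facing requests never pay the Blizzard round-trip themselves.
for key in get_soon_expired_cache_keys():
    parser_class, kwargs = get_request_parser_class(key)
    parser = parser_class(**kwargs)
    # Re-fetch the Blizzard page and parse it; the parser refreshes
    # its own Parser Cache entry as part of this call
    parser.retrieve_and_parse_blizzard_data()
```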

## 🐍 Architecture
You can run the project in several ways, though I would advise the first one for the best user experience.
@@ -73,9 +74,6 @@ sequenceDiagram
Python->>Redis: Make Parser Cache request
alt Parser Cache is available
Redis-->>Python: Return Parser Cache
alt Parser Cache is outdated
Python->>Python: Parse HTML page
end
else
Redis-->>Python: Return no result
Python->>Python: Parse HTML page
@@ -85,7 +83,7 @@ sequenceDiagram
end
```

Using this way (via `docker-compose`), all the responses will be cached into Redis, and will be served by nginx directly on subsequent calls without requesting the Python server at all. It's the best performance compromise, as nginx excels at serving static content. A single request can lead to several Parser Cache requests, depending on the configured Blizzard pages.
Using this way (via `docker-compose`), the response will be cached into Redis, and will be served by nginx directly on subsequent calls without requesting the Python server at all. It's the best performance compromise, as nginx excels at serving static content. A single request can lead to several Parser Cache requests, depending on the configured Blizzard pages.

### Python (uvicorn) + Redis server (caching)
```mermaid
@@ -101,9 +99,6 @@ sequenceDiagram
Python->>Redis: Make Parser Cache request
alt Parser Cache is available
Redis-->>Python: Return Parser Cache
alt Parser Cache is outdated
Python->>Python: Parse HTML page
end
else
Redis-->>Python: Return no result
Python->>Python: Parse HTML page
@@ -112,7 +107,7 @@ sequenceDiagram
end
```

Using this way (running the processes manually), all the responses will be cached into Redis, and the cache will be checked by the Python server (the `USE_API_CACHE_IN_APP` setting in `config.py` must be set to `True`). It's an acceptable compromise.
Using this way (running the processes manually), the response will be cached into Redis, and the cache will be checked by the Python server (the `USE_API_CACHE_IN_APP` setting in `config.py` must be set to `True`). It's an acceptable compromise.
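
A minimal sketch of the relevant setting (only the variable named above comes from the project; the rest of `config.py` is omitted):

```python
# overfastapi/config.py (excerpt)
USE_API_CACHE_IN_APP = True  # make the Python app check the API Cache itself
```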

### Python (uvicorn) only
```mermaid
@@ -121,7 +116,7 @@ sequenceDiagram
User->>Python: Make an API request
Python-->>User: Return API data after parsing
```
Using this way (only using the image built with the `Dockerfile` alone), there will be no cache at all, and every call will trigger requests to Blizzard pages. I advise against using it, except possibly for debugging.
Using this way (only using the image built with the `Dockerfile` alone), there will be no cache at all, and every call will trigger requests to Blizzard pages. I advise against using it, except for debugging.

## 💽 Installation

139 changes: 62 additions & 77 deletions overfastapi/commands/check_and_update_cache.py
@@ -1,101 +1,86 @@
"""Command used in order to check and update Redis API Cache depending on
the expired cache refresh limit configuration. It can be run in the background.
"""
import re
from fastapi import HTTPException

from overfastapi.common.cache_manager import CacheManager
from overfastapi.common.enums import HeroKey
from overfastapi.common.exceptions import ParserBlizzardError, ParserParsingError
from overfastapi.common.helpers import overfast_internal_error
from overfastapi.common.logging import logger
from overfastapi.handlers.get_hero_request_handler import GetHeroRequestHandler
from overfastapi.handlers.get_player_career_request_handler import (
GetPlayerCareerRequestHandler,
)
from overfastapi.handlers.get_player_stats_summary_request_handler import (
GetPlayerStatsSummaryRequestHandler,
)
from overfastapi.handlers.list_gamemodes_request_handler import (
ListGamemodesRequestHandler,
)
from overfastapi.handlers.list_heroes_request_handler import ListHeroesRequestHandler
from overfastapi.handlers.list_roles_request_handler import ListRolesRequestHandler

# Mapping of cache_key prefixes to the associated
# request handler used for cache refresh
PREFIXES_HANDLERS_MAPPING = {
"/heroes": ListHeroesRequestHandler,
"/roles": ListRolesRequestHandler,
**{f"/heroes/{hero_key}": GetHeroRequestHandler for hero_key in HeroKey},
"/gamemodes": ListGamemodesRequestHandler,
"/players": GetPlayerCareerRequestHandler,
"/players_stats": GetPlayerStatsSummaryRequestHandler,
from overfastapi.config import BLIZZARD_HOST, PARSER_CACHE_KEY_PREFIX
from overfastapi.parsers.gamemodes_parser import GamemodesParser
from overfastapi.parsers.hero_parser import HeroParser
from overfastapi.parsers.heroes_parser import HeroesParser
from overfastapi.parsers.player_parser import PlayerParser
from overfastapi.parsers.player_stats_summary_parser import PlayerStatsSummaryParser
from overfastapi.parsers.roles_parser import RolesParser

# Mapping of parser class names to linked classes
PARSER_CLASSES_MAPPING = {
"GamemodesParser": GamemodesParser,
"HeroParser": HeroParser,
"HeroesParser": HeroesParser,
"PlayerParser": PlayerParser,
"PlayerStatsSummaryParser": PlayerStatsSummaryParser,
"RolesParser": RolesParser,
}

# Regular expressions for keys we don't want to refresh the cache explicitly
# from here (either will be done in another process or not at all because not
# relevant)
EXCEPTION_KEYS_REGEX = {
r"^\/players\/[^\/]+\/(summary|stats)$", # players summary or stats
r"^\/players$", # players search
}
# Generic cache manager used in the process
cache_manager = CacheManager()


def get_soon_expired_cache_keys() -> set[str]:
"""Get a set of URIs for values in API Cache which will expire soon
without taking subroutes and query parameters"""
cache_manager = CacheManager()

expiring_keys = set()
for key in cache_manager.get_soon_expired_api_cache_keys():
# api-cache:/heroes?role=damage => /heroes?role=damage => /heroes
cache_key = key.split(":")[1].split("?")[0]
# Avoid keys we don't want to refresh from here
if any(
re.match(exception_key, cache_key) for exception_key in EXCEPTION_KEYS_REGEX
):
continue
# Add the key to the set
expiring_keys.add(cache_key)
return expiring_keys
"""Get a set of URIs for values in Parser Cache which are obsolete
or will need to be updated.
"""
return set(cache_manager.get_soon_expired_parser_cache_keys())


def get_request_handler_class_and_kwargs(cache_key: str) -> tuple[type, dict]:
"""Get the request handler class and cache kwargs (to give to the
update_all_api_cache() method) associated with a given cache key
"""
cache_request_handler_class = None
def get_request_parser_class(cache_key: str) -> tuple[type, dict]:
"""Get the request parser class and cache kwargs to use for instanciation"""
cache_kwargs = {}

uri = cache_key.split("/")
if cache_key.startswith("/players"):
# Specific case for stats summary
specific_cache_key = (
"/players_stats" if cache_key.endswith("/stats/summary") else "/players"
)
cache_request_handler_class = PREFIXES_HANDLERS_MAPPING[specific_cache_key]
# /players/Player-1234 => ["", "players", "Player-1234"]
cache_kwargs = {"player_id": uri[2]}
elif cache_key.startswith("/heroes") and len(uri) > 2:
cache_request_handler_class = PREFIXES_HANDLERS_MAPPING[cache_key]
cache_kwargs = {"hero_key": uri[2]}
else:
cache_request_handler_class = PREFIXES_HANDLERS_MAPPING[cache_key]
specific_cache_key = cache_key.removeprefix(f"{PARSER_CACHE_KEY_PREFIX}:")
parser_class_name = specific_cache_key.split("-")[0]
uri = specific_cache_key.removeprefix(f"{parser_class_name}-{BLIZZARD_HOST}").split(
"/"
)
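    # Hypothetical key, for illustration (actual prefix and host come from config.py):
    #   "parser-cache:PlayerParser-https://overwatch.blizzard.com/en-us/career/Player-1234"
    #   => parser_class_name = "PlayerParser"
    #   => uri = ["", "en-us", "career", "Player-1234"], so uri[3] is the player ID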
cache_parser_class = PARSER_CLASSES_MAPPING[parser_class_name]

if parser_class_name in ["PlayerParser", "PlayerStatsSummaryParser"]:
cache_kwargs = {"player_id": uri[3]}
elif parser_class_name == "HeroParser":
cache_kwargs = {"hero_key": uri[3]}

return cache_request_handler_class, cache_kwargs
return cache_parser_class, cache_kwargs


def main():
"""Main method of the script"""
logger.info(
"Starting Redis cache update...\n"
"Retrieving cache keys which will expire soon..."
)
soon_expired_cache_keys = get_soon_expired_cache_keys()
logger.info("Done ! Retrieved keys : {}", len(soon_expired_cache_keys))
logger.info("Starting Redis cache update...")

for key in soon_expired_cache_keys:
logger.info("Updating all cache for {} key...", key)
request_handler_class, kwargs = get_request_handler_class_and_kwargs(key)
request_handler_class().update_all_api_cache(parsers=[], **kwargs)
keys_to_update = get_soon_expired_cache_keys()
logger.info("Done ! Retrieved keys : {}", len(keys_to_update))

for key in keys_to_update:
logger.info("Updating data for {} key...", key)
parser_class, kwargs = get_request_parser_class(key)

parser = parser_class(**kwargs)

try:
parser.retrieve_and_parse_blizzard_data()
except ParserBlizzardError as error:
logger.error(
"Failed to instanciate Parser when refreshing : {}",
error.message,
)
continue
except ParserParsingError as error:
overfast_internal_error(parser.blizzard_url, error)
continue
except HTTPException:
continue

logger.info("Redis cache update finished !")

9 changes: 5 additions & 4 deletions overfastapi/commands/check_new_hero.py
@@ -13,12 +13,13 @@

def get_distant_hero_keys() -> set[str]:
"""Get a set of Overwatch hero keys from the Blizzard heroes page"""
heroes_parser = HeroesParser()

try:
heroes_parser = HeroesParser()
except HTTPException:
raise SystemExit
heroes_parser.retrieve_and_parse_blizzard_data()
except HTTPException as error:
raise SystemExit from error

heroes_parser.parse()
return {hero["key"] for hero in heroes_parser.data}


48 changes: 28 additions & 20 deletions overfastapi/common/cache_manager.py
@@ -24,6 +24,8 @@
=> {"hash": "12345abcdef", "data": "[{...}]"}
"""

import json
import zlib
from typing import Callable, Iterator

import redis
@@ -85,47 +87,53 @@ def get_api_cache(self, cache_key: str) -> str | None:
return self.redis_server.get(f"{API_CACHE_KEY_PREFIX}:{cache_key}")

@redis_connection_handler
def get_parser_cache(self, cache_key: str) -> str | None:
def get_parser_cache(self, cache_key: str) -> dict | None:
"""Get the Parser Cache value associated with a given cache key"""
return self.redis_server.hgetall(f"{PARSER_CACHE_KEY_PREFIX}:{cache_key}")

def get_unchanged_parser_cache(
self, cache_key: str, parser_hash: str
) -> str | None:
"""Get the Parser Cache HTML data if the cached hash matches the given
parser hash (it means the data has not changed since the last parsing)
"""
parser_cache = self.get_parser_cache(cache_key)
parser_cache = self.redis_server.get(f"{PARSER_CACHE_KEY_PREFIX}:{cache_key}")
return (
parser_cache[b"data"].decode("utf-8")
if parser_cache and parser_cache[b"hash"].decode("utf-8") == parser_hash
json.loads(zlib.decompress(parser_cache).decode("utf-8"))
if parser_cache
else None
)

@redis_connection_handler
def update_api_cache(self, cache_key: str, value: str, expire: int) -> None:
def update_api_cache(self, cache_key: str, value: dict | list, expire: int) -> None:
"""Update or set an API Cache value with an expiration value (in seconds)"""
self.redis_server.set(f"{API_CACHE_KEY_PREFIX}:{cache_key}", value, ex=expire)

# Serialize the value into a compact JSON string
str_value = json.dumps(value, separators=(",", ":"))

# Store it in API Cache
self.redis_server.set(
f"{API_CACHE_KEY_PREFIX}:{cache_key}", str_value, ex=expire
)

@redis_connection_handler
def update_parser_cache(self, cache_key: str, value: dict) -> None:
def update_parser_cache(self, cache_key: str, value: dict, expire: int) -> None:
"""Update or set a Parser Cache value with an expire value"""
self.redis_server.hset(f"{PARSER_CACHE_KEY_PREFIX}:{cache_key}", mapping=value)
compressed_value = zlib.compress(
json.dumps(value, separators=(",", ":")).encode("utf-8")
)
self.redis_server.set(
f"{PARSER_CACHE_KEY_PREFIX}:{cache_key}", value=compressed_value, ex=expire
)

def get_soon_expired_api_cache_keys(self) -> Iterator[str]:
"""Get a set of cache keys for values in API Cache which will expire soon"""
def get_soon_expired_parser_cache_keys(self) -> Iterator[str]:
"""Get a set of cache keys for values in Parser Cache which will expire soon"""
if not self.is_redis_server_up:
yield from ()
return

try:
api_cache_keys = self.redis_server.keys(pattern=f"{API_CACHE_KEY_PREFIX}:*")
parser_cache_keys = self.redis_server.keys(
pattern=f"{PARSER_CACHE_KEY_PREFIX}:*"
)
except redis.exceptions.RedisError as err:
logger.warning("Redis server error : {}", str(err))
yield from ()
return

for key in api_cache_keys:
for key in parser_cache_keys:
# Get key TTL in redis
try:
key_ttl = self.redis_server.ttl(key)
4 changes: 2 additions & 2 deletions overfastapi/common/exceptions.py
@@ -11,13 +11,13 @@ def __str__(self):
return self.message


class ParserInitError(OverfastError):
class ParserBlizzardError(OverfastError):
"""Exception raised when there was an error in a Parser class
initialization, usually when the data is not available
"""

status_code = status.HTTP_500_INTERNAL_SERVER_ERROR
message = "Parser Init Error"
message = "Parser Blizzard Error"

def __init__(self, status_code: int, message: str):
super().__init__()
8 changes: 4 additions & 4 deletions overfastapi/common/helpers.py
@@ -48,11 +48,11 @@ def overfast_request(url: str) -> requests.Response:
}
try:
return requests.get(url, headers=headers, timeout=10)
except requests.exceptions.Timeout:
except requests.exceptions.Timeout as error:
raise blizzard_response_error(
status_code=0,
error="Blizzard took more than 10 seconds to respond, resulting in a timeout",
)
) from error


def overfast_internal_error(url: str, error: Exception) -> HTTPException:
@@ -117,14 +117,14 @@ def send_discord_webhook_message(message: str) -> requests.Response | None:
def read_html_file(filepath: str) -> str:
"""Helper method for retrieving fixture HTML file data"""
with open(
f"{TEST_FIXTURES_ROOT_PATH}/html/{filepath}", "r", encoding="utf-8"
f"{TEST_FIXTURES_ROOT_PATH}/html/{filepath}", encoding="utf-8"
) as html_file:
return html_file.read()


def read_json_file(filepath: str) -> dict | list:
"""Helper method for retrieving fixture JSON file data"""
with open(
f"{TEST_FIXTURES_ROOT_PATH}/json/{filepath}", "r", encoding="utf-8"
f"{TEST_FIXTURES_ROOT_PATH}/json/{filepath}", encoding="utf-8"
) as json_file:
return json.load(json_file)