diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 45fd2dd..cfacf09 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -11,4 +11,3 @@ jobs: uses: lycheeverse/lychee-action@v2.4.1 with: fail: true - diff --git a/lychee.toml b/lychee.toml new file mode 100644 index 0000000..a0f3da9 --- /dev/null +++ b/lychee.toml @@ -0,0 +1 @@ +exclude_path = ["meeting-notes"] \ No newline at end of file diff --git a/meeting-notes/2022-10-06.md b/meeting-notes/2022-10-06.md new file mode 100644 index 0000000..5e74c54 --- /dev/null +++ b/meeting-notes/2022-10-06.md @@ -0,0 +1,41 @@ +OpenPodcastSync API meeting notes +=== + +### Participating projects + +podfriend, gpodder 4 Nextcloud, antennapod, kasts, funkwhale + + +### What are the problems we are trying to solve? + +Try to get the big picture around the various issues. + +Subscriptions. + +Problems identified with gpodder: +- Multi-device support is confusing to users. Gpodder stores each device as an entity and allows you to link two devices to sync them. Users find this confusing and don't understand why content isn't synced properly across non-linked devices. + - This is only implemented for subscriptions, not for episodes. This inconsistency is confusing for users. +- The database often overflows due to a large dataset being stored. All actions are stored and never cleaned up, and each episode action can only be stored once. E.g.: + - If you listen to an episode once and then listen again, an action such as "new" is only sent once. + - The exact same play position cannot be stored twice. +- Duplicate episodes/subscriptions are an issue. They use the media URL as an identifier for an episode, but if the file changes due to reupload or something else this creates a brand new entry. Syncing these changes is difficult. +- User documentation is lacking. 
e.g.: + - If podcast creators change GUID and URL for an episode, there isn't an agreed-upon behavior for the API or for clients consuming the episodes. + - If an action is stored locally, and a conflicting action is received from the server at a later stage, what happens on sync? Can take inspiration from ListenBrainz scrobbles. +- Subscription lists can duplicate due to URLs not being updated reliably. +- There is no agreed-upon way to handle updating URLs, and this is mostly being handled by clients +- We need to be able to synchronize a queue of episodes in the correct order between devices +- We need to handle multiple queues, and have graceful handling for syncing with clients/servers that cannot handle multiple queues + +People would expect to find all their data, queues and progress to be synced across all their apps, using a single online identity. +How to handle the case when a server shuts down? Would we need some export/import features? Like an extended OPML? Or can we rely on clients as 'intermediaries' (sync data, log out from server, log in to other server)? +Switching from mobile (home/commute) to web/desktop app (at work) is a common use case amongst us. + +What would be our Minimum Viable Product? + +Next steps? +- split the list into component problems +- asynchronous discussions +- organize meetings when needed on specific matters + +###### tags: `project-management` `meeting-notes` `OpenPodcastAPI` \ No newline at end of file diff --git a/meeting-notes/2023-03-14.md b/meeting-notes/2023-03-14.md new file mode 100644 index 0000000..c67dacd --- /dev/null +++ b/meeting-notes/2023-03-14.md @@ -0,0 +1,35 @@ +Meeting 2023-03-14 +=== + +Participants: +* Sporiff +* keunes +* gcrkrause + +# Who has authority over the GUID + +* In the first place the RSS feed +* If that's not available the server might *optionally* ask podcastindex.org +* The client may send a `guid` in the `POST` request **only** if it is obtained from the RSS feed. 
The server accepts sent `guid` information as authoritative +* The client already has the GUID from the feed +* The server (project) may decide to be as slim as possible, to the extent that it doesn't do any RSS fetching +* The server MUST return a `guid` immediately. This can either be the `guid` sent by the client **or** a generated `guid` if nothing is sent. An asynchronous task CAN fetch the RSS feed to check for a `guid` if one was generated, store an updated `guid` and put an 'updated since' flag to tell clients on next connect to update this data. + * In case a user subscribes to the same podcast though with different feed URLs while there is no `guid` that connects the two, or if a server is unresponsive and this causes issues, it is accepted that this can lead to duplicate subscriptions. + +# Deletion process + +* The `DELETE` verb should actually remove data as a cascade + * The server should keep a record **only** of the GUID and mark it as deleted + * The API should return a `410 GONE` status for any deleted entries +* The `PATCH` unsubscribe request marks all entries as **unsubscribed** + * The server should not remove any data associated with **unsubscribed** subscriptions unless they are deleted + +# Tasks until next time + +- [ ] Update specs @Ciaran +- [ ] [Setup Hosted OpenAPI specs](https://github.com/OpenPodcastAPI/api-specs/issues/13) @Georg +- [ ] Setup Sphinx @Ciaran +- [ ] Reference Implementation @Georg +- [ ] Check that Ciarán isn't speaking nonsense in client behavior spec @keunes + +###### tags: `project-management` `meeting-notes` `OpenPodcastAPI` diff --git a/meeting-notes/2023-04-11.md b/meeting-notes/2023-04-11.md new file mode 100644 index 0000000..b688de1 --- /dev/null +++ b/meeting-notes/2023-04-11.md @@ -0,0 +1,35 @@ +Open Podcast API 11/04/2023 +=== + +present: Ciarán (FW), Jonathan (GfN), Keunes (AP) and Frederik ([MusicPod](https://github.com/ubuntu-flutter-community/musicpod)) + +Ciarán to update: + +* Fetch logic: + * All 
timestamp fields must be checked against the `since` parameter in the call (`subscription_changed`, `guid_changed`) +* Deletion logic: + * `is_deleted` boolean field should be replaced with a timestamp field that is included in fetch calls to inform clients of deletions + * A deleted subscription should be reinstated by a client adding a new subscription with the same GUID. The `subscription_changed` and `guid_changed` fields should reflect the date that the subscription is reinstated. The `deleted` timestamp field should be NULLed + * On receipt of a deleted subscription, the client should present the user with the option to **remove** their local data or **send** their local data to the server to reinstate the subscription details + +Keunes to add a project goal/description to the [Index page](https://github.com/OpenPodcastAPI/api-specs/blob/main/docs/index.md) directly in the PR (use [MyST formatting](https://myst-parser.readthedocs.io/en/latest/)). + +We'll call the specs 'pre-release' or 'ALPHA' until we have implemented all specs that we deem 'required' for all servers. Ciarán will add a banner at the top of the pages to warn readers of this. + +JonOfUs to add a GitHub Actions workflow for PRs to create and publish a preview of them (template [here](https://github.com/OpenPodcastAPI/api-specs/issues/28)) + +Once the above changes are reflected, we should merge the subscriptions endpoint spec to have something on the site. + +We can use some Creative Commons license for this specification (tbd). Reference implementations can pick their own license (gPodder for Nextcloud & Funkwhale will have AGPL). + +Ciarán will be in a podcast early May; it would be good to have the Subscriptions endpoint merged by then. 
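A rough sketch of how a server could implement the fetch/deletion logic above (illustrative only; the record shape and field names are assumptions, not part of the spec):

```python
from typing import Optional

# hypothetical in-memory subscription records; timestamps are epoch seconds,
# `deleted` is a timestamp instead of an is_deleted boolean (per the notes)
subscriptions = [
    {"guid": "a", "subscription_changed": 100, "guid_changed": 90, "deleted": None},
    {"guid": "b", "subscription_changed": 50, "guid_changed": 50, "deleted": 120},
]

def fetch_subscriptions(since: Optional[int] = None) -> list:
    """Return every subscription whose timestamp fields changed after `since`.

    Deletions are included as records with a non-null `deleted` timestamp,
    so clients learn about removals from the same fetch call.
    """
    if since is None:
        return subscriptions
    return [
        s for s in subscriptions
        if any(
            ts is not None and ts > since
            for ts in (s["subscription_changed"], s["guid_changed"], s["deleted"])
        )
    ]
```

A client passing `since` then only receives subscriptions (including tombstoned ones) that changed after its last sync.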
+ +## Future discussion + +* Ensure that user data is separated by user ID +* Outline what data can be shared and what is per-user data +* Reflect these rules in the spec for multi-tenant and single-tenant servers +* What calls are core/required; which ones are 'feature' ([GH discussion](https://github.com/orgs/OpenPodcastAPI/discussions/16)) +* Declaring versions & supported endpoints (well-known/other way; [Matrix](https://spec.matrix.org/v1.6/client-server-api/#capabilities-negotiation) e.g. does this at `$prefix/v1/capabilities`) + +###### tags: `meeting` `project-management` `OpenPodcastAPI` \ No newline at end of file diff --git a/meeting-notes/2023-05-30.md b/meeting-notes/2023-05-30.md new file mode 100644 index 0000000..8c2d470 --- /dev/null +++ b/meeting-notes/2023-05-30.md @@ -0,0 +1,153 @@ +2023-05-30 9pm in the middle of the night +=== + +## Endpoints +* `GET/PUT /episodes` + * returns only episodes changed + * parameter `since` +* ~~`GET/PUT /episodes/{guid-hash}`~~ + * Don't allow this endpoint, to prevent problems with duplicate GUIDs +* `GET /subscriptions/{guid}/episodes` + * parameter `since` + * parameter `guid`? +* `GET/PUT /subscriptions/{guid}/episodes/{fetch-hash}` (hash: SHA1?) + * if fetch-hash clash, server expected to return BAD REQUEST + * Hash here, because GUIDs can be any String + + +We want to explain in the specs why we have endpoints 'under' subscriptions, and why we might refuse updates. (i.e. how this will help avoid gPodder API pitfalls.) + +## Episode endpoint + +The episode endpoint is required to synchronize playback positions and played status for specific episodes. At a minimum, the endpoint should accept and return the following: + +1. The episode's **Podcast GUID** (most recent) +2. The episode's **GUID** (sent by the client if found in the RSS feed, or generated by the server if not): String (not necessarily GUID/URL formatted). +4. A **Status** field containing lifecycle statuses. 
E.g.: + * `New` + * `Played` + * `Ignored` + * `Queued` +6. A **Playback position** marker, updated by a PUT request +7. A **timestamp** of the last time the episode was played/paused (used for conflict resolution on the playback position) +8. A **Favorite** field to mark episodes +9. A **timestamp** for the last time some metadata (except playback position) was updated + +We discussed if it makes sense to use episode numbers, but it's not part of the feed anyways so we don't have this information and don't need it anyways + +https://www.rssboard.org/rss-specification#ltguidgtSubelementOfLtitemgt + + +### Episode identification +#### Fetch-hash vs GUID +Discussion whether to generate a new (static?) identifier per episode and use that for synchronisation (clients would have to store it additionally per episode?) or to use existing GUIDs as sync identifier and generate them if none is present (one endpoint needs the GUIDs to be passed by their hash/base64 then for REST-compliancy) + +#### Fetch-hash +Fetch-hash creation: SHA1/MD5 hash of +1. `` https://www.rssboard.org/rss-specification#ltguidgtSubelementOfLtitemgt + +x. `` https://www.rssboard.org/rss-specification#hrelementsOfLtitemgt +x. `` (aka media file URL) https://www.rssboard.org/rss-specification#ltenclosuregtSubelementOfLtitemgt + +Priority of latter 2 tbd: `` might be less likely to be unique, while `` might be less stable (more likely to change). + +Consideration: why not BASE64? (REST-compliant, can be "unhashed", so hash wouldn't have to be stored on the server) + +Good practice/required: store all 3 (GUID, link, media file URL). This will allow for later matching of episodes if one or two of these are missing. For example, if a totally new client is connecting to a server, and an episode doesn't have a GUID and the `` has changed, matching would still be possible based on media file URL. 
(If we don't do this, finding the right episode locally might be hard when receiving a fetch-hash that's not unique, or a GUID that's missing. We know the podcast, and within each podcast there'll be only a limited set of 'wrong' episodes, so a client would only have to create hashes for a few episodes in order to find a match. But still, not very economical.) 
<details>
<summary>Matching proposal in pseudo-code (click to expand)</summary>

```pseudo-code
are_episodes_equal(client-episode c, server-episode s):
    // this filters out any potential GUID duplicates
    if c.podcast_guid != s.podcast_guid then
        return False

    // if GUID is present, decide exclusively according to it
    if c.guid not empty then
        return c.guid == s.guid

    // if enclosure matches, probably the same (since they share the media file)
    if c.enclosure not empty && c.enclosure == s.enclosure then
        return True

    // case: no media file
    if c.enclosure empty then
        // no guid, enclosure or link -> not matchable
        if c.link empty then
            return False

        // no media file, but episode URL matches - very probably the same
        // (how large is the error here?)
        if c.link == s.link then
            return True

    // All other cases: not matching
    return False
```
</details>
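For reference, the same matching logic as a runnable Python sketch (the `Episode` record and its field names are hypothetical, not part of the spec; note the comparison on the last branch is against the *server* episode's link):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Episode:
    # hypothetical minimal episode record; field names are illustrative
    podcast_guid: str
    guid: Optional[str] = None       # <guid> from the feed, if any
    enclosure: Optional[str] = None  # media file URL
    link: Optional[str] = None       # episode page URL

def are_episodes_equal(c: Episode, s: Episode) -> bool:
    # filter out any potential GUID duplicates across podcasts
    if c.podcast_guid != s.podcast_guid:
        return False
    # if a GUID is present, decide exclusively according to it
    if c.guid:
        return c.guid == s.guid
    # shared media file -> probably the same episode
    if c.enclosure and c.enclosure == s.enclosure:
        return True
    # no media file: fall back to the episode page URL
    if not c.enclosure:
        if not c.link:
            return False  # no guid, enclosure or link -> not matchable
        if c.link == s.link:
            return True
    return False
```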

+ +?? Each field that is empty/not present in the RSS is stored & sent empty. ~~The fetch-hash is only used when sending a request about a specific episode.~~ (that wouldn't work well in case of batch updates - see below) Payloads don't contain fetch-hashes, only the three separate fields. + +Two options for identifying episodes in communication: +[I don't think these are the only options, see [here](#Fetch-hash-vs-GUID)] +* For each episode (e.g. in queue; batch update), all three fields/tags are included. Lot of (unnecessary) data exchange. +* Each episode gets a calculated fetch-hash, which is used for communication. Clients can decide to store or generate on the fly. (Generating on-the-fly is dangerous, episode identifier should be static even if episode changes) + +Server creates fetch-hash, similar to creation of Podcast GUID, based on the logic described above. + +Why do we trust the server to create the hash, more than the client? Because for each person, there's probably just 1 server in the game, more likely multiple clients. So if the server messes it up, there's still a single outcome for each user. + +#### GUID +Why shouldn't the server just create a GUID (seed: available payloads or whole episode, can also be just random) and send this back to the client? (the client would map using `` and `` and then store this GUID) +[Advantage: less payload fields, only ``, `` and `` and after first sync only `` (`guid-hash` only for `PUT /subs../{guid}/epis../{guid-hash}`)] +[Further advantage: easier to implement for clients, they probably already have an `episode_guid` field in their DB] + +Only create GUID if none is present, otherwise use existing one. +Identify episode always by `podcast_guid`+`episode_guid` (e.g. when referencing queue items, settings, ...) +[PodcastIndex seems to handle this [the same way](https://podcastindex-org.github.io/docs-api/#get-/episodes/byguid)] + +The workflow if a new client connects could then be: +1. 
Get subscriptions & fetch feeds +2. Get episodes +3. Feed with GUIDs: map by GUID +4. Feed without GUIDs: map by matching algorithm [[above](#Matching-proposal-in-pseudo-code)], then store GUID from sync server + +#### Deduplication + +Two options: +a. agree on a deduplication logic as part of the spec which is to be executed at server level (hard to 'enforce') +b. let clients figure out deduplication, and spec the calls that will allow clients to merge episodes. + +To be discussed further. Latter is easier for us :-) +Latter should be in the spec in either case, so that we don't have to change the whole spec if some podcast feeds mess up in a way we never anticipated. Clients can adapt a lot faster. + +#### New GUID/Fetch-hash logic +Necessary for changing GUIDs, can also be used for deduplication? + +Options: +1. `PUT /episodes` with additional field `old_fetch-hash` (or `old_guid`) +2. `PUT /subscriptions/{guid}/episodes/{guid-/fetch-hash}` with additional field `new_fetch-hash` (or `new_guid`) + +Case where both episodes are contained in the feed (episode didn't change, but podcasters published twice): To mark duplicate, additional boolean `is_duplicate` so that the server handles `fetch-hash`/`guid` of both as aliases (tombstoning one, if one of them is requested, return aliases in field/array `aliases`/`duplicate_fetch-hashes/guids`) + +In both cases, server changes fetch-hash/GUID of episode entry, sets `fetch-hash/GUID_changed` timestamp and creates tombstone for old value +[On `GET /episodes`, old value is in `fetch-hash`/`guid` and new value in `new_fetch-hash/new_guid`, same behaviour as in Subscriptions] + +Case to handle: +1. Client 1 marks {`fetch-hash2`/`guid2`} as new guid of {`fetch-hash1`/`guid1`} +2. Client 2 receives & stores this +3. Client 2 marks {`fetch-hash1`/`guid1`} as new guid of {`fetch-hash2`/`guid2`} + +(could happen through e.g. slightly different podcast feed, e.g. 
one feed contains MP3s, the other AACs, but the podcast GUID is the same) + + +## Excursus: Database Schema in the specs + +* We should focus on the format of the communications, not how the database is stored +* We have all field data types specified anyways in the API endpoint specification +* We can leave the proposed database schema as an example + + +###### tags: `project-management` `meeting-notes` `OpenPodcastAPI` \ No newline at end of file diff --git a/meeting-notes/2023-07-11.md b/meeting-notes/2023-07-11.md new file mode 100644 index 0000000..ff32436 --- /dev/null +++ b/meeting-notes/2023-07-11.md @@ -0,0 +1,37 @@ +2023-07-11 20:00 +=== + +## Episode identification + +Possible way forward for selecting the ideal ['identification' (ID) for episodes](https://pad.funkwhale.audio/oCfs5kJ6QTu02d_oVHW7DA): write up test cases (examples of data gaps) > what satisfies all our test cases? + +* rss feed without episode guids +* rss feed with 2 duplicate guids +* guid changes for a given episode in the rss feed +* ... + +Then make a table. + +We should probably add a warning, reminding that these cannot be used as the only indices in a database in a multi-user environment (users have different playback positions). + +## Data + +1. The episode's **Podcast GUID** (most recent) +2. The episode's **GUID** (sent by the client if found in the RSS feed, or generated by the server if not): String (not necessarily GUID/URL formatted). +4. A boolean **played** field / or a field (e.g. nested JSON) **state** containing information about the state this episode is currently in (like played, in_queue, ignored, ...) + a. What is 'played' differs between clients (e.g. in AntennaPod you can set as played even if 20 seconds at the end is skipped) + b. Interaction with other potential states? (e.g. 'ignored') E.g. 'notified' (to avoid getting notifications on multiple devices). 
Need a list of statuses (& combinations) to keep track of, and then see which options (boolean, integer, nested booleans, etc.) are best. + c. Solution: define a set of states and explain those well +6. Liked/Favourited +7. A **Playback position** marker, updated by a PUT request +8. A **time_played** counter, containing the total amount of seconds this episode was played +9. A **timestamp** of the last time the episode was played/paused +10. To resolve sync conflicts: dedicated timestamp for each of the fields? Or a single timestamp for the whole episode. + a. Two timestamps: **last_played** (for conflict resolution on the playback position) and **metadata_changed** (for conflict resolution on all other episode information) + ~~b. One timestamp for everything~~ + ~~c. Separate timestamps for each field~~ [too complicated] +11. Episode length? (gpodder.net had this) TBD (cases with media files shorter, like 30 sec, when abroad, or when media files have ads removed after x-thousand downloads because the podcaster gets paid only for the first 10k) +12. Any other markers (e.g. bookmarked playback positions; timed annotations) +13. Ratings/Reviews (probably better as a separate endpoint, referencing the episode) + +###### tags: `project-management` `meeting-notes` `OpenPodcastAPI` \ No newline at end of file diff --git a/meeting-notes/2023-08-01.md b/meeting-notes/2023-08-01.md new file mode 100644 index 0000000..da933bf --- /dev/null +++ b/meeting-notes/2023-08-01.md @@ -0,0 +1,208 @@ +2023-08-01 20:00 +=== +:::info +***Next meeting: 2023-09-05 20:00 CEST (8pm)*** +::: + +Present: + +## API Versioning +https://github.com/orgs/OpenPodcastAPI/discussions/11 + +Options: +* Nodeinfo (created by Pleroma): https://docs-develop.pleroma.social/backend/development/API/nodeinfo/ +* Specs versioning (to maintain backwards compatibility) + +Probably both needed at the same time; even if you're on the same API version, your server might not support the same feature set as the client. 
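A sketch of how a client might negotiate the major version under this scheme (function and data shapes are illustrative assumptions, not spec):

```python
def pick_major_version(server_versions, client_supported):
    """Pick the highest API major version both sides support.

    `server_versions` as a hypothetical /versions endpoint might return it,
    e.g. ["v1.1", "v2.0"]; `client_supported` e.g. {"v1", "v2"}.
    """
    majors = [v.split(".")[0] for v in server_versions]  # "v1.1" -> "v1"
    common = [m for m in majors if m in client_supported]
    if not common:
        raise RuntimeError("no mutually supported API major version")
    return max(common, key=lambda m: int(m[1:]))  # highest major wins
```

The client would then still consult the capabilities endpoint, since two servers on the same major version can expose different feature sets.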
+ +--> How do we version our specs? At level of whole specs, or at level of individual calls/endpoints? +--> Spec version increases only for core endpoints, or optional ones as well? + +Spec version increment with any change (e.g. point-increase in case of changes to/added optional end-points). But the versions in the calls only have the major versions. + +* Minor change: only add or deprecate (you should use this version instead, but here is your requested info) +* Major change: drop endpoints/calls, or change the way info is returned. Addition of new core endpoints should also be considered a major change + +**Note to the future**: if a server supports two different major versions, the data coming in through both endpoints should be compatible on database-level. We will have to discuss how this may affect what we are allowed to change in major-version-changes. + +## Endpoints + +- `${base_address}/versions`: contains only the version number(s) (x.x.x) +- `${base_address}/v1/capabilities`: contains a list of features and their status/related settings + +(So on base address, not via .well-known, because the latter is not necessarily present in all servers.) 
+
+### Versions endpoint
+```json
+{
+  "versions": [
+    "v1.1", // translates to /v1/
+    "v2.0"  // translates to /v2/
+  ]
+}
+```
+
+### Capabilities endpoint
+
+Array version:
+```json
+{
+  "capabilities": [
+    "queueSync",
+    "storeEpisodes"
+  ],
+  "settings": [
+    {
+      "name": "queueSync",
+      "metadata": {
+        "maxQueues": 5
+      }
+    },
+    {
+      "name": "subscription_settings_sync"
+    }
+  ]
+}
+```
+
+Array of objects version:
+```json
+{
+  "capabilities": [
+    {
+      "name": "queue_sync",
+      "maxQueues": 5
+    },
+    {
+      "name": "storeepisodes",
+      "uploadQuota": 500000000
+    },
+    {
+      "name": "subscription_settings_sync"
+    }
+  ]
+}
+```
+
+Nested objects version:
+```json
+{
+  "capabilities": {
+    "queue_sync": {
+      "maxQueues": 5
+    },
+    "storeShows": {},
+    "storeEpisodes": {
+      "uploadQuota": 500
+    }
+  }
+}
+```
+
+For magical-spell-arguments, **nested objects** were deemed better.
+
+--> If a server has disabled a given feature (e.g. I have my instance of PodWire which in principle supports multiple queues but I have disabled this), it can simply not be returned in the capabilities end-point (no need to know why it's not available). 
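With the nested-objects shape, a client-side capability check could look like this (a sketch; `capabilities` mirrors the example above, helper names are made up):

```python
# Hypothetical /v1/capabilities response in the 'nested objects' shape
capabilities = {
    "queue_sync": {"maxQueues": 5},
    "storeShows": {},
    "storeEpisodes": {"uploadQuota": 500},
}

def supports(caps: dict, name: str) -> bool:
    # a disabled or unsupported feature is simply absent from the response
    return name in caps

def setting(caps: dict, name: str, key: str, default=None):
    # read a feature-specific setting, e.g. the maximum number of queues
    return caps.get(name, {}).get(key, default)
```

This is why the nested shape is convenient: presence of the key answers "is it supported?", and the nested object carries the related settings.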
+
+Inspired by the example Nodeinfo response:
+
+```json
+{
+  "version": "string",
+  "software": {
+    "name": "string",
+    "version": "string"
+  },
+  "protocols": [
+    "string"
+  ],
+  "services": {
+    "inbound": [],
+    "outbound": []
+  },
+  "openRegistrations": true,
+  "usage": {
+    "users": {
+      "total": 0,
+      "activeHalfyear": 0,
+      "activeMonth": 0
+    }
+  },
+  "metadata": {
+    "actorId": "string",
+    "private": true,
+    "shortDescription": "string",
+    "longDescription": "string",
+    "rules": "string",
+    "contactEmail": "string",
+    "terms": "string",
+    "nodeName": "string",
+    "banner": "string",
+    "defaultUploadQuota": 0,
+    "library": {
+      "federationEnabled": true,
+      "anonymousCanListen": true,
+      "tracks": { "total": 0 },
+      "artists": { "total": 0 },
+      "albums": { "total": 0 },
+      "music": { "hours": 0 }
+    },
+    "supportedUploadExtensions": [
+      "string"
+    ],
+    "allowList": {
+      "enabled": true,
+      "domains": [
+        "string"
+      ]
+    },
+    "reportTypes": [
+      {
+        "type": "string",
+        "label": "string",
+        "anonymous": true
+      }
+    ],
+    "funkwhaleSupportMessageEnabled": true,
+    "instanceSupportMessage": "string",
+    "endpoints": {
+      "knownNodes": "string",
+      "channels": "string",
+      "libraries": "string"
+    },
+    "usage": {
+      "favorites": {
+        "tracks": { "total": 0 }
+      },
+      "listenings": { "total": 0 },
+      "downloads": { "total": 0 }
+    }
+  }
+}
+```
+
+We wouldn't implement the full NodeInfo end-point, but could follow a similar approach. E.g. provide all server info together.
+
+Changeability of the returned info:
+
+
+### Packaging
+
+Ideally create a new library for each version. A schema file can be used to create these through CI. Each (client) project would have only one library (SDK) at a time, with version info for each of the calls, such that a warning is shown if a deprecated call is used and the client software can dynamically determine which calls/parameters should be used/expected. 
+
+We should keep track for each call from which version it is available, because the client needs to know this info.
+
+###### tags: `project-management` `meeting-notes` `OpenPodcastAPI`
\ No newline at end of file
diff --git a/meeting-notes/2023-09-05.md b/meeting-notes/2023-09-05.md
new file mode 100644
index 0000000..9fe11fd
--- /dev/null
+++ b/meeting-notes/2023-09-05.md
@@ -0,0 +1,75 @@
+2023-09-05 20:00
+===
+:::info
+***Next meeting: 2023-10-{04,05} 20:00 CEST (8pm)***
+:::
+
+# Netlify / Starlight migration PR
+- Netlify is only free for open source projects if we follow certain criteria ([Netlify open source policy](https://www.netlify.com/legal/open-source-policy/))
+- use Funkwhale's Code of Conduct (https://funkwhale.audio/en_US/code-of-conduct)
+- use CC BY-SA as license for the spec (https://creativecommons.org/licenses/by-sa/4.0/)
+- https://github.com/expressive-code/expressive-code
+
+# Versioning endpoint PR
+
+- Change `/{major_version}` to `v1` for all endpoints for clarity in the docs.
+- `/versions` and `/v1/capabilities` will be kept as separate endpoints, as currently foreseen.
+- `/versions` endpoint will not include the URL of the endpoint.
+- Change the JSON response of `/versions` to return a `versions` object containing an array (in line with XML)
+- Change the response of `/versions` to include minor versions as separate info, to avoid string parsing by the client. (Patch versions are not included here because they're not behavioral changes, thus not needed to know by the server or client.)
+```json
+{
+  "versions": [
+    { "v1": { "minor": 6 } },
+    { "v2": { "minor": 0 } }
+  ]
+}
+```
+
+We'll bake major versions into URLs.
+
+```http
+GET https://openpodcastapi.com/v1/subscriptions
+```
+
+If we have a new major version, but we have an endpoint that didn't change, then e.g. `/v2` in the back-end can simply redirect to the `/v1` code. 
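The 'unchanged endpoint' idea could be sketched like this on the server side (illustrative only; handler tables and names are assumptions):

```python
# Hypothetical handler tables per major version; an endpoint that didn't
# change in v2 is simply not re-registered, and lookup falls back to v1.
handlers_v1 = {"/subscriptions": lambda: "v1 subscriptions handler"}
handlers_v2 = {}  # /subscriptions unchanged in v2, so nothing to override

def dispatch(major: str, path: str):
    # v2's table is v1's table with v2 overrides layered on top
    tables = {"v1": handlers_v1, "v2": {**handlers_v1, **handlers_v2}}
    return tables[major][path]()
```

Both `/v1/subscriptions` and `/v2/subscriptions` then resolve to the same handler until v2 actually changes that endpoint.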
+
+# Episodes endpoint
+- [State of episode identification (notes of 2023-05-30)](https://pad.funkwhale.audio/oCfs5kJ6QTu02d_oVHW7DA#Episode-identification)
+
+### Identification methods comparison
+We talked about such a comparison [here](https://pad.funkwhale.audio/oCfs5kJ6QTu02d_oVHW7DA#Episode-identification)
+- **Static fetch-hash**: (doesn't need to be a hash?) once-calculated hash/identifier for an episode that is then used for identification. If the episode metadata changes, the fetch-hash will remain the same (unless a client decides to 'duplicate' an episode if it considers it is too different??).
+- **Dynamic fetch-hash**: hash that is calculated on-the-fly following a specific algorithm so that it is the same for each client/server. If the episode metadata changes, the fetch-hash will change as well.
+- **GUID**: use the GUID field; if an episode doesn't have one, generate it
+
+| test case ✖✔ | 🎉Static fetch-hash / generated GUID🎉 | Dynamic fetch-hash | GUID from the feed |
+| :--- | :---: | :---: | :---: |
+| **rss feed without episode guids**<br>IDs need to be generated once | ✔ | ✔ | ✔ |
+| **rss feed with 2 duplicate guids**<br>~~We would see them as the same episode in each case - but we'd accept this (consider them effectively the same episode, and sync changes between the 'two' episodes)~~ The client can decide whether the episodes are the same or whether they are different - if they are different, GUIDs from the feed cannot be used!! | ✔ | ✔ | ✖ |
+| **guid changes for a given episode in the rss feed**<br>Deduplication endpoint necessary, deduplicate client-side? | ✔ | ✖ | ✔ |
+
+* We have to accept that a) an episode may not have a GUID and b) the GUID can contain anything/any type of character.
+* Our choice of input fields for a fetch-hash, and which metadata changes, affects
+  * dynamic fetch-hash --> always different
+  * static fetch-hash --> are different if
+    * two clients independently fetch RSS feeds, cache episodes locally & calculate the fetch-hash, but information affecting the fetch-hash has changed in the meantime
+    * a single client has deduplication logic and determines that an episode is not known but new
+
+* Sync first, post later. This is to avoid a 'race condition'
+* We want to push identification/deduplication to clients as much as possible.
+  * If we are more precise for episode identification at server level, we are more likely to have 'different' episodes (e.g. in case of duplicate GUIDs; custom fetch-hash will lead to separate IDs).
+  * In this situation, clients can make a decision, using the `/deduplication` endpoint.
+* **clear 'winner'** in the table: static ~~fetch-hash~~ **fetch-GUID** (we want the format to be a GUID, not a hash).
+* How do clients match episodes in the local feed/database with the episodes from the server?
+  * See https://pad.funkwhale.audio/oCfs5kJ6QTu02d_oVHW7DA#Fetch-hash --> "Good practice/required: store all 3" + Matching proposal in pseudo-code.
+  * If we were to use a generated hash to fetch information, this can change at any time and we would need to perform matching client-side to make sure the client has the fetch hash as well. (In that sense a GUID is equally unreliable as a hash.) For this reason, it's better to use the GUID format to sync as it's more standard. The client will still need to perform matching of information such as link, enclosure, media, podcast GUID to assign the fetch GUID in the payload to the local entry.
+  * Who generates the fetch-GUIDs? 
Same as with subscriptions - clients CAN generate and send them to the server. But they can also leave the field empty and the server will generate and respond with it. +The client CAN (?) put a temporary identifier into the request (e.g. the local DB index) that the server will reflect so that the client can match the episodes from the server response more easily. + * Matching doesn't need to be done server-side, if clients respect fetch first post later. + + +# (Authentication) + + +###### tags: `project-management` `meeting-notes` `OpenPodcastAPI` \ No newline at end of file diff --git a/meeting-notes/2023-10-03.md b/meeting-notes/2023-10-03.md new file mode 100644 index 0000000..a781a6b --- /dev/null +++ b/meeting-notes/2023-10-03.md @@ -0,0 +1,24 @@ +2023-10-03 20:00 +=== +:::info +***Next meeting: will be planned with Doodle/...*** +::: + +# NLnet funding +OpenPodcastAPI has been selected for round two, but they sent us a number of questions. We will discuss them in a separate HedgeDoc document. + +# Netlify / Starlight migration PR +Our Netlify Open Source plan request got rejected, probably because we were missing some of their [criteria](https://www.netlify.com/legal/open-source-policy/). +We will implement all of their criteria, e.g. a link to Netlify at the bottom of the project's homepage, and try again. + +# Episode endpoint +- Episode identification is done by randomly generated UUIDs/GUIDs per episode +- Clients have to pull from the server before pushing, so that clients will have to do deduplication and episode mapping + +# Subscriptions endpoint/Server implementation +While working on subscriptions endpoint in php (Nextcloud) server, it appeared a bit of a pain to implement support for XML. Cutting it could save us quite some time. TBC. + +# (Authentication) +Expect discussions on scopes (e.g. enough to have 'core read' and 'core write'?). 
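Back on episode identification: a minimal sketch of the 'client CAN send a GUID, otherwise the server generates a random one' rule (function name is hypothetical; uses a random UUID per the decision above):

```python
import uuid
from typing import Optional

def ensure_episode_guid(client_guid: Optional[str] = None) -> str:
    """Server-side sketch: accept a client-supplied episode GUID if present,
    otherwise generate a random UUID to identify the episode from now on."""
    return client_guid if client_guid else str(uuid.uuid4())
```

The server would respond with the resulting GUID either way, so clients can store it and use `podcast_guid` + `episode_guid` for all later references.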
+ +###### tags: `project-management` `meeting-notes` `OpenPodcastAPI` \ No newline at end of file diff --git a/meeting-notes/2024-02-27.md b/meeting-notes/2024-02-27.md new file mode 100644 index 0000000..f9592fe --- /dev/null +++ b/meeting-notes/2024-02-27.md @@ -0,0 +1,69 @@ +2024-02-27 12:30 +=== + +## Next meetings +- Maintaining a WebCal calendar of meetings sounds useful, so that people can see the meetings in their calendars +:::info +Meet two times per month, to speed it up: +* **First week**: any day, any time (evening?) (TBC) +* **Third week**: Tuesdays 12:30 +::: + + +## Episodes +:::info +**Prior meetings on episodes** +2023-05-30 Episode fields: https://pad.funkwhale.audio/oCfs5kJ6QTu02d_oVHW7DA# +2023-07-11 Episode fields: https://pad.funkwhale.audio/FkuIqtPGT-ynYKqBieHffw# +2023-09-05 Identification: https://pad.funkwhale.audio/s/T-yx14DsH# +::: + +Kasts uses a subset of the gpodder.net API + +### Additional data point notices (to the ones defined [here](https://pad.funkwhale.audio/FkuIqtPGT-ynYKqBieHffw)) + +* **playback position** (used for multiple purposes) +* **episode status** (played or not): inferred from playback position (if close to end of episode). Workarounds in place if episode is marked as played. +* **new status**: would link with 'Inbox' for AntennaPod. Need a clear description of what that status means. AntennaPod: any manual action (play, download, swipe out of Inbox) - Kasts more or less similar. Can base definition on 'manual action by user'. But we should also be careful with being too strict/opinionated, so the definition works also for other projects. + * episode/new status combinable? (mutually exclusive) + * exchange as a bunch of booleans (new Y/N; played Y/N; archived Y/N) or as an array, or a string. If mutually exclusive (across all clients/players), it can be an ENUM (set of pre-defined strings). + * Object rather than array. Should cover 'technical' (metadata status), e.g. 
bookmark is rather about the content, so should not go in, while download status should go in this Object. +* **bookmarks/favourites**: can submit, but would have to currently go through each individual episode, so not implemented + * currently implemented as boolean, but feature request for labels. Could be 'tags' for episodes. Something potentially down the road. + * should be not-core? Endpoint is core and call as well, should we define core/optional at field level? +* **download status**: also covered in gPodder, but bug in implementation: fails to update metadata (will not check timestamp and ignore if value exists already). If data synced, then download to be executed immediately? Use more as a flag of intent (want/should be downloaded) rather than an actual current status. We expect that clients will have option to turn off. Specify expectation in docs: leave open for clients if they want to download immediately, download later, or never download if not possible/desired to download when this data is received. However: which client then takes precedence? (If not downloaded, sync new status? Requires a download status sync setting at client level.) + * Proposal: make opt-in data, expect client to respect download status if enabled. Delayed download (e.g. wait for WiFi) is permitted. +* **episode length/duration**: sometimes declared length is different from actual length (diff of couple of seconds is common). Also many feeds don't have length. Needed for the user? Probably not - can be taken from file itself. + +### General considerations to data points +* bulk sync of episodes (playback state, bookmark/favourite): will have 'all changes since' so we're good there +* *Core/Optional*: + * Currently endpoints and calls can be core. What about field level? + * Can declare action as core, but identify optional-for-client field/data (expect server to have all fields, in order to pass on the data). + * Need a way to identify this at field level in specs. TBC. 
+ * E.g. RSS reader would not be 'OPA-compliant' as it cannot store playback position, but it could still kinda work. +* **Timestamps**: + * One timestamp per field unnecessary overhead on the server side? + * Exceptions: dedicated timestamp for last_played. Could introduce timestamps for all other 'sensitive' fields (e.g. favourite). Are there fields that are non-critical? E.g. 'new' status could cause trouble also + * Concurrency in REST APIs: + * before you come online get most recent data, and/or + * only change this if it has not been modified since this date + * then do conflict resolution at client side + * typical situations for nasty things. + * first client, do things un-synced (e.g. offline) + * second client, do things with sync + * go back to first client, go online & pull updated data + * gPodder approach: not sync current state, but sync all changes since then including their timestamp. Having full history is really expensive, and doesn't provide benefit over timestamp per field. + * **Agreement**: Having **timestamp per field** helps, but might still cause issues when clients don't store timestamps of local changes (rather than submitting timestamp upon sync). We can include in specs: SHOULD, but responsibility of client, or MUST have one (for OPA-complience). + * if the client says 'fetch me data for this episode changed since x', then the server can just provide that info (no additional data being sent over) + * **TODO**: criticially examine which of the fields need a timestamp. Think of naming scheme for timestamp fields & JSON structure for endpoints. + + +### Episode identification +* We have an [idea/approach](https://pad.funkwhale.audio/s/T-yx14DsH#Episodes-endpoint), but need to confirm this for Kasts & AntennaPod. 
We'll try to make a meeting happen with Hans-Peter & Bart + +## Website +* We can have openAPI pages in starlight +* Ciarán will check for email with Funkwhale SteerCom (regarding CoC) + +###### tags: `project-management` `meeting-notes` `OpenPodcastAPI` \ No newline at end of file diff --git a/meeting-notes/2024-04-16.md b/meeting-notes/2024-04-16.md new file mode 100644 index 0000000..f61c916 --- /dev/null +++ b/meeting-notes/2024-04-16.md @@ -0,0 +1,106 @@ +2024-04-16 12:30 +=== + +:::info +***Next meeting: 2024-04-24 21:00 CET (9pm)*** +::: + + +## Subscriptions endpoint +Difficult scenario, by Hans-Peter (ByteHamster): +> What happens if client says 'new_feed_URL' is xyz, and the server already has another podcast in the database that also has URL xyz. This may happen if client A supports syntax, and there is client B that doesnt. Client A submits to server new podcast subscription with the _new_ URL. Client B doesn't support it, so would create a subscription with the old URL. Now client B gets an software update, and then supports the redirect, and then tells the server there's new_feed_URL from old to new feed. Then the server has already two feeds and receives a feed that conflicts. +> Similarly: some client follows temporary redirects while another client follows permanent redirect. + +* Subscription conflicts. Merge/deduplicate on client or on server side? + * if `new_feed_url` is received, check if this already belongs to different podcast + * server needs to have merge/migration process. (could add a description to the 'update subscription' endpoint: "if feed is already known in system") + * should be a background job, as merging process takes a while + * recommendation for server: + * response format where client can request any changes since timestamp, where API replies with tombstone / resource has moved, and client updates the guid in their database + * we agree to have timestamps at data point level, in order to support merging. 
+* Specification already covers 'GUID update' (but that doesn't cover merge conflicts) +* Currently each old URL inform is a single PATCH, so if there's a chain of redirects, that'd be a bunch of calls required. + * We expect the clients to _always_ inform the server of past URLS + +`/subscriptions/{guid}/episodes` -> 301 HTTP resource moved permanently -> `/subscriptions/{new_guid}/episodes` -> clients SHOULD update guid + +### Data timestamps +:::info +Following up on https://pad.funkwhale.audio/s/88C5eXrRq#General-considerations-to-data-points +::: + +* API should always return timestamp with every call, to support merging (discussed above) + + +```json +{ + "total": 2, + "page": 1, + "per_page": 5, + "subscriptions": [ + { + "feed_url": "https://example.com/rss1", + "guid": "31740ac6-e39d-49cd-9179-634bcecf4143", + "is_subscribed": true, + "guid_changed": "2022-09-21T10:25:32.411Z", + "new_guid": "8d1f8f09-4f50-4327-9a63-639bfb1cbd98" + }, + { + "feed_url": "https://example.com/rss2", + "guid": "968cb508-803c-493c-8ff2-9e397dadb83c", + "is_subscribed": false, + "subscription_changed": "2022-04-24T17:53:21.573Z", + "deleted": "2022-04-24T17:53:21.573Z" + } + ] +} +``` + +```json +{ + "total": 2, + "page": 1, + "per_page": 5, + "subscriptions": [ + "8d1f8f09-4f50-4327-9a63-639bfb1cbd98": { + "feed_url": "https://example.com/rss1", + "guid": { + "old_value": "31740ac6-e39d-49cd-9179-634bcecf4143", + "value": "8d1f8f09-4f50-4327-9a63-639bfb1cbd98", + "changed_at": "2024-01-01T10:25:32.411Z" + }, + "is_subscribed": { + "value": true, + "changed_at": "20240101T10:25:32.411Z" + } + }, + "968cb508-803c-493c-8ff2-9e397dadb83c": { + "feed_url": "https://example.com/rss2", + "guid": "968cb508-803c-493c-8ff2-9e397dadb83c", + "is_subscribed": false, + "subscription_changed": "2022-04-24T17:53:21.573Z", + "deleted": "2022-04-24T17:53:21.573Z" + } + ] +} +``` + +* `old_value`: only in case of feed URL and guid (at subscription & episode level) +* `value` +* `changed_at` + +## 
Episode GUIDs +Maybe not needed to always generate, as other fields will need to be sent anyway for identification/matching of new episodes with the RSS feed. At the same time, this matching is only needed for new episodes, subsequent data exchanges could rely on simple guid. + +For AntennaPod, the extra space needed is worth it helps to prevent many edge cases that we'd have without the guid. + +## For next meeting +### Episode deduplication +From notes of meeting with Hans-Peter (AntennaPod): +> As AntennaPod, we'd be hesitant to accept incoming deduplication instructions. Because if an episode disappears in AntennaPod, we would get the comments from the user. Do we trust other clients? 'User listened to episode X' - that's so basic that we can more easily trust. Just accepting anything that the server does is a bit scary. A misbehaving client could render AntennaPod completely unusable, e.g. by merging all episodes into 1. +> +> What do we do in AntennaPod if there's a deduplication request coming in that we would ignore? Since the server & other clients already think the episode is gone, would we send a 'new episode' to the server? We could ask the user to go ahead with merge/deduplication or not. If rejected, we create a new episode. Then a "don't deduplicate flag" is applied to the episode. +> +> Alternatively, send deduplication as strong recommendation. If ignored, keep synchronising with the old episode - so server always has 2 episodes. Then in the UI of one of the clients there is one more +> +> Alternatively, we could keep duplicates on the server and work with 'tombstones' (rather: episode with deduplication flag with reference to the new one). Then sends back to the client: I think you want to use this other one instead, but if you insist you can use this deduplicated episode (but know that it won't propagate to other clients). Then two clients that agree while a third client has another opinion, the two can still use the 'tombstone'. 
And there's nothing to be deleted if it doesn't want to delete it. \ No newline at end of file diff --git a/meeting-notes/2024-04-24.md b/meeting-notes/2024-04-24.md new file mode 100644 index 0000000..5f75668 --- /dev/null +++ b/meeting-notes/2024-04-24.md @@ -0,0 +1,91 @@ +2024-04-24 21:00 (9pm) +=== +:::info +***Next meeting: 2024-05-06 18:00 CET (6pm)*** +::: +# Episode identification +:::info +**Current state: https://pad.funkwhale.audio/s/T-yx14DsH#Episodes-endpoint** +**Last meeting: https://pad.funkwhale.audio/s/6mWuDexgz#Episode-GUIDs** +::: + +| test case ✖✔ | 🎉Static fetch-hash / generated GUID🎉 | Dynamic fetch-hash | GUID from the feed | +| :--- | :---: | :---: | :---: | +| **rss feed without episode guids (old)**
IDs need to be generated once | ✔ | ✔ | ✔ | +| **rss feed with 2 duplicate guids (old)**
~~We would see them as the same episode in each case - but we'd accept this (consider them effectively the same episode, and sync changes between the 'two' episodes)~~ The client can decide whether the episodes are the same or whether they are different - if they are different, GUIDs from the feed cannot be used!! | ✔ | ✔ | ✖ |
| **guid changes for a given episode in the rss feed (old)**
Deduplication endpoint necessary, deduplicate client-side? | ✔ | ✖ | ✔ | + +### New test case 1 +* listen to episode +* phone dies +* podcast publisher changes episode guid +* we get a new phone & connect to the database again; it gets everything from the server + +--> server doesn't know of new guid, and phone doesn't know of old one +Here we'd get a duplicate. When the client sends the 'create episode' call to the server, the server recognises the possible duplicate and +1. creates the episode +2. responds to the client with a question: do you want to deduplicate? + +[Discussion about deduplication from 2023-05-30](https://pad.funkwhale.audio/oCfs5kJ6QTu02d_oVHW7DA#Deduplication) + +What's the expectation regarding the guids to align? We have discussed a deduplciation endpoint, this would be a good use-case for that. + +How can client know how to deduplicate, if it doesn't receive any metadata from the server, only a guid? Impossible to compare with RSS feed if it has nothing to compare with. +AP: title, if mostly similar then check media type, date & duration. If those are the same as well, then we identify as duplicate. But title usually the most useful one; media file URL (enclosure) will most likely change. + +What data should the server store for deduplication? +1. Title +2. Release date +3. Enclosure URL? +4. Duration? + +Other question, for future: which fields should be checked for similarity (when the client synchronises) - to avoid noise. + +### New test case 2 +* podcaster publishes both DE & EN episodes. In 3 feeds: EN-only, DE-only, mixed. Episode guids created as different ones between +* What if you switch from DE-only to EN-only feed? + +Deduplication MUST always only happen within a single feed, not across feeds. If the feed URL changes, it doesn't necessarily mean that the feed's GUID changes. 
+AP: +* if podcast publisher knows what they're doing and has redirect, episode GUIDs remain the same +* if not: + * if episode guid in the feed is still the same, we keep as is + * if episode guids in feed are different, we would consider them as 'new' episodes and execute deduplication + +### New test case 3 +If you switch, within same feed/subscription, from type a (e.g. video) to type b (e.g. audio). + +### Question about how server stores episode data: +If you merge episodes, this is per-user action. This means that there is duplication between users? Indeed. Then extra episode metadata +It would affect storage space. Database structuring can be done smartly; e.g. have single table per user with all data, or split the data between multiple tables. As long as API specs are respected, server can do whatever. +E.g. single-user server would not separate this all, if you're doing a multi-user database you'll want to be smarter. If in the latter case, deduplication might create a single new episode that's only used by one user. There's multiple ways to do it. + +We can put a preamble recommendation to server implementers to do + +### Who is responsible for calculating/creating the spec episode GUID? +* [2023-07-11 episode data fields - server can calculate GUID](https://pad.funkwhale.audio/FkuIqtPGT-ynYKqBieHffw#Data) +* have to specify: has to be 'globally' unique within a feed +* probably server should when receiving instructions from client to create new episode or patch one, and it sends an updated GUID, it should check it is actually unique within the feed +* We should establish clearly separate naming; separate from feed GUID. Proposal: **sync-UUID** + +Conclusion:(?) we want to keep a separate sync guid? Yes, because: +* when clients have different deduplication standards +* it can help multi-user servers to have single episode entries and use guid for sync. (Although if allow the client to generate sync-UUIDs, they would become unique across the server.) 
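A minimal server-side sketch of the conclusion above: generate a sync-UUID and verify it is unique within the feed before handing it out. Purely illustrative; the actual generation rules are still open.

```python
import uuid

def new_sync_uuid(existing_ids: set[str]) -> str:
    """Generate a random sync-UUID, retrying until it doesn't collide
    with any sync-UUID already used within this feed."""
    while True:
        candidate = str(uuid.uuid4())
        if candidate not in existing_ids:
            return candidate
```

With random v4 UUIDs collisions are astronomically unlikely, so the loop is effectively a safety net for the per-feed uniqueness requirement.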
+ +### Question: informing server if client finds new episode +Really needed to tell server about episode as soon as it was found? Or can it be only when there's a meaningful action (e.g. play). That would save many calls to the server, and sync-IDs that have to be stored in the client. +Potential problem: if episode was played already on client A, and then client pu +-> Solution: allow client to request server to provide 'changed since' data. + +Should we support feeds who don't keep historical data? E.g. episodes that are no longer in feed but already +'Downloaded' status should also be considered an action that acts as trigger to inform the server of the episode. + +Conclusion: +* don't push all episodes to sync server +* if client syncs episode timestamp, but doesn't know the episode sync-ID yet, then client needs a way to request additional information about the episode to locally match received sync ID and local episodes +* should have 'native' calls "I played episode but don't have a sync ID yet" and "I played episode with this sync ID" +* **"lazy synchronization"**: only sync episode data after interaction with episode -> sync-ID fields stay empty until interaction happens + + + + diff --git a/meeting-notes/2024-05-06.md b/meeting-notes/2024-05-06.md new file mode 100644 index 0000000..6c0da11 --- /dev/null +++ b/meeting-notes/2024-05-06.md @@ -0,0 +1,80 @@ +2024-05-06 18:00 (6pm) +=== +:::info +***Next meeting: 2024-05-21 12:30 CET (12:30pm)*** +::: + +Netlify has accepted our open source plan request! Astro site is deployed! +It is deployed at: https://openpodcastapi.netlify.app/ and https://openpodcastapi.org + +*** + +### Episode identification (sync-ID) +Last meeting notes: https://pad.funkwhale.audio/kIRwEOYDRNqTA4np6vbBVg# + + +| Test case | Comments | Test case OK? ✖✔ | +| :--- | :--- | :---: | +| **rss feed without episode guids (old)**
IDs need to be generated once | | | +| **server doesn’t know of new sync-ID, and phone doesn’t know of old one**
Old phone dies after listening, publisher changes GUID, new phone gets new GUID from feed and then synchronises with sync server.
| We need a mechanism to deduplicate episodes. | |
| **new client connects to server**
Client doesn't yet have sync-IDs of the episodes it finds in the RSS feed. | We need a matching mechanism. If GUID is podcast-unique it's easy. But we need a fall-back/priority list (helpful across all scenarios). | | +| **Same feed has identical enclosure URL as an existing episode**
Podcaster goes on holiday and republishes existing episode (although often they'd record a new intro). | | | +| | | | + +Episode deduplication in Podverse: at beginning of parsing, compare to already known episodes; a) hide items that are no longer in the feed & b) add items that are new in the feed; after a while, clean up (delete) episodes. +Episode matching: +* guid (coverage seems to have improved a lot over time), icw identifier of the podcast +* enclosure URL as fallback (which is problematic as different feeds can point to same episode) + +Approach: deletion of episode from feed seen as author wants episode to be deleted from the internet. Exception: clips - linked episodes are kept as 'hidden' rows. + +#### Fallback episode matching + +Podverse has been cleaning the dataset and dropping 'Title' from episode data. However, it could still act as 'second class' citizen. + +Enclosure URL might be a better candidate. + +https://pad.funkwhale.audio/oCfs5kJ6QTu02d_oVHW7DA#Fetch-hash: +1. GUID +2. Enclosure URL +3. Episode URL + +However, this wouldn't consider Hans-Peter's comments about title & date being helpful. + +Also: guideline or hard spec? Former; we only need to specify which data is mandatory for episode matching/deduplication. + +Is there a difference between 'episode matching' (assigning a sync-ID at client level) and deduplication? + +Waterfall options: +* If step 1 (GUID) fails, check for step 2 etc +* If step 1 (GUID) fails, consider a new episode & continue, then at any later time let client use the deduplication mechanism and -endpoint to deduplicate + +When & where is this waterfall run? +* At client side when refreshing feed. +* At server side when receiving 'new episode' from client. + +1. GUID +2. Enclosure URL +3. Matching of at least 2 out of 3 relevant fields: + 1. Publishing Date + 2. Episode link + 3. Title + +If there's no match, go to the next step. If there's more than 1 match, take only those matches to the next stage. 
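The waterfall above could be sketched roughly as follows. Field names (`guid`, `enclosure_url`, `publish_date`, `episode_link`, `title`) are assumptions for illustration, and episodes are plain dicts.

```python
def match_episode(candidate: dict, known: list[dict]) -> list[dict]:
    """Waterfall matching: GUID, then enclosure URL, then 2-of-3 of
    publish date / episode link / title. A single match wins; multiple
    matches are carried into the next stage; no match falls through."""
    def guid(e):
        return candidate.get("guid") is not None and e.get("guid") == candidate["guid"]

    def enclosure(e):
        return (candidate.get("enclosure_url") is not None
                and e.get("enclosure_url") == candidate["enclosure_url"])

    def two_of_three(e):
        fields = ("publish_date", "episode_link", "title")
        return sum(candidate.get(f) is not None and e.get(f) == candidate[f]
                   for f in fields) >= 2

    pool, narrowed = known, False
    for stage in (guid, enclosure, two_of_three):
        matches = [e for e in pool if stage(e)]
        if len(matches) == 1:
            return matches                   # unambiguous match
        if len(matches) > 1:
            pool, narrowed = matches, True   # take only these to the next stage
    return pool if narrowed else []          # [] means: treat as a new episode
```

An empty result means no stage matched (a new episode); a multi-element result means the waterfall stayed ambiguous and the candidates would be flagged for deduplication.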
+ +If you fall through the entire waterfall, you assume duplicates, which then need to be flagged as such (see deduplication end-point). + +#### How to proceed + +We haven't reviewed all edge-cases; table is incomplete. We accept, however, that there will be edge-cases (with data loss) and get back to the edge-cases (completing the table) when we discuss deduplication endpoint. We can focus on the 99%, for episode matching based on waterfall. + +### Episode endpoint roadmap + +Current state of data fields: https://pad.funkwhale.audio/s/88C5eXrRq + +> TODO: criticially examine which of the fields need a timestamp. Think of naming scheme for timestamp fields & JSON structure for endpoints. + +Collate the information: +* Which fields need timestamps + proposal: https://pad.funkwhale.audio/s/6mWuDexgz#Data-timestamps +* Actions (calls) under the episode endpoint: https://pad.funkwhale.audio/oCfs5kJ6QTu02d_oVHW7DA# +* Deduplication endpoint already necessary? (Not a high priority; versioning & authentication to be done first). diff --git a/meeting-notes/2024-05-28.md b/meeting-notes/2024-05-28.md new file mode 100644 index 0000000..72ea960 --- /dev/null +++ b/meeting-notes/2024-05-28.md @@ -0,0 +1,90 @@ +2024-05-28 12:00 (12pm) +=== +:::info +***Next meeting: 2024-06-11 12:00 CEST (12:00pm)*** +::: +## Episode endpoints assembly +### Endpoints +- `GET /v1/episodes` +- `POST /v1/episodes` - max. 50 entries per call +- `GET /v1/subscriptions/{guid}/episodes` +- `GET /v1/subscriptions/{guid}/episodes/{sync-id}` +- `PATCH /v1/subscriptions/{guid}/episodes/{sync-id}` + +#### Pagination +Already defined for [subscription endpoint](https://openpodcastapi.org/specs/subscriptions/get-all/). + +:::info +We need to check if we should **change to cursor pagination** (for all endpoints). Probably better, as it defines set results, not impacted by GET and POST requests being processed simultaneously (in which case regular pagination changed). 
+::: + +#### Batching; moving PUT under `/episodes`? +Key use case: moving from one server to another. + +Three options: +1. end-point accepts only array, with immediate response + + +3. submit batch, get back batch ID, checks status for batch ID +Batch limit would be good. 50 could be a good limit, still allowing server to reply near-immediate. This is, however, determined by the server implementor. +We can implement in API standard approach: POST just for 1, batch (POST with array) for up to 50. + +3. all as single transactions +Single transactions most REST compliant. But performance unkown. To be checked. + +More calls, but web calls are cheap & efficient. Batching on server side still possible. Batching on client side bit more tricky; what if part of the batch fails (e.g. 999 OK, 1 fail). Roll back everything? Status reporting in batch reply? +Key question: where is the load issue handled. + +*Conclusion*: We agreed on **batch endpoints with a batch limit (option 1)**. We limit the batch size to **50** by default, this can potentially be handled out with the capabilities. The batch limit ensures that the server can respond in a reasonable time. + +:::info +We need to **update the subscriptions endpoint** with this max batch size. +::: +:::info +We need to **update the capabilities endpoint** also with this info about non-default values. +::: + +#### Deletion +1. Remove (playback) history/stats: PATCH +2. Delete podcast: deletion gets cascaded to episodes on server side +3. Delete episode: not supported + +#### Deduplication *(not a high priority)* +[...] + +#### `/subscriptions/{guid}/episodes` +Same as `/episodes`, but 'filtered' for the subscription collection; returning that sub-set. + +### Data fields +#### GET episode information +- *REQUEST* + - **since** - ISO8601 + - page (TO UPDATE) + - per-page (TO UPDATE) +- *RESPONSE* + - Identifier + Which identifiers to send? Always all, or only main ones? 
If client B is not aware yet of an episode identified by client A, then how does the sever know it needs to send all identifiers rather than basic ones only to client B? It can't, would require a 'conversation'. Only a few bytes => **we'll always send everything**, maybe include parameters for selecting fields later. + - **Podcast GUID** + - **Episode sync-id** + - Episode GUID (from RSS) + - Title + - Publish date + - Enclosure URL + - Episode URL + - (Duration?) + - Status fields + One timestamp per field to keep track of changes + - **Playback position** + - **Episode status** + - **New status** + - **Download status**? + - **Bookmark/favorite**? +#### POST/PATCH episode information +- *REQUEST* +- *RESPONSE* + +:::info +Next steps: +* Ciaran will do a draft PR +* We'll reconvene Tuesday 11/06 12/12:30 CEST. +::: \ No newline at end of file diff --git a/meeting-notes/2024-06-19.md b/meeting-notes/2024-06-19.md new file mode 100644 index 0000000..9b41740 --- /dev/null +++ b/meeting-notes/2024-06-19.md @@ -0,0 +1,88 @@ +2024-06-19 15:00 (3pm) +=== +## Episode endpoint +### Data fields +#### GET episode information +- *REQUEST* + - **since** - ISO8601 + - page (TO UPDATE) + - per-page (TO UPDATE) +- *RESPONSE* + - Identifier fields + Which identifiers to send? Always all, or only main ones? If client B is not aware yet of an episode identified by client A, then how does the server know it needs to send all identifiers rather than basic ones only to client B? It can't, would require a 'conversation'. Only a few bytes => **we'll always send everything**, maybe include parameters for selecting fields later. + - **Podcast GUID** + - **Episode sync-id** + - Episode GUID (from RSS) + - Title + - Publish date + - Enclosure URL + - Episode URL + Do we need timestamp here still? No. RSS = source of truth so clients should check that for changes. Not needed to know when changes happen to prevent data loss. 
+ - ~~Duration~~ - maybe keep for unlikely edge cases, but probably not useful (not all RSS feeds include this, and values likely to differ between RSS feed and actual media file) + - Data fields + One timestamp per field to keep track of changes. + NOTE: if statuses are mutually exclusive, then we trust whoever applies a status change to update other statuses accordingly. E.g. if user manually downloads an episode, both Download status and New status MUST be updated. + - **Playback position** - in seconds + - **~~Episode~~Played status** - boolean + - **New status** - boolean, indicates that no (user?) interactions took place with a given episode, Inbox in AntennaPod + - **Download status** - boolean, make opt-in data, expect client to respect download status if enabled. Delayed download (e.g. wait for WiFi) is permitted. ([discussion details](https://pad.funkwhale.audio/s/88C5eXrRq)) + - **Favorite status** - boolean + - Potential future fields: + - Bookmark - request in AntennaPod for bookmark with timestamp & notes + - Tags - request in Kasts for tags for episodes + +Options for formatting status fields (affecting both GET and PUT requests, latter in case you sync after your holiday in the desert): +```json +{ + 'playback_position': { + 'value': 15, + 'timestamp': 2024-06-19T15:46 + } +} +``` +```json +{ + 'playback_position': 15, + 'playback_position.changed': 2024-06-19T15:46 +} +``` +Important: this MUST/SHOULD (?) be the timestamp of the actual data change, not the timestamp of sending by client or processing by the server. Reason: when client is offline for a long time, then these changes should not overwrite more changes in other clients which are done more recently but synchronised after. +:::info +This is not needed if client always first pulls before push. Assuming that the client stores the timestamps of these changes locally. Maybe we should note this as a requirement, rather than submitting the timestamps. 
+Kasts keeps log of changes and wipes on each sync, also keeping track of timestamp of latest sync. +::: + +#### Create or edit information of multiple episodes +`POST /v1/episodes` +- *REQUEST* - always as array + - if sync ID known: + - **Podcast GUID** + - **Episode sync-ID** + - **changed data fields** + - if no sync ID: + - **all identifier fields** (except sync-id) + - **temporary ID** [optional] arbitrary value that is returned by the server to make re-identification of episode easier, e.g. client's database key + - **changed data fields** +- *RESPONSE* + - Batch update + - success array + - failure array + - Fields for each episode in array: + - Podcast GUID + - Episode sync-ID + - all data fields + their timestamps where the server has newer information than the client (only relevant to failure array) + - message - reason for failure (only relevant to failure array) + - Fields for newly created episodes: + - all identifier fields - so that clients can map local episodes to sync-ids + OR + temporary ID if it was submitted + - all data fields + their timestamps where the server has newer information than the client + +#### Create or edit information of single episodes +`PATCH /v1/subscriptions/{guid}/episodes/{sync-id}` +- *REQUEST* + - **changed data fields** +- *RESPONSE* + - HTTP code says whether action worked + - all data fields + their timestamps where the server has newer information than the client + - failure message, if action failed \ No newline at end of file diff --git a/meeting-notes/2024-11-07.md b/meeting-notes/2024-11-07.md new file mode 100644 index 0000000..d75da82 --- /dev/null +++ b/meeting-notes/2024-11-07.md @@ -0,0 +1,144 @@ +2024-11-07 noon +=== + +## Versioning & capabilities + +Starting point of the discussion: +https://github.com/OpenPodcastAPI/api-specs/pull/50#discussion_r1779485473 + +Key question is whether we need minor versions in the implementation (or 'protocol') + +JMAP (replacement for IMAP) +When you authenticate, 
server always replies with 'these are functionalities/namespaces that I provide'

Our capabilities could be communicated in a similar way, in combination with versioning. E.g. I support:

* core v1
* core v2
*

This solves the deprecation/removal problem. Server perspective: in most cases it's fine to support multiple versions.

If the server changes, it should close all open sessions. Clients then have to re-authenticate and learn about the new capabilities.

Having a version and capabilities endpoint still has merit.

## Versioning

* Protocol: actual implementation by servers & clients
* Specification

How frequently do we think core specifications will change? Each major version should go through collective feedback & testing by developers & users (e.g. this 'optional' endpoint is used by everyone, so it should be in core).

Core endpoint changes should (aim to) only come in major versions. New or changed features can be 'optional' in

Flow:
* new ideas implemented & tested (technical & user perspective) in beta
* no new changes to the spec until we cut a new major version

Therefore, from a specs perspective, there are only pre-major versions and major versions.

Do we still need minor versions if we have capabilities?

Otherwise, if we had 7 minor versions, developers would need to keep up with these changes. Client devs would then have to implement a lot of if-then logic.
+ +## Capabilities + +```json +{ + "capabilities": { + "openpodcastapi::betav1", + "openpodcastapi::core", + "openpodcastapi::corev2", + "openpodcastapi::optional": [ + "coversync", + "queuesync" + ], + "openpodcastapi::optional::v2": [ + "coversync", + "queuesync" + ], + "othergroup::socialsharing" + } +} +``` + +```json +{ + "capabilities": { + "openpodcastapi::core", + "openpodcastapi::corev2", + "openpodcastapi::optional": [ + "coversyncv1", + "queuesyncv1" + ], + "openpodcastapi::optional": [ + "coversyncv2", + "queuesyncv2" + ] + } +} +``` + + +Here is a JMAP auth response with capabilities: + +```json +{ + "capabilities": { + "urn:ietf:params:jmap:core": { ... } + "urn:ietf:params:jmap:submission": {}, + "urn:ietf:params:jmap:mail": {} + }, + "accounts": { + "u2321401a": { + "name": "example@example.fm", + "isReadOnly": false, + "isArchiveUser": false, + "isPersonal": true, + "accountCapabilities": { + "urn:ietf:params:jmap:submission": { + "submissionExtensions": {}, + "maxDelayedSend": 44236800 + }, + "urn:ietf:params:jmap:core": {}, + "urn:ietf:params:jmap:mail": { + "emailQuerySortOptions": [ ... ] + "maxSizeMailboxName": 490, + "maxMailboxDepth": null, + "mayCreateTopLevelMailbox": true, + "maxMailboxesPerEmail": 1000, + "maxSizeAttachmentsPerEmail": 50000000 + } + }, + } + }, + "primaryAccounts": { + "urn:ietf:params:jmap:submission": "u2321401a", + "urn:ietf:params:jmap:core": "u2321401a", + "urn:ietf:params:jmap:mail": "u2321401a" + }, + "uploadUrl": "https://api.fastmail.com/jmap/upload/{accountId}/", + "eventSourceUrl": "https://api.fastmail.com/jmap/event/", + "downloadUrl": "https://www.fastmailusercontent.com/jmap/download/{accountId}/{blobId}/{name}?type={type}", + "apiUrl": "https://api.fastmail.com/jmap/api/", + "username": "example@example.fm" +} +``` + +namespace allows us to define capabilities, but also allows anyone external to create their own spec, and servers to commnicate that they support this at that point. 
+
+## Beta/Testing versions
+
+Namespace:
+`openpodcastapi::optional::podcast-cover::beta-v1`
+
+The server should still expose this. Clients & servers could support multiple beta versions at a time, or only one at a time.
+
+From the spec's perspective, we should rethink the capabilities endpoint. Redesign it from a namespace perspective.
+
+Capabilities should be communicated in the authentication response. A capabilities endpoint could also be handy. That would enable a flow where the user sets up sync, the client checks capabilities, and the client exposes this in a nice way to the user.
+
+###### tags: `project-management` `meeting-notes` `OpenPodcastAPI`
\ No newline at end of file
diff --git a/meeting-notes/2024-11-21.md b/meeting-notes/2024-11-21.md
new file mode 100644
index 0000000..8425309
--- /dev/null
+++ b/meeting-notes/2024-11-21.md
@@ -0,0 +1,61 @@
+2024-11-21 noon
+===
+
+With keunes, cos, mogway & gcrk
+
+### Fundamental approach
+
+cos: Are we doing the right thing? Why take the gPodder approach, with a central server?
+
+mogwai: This is a frequent request for Kasts as well. Multiple clients can talk to the same file, causing conflicts. Also, all clients then need to do the exact same thing. (It would work in a walled-garden system, but in our open and varied ecosystem it's opening a can of worms.)
+
+This approach is proven - there are weak points, but addressing those would be a massive undertaking (given that the approach with a sanitizing server is already taking us long).
+
+A follow-up, once we've defined the API, could maybe be to start building a shared data model (similar to calendar events).
+
+### Versioning & capabilities
+
+Building on [last meeting](https://pad.funkwhale.audio/wWQGg1xkS66QHPK3GBel-Q). Key question: how much complexity vs how much flexibility. The agreed approach is to have fewer major versions.
+
+Having the version identifier as part of the version name. Problem: you can't tell whether a version was bumped or is a new implementation from the ground up (by someone else who wasn't able to think of a creative name).
+This is the feature/purpose, though.
+
+```json
+{
+  "capabilities": {
+    "openpodcastapi::betav1": {},
+    "openpodcastapi::core": {},
+    "openpodcastapi::corev2": {},
+    "openpodcastapi::optional": [
+      "coversync",
+      "queuesync"
+    ],
+    "openpodcastapi::optional::v2": [
+      "coversync",
+      "queuesync"
+    ],
+    "othergroup::socialsharing": {}
+  }
+}
+```
+
+We want to keep all possible combinations of different versions.
+
+Scenarios to think about:
+* If in core v1 we need queue-sync as optional, and it then becomes part of core v2: in v1 the optional capability must be advertised, in v2 it should not be advertised. How to deal with that?
+* Others? (We should think of some, which we can then check.)
+
+Having versioning for the optional endpoints adds complexity, but allows us to move on with optional endpoints without changing/breaking core.
+
+#### Optional fields in core endpoints
+
+Is the server expected to handle these? Optionality applies to the sender:
+
+* The server needs to implement optional fields and have full support for them.
+* There is an ongoing discussion about 'optionality' in two different contexts:
+  * clients' support (e.g. [episode](https://deploy-preview-95--openpodcastapi.netlify.app/specs/episodes/get-all/) `is_favorited`)
+  * whether it makes sense to send or not (e.g. the `next` page)
+* We need to differentiate between these two. Maybe we should rename 'optional' endpoints to 'extension' endpoints.
+  * Can we _not_ specify for each field whether it's optional in the sense of the `next` page? No, we should specify this: to ensure that devs don't just skip sending GUIDs because they prefer not to. Also, many if statements would be needed on the client side to cover for optional fields.
+  * gcrk will create a PR to change 'optional' to 'extension'. Once that is concluded, it is also to be changed in the episodes endpoint PR.
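A small sketch of the first kind of optionality (client support). The payload shape here is purely illustrative, not from the spec: a client that doesn't implement a field omits the key entirely, so the server can tell "not supported" apart from "explicitly false":

```python
import json

# Hypothetical sketch: omit unsupported optional fields entirely, instead of
# sending a default value the client can't actually back up.
def serialize_episode_action(guid, position, is_favorited=None, supports_favorites=True):
    action = {"guid": guid, "position": position}  # required fields (illustrative)
    if supports_favorites:
        # The client implements the field, so it is always sent, even if False.
        action["is_favorited"] = bool(is_favorited)
    # A client without favourites support leaves the key out entirely.
    return json.dumps(action)

print(serialize_episode_action("ep-1", 120, is_favorited=True))
print(serialize_episode_action("ep-1", 120, supports_favorites=False))
```

The second call produces a payload with no `is_favorited` key at all, which is the distinction the notes above are after.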
+
+###### tags: `project-management` `meeting-notes` `OpenPodcastAPI`
\ No newline at end of file
diff --git a/meeting-notes/README.md b/meeting-notes/README.md
new file mode 100644
index 0000000..bbdf618
--- /dev/null
+++ b/meeting-notes/README.md
@@ -0,0 +1,5 @@
+# Meeting Notes
+
+This directory contains meeting notes for the Open Podcast API project. They are static copies serving as a back-up and reference point from the documentation. Their 'live' counterparts are (currently) listed in the ['Meeting notes' GitHub Discussion](https://github.com/orgs/OpenPodcastAPI/discussions/35).
+
+The files can be accessed via git, and added to GenAI tools with access to this GitHub repository. The files are not included in the main website navigation by design.
\ No newline at end of file