Merged
1 change: 0 additions & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -11,4 +11,3 @@ jobs:
uses: lycheeverse/lychee-action@v2.4.1
with:
fail: true

1 change: 1 addition & 0 deletions lychee.toml
@@ -0,0 +1 @@
exclude_path = ["meeting-notes"]
41 changes: 41 additions & 0 deletions meeting-notes/2022-10-06.md
@@ -0,0 +1,41 @@
OpenPodcastSync API meeting notes
===

### Participating projects

Podfriend, gPodder for Nextcloud, AntennaPod, Kasts, Funkwhale


### What are the problems we are trying to solve?

Try to get the big picture around the various issues.

Subscriptions.

Problems identified with gPodder:
- Multi-device support is confusing to users. gPodder stores each device as an entity and allows you to link two devices to sync them. Users find this confusing and don't understand why content isn't synced properly across non-linked devices.
- This is only implemented for subscriptions, not for episodes. This inconsistency is confusing for users.
- The database often overflows due to the large dataset being stored: all actions are stored and never cleaned up, yet each episode action can only be stored once, e.g.:
  - If you listen to an episode once and then listen again, an action such as "new" is only sent once.
  - The exact same play position cannot be stored twice.
- Duplicate episodes/subscriptions are an issue. They use the media URL as an identifier for an episode, but if the file changes due to a reupload or something else, this creates a brand-new entry. Syncing these changes is difficult.
- User documentation is lacking, e.g.:
  - If podcast creators change the GUID and URL for an episode, there isn't an agreed-upon behavior for the API or for clients consuming the episodes.
  - If an action is stored locally and a conflicting action is received from the server at a later stage, what happens on sync? Can take inspiration from ListenBrainz scrobbles.
- Subscription lists can duplicate due to URLs not being updated reliably.
- There is no agreed-upon way to handle updating URLs, and this is mostly being handled by clients
- We need to be able to synchronize a queue of episodes in the correct order between devices
- We need to handle multiple queues, and have graceful handling for syncing with clients/servers that cannot handle multiple queues

People would expect all their data, queues, and progress to be synced across all their apps, using a single online identity.
How to handle a server shutting down? Would we need some export/import features, like an extended OPML? Or can we rely on clients as 'intermediaries' (sync data, log out from the server, log in to the other server)?
Switching from mobile (home/commute) to a web/desktop app (at work) is a common use case amongst us.

What would be our Minimum Viable Product?

Next steps?
- split the list into component problems
- asynchronous discussions
- organize meetings when needed on specific matters

###### tags: `project-management` `meeting-notes` `OpenPodcastAPI`
35 changes: 35 additions & 0 deletions meeting-notes/2023-03-14.md
@@ -0,0 +1,35 @@
Meeting 2023-03-14
===

Participants:
* Sporiff
* keunes
* gcrkrause

# Who has authority over the GUID?

* In the first place the RSS feed
* If that's not available, the server might *optionally* ask podcastindex.org
* The client may send a `guid` in the `POST` request **only** if it is obtained from the RSS feed. The server accepts sent `guid` information as authoritative
* The client already has the GUID from the feed
* The server (project) may decide to be as slim as possible, to the extent that it doesn't do any RSS fetching
* The server MUST return a `guid` immediately. This can either be the `guid` sent by the client **or** a generated `guid` if nothing is sent. An asynchronous task CAN fetch the RSS feed to check for a `guid` if one was generated, store an updated `guid`, and set an 'updated since' flag to tell clients on the next connect to update this data.
* In case a user subscribes to the same podcast but with different feed URLs while there is no `guid` that connects the two, or if a server is unresponsive and this causes issues, it is accepted that this can lead to duplicate subscriptions.

# Deletion process

* The `DELETE` verb should actually remove data as a cascade
* The server should keep a record **only** of the GUID and mark it as deleted
* The API should return a `410 GONE` status for any deleted entries
* The `PATCH` unsubscribe request marks all entries as **unsubscribed**
* The server should not remove any data associated with **unsubscribed** subscriptions unless they are deleted
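The deletion rules above (cascade delete, GUID-only tombstone, `410 GONE`) can be sketched with a hypothetical in-memory store — all names and the store layout are illustrative:

```python
HTTP_OK, HTTP_NOT_FOUND, HTTP_GONE = 200, 404, 410

class SubscriptionStore:
    """Hypothetical store: DELETE cascades, only the GUID survives as a
    tombstone, and deleted entries answer 410 GONE."""

    def __init__(self):
        self.subscriptions = {}  # guid -> subscription data (episodes, actions, ...)
        self.tombstones = set()  # GUIDs of deleted subscriptions

    def delete(self, guid: str) -> int:
        self.subscriptions.pop(guid, None)  # cascade: drop all associated data
        self.tombstones.add(guid)           # keep only the GUID, marked deleted
        return HTTP_OK

    def get(self, guid: str) -> int:
        if guid in self.tombstones:
            return HTTP_GONE                # deleted entries return 410 GONE
        return HTTP_OK if guid in self.subscriptions else HTTP_NOT_FOUND
```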

# Tasks until next time

- [ ] Update specs @Ciaran
- [ ] [Setup Hosted OpenAPI specs](https://github.com/OpenPodcastAPI/api-specs/issues/13) @Georg
- [ ] Setup Sphinx @Ciaran
- [ ] Reference Implementation @Georg
- [ ] Check that Ciarán isn't speaking nonsense in client behavior spec @keunes

###### tags: `project-management` `meeting-notes` `OpenPodcastAPI`
35 changes: 35 additions & 0 deletions meeting-notes/2023-04-11.md
@@ -0,0 +1,35 @@
Open Podcast API 11/04/2023
===

present: Ciarán (FW), Jonathan (GfN), Keunes (AP) and Frederik ([MusicPod](https://github.com/ubuntu-flutter-community/musicpod))

Ciarán to update:

* Fetch logic:
  * All timestamp fields must be checked against the `since` parameter in the call (`subscription_changed`, `guid_changed`)
* Deletion logic:
  * The `is_deleted` boolean field should be replaced with a timestamp field that is included in fetch calls to inform clients of deletions
  * A deleted subscription should be reinstated by a client adding a new subscription with the same GUID. The `subscription_changed` and `guid_changed` fields should reflect the date that the subscription is reinstated. The `deleted` timestamp field should be NULLed
  * On receipt of a deleted subscription, the client should present the user with the option to **remove** their local data or **send** their local data to the server to reinstate the subscription details

Keunes to add a project goal/description to the [Index page](https://github.com/OpenPodcastAPI/api-specs/blob/main/docs/index.md) directly in the PR (use [MyST formatting](https://myst-parser.readthedocs.io/en/latest/)).

We'll call the specs 'pre-release' or 'ALPHA' until we have implemented all specs that we deem as 'required' for all servers. Ciarán will add a banner at the top of the pages to warn readers of this.

JonOfUs to add a GitHub Actions workflow for PRs to create and publish a preview of them (template [here](https://github.com/OpenPodcastAPI/api-specs/issues/28))

Once the above changes are reflected, we should merge the subscriptions endpoint spec to have something on the site.

We can use some Creative Commons license for this specification (tbd). Reference implementations can pick their own license (gPodder for Nextcloud & Funkwhale will have AGPL).

Ciarán will be in a podcast in early May; it would be good to have the Subscriptions endpoint merged by then.

## Future discussion

* Ensure that user data is separated by user ID
* Outline what data can be shared and what is per-user data
* Reflect these rules in the spec for multi-tenant and single-tenant servers
* What calls are core/required; which ones are 'feature' ([GH discussion](https://github.com/orgs/OpenPodcastAPI/discussions/16))
* Declaring versions & supported endpoints (well-known/other way; [Matrix](https://spec.matrix.org/v1.6/client-server-api/#capabilities-negotiation) e.g. does this at `$prefix/v1/capabilities`)

###### tags: `meeting` `project-management` `OpenPodcastAPI`
153 changes: 153 additions & 0 deletions meeting-notes/2023-05-30.md
@@ -0,0 +1,153 @@
2023-05-30 9pm in the middle of the night
===

## Endpoints
* `GET/PUT /episodes`
  * returns only changed episodes
  * parameter `since`
* ~~`GET/PUT /episodes/{guid-hash}`~~
  * Don't allow this endpoint, to prevent problems with duplicate GUIDs
* `GET /subscriptions/{guid}/episodes`
  * parameter `since`
  * parameter `guid`?
* `GET/PUT /subscriptions/{guid}/episodes/{fetch-hash}` (hash: SHA-1?)
  * if fetch-hashes clash, the server is expected to return `400 BAD REQUEST`
  * Hash here, because GUIDs can be any string


We want to explain in the specs why we have endpoints 'under' subscriptions, and why we might refuse updates. (i.e. how this will help avoid gPodder API pitfalls.)

## Episode endpoint

The episode endpoint is required to synchronize playback positions and played status for specific episodes. At a minimum, the endpoint should accept and return the following:

1. The episode's **Podcast GUID** (most recent)
2. The episode's **GUID** (sent by the client if found in the RSS feed, or generated by the server if not): a string (not necessarily GUID/URL formatted)
3. A **Status** field containing lifecycle statuses, e.g.:
   * `New`
   * `Played`
   * `Ignored`
   * `Queued`
4. A **Playback position** marker, updated by a `PUT` request
5. A **timestamp** of the last time the episode was played/paused (used for conflict resolution on the playback position)
6. A **Favorite** field to mark episodes
7. A **timestamp** for the last time some metadata (except playback position) was updated
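An illustrative payload covering the minimum fields above — field names and casing are assumptions, since the exact schema isn't specified yet:

```python
import json

# Field names follow the list above; exact casing and structure are not
# yet specified, so treat this purely as an illustration.
episode = {
    "podcast_guid": "917393e3-1b1e-5cef-ace4-edaa54e1f810",  # most recent podcast GUID
    "guid": "https://example.com/episodes/42",   # any string, per the notes
    "status": "Played",                          # New / Played / Ignored / Queued
    "playback_position": 1843,                   # seconds; updated by PUT
    "last_played": "2023-05-30T21:00:00Z",       # conflict resolution for the position
    "favorite": False,
    "metadata_changed": "2023-05-30T20:45:00Z",  # conflict resolution for everything else
}
print(json.dumps(episode, indent=2))
```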

We discussed whether it makes sense to use episode numbers, but they're not part of the feed, so we don't have this information and don't need it anyway.

https://www.rssboard.org/rss-specification#ltguidgtSubelementOfLtitemgt


### Episode identification
#### Fetch-hash vs GUID
Discussion on whether to generate a new (static?) identifier per episode and use that for synchronisation (clients would have to store it additionally per episode?), or to use existing GUIDs as the sync identifier and generate them if none is present (one endpoint would then need GUIDs to be passed by their hash/Base64 for REST compliance).

#### Fetch-hash
Fetch-hash creation: SHA1/MD5 hash of
1. `<guid>` https://www.rssboard.org/rss-specification#ltguidgtSubelementOfLtitemgt

x. `<link>` https://www.rssboard.org/rss-specification#hrelementsOfLtitemgt
x. `<enclosure>` (aka media file URL) https://www.rssboard.org/rss-specification#ltenclosuregtSubelementOfLtitemgt

Priority of latter 2 tbd: `<link>` might be less likely to be unique, while `<enclosure>` might be less stable (more likely to change).

Consideration: why not BASE64? (REST-compliant, can be "unhashed", so hash wouldn't have to be stored on the server)

Good practice/required: store all three (GUID, link, media file URL). This will allow for later matching of episodes if one or two of these are missing. For example, if a totally new client is connecting to a server, and an episode doesn't have a GUID and the `<link>` has changed, matching would still be possible based on the media file URL. (If we don't do this, finding the right episode locally might be hard when receiving a fetch-hash that's not unique, or a GUID that's missing. We know the podcast, and within each podcast there'll be only a limited set of 'wrong' episodes, so a client would only have to create hashes for a few episodes in order to find a match. But still, not very economical.)

<details>
<summary markdown="span">Matching proposal in pseudo-code (click to expand)</summary>

```pseudo-code
are_episodes_equal(client-episode c, server-episode s):
    // this filters out any potential GUID duplicates
    if c.podcast_guid != s.podcast_guid then
        return False

    // if GUID is present, decide exclusively according to it
    if c.guid not empty then
        return c.guid == s.guid

    // if enclosure matches, probably the same (since they share the media file)
    if c.enclosure not empty && c.enclosure == s.enclosure then
        return True

    // case: no media file
    if c.enclosure empty then
        // no guid, enclosure or link -> not matchable
        if c.link empty then
            return False

        // no media file, but episode URL matches - very probably the same
        // (how large is the error here?)
        if c.link == s.link then
            return True

    // All other cases: not matching
    return False
```
</details><br>
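A runnable version of the matching proposal above, assuming episodes are plain dicts with `podcast_guid`, `guid`, `enclosure`, and `link` keys (empty meaning 'not present in the feed'):

```python
def are_episodes_equal(c: dict, s: dict) -> bool:
    # Filter out any potential GUID duplicates across podcasts
    if c["podcast_guid"] != s["podcast_guid"]:
        return False
    # If a GUID is present, decide exclusively by it
    if c.get("guid"):
        return c["guid"] == s.get("guid")
    # If the enclosure matches, probably the same episode (shared media file)
    if c.get("enclosure") and c["enclosure"] == s.get("enclosure"):
        return True
    # No media file: fall back to the episode page URL
    if not c.get("enclosure"):
        if not c.get("link"):
            return False  # no guid, enclosure, or link -> not matchable
        if c["link"] == s.get("link"):
            return True
    # All other cases: not matching
    return False
```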

?? Each field that is empty/not present in the RSS is stored & sent empty. ~~The fetch-hash is only used when sending a request about a specific episode.~~ (that wouldn't work well in case of batch updates - see below) Payloads don't contain fetch-hashes, only the three separate fields.

Two options for identifying episodes in communication:
[I don't think these are the only options, see [here](#Fetch-hash-vs-GUID)]
* For each episode (e.g. in queue; batch update), all three fields/tags are included. Lot of (unnecessary) data exchange.
* Each episode gets a calculated fetch-hash, which is used for communication. Clients can decide to store or generate on the fly. (Generating on-the-fly is dangerous, episode identifier should be static even if episode changes)

Server creates fetch-hash, similar to creation of Podcast GUID, based on the logic described above.

Why do we trust the server to create the hash more than the client? Because for each person there's probably just one server in the game, but likely multiple clients. So if the server messes it up, there's still a single outcome for each user.

#### GUID
Why shouldn't the server just create a GUID (seed: available payloads or whole episode, can also be just random) and send this back to the client? (the client would map using `<enclosure>` and `<link>` and then store this GUID)
[Advantage: less payload fields, only `<enclosure>`, `<link>` and `<guid>` and after first sync only `<guid>` (`guid-hash` only for `PUT /subs../{guid}/epis../{guid-hash}`)]
[Further advantage: easier to implement for clients, they probably already have an `episode_guid` field in their DB]

Only create GUID if none is present, otherwise use existing one.
Identify episode always by `podcast_guid`+`episode_guid` (e.g. when referencing queue items, settings, ...)
[PodcastIndex seems to handle this [the same way](https://podcastindex-org.github.io/docs-api/#get-/episodes/byguid)]

The workflow if a new client connects could then be:
1. Get subscriptions & fetch feeds
2. Get episodes
3. Feed with GUIDs: map by GUID
4. Feed without GUIDs: map by matching algorithm [[above](#Matching-proposal-in-pseudo-code)], then store GUID from sync server
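The four-step workflow above could look roughly like this — all names are illustrative, and `match` stands in for the matching algorithm from the pseudo-code earlier:

```python
def map_feed_to_server(feed_episodes, server_episodes, match):
    # Map by GUID where the feed provides one (step 3); otherwise fall back
    # to the matching algorithm and adopt the sync server's GUID (step 4).
    by_guid = {s["guid"]: s for s in server_episodes if s.get("guid")}
    for ep in feed_episodes:
        if ep.get("guid") and ep["guid"] in by_guid:
            yield ep, by_guid[ep["guid"]]       # feed with GUIDs: map by GUID
        else:
            for s in server_episodes:           # feed without GUIDs: matching algorithm
                if match(ep, s):
                    ep["guid"] = s["guid"]      # store the GUID from the sync server
                    yield ep, s
                    break
```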

#### Deduplication

Two options:
a. agree on a deduplication logic as part of the spec which is to be executed at server level (hard to 'enforce')
b. let clients figure out deduplication, and spec the calls that will allow clients to merge episodes.

To be discussed further. The latter is easier for us :-)
The latter should be in the spec in either case, so that we don't have to change the whole spec if some podcast feeds mess up in a way we never anticipated; clients can adapt a lot faster.

#### New GUID/Fetch-hash logic
Necessary for changing GUIDs, can also be used for deduplication?

Options:
1. `PUT /episodes` with additional field `old_fetch-hash` (or `old_guid`)
2. `PUT /subscriptions/{guid}/episodes/{guid-/fetch-hash}` with additional field `new_fetch-hash` (or `new_guid`)

Case where both episodes are contained in the feed (the episode didn't change, but the podcasters published it twice): to mark the duplicate, an additional boolean `is_duplicate` so that the server handles the `fetch-hash`/`guid` of both as aliases (tombstoning one; if one of them is requested, return the aliases in a field/array `aliases`/`duplicate_fetch-hashes/guids`)

In both cases, server changes fetch-hash/GUID of episode entry, sets `fetch-hash/GUID_changed` timestamp and creates tombstone for old value
[On `GET /episodes`, old value is in `fetch-hash`/`guid` and new value in `new_fetch-hash/new_guid`, same behaviour as in Subscriptions]
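A sketch of the server-side GUID change described above — store layout and names are illustrative:

```python
def change_episode_guid(store: dict, old_guid: str, new_guid: str, now) -> None:
    # The server re-keys the episode, sets a change timestamp, records the
    # old value as an alias, and leaves a tombstone so the old GUID keeps
    # resolving to the new entry.
    episode = store.pop(old_guid)
    episode["guid"] = new_guid
    episode["guid_changed"] = now
    episode.setdefault("aliases", []).append(old_guid)  # old value stays resolvable
    store[new_guid] = episode
    store[old_guid] = {"tombstone": new_guid}           # points at the new entry
```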

Case to handle:
1. Client 1 marks {`fetch-hash2`/`guid2`} as new guid of {`fetch-hash1`/`guid1`}
2. Client 2 receives & stores this
3. Client 2 marks {`fetch-hash1`/`guid1`} as new guid of {`fetch-hash2`/`guid2`}

(this could happen through e.g. slightly different podcast feeds: one feed contains MP3s, the other AACs, but the podcast GUID is the same)


## Excursus Database Schema in the specs

* We should focus on the format of the communications, not how the database is stored
* We have all field data types specified anyways in the API endpoint specification
* We can leave the proposed database schema as an example


###### tags: `project-management` `meeting-notes` `OpenPodcastAPI`
37 changes: 37 additions & 0 deletions meeting-notes/2023-07-11.md
@@ -0,0 +1,37 @@
2023-07-11 20:00
===

## Episode identification

Possible way forward for selecting ideal ['identification' (ID) for episodes](https://pad.funkwhale.audio/oCfs5kJ6QTu02d_oVHW7DA): write up test cases (examples of data gaps) > what satisfies all our test cases?

* RSS feed without episode GUIDs
* RSS feed with 2 duplicate GUIDs
* GUID changes for a given episode in the RSS feed
* ...

Then make a table.

We should probably add a warning, reminding that these cannot be used as the only indices in a database in a multi-user environment (users have different playback positions).

## Data

1. The episode's **Podcast GUID** (most recent)
2. The episode's **GUID** (sent by the client if found in the RSS feed, or generated by the server if not): a string (not necessarily GUID/URL formatted)
3. A boolean **played** field, or a field (e.g. nested JSON) **state** containing information about the state this episode is currently in (like played, in_queue, ignored, ...)
   a. What counts as 'played' differs between clients (e.g. in AntennaPod you can set an episode as played even if the last 20 seconds are skipped)
   b. Interaction with other potential states? (e.g. 'ignored') E.g. 'notified' (to avoid getting notifications on multiple devices). Need a list of statuses (& combinations) to keep track of, and then see which options (boolean, integer, nested booleans, etc.) are best.
   c. Solution: define a set of states and explain them well
4. Liked/Favourited
5. A **Playback position** marker, updated by a `PUT` request
6. A **time_played** counter, containing the total number of seconds this episode was played
7. A **timestamp** of the last time the episode was played/paused
8. To resolve sync conflicts: a dedicated timestamp for each of the fields? Or a single timestamp for the whole episode?
   a. Two timestamps: **last_played** (for conflict resolution on the playback position) and **metadata_changed** (for conflict resolution on all other episode information)
   ~~b. One timestamp for everything~~
   ~~c. Separate timestamps for each field~~ [too complicated]
9. Episode length? (gpodder.net had this) TBD (cases where media files are shorter, like 30 sec, when abroad, or when media files have ads removed after x-thousand downloads because the podcaster gets paid only for the first 10k)
10. Any other markers (e.g. bookmarked playback positions; timed annotations)
11. Ratings/Reviews (probably better as a separate endpoint, referencing the episode)
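One possible encoding of the **state** discussion above — a flag enum keeps a fixed, well-defined set of states while still allowing combinations; the members are examples from the notes, not a decided list:

```python
from enum import Flag, auto

class EpisodeState(Flag):
    # Example members only, taken from the notes; the final set is undecided.
    NEW = auto()
    PLAYED = auto()
    IN_QUEUE = auto()
    IGNORED = auto()
    NOTIFIED = auto()  # e.g. to avoid notifications firing on multiple devices
```

Usage: `EpisodeState.PLAYED | EpisodeState.IN_QUEUE` expresses a combined state, which addresses the question of how states interact.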

###### tags: `project-management` `meeting-notes` `OpenPodcastAPI`