Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce RichText type #526

Open
gagarski opened this issue Feb 11, 2024 · 2 comments
Open

Introduce RichText type #526

gagarski opened this issue Feb 11, 2024 · 2 comments

Comments

@gagarski
Copy link

gagarski commented Feb 11, 2024

Disclaimer

I understand that I am suggesting a breaking change, but the breaking changes are sometimes being implemented in telegram-bot-api (maintaining compatibility for some time). The introduction of ReplyParameters type kinda resembles what I am suggesting, yet it may be done for better reasons.

Problem

There are a lot of places in bot API where fields like text+entities+parse_mode come together. Together they represent formatting for the text. There are 4 kinds of these groups in different methods and the naming of the fields is barely consistent:

  • text, entities, parse_mode
  • caption, caption_entities, parse_mode
  • quote, quote_entities, quote_parse_mode
  • explanation, explanation_entities, explanation_parse_mode

Suggested solution

Group these fields into a type named (as a first suggestion) RichText, so that

{
  //...
  "caption": "123"
  "caption_entities": [...]
  "parse_mode": "HTML"
  //...
}

becomes

{
  //...
  "caption: {
    "text": "123",
    "entitites": [...]
    "parse_mode": "HTML"
  }
  //...
}

Note: I know that the example having both parse_mode and enities is incorrect, just didn't want to split it into two examples for each case.

Cons

  • As mentioned before, it's a breaking change. The old formats of course can be supported for some time. This is also a change to a very core feature of the API
  • More boilerplate nesting for simple cases (without formatting). This can be softened a bit by allowing a field caption in above example be of type RichText or String

Pros

  • Less duplicates in documentation and client-library code.
  • Client-library developers can provide type-safe means to produce this rich text and prevent library client form passing HTML text with Markdown parse mode. Nothing prevents library developers from doing so now, but in that case they're responsible for converting data from their input format to telegram-bot-api format which can cause bugs and also confusing for library clients (the more library API is aligned with Telegram API the better)

Additional concern: output rich text

Besides the method arguments, there are data returned as a response or update, which can also contain rich text. In that case there is always no parse_mode. This can be solved by splitting RichText into subtypes:

  • RichTextWithEntities
  • RichTextMarkdown
  • RichTextMarkdownV2
  • RichTextHTML

In this case, the input type would be RichText (or String?) and the output type would be RichTextWithEntities.

@levlam
Copy link
Contributor

levlam commented Feb 12, 2024

You correctly and precisely described consequences of that API change. I have nothing to disprove, but let me add some comments on the mentioned pros and cons.

As mentioned before, it's a breaking change. The old formats of course can be supported for some time. This is also a change to a very core feature of the API

This isn't an issue, given the old way will be supported forever and will be removed only from the documentation as happens with all Bot API changes.

More boilerplate nesting for simple cases (without formatting).

This is a real issue and the main reason why API looks like it is. Bot API very often flattens API to simplify common usages, like it is done with all methods for messages sending, or was done with "reply_to_message_id" or "disable_web_page_preview".

This can be softened a bit by allowing a field caption in above example be of type RichText or String

I like this proposal, but there are a lot of popular programming languages that will not be able to implement the same. Also, it could be harder to understand, how to use such fields.

Less duplicates in documentation and client-library code.

This is also true, but changing many similar fields in the documentation is only slightly harder than changing one. Also, it affects mostly writers of the documentation and client-library code.

Client-library developers can provide type-safe means to produce this rich text and prevent library client form passing HTML text with Markdown parse mode. Nothing prevents library developers from doing so now, but in that case they're responsible for converting data from their input format to telegram-bot-api format which can cause bugs and also confusing for library clients (the more library API is aligned with Telegram API the better)

Yes, this can already be done by client libraries. I don't agree that parameters flattening is hard or likely to have bugs. I also don't agree that API of client libraries should be perfectly aligned with Telegram API. They should provide simple access to most raw methods, but libraries advantage comes from the provided syntactic sugar, shorthands, language-specific features, update processing pipelines, the ability to keep user state. Otherwise, it would be simpler to use raw API directly.

Summarizing, this can be implemented someday, but currently I prefer to have the simplest API for common usages and leave all syntactic sugar and language-specific type safety to the developers of client libraries. The issue can be kept open, given this can be changed in the future.

@gagarski
Copy link
Author

Fair points and nothing to argue with, however I'll anser some of your comments as well.

or was done with "reply_to_message_id" or "disable_web_page_preview"

That's what I meant by "better reasons", you added a couple of new fields to that and made it more structured.

I like this proposal, but there are a lot of popular programming languages that will not be able to implement the same. Also, it could be harder to understand, how to use such fields.

That's true that a lot of languages do not support union types. However, we already have a union type in API: chat_id usually has a type Integer or String. That's how that can be worked out:

// @JsonDeserialize(using = ChatIdDeserializer::class) // Deserialization is out of scope for this example, 
// however can still work
sealed interface ChatId

@JvmInline
value class LongChatId(
    @JsonValue // this is a hint to serialize it as a number
    val long: Long
) : ChatId

@JvmInline
value class StringChatId(
    @JsonValue // this is a hint to serialize it as a string
    val string: String
) : ChatId

// Helper methods to create a ChatId from primitive types
fun String.toChatId() = StringChatId(this)
fun Long.toChatId() = LongChatId(this)
fun Int.toChatId() = LongChatId(this.toLong())

Same approach can be applied for String or RichText, however, since you still need to have a wrapper for String, you can give up on plain string wrapper and just use RichTextWithEntities, because if will be effectively the same. The option with plain String though might be useful in the languages with support for union types (or without types at all) and for quick testing using browser.

I don't agree that parameters flattening is hard or likely to have bugs

Well, it's hard to argue that it's not hard, yet it can look like that:

suspend fun Telegram.copyMessage(
    chatId: ChatId,
    messageThreadId: Long? = null,
    fromChatId: ChatId,
    messageId: Long,
    richCaption: RichCaption? = null,
    disableNotification: Boolean = false,
    protectContent: Boolean = false,
    replyParameters: ReplyParameters? = null,
    replyMarkup: ReplyMarkup? = null,
): MessageId = call(
    CopyMessage(
        chatId = chatId, 
        messageThreadId = messageThreadId, 
        fromChatId = fromChatId, 
        messageId = messageId, 
        caption = richCaption?.caption, 
        parseMode = richCaption?.parseMode, 
        captionEntities = richCaption?.captionEntities, 
        disableNotification = disableNotification, 
        protectContent = protectContent, 
        replyParameters = replyParameters, 
        replyMarkup = replyMarkup
    )
)

From my experience, this kind of code is exactly the type of code you can make mistakes in (okay, I involved code-generation to make it)! Given the changing nature of Telegram API and a wide use of optional parameters, It's easy to make a mistake in such mapping, bot when the method is first defined and during updates to catch up wtih Telegram API.

The other approach might be to do some post-processing after serializing to JSON (or Multipart). This involves less field moving-around and allows you to apply the patch directly to the patched part (in our case RichText part). However, in that case these manipulations are no longer type-safe (and therefore error-prone).

I also don't agree that API of client libraries should be perfectly aligned with Telegram API. They should provide simple access to most raw methods, but libraries advantage comes from the provided syntactic sugar, shorthands, language-specific features, update processing pipelines, the ability to keep user state.

Agreed. However, no matter how good are your update processing pipelines and state-keeping, the library user will probably eventually want to send a message. First thing to do is to define structures for such operations (either by an explicit type or by the structure of function params). Even having that (even dumbly reproducing Telegram Bot API docs) is better than directly using HTTP client: at least it is harder for a library user to make a typo in a parameter name. Now the next temptation for a library developer is to write in the documentation something like This is a sendMessage Telegram method. For more info on the parameters please refer to https://core.telegram.org/bots/api#sendmessage. Of course some improvements can be done on the library side: dates can be represented as dates (not an integer), One other improvement might be to split some types (which are not split to subtypes in Telegram docs) to sub-types. So far the library user can jump to Telegram API docs and read some more about fields/parameters. Wrapping/unwrapping can also be done, but that makes things much harder (yet I understand that this "much harder" border is subjective).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants