Updated the regex to capture all UTF-8 patterns
Regex is so concise, I literally had to write more characters in
documentation than the regex itself.
1Git2Clone committed Dec 16, 2024
1 parent df97df4 commit 3e4a68e
Showing 1 changed file with 27 additions and 2 deletions.
29 changes: 27 additions & 2 deletions src/data/bot_data.rs
@@ -21,7 +21,32 @@ lazy_static! {
temp
};

// https://regex101.com/r/aX8vec/5
pub(crate) static ref EMOJIS_AND_EMBEDS_REGEX: Regex = Regex::new(r"(?<emoji>(:)([a-zA-Z0-9_]+)(:))|(?<embed>(\[)([a-zA-Z0-9_]+)(\])\([^()]*\))").unwrap();
/// # 2 groups
///
/// *(matched via regex alternation `|`)*
///
/// 1. emoji
///    - `:UTF-8:`
///    - Characters excluded from the `UTF-8` group:
///      - `:` & `<any-whitespace>` (anywhere, not only at the ends)
/// 2. embed_emoji
///    - `[UTF-8](<anything-without-parentheses/nothing>)`
///    - Characters excluded from the `UTF-8` group (anywhere, not only at the ends):
///      - `<any-whitespace>`
///      - `[`
///      - `]`
///
/// ---
///
/// <https://regex101.com/r/Yi782B/1>
pub(crate) static ref EMOJIS_AND_EMBEDS_REGEX: Regex = Regex::new(
concat!(
"(?<emoji>",
r"(:)([[^:\s]&&\u0000-\u{10FFFF}&&[^:\s]]+)(:)",
")|(?<embed_emoji>",
r"(\[)([[^\[\s]&&\u0000-\u{10FFFF}&&[^\]\s]]+)(\])\([^()]*\)",
")"
)
).unwrap();

}
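
To make the two documented groups concrete, here is a minimal std-only sketch of what the pattern is meant to accept. It is a hand-rolled scanner, not the `regex` crate and not the code from this commit. The names `Token`, `prefix_len`, and `scan` are hypothetical. It assumes the character-class exclusions apply to every character of the name, which is how a `[...]+` class behaves.

```rust
/// Which alternative of the pattern a token corresponds to (hypothetical type).
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Emoji(&'a str),      // `:name:`
    EmbedEmoji(&'a str), // `[name](anything-without-parentheses)`
}

/// Length in bytes of the longest prefix of `s` whose chars all satisfy `ok`.
fn prefix_len(s: &str, ok: impl Fn(char) -> bool) -> usize {
    s.char_indices()
        .find(|&(_, c)| !ok(c))
        .map_or(s.len(), |(i, _)| i)
}

/// Scans left to right and collects the name of every emoji / embed token.
fn scan(input: &str) -> Vec<Token<'_>> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        let rest = &input[pos..];
        if let Some(body) = rest.strip_prefix(':') {
            // Name: one or more chars that are neither `:` nor whitespace.
            let n = prefix_len(body, |c| c != ':' && !c.is_whitespace());
            if n > 0 && body[n..].starts_with(':') {
                out.push(Token::Emoji(&body[..n]));
                pos += 1 + n + 1; // `:` + name + `:`
                continue;
            }
        } else if let Some(body) = rest.strip_prefix('[') {
            // Name: one or more chars that are not `[`, `]`, or whitespace.
            let n = prefix_len(body, |c| c != '[' && c != ']' && !c.is_whitespace());
            if n > 0 {
                if let Some(target) = body[n..].strip_prefix("](") {
                    // Link target: any run of chars without parentheses.
                    let m = prefix_len(target, |c| c != '(' && c != ')');
                    if target[m..].starts_with(')') {
                        out.push(Token::EmbedEmoji(&body[..n]));
                        pos += 1 + n + 2 + m + 1; // `[` + name + `](` + target + `)`
                        continue;
                    }
                }
            }
        }
        // No token starts here: advance by one whole character.
        pos += rest.chars().next().map_or(1, char::len_utf8);
    }
    out
}
```

Calling `scan("hi :smile: and [wave](https://example.com)")` should yield one `Token::Emoji` and one `Token::EmbedEmoji`. A name containing whitespace, as in `:not an emoji:`, is skipped, and non-ASCII names are accepted because only the listed delimiters and whitespace are excluded.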
