README
fanchann committed Feb 1, 2025
1 parent b6db4db commit 1b5f2f5
Showing 1 changed file with 116 additions and 35 deletions.
151 changes: 116 additions & 35 deletions README.md
@@ -1,49 +1,130 @@
# Yamete-go
<p align="center">
<img src="https://media4.giphy.com/media/v1.Y2lkPTc5MGI3NjExemZkOWdvbmx2NG03bWZucGJ1MTV4ZnM2MHl1bTE4bGt3a2xmcDFpOSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/l0Iy33dWjmywkCnNS/giphy.gif" alt="yamete"/>
</p>
![yamete](https://media4.giphy.com/media/v1.Y2lkPTc5MGI3NjExemZkOWdvbmx2NG03bWZucGJ1MTV4ZnM2MHl1bTE4bGt3a2xmcDFpOSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/l0Iy33dWjmywkCnNS/giphy.gif)

**Yamete-Go** comes from the Japanese word **"Yamete" (やめて)**, which means **"stop"**.
In this context, **Yamete-Go is a high-performance text censorship library** that utilizes a **Trie-based pattern matching** algorithm to detect and censor unwanted words in a text.


**A high-performance text censorship library** with a _trie-based pattern matching_ algorithm.

## Architecture Visualization

```md
Inserted words:
-bad
-crap
-bastard

Trie:
`<-` = end of the word
(root)
/ \
c b
/ \
r a
/ \ / \
a <- p <- s <- t <-
\ \
p a
\ \
e <- r
\
d <-
Search:
bastard -> true
bad -> true
crap -> true

Censored words:
bastard -> ******
bad -> ***
crap -> ****
Input:
- badword
- banana

Trie Visualization:
Root
└── b
    └── a
        ├── d
        │   └── w
        │       └── o
        │           └── r
        │               └── d (end)
        └── n
            └── a
                └── n
                    └── a (end)

```
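The structure in the visualization can be sketched in Go as follows. This is a minimal illustration, not yamete-go's actual implementation: `trieNode`, `insert`, and `contains` are hypothetical names, and a map of child nodes per rune is just one common way to build a trie.

```go
package main

import "fmt"

// trieNode is a hypothetical node type; yamete-go's real internals may differ.
type trieNode struct {
	children map[rune]*trieNode
	end      bool // true when a stored word ends at this node
}

func newTrieNode() *trieNode {
	return &trieNode{children: make(map[rune]*trieNode)}
}

// insert adds word to the trie rooted at root, one rune per level.
func insert(root *trieNode, word string) {
	n := root
	for _, r := range word {
		next, ok := n.children[r]
		if !ok {
			next = newTrieNode()
			n.children[r] = next
		}
		n = next
	}
	n.end = true
}

// contains reports whether word was inserted as a complete word.
func contains(root *trieNode, word string) bool {
	n := root
	for _, r := range word {
		next, ok := n.children[r]
		if !ok {
			return false
		}
		n = next
	}
	return n.end
}

func main() {
	root := newTrieNode()
	insert(root, "badword")
	insert(root, "banana")
	fmt.Println(contains(root, "banana")) // true
	fmt.Println(contains(root, "ban"))    // false: a prefix, not a stored word
}
```

Because `badword` and `banana` share the prefix `ba`, the two words occupy a single `b` and a single `a` node before branching at `d` and `n`.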

If you insert words like banana and badword into the trie, the censorship system will replace them with asterisks, as shown below:

## Example
```md
Input:
"This is a badword and banana!"

Output:
"this is a ******* and ******!"
(Note: The input is automatically converted to lowercase.)
```
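The input/output pair above can be reproduced with a short sketch. It assumes the behavior described in this README (lowercase the input, then replace each listed word with one asterisk per letter). The `censor` function is hypothetical, and a plain `map[string]bool` stands in for the trie to keep the example short.

```go
package main

import (
	"fmt"
	"strings"
)

// censor lowercases text and masks every maximal run of a-z letters
// that appears in banned. Hypothetical helper, not part of yamete-go.
func censor(text string, banned map[string]bool) string {
	lower := strings.ToLower(text)
	var out strings.Builder
	start := -1 // start index of the current letter run, -1 when outside one

	// flush writes the pending letter run (if any), masked or as-is.
	flush := func(end int) {
		if start < 0 {
			return
		}
		word := lower[start:end]
		if banned[word] {
			out.WriteString(strings.Repeat("*", len(word)))
		} else {
			out.WriteString(word)
		}
		start = -1
	}

	for i := 0; i < len(lower); i++ {
		c := lower[i]
		if c >= 'a' && c <= 'z' {
			if start < 0 {
				start = i
			}
			continue
		}
		flush(i)
		out.WriteByte(c) // non-letters are copied through unchanged
	}
	flush(len(lower))
	return out.String()
}

func main() {
	banned := map[string]bool{"badword": true, "banana": true}
	fmt.Println(censor("This is a badword and banana!", banned))
	// this is a ******* and ******!
}
```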

## How does yamete-go work?
`Yamete-Go` processes only alphabetic characters (a-z). If the input text contains numbers, those characters are ignored during processing.


Example:
```md
- Input:
4ppl3s

- Trie Visualization:
Root
└── p
    └── p
        └── l
            └── s (end)

- Output:
4ppl3s -> **true** (The word is not censored because numeric characters are ignored.)
```
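The filtering step can be illustrated with a small sketch, assuming that non-alphabetic characters are simply dropped before a word reaches the trie. `lettersOnly` is a hypothetical helper; the library may implement this differently.

```go
package main

import (
	"fmt"
	"strings"
)

// lettersOnly lowercases s and keeps only the characters a-z.
// Hypothetical helper illustrating the filtering described above.
func lettersOnly(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		if r >= 'a' && r <= 'z' {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(lettersOnly("4ppl3s")) // ppls
}
```

This matches the visualization: only `p`, `p`, `l`, and `s` remain as trie nodes.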

## How to Use `yamete-go`

`yamete-go` is a library designed to help analyze and censor text based on predefined toxic word lists. Below are the steps to use it effectively.

---

### 1. Create a Yamete Configuration

To start using `yamete-go`, you need to create a configuration object (`YameteConfig`) that specifies the source of the toxic word list. You can load the word list either from a URL or a local file.

```go
yameteCfg := yamete.YameteConfig{
	URL:  "", // URL of the file to be loaded (e.g., a raw GitHub link)
	File: "", // File path of the file to be loaded (local file path)
}
```

**Note:**
- If you load the word list from a URL, ensure that the raw text is UTF-8 encoded.
- Example of a valid URL: [https://raw.githubusercontent.com/fanchann/toxic-word-list/master/id_toxic_371.txt](https://raw.githubusercontent.com/fanchann/toxic-word-list/master/id_toxic_371.txt)

---

### 2. Initialize Yamete

Once the configuration is set, initialize the `yamete-go` instance by passing the configuration object to the `NewYamete` function.

```go
yameteInit, err := yamete.NewYamete(&yameteCfg)
if err != nil {
	panic(err) // Handle errors appropriately in your application
}
```
---

### 3. Analyze Text with Yamete

After initializing `yamete-go`, you can analyze any text using the `AnalyzeText` method. This method returns detailed information about the analyzed text, including the original text, censored text, detected toxic words, and the count of censored words.

```go
response := yameteInit.AnalyzeText("dasar lu bot!")

// Print the response details
fmt.Printf("Original Text: %v\n", response.OriginalText) // Output: dasar lu bot!
fmt.Printf("Censored Text: %v\n", response.CensoredText) // Output: dasar lu ***!
fmt.Printf("Censored Words: %v\n", response.CensoredWords) // Output: [bot]
fmt.Printf("Censored Count: %v\n", response.CensoredCount) // Output: 1
```

---

### Key Response Fields

Here's a breakdown of the fields returned by the `AnalyzeText` method:

- **`OriginalText`**: The original input text provided for analysis.
- **`CensoredText`**: The text after censoring toxic words (each toxic word is replaced with asterisks, one per character, e.g. `bot` becomes `***`).
- **`CensoredWords`**: A list of toxic words detected in the text.
- **`CensoredCount`**: The total number of toxic words detected.
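
To make the relationship between these fields concrete, here is a rough sketch of a response value and how it could be filled in. The `analysis` struct and `analyze` function are hypothetical, the field types are assumptions inferred from the printed output above, and a `map[string]bool` replaces the trie for brevity.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// analysis mirrors the documented field names; the types are assumptions.
type analysis struct {
	OriginalText  string
	CensoredText  string
	CensoredWords []string
	CensoredCount int
}

// letterRun matches maximal runs of lowercase ASCII letters.
var letterRun = regexp.MustCompile(`[a-z]+`)

// analyze masks banned words and records each one it censors.
func analyze(text string, banned map[string]bool) analysis {
	res := analysis{OriginalText: text}
	res.CensoredText = letterRun.ReplaceAllStringFunc(strings.ToLower(text), func(w string) string {
		if !banned[w] {
			return w
		}
		res.CensoredWords = append(res.CensoredWords, w)
		res.CensoredCount++
		return strings.Repeat("*", len(w))
	})
	return res
}

func main() {
	res := analyze("dasar lu bot!", map[string]bool{"bot": true})
	fmt.Println(res.CensoredText)  // dasar lu ***!
	fmt.Println(res.CensoredWords) // [bot]
	fmt.Println(res.CensoredCount) // 1
}
```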




## Example
```go
package main

```