Commit 113499a

Add TrimFormatter for configurable string edge trimming
Allows precise control over trimming operations with support for left, right, or both sides and custom characters, using PHP's mb_trim, mb_ltrim, and mb_rtrim functions for proper multibyte-safe trimming.

Includes comprehensive tests covering all trim modes, custom characters, Unicode characters (CJK, emoji), special characters, multi-byte strings, and edge cases like empty strings and strings shorter than the characters to trim.

Assisted-by: OpenCode (GLM-4.7)
1 parent 4c3bfd0 commit 113499a

6 files changed

+359
-0
lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -71,6 +71,7 @@ See the [PlaceholderFormatter documentation](docs/PlaceholderFormatter.md) and [
 | [PlaceholderFormatter](docs/PlaceholderFormatter.md) | Template interpolation with placeholder replacement |
 | [SecureCreditCardFormatter](docs/SecureCreditCardFormatter.md) | Masked credit card formatting for secure display |
 | [TimeFormatter](docs/TimeFormatter.md) | Time promotion (mil, c, dec, y, mo, w, d, h, min, s, ms, us, ns) |
+| [TrimFormatter](docs/TrimFormatter.md) | Remove whitespace from string edges |
 | [UppercaseFormatter](docs/UppercaseFormatter.md) | Convert string to uppercase |

 ## Contributing

docs/TrimFormatter.md

Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
<!--
SPDX-FileCopyrightText: (c) Respect Project Contributors
SPDX-License-Identifier: ISC
SPDX-FileContributor: Henrique Moody <henriquemoody@gmail.com>
-->

# TrimFormatter

The `TrimFormatter` removes characters from one or both edges of a string. Both the set of characters and the side to trim are configurable, and trimming is multibyte-safe for UTF-8 input.

## Usage

### Basic Usage

```php
use Respect\StringFormatter\TrimFormatter;

$formatter = new TrimFormatter();

echo $formatter->format(' hello world ');
// Outputs: "hello world"
```

### Trim Specific Side

```php
use Respect\StringFormatter\TrimFormatter;

$formatter = new TrimFormatter('left');

echo $formatter->format(' hello ');
// Outputs: "hello "

$formatterRight = new TrimFormatter('right');

echo $formatterRight->format(' hello ');
// Outputs: " hello"
```

### Custom Characters

```php
use Respect\StringFormatter\TrimFormatter;

$formatter = new TrimFormatter('both', '-._');

echo $formatter->format('---hello---');
// Outputs: "hello"

echo $formatter->format('._hello_._');
// Outputs: "hello"
```

### Unicode Characters

```php
use Respect\StringFormatter\TrimFormatter;

// CJK full-width spaces (U+3000) are trimmed by default
$formatter = new TrimFormatter();

echo $formatter->format("\u{3000}hello世界\u{3000}");
// Outputs: "hello世界"

// Trim emoji with custom characters
$formatterEmoji = new TrimFormatter('both', '😊');

echo $formatterEmoji->format('😊hello😊');
// Outputs: "hello"
```
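
One caveat follows from the code-point semantics of `mb_trim`: each code point in the characters string is stripped independently, so an emoji made of several code points is not treated as a single unit. The sketch below does not call the formatter. It uses a hypothetical `trimCodePoints()` helper, built on `preg_replace` with the `u` modifier, that is assumed here only to approximate that behaviour without requiring `mb_trim`.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for code-point-based trimming (an assumption for
// illustration, not the library's implementation).
function trimCodePoints(string $input, string $characters): string
{
    if ($characters === '') {
        return $input;
    }

    // Every code point in $characters becomes a member of one character class.
    $class = '[' . preg_quote($characters, '/') . ']+';

    return (string) preg_replace('/^' . $class . '|' . $class . '$/uD', '', $input);
}

$heart = "\u{2764}\u{FE0F}"; // heavy black heart + variation selector-16

// One visible symbol, but two code points.
$codePoints = preg_split('//u', $heart, -1, PREG_SPLIT_NO_EMPTY);
echo count($codePoints), "\n"; // 2

// Both code points are stripped independently, so a bare U+2764 is trimmed too.
echo trimCodePoints("\u{2764}love" . $heart, $heart), "\n"; // love
```

In practice this means a mask such as `'❤️'` also trims a heart written without the variation selector, which may or may not be what the caller intends.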
71+
72+
## API
73+
74+
### `TrimFormatter::__construct`
75+
76+
- `__construct(string $side = "both", string|null $characters = null)`
77+
78+
Creates a new trim formatter instance.
79+
80+
**Parameters:**
81+
82+
- `$side`: Which side(s) to trim: "left", "right", or "both" (default: "both")
83+
- `$characters`: The characters to trim from the string edges, or `null` for default Unicode whitespace (default: `null`)
84+
85+
**Throws:** `InvalidFormatterException` when `$side` is not "left", "right", or "both"
86+
87+
### `format`
88+
89+
- `format(string $input): string`
90+
91+
Removes characters from the specified side(s) of the input string.
92+
93+
**Parameters:**
94+
95+
- `$input`: The string to trim
96+
97+
**Returns:** The trimmed string
98+
99+
## Examples
100+
101+
| Side | Characters | Input | Output | Description |
102+
| --------- | -------------- | --------------- | ------------ | ----------------------------------- |
103+
| `"both"` | `null` | `" hello "` | `"hello"` | Trim default whitespace both sides |
104+
| `"left"` | `null` | `" hello "` | `"hello "` | Trim default whitespace left only |
105+
| `"right"` | `null` | `" hello "` | `" hello"` | Trim default whitespace right only |
106+
| `"both"` | `"-"` | `"---hello---"` | `"hello"` | Trim hyphens from both sides |
107+
| `"both"` | `"-._"` | `"-._hello_.-"` | `"hello"` | Trim multiple custom characters |
108+
| `"left"` | `":"` | `":::hello:::"` | `"hello:::"` | Trim colons from left only |
109+
| `"both"` | `null` | `" hello"` | `"hello"` | CJK space trimmed by default |
110+
| `"both"` | `"😊"` | `"😊hello😊"` | `"hello"` | Trim emoji with custom characters |

## Notes

- Uses PHP's `mb_trim`, `mb_ltrim`, and `mb_rtrim` functions (available since PHP 8.4) for multibyte-safe trimming
- Fully UTF-8 aware - handles all Unicode scripts including CJK, emoji, and complex characters
- Empty strings return empty strings
- If the characters string is empty, or none of its characters appear at the edges being trimmed, the string is returned unchanged
- Trimming operations are character-oriented, not byte-oriented
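
To see why the character-oriented behaviour matters, compare it with PHP's byte-oriented `ltrim()`. The sketch below is independent of the formatter: `ltrimCodePoints()` is a hypothetical regex-based helper, assumed here only so that the snippet runs without `mb_ltrim`.

```php
<?php

declare(strict_types=1);

// Hypothetical character-oriented left trim (illustration only).
function ltrimCodePoints(string $input, string $characters): string
{
    if ($characters === '') {
        return $input;
    }

    return (string) preg_replace('/^[' . preg_quote($characters, '/') . ']+/u', '', $input);
}

$input = "\u{00E8}hello"; // "èhello": è is encoded as the bytes C3 A8
$mask = "\u{00E9}";       // "é" is encoded as the bytes C3 A9

// Byte-oriented: ltrim() strips the shared lead byte C3 and leaves a stray A8.
$byteResult = ltrim($input, $mask);
var_dump($byteResult === "\xA8hello");          // bool(true)
var_dump(preg_match('//u', $byteResult) === 1); // bool(false): no longer valid UTF-8

// Character-oriented: "è" is not "é", so nothing is removed.
var_dump(ltrimCodePoints($input, $mask) === $input); // bool(true)
```

The byte-oriented call corrupts the string because the two characters share a lead byte, which is the kind of failure a character-oriented trim avoids.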

### Default Characters

When no characters are provided (`null`), the formatter uses `mb_trim`'s default, which includes all Unicode whitespace characters:

**ASCII whitespace:**
- ` ` (U+0020): Ordinary space
- `\t` (U+0009): Tab
- `\n` (U+000A): New line (line feed)
- `\r` (U+000D): Carriage return
- `\0` (U+0000): NUL-byte
- `\v` (U+000B): Vertical tab
- `\f` (U+000C): Form feed

**Unicode whitespace:**
- U+00A0: No-break space
- U+1680: Ogham space mark
- U+2000–U+200A: Various width spaces (en quad, em quad, en space, em space, etc.)
- U+2028: Line separator
- U+2029: Paragraph separator
- U+202F: Narrow no-break space
- U+205F: Medium mathematical space
- U+3000: Ideographic space (CJK full-width space)
- U+0085: Next line (NEL)
- U+180E: Mongolian vowel separator

See [mb_trim documentation](https://www.php.net/manual/en/function.mb-trim.php) for the complete list.
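
A custom characters string replaces the default set rather than extending it, so trimming whitespace plus an extra character means spelling the whitespace out. The following is a minimal sketch assuming the code points listed above. `DEFAULT_WHITESPACE` and `trimCodePoints()` are hypothetical names used so the snippet runs without the package or `mb_trim`; with the package installed, the same string could presumably be passed as the second constructor argument.

```php
<?php

declare(strict_types=1);

// The default set as listed above, written with escapes (27 code points).
const DEFAULT_WHITESPACE = " \t\n\r\0\v\f"
    . "\u{00A0}\u{1680}"
    . "\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}"
    . "\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}";

// Hypothetical regex-based stand-in for code-point trimming (illustration only).
function trimCodePoints(string $input, string $characters): string
{
    if ($characters === '') {
        return $input;
    }

    $class = '[' . preg_quote($characters, '/') . ']+';

    return (string) preg_replace('/^' . $class . '|' . $class . '$/uD', '', $input);
}

$input = "\u{3000}--hello--\t";

// Hyphens alone: the whitespace at both edges blocks the trim entirely.
var_dump(trimCodePoints($input, '-') === $input); // bool(true)

// Whitespace plus hyphens: both kinds of edge characters are removed.
echo trimCodePoints($input, DEFAULT_WHITESPACE . '-'), "\n"; // hello
```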

src/Mixin/Builder.php

Lines changed: 3 additions & 0 deletions
@@ -51,5 +51,8 @@ public static function secureCreditCard(string $maskChar = '*'): Chain;

     public static function time(string $unit): Chain;

+    /** @param 'both'|'left'|'right' $side */
+    public static function trim(string $side, string|null $characters): Chain;
+
     public static function uppercase(): Chain;
 }

src/Mixin/Chain.php

Lines changed: 3 additions & 0 deletions
@@ -50,5 +50,8 @@ public function secureCreditCard(string $maskChar = '*'): Chain;

     public function time(string $unit): Chain;

+    /** @param 'both'|'left'|'right' $side */
+    public function trim(string $side, string|null $characters): Chain;
+
     public function uppercase(): Chain;
 }

src/TrimFormatter.php

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
<?php

/*
 * SPDX-FileCopyrightText: (c) Respect Project Contributors
 * SPDX-License-Identifier: ISC
 * SPDX-FileContributor: Henrique Moody <henriquemoody@gmail.com>
 */

declare(strict_types=1);

namespace Respect\StringFormatter;

use function in_array;
use function mb_ltrim;
use function mb_rtrim;
use function mb_trim;
use function sprintf;

final readonly class TrimFormatter implements Formatter
{
    /** @param 'both'|'left'|'right' $side */
    public function __construct(
        private string $side = 'both',
        private string|null $characters = null,
    ) {
        if (!in_array($this->side, ['left', 'right', 'both'], true)) {
            throw new InvalidFormatterException(
                sprintf('Invalid side "%s". Must be "left", "right", or "both".', $this->side),
            );
        }
    }

    public function format(string $input): string
    {
        return match ($this->side) {
            'left' => mb_ltrim($input, $this->characters),
            'right' => mb_rtrim($input, $this->characters),
            default => mb_trim($input, $this->characters),
        };
    }
}

tests/Unit/TrimFormatterTest.php

Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
<?php

/*
 * SPDX-FileCopyrightText: (c) Respect Project Contributors
 * SPDX-License-Identifier: ISC
 * SPDX-FileContributor: Henrique Moody <henriquemoody@gmail.com>
 */

declare(strict_types=1);

namespace Respect\StringFormatter\Test\Unit;

use PHPUnit\Framework\Attributes\CoversClass;
use PHPUnit\Framework\Attributes\DataProvider;
use PHPUnit\Framework\Attributes\Test;
use PHPUnit\Framework\TestCase;
use Respect\StringFormatter\InvalidFormatterException;
use Respect\StringFormatter\TrimFormatter;

#[CoversClass(TrimFormatter::class)]
final class TrimFormatterTest extends TestCase
{
    #[Test]
    #[DataProvider('providerForWhitespace')]
    #[DataProvider('providerForSides')]
    #[DataProvider('providerForCustomMask')]
    #[DataProvider('providerForSpecialChars')]
    #[DataProvider('providerForUnicode')]
    #[DataProvider('providerForEmoji')]
    #[DataProvider('providerForMultiByte')]
    #[DataProvider('providerForEdgeCases')]
    public function itShouldTrimString(string $input, string $expected, string $side, string|null $mask = null): void
    {
        // @phpstan-ignore argument.type
        $formatter = new TrimFormatter($side, $mask);

        $actual = $formatter->format($input);

        self::assertSame($expected, $actual);
    }

    #[Test]
    public function itShouldThrowExceptionForInvalidSide(): void
    {
        $this->expectException(InvalidFormatterException::class);
        $this->expectExceptionMessage('Invalid side "middle"');

        // @phpstan-ignore argument.type
        new TrimFormatter('middle');
    }

    /** @return array<string, array{0: string, 1: string, 2: string}> */
    public static function providerForWhitespace(): array
    {
        return [
            'whitespace both sides' => [' hello ', 'hello', 'both'],
            'tab both sides' => ["\thello\t", 'hello', 'both'],
            'newline both sides' => ["\nhello\n", 'hello', 'both'],
            'mixed whitespace both' => [" \t\n hello \t\n", 'hello', 'both'],
            'already trimmed both' => ['hello', 'hello', 'both'],
            'only spaces both' => [' ', '', 'both'],
            'ideographic space both' => ["\u{3000}hello\u{3000}", 'hello', 'both'],
            'em space both' => ["\u{2003}hello\u{2003}", 'hello', 'both'],
            'no-break space both' => ["\u{00A0}hello\u{00A0}", 'hello', 'both'],
            'thin space both' => ["\u{2009}hello\u{2009}", 'hello', 'both'],
            'mixed unicode whitespace both' => ["\u{3000}\u{2003} hello \u{00A0}\u{2009}", 'hello', 'both'],
            'narrow no-break space both' => ["\u{202F}hello \u{202F}", 'hello', 'both'],
        ];
    }

    /** @return array<string, array{0: string, 1: string, 2: string}> */
    public static function providerForSides(): array
    {
        return [
            'spaces left' => [' hello', 'hello', 'left'],
            'spaces right not trimmed left' => ['hello ', 'hello ', 'left'],
            'spaces left and right left' => [' hello ', 'hello ', 'left'],
            'tabs left' => ["\thello\t", "hello\t", 'left'],
            'mixed whitespace left' => ["\t\n hello world", 'hello world', 'left'],
            'spaces right' => ['hello ', 'hello', 'right'],
            'spaces left not trimmed right' => [' hello', ' hello', 'right'],
            'spaces left and right right' => [' hello ', ' hello', 'right'],
            'tabs right' => ["\thello\t", "\thello", 'right'],
            'mixed whitespace right' => ["hello world \t", 'hello world', 'right'],
        ];
    }

    /** @return array<string, array{0: string, 1: string, 2: string, 3: string}> */
    public static function providerForCustomMask(): array
    {
        return [
            'custom characters both' => ['---hello---', 'hello', 'both', '-'],
            'multiple custom chars both' => ['-._hello-._', 'hello', 'both', '_.-'],
            'dots both' => ['...hello...', 'hello', 'both', '.'],
            'underscores both' => ['___hello___', 'hello', 'both', '_'],
            'mixed custom both' => ['*-+hello+-*', 'hello', 'both', '+-*'],
            'dash left' => ['--hello--', 'hello--', 'left', '-'],
            'dash right' => ['--hello--', '--hello', 'right', '-'],
            'all characters to trim both' => [' !!! ', '!!!', 'both', ' '],
        ];
    }

    /** @return array<string, array{0: string, 1: string, 2: string, 3: string}> */
    public static function providerForSpecialChars(): array
    {
        return [
            'asterisk both' => ['**hello**', 'hello', 'both', '*'],
            'dollar sign both' => ['$$hello$$', 'hello', 'both', '$'],
            'caret both' => ['^^hello^^', 'hello', 'both', '^'],
            'pipe both' => ['||hello||', 'hello', 'both', '|'],
            'question mark both' => ['??hello??', 'hello', 'both', '?'],
            'multiple special both' => ['@#$hello$#@', 'hello', 'both', '@#$'],
        ];
    }
115+
116+
/** @return array<string, array{0: string, 1: string, 2: string, 3: string}> */
117+
public static function providerForUnicode(): array
118+
{
119+
return [
120+
'latin accented chars both' => ['éééhelloééé', 'hello', 'both', 'é'],
121+
'greek letters both' => ['αααhelloααα', 'hello', 'both', 'α'],
122+
'cyrillic letters both' => ['бббhelloббб', 'hello', 'both', 'б'],
123+
'chinese characters both' => ['中中hello中中', 'hello', 'both', ''],
124+
'japanese hiragana both' => ['あああhelloあああ', 'hello', 'both', ''],
125+
];
126+
}
127+
128+
/** @return array<string, array{0: string, 1: string, 2: string, 3: string}> */
129+
public static function providerForEmoji(): array
130+
{
131+
return [
132+
'smiley faces both' => ['😊😊hello😊😊', 'hello', 'both', '😊'],
133+
'mixed emoji both' => ['👋👋hi👋👋', 'hi', 'both', '👋'],
134+
'hearts both' => ['❤️❤️love❤️❤️', 'love', 'both', '❤️'],
135+
];
136+
}
137+
138+
/** @return array<string, array{0: string, 1: string, 2: string, 3?: string}> */
139+
public static function providerForMultiByte(): array
140+
{
141+
return [
142+
'chinese with ideographic space both' => [' 你好 ', '你好', 'both'],
143+
'japanese with ideographic space both' => [' こんにちは ', 'こんにちは', 'both'],
144+
'korean with ideographic space both' => [' 안녕하세요 ', '안녕하세요', 'both'],
145+
'fullwidth letters with custom mask both' => ['aaahelloaaa', 'hello', 'both', ''],
146+
'mixed cjk and ascii both' => [' hello 你好 ', 'hello 你好', 'both'],
147+
];
148+
}

    /** @return array<string, array{0: string, 1: string, 2: string, 3?: string}> */
    public static function providerForEdgeCases(): array
    {
        return [
            'empty string both' => ['', '', 'both', ' '],
            'string shorter than mask both' => ['a', '', 'both', 'abcdef'],
            'all characters trimmed both' => ['--', '', 'both', '-'],
            'only one side trimmed left' => ['--a', 'a', 'left', '-'],
            'only one side trimmed right' => ['a--', 'a', 'right', '-'],
            'no characters to trim both' => ['hello', 'hello', 'both', 'xyz'],
            'mask longer than string both' => ['hello', 'hello', 'both', 'abcdefgzij'],
            'empty mask both' => ['hello', 'hello', 'both', ''],
            'repeated characters both' => ['aaaaahelloaaaaa', 'hello', 'both', 'a'],
            'interleaved characters both' => ['ababhelloabab', 'hello', 'both', 'ab'],
        ];
    }
}
