Is there a compression algorithm without ZERO width char ? #126
Comments
Try |
@pieroxy, I tried it, but it did not help. Is there any way to make the result of LZString.compress come out without \u2028 and \u2029? Test case (in some Android browsers; as you know, in China there are many brands of mobile phone, and they use many custom browsers)
If the string contains no \u2028 or \u2029, everything works. |
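To confirm that a given compressed string really contains these two characters before working around them, a small scan is enough. This is only a sketch: `findLineSeparators` is a hypothetical helper name, not part of lz-string, and it only looks for U+2028 and U+2029.

```javascript
// Hypothetical helper (not part of lz-string): report where U+2028
// (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR) occur in a string.
function findLineSeparators(s) {
  const hits = [];
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code === 0x2028 || code === 0x2029) {
      // Record the UTF-16 index and the code point in U+XXXX form.
      hits.push({ index: i, code: 'U+' + code.toString(16).toUpperCase() });
    }
  }
  return hits;
}
```

Passing it the output of LZString.compress shows whether that particular string is affected; an empty array means neither character is present.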
If you look at compressToUTF16, it is easy to adapt to your use case:
You can add a simple check in the lambda provided ( |
compressToBase64 |
Here you go: https://gist.github.com/JobLeonard/7a49b8e5adf17d9a3783ffcfa21eec3f I also removed the string-characters, so:
So copying/pasting a compressed string won't suddenly lead to strange broken string behavior. Did a quick test round here: https://observablehq.com/d/d223e05380aa85c9/ |
You can do some easy tricks. For example, add a fixed 4-character marker M to the beginning of every compressed string, chosen so that it does not occur anywhere in the compressed string. Choice of chars in M:
So \u3165 - \uFEFE should be a good range.
function rndInt(a, b) {
  return Math.floor(Math.random() * (b - a + 1)) + a;
}
function compress(s) {
  let z = LZString.compress(s);
  let m = '';
  do {
    m = Array.from({ length: 4 }, () => String.fromCharCode(rndInt(0x3165, 0xFCFE))).join(''); // generate a fixed length prefix
  } while (z.includes(m)); // ensure it is not used in the compression
  // breaking: \u2028\u2029
  // zero-width: \u200B-\u200D\u2060\uFEFF
  // problem chars: \u180E\u2800\u3164
  // other white spaces: \u2000-\u200A\u202F\u205F\u3000
  z = z.replace(/[\u2028\u2029\u200B-\u200D\u2060\uFEFF\u180E\u2800\u3164\u2000-\u200A\u202F\u205F\u3000]/g, e => m + String.fromCharCode(e.charCodeAt() - 0x200)); // offset -0x200
  return m + z;
}
function decompress(z) {
  let m = z.substring(0, 4);
  z = z.substring(4);
  z = z.replace(new RegExp(`${m}(.)`, 'g'), (e, f) => String.fromCharCode(f.charCodeAt() + 0x200)); // offset +0x200
  const s = LZString.decompress(z);
  return s;
}
Note:
Another approach, like
Then you can reverse it to decode. The first char is t, with t1 t2 t3 ... at the end; replace t0 with t. This should be very efficient and reliable.
function findLeastFrequentChar(text) {
  const charCount = {};
  let minCount = Infinity;
  let leastFrequentChar = '';
  // Count the occurrences of each character
  for (let char of text) {
    if (!charCount[char]) {
      charCount[char] = 0;
    }
    charCount[char]++;
  }
  // Find the character with the least occurrences
  for (let char in charCount) {
    if (charCount[char] < minCount) {
      minCount = charCount[char];
      leastFrequentChar = char;
    }
  }
  return leastFrequentChar;
} |
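The escaping idea from the previous comment is only partly described, so here is one possible reading of it as a sketch, not the commenter's actual code. Assumptions: the escape char t is the least frequent UTF-16 code unit in the input, it is stored as the first char of the output, t itself is written as t0, and each forbidden char is written inline as t1, t2, ... The names `FORBIDDEN`, `pickEscapeChar`, `escapeForbidden` and `unescapeForbidden` are hypothetical.

```javascript
// Characters to keep out of the output (assumption: extend as needed,
// at most 9 entries so that one digit is enough to identify each).
const FORBIDDEN = ['\u2028', '\u2029'];

// Least frequent UTF-16 code unit in `text` that is not itself forbidden.
function pickEscapeChar(text) {
  const counts = new Map();
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (!FORBIDDEN.includes(ch)) counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  let best = '~'; // fallback when the input has no usable character
  let min = Infinity;
  for (const [ch, n] of counts) {
    if (n < min) { min = n; best = ch; }
  }
  return best;
}

function escapeForbidden(text) {
  const t = pickEscapeChar(text);
  let out = t; // the first char records which escape char was used
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const k = FORBIDDEN.indexOf(ch);
    if (ch === t) out += t + '0';          // literal t becomes t0
    else if (k !== -1) out += t + (k + 1); // forbidden char k becomes t1, t2, ...
    else out += ch;
  }
  return out;
}

function unescapeForbidden(encoded) {
  const t = encoded[0];
  let out = '';
  for (let i = 1; i < encoded.length; i++) {
    if (encoded[i] !== t) { out += encoded[i]; continue; }
    const d = Number(encoded[++i]); // digit that follows the escape char
    out += d === 0 ? t : FORBIDDEN[d - 1];
  }
  return out;
}
```

With lz-string this would wrap the existing calls, for example escapeForbidden(LZString.compress(s)) on the way out and LZString.decompress(unescapeForbidden(z)) on the way back. Decoding reads the string left to right, so it stays unambiguous even when t happens to be a digit.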
In some Android browsers, LZString.decompress does not run correctly when the string includes a ZERO WIDTH char (e.g. 0x80 0x86 ...). Is there an LZString method that can compress without producing ZERO WIDTH chars?
Thanks