Skip to content

wallw-teal/chardetng-wasm

Repository files navigation

chardetng-wasm

Makes chardetng (Firefox's character encoding detection library) available to JS via Web Assembly.

Usage

If you are using the repo directly, build before trying the examples

API

detect(uint8Array): String

Given a buffer, guess the character encoding. This is simple wrapper which is equivalent to:

function detect(uint8Array) {
  const detector = EncodingDetector.new();

  // does not assume end of stream
  detector.feed(uint8Array, false);

  // defaults to 'com' topLevelDomain and allows UTF-8
  return detector.guess(undefined, true);
}

EncodingDetector

This is an interface to the original EncodingDetector struct in Rust. See those docs for details.

All arguments other than the buffer for feed() are optional.

Example:

const detector = EncodingDetector.new();

// ... set up a stream of bytes into a buffer
// while reading ...
detector.feed(buffer); // isLast defaults to false

// Optional:
// when finished, if all the feeds represent a full file then mark it finished with no bytes sent
detect.feed('', true);

// these are the defaults
const topLevelDomain = 'com';
const allowUTF8 = false;

console.log(detector.guess(topLevelDomain, allowUTF8));
console.log(detector.guess());

Building

Setup

Prerequisites:

# install rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`

# install wasm-pack
curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh

You will also need node stable and npm

npm install

Build

npm run build

License

Copyright BIT Systems

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.