Update dependency @huggingface/transformers to v3.1.0 #59
This PR contains the following updates:
@huggingface/transformers: 3.0.2 -> 3.1.0
Release Notes
huggingface/transformers.js (@huggingface/transformers)
v3.1.0
Compare Source
🚀 Transformers.js v3.1 — any-to-any, text-to-image, image-to-text, pose estimation, time series forecasting, and more!
Table of contents:
🤖 New models: Janus, Qwen2-VL, JinaCLIP, LLaVA-OneVision, ViTPose, MGP-STR, PatchTST, PatchTSMixer.
Janus for any-to-any generation (e.g., image-to-text and text-to-image)
First of all, this release adds support for Janus, a novel autoregressive framework that unifies multimodal understanding and generation. The most popular model, deepseek-ai/Janus-1.3B, is tagged as an "any-to-any" model, and has specifically been trained for the following tasks:
Example: Image-Text-to-Text
Sample output:
Example: Text-to-Image
Sample outputs:
Qwen2-VL for Image-Text-to-Text
Example: Image-Text-to-Text
Next, we added support for Qwen2-VL, the multimodal large language model series developed by the Qwen team at Alibaba Cloud. It introduces the Naive Dynamic Resolution mechanism, which lets the model process images of varying resolutions and leads to more efficient and accurate visual representations.
JinaCLIP for multimodal embeddings
Example: Compute text and/or image embeddings with jinaai/jina-clip-v2
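Once text and image embeddings are available, comparing them usually comes down to cosine similarity. The following is a minimal sketch in plain JavaScript, assuming the embeddings have already been extracted into plain numeric arrays. The function name `cosineSimilarity` and the toy vectors are hypothetical, and no model is loaded here.

```javascript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions must match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; ++i) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional vectors standing in for real model outputs (hypothetical values).
const textEmbedding = [1, 0, 0];
const imageEmbeddings = [
  [1, 0, 0],
  [0, 1, 0],
];

// One similarity score per image, in the same order as imageEmbeddings.
const similarities = imageEmbeddings.map((e) => cosineSimilarity(textEmbedding, e));
console.log(similarities);
```

Higher scores indicate a closer match between the text and the image; real embeddings have many more dimensions, but the computation is the same.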
LLaVA-OneVision for Image-Text-to-Text
Example: Multi-round conversations w/ PKV caching
ViTPose for pose-estimation
Example: Pose estimation w/ onnx-community/vitpose-base-simple
Optionally, visualize the outputs (Node.js usage shown here, using the `canvas` library):
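Before drawing anything, a visualization step typically has to decide which skeleton segments are worth rendering. Below is a small sketch of that selection in plain JavaScript. It assumes the pose output has been reduced to an array of `[x, y]` keypoints, a parallel array of confidence scores, and a list of index pairs describing the skeleton; these shapes, the helper name `visibleSegments`, and the default threshold of 0.3 are illustrative assumptions rather than the library's actual output format.

```javascript
// Keep only skeleton segments whose two endpoints are both confidently detected.
// keypoints: array of [x, y]; scores: confidence per keypoint; edges: index pairs.
function visibleSegments(keypoints, scores, edges, threshold = 0.3) {
  const segments = [];
  for (const [i, j] of edges) {
    if (scores[i] >= threshold && scores[j] >= threshold) {
      segments.push({ from: keypoints[i], to: keypoints[j] });
    }
  }
  return segments;
}

// Hypothetical data: three keypoints, the last one with low confidence.
const poseKeypoints = [[10, 10], [20, 10], [15, 30]];
const poseScores = [0.9, 0.8, 0.1];
const skeletonEdges = [[0, 1], [1, 2]];

const segments = visibleSegments(poseKeypoints, poseScores, skeletonEdges);
console.log(segments);
```

Each returned segment could then be drawn as a line between its `from` and `to` points with whichever drawing API is in use.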
MGP-STR for Optical Character Recognition (OCR)
Example: Optical Character Recognition (OCR) w/ onnx-community/mgp-str-base
PatchTST and PatchTSMixer for time series forecasting.
Example: Time series forecasting w/ onnx-community/granite-timeseries-patchtst
Example: Time series forecasting w/ onnx-community/granite-timeseries-patchtsmixer
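Both forecasting examples feed the model a window of past observations. The sketch below shows one way to pack per-channel series into a flat, row-major buffer with dimensions `[batch, context_length, num_channels]`. This layout is an assumption here (it follows the `past_values` convention used for PatchTST in the Python transformers library), and the helper name `packPastValues` is hypothetical.

```javascript
// Pack per-channel series into a flat row-major buffer for a single batch item.
// channels: array of equally long numeric arrays, one per input channel.
function packPastValues(channels) {
  const numChannels = channels.length;
  const contextLength = channels[0].length;
  if (channels.some((c) => c.length !== contextLength)) {
    throw new Error("All channels must have the same length");
  }
  const data = new Float32Array(contextLength * numChannels);
  for (let t = 0; t < contextLength; ++t) {
    for (let c = 0; c < numChannels; ++c) {
      // Time steps are the outer dimension, channels the inner one.
      data[t * numChannels + c] = channels[c][t];
    }
  }
  return { data, dims: [1, contextLength, numChannels] };
}

// Two hypothetical channels observed over three time steps.
const packed = packPastValues([
  [1, 2, 3],
  [10, 20, 30],
]);
console.log(packed.dims, Array.from(packed.data));
```

The resulting buffer and dimensions can then be wrapped in whatever tensor type the model expects as input.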
🐛 Bug fixes
📝 Documentation improvements
🛠️ Other improvements
`behavior=removed` & `invert=false` by @xenova in https://github.com/huggingface/transformers.js/pull/1033
`progress_callback` by @ocavue in https://github.com/huggingface/transformers.js/pull/1034
🤗 New contributors
Full Changelog: huggingface/transformers.js@3.0.2...3.1.0
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.