Releases: KRLabsOrg/squeez

v0.1.3

18 Mar 20:21

Fixed

  • Model-agnostic inference: replaced the hardcoded Qwen ChatML template with tokenizer.apply_chat_template(), so the local transformers backend now works with any model family
  • vLLM model name bug: the server model name set via an environment variable was not passed through to API calls, causing 400 errors on providers like Groq
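The template fix can be illustrated with a small sketch (not squeez source code): a hardcoded renderer bakes one model family's special tokens into the library, while the tokenizer's own chat template adapts per model.

```python
# Illustrative only: why a hardcoded ChatML renderer is model-specific.

def render_chatml(messages):
    """Hardcoded Qwen-style ChatML -- wrong for Llama, Mistral, etc."""
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    out.append("<|im_start|>assistant\n")  # generation prompt
    return "\n".join(out)

messages = [
    {"role": "system", "content": "Extract the relevant lines."},
    {"role": "user", "content": "Fix the bug"},
]
prompt = render_chatml(messages)

# With Hugging Face transformers, the tokenizer already carries the correct
# template for its model family, so no per-model code is needed, e.g.:
#
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
#   prompt = tok.apply_chat_template(
#       messages, tokenize=False, add_generation_prompt=True
#   )
```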

Added

  • Pooled line classifier backend: new pooled backend for sentence-level classification
  • LoRA auto-detection: transformers backend auto-detects and loads LoRA/PEFT checkpoints
  • Batch extraction: extract_many() with concurrent requests for remote backends
  • Encoder model: token-level line classification with mmBERT
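The concurrent-request pattern behind extract_many() can be sketched with a stdlib thread pool. The extractor below is a stub standing in for one remote API call; squeez's actual signatures may differ.

```python
from concurrent.futures import ThreadPoolExecutor

def extract_one(query, text):
    """Stand-in for a single remote extraction call (e.g. one HTTP request)."""
    return [line for line in text.splitlines() if query in line]

def extract_many(query, texts, max_workers=8):
    """Run extractions concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda t: extract_one(query, t), texts))

results = extract_many("Error", ["ok\nError: boom", "all good", "Error again"])
```

Because remote calls are I/O-bound, threads (rather than processes) are enough to overlap request latency.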

Changed

  • Inference refactor: shared _build_messages() used by both vLLM and transformers backends
  • Query tag: renamed <task> to <query> in prompt formatting
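A hypothetical sketch of the shared message builder with the renamed <query> tag; the prompt layout and system prompt here are illustrative, not squeez's exact format.

```python
def _build_messages(query, tool_output,
                    system_prompt="Return only the relevant lines."):
    """Build the chat messages shared by the vLLM and transformers backends."""
    user = f"<query>{query}</query>\n<output>\n{tool_output}\n</output>"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user},
    ]

msgs = _build_messages("Fix the bug", "line 1\nline 2")
```

Centralizing message construction this way keeps the two backends from drifting apart when the prompt format changes.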

Full Changelog: v0.1.2...v0.1.3
PyPI: https://pypi.org/project/squeez/0.1.3/

v0.1.2

08 Mar 12:03

What's Changed

Fixed

  • Chat template: switched from custom tokens to Qwen ChatML format (<|im_start|>/<|im_end|>) — critical for correct fine-tuning and inference
  • Evaluation metrics: replaced line-number-based metrics with span-level exact match, precision/recall/F1, partial overlap, and empty accuracy
  • Dataset metadata: recomputed all metadata fields from actual response content
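Span-level exact match and precision/recall/F1 can be computed roughly as below. This is a simplified sketch over (start, end) spans; squeez's exact metric definitions (including partial overlap and empty accuracy) may differ.

```python
def span_metrics(pred_spans, gold_spans):
    """Exact-match precision/recall/F1 over sets of (start, end) spans."""
    pred, gold = set(pred_spans), set(gold_spans)
    tp = len(pred & gold)  # spans predicted exactly right
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1,
            "exact_match": pred == gold}

m = span_metrics([(0, 3), (10, 12)], [(0, 3), (5, 7)])
```

Unlike line-number-based metrics, span comparison is robust to the model reordering or merging output lines.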

Changed

  • Training: added Unsloth support for memory-efficient LoRA training on Qwen 3.5
  • Training config: tuned for A100 80GB (effective batch size 32, max_length 16384)
  • Data splits: download_data.py now creates train/dev/test (dev for checkpoint selection)

Added

  • Data quality: manually reviewed test split (55/436 samples corrected), traceback curation on the train split (123/7148 corrected)
  • Documentation: new data quality page documenting the full QA pipeline

Full Changelog: v0.1.1...v0.1.2

v0.1.1

08 Mar 08:47

What's Changed

  • Optional heavy dependencies: torch, transformers, peft, datasets are no longer required for API-only usage
    • pip install squeez — lightweight, just openai + pyyaml (for vLLM/Groq/OpenAI-compatible servers)
    • pip install squeez[local] — adds torch, transformers, peft for local inference
    • pip install squeez[train] — adds datasets for training
    • pip install squeez[all] — everything

Full Changelog: v0.1.0...v0.1.1

v0.1.0 - Initial Release

07 Mar 23:08

squeez v0.1.0

First release of squeez — squeeze verbose LLM agent tool output down to only the relevant lines.

Highlights

  • Two inference backends: local transformers and vLLM/OpenAI-compatible server (Groq, etc.)
  • CLI + Python API: cat output.txt | squeez "Fix the bug" or use ToolOutputExtractor directly
  • LoRA fine-tuning: train on the SWE-bench tool output dataset (7,148 train / 436 eval samples)
  • Flexible configuration: CLI args > env vars > config file (squeez.yaml)
  • Subcommands: squeez extract, squeez train, squeez eval, squeez pipeline
  • Claude Code integration: use as a tool in your CLAUDE.md
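The CLI args > env vars > config file precedence can be sketched as a simple merge. The SQUEEZ_ env-var prefix and key names here are illustrative assumptions, not squeez's documented names.

```python
import os

def resolve_setting(key, cli_args, config_file, env_prefix="SQUEEZ_"):
    """Resolve one setting: CLI args win, then env vars, then the config file."""
    if cli_args.get(key) is not None:
        return cli_args[key]
    env_val = os.environ.get(env_prefix + key.upper())
    if env_val is not None:
        return env_val
    return config_file.get(key)

os.environ["SQUEEZ_MODEL"] = "from-env"
cfg = {"model": "from-file", "backend": "vllm"}  # e.g. loaded from squeez.yaml
assert resolve_setting("model", {"model": "from-cli"}, cfg) == "from-cli"
assert resolve_setting("model", {}, cfg) == "from-env"
assert resolve_setting("backend", {}, cfg) == "vllm"
```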

Install

pip install squeez

Links