Releases: argilla-io/argilla
v1.5.0
🔆 Highlights
Dataset Settings page
We have added a Settings page for your datasets. From there, you will be able to manage your dataset. Currently, it is possible to add labels to your labeling schema and delete the dataset.
Add images to your records
The image in this record was generated using https://robohash.org. You can pass a URL in the metadata field _image_url and the image will be rendered in the Argilla UI. You can use this in the Text Classification and Token Classification tasks.
Non-searchable metadata fields
Apart from the _image_url field, you can also pass other metadata fields that won't be used in queries or filters by adding an underscore at the start, e.g. _my_field.
Load only what you need using rg.load
You can now specify the fields you want to load from your Argilla dataset. That way, you can avoid loading heavy vectors if you don't need them for your annotations.
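A hedged sketch of the idea (the dataset and field names are placeholders, and the call itself needs a running server):

```python
# Select only the fields you need when loading a dataset.
# With the argilla client and a running server, this would be:
#
#   import argilla as rg
#   ds = rg.load("my-dataset", fields=fields)
#
fields = ["text", "annotation"]  # heavy fields such as "vectors" are skipped
```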
Two new tutorials (kudos @embonhomme & @burtenshaw)
Check out our new tutorials created by the community!
Changelog
All notable changes to this project will be documented in this file. See standard-version for commit guidelines.
1.5.0 (2023-03-21)
Added
- Add the fields to retrieve when loading the data from Argilla: rg.load takes too long because of the vector field, even when users don't need it. Closes #2398
- Add new page and components for dataset settings. Closes #2442
- Add ability to show an image in records (for TokenClassification and TextClassification) if a URL is passed in metadata with the key _image_url
- Support for non-searchable fields in metadata. #2570
Changed
- Labels are now centralized in a specific vuex ORM called GlobalLabel Model, see #2210. This model is the same for TokenClassification and TextClassification (so both tasks have labels with color_id and shortcuts parameters in the vuex ORM)
- The shortcuts improvement for labels #2339 has been moved to the vuex ORM in the dataset settings feature #2444
- Update "Define a labeling schema" section in docs.
- The record inputs are sorted alphabetically in UI by default. #2581
Fixes
- Allow URL to be clickable in Jupyter notebook again. Closes #2527
Removed
- Remove some deprecated data scan endpoints used by old clients. This change breaks compatibility with clients <v1.3.0
- Stop using the deprecated scan endpoints in the Python client. This breaks client compatibility with server versions <1.3.0
- Remove the previous way to add labels through the dataset page. Labels can now be added only through the dataset settings page.
As always, thanks to our amazing contributors!
- Documentation update: tutorial for text classification models comparison (#2426) by @embonhomme
- Docs: fix little typo (#2522) by @anakin87
- Docs: Tutorial on image classification (#2420) by @burtenshaw
v1.4.0
🔆 Highlights
Enhanced annotation flow for all tasks
Improved bulk annotation and actions
A more stylish banner for available global actions. It includes an improved label selector to apply and remove labels in bulk.
We enhanced multi-label text classification annotations and now adding labels in bulk doesn't remove previous labels. This action will change the status of the records to Pending and you will need to validate the annotation to save the changes.
Learn more about bulk annotations and multi-label text classification annotations in our docs.
Clear and Reset actions
New actions to clear all annotations and reset changes. They can be used at the record level or as bulk actions.
Unvalidate and undiscard
Click the Validate or Discard button on an already validated or discarded record to undo the action.
Optimized one-record view
Improved view for a single record to enable a more focused annotation experience.
Prepare for training for SparkNLP Text2Text
Extended support to prepare Text2Text datasets for training with SparkNLP.
Learn more in our docs.
Extended shortcuts for token classification (kudos @cceyda)
In token classification tasks that have 10+ options, labels get assigned QWERTY keys as shortcuts.
Changelog
All notable changes to this project will be documented in this file. See standard-version for commit guidelines.
1.4.0 (2023-03-09)
Features
- configure_dataset accepts a workspace as argument (#2503) (29c9ee3)
- Add active_client function to main argilla module (#2387) (4e623d4), closes #2183
- Add text2text support for prepare for training spark nlp (#2466) (21efb83), closes #2465 #2482
- Allow passing workspace as client param for rg.log or rg.load (#2425) (b3b897a), closes #2059
- Bulk annotation improvement (#2437) (3fce915), closes #2264
- Deprecate chunk_size in favor of batch_size for rg.log (#2455) (3ebea76), closes #2453
- Expose batch_size parameter for rg.load (#2460) (e25be3e), closes #2454 #2434
- Extend shortcuts to include alphabet for token classification (#2339) (4a92b35)
Bug Fixes
- added flexible app redirect to docs page (#2428) (5600301), closes #2377
- added regex match to set workspace method (#2427) (d789fa1), closes #2388
- error when loading record with empty string query (#2429) (fc71c3b), closes #2400 #2303
- Remove extra-action dropdown state after navigation (#2479) (9328994), closes #2158
Documentation
- Add AutoTrain to readme (7199780)
- Add migration to label schema section (#2435) (d57a1e5), closes #2003
- Adds zero+few shot tutorial with SetFit (#2409) (6c679ad)
- Update readme with quickstart section and new links to guides (#2333) (91a77ad)
As always, thanks to our amazing contributors!
v1.3.1
v1.3.0
🔆 Highlights
Keyword metric from Python client
Most important keywords in the dataset or a subset (using the query param) can be retrieved from Python. This can be useful for EDA and defining programmatic labeling rules:
from argilla.metrics.commons import keywords
summary = keywords(name="example-dataset")
summary.visualize() # will plot a histogram with the results
summary.data # returns the raw result data
Prepare for training for SparkNLP and spaCy text-cat
Added a new framework sparknlp
and extended the support for spacy
including text classification datasets. Check out this section of the docs
Create train and test split with prepare_for_training
You can pass train_size and test_size to prepare_for_training to get train-test splits. This is especially useful for spaCy. Check out this section of the docs.
Better repr for Dataset and Rule (kudos @Ankush-Chander)
When using the Python client, you now get a human-readable representation of Dataset and Rule entities.
Changelog
All notable changes to this project will be documented in this file. See standard-version for commit guidelines.
1.3.0 (2023-02-09)
Features
- better log error handling (#2245) (66e5cce), closes #2005
- Change view mode order in sidebar (#2215) (dff1ea1), closes #2214
- Client: Expose keywords dataset metrics (#2290) (a945c5e), closes #2135
- Client: relax client constraints for rules management (#2242) (6e749b7), closes #2048
- Create a multiple contextual help component (#2255) (a35fae2), closes #1926
- Include record event_timestamp (#2156) (3992b8f), closes #1911
- updated the prepare_for_training methods (#2225) (e53c201), closes #2154 #2132 #2122 #2045 #1697
Bug Fixes
- Client: formatting caused offset in prediction (#2241) (d65db5a)
- Client: Log remaining data when shutdown the dataset consumer (#2269) (d78963e), closes #2189
- validate predictions fails on text2text (#2271) (f68856e), closes #2252
Visual enhancements
- Fine tune menu record card (#2240) (62148e5), closes #2224
- Rely on box-shadow to provide the secondary underline (#2283) (d786171), closes #2282
Documentation
- Add deploy on Spaces buttons (#2293) (60164a0)
- fix typo in documentation (#2296) (ab8e85e)
- Improve deployment and quickstart docs and tutorials (#2201) (075bf94), closes #2162
- More spaces! (#2309) (f02eb60)
- Remove cut-off sentence in docs codeblock (#2287) (7e87f20)
- Rephrase "to know more" into "to learn more" in Quickstart login page (#2305) (6082a26)
- Replace leftover rubrix.apikey with argilla.apikey (#2286) (4871127), closes #2254
- Simplify token attributions code block (#2322) (4cb6ae1)
- Tutorial buttons (#2310) (d6e02de)
- Update colab guide (#2320) (e48a7cc)
- Update HF Spaces creation image (#2314) (e4b2a04)
As always, thanks to our amazing contributors!
v1.2.1
1.2.1 (2023-01-23)
Bug Fixes
- Allow non-alphanumeric characters for login (#2207) (629499a), closes #1879
- Client: Stop using ujson for client actions (#2211) (920213e)
- doc typos (#2203) (b353a30)
- Read statics with proper encoding (#2234) (92739bf), closes #2219
- Remove 3.9+ string methods (#2230) (4ed1ff0), closes #2192
- Remove argilla:stats in metadata filter (#2218) (a412b22), closes #2217, #2220
v1.2.0
1.2.0 (2023-01-12)
🔆 Highlights
Data labelling and curation with similarity search
Since 1.2.0 Argilla supports adding vectors to Argilla records which can then be used for finding the most similar records to a given one. This feature uses vector or semantic search combined with more traditional search (keyword and filter based).
View record info
You can now find all record details and fields, which can be useful for bookmarking, copy/pasting, and making ES queries.
View record timestamp
You can now see the timestamp associated with the record (event timestamp), which corresponds to the moment the record was uploaded, or a custom timestamp passed when logging the data (e.g., the moment the prediction was made, when using Argilla for monitoring).
Configure the base path of your Argilla UI (useful for proxies)
Features
- Allow to launch the argilla server in a different base_url (#2080) (63d624d), closes #1914 #1899
- Check es connection on startup with retries (#2141) (7a63bea)
- enable partial record update (#2118) (4ed0d95)
- Improve the dataset_labels metric processing (#1978) (1c3235e), closes #1818
- Include record event_timestamp (#2156) (5b75ade), closes #1911
- Include record info view and remove metadata filter (#2079) (901d45a), closes #1927 #1849
- Raw records scan endpoint (#2102) (1b63d95)
- reuse the same httpx async client instance (#1958) (a70cb6c), closes #1886
- Search: Allow passing raw es query in search query (#2098) (0541798)
- set record timestamp by default (#1970) (309fd9f), closes #1892
- Similarity vector search (#1768) (#1998) (32958f4), closes #1757
- UI: remove mixins to hide scroll bar in drop down (#2000) (95ad9b8), closes #1928
Bug Fixes
- #1912 hide empty menu dropdown (#1981) (d90390b)
- Avoid manipulating DOM (#1895) (6939b28), closes #1765
- catch ImportError for telemetry module (#1989) (25513b7)
- Client: check url underscore only for hostnames (#2185) (ec5726a)
- client: prevent python client response json parse error (#2186) (5549ab0)
- Compute predicted properly for token classification [REINDEX_DATASET_REF] (#1975) (a29a198), closes #1955
- Disable shortcuts for pagination when focus is on an input tag (#1995) (af07f3e), closes #1976
- Migration: Set dynamic to false for old indices (#2167) (15a18d7)
- Prevent show "No result" before data is loaded (#2014) (0799425), closes #1936
Documentation
- Add new tutorial about zeroshot sentiment analysis with GPT-3 (#2011) (d3c43ab)
- added additional explanation for datetime ranges (#2120) (c8c3dc9), closes #2119
- Adds Hugging Face Space deployment guide (#2109) (a7a47c4)
- changed DatasetForTextGeneration to DatasetForText2Text (#2090) (8cde28b), closes #2089
- Fix load docstring example (#2050) (7e2af7f), closes #1951
- fixed typo errors for terminology section (#2025) (1056736)
- include new OG image (#2017) (710ab3f)
- Include og image (#2016) (85442e4)
- Maintain menu position during navigation (#1935) (82c6e08), closes #1864
- New setfit tutorial (#2002) (43c66b2)
- Replace OG image (#2018) (894b273)
- Replace video with image (#1990) (359b637)
- reverted to correct apikey reference (#2136) (f32f2b8), closes #2074
As always, thanks to our amazing contributors!
- Add Azure deployment tutorial (#2124) by @burtenshaw
- Create training-textclassification-activelearning-with-GPU.ipynb (#2020) by @MoritzLaurer