Releases: rvankoert/loghi-htr
2.0.3
Release Notes for Loghi-HTR Version 2.0.3
Date: 2024-04-26
Overview
This release introduces a minor improvement to the error handling mechanism when dealing with text partitions.
Additional Improvements
- Improved Error Handling: The software now raises a `ValueError` if a specified partition contains no valid text lines. This update ensures that users receive a specific and actionable error message early in the processing pipeline, preventing unnecessary processing and clarifying the exact nature of the issue. This improvement is critical for those working with large and varied text datasets where partition integrity is crucial.
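The fail-fast behavior described above can be sketched as follows; the function and variable names are illustrative, not Loghi-HTR's actual implementation.

```python
def load_partition(lines, partition_name):
    """Return non-empty text lines, failing fast when a partition is empty."""
    valid = [line.strip() for line in lines if line.strip()]
    if not valid:
        # Raise early with an actionable message, as described in this release.
        raise ValueError(
            f"Partition '{partition_name}' contains no valid text lines; "
            "check your train/validation/test list files."
        )
    return valid
```

Raising at load time surfaces a bad list file immediately, rather than after a long preprocessing run.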
Contributors
- @TimKoornstra: Enhanced the error handling mechanism in this version, ensuring the software provides more accurate feedback to users during data preparation phases.
Full Changelog: 2.0.2...2.0.3
2.0.2
Release Notes for Loghi-HTR Version 2.0.2
Date: 2024-04-18
Overview
Version 2.0.2 enhances Loghi-HTR with improved TensorFlow strategy automation, API error handling, memory management, and security updates.
Major Updates
- Automated TensorFlow Distribution Strategy: Adjusts TensorFlow strategy based on the GPU count, optimizing for various hardware configurations.
- Enhanced API Error Handling:
- Issues a 400 response code for input validation errors, improving API reliability.
- Handles zero-byte images to prevent processing errors.
- Memory Management Overhaul: Addresses a memory leak in API by shifting to numpy objects for better garbage collection.
Additional Improvements
- Gunicorn Security Update: Upgraded to the latest version as per security advisory CVE-2024-1135.
- Bug Fix - assert_cardinality Warning: Resolved dataset processing warnings to ensure smoother operations.
Docker Image
The Docker image for version 2.0.2 can be obtained using:
docker pull loghi/docker.htr:2.0.2
Contributors
- @TimKoornstra: Implemented all updates and fixes in this release.
Full Changelog: 2.0.1...2.0.2
2.0.1
Release Notes for HTR Version 2.0.1
Date: 2024-04-10
Overview
Version 2.0.1 of HTR introduces critical updates to enhance model accuracy and configuration clarity. This release notably corrects a CTC loss calculation bug and updates the README to guide users on essential model configurations.
Major Updates
- CTC Loss Calculation Bug Fix: Addressed an issue affecting the accuracy of the CTC loss calculation under specific dataset and batch size conditions, ensuring more reliable model training outcomes.
Additional Improvements
- README Update for Model Configuration: The README now includes an important note on the necessity of a `config.json` file within the `LOGHI_MODEL_PATH`, specifically for the `channels` key, to ensure model compatibility and optimal performance.
Contributors
- @TimKoornstra: Key contributions to bug resolution and documentation improvements.
Full Changelog: 2.0.0...2.0.1
2.0.0
Release Notes for Loghi-HTR Version 2.0.0
Date: 2024-04-04
Overview
Version 2.0.0 of Loghi-HTR marks a significant milestone in the evolution of our handwriting text recognition software. This release introduces comprehensive enhancements across the board, from data processing and model architecture to user interaction and system efficiency. Key updates include advanced visualization tools for in-depth analysis, a modular and easily navigable code structure, and a second version of our API designed for higher performance and better resource management. We've also focused on refining our GPU handling, data loading, and augmentation processes for optimized performance. Additionally, this version sees a revamp in configuration handling and logging for a more user-friendly experience, alongside the introduction of custom learning rate schedules and significant code quality improvements. Deprecated features and arguments have been carefully evaluated and updated to streamline operations and pave the way for future advancements. With version 2.0.0, users can expect a more powerful, efficient, and intuitive LoghiHTR, ready to meet the challenges of modern handwriting text recognition tasks.
Major Updates
- Modular Code Structure: Significantly improved organization with functions grouped into subfolders within the `src` directory, aiding in maintainability.
- API v2:
  - Improved support for `gunicorn`. This changes how the API should be started. For reference, check the example scripts in the `src/api` directory.
  - API refactored for efficiency. Key enhancements include:
    - Simplified queue system for faster processing.
    - New `/health` and `/ready` endpoints to monitor overall API and process status.
    - Optional user login through SimpleSecurity integration.
    - Separate decoding process for better GPU utilization.
  - See the updated README for detailed API changes and instructions.
- Robust Logging: Streamlined logging with a more structured system, comprehensive validation logs including metric tables, and execution timers.
- Improved Configuration Handling:
  - Run Loghi using a configuration file (`--config_file`) for greater flexibility.
  - Command-line arguments override config file settings for easy adjustments.
  - Revamped `config.json` structure for improved readability.
- Enhanced Visualizations:
- Time-step prediction visualizer: Highlights the top-3 most probable characters considered by the model at each time-step.
- Filter activations visualizer: Shows how convolutional layers respond to input images and random noise, enabling analysis of different model architectures.
- PDF combiner: Creates a single-sheet export of all generated visualizations.
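The precedence rule in the configuration handling above (command-line arguments override config-file settings) can be sketched like this; the option names and the merge helper are illustrative assumptions, not Loghi-HTR's actual code.

```python
import argparse
import json


def merge_config(config_path, cli_args):
    """Merge a JSON config file with parsed CLI args; CLI values win."""
    with open(config_path, encoding="utf-8") as f:
        settings = json.load(f)
    # argparse stores None for options the user did not pass, so only
    # explicitly provided CLI values override the file.
    for key, value in vars(cli_args).items():
        if value is not None:
            settings[key] = value
    return settings
```

This keeps the config file as the baseline while letting one-off runs tweak individual settings from the command line.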
Additional Improvements
- Custom Learning Rate Schedule: Supports warmup, exponential decay, and linear decay.
- GPU Handling Refinements
- Revamped Data Loaders and Augmentations:
- Revamped Data Loaders and Augmentations:
  - Data management classes refactored (`DataLoader` is now `DataManager`).
  - Data augmentations performed on the GPU for a significant performance boost.
- Code Quality Enhancements: Code simplifications, bug fixes, and improvements.
- User Experience Improvements: The `vis_arg_parser` aligns with `loghi-htr` for a familiar command-line experience.
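The custom learning rate schedule listed above (warmup plus exponential or linear decay) can be sketched as a single function; all constants and names here are illustrative, not Loghi-HTR's actual implementation.

```python
def lr_at_step(step, base_lr=1e-3, warmup_steps=1000,
               total_steps=10000, decay="linear"):
    """Warm up linearly to base_lr, then decay linearly or exponentially."""
    if step < warmup_steps:
        # Linear warmup from ~0 to base_lr.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if decay == "linear":
        # Linear decay to 0 at total_steps.
        return base_lr * max(0.0, 1.0 - progress)
    # Exponential decay toward ~1% of base_lr at total_steps.
    return base_lr * (0.01 ** progress)
```

A schedule like this is typically wrapped in the framework's scheduler callback and queried once per optimizer step.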
Deprecations (Effective May 2024)
Several arguments in LoghiHTR are being deprecated to streamline functionality and improve user experience. Here is a summary of the changes and the reasoning behind them:
- `--do_train`: Future training processes will be initiated through a more flexible method by providing a `train_list`. This change allows for a more intuitive setup for training sessions.
- `--do_inference`: Inference will be activated by supplying an `inference_list`, simplifying the command line interface and making it more intuitive to perform inferences.
- `--use_mask`: Masking will be enabled by default, removing the need for explicit command-line toggling and reflecting the common use case directly in the application's behavior.
- `--no_auto`: This argument will be removed to streamline the command line options, as auto-correction or similar functionalities will be incorporated more seamlessly into the application's logic.
- `--height`: The height parameter will be inferred automatically from the VGSL specification, simplifying model configuration and ensuring consistency across model inputs.
- `--channels`: Like height, the number of channels will be automatically inferred from the VGSL specification, reducing the need for manual specification and potential errors.
- `--output_charlist`: The character list will be saved to `output/charlist.txt` by default, standardizing output file locations and reducing command line clutter.
- `--config_file_output`: Configuration details will be saved to `output/config.json` by default, aligning with the standardized approach for output management.
- `--thaw`: With models being saved with all layers thawed by default, this argument becomes unnecessary, simplifying model saving and loading processes.
- `--existing_model`: The use of `--existing_model` will be replaced by the `--model` argument, streamlining the process of loading or creating models.
Additionally, we are phasing out support for the classic `.pb`-style TensorFlow SavedModel format. Starting May 2024, LoghiHTR will automatically convert any old models loaded in the `.pb` format to the new `.keras` format. This conversion process is designed to be seamless and will save the converted model to the specified `output/model-name` directory. This change aligns with our commitment to using the latest and most efficient formats, ensuring better performance and ease of use.
Docker Image
The Docker image for version 2.0.0 can be obtained using the following command:
docker pull loghi/docker.htr:2.0.0
Important Notes
- Due to the significant changes, please test your workflows thoroughly and report issues.
- We have strived for a smooth update, but some disruptions may occur. If you encounter problems, please open an issue on the project's GitHub repository.
Contributors
- @TimKoornstra: A major force behind this release, Tim contributed to several key areas including the main refactor & organization of files, the introduction of an improved learning rate schedule, enhancements in argument handling and configuration, the development of API v2, and numerous quality of life and code quality improvements. His contributions have been instrumental in shaping the direction and capabilities of LoghiHTR 2.0.0.
- @Thelukepet: Contributed to revamping visualization files and played a pivotal role in the V1 DataGenerator and Data Augmentation Revamp on GPU. These contributions have significantly improved data handling and model visualization capabilities.
- @MMaas3: Made a notable first contribution by enhancing security features. This addition is crucial for the secure and reliable operation of Loghi-HTR.
Full Changelog: 1.3.12...2.0.0
1.3.12
Release Notes for Loghi-HTR Version 1.3.12
Date: 2024-03-22
Overview
Version 1.3.12 of Loghi-HTR introduces several enhancements and bug fixes to improve data loading, augmentation, model handling, and confidence score calculations.
Enhancements
- DataLoader Improvements: The DataLoader now skips lines that are empty after stripping, ensuring cleaner data processing.
- Random JPEG Augmentation Adjustments: The `--random_jpeg` augmentation has been adjusted to be less extreme, providing more realistic augmentations.
- Existing Model Channel Resetting: When using the `--existing_model` option, the channels are now always reset to ensure consistent model behavior.
Bug Fixes
- Confidence Score Clamping: Fixed a bug where confidence scores could exceed 1 due to precision errors. All confidence scores are now clamped to the range [0, 1]. A warning is logged whenever this clamping occurs.
- SavedModel Format Conversion: The SavedModel format is now converted and saved to the new `.keras` format in the `output/model.name` directory. Starting from May 2024, the legacy format will only be usable for inference.
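The clamping fix described above amounts to a small guard like the following; this is an illustrative sketch, not Loghi-HTR's actual code.

```python
import logging

logger = logging.getLogger(__name__)


def clamp_confidence(score):
    """Clamp a confidence score to [0, 1], warning when precision drift occurs."""
    if score < 0.0 or score > 1.0:
        # Floating-point rounding can push probabilities just outside [0, 1].
        logger.warning("Confidence %.17g outside [0, 1]; clamping.", score)
    return min(1.0, max(0.0, score))
```

The guard is cheap and makes downstream consumers (thresholds, sorting, reporting) safe against values like `1.0000000000000002`.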
Contributors
- @rvankoert: Responsible for the DataLoader improvements, random JPEG augmentation adjustments, and existing model channel resetting.
- @TimKoornstra: Responsible for the confidence score clamping and SavedModel format conversion.
Full Changelog: 1.3.8...1.3.12
1.3.8
Release Notes for Loghi-HTR Version 1.3.8
Date: 2024-01-19
Overview
Version 1.3.8 of Loghi-HTR introduces a range of new features and updates to enhance testing procedures, handle Out-of-Vocabulary (OOV) vocabulary more effectively, and improve data normalization and validation processes.
New Features
- Enabling Test List Usage: Added functionality to use a `test_list` for streamlined testing procedures.
- OOV Vocabulary Implementation:
  - Implemented handling for Out-of-Vocabulary (OOV) words.
  - Replaced [UNK] tokens with � (a less common character), enabling it to be counted as a single character in Character Error Rate (CER) calculations.
- Outputting Results to File: Validation and test results can now be outputted to a .csv file in the output folder.
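The [UNK]-to-� replacement above matters because CER is computed at the character level: a multi-character `[UNK]` token would otherwise count as several edits. A minimal sketch (helper names are illustrative, not Loghi-HTR's code):

```python
def levenshtein(a, b):
    """Character-level edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def cer(prediction, reference):
    """CER with each [UNK] token collapsed to one replacement character."""
    prediction = prediction.replace("[UNK]", "\N{REPLACEMENT CHARACTER}")
    return levenshtein(prediction, reference) / len(reference)
```

With the collapse, `"h[UNK]llo"` vs `"hello"` costs one substitution; without it, the five raw characters of `[UNK]` inflate the distance to 5.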
Enhancements
- Data Normalization and Validation Process Updates:
  - Separated validation and evaluation datasets for more precise control:
    - `validation_dataset`: Used with the `--do_validate` option, not normalized.
    - `evaluation_dataset`: Used for evaluation during training, undergoes normalization.
- Default OOV Token Settings: OOV tokens are enabled by default for testing and validation, but not for training and evaluation.
Bug Fixes
- General Stability and Performance Enhancements:
- Addressed various minor issues to improve overall system stability and performance.
Contributors
- @TimKoornstra: Responsible for the implementation of OOV vocabulary handling, test list functionality, and enhancements in data normalization and validation processes.
Full Changelog: 1.3.6...1.3.8
1.3.6
Release Notes for Loghi-HTR Version 1.3.6
Date: 2023-12-08
Overview
Version 1.3.6 of Loghi-HTR introduces new features for enhanced usability and performance, including Conda support and a Prometheus endpoint for queue monitoring. This release also contains important enhancements to the config file and a significant bug fix in the API.
New Features
- Conda Support with `environment.yml`: An `environment.yml` file has been added to facilitate easy setup of Conda environments. Users can initialize their environment with:

  conda env create -f environment.yml
  conda activate loghi-htr

- Prometheus Endpoint for Queue Monitoring: A Prometheus endpoint is now available in the API at the route `/prometheus` (GET method), enabling monitoring of the length of queues for better system performance insights.
Enhancements
- Config File Enhancements:
  - The `url-code` containing the GitHub link (https://github.com/knaw-huc/loghi) has been added to the config file.
  - The `model_name` has also been included in the config file for improved model management.
Bug Fixes
- API Concurrency Bug Fix: Addressed a concurrency issue in the API where simultaneous instances might try to create the same folder, leading to potential crashes. This fix ensures more stable API operations.
Contributors
- @rvankoert: Contributed to adding the GitHub link and `model_name` to the config file.
- @MMaas3: Responsible for creating the `environment.yml` and the Prometheus endpoint.
- @TimKoornstra: Fixed the concurrency bug in the API.
Full changelog: 1.3.0...1.3.6
1.3.0
Release Notes for Loghi-HTR Version 1.3.0
Date: 2023-11-14
Overview
In version 1.3.0, we've introduced significant improvements, including enhanced normalization features for CER and CER lower, a simplified confidence interval, and various API enhancements. Fixes have been made to the ResidualBlock implementation and freezing mechanism, and models now automatically save in the new .keras file format. Several changes have also been made to the API to improve usability and performance.
New Features
- Normalization for CER and CER Lower: Added functionality to normalize for Character Error Rate (CER) and its lower-case version using the `--normalization_file` argument. This update also displays the ground truth and prediction in a normalized form.
- Simplified Confidence Interval: Introduced a more straightforward method for calculating confidence intervals.
Enhancements
- Model File Format: Models are now automatically saved in the new `.keras` file format, while loading of both `.pb` and `.keras` files is still supported.
Bug Fixes
- ResidualBlock Implementation Fix: Addressed an issue where saving a model and then continuing training was not working properly.
- ResidualBlock Freezing Fix: Corrected the freezing of convolutional layers in the residual blocks with `--freeze_conv_layers`.
API Specific Changes
- Environment Variable Simplifications: Removed the necessity of the `LOGHI_INPUT_CHANNELS` and `LOGHI_CHARLIST_PATH` environment variables, which are now read directly from the model's `config.json` and `charlist.txt` respectively.
- Reduced OOM Errors: Enhanced batch processing to split recursively on Out-Of-Memory (OOM) errors, failing only the problematic image instead of the entire batch.
- Improved Image Padding: Adjusted image padding during processing for better alignment with training, marginally improving confidence and output.
- Dynamic Model Switching in API: Introduced the ability to switch models during an API call using the "model" field, though it's advised to use caution as it can slow down inference.
- Error Output for Failed Predictions: Text line images that fail during prediction are now outputted to `LOGHI_OUTPUT_PATH/group_id/identifier.error` with the specific error message.
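The recursive batch-splitting behavior described above can be sketched as follows; `run_inference`, the batch representation, and the OOM exception type are stand-ins, not Loghi-HTR's actual API.

```python
class OOMError(RuntimeError):
    """Stand-in for the framework's out-of-memory exception."""


def predict_with_split(batch, run_inference):
    """Run inference on a batch; on OOM, split in half and retry recursively.

    Only a single image that still OOMs on its own is marked failed, so the
    rest of the batch survives.
    """
    try:
        return run_inference(batch)
    except OOMError:
        if len(batch) == 1:
            # A lone image that cannot fit in memory fails individually.
            return [("FAILED", batch[0])]
        mid = len(batch) // 2
        return (predict_with_split(batch[:mid], run_inference)
                + predict_with_split(batch[mid:], run_inference))
```

Halving on failure means an oversized image costs O(log n) retries instead of discarding the whole batch.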
Contributors
- @Thelukepet: Major contributions to normalization for CER and CER lower, and the simplified confidence interval.
- @TimKoornstra: Significant contributions across various aspects including bug fixes, API enhancements, and overall improvements.
Full changelog: 1.2.10...1.3.0
1.2.10
Release Notes for Loghi-HTR Version 1.2.3 -> 1.2.10
Date: 2023-10-27
Overview
Version 1.2.10 brings a suite of bug fixes, API improvements, and functional enhancements. This update focuses on increased flexibility in model training, improved normalization processes, added image processing augmentations, and enhanced multi-GPU support in the API.
New Features
- Normalization File: Introduced the `--normalization_file` argument, replacing `--normalize`. Characters to be replaced can now be specified in a JSON file, where the key is the character to replace, and the value is the replacement character. This facilitates training models with a focus on reducing or changing uncommon characters to more common, similar ones. Example: `{ "ø": "o", "æ": "ae" }`
- Sauvola and Otsu Binarizations: Reintroduced these methods for image preprocessing.
- New Image Augmentations: Added blur (`--do_blur`) and invert (`--do_invert`) augmentations.
- Silent Training: Added `--training_verbosity_mode` with options `[auto, 0, 1, 2]` for controlling training output verbosity.
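Applying a normalization file like the `{ "ø": "o", "æ": "ae" }` example above amounts to a straightforward mapping pass over the ground-truth text; the helper names here are illustrative, not Loghi-HTR's code.

```python
import json


def load_normalization(path):
    """Load the character-replacement mapping from a JSON normalization file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def normalize_text(text, mapping):
    """Apply each replacement; values may be multi-character (e.g. "æ" -> "ae")."""
    for old, new in mapping.items():
        text = text.replace(old, new)
    return text
```

Because values may expand to several characters, the mapping changes string length, which is why normalization is applied before any CER computation rather than after.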
API Enhancements
- Multi-GPU Support: Enhanced to support multiple GPUs.
- Improved Logging: Added process IDs for easier debugging and enhanced overall logging.
- GPU Usage: Improved GPU usage handling, including the use of mixed_float16 policy if supported.
- Image Preparation: Moved all image preparation tasks to a dedicated worker.
- Batch Queue Size: Adjusted the prepared max queue size to reflect batches instead of individual images.
- API Response Codes: Updated to align more closely with Laypa and Tooling standards.
Bug Fixes
- Freezing Layers Capitalization: Fixed an issue where capitalization was ignored when setting layers to non-trainable.
- Charlist Inference: The `charlist` is no longer required when replacing the final layer; it can now be inferred from texts.
- Normalization in Multi-Character Spaces: When a normalization file is used, all multi-character spaces are now replaced with a single space.
- Config.json githash: Ensured that `githash` in the `config.json` file does not contain spaces.
- Existing Model Naming: Fixed an issue where setting a new name for an existing model using `--existing_model` and `--model_name` didn't work as expected.
- Random Shear in Augmentation: Corrected an issue where the random shear augmentation incorrectly applied elastic transform.
- Prediction Padding Value: Resolved a bug in prediction due to an incorrect padding value.
- Multiprocessing with Gunicorn/Flask: Moved multiprocessing outside of the gunicorn/flask apps.
Docker Image
The Docker image for version 1.2.10 can be obtained using the following command:
docker pull loghi/docker.htr:1.2.10
Contributors
- @MMaas3: Key contributions in reintroducing binarizations, adding extra augmentations, and fixing the githash issue in the `config.json` file.
- @rvankoert: Significant work in addressing the capitalization issue for freezing layers and the charlist inference.
- @TimKoornstra: Major contributions across various aspects including enhancements, bug fixes, and general improvements in the application.
Full Changelog: 1.2.3...1.2.10
1.2.3
Release Notes for Loghi-HTR Version 1.2.3
Date: 2023-10-02
Overview
This release introduces significant refactoring, added VGSL Spec functionality, unit tests, and upgraded dependencies. It also brings improvements to the documentation and cleanups in the file structure.
New Features
- VGSL Spec Implementation: Added VGSL spec-type model creation functionality.
- Unit Tests: Introduced unittests for the most-used classes and additional functionalities.
- Custom VGSL Model Names: Models can now be named using the `--model_name` argument.
- VGSL Spec String Conversion: Models can now be converted to a VGSL spec string using the `VGSLSpecGenerator.model_to_string(model)` function.
Enhancements
- Improved Documentation: Enhanced the `README.md` with a FAQ section, an API usage section, and more accurate and consistent information.
- GitHub Actions Integration: Set up GitHub Actions to run unittests on push/pull requests.
- Confidence Interval: Added a 95% confidence interval feature.
- Dependency Upgrades: Upgraded various dependencies and set Python requirement to > 3.8.
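For the 95% confidence interval feature above, the release notes do not specify the exact method; as one plausible sketch, a normal-approximation interval over per-line error rates looks like this (the helper name and approach are assumptions, not Loghi-HTR's documented implementation).

```python
import math


def confidence_interval_95(values):
    """Normal-approximation 95% CI for the mean of per-line error rates."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance (n - 1 denominator), then the standard error of the mean.
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    half = 1.96 * math.sqrt(var / n)
    return mean - half, mean + half
```

With enough lines, this gives a quick sense of how much a reported CER might move on a different sample of the same material.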
Deprecations and Removals
- Old Model Creation Method Deprecated: The previous method of creating models is no longer supported. Models must now be created either from the model library or using a VGSL spec string. This change facilitates rapid and structured prototyping among other benefits. For details on transitioning from the old to the new method, see the "Upgrading" section of these release notes.
- File Cleanup: Removed many unused files and arguments.
- Requirements.txt Moved: Moved `requirements.txt` out of the `src` folder.
Upgrading
- New models must now be created with either a VGSL spec string, or "model" + the version number corresponding to the "old" way of creating models. For example, `--model new17` becomes `--model model17`.
- Old models can be converted to VGSL spec by running the following command:

  python3 src/vgsl_spec_generator.py --model_dir /path/to/model/dir/

  This command will be helpful for users who have old models and want to convert them to the new VGSL spec format, making the upgrading process smoother.
Docker Image
The Docker image for version 1.2.3 can be obtained using the following command:
docker pull loghi/docker.htr:1.2.3
Contributors
- @TimKoornstra: Major contributions including refactoring, VGSL Spec implementation, documentation improvements, and more.
- @Thelukepet: Significant contributions including writing most of the VGSL spec string, updating the corresponding README documentation, file cleanup, and implementing the 95% confidence interval feature.