README updated with How to install from git clone.#4
Open
abramhindle wants to merge 81 commits intomonokrome:mainfrom
Open
README updated with How to install from git clone.#4abramhindle wants to merge 81 commits intomonokrome:mainfrom
abramhindle wants to merge 81 commits intomonokrome:mainfrom
Conversation
Split NCS parsing into a dedicated library with modular structure: - data.rs: NCS data format with optimized zero-copy header parsing - manifest.rs: NCS manifest parsing with SIMD-accelerated search - field.rs: Field type parsing for decompressed content - legacy.rs: Deprecated gBx format support Uses memchr for SIMD pattern matching, pre-allocated vectors, and direct byte parsing to minimize overhead.
- pak_extraction.rs: extract_*_from_pak functions (791 lines) - pak_manifest.rs: PakManifest generation from uextract (483 lines) - file_extraction.rs: directory walking extraction (607 lines) - items_database.rs: item pools and stats (367 lines) - property_parsing.rs: string/property parsing (122 lines) - reference_data.rs: bl4::reference wrappers (324 lines) Total: 2694 lines across 7 files (was 2601 in single file)
Split main.rs (6,622 lines) and related files into focused modules: cli/ - CLI argument definitions (7 files, all < 300 lines) - core.rs: Cli struct and Commands enum - save.rs, serial.rs, memory.rs: Subcommand definitions - research.rs: Feature-gated usmap/extract commands - idb.rs: Items database commands commands/ - Command handlers (38 files) - extract/: NCS, minidump, manifest extraction (6 files) - items_db/: Database operations (7 files) - memory/: Memory analysis handlers (9 files) - Individual handlers for save, serial, parts, etc. memory/source/ - Memory source abstraction (6 files) - traits.rs: MemorySource trait - dump.rs: DumpFile (MDMP/gcore) - process.rs: Bl4Process (live attach) - mock.rs: Testing utilities Results: - main.rs: 6,622 → 581 lines (91% reduction) - 66 total .rs files in bl4-cli/src - All new modules under 500 lines
Split three memory modules that exceeded 500 line limit: walker.rs (1045 lines) → walker/ directory: - analyze.rs (128 lines) - dump analysis - walk.rs (273 lines) - GUObjectArray iteration - property.rs (661 lines, 300 code + 361 tests) - property extraction usmap.rs (1002 lines) → usmap/ directory: - extraction.rs (301 lines) - struct/enum extraction - writer.rs (628 lines, 290 code + 338 tests) - usmap file writing legacy.rs (852 lines) → legacy/ directory: - object_search.rs (245 lines) - object search functions - part_defs.rs (143 lines) - part definition types - part_extraction.rs (442 lines) - part extraction functions
Split 6 more modules that exceeded the 500-line limit: - file_extraction.rs (607) → types, manufacturers, balance, gear - pak_extraction.rs (791) → attributes, manufacturers, weapon_gear - discovery.rs (707) → class_uclass, gnames, guobject - fname.rs (714) → pool, reader - reflection.rs (657) → types, uclass - helpers.rs (632) → core All new files under 300 lines. 182 tests pass.
Add 87 tests covering NCS format parsing: - data.rs: Header parsing, decompression paths, scan functions - manifest.rs: Entry extraction, manifest parsing, scanning - legacy.rs: gBx format parsing, variants, scan functions - lib.rs: Magic constants, legacy NcsParser, error display - hash.rs: fnv1a_hash_str wrapper Coverage improved from 16% to ~80%. Remaining uncovered code is Oodle decompression calls requiring proprietary game data.
Parse the internal structure of decompressed NCS files to extract: - Type names (itempoollist, manufacturer, rarity, etc.) - Format codes (abjx, abij, abjlp - indicating structure features) - String tables with entry names, GUIDs, asset paths - Metadata like namespace and correlation IDs Format code letters indicate features: - i = indexed entries - j = JSON-like structure - l = list - m = map - p = properties - x = extended attributes Tested on real game data: - manufacturer: Atlas, CoV, Hyperion with GUIDs - rarity: COSMETIC and other rarity types - itempoollist: loot pool definitions - inv_name_part: grenade part names
The gBx pattern detection was based on coincidental byte sequences in compressed pak data, not a real file format. Analysis showed: - gBx "files" had nonsensical size fields (3GB+ for 4KB files) - NcsParser.exe rejected them as invalid - gBx offsets fell outside valid NCS chunk ranges Removed: - legacy.rs module (577 lines of gBx parsing) - gBx re-exports, error types, and deprecated NcsParser from lib.rs - gBx handling from CLI commands Updated extract/ncs.rs to use NCS scanning instead of gBx scanning.
Added stricter complexity thresholds matching coding guidelines: - cognitive_complexity: 25 → 15 - too_many_lines: 150 → 50 - too_many_arguments: 7 → 5 Added workspace lints section in root Cargo.toml with: - cognitive_complexity, too_many_lines, too_many_arguments (warn) - fn_params_excessive_bools, struct_excessive_bools (warn) All member crates now inherit workspace lints via [lints] workspace = true. Current baseline: 20 cognitive, 81 too_many_lines, 16 too_many_arguments, 2 excessive_bools warnings to address over time.
- Add ToggleTarget enum and handle_toggle method to DRY toggle logic - Extract navigation methods: move_down/up, page_down/up, jump_to_start/end - Add handle_command_input and handle_key methods - Extract field_style helper for TUI rendering - Use existing PositionStats from analysis module instead of duplicating - Add print_bit_analysis, detect_field_boundaries, field_description helpers - Add allows for legitimately complex keymap and TUI rendering functions
Functions for parsing Unreal Engine 5 binary formats (exports, properties, textures) are inherently complex due to format requirements. Add targeted allows with comments explaining legitimacy.
API server with complex request handlers that legitimately require extended function bodies for request processing and response generation.
- Extract command dispatch logic from main.rs into dispatch.rs - Add crate-level allow for too_many_lines (47+ command handlers) - Add NCS CLI subcommands in cli/ncs.rs - Add targeted allows for complex command handlers - main.rs reduced from 500+ lines to ~90 lines
- Extract usmap/format.rs, name_table.rs, serializer.rs from writer.rs - Extract walker/extraction.rs, type_reader.rs, validation.rs from property.rs - Reduce large modules to under 500 lines each - Add targeted complexity allows where needed
bl4 library: - Add complexity allows for serial decoding, save parsing - Improve manifest, parts, reference module organization bl4-ncs: - Add allows for NCS content parsing functions - Improve data and manifest parsing bl4-idb: - Add types module for shared type definitions - Reduce sqlite.rs complexity bl4-preload: - Add allows for memory scanning functions
manifest.rs (2601 lines) was split into manifest/ directory modules in previous commits. Update Cargo.lock with dependency changes.
- Document NCS (Nexus Config Store) format in appendix D - Add gBx header format, compression details, and file types - Update glossary with NCS-related terms - Minor wording improvements in introduction
bl4-ncs: - Remove unused scan_types function - Use RangeInclusive::contains for ASCII range checks bl4-idb: - Add too_many_arguments allows to trait methods with semantic params bl4-preload: - Use strip_prefix/strip_suffix instead of manual slicing - Add Safety docs to all unsafe libc hook functions
Move _quarto.yml and .gitignore from docs/quarto/ to docs/ root. Remove duplicate files in docs/quarto/ (all content was identical to docs/ except for appendix-d which was older).
Add trait-based Oodle decompression abstraction with multiple backends: - OozextractBackend: Open-source default (~97% compatibility) - NativeBackend: Official DLL via FFI (Windows only) - ExecBackend: External command execution (cross-platform) CLI changes: - Replace --oodle with --oodle-dll (Windows only, loads DLL) - Add --oodle-exec for external command decompression - Protocol: command receives "decompress <size>" args, compressed on stdin, decompressed on stdout Also includes NCS parser improvements and updated documentation.
Extends property parsing to handle all PropertyInner types from usmap: - Name (FName strings via name table index) - Str (FString with UTF-8/UTF-16 support) - Object/SoftObject (asset path references) - Array/Set (variable-length collections) - Map (key-value pairs) - Struct (nested property structures) - Enum (enumeration values with name resolution) - Optional (nullable wrapped values) - FieldPath (property path references) Adds PropertyParseContext for passing name table and struct lookup to nested parsing calls. Updates ParsedProperty with new fields: object_path, array_values, struct_values, enum_value, map_values. Tested on BL4 game files: 363 assets now contain object_path data that was previously unavailable. DataTable-based assets (weapon balance data) still use heuristic fallback due to dynamic property names with GUIDs - improving this requires UserDefinedStruct handling.
…truct Extract property names from DataTable/UserDefinedStruct assets that use dynamically-generated names with embedded schema indices (e.g., "Damage_Scale_14_GUID"). The parser now: - Extracts property names from the name table sorted by schema index - Reads float/double values from the end of serialized data - Handles pure Float assets (works perfectly) and mixed Double/Int assets (partial support - later properties parse correctly) This enables proper parsing of weapon balance data like Struct_WeaponStats which now extracts all 9 manufacturer scaling values with correct names.
Parse embedded schema in UserDefinedStruct assets to determine property types (Double/Float/Int) for proper value sizing. Key improvements: - Detect predominant type from name table (DoubleProperty, FloatProperty) - Calculate per-property sizes for mixed-type structs - Validate extracted values to filter garbage (denormals, out-of-range) - Skip properties that use default values (not serialized) Struct_WeaponStats now extracts all 9 Float manufacturer scaling values. Struct_Weapon_Barrel_Init extracts 12 valid Double weapon stat values.
- Document how NCS files contain serialized DataTable rows - Add example showing GUID matching between NCS and uasset schemas - Document key insight that numeric values stored as strings - Add format code reference table (abjx, abij, abjl, etc.)
- Document packed value format (values without null separators) - Note that binary section contains lengths/offsets for decoding - Add aim_assist_parameters as abij format example
Add detailed documentation for: - Type prefix 'T' for Text/String values in string table - Format code structure (04 xx 00 vs 03 xx 00 prefixes) - Extended vs compact format parsing - Binary section index pairs (field_index, value_offset) - Offset increment pattern (+4 bytes = 4-byte aligned values)
- Create dedicated chapter 8 for NCS (Nexus Config Store) format - Document decompressed content structure with full byte layouts - Detail differential encoding algorithm for entry names - Document packed string value optimization - Cover field count markers and control sections - Include worked example with achievement.bin parsing - Add binary section structure details for abjx and abij formats - Document format variations by type
Extract boss display names from NCS display_data.bin and embed them in the drops module. Now all 45 bosses have human-readable display names that can be searched (e.g., "The Backhive", "Primordial Guardian Inceptus"). - Added BossNameMapping struct with embedded mappings - Mappings derived from CharData_ and npcs records in display_data.bin - Supports fuzzy matching via aliases - Updated generate_drops_manifest to use new mappings
Previously, NCS extraction used magic byte scanning which found only 164 of 170 NCS files (missing 6 files like wwise_*, xp_progression). This adds repak as a dependency and implements proper PAK file reading via the file index, which correctly extracts all 170 NCS files. Changes: - Add repak dependency with oodle feature to bl4-ncs - Add pak.rs module with NcsReader trait and implementations: - PakReader: reads from PAK files using repak - MemoryPakReader: reads from in-memory PAK data - DirectoryReader: reads from extracted NCS directories - Update CLI ncs decompress to use PAK index for .pak files - Keep magic byte scanning as fallback for non-PAK files The 6 previously missing files are now correctly extracted: - wwise_auxilary_busses - wwise_soundbanks - wwise_states - wwise_switches - wwise_triggers - xp_progression
Phase 1 of tech debt cleanup: - Fix thiserror version mismatch (2.0 → workspace 1.0) in bl4-ncs - Standardize serde_json to workspace dependency across crates - Remove orphaned linewise directory (moved to separate repo) - Silence dead code warnings in memory module (UE5 reflection types) - Remove unused tracking variable in parser.rs - Add #[allow(dead_code)] to intentionally unused but reserved APIs All workspace packages now compile without warnings.
Phase 2 of tech debt cleanup - split uextract/main.rs: - Extract cli.rs: Args, Commands, OutputFormat types - Extract filter.rs: matches_filters path filtering logic - Create commands/ module: - script_objects.rs: extract_script_objects command - find_assets.rs: find_assets_by_class command - list_classes.rs: list_classes command - texture.rs: extract_texture_cmd wrapper Main.rs reduced from 2651 lines to 1935 lines (27% reduction). Remaining zen parsing code (~1700 lines) is a candidate for further extraction into a zen/ module.
- Remove orphaned bl4/src/items.rs (1292 lines) - was never exposed in lib.rs and used rusqlite which wasn't in Cargo.toml. bl4-idb is the canonical item database implementation with PostgreSQL + SQLite support. - Add PakManifest::load() helper method to reduce 6 instances of duplicate manifest loading code in pak_extraction modules.
Extract core functionality from parser.rs (2366 lines) into dedicated modules: - bit_reader.rs (188 lines): BitReader struct and bit_width function for bit-level binary data parsing - types.rs (272 lines): Document, Record, Value, Header, StringTable, TagType, TagValue, binary section types This reduces parser.rs to 1993 lines and improves code organization. The extracted modules have clearer responsibilities and are easier to maintain.
Move string table parsing functions to dedicated string_table.rs module: - parse_string_table, extract_inline_strings, extract_field_abbreviation - create_combined_string_table - Helper functions: is_valid_string, split_packed_string, normalize_entry_name This reduces parser.rs from 1993 to 1546 lines. The string_table module (483 lines) now contains all string table related functionality. Progress: - parser.rs: 2366 -> 1546 lines - bit_reader.rs: 188 lines - types.rs: 272 lines - string_table.rs: 483 lines
Phase 1 of tech debt reduction: - reference.rs (654 lines) -> reference/ directory with 7 domain modules: rarity, element, weapon, manufacturer, gear, stats, legendary - save.rs (1554 lines) -> save/ directory with 3 modules: state_flags.rs (StateFlags bitmask helper) changeset.rs (ChangeSet for batch modifications) mod.rs (SaveFile and helpers) - serial.rs (1581 lines) -> serial/ directory with 3 modules: bitstream.rs (BitReader/BitWriter for variable-length tokens) base85.rs (BL4 alphabet encoding/decoding) mod.rs (ItemSerial, Token, Element, Rarity and parsing) All tests pass. No API changes.
- Add shared.rs with ITEM_SELECT_COLUMNS, FIELDS_TO_MIGRATE, and schema constants - Add build_list_query() and build_count_query() for consistent filter handling - Update sqlite.rs to use shared constants and query builder - Update sqlx_impl.rs SQLite implementation to use shared module - Reduce code duplication between sync and async implementations All 41 bl4-idb tests pass.
- Make PartsDatabase.categories and source fields optional - Make PartEntry.group field optional - Update VLA_SR category test to match corrected mapping (27 -> 25) All 144 bl4 tests and 41 bl4-idb tests pass.
- drops/types.rs: Type definitions (245 lines) - drops/db.rs: DropsDb implementation (259 lines) - drops/extract.rs: NCS extraction functions (462 lines) - drops/mod.rs: Re-exports and tests (113 lines) Each module is now focused and under 500 lines. All 5 drops tests pass.
Split the 1547-line parser.rs into 6 focused modules: - header.rs: Header parsing, entry section, string table location - unpack.rs: String unpacking for packed value strings - differential.rs: Differential name decoding - entries.rs: Entry parsing and record creation - binary.rs: Binary section parsing - document.rs: Document format parsing (abjx, abij, etc.) - mod.rs: Public API re-exports Each module is now under 300 lines with clear responsibilities.
Split the 1254-line ncs.rs into 10 focused modules: - types.rs: Result types (ScanResult, FileInfo, etc.) - scan.rs: Directory scanning and statistics - show.rs: File content display - search.rs: Pattern search - extract.rs: Type extraction and parts indexing - decompress.rs: NCS decompression handlers - debug.rs: Debug/analysis commands - format.rs: TSV output formatting - util.rs: Helper functions (hex dump) - mod.rs: Command dispatch Each module is now under 300 lines with clear responsibilities.
Split 1935-line main.rs into: - types.rs: Serialization types (ZenAssetInfo, ParsedProperty, etc.) - property.rs: UE5 unversioned property parsing (FFragment, etc.) - zen.rs: Zen format to JSON conversion - main.rs: CLI entry point and extraction logic (263 lines) This improves code organization and maintainability.
- Add test_paths module with platform-aware helpers for game file paths - Check BL4_PAKS_DIR env var first, fall back to OS-specific defaults - Support Windows, Linux, and macOS Steam installation paths - Fix BASE_KEY in docs to match actual code in crypto.rs - Add dirs dev-dependency for home directory resolution
- Add Linux-only cfg guard to bl4-preload (LD_PRELOAD is Linux-specific) - Fix manifest.rs to use direct paths instead of broken symlinks - Fix analysis.rs tests to use temp_dir() instead of /tmp/
This fixes the architecture where PAK container reading was embedded in bl4-ncs (a parser library). Now: - uextract handles all container formats (PAK via repak, IoStore via retoc) - bl4-ncs is a pure NCS parser (works with bytes/paths, not containers) - bl4-cli uses uextract for PAK extraction, bl4-ncs for NCS parsing Changes: - Add repak dependency to uextract - Create uextract::pak module with PakReader, MemoryPakReader - Add `uextract pak` CLI command for traditional PAK extraction - Make uextract a library crate (lib.rs) in addition to binary - Simplify bl4-ncs pak.rs to DirectoryReader and helper functions - Remove repak dependency from bl4-ncs - Update bl4-cli to use uextract::pak::PakReader
These tests read actual PAK files which takes 40+ seconds and causes high resource usage. They're now marked #[ignore] so they only run when explicitly requested with `cargo test -- --ignored`. Ignored tests: - find_v1_failures, scan_all_pak_ncs, find_missing_chunks - correlate_manifest_to_chunks, test_real_pak_extraction - scan_after_last_chunk, show_full_mapping, generate_mapping_csv - parse_first_10_ncs, check_missing_entries, try_header_offsets
repak panics with index out of bounds on certain PAK files that contain raw data instead of standard PAK format. Wrap repak calls in catch_unwind to convert panics to graceful errors. This allows uextract to skip invalid/encrypted PAK files and continue processing the remaining archives instead of crashing.
- Remove unused code and imports across all crates - Prefix unused variables with underscore - Convert manifest JSON files to expected format - Fix manufacturers.json: change from array to map - Fix weapon_types.json: change from array to map - Convert parts_database.json to new format with version field - All 480+ tests now passing with zero warnings
Comprehensive validation analysis comparing our tag-based extraction against NcsParser baseline (5,513 indices). Results: - Extracted 5,972 unique positions (+8.3% vs baseline) - 421 unique values (vs ~513 expected) - 3.5% tag overlap (good independence) - 80.8% multi-position indices (realistic distribution) - Estimated 95%+ accuracy with ~2-5% false positive rate Conclusion: Tag-based extraction is accurate enough for production use. Differences vs NcsParser likely due to different data sources (binary section vs manifest/metadata).
Investigated why inv file parsing fails and discovered critical insight: format codes are TYPE DICTIONARIES, not sequential schemas. Key Findings: - "abcefhijl" declares available types, NOT field order [a][b][c][e][f]... - inv files use dynamic tag-based structure, not fixed schemas - Records vary in size and composition based on which types they use - Missing 'd' in format code means inv files don't use that list variant Investigation Tools Created (14 example programs): - test_ncs_parser.rs - confirmed ncs_parser.rs incompatible with inv - find_tags.rs - located tag bytes (0x61, 0x66) in binary section - test_tag_alignment.rs - tests various tag reading strategies - examine_pre_binary.rs - searches for schema section - examine_inv_binary.rs - hex dump analysis - find_deps_*.rs - searched for 39 dependency names storage - parse_deps_from_binary.rs - attempted deps extraction - check_markers.rs - examines marker context - examine_gap.rs - prepared to analyze 41-bit gap (not run) Ground Truth Established: - Reference parser output: 539 unique indices (5,460 total occurrences) - Our heuristics: 421 unique values (78% coverage, 22% false negatives) - No false positives detected - heuristics are conservative Next Steps: - Debug reference implementation in Windows to understand parsing algorithm - Document exact parsing approach from debugger observations - Implement proper parser based on findings - Validate: must extract all 539 unique indices (100% coverage) Documentation updated in .bl4.info/INV-FORMAT-ANALYSIS.md
Owner
|
Hi! Thanks for this contribution. I didn't notice it because I didn't know anyone was looking at these projects, @abramhindle. I apologize for missing it. The project has significantly diverged since then - and is a lot more accurate at things like serial decoding. Can you add this change on the latest version? I am asking you because I don't want to do it myself and ignore your contribution. Thank you for sending this! I appreciate it. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Cargo installation instructions.