README updated with How to install from git clone.#4

Open
abramhindle wants to merge 81 commits into monokrome:main from abramhindle:readmeupdate

Conversation

@abramhindle

Cargo installation instructions.

Split NCS parsing into a dedicated library with modular structure:
- data.rs: NCS data format with optimized zero-copy header parsing
- manifest.rs: NCS manifest parsing with SIMD-accelerated search
- field.rs: Field type parsing for decompressed content
- legacy.rs: Deprecated gBx format support

Uses memchr for SIMD pattern matching, pre-allocated vectors,
and direct byte parsing to minimize overhead.
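As an illustration of the scan that memchr accelerates, here is a stdlib-only sketch of finding every occurrence of a magic byte pattern in a buffer (the real library uses the `memchr` crate's SIMD searcher; `find_magic` and the pattern below are hypothetical):

```rust
/// Naive stand-in for a memchr-accelerated magic scan: return the offset
/// of every non-overlapping occurrence of `needle` in `haystack`.
fn find_magic(haystack: &[u8], needle: &[u8]) -> Vec<usize> {
    let mut hits = Vec::new();
    let mut i = 0;
    while i + needle.len() <= haystack.len() {
        if &haystack[i..i + needle.len()] == needle {
            hits.push(i);
            i += needle.len(); // skip past the match
        } else {
            i += 1;
        }
    }
    hits
}
```

The memchr crate replaces the inner comparison loop with vectorized single-byte search on the pattern's first byte, which is where most of the speedup comes from.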
- pak_extraction.rs: extract_*_from_pak functions (791 lines)
- pak_manifest.rs: PakManifest generation from uextract (483 lines)
- file_extraction.rs: directory walking extraction (607 lines)
- items_database.rs: item pools and stats (367 lines)
- property_parsing.rs: string/property parsing (122 lines)
- reference_data.rs: bl4::reference wrappers (324 lines)

Total: 2694 lines across 7 files (was 2601 in single file)
Split main.rs (6,622 lines) and related files into focused modules:

cli/ - CLI argument definitions (7 files, all < 300 lines)
  - core.rs: Cli struct and Commands enum
  - save.rs, serial.rs, memory.rs: Subcommand definitions
  - research.rs: Feature-gated usmap/extract commands
  - idb.rs: Items database commands

commands/ - Command handlers (38 files)
  - extract/: NCS, minidump, manifest extraction (6 files)
  - items_db/: Database operations (7 files)
  - memory/: Memory analysis handlers (9 files)
  - Individual handlers for save, serial, parts, etc.

memory/source/ - Memory source abstraction (6 files)
  - traits.rs: MemorySource trait
  - dump.rs: DumpFile (MDMP/gcore)
  - process.rs: Bl4Process (live attach)
  - mock.rs: Testing utilities
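A minimal sketch of what a `MemorySource`-style abstraction with a mock backend can look like; the trait name comes from the commit, but the method names and mock shape here are assumptions:

```rust
/// Hypothetical sketch: both a parsed dump file and a live process can be
/// modeled as something that serves reads at virtual addresses.
trait MemorySource {
    fn read_bytes(&self, addr: u64, len: usize) -> Option<Vec<u8>>;

    /// Default helper built on top of raw reads (little-endian assumed).
    fn read_u64(&self, addr: u64) -> Option<u64> {
        let b = self.read_bytes(addr, 8)?;
        Some(u64::from_le_bytes(b.try_into().ok()?))
    }
}

/// Testing backend: a byte buffer pretending to live at `base`.
struct MockSource {
    base: u64,
    data: Vec<u8>,
}

impl MemorySource for MockSource {
    fn read_bytes(&self, addr: u64, len: usize) -> Option<Vec<u8>> {
        let off = addr.checked_sub(self.base)? as usize;
        self.data.get(off..off + len).map(<[u8]>::to_vec)
    }
}
```

With this shape, walker code is written once against the trait and exercised in tests against `MockSource` instead of a live process or a multi-gigabyte dump.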

Results:
- main.rs: 6,622 → 581 lines (91% reduction)
- 66 total .rs files in bl4-cli/src
- All new modules under 500 lines
Split three memory modules that exceeded 500 line limit:

walker.rs (1045 lines) → walker/ directory:
- analyze.rs (128 lines) - dump analysis
- walk.rs (273 lines) - GUObjectArray iteration
- property.rs (661 lines, 300 code + 361 tests) - property extraction

usmap.rs (1002 lines) → usmap/ directory:
- extraction.rs (301 lines) - struct/enum extraction
- writer.rs (628 lines, 290 code + 338 tests) - usmap file writing

legacy.rs (852 lines) → legacy/ directory:
- object_search.rs (245 lines) - object search functions
- part_defs.rs (143 lines) - part definition types
- part_extraction.rs (442 lines) - part extraction functions
Split 6 more modules that exceeded the 500-line limit:

- file_extraction.rs (607) → types, manufacturers, balance, gear
- pak_extraction.rs (791) → attributes, manufacturers, weapon_gear
- discovery.rs (707) → class_uclass, gnames, guobject
- fname.rs (714) → pool, reader
- reflection.rs (657) → types, uclass
- helpers.rs (632) → core

All new files under 300 lines. 182 tests pass.
Add 87 tests covering NCS format parsing:
- data.rs: Header parsing, decompression paths, scan functions
- manifest.rs: Entry extraction, manifest parsing, scanning
- legacy.rs: gBx format parsing, variants, scan functions
- lib.rs: Magic constants, legacy NcsParser, error display
- hash.rs: fnv1a_hash_str wrapper

Coverage improved from 16% to ~80%. Remaining uncovered code
is Oodle decompression calls requiring proprietary game data.
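The `fnv1a_hash_str` wrapper suggests the standard FNV-1a hash; a sketch of the textbook 32-bit variant follows (the crate's actual bit width and seed are assumptions):

```rust
/// 32-bit FNV-1a over a string's UTF-8 bytes.
fn fnv1a_hash_str(s: &str) -> u32 {
    let mut h: u32 = 0x811C_9DC5; // FNV-1a 32-bit offset basis
    for &b in s.as_bytes() {
        h ^= u32::from(b);
        h = h.wrapping_mul(0x0100_0193); // FNV prime
    }
    h
}
```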
Parse the internal structure of decompressed NCS files to extract:
- Type names (itempoollist, manufacturer, rarity, etc.)
- Format codes (abjx, abij, abjlp - indicating structure features)
- String tables with entry names, GUIDs, asset paths
- Metadata like namespace and correlation IDs

Format code letters indicate features:
- i = indexed entries
- j = JSON-like structure
- l = list
- m = map
- p = properties
- x = extended attributes
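The letter-to-feature mapping above can be sketched as a small decoder (`format_features` is a hypothetical helper for illustration, not the crate's API):

```rust
/// Decode an NCS format code (e.g. "abjx") into the features its letters
/// declare, per the mapping documented above. Unknown letters are skipped.
fn format_features(code: &str) -> Vec<&'static str> {
    code.chars()
        .filter_map(|c| match c {
            'i' => Some("indexed entries"),
            'j' => Some("JSON-like structure"),
            'l' => Some("list"),
            'm' => Some("map"),
            'p' => Some("properties"),
            'x' => Some("extended attributes"),
            _ => None,
        })
        .collect()
}
```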

Tested on real game data:
- manufacturer: Atlas, CoV, Hyperion with GUIDs
- rarity: COSMETIC and other rarity types
- itempoollist: loot pool definitions
- inv_name_part: grenade part names
The gBx pattern detection was based on coincidental byte sequences
in compressed pak data, not a real file format. Analysis showed:
- gBx "files" had nonsensical size fields (3GB+ for 4KB files)
- NcsParser.exe rejected them as invalid
- gBx offsets fell outside valid NCS chunk ranges

Removed:
- legacy.rs module (577 lines of gBx parsing)
- gBx re-exports, error types, and deprecated NcsParser from lib.rs
- gBx handling from CLI commands

Updated extract/ncs.rs to use NCS scanning instead of gBx scanning.
Added stricter complexity thresholds matching coding guidelines:
- cognitive_complexity: 25 → 15
- too_many_lines: 150 → 50
- too_many_arguments: 7 → 5

Added workspace lints section in root Cargo.toml with:
- cognitive_complexity, too_many_lines, too_many_arguments (warn)
- fn_params_excessive_bools, struct_excessive_bools (warn)

All member crates now inherit workspace lints via [lints] workspace = true.

Current baseline: 20 cognitive, 81 too_many_lines, 16 too_many_arguments,
2 excessive_bools warnings to address over time.
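The workspace lint inheritance described above would look roughly like this in Cargo.toml (lint names per the commit message; the exact levels and layout are assumptions):

```toml
# Root Cargo.toml: one shared lint table for every member crate.
[workspace.lints.clippy]
cognitive_complexity = "warn"
too_many_lines = "warn"
too_many_arguments = "warn"
fn_params_excessive_bools = "warn"
struct_excessive_bools = "warn"

# Each member crate's Cargo.toml opts in with:
[lints]
workspace = true
```

Note that the numeric thresholds (15 lines of cognitive complexity, 50 lines per function, 5 arguments) live in clippy's own configuration rather than in the `[lints]` table.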
- Add ToggleTarget enum and handle_toggle method to DRY toggle logic
- Extract navigation methods: move_down/up, page_down/up, jump_to_start/end
- Add handle_command_input and handle_key methods
- Extract field_style helper for TUI rendering
- Use existing PositionStats from analysis module instead of duplicating
- Add print_bit_analysis, detect_field_boundaries, field_description helpers
- Add allows for legitimately complex keymap and TUI rendering functions
Functions for parsing Unreal Engine 5 binary formats (exports,
properties, textures) are inherently complex due to format requirements.
Add targeted allows with comments explaining legitimacy.
API server with complex request handlers that legitimately require
extended function bodies for request processing and response generation.
- Extract command dispatch logic from main.rs into dispatch.rs
- Add crate-level allow for too_many_lines (47+ command handlers)
- Add NCS CLI subcommands in cli/ncs.rs
- Add targeted allows for complex command handlers
- main.rs reduced from 500+ lines to ~90 lines
- Extract usmap/format.rs, name_table.rs, serializer.rs from writer.rs
- Extract walker/extraction.rs, type_reader.rs, validation.rs from property.rs
- Reduce large modules to under 500 lines each
- Add targeted complexity allows where needed
bl4 library:
- Add complexity allows for serial decoding, save parsing
- Improve manifest, parts, reference module organization

bl4-ncs:
- Add allows for NCS content parsing functions
- Improve data and manifest parsing

bl4-idb:
- Add types module for shared type definitions
- Reduce sqlite.rs complexity

bl4-preload:
- Add allows for memory scanning functions
manifest.rs (2601 lines) was split into manifest/ directory modules
in previous commits. Update Cargo.lock with dependency changes.
- Document NCS (Nexus Config Store) format in appendix D
- Add gBx header format, compression details, and file types
- Update glossary with NCS-related terms
- Minor wording improvements in introduction
bl4-ncs:
- Remove unused scan_types function
- Use RangeInclusive::contains for ASCII range checks

bl4-idb:
- Add too_many_arguments allows to trait methods with semantic params

bl4-preload:
- Use strip_prefix/strip_suffix instead of manual slicing
- Add Safety docs to all unsafe libc hook functions
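The two idioms mentioned for bl4-ncs and bl4-preload, sketched with hypothetical helpers (the concrete ranges and prefixes in the real code may differ):

```rust
/// RangeInclusive::contains instead of `b >= 0x20 && b <= 0x7e`.
fn is_printable_ascii(b: u8) -> bool {
    (0x20..=0x7e).contains(&b)
}

/// strip_prefix/strip_suffix instead of manual index math like
/// `&path[3..path.len() - 3]`, which panics on short inputs.
fn library_name(path: &str) -> Option<&str> {
    path.strip_prefix("lib")?.strip_suffix(".so")
}
```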
Move _quarto.yml and .gitignore from docs/quarto/ to docs/ root.
Remove duplicate files in docs/quarto/ (all content was identical
to docs/ except for appendix-d which was older).
Add trait-based Oodle decompression abstraction with multiple backends:
- OozextractBackend: Open-source default (~97% compatibility)
- NativeBackend: Official DLL via FFI (Windows only)
- ExecBackend: External command execution (cross-platform)

CLI changes:
- Replace --oodle with --oodle-dll (Windows only, loads DLL)
- Add --oodle-exec for external command decompression
- Protocol: command receives "decompress <size>" args, compressed on
  stdin, decompressed on stdout
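A sketch of the --oodle-exec protocol as stated: spawn the command with `decompress <size>` arguments, write the compressed bytes to stdin, and read the decompressed bytes from stdout (`exec_decompress` is illustrative, not the CLI's actual implementation; error handling is minimal):

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Run an external decompressor per the protocol above. The caller supplies
/// the base Command; this appends the "decompress <size>" arguments.
fn exec_decompress(
    mut cmd: Command,
    decompressed_size: usize,
    compressed: &[u8],
) -> std::io::Result<Vec<u8>> {
    let mut child = cmd
        .arg("decompress")
        .arg(decompressed_size.to_string())
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    // Write the compressed payload, then close stdin so the child can finish.
    child.stdin.take().expect("stdin piped").write_all(compressed)?;
    let output = child.wait_with_output()?;
    Ok(output.stdout)
}
```

For a quick smoke test, `sh -c 'cat'` works as an identity "decompressor": the extra protocol arguments land in `$0`/`$1` of the script and are ignored, and stdin is echoed to stdout.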

Also includes NCS parser improvements and updated documentation.
Extends property parsing to handle all PropertyInner types from usmap:

- Name (FName strings via name table index)
- Str (FString with UTF-8/UTF-16 support)
- Object/SoftObject (asset path references)
- Array/Set (variable-length collections)
- Map (key-value pairs)
- Struct (nested property structures)
- Enum (enumeration values with name resolution)
- Optional (nullable wrapped values)
- FieldPath (property path references)

Adds PropertyParseContext for passing name table and struct lookup
to nested parsing calls. Updates ParsedProperty with new fields:
object_path, array_values, struct_values, enum_value, map_values.

Tested on BL4 game files: 363 assets now contain object_path data
that was previously unavailable. DataTable-based assets (weapon
balance data) still use heuristic fallback due to dynamic property
names with GUIDs - improving this requires UserDefinedStruct handling.
…truct

Extract property names from DataTable/UserDefinedStruct assets that use
dynamically-generated names with embedded schema indices (e.g.,
"Damage_Scale_14_GUID"). The parser now:

- Extracts property names from the name table sorted by schema index
- Reads float/double values from the end of serialized data
- Handles pure Float assets (works perfectly) and mixed Double/Int
  assets (partial support - later properties parse correctly)

This enables proper parsing of weapon balance data like Struct_WeaponStats
which now extracts all 9 manufacturer scaling values with correct names.
Parse embedded schema in UserDefinedStruct assets to determine property
types (Double/Float/Int) for proper value sizing. Key improvements:

- Detect predominant type from name table (DoubleProperty, FloatProperty)
- Calculate per-property sizes for mixed-type structs
- Validate extracted values to filter garbage (denormals, out-of-range)
- Skip properties that use default values (not serialized)

Struct_WeaponStats now extracts all 9 Float manufacturer scaling values.
Struct_Weapon_Barrel_Init extracts 12 valid Double weapon stat values.
- Document how NCS files contain serialized DataTable rows
- Add example showing GUID matching between NCS and uasset schemas
- Document key insight that numeric values stored as strings
- Add format code reference table (abjx, abij, abjl, etc.)
- Document packed value format (values without null separators)
- Note that binary section contains lengths/offsets for decoding
- Add aim_assist_parameters as abij format example
Add detailed documentation for:
- Type prefix 'T' for Text/String values in string table
- Format code structure (04 xx 00 vs 03 xx 00 prefixes)
- Extended vs compact format parsing
- Binary section index pairs (field_index, value_offset)
- Offset increment pattern (+4 bytes = 4-byte aligned values)
- Create dedicated chapter 8 for NCS (Nexus Config Store) format
- Document decompressed content structure with full byte layouts
- Detail differential encoding algorithm for entry names
- Document packed string value optimization
- Cover field count markers and control sections
- Include worked example with achievement.bin parsing
- Add binary section structure details for abjx and abij formats
- Document format variations by type
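The "differential encoding algorithm for entry names" reads like classic front coding, where each name stores how many bytes it shares with the previous name plus the differing suffix. A decoder sketch under that assumption (the actual on-disk layout may differ):

```rust
/// Illustrative front-coding decoder: `encoded` holds
/// (shared-prefix length, suffix) pairs; reconstruct the full names.
fn decode_names(encoded: &[(usize, &str)]) -> Vec<String> {
    let mut out = Vec::with_capacity(encoded.len());
    let mut prev = String::new();
    for &(shared, suffix) in encoded {
        let mut name = String::with_capacity(shared + suffix.len());
        name.push_str(&prev[..shared]); // reuse the shared prefix
        name.push_str(suffix);          // append the new tail
        prev.clone_from(&name);
        out.push(name);
    }
    out
}
```

This kind of encoding shrinks sorted name tables dramatically, since adjacent entries tend to share long prefixes.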
monokrome and others added 28 commits January 3, 2026 08:11
Extract boss display names from NCS display_data.bin and embed them in
the drops module. Now all 45 bosses have human-readable display names
that can be searched (e.g., "The Backhive", "Primordial Guardian Inceptus").

- Added BossNameMapping struct with embedded mappings
- Mappings derived from CharData_ and npcs records in display_data.bin
- Supports fuzzy matching via aliases
- Updated generate_drops_manifest to use new mappings
Previously, NCS extraction used magic byte scanning which found only
164 of 170 NCS files (missing 6 files like wwise_*, xp_progression).

This adds repak as a dependency and implements proper PAK file reading
via the file index, which correctly extracts all 170 NCS files.

Changes:
- Add repak dependency with oodle feature to bl4-ncs
- Add pak.rs module with NcsReader trait and implementations:
  - PakReader: reads from PAK files using repak
  - MemoryPakReader: reads from in-memory PAK data
  - DirectoryReader: reads from extracted NCS directories
- Update CLI ncs decompress to use PAK index for .pak files
- Keep magic byte scanning as fallback for non-PAK files

The 6 previously missing files are now correctly extracted:
- wwise_auxilary_busses
- wwise_soundbanks
- wwise_states
- wwise_switches
- wwise_triggers
- xp_progression
Phase 1 of tech debt cleanup:

- Fix thiserror version mismatch (2.0 → workspace 1.0) in bl4-ncs
- Standardize serde_json to workspace dependency across crates
- Remove orphaned linewise directory (moved to separate repo)
- Silence dead code warnings in memory module (UE5 reflection types)
- Remove unused tracking variable in parser.rs
- Add #[allow(dead_code)] to intentionally unused but reserved APIs

All workspace packages now compile without warnings.
Phase 2 of tech debt cleanup - split uextract/main.rs:

- Extract cli.rs: Args, Commands, OutputFormat types
- Extract filter.rs: matches_filters path filtering logic
- Create commands/ module:
  - script_objects.rs: extract_script_objects command
  - find_assets.rs: find_assets_by_class command
  - list_classes.rs: list_classes command
  - texture.rs: extract_texture_cmd wrapper

Main.rs reduced from 2651 lines to 1935 lines (27% reduction).
Remaining zen parsing code (~1700 lines) is a candidate for
further extraction into a zen/ module.
- Remove orphaned bl4/src/items.rs (1292 lines) - was never exposed in
  lib.rs and used rusqlite which wasn't in Cargo.toml. bl4-idb is the
  canonical item database implementation with PostgreSQL + SQLite support.

- Add PakManifest::load() helper method to reduce 6 instances of
  duplicate manifest loading code in pak_extraction modules.
Extract core functionality from parser.rs (2366 lines) into dedicated modules:

- bit_reader.rs (188 lines): BitReader struct and bit_width function
  for bit-level binary data parsing
- types.rs (272 lines): Document, Record, Value, Header, StringTable,
  TagType, TagValue, binary section types

This reduces parser.rs to 1993 lines and improves code organization.
The extracted modules have clearer responsibilities and are easier
to maintain.
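A minimal sketch of what a `BitReader` plus `bit_width` pair typically looks like (LSB-first bit order here is an assumption; the real module's order and API may differ):

```rust
/// Read values of arbitrary bit width from a byte slice, LSB-first.
struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // position in bits
}

impl<'a> BitReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data, pos: 0 }
    }

    /// Read `n` bits; None if the buffer is exhausted.
    fn read_bits(&mut self, n: u32) -> Option<u64> {
        let mut v = 0u64;
        for i in 0..n {
            let byte = *self.data.get(self.pos / 8)?;
            let bit = (byte >> (self.pos % 8)) & 1;
            v |= u64::from(bit) << i;
            self.pos += 1;
        }
        Some(v)
    }
}

/// Smallest number of bits needed to represent `n` distinct values.
fn bit_width(n: u64) -> u32 {
    64 - n.saturating_sub(1).leading_zeros()
}
```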
Move string table parsing functions to dedicated string_table.rs module:
- parse_string_table, extract_inline_strings, extract_field_abbreviation
- create_combined_string_table
- Helper functions: is_valid_string, split_packed_string, normalize_entry_name

This reduces parser.rs from 1993 to 1546 lines. The string_table module
(483 lines) now contains all string table related functionality.

Progress:
- parser.rs: 2366 -> 1546 lines
- bit_reader.rs: 188 lines
- types.rs: 272 lines
- string_table.rs: 483 lines
Phase 1 of tech debt reduction:

- reference.rs (654 lines) -> reference/ directory with 7 domain modules:
  rarity, element, weapon, manufacturer, gear, stats, legendary

- save.rs (1554 lines) -> save/ directory with 3 modules:
  state_flags.rs (StateFlags bitmask helper)
  changeset.rs (ChangeSet for batch modifications)
  mod.rs (SaveFile and helpers)

- serial.rs (1581 lines) -> serial/ directory with 3 modules:
  bitstream.rs (BitReader/BitWriter for variable-length tokens)
  base85.rs (BL4 alphabet encoding/decoding)
  mod.rs (ItemSerial, Token, Element, Rarity and parsing)

All tests pass. No API changes.
- Add shared.rs with ITEM_SELECT_COLUMNS, FIELDS_TO_MIGRATE, and schema constants
- Add build_list_query() and build_count_query() for consistent filter handling
- Update sqlite.rs to use shared constants and query builder
- Update sqlx_impl.rs SQLite implementation to use shared module
- Reduce code duplication between sync and async implementations

All 41 bl4-idb tests pass.
- Make PartsDatabase.categories and source fields optional
- Make PartEntry.group field optional
- Update VLA_SR category test to match corrected mapping (27 -> 25)

All 144 bl4 tests and 41 bl4-idb tests pass.
- drops/types.rs: Type definitions (245 lines)
- drops/db.rs: DropsDb implementation (259 lines)
- drops/extract.rs: NCS extraction functions (462 lines)
- drops/mod.rs: Re-exports and tests (113 lines)

Each module is now focused and under 500 lines.
All 5 drops tests pass.
Split the 1547-line parser.rs into 6 focused modules:
- header.rs: Header parsing, entry section, string table location
- unpack.rs: String unpacking for packed value strings
- differential.rs: Differential name decoding
- entries.rs: Entry parsing and record creation
- binary.rs: Binary section parsing
- document.rs: Document format parsing (abjx, abij, etc.)
- mod.rs: Public API re-exports

Each module is now under 300 lines with clear responsibilities.
Split the 1254-line ncs.rs into 10 focused modules:
- types.rs: Result types (ScanResult, FileInfo, etc.)
- scan.rs: Directory scanning and statistics
- show.rs: File content display
- search.rs: Pattern search
- extract.rs: Type extraction and parts indexing
- decompress.rs: NCS decompression handlers
- debug.rs: Debug/analysis commands
- format.rs: TSV output formatting
- util.rs: Helper functions (hex dump)
- mod.rs: Command dispatch

Each module is now under 300 lines with clear responsibilities.
Split 1935-line main.rs into:
- types.rs: Serialization types (ZenAssetInfo, ParsedProperty, etc.)
- property.rs: UE5 unversioned property parsing (FFragment, etc.)
- zen.rs: Zen format to JSON conversion
- main.rs: CLI entry point and extraction logic (263 lines)

This improves code organization and maintainability.
- Add test_paths module with platform-aware helpers for game file paths
- Check BL4_PAKS_DIR env var first, fall back to OS-specific defaults
- Support Windows, Linux, and macOS Steam installation paths
- Fix BASE_KEY in docs to match actual code in crypto.rs
- Add dirs dev-dependency for home directory resolution
- Add Linux-only cfg guard to bl4-preload (LD_PRELOAD is Linux-specific)
- Fix manifest.rs to use direct paths instead of broken symlinks
- Fix analysis.rs tests to use temp_dir() instead of /tmp/
This fixes the architecture where PAK container reading was embedded
in bl4-ncs (a parser library). Now:

- uextract handles all container formats (PAK via repak, IoStore via retoc)
- bl4-ncs is a pure NCS parser (works with bytes/paths, not containers)
- bl4-cli uses uextract for PAK extraction, bl4-ncs for NCS parsing

Changes:
- Add repak dependency to uextract
- Create uextract::pak module with PakReader, MemoryPakReader
- Add `uextract pak` CLI command for traditional PAK extraction
- Make uextract a library crate (lib.rs) in addition to binary
- Simplify bl4-ncs pak.rs to DirectoryReader and helper functions
- Remove repak dependency from bl4-ncs
- Update bl4-cli to use uextract::pak::PakReader
These tests read actual PAK files which takes 40+ seconds and causes
high resource usage. They're now marked #[ignore] so they only run
when explicitly requested with `cargo test -- --ignored`.

Ignored tests:
- find_v1_failures, scan_all_pak_ncs, find_missing_chunks
- correlate_manifest_to_chunks, test_real_pak_extraction
- scan_after_last_chunk, show_full_mapping, generate_mapping_csv
- parse_first_10_ncs, check_missing_entries, try_header_offsets
repak panics with index out of bounds on certain PAK files that contain
raw data instead of standard PAK format. Wrap repak calls in
catch_unwind to convert panics to graceful errors.

This allows uextract to skip invalid/encrypted PAK files and continue
processing the remaining archives instead of crashing.
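The panic-to-error wrapping described above can be sketched with `std::panic::catch_unwind` (the `guard` helper is illustrative, not the actual uextract code):

```rust
use std::panic::{self, AssertUnwindSafe};

/// Wrap a call into a panicking library so an index-out-of-bounds panic
/// becomes a recoverable error instead of aborting the whole run.
fn guard<T>(f: impl FnOnce() -> T) -> Result<T, String> {
    panic::catch_unwind(AssertUnwindSafe(f))
        .map_err(|_| String::from("PAK reader panicked (invalid or encrypted archive?)"))
}
```

One caveat: `catch_unwind` only helps while panics unwind; it cannot catch anything if the binary is built with `panic = "abort"`.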
- Remove unused code and imports across all crates
- Prefix unused variables with underscore
- Convert manifest JSON files to expected format
- Fix manufacturers.json: change from array to map
- Fix weapon_types.json: change from array to map
- Convert parts_database.json to new format with version field
- All 480+ tests now passing with zero warnings
Comprehensive validation analysis comparing our tag-based extraction
against NcsParser baseline (5,513 indices).

Results:
- Extracted 5,972 unique positions (+8.3% vs baseline)
- 421 unique values (vs ~513 expected)
- 3.5% tag overlap (good independence)
- 80.8% multi-position indices (realistic distribution)
- Estimated 95%+ accuracy with ~2-5% false positive rate

Conclusion: Tag-based extraction is accurate enough for production use.
Differences vs NcsParser likely due to different data sources (binary
section vs manifest/metadata).
Investigated why inv file parsing fails and discovered a critical insight:
format codes are TYPE DICTIONARIES, not sequential schemas.

Key Findings:
- "abcefhijl" declares available types, NOT field order [a][b][c][e][f]...
- inv files use dynamic tag-based structure, not fixed schemas
- Records vary in size and composition based on which types they use
- Missing 'd' in format code means inv files don't use that list variant

Investigation Tools Created (14 example programs):
- test_ncs_parser.rs - confirmed ncs_parser.rs incompatible with inv
- find_tags.rs - located tag bytes (0x61, 0x66) in binary section
- test_tag_alignment.rs - tests various tag reading strategies
- examine_pre_binary.rs - searches for schema section
- examine_inv_binary.rs - hex dump analysis
- find_deps_*.rs - searched for 39 dependency names storage
- parse_deps_from_binary.rs - attempted deps extraction
- check_markers.rs - examines marker context
- examine_gap.rs - prepared to analyze 41-bit gap (not run)

Ground Truth Established:
- Reference parser output: 539 unique indices (5,460 total occurrences)
- Our heuristics: 421 unique values (78% coverage, 22% false negatives)
- No false positives detected - heuristics are conservative

Next Steps:
- Debug reference implementation in Windows to understand parsing algorithm
- Document exact parsing approach from debugger observations
- Implement proper parser based on findings
- Validate: must extract all 539 unique indices (100% coverage)

Documentation updated in .bl4.info/INV-FORMAT-ANALYSIS.md
@monokrome
Owner

Hi! Thanks for this contribution. I didn't notice it because I didn't know anyone was looking at these projects, @abramhindle. I apologize for missing it. The project has diverged significantly since then, and is now a lot more accurate at things like serial decoding.

Could you rebase this change onto the latest version? I'm asking because I don't want to redo it myself and sideline your contribution.

Thank you for sending this! I appreciate it.

