Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unexpected bug on paralellization of patterns #1083

Open
Guillermogsjc opened this issue Sep 6, 2023 · 14 comments
Open

unexpected bug on paralellization of patterns #1083

Guillermogsjc opened this issue Sep 6, 2023 · 14 comments

Comments

@Guillermogsjc
Copy link

Guillermogsjc commented Sep 6, 2023

What version of regex are you using?

Upgrading the dependency from regex = "1.8" to

[dependencies.regex]
version = "1.9"
features = ["perf"]

with rayon = "1.7" and compiled with rustc 1.71.1 (eb26296b5 2023-08-03)

Describe the bug at a high level.

Used parallelization on searches in battle tested way with the combo rayon -> 1.7 and regex -> 1.8, but after upgrading, the same code generates unexpected bug.
The paralellization is made by chunks of patterns, meaning that built regex are not shared across threads neither cloned, with something like:

...
.patterns
            .par_chunks(regexset_chunk_size)
            .enumerate()
            .map(|(pattern_chunk_index, patterns)| {
                let pattern_start_index: usize = pattern_chunk_index * regexset_chunk_size;
                // BUILD
                let (regex_built, partial_invalid_regex_index) =
                    build::BuiltRegex::build_new(self.regex_config, pattern_start_index, patterns);
                let partial_found_regex = perform_regex_find(
                    &regex_built,
                    &self.preprocessed_texts,
                    &self.text_n_char,
                    0,
                );
                (partial_found_regex, partial_invalid_regex_index)
            })
            .reduce(
                || {
                    let zero_invalid_indexes: BTreeMap<usize, (String, RegexCompilingError)> =
                        BTreeMap::new();
                    (MatchesByTextPattern::new(), zero_invalid_indexes)
                },
                |(mut accum_matches, mut accum_invalid), (new_matches, new_invalid)| {
                    accum_matches.add(new_matches);
                    accum_invalid.extend(new_invalid);
                    (accum_matches, accum_invalid)
                },
...

where BuiltRegex is a struct that has regex::RegexSet compiled (built in each chunk of patterns <-> thread), and a vec of regex::Regex .

Log:

**thread '<unnamed>' panicked at 'invalid 'from' id: LazyStateID(134217600)', /builds/g6313/pytextrust/.cargo/registry/src/index.crates.io-6f17d22bba15001f/regex-automata-0.3.8/src/hybrid/dfa.rs:2563:9**
stack backtrace:
   0:     0x7f53e4e192e1 - std::backtrace_rs::backtrace::libunwind::trace::h782cc21a5acaf6cb
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x7f53e4e192e1 - std::backtrace_rs::backtrace::trace_unsynchronized::hc579eb24ab204515
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7f53e4e192e1 - std::sys_common::backtrace::_print_fmt::h7223525cfdbacda2
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:65:5
   3:     0x7f53e4e192e1 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hbd7d55b7108d2ab8
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x7f53e4e3a4df - core::fmt::rt::Argument::fmt::hb4f4a02b9bd9dd49
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/fmt/rt.rs:138:9
   5:     0x7f53e4e3a4df - core::fmt::write::h6d54cd7c9e155ec5
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/fmt/mod.rs:1094:21
   6:     0x7f53e4e17071 - std::io::Write::write_fmt::h6a453a71c692f63b
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/io/mod.rs:1713:15
   7:     0x7f53e4e190f5 - std::sys_common::backtrace::_print::h1cbaa8b42678f928
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x7f53e4e190f5 - std::sys_common::backtrace::print::h4ddf81241a51b337
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x7f53e4e1a597 - std::panicking::default_hook::{{closure}}::hff91f1f484ade5cd
  10:     0x7f53e4e1a384 - std::panicking::default_hook::h21f14afd59f7aef9
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:288:9
  11:     0x7f53e4e1aa4c - std::panicking::rust_panic_with_hook::h45f66047b14c555c
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:705:13
  12:     0x7f53e4e1a947 - std::panicking::begin_panic_handler::{{closure}}::h49d1a88ef0908eb4
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:597:13
  13:     0x7f53e4e19716 - std::sys_common::backtrace::__rust_end_short_backtrace::hccebf9e57f8cc425
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:151:18
  14:     0x7f53e4e1a692 - rust_begin_unwind
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:593:5
  15:     0x7f53e4c59d93 - core::panicking::panic_fmt::h54ec9d0e3180a83d
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/panicking.rs:67:14
  16:     0x7f53e4d36a79 - regex_automata::hybrid::dfa::Lazy::set_transition::h40d9f99cde8aad3b
  17:     0x7f53e4c53a1b - regex_automata::hybrid::dfa::Lazy::cache_next_state::hee43c6cf1eebd065
  18:     0x7f53e4d4f1b0 - regex_automata::hybrid::search::find_overlapping_fwd::hab272e92ae9fe7db
  19:     0x7f53e4d32ef9 - regex_automata::meta::wrappers::HybridEngine::try_which_overlapping_matches::h021040f0c7f16ca3
  20:     0x7f53e4d2b2eb - <regex_automata::meta::strategy::ReverseAnchored as regex_automata::meta::strategy::Strategy>::which_overlapping_matches::h16ff42eddbd22ab3
  21:     0x7f53e4c6b39d - regex::regexset::string::RegexSet::matches_at::h8c4b9f5760adeede
  22:     0x7f53e4c6dca4 - pytextrust::pkg::apply::perform_regex_find::h387779a4b727fd29
  23:     0x7f53e4c6904d - core::ops::function::impls::<impl core::ops::function::FnMut<A> for &F>::call_mut::h8a1a242980b7d931
  24:     0x7f53e4cb4103 - <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold::hf12286d40c59ba5a
  25:     0x7f53e4c9c726 - rayon::iter::plumbing::bridge_producer_consumer::helper::h4f120d10ea2efc9b
  26:     0x7f53e4cd0021 - rayon_core::job::StackJob<L,F,R>::run_inline::h29d9ac8b4cd69e32
  27:     0x7f53e4cd0c95 - rayon_core::join::join_context::{{closure}}::hb5a83b3e0724fddb
  28:     0x7f53e4cd1477 - rayon_core::registry::in_worker::h15b4b94eed70e110
  29:     0x7f53e4c9c0f3 - rayon::iter::plumbing::bridge_producer_consumer::helper::h4f120d10ea2efc9b
  30:     0x7f53e4cd0a02 - rayon_core::join::join_context::{{closure}}::hb5a83b3e0724fddb
  31:     0x7f53e4cd1477 - rayon_core::registry::in_worker::h15b4b94eed70e110
  32:     0x7f53e4c9c0f3 - rayon::iter::plumbing::bridge_producer_consumer::helper::h4f120d10ea2efc9b
  33:     0x7f53e4cd0a02 - rayon_core::join::join_context::{{closure}}::hb5a83b3e0724fddb
  34:     0x7f53e4cd1477 - rayon_core::registry::in_worker::h15b4b94eed70e110
  35:     0x7f53e4c9c0f3 - rayon::iter::plumbing::bridge_producer_consumer::helper::h4f120d10ea2efc9b
  36:     0x7f53e4cd0a02 - rayon_core::join::join_context::{{closure}}::hb5a83b3e0724fddb
  37:     0x7f53e4cd1477 - rayon_core::registry::in_worker::h15b4b94eed70e110
  38:     0x7f53e4c9c0f3 - rayon::iter::plumbing::bridge_producer_consumer::helper::h4f120d10ea2efc9b
  39:     0x7f53e4cd0a02 - rayon_core::join::join_context::{{closure}}::hb5a83b3e0724fddb
  40:     0x7f53e4cd1477 - rayon_core::registry::in_worker::h15b4b94eed70e110
  41:     0x7f53e4c9c0f3 - rayon::iter::plumbing::bridge_producer_consumer::helper::h4f120d10ea2efc9b
  42:     0x7f53e4cd63b4 - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::hdb767f879f7dcf76
  43:     0x7f53e4c508f3 - rayon_core::registry::WorkerThread::wait_until_cold::h7efda915f27a9dea
  44:     0x7f53e4cf3a38 - rayon_core::registry::ThreadBuilder::run::hcbd44bc14931b20d
  45:     0x7f53e4cf59ea - std::sys_common::backtrace::__rust_begin_short_backtrace::hf0a4cb93a436a68b
  46:     0x7f53e4cf852f - core::ops::function::FnOnce::call_once{{vtable.shim}}::h027bf9b01eb64595
  47:     0x7f53e4e1d115 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::he3e5dbdfabe0b668
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/alloc/src/boxed.rs:1985:9
  48:     0x7f53e4e1d115 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h246f7c7964633611
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/alloc/src/boxed.rs:1985:9
  49:     0x7f53e4e1d115 - std::sys::unix::thread::Thread::new::thread_start::hadf9e3501ff0df23
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys/unix/thread.rs:108:17
  50:     0x7f554b09b044 - start_thread
                               at ./nptl/pthread_create.c:442:8
  51:     0x7f554b11b5fc - __GI___clone3
                               at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
  52:                0x0 - <unknown>

What are the steps to reproduce the behavior?

It is very hard to generate reproducible steps.

What is the expected behavior?

It was usually working fine before change of minor from v1.8 to v1.9 on regex crate with default features.

On the Cargo.lock, i can see changes:

  • from memchr 2.5.0 to 2.6.3
  • from regex 1.8.1 to 1.9.5 with regex-automata 0.3.8 (being the last one not a dependency on previous version v1.8 as it is not in the cargo lock).
@BurntSushi
Copy link
Member

I need a reproduction please.

It was usually working fine before change of minor from v1.8 to v1.9

It was usually working fine with regex 1.8? Or was always working correctly?

@Guillermogsjc
Copy link
Author

Guillermogsjc commented Sep 6, 2023

It was usually working fine with regex 1.8? Or was always working correctly?

It is working battle tested through really too many batch heavy processes on the dependency v1.8 in a flawless manner (not a single panic).

I will try to create something similar, I guess it will take me time to reach a reproducible piece of code. It is by the way being called trhough pyo3 with allow_threads, maybe something is hitting after the change with this.

I will update here as soon as possible with an example breaking this parallelism.

Kind regards

@BurntSushi
Copy link
Member

Okay, if there wasn't a single panic then it "always worked," not "usually worked."

@BurntSushi
Copy link
Member

And yes, a reproduction would be extremely helpful here.

@Guillermogsjc
Copy link
Author

Guillermogsjc commented Sep 10, 2023

trying to reproduce it but only appears on one exact combo of patterns, if I subset them into distinct groups it does not appear. In fact it works perfectly in any other combination of groups of patterns of the same set (this gives me headache because I was supposing to have a rotten apple there).

It does not look like an OOM problem, it is far from OOM and it is not a 143 exit.

Trying to generate the example, I do not have any clue about how to reproduce it without giving exact data (what I can not do).

Could you give me any clue about what is this panic about?

thread '<unnamed>' panicked at 'invalid 'from' id: LazyStateID(134217600)', /io/.project_cargo/registry/src/index.crates.io-6f17d22bba15001f/regex-automata-0.3.8/src/hybrid/dfa.rs:2563:9

The assert breaking is this one https://docs.rs/regex-automata/0.3.8/src/regex_automata/hybrid/dfa.rs.html#2563

Any clue would be very nice to try to build something that breaks it in a generalist manner.

By the way, the kind of load in work here is somehow reproduced here (with haystack and patterns to be provided) https://gist.github.com/Guillermogsjc/74874a806e5d1006ec7075ddffd5f25a#file-load_test-rs

it is not exactly this , but close (it is launched from a pyo3 function with allow_threads).

Thanks in advance.

@BurntSushi
Copy link
Member

I don't know what's causing it sadly and I don't know how to contextualize that assert in a way that relates to the public API. If I could do that, I probably wouldn't need a reproduction.

@Guillermogsjc
Copy link
Author

is here maybe a bad practice or usage of the regex crate inside the rayon parallel scheme in such a way it is prone to throw this panic ? https://gist.github.com/Guillermogsjc/74874a806e5d1006ec7075ddffd5f25a#file-load_test-rs

@BurntSushi
Copy link
Member

Nope. The panic is definitively a bug. Regex find routines should never panic under any circumstances (except for perhaps exhausting heap memory, but I don't think that's relevant here).

@BurntSushi
Copy link
Member

I tried looking into this to see if I could figure out how you're getting the panic, and I can't see it. My best guess is that there is some kind of bug in handling the lazy DFA's cache resets.

I think the bottom line here is that I'm going to need a repro. If you could get me that, that would be wonderful and I'd be most appreciative. To be clear, a repro means that you can provide me a set of commands and inputs that I can run on my own machine and observe the same thing you're observing. So a repro means you'd provide me with a transcript of actions that I can take and see the problem for myself.

@Guillermogsjc
Copy link
Author

Guillermogsjc commented Oct 9, 2023

Hi @BurntSushi , yes I know that we need a deterministic input+code combination to be able to dig into this bug.

I have been trying several times with distinct shareable inputs against same flow, but so far did not break it.

Will continue searching for this shareable inputs that reach same state, but as you well know, it is like searching a needle in a haystack, because do not have a remote clue about the characterization of the patterns that cause this.

To be on track, this is about the patterns not about the haystack right? Or has to do with the combination?

@BurntSushi
Copy link
Member

It is almost certainly a combination of both patterns and haystack.

@Guillermogsjc
Copy link
Author

Guillermogsjc commented Oct 28, 2023

hi @BurntSushi ,

have been for a while trying to encrypt both patterns and haystack in a way that everything works similar and it breaks as breaks with original data, to provide reproduction of the bug.

So far did not get a reliable encrypting that honors patterns match over haystack in a sufficient way (with only obfuscation i get it, but as i encrypt, the subword patterns stop matching and everything changes and does not break).

The thing is that after upgrading rayon and regex crate, with the exact combination that I had issues... its is gone!

rayon: 1.7.0 into 1.8.0
regex: 1.9.5 into 1.10.2
regex-automata: 0.3.8 into 0.4.3

also removing those unneeded Arc over &str haystack vectors before feeding rayon.par_chunks.

So if is ok for you, this issue would be better closed.

@BurntSushi
Copy link
Member

Definitely sounds suspicious, but without a solid repro I agree with closing this. Feel free to re-open this if you rediscover the bug.

@LuaKT
Copy link

LuaKT commented Jul 16, 2024

Hi @BurntSushi,
I have ran into this issue today when increasing the dfa_size_limit.
I've created a repro here: https://github.com/LuaKT/regex_crash_repro
I'll try to dig into it some more tomorrow.

Thanks!

@Guillermogsjc Guillermogsjc reopened this Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants