@dabund24 dabund24 commented Nov 5, 2025

First part of #1805.
Second case will be handled in a separate PR.

To be handled

Non-transitive version

Suppose $t_0$ definitely holds a lock $l$ at the point where it creates $t_1$. If $t_0$ does not release $l$ before $t_1$ is definitely joined into $t_0$, then $t_1$ is protected by $l$.

Examples

graph TB;
subgraph t1;
    E["..."]-->F["return;"];
end;
subgraph t0;
    A["lock(l);"]-->B;
    B["create(t1);"]-->C;
    C["join(t1);"]-->D["unlock(l);"]
end;
B-.->E
F-.->C

graph TB;
subgraph t1;
    E["..."]-->F["return;"];
end;
subgraph t0;
    A["lock(l);"]-->B;
    B["create(t1);"]-->C[return;];
end;
B-.->E

General version

Let $t_d$ be a may-descendant of $t_1$. Suppose $t_0$ definitely holds a lock $l$ at the point where it creates $t_1$. If $t_0$ does not release $l$ before $t_d$ is definitely joined into $t_0$, then $t_d$ is protected by $l$.

Example

graph TB;
subgraph td;
    G["..."]-->H["return;"];
end;
subgraph t1;
    E["create(td);"]-->F["return;"];
end;
subgraph t0;
    A["lock(l);"]-->B;
    B["create(t1);"]-->C;
    C["join(td);"]-->D["unlock(l);"]
end;
B-.->E
E-.->G
H-.->C

Dependency Analyses

  • $\mathcal T$: Ego thread id at the program point, with ana.thread.domain set to "history" and both ana.thread.include-node and ana.thread.context.create-edges enabled
  • $\mathcal L$: Must-Lockset at program point
  • $\mathcal C$: May-Creates of ego thread before program point
  • $\mathcal J$: Transitive Must-Joins of ego thread before program point
  • $\mathcal{DES}(t)$: Descendant threads of $t$ (implemented in this PR)
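
A descendant set like $\mathcal{DES}(t)$ can be thought of as the transitive closure of a may-create relation. The following is a minimal sketch under that assumption, not the implementation in this PR: thread ids are plain strings, and `children` is a hypothetical lookup returning the threads that a given thread may create directly.

```ocaml
module StrSet = Set.Make (String)

(* [children t] is an assumed may-create relation: the threads that
   [t] may create directly. *)
let descendants (children : string -> StrSet.t) (t : string) : StrSet.t =
  (* Worklist iteration; [visited] guards against circular creations. *)
  let rec go visited frontier =
    match StrSet.choose_opt frontier with
    | None -> visited
    | Some x ->
      let frontier = StrSet.remove x frontier in
      if StrSet.mem x visited then go visited frontier
      else go (StrSet.add x visited) (StrSet.union (children x) frontier)
  in
  go StrSet.empty (children t)
```

Because visited threads are never expanded twice, the sketch terminates even when threads create each other in a cycle.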

Conditions to satisfy

  1. maybe $\exists$ create(t1) in $t_0$ with $l\in\mathcal L$ and $t_d\in\mathcal{DES}(t_1)$
  2. $\neg$ (maybe $\exists$ create(t1) in $t_0$ with $l\notin\mathcal L$ and $t_d\in\mathcal{DES}(t_1)$ ). If 1. holds, we get this for free; see the notes on non-unique thread ids below
  3. $\neg$ (maybe $\exists$ unlock(l) in $t_0$ with $t_d\in\left(\mathcal C\cup\bigcup_{c\in\mathcal C}\mathcal{DES}(c)\right)\setminus\mathcal J$ )

Analyses

Creation Lockset

  • $\mathcal{CL}: T\to 2^{T\times L}$
  • May-Set
  • Flow-Insensitive
  • Condition 1 is satisfied if $(t_0, l)\in\mathcal{CL}(t_d)$

Contributions

  • create(t1):
    $\forall t\in \{t_1\}\cup\mathcal{DES}(t_1):\mathcal{CL}(t)\sqsupseteq\{\mathcal T\}\times\mathcal L$
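
The contribution rule above could be sketched as follows. This is an illustration only, assuming string thread ids and locks and a purely functional map for $\mathcal{CL}$; the names `contribute_create` and `lookup` are hypothetical and do not come from the PR.

```ocaml
module StrSet = Set.Make (String)
module StrMap = Map.Make (String)
module PairSet = Set.Make (struct
  type t = string * string (* (creating ancestor thread id, lock) *)
  let compare = compare
end)

(* CL maps each thread id to a may-set of (ancestor, lock) pairs. *)
type cl = PairSet.t StrMap.t

let lookup (t : string) (m : cl) : PairSet.t =
  Option.value (StrMap.find_opt t m) ~default:PairSet.empty

(* Effect of create(t1) executed by [ego] with must-lockset [lockset]:
   every thread in {t1} ∪ DES(t1) receives {ego} × lockset.
   [des] is the (assumed, precomputed) set DES(t1). *)
let contribute_create ~ego ~lockset ~t1 ~des (m : cl) : cl =
  let pairs =
    StrSet.fold (fun l acc -> PairSet.add (ego, l) acc) lockset PairSet.empty
  in
  StrSet.fold
    (fun t acc -> StrMap.add t (PairSet.union pairs (lookup t acc)) acc)
    (StrSet.add t1 des) m
```

Using a union rather than an overwrite reflects that $\mathcal{CL}$ is a flow-insensitive may-set: contributions from different create statements accumulate.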

Tainted Creation Lockset

  • $\mathcal{TCL}: T\to 2^{T\times L}$
  • May-Set
  • Flow-Insensitive
  • Condition 3 is satisfied if $(t_0, l)\notin\mathcal{TCL}(t_d)$

Contributions

  • unlock(l):
    $\forall t\in \left(\mathcal C\cup\bigcup_{c\in\mathcal C}\mathcal{DES}(c) \right)\setminus\mathcal J:\mathcal{TCL}(t)\sqsupseteq \{(\mathcal T,l)\}$
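
A sketch of the unlock contribution, under the same simplifying assumptions (string thread ids and locks, a functional map for $\mathcal{TCL}$). The helper names and the `des` lookup function are hypothetical and only illustrate the set expression above.

```ocaml
module StrSet = Set.Make (String)
module StrMap = Map.Make (String)
module PairSet = Set.Make (struct
  type t = string * string (* (unlocking thread id, lock) *)
  let compare = compare
end)

(* (C ∪ ⋃_{c ∈ C} DES(c)) \ J: created threads and their descendants
   that are not yet must-joined. [des] is an assumed descendant lookup. *)
let unjoined_threads ~creates ~des ~joins =
  let all =
    StrSet.fold (fun c acc -> StrSet.union (des c) acc) creates creates
  in
  StrSet.diff all joins

(* Effect of unlock(l) executed by [ego]: every unjoined thread t gets
   the pair (ego, l) added to TCL(t). *)
let contribute_unlock ~ego ~lock ~creates ~des ~joins tcl =
  StrSet.fold
    (fun t acc ->
      let old = Option.value (StrMap.find_opt t acc) ~default:PairSet.empty in
      StrMap.add t (PairSet.add (ego, lock) old) acc)
    (unjoined_threads ~creates ~des ~joins)
    tcl
```

In the general example, where $t_0$ joins $t_d$ but not $t_1$ before unlock(l), this taints $t_1$ but leaves $t_d$ untainted.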

Rules for MHP exclusion

Let $\mathcal{IL}(t):=\mathcal{CL}(t)\setminus\mathcal{TCL}(t)$.
Program points $s_1$ with $\mathcal T_1$, $\mathcal L_1$ and $\mathcal{IL}_1$ and $s_2$ with $\mathcal T_2$, $\mathcal L_2$ and $\mathcal{IL}_2$ cannot happen in parallel if at least one of the following conditions holds:

  • $\exists (t_a,l_a)\in\mathcal{IL}_1:l_a\in\mathcal L_2,t_a\neq \mathcal T_2$
  • $\exists (t_a,l_a)\in\mathcal{IL}_2:l_a\in\mathcal L_1,t_a\neq \mathcal T_1$
  • $\exists(t_{a1},l)\in\mathcal{IL}_1,(t_{a2},l)\in\mathcal{IL}_2: t_{a1}\neq t_{a2}$
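
The three rules can be read as a single predicate over two program points. The sketch below assumes string thread ids and locks and a hypothetical record type `point`; it mirrors the formulas above and is not taken from the PR.

```ocaml
module StrSet = Set.Make (String)
module PairSet = Set.Make (struct
  type t = string * string (* (ancestor thread id, lock) *)
  let compare = compare
end)

type point = {
  tid : string;        (* T: ego thread id *)
  lockset : StrSet.t;  (* L: must-lockset *)
  il : PairSet.t;      (* IL = CL(tid) \ TCL(tid) *)
}

(* IL(t) := CL(t) \ TCL(t) *)
let il_of ~cl ~tcl = PairSet.diff cl tcl

let cannot_happen_in_parallel (p1 : point) (p2 : point) : bool =
  (* Rules 1 and 2: an inherited lock of [a] is held at [b] by a thread
     other than the ancestor that passed it on. *)
  let protects a b =
    PairSet.exists
      (fun (ta, la) -> StrSet.mem la b.lockset && ta <> b.tid)
      a.il
  in
  (* Rule 3: both points inherit the same lock from different ancestors. *)
  let distinct_ancestors =
    PairSet.exists
      (fun (ta1, l1) ->
        PairSet.exists (fun (ta2, l2) -> l1 = l2 && ta1 <> ta2) p2.il)
      p1.il
  in
  protects p1 p2 || protects p2 p1 || distinct_ancestors
```

The check `ta <> b.tid` matters: if the thread holding $l$ at the other program point is the ancestor itself, the descendant runs while that lock is held, so the two points may well happen in parallel.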

Notes on non-unique thread ids

By requiring thread ids to include the full history and the creation point, we work around the problem of incorrectly marking two program points as sequential because of an ambiguous creation history. Notes on some edge cases:

Ambiguity due to multiple thread creations in one thread

graph TB;
    A((t1))-->B((t2));
    A-->B;

If $l$ were contained in only some, but not all, of the must-locksets at the create(t2) statements in $t_1$, the analysis would not work. However, since the creation node is part of the thread id, each create statement yields a distinct thread id, so this case is impossible.

Ambiguity due to diamond-like thread creations

graph TB;
    A((t1))-->B((t2));
    A-->C((t3));
    B-->D((t4));
    C-->D;

If $t_4$ were marked as protected by $t_2$ only, this would be incorrect, since a creation via $t_3$ is also possible. However, this cannot happen: $t_4$ would have two different histories (one for each path of the diamond) and thus two different thread ids.
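
To illustrate the diamond argument, here is a deliberately simplified model in which a thread id is just the list of creation nodes on the path from the main thread. This is an assumption made for the example and not Goblint's actual history domain; the node names are made up.

```ocaml
(* Simplified model: a thread id is the sequence of creation nodes
   leading from the main thread to the thread. *)
type tid = string list

let create (parent : tid) (node : string) : tid = parent @ [node]

let t1 : tid = ["n1"]
let t2 = create t1 "n2"
let t3 = create t1 "n3"

(* The same create statement "n4" reached along both sides of the diamond. *)
let t4_via_t2 = create t2 "n4"
let t4_via_t3 = create t3 "n4"

(* The two histories differ, so the analysis sees two distinct thread ids. *)
let distinct = t4_via_t2 <> t4_via_t3
```

In this model, creation lockset information for the thread created via $t_2$ is never mixed with that of the thread created via $t_3$.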

Ambiguity due to circular thread creations

graph TB;
    A((t1))-->B((t2));
    B-->C((t3));
    C-->B;

Marking $t_2$ as protected by $t_3$ would be problematic, since the loop does not need to be entered at all. However, the first iteration of circular loops still receives a non-unique thread id which is not marked as a descendant of $t_3$, so problematic program points in $t_2$ are flagged as racing nevertheless.

@sim642 sim642 changed the title Improve mhp precision using ancestor locksets Improve MHP precision using ancestor locksets Nov 10, 2025
@dabund24 dabund24 marked this pull request as ready for review December 4, 2025 13:04

@dabund24 dabund24 left a comment

Some general questions:

  • Is there a way to enforce configuration settings to be set to certain values in order for the analysis to work (i.e. similar to how analyses can be declared as dependencies)? This would be necessary for the settings revolving around thread ids as described in the pr summary. Checking settings using GobConfig.get_string at the start of transfer functions doesn't really seem ideal to me
  • is there an ideal amount of tests I should write or can I write as many as I feel to be necessary?

Comment on lines +73 to +79
match tid_lifted, child_tid_lifted with
| `Lifted tid, `Lifted child_tid ->
  let descendants = descendants_closure child_ask child_tid in
  let lockset = ask.f Queries.MustLockset in
  let to_contribute = cartesian_prod (TIDs.singleton tid) lockset in
  TIDs.iter (contribute_lock man to_contribute) descendants
| _ -> (* TODO deal with top or bottom? *) ()

What does it mean for a thread id to be top or bottom? Not sure how to deal with this here and in some other places

sim642 commented Dec 5, 2025

  • Is there a way to enforce configuration settings to be set to certain values in order for the analysis to work (i.e. similar to how analyses can be declared as dependencies)? This would be necessary for the settings revolving around thread ids as described in the pr summary. Checking settings using GobConfig.get_string at the start of transfer functions doesn't really seem ideal to me

Maingoblint.check_arguments has a bunch of ad-hoc checks like this.

sim642 commented Dec 5, 2025

  • is there an ideal amount of tests I should write or can I write as many as I feel to be necessary?

As many as necessary to cover all the functionality and ideally the corner cases of the domains/analyses.
We also have code coverage available, but it's not required, although it might be useful for finding untested stuff.

dabund24 commented Dec 5, 2025

  • Is there a way to enforce configuration settings to be set to certain values in order for the analysis to work (i.e. similar to how analyses can be declared as dependencies)? This would be necessary for the settings revolving around thread ids as described in the pr summary. Checking settings using GobConfig.get_string at the start of transfer functions doesn't really seem ideal to me

Maingoblint.check_arguments has a bunch of ad-hoc checks like this.

that's what I needed, nice :D
added in 13443f9

dabund24 commented Dec 5, 2025

  • is there an ideal amount of tests I should write or can I write as many as I feel to be necessary?

As many as necessary to cover all the functionality and ideally the corner cases of the domains/analyses. We also have code coverage available, but it's not required, although it might be useful for finding untested stuff.

I'm going to add some more in the coming days 👍

dabund24 commented Dec 5, 2025

The CI builds failed due to an incorrect semicolon. I fixed this in f1efd32, so everything should be working now.
