Fix an edge case in the Tarjan GC bridge that leads to losing xref information #112825

Draft
wants to merge 1 commit into
base: main
from

Conversation

filipnavara
Member

In the Tarjan SCC bridge processing there's a color graph used to find out connections between SCCs. There was a rare case which only manifested when a cycle in the object graph points to another cycle that points to a bridge object. We only recognized direct bridge pointers but not pointers to other non-bridge SCCs that in turn point to bridges and where we already calculated the xrefs. These xrefs were then lost.

TBD: Extend the description

Fixes dotnet/android#9039
Ref dotnet/android#9789 for discussion on the root cause

/cc @AaronRobinsonMSFT @simonrozsival ... I'd be happy if you could take this one over. Happy to explain in my words what is going on but the description is quite convoluted.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Feb 23, 2025
@filipnavara
Member Author

filipnavara commented Feb 23, 2025

Example code that got broken by the bug:

        // Run this on the UI thread (e.g. in MainActivity.OnCreate)
        Task.Run(GCLoop); // <== Repeat this line up to 4 times depending on HW/emulator to make the repro more reliable
        _ = AsyncStreamWriter();

        public static async Task GCLoop()
        {
            while (true)
            {
                GC.Collect();
                await Task.Delay(10);
            }
        }

        public static async Task AsyncStreamWriter()
        {
            var bs = new ByteArrayOutputStream();
            var osi = new Android.Runtime.OutputStreamInvoker(bs);
            try
            {
                while (true)
                    await osi.WriteAsync(new byte[2]);
            }
            catch (ObjectDisposedException ex)
            {
                // <== Fails here because ByteArrayOutputStream got finalized/disposed
                System.Environment.FailFast(ex.ToString());
            }
        }

The task machinery of AsyncStreamWriter captures the synchronization context from the UI thread (Android.App.SyncContext). When the continuation of the osi.WriteAsync call is queued through the SyncContext.Post method, it creates a RunnableImplementor object (Java bridged) that holds the only reference to the object graph of the task. The RunnableImplementor object itself is not rooted in .NET.

If a GC occurs at this point and starts the GC bridge machinery, both ByteArrayOutputStream and RunnableImplementor are seen as unreferenced objects on the .NET side. The GC bridge is responsible for reporting that graph to the Android GC bridge and preserving the cross-reference (xref) between RunnableImplementor and ByteArrayOutputStream. The bug, however, caused the two objects to be reported as independent, which led to ByteArrayOutputStream being finalized while it was still referenced from live code rooted by the RunnableImplementor.

To understand why the GC bridge reported it incorrectly, we must look at the log (produced with a custom DUMP_GRAPH build):

-----------------
+scanning ByteArrayOutputStream (0x6d21ff42f0) index 0 color 0x0
-finishing ByteArrayOutputStream (0x6d21ff42f0) index 0 low-index 0 color 0x0
-finished ByteArrayOutputStream (0x6d21ff42f0) index 0 low-index 0 color 0x0
|SCC 0x7079a06010 rooted in ByteArrayOutputStream (0x6d21ff42f0) has bridge 1
	loop stack: (0/0)
	member ByteArrayOutputStream (0x6d21ff42f0) index 0 low-index 0 color 0x0 state 2
+scanning RunnableImplementor (0x6db6b821b0) index 1 color 0x0
	= pushing 0x6db6b789b0 Action -> pushed!
+scanning Action (0x6db6b789b0) index 2 color 0x0
	= pushing 0x6db6b649d0 <>c__DisplayClass2_0 -> pushed!
+scanning <>c__DisplayClass2_0 (0x6db6b649d0) index 3 color 0x0
	= pushing 0x6db6b78b30 SendOrPostCallback -> alive
	= pushing 0x6db6b79a30 Action -> pushed!
+scanning Action (0x6db6b79a30) index 4 color 0x0
	= pushing 0x6db6b827a0 AsyncStateMachineBox`1 -> pushed!
+scanning AsyncStateMachineBox`1 (0x6db6b827a0) index 5 color 0x0
	= pushing 0x6db6b79a30 Action -> already marked
	= pushing 0x6d24010cf8 ExecutionContext -> alive
	= pushing 0x6db6b827f0 <AsyncStreamWriter>d__2 -> pushed!
+scanning <AsyncStreamWriter>d__2 (0x6db6b827f0) index 6 color 0x0
	= pushing 0x6db6b827a0 AsyncStateMachineBox`1 -> already marked
	= pushing 0x6d21ff42f0 ByteArrayOutputStream -> already marked
	= pushing 0x6db6b64890 OutputStreamInvoker -> pushed!
	= pushing 0x6d24015e60 ReadWriteTask -> alive
+scanning OutputStreamInvoker (0x6db6b64890) index 7 color 0x0
	= pushing 0x6db6b74830 SemaphoreSlim -> pushed!
	= pushing 0x6d21ff42f0 ByteArrayOutputStream -> already marked
+scanning SemaphoreSlim (0x6db6b74830) index 8 color 0x0
	= pushing 0x6db6b501a8 StrongBox`1 -> opaque
-finishing SemaphoreSlim (0x6db6b74830) index 8 low-index 8 color 0x0
	compute low 0x6db6b74830 ->0x6db6b501a8 (StrongBox`1) 0x0 (-2 / -2, color 0x0)
-finished SemaphoreSlim (0x6db6b74830) index 8 low-index 8 color 0x0
|SCC 0x0 rooted in SemaphoreSlim (0x6db6b74830) has bridge 0
	loop stack: (1/1)(2/2)(3/3)(4/4)(5/5)(6/6)(7/7)(8/8)
	member SemaphoreSlim (0x6db6b74830) index 8 low-index 8 color 0x0 state 2
-finishing OutputStreamInvoker (0x6db6b64890) index 7 low-index 7 color 0x0
	compute low 0x6db6b64890 ->0x6db6b74830 (SemaphoreSlim) 0x7079a041f8 (8 / 8, color 0x0)
	compute low 0x6db6b64890 ->0x6d21ff42f0 (ByteArrayOutputStream) 0x7079a04018 (0 / 0, color 0x7079a06010)
		add color 0x7079a06010 to color_merge_array
-finished OutputStreamInvoker (0x6db6b64890) index 7 low-index 7 color 0x0
|SCC 0x7079a06010 rooted in OutputStreamInvoker (0x6db6b64890) has bridge 0
	loop stack: (1/1)(2/2)(3/3)(4/4)(5/5)(6/6)(7/7)
	member OutputStreamInvoker (0x6db6b64890) index 7 low-index 7 color 0x0 state 2
-finishing <AsyncStreamWriter>d__2 (0x6db6b827f0) index 6 low-index 6 color 0x0
	compute low 0x6db6b827f0 ->0x6db6b827a0 (AsyncStateMachineBox`1) 0x7079a04168 (5 / 5, color 0x0)
	compute low 0x6db6b827f0 ->0x6d21ff42f0 (ByteArrayOutputStream) 0x7079a04018 (0 / 0, color 0x7079a06010)
		add color 0x7079a06010 to color_merge_array
	compute low 0x6db6b827f0 ->0x6db6b64890 (OutputStreamInvoker) 0x7079a041c8 (7 / 7, color 0x7079a06010)
	compute low 0x6db6b827f0 ->0x6d24015e60 (ReadWriteTask) 0x0 (-2 / -2, color 0x0)
-finished <AsyncStreamWriter>d__2 (0x6db6b827f0) index 6 low-index 5 color 0x0
-finishing AsyncStateMachineBox`1 (0x6db6b827a0) index 5 low-index 5 color 0x0
	compute low 0x6db6b827a0 ->0x6db6b79a30 (Action) 0x7079a04138 (4 / 4, color 0x0)
	compute low 0x6db6b827a0 ->0x6d24010cf8 (ExecutionContext) 0x0 (-2 / -2, color 0x0)
	compute low 0x6db6b827a0 ->0x6db6b827f0 (<AsyncStreamWriter>d__2) 0x7079a04198 (6 / 5, color 0x0)
-finished AsyncStateMachineBox`1 (0x6db6b827a0) index 5 low-index 4 color 0x0
-finishing Action (0x6db6b79a30) index 4 low-index 4 color 0x0
	compute low 0x6db6b79a30 ->0x6db6b827a0 (AsyncStateMachineBox`1) 0x7079a04168 (5 / 4, color 0x0)
-finished Action (0x6db6b79a30) index 4 low-index 4 color 0x0
|SCC 0x0 rooted in Action (0x6db6b79a30) has bridge 0
	loop stack: (1/1)(2/2)(3/3)(4/4)(5/4)(6/5)
	member <AsyncStreamWriter>d__2 (0x6db6b827f0) index 6 low-index 5 color 0x0 state 2
	member AsyncStateMachineBox`1 (0x6db6b827a0) index 5 low-index 4 color 0x0 state 2
	member Action (0x6db6b79a30) index 4 low-index 4 color 0x0 state 2
-finishing <>c__DisplayClass2_0 (0x6db6b649d0) index 3 low-index 3 color 0x0
	compute low 0x6db6b649d0 ->0x6db6b78b30 (SendOrPostCallback) 0x0 (-2 / -2, color 0x0)
	compute low 0x6db6b649d0 ->0x6db6b79a30 (Action) 0x7079a04138 (4 / 4, color 0x0)
-finished <>c__DisplayClass2_0 (0x6db6b649d0) index 3 low-index 3 color 0x0
|SCC 0x0 rooted in <>c__DisplayClass2_0 (0x6db6b649d0) has bridge 0
	loop stack: (1/1)(2/2)(3/3)
	member <>c__DisplayClass2_0 (0x6db6b649d0) index 3 low-index 3 color 0x0 state 2
-finishing Action (0x6db6b789b0) index 2 low-index 2 color 0x0
	compute low 0x6db6b789b0 ->0x6db6b649d0 (<>c__DisplayClass2_0) 0x7079a04108 (3 / 3, color 0x0)
-finished Action (0x6db6b789b0) index 2 low-index 2 color 0x0
|SCC 0x0 rooted in Action (0x6db6b789b0) has bridge 0
	loop stack: (1/1)(2/2)
	member Action (0x6db6b789b0) index 2 low-index 2 color 0x0 state 2
-finishing RunnableImplementor (0x6db6b821b0) index 1 low-index 1 color 0x0
	compute low 0x6db6b821b0 ->0x6db6b789b0 (Action) 0x7079a040d8 (2 / 2, color 0x0)
-finished RunnableImplementor (0x6db6b821b0) index 1 low-index 1 color 0x0
|SCC 0x7079a06038 rooted in RunnableImplementor (0x6db6b821b0) has bridge 1
	loop stack: (1/1)
	member RunnableImplementor (0x6db6b821b0) index 1 low-index 1 color 0x0 state 2
+scanning MainActivity (0x6d21ff4280) index 9 color 0x0
-finishing MainActivity (0x6d21ff4280) index 9 low-index 9 color 0x0
-finished MainActivity (0x6d21ff4280) index 9 low-index 9 color 0x0
|SCC 0x7079a06060 rooted in MainActivity (0x6d21ff4280) has bridge 1
	loop stack: (9/9)
	member MainActivity (0x6d21ff4280) index 9 low-index 9 color 0x0 state 2
+scanning Looper (0x6d21ff42b8) index 10 color 0x0
-finishing Looper (0x6d21ff42b8) index 10 low-index 10 color 0x0
-finished Looper (0x6d21ff42b8) index 10 low-index 10 color 0x0
|SCC 0x7079a06088 rooted in Looper (0x6d21ff42b8) has bridge 1
	loop stack: (10/10)
	member Looper (0x6d21ff42b8) index 10 low-index 10 color 0x0 state 2
----summary----
bridges:
	ByteArrayOutputStream (0x6d21ff42f0) index 0 color 0x7079a06010
	RunnableImplementor (0x6db6b821b0) index 1 color 0x7079a06038
	MainActivity (0x6d21ff4280) index 9 color 0x7079a06060
	Looper (0x6d21ff42b8) index 10 color 0x7079a06088
colors after tarjan:
	0:  bridges: 0 
	0:  bridges: 1 
	0:  bridges: 9 
	0:  bridges: 10 
***** API *****
number of SCCs 4
TOTAL XREFS 0
---xrefs:
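
The index and low-index values in the log above appear to be the usual bookkeeping of Tarjan's SCC algorithm: the discovery order of an object and the lowest index reachable from it while it is still on the stack. The following is a minimal, generic sketch of that algorithm on a small adjacency matrix, not the runtime's implementation. All names (Tarjan, tarjan_init, strong_connect, tarjan_run) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 16

/* Hypothetical state for a textbook Tarjan SCC run (not the Mono data structures). */
typedef struct {
    int count;                          /* number of nodes in use */
    bool edge[MAX_NODES][MAX_NODES];    /* edge[v][w]: v references w */
    int index[MAX_NODES];               /* discovery order, -1 = not visited */
    int low[MAX_NODES];                 /* lowest index reachable via the stack */
    bool on_stack[MAX_NODES];
    int scc[MAX_NODES];                 /* resulting SCC id per node */
    int stack[MAX_NODES];
    int stack_size;
    int next_index;
    int scc_count;
} Tarjan;

static void tarjan_init(Tarjan *t, int count) {
    assert(count >= 0 && count <= MAX_NODES);
    t->count = count;
    t->stack_size = 0;
    t->next_index = 0;
    t->scc_count = 0;
    for (int i = 0; i < MAX_NODES; i++) {
        t->index[i] = -1;
        t->low[i] = -1;
        t->on_stack[i] = false;
        t->scc[i] = -1;
        for (int j = 0; j < MAX_NODES; j++)
            t->edge[i][j] = false;
    }
}

static void strong_connect(Tarjan *t, int v) {
    t->index[v] = t->low[v] = t->next_index++;
    t->stack[t->stack_size++] = v;
    t->on_stack[v] = true;

    for (int w = 0; w < t->count; w++) {
        if (!t->edge[v][w])
            continue;
        if (t->index[w] < 0) {
            /* Tree edge: visit w first, then inherit its low value. */
            strong_connect(t, w);
            if (t->low[w] < t->low[v])
                t->low[v] = t->low[w];
        } else if (t->on_stack[w] && t->index[w] < t->low[v]) {
            /* Back edge into the current stack: v and w share an SCC. */
            t->low[v] = t->index[w];
        }
    }

    /* v is the root of an SCC: pop every member off the stack. */
    if (t->low[v] == t->index[v]) {
        int w;
        do {
            w = t->stack[--t->stack_size];
            t->on_stack[w] = false;
            t->scc[w] = t->scc_count;
        } while (w != v);
        t->scc_count++;
    }
}

/* Runs the algorithm over all nodes and returns the number of SCCs found. */
static int tarjan_run(Tarjan *t) {
    for (int v = 0; v < t->count; v++)
        if (t->index[v] < 0)
            strong_connect(t, v);
    return t->scc_count;
}
```

With two 2-object cycles where the first points to the second and the second points to a single extra object, tarjan_run reports three SCCs, which mirrors the "cycle pointing to another cycle pointing to a bridge" shape discussed in this PR.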

The key part of the log is that there are several cycles in the object graph (strongly connected components, or SCCs, in terms of how the graph is processed). The inner SCC is processed correctly: you can see the "add color X to color_merge_array" messages, which would later produce xrefs in the final graph. These are eventually transferred to the node's xrefs. Once the algorithm analyzes the next SCC in the graph, it should add the collected xrefs there. This never happened because this other SCC didn't have any bridge objects by itself; it had only regular .NET objects and a previously processed SCC that pointed to a bridge object. Since the previously processed SCC was not considered a "bridge object", color data for the outer SCC was never allocated and therefore never merged with the already gathered xrefs, leading to information loss in the case where color_data == NULL && dyn_array_ptr_size (&other->xrefs) > 0.
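
To make the failing decision concrete, here is a deliberately tiny model of it. This is a sketch under stated assumptions, not the runtime's code and not necessarily the change made in this PR: SccSummary, needs_color_buggy, needs_color_fixed and surviving_xrefs are hypothetical names, and the "fixed" rule is just one way to express "do not drop xrefs that were already collected".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one SCC at the moment it is being closed. */
typedef struct {
    bool has_bridge;          /* the SCC itself contains a bridge object */
    size_t collected_xrefs;   /* colors already gathered from finished SCCs */
} SccSummary;

/* Rule as described in the bug report: color only SCCs with their own bridge. */
static bool needs_color_buggy(const SccSummary *scc) {
    return scc->has_bridge;
}

/* Assumed corrected rule: also color SCCs that carry collected xrefs. */
static bool needs_color_fixed(const SccSummary *scc) {
    return scc->has_bridge || scc->collected_xrefs > 0;
}

/* Number of collected xrefs that survive closing the SCC under a given rule.
 * Without color data there is nowhere to merge them, so they are lost. */
static size_t surviving_xrefs(const SccSummary *scc,
                              bool (*needs_color)(const SccSummary *)) {
    assert(scc != NULL && needs_color != NULL);
    return needs_color(scc) ? scc->collected_xrefs : 0;
}
```

For the outer SCC from the log (no bridge of its own, one color collected through the inner SCC), the first rule keeps zero xrefs and the second keeps one.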

@filipnavara
Member Author

For those who are more visual and can bear my handscribbling:

image

Yellow are bridge objects. Blue is the first SCC in the graph; it points to a bridge object itself. Green is the second SCC in the graph. It points to no bridge objects by itself, but it does contain the blue SCC, which does. When processing the green SCC, the algorithm erroneously decided that since there are no bridge objects it doesn't need to be colored, and threw away the xrefs calculated from the blue SCC.
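
The expected result can be checked independently of the coloring logic. The sketch below condenses the SCCs from the log into a small graph and counts bridge-to-bridge xrefs by walking from each bridge SCC through non-bridge SCCs only. It is an illustrative model with hypothetical names (SccGraph, visit, count_bridge_xrefs, graph_from_log); the node numbering is an assumption made for this example, and objects the log marks as alive or opaque are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SCC 16

/* Hypothetical condensed graph: one node per SCC. */
typedef struct {
    int count;
    bool is_bridge[MAX_SCC];
    bool edge[MAX_SCC][MAX_SCC];   /* edge[a][b]: SCC a references SCC b */
} SccGraph;

/* Marks everything reachable from 'from', stopping at bridge SCCs. */
static void visit(const SccGraph *g, int from, bool seen[MAX_SCC]) {
    for (int to = 0; to < g->count; to++) {
        if (!g->edge[from][to] || seen[to])
            continue;
        seen[to] = true;
        if (!g->is_bridge[to])
            visit(g, to, seen);
    }
}

/* Counts pairs (src, dst) of distinct bridge SCCs where dst is reachable
 * from src through non-bridge SCCs only. */
static int count_bridge_xrefs(const SccGraph *g) {
    assert(g->count >= 0 && g->count <= MAX_SCC);
    int total = 0;
    for (int src = 0; src < g->count; src++) {
        if (!g->is_bridge[src])
            continue;
        bool seen[MAX_SCC] = { false };
        visit(g, src, seen);
        for (int dst = 0; dst < g->count; dst++)
            if (dst != src && seen[dst] && g->is_bridge[dst])
                total++;
    }
    return total;
}

/* SCCs taken from the log:
 * 0 ByteArrayOutputStream (bridge)   1 RunnableImplementor (bridge)
 * 2 Action                           3 <>c__DisplayClass2_0
 * 4 {Action, AsyncStateMachineBox`1, <AsyncStreamWriter>d__2}
 * 5 OutputStreamInvoker              6 SemaphoreSlim
 * 7 MainActivity (bridge)            8 Looper (bridge) */
static SccGraph graph_from_log(void) {
    SccGraph g;
    memset(&g, 0, sizeof g);
    g.count = 9;
    g.is_bridge[0] = g.is_bridge[1] = true;
    g.is_bridge[7] = g.is_bridge[8] = true;
    g.edge[1][2] = true;
    g.edge[2][3] = true;
    g.edge[3][4] = true;
    g.edge[4][0] = true;
    g.edge[4][5] = true;
    g.edge[5][6] = true;
    g.edge[5][0] = true;
    return g;
}
```

On this graph the count is one xref, from RunnableImplementor to ByteArrayOutputStream, whereas the log above ends with TOTAL XREFS 0.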

@simonrozsival
Member

/cc @BrzVlad

@filipnavara filipnavara changed the title Fix an edge case in the Tarjan SCC that lead to losing xref information Fix an edge case in the Tarjan SCC that leads to losing xref information Feb 23, 2025
@filipnavara filipnavara changed the title Fix an edge case in the Tarjan SCC that leads to losing xref information Fix an edge case in the Tarjan GC bridge that leads to losing xref information Feb 23, 2025
Contributor

Tagging subscribers to this area: @BrzVlad
See info in area-owners.md if you want to be subscribed.

Labels
area-GC-mono community-contribution Indicates that the PR has been added by a community member
Development

Successfully merging this pull request may close these issues.

HttpClient ObjectDisposed after SDK upgrade from 34.0.95 -> 34.0.113
3 participants