Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix self-referencing closure memory leaks #161

Draft
wants to merge 2 commits into
base: 4
Choose a base branch
from
Draft

Conversation

diamondburned
Copy link
Owner

This commit experiments with refactoring package core/intern to use
the newly added runtime.AddCleanup and weak.Pointer APIs from Go
1.24.

This commit has been tested against issues #106 and #126 to ensure
stability. It is still not recommended to use this in production
until further testing has been done, however.

Copy link
Contributor

@RSWilli RSWilli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe looking at https://github.com/go-gst/go-glib/blob/main/glib/connect.go can help, the connect implementation there is pretty robust.

Comment on lines +137 to +141
// weak.Pointer's behavior is to be invalidated by the time the
// finalizer is called, so we temporarily resurrect the box so that
// destroy signal handlers can obtain it. We'll purge it from the
// registry after the destroy callbacks.
shared.weak[obj] = weak.Make(box)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a very bad idea, what are you trying to solve here? Maybe I can help.

@RSWilli
Copy link
Contributor

RSWilli commented Feb 23, 2025

An easy fix for #106 and #126 is the following, I don't know if this can be solved any other way:

app := gtk.NewApplication("com.github.diamondburned.gotk4-examples.gtk4.simple", gio.ApplicationFlagsNone)
weakApp := weak.Make(app)
app.ConnectActivate(func() {
	app := weakApp.Value()
	if app == nil {
		return // activated after cleanup?
	}
	window1 := gtk.NewApplicationWindow(app)

	// ...
})

The problem is that the signal handler closes over app (you can see this in delve by stepping into ConnectActivate and looking at the function). This means that the function itself prevents the cleanup of app, but because the app is never cleaned up, the signal handler closure is never removed either, keeping the function alive, which keeps the app alive.

There are two ways to solve this with the current bindings:

  1. never close over the instance or anything that has a (strong) reference to it. This is quite hard and can be easily done accidentally, e.g. with nested structs.
  2. detach the signal handler manually with Disconnect when you are done with the object. This breaks the cycle.

There is no other way of doing this from the bindings, because you cannot know which references an object has that the closure closes over, because these references might be in the C code.

@diamondburned
Copy link
Owner Author

This PR already works for those 2 issues without needing the user to explicitly use weak references themselves.

My main goal with keeping this a draft right now is just to encourage others to test it and report any new bugs/regressions before I go ahead and merge it.

@diamondburned
Copy link
Owner Author

@RSWilli

There is no other way of doing this from the bindings, because you cannot know which references an object has that the closure closes over, because these references might be in the C code.

There actually is!

gotk4 accomplishes this by using Go's weak pointers and GLib's toggle references to be able to detect when to appropriately free objects for this reason. Specifically:

  1. When the object is first allocated, in most cases, the caller is assumed to be the owner, so the object is referenced from the Go side. The toggle reference callback tells us this.
    • At this point, the object is stored in Go as a weak pointer. More on this later [1].
  2. When the object is shared to the C side (e.g., a box is being displayed), C takes its own reference, and the toggle reference callback updates us.
    • At this point, the object is stored in Go as a strong pointer.
    • The object's signal callbacks are stored within the object itself rather than globally stored on its own.
  3. When a signal is emitted, gotk4 does a lookup on the associated object to find the callback to be invoked.
  4. When the object is no longer needed, C stops taking references to it. This drops the reference count back to 1 and the toggle reference callback updates us. The object will be stored once again as a weak pointer.
  5. At this point, Go will also notify us as soon as no other Go-side references are made to the object, as it is kept as a weak pointer [1]. Once this assurance has been made, we can start tearing down our reference as usual. [2]
  6. Once our reference has been torn down, GLib will start calling all the finalizers (including widget destroy signal callbacks), then free up the object.

[2]: This should answer your code review for shared.weak[obj] = weak.Make(box)!

@diamondburned
Copy link
Owner Author

I believe GJS also uses toggle references for this reason.

This commit experiments with refactoring package `core/intern` to use
the newly added `runtime.AddCleanup` and `weak.Pointer` APIs from Go
1.24.

This commit has been tested against issues #106 and #126 to ensure
stability. It is still not recommended to use this in production
until further testing has been done, however.
@diamondburned
Copy link
Owner Author

Also side note:

Specifically for issue #106, referencing window inside the destroy signal callback will not work. This makes sense, as it's meant to be a destruction callback, so you may as well assume that window is an invalid object by that point. It also only applies to the destroy signal itself.

@RSWilli
Copy link
Contributor

RSWilli commented Feb 24, 2025

Go will also notify us as soon as no other Go-side references are made to the object, as it is kept as a weak pointer

How would this work in the following use case? It is using gstreamer elements, because I'm not familiar with gtk.

el := gst.ElementFactoryMake("audiotestsrc", "").(*gst.Element)

el.ConnectPadAdded(func(pad *gst.Pad) {
    el.GetObjectProperty("")
})

runtime.GC()

// potentially more references to el after connecting
el.SetObjectProperty("", "")
  1. There is only one reference on el, which is controlled by the go GC.
  2. The only reference to the func is in the bindings in the closure box.
  3. the element may be used after connecting a signal, but not guaranteed

We have some places to break the cycle here:

  1. in the element itself: reference the finalized struct with a weak pointer. This means that we either need a second reference somewhere else, or we risk premature cleanup
  2. in the box, where the closure is referenced: this would mean that the GC will clean up the function, resulting in an error when the signal is finally emitted.

The thing with this example is that there is no additional reference taken. It is just the closure that prevents the GC of the element.

IMO doing anything else with weak pointers in the bindings will not work in all cases and will potentially introduce hard to find bugs.

@diamondburned
Copy link
Owner Author

The thing with this example is that there is no additional reference taken. It is just the closure that prevents the GC of the element.

Then what's the point of this object existing? It would have to make it over to the C side eventually, because the core logic of everything is in C, so if it's not, then what is it doing by just being on the Go side?

If el is used after the signal is bound but only on the Go side, then as soon as it is not being used anymore, Go's GC will run the finalizers needed and free up the object. This works the same as what I wrote, minus steps 2 to 5, pretty much.

@RSWilli
Copy link
Contributor

RSWilli commented Feb 24, 2025

Then what's the point of this object existing?

The object can still create threads and do stuff on the c side. Toplevel structs like pipelines in gstreamer are owned by the user and do everything in C, but the only reference there is exists in go. That means that this will leak if we create a reference cycle, but be prematurely cleaned up if we drop the reference.

I talked to @sdroege about this and he thinks that toggle refs will not solve the problem and potentially create more issues. Quoting him:

  1. toggle refs are not composable, meaning that if gotk4 calls into code created by other bindings using toggle refs it will break. This is definitely a use case with gstreamer plugins.
  2. There are more usecases where you can accidentally create a cycle without knowing it. E.g. referencing an object that has a reference to the object where the closure is attached, e.g. with callbacks (easily reproducable with signals)

His pseudo example:

pipeline.bus().add_watch(|bus, msg| {
    pipeline.do_things();
});

The pipeline owns the bus, the bus has a watch and the watch references the pipeline -> cycle! There is no way to detect this from the runtime. Python, JS, and rust bindings also leak in this case. The user has to resort to weak pointers here.

@diamondburned
Copy link
Owner Author

Python, JS, and rust bindings also leak in this case. The user has to resort to weak pointers here.

I think in this case, I would be happy with letting the user handle this for Go as well then. Still, I'm not willing to compromise the ease of use for all of GTK just for a handful of edge cases.

  1. toggle refs are not composable, meaning that if gotk4 calls into code created by other bindings using toggle refs it will break. This is definitely a use case with gstreamer plugins.

This is a fair point! How does GJS and other stuff handle this? They definitely rely on toggle references too, AFAICT.

@sdroege
Copy link

sdroege commented Feb 24, 2025

How does GJS and other stuff handle this?

They don't (and Python AFAIK doesn't even have a GWeakRef binding) but GTK has a hack to make this less of a problem: closing a window disposes all widgets inside it, and that gets rid of all the signal handlers and by that breaks most of the reference cycles.

@RSWilli
Copy link
Contributor

RSWilli commented Feb 24, 2025

Still, I'm not willing to compromise the ease of use for all of GTK just for a handful of edge cases.

I'm not familiar with GTK, but I don't think that these cases are that rare in reality. If you never reference the object itself in the closure, then the current implementation already works fine. I'd expect this to be true for e.g. buttons that perform some action without giving user feedback (to not reference any other object in the tree). But if you want to show feedback then you'd need a handle to your app or similar, which then creates a circular reference again.

In gstreamer the trivial cases also do not produce that issue, but once you get more advanced then you will somehow react to the signal which easily creates reference cycles.

Python AFAIK doesn't even have a GWeakRef binding

The good thing is that since go 1.24 we don't even need new bindings for this, we can just pass a go weak pointer.

@RSWilli
Copy link
Contributor

RSWilli commented Feb 24, 2025

A way to warn the user would be to inspect the closure context for anything that contains a gotk4 internal object. This is not documented at all though, I tried to look at how delve does it, and they are just casting the function to a struct to inspect the internal pointers.

But I think that once you go there you open pandora's box. Doing this at runtime could potentially introduce massive performance penalties.

@diamondburned
Copy link
Owner Author

But if you want to show feedback then you'd need a handle to your app or similar, which then creates a circular reference again.

This case is being handled perfectly fine by this PR, with everything being GC'd as normal.

I believe the edge case here is more the "pipeline owns the bus, the bus has a watch and the watch references the pipeline -> cycle" part though. It would help me tremendously if there's like a directed graph diagram for this sort of thing...

The good thing is that since go 1.24 we don't even need new bindings for this, we can just pass a go weak pointer.

I'm not too sure, actually. Right now, gotk4 explicitly supports GWeakRef for this exact issue, but it does not use Go's weak pointer. The GWeakRef is moreso to obtain a weak pointer to a C object.

@RSWilli
Copy link
Contributor

RSWilli commented Feb 25, 2025

I'm not too sure, actually

As shown above: #161 (comment) , this prevents the closure from closing over the finalized object, it instead closes over the weak pointer, which allows the underlying C object to be cleaned up.

This case is being handled perfectly fine by this PR, with everything being GC'd as normal.

I don't understand how your case is handled if my example gives you headaches. How does your solution prevent something similar to this from being a cycle?

  1. Click handler on a button that triggers an action on widget X
  2. X handles the action, for which it doesn't need any other dependency
  3. but X also has a reference to widget Y for another usecase
  4. Y disables the Button if X triggers the right action for it.

If this is all happening in go then fine, but as soon as one of those components is implemented e.g. in C then there is more than one reference taken, and a leak would be undetectable.

It would help me tremendously if there's like a directed graph diagram for this sort of thing...

That would require traversing the specific implementation of each object. This would require big changes to GObject.

@sdroege
Copy link

sdroege commented Feb 25, 2025

I don't understand how your case is handled if my example gives you headaches. How does your solution prevent something similar to this from being a cycle?
[...]

Once gtk_widget_destroy() / gtk_window_destroy() is called (e.g. when closing the window), the two signal handlers are disconnected, which breaks the reference cycle in two places.

If that wasn't the case with GTK, I don't see either how this wouldn't leak.

@diamondburned
Copy link
Owner Author

diamondburned commented Feb 26, 2025

I don't understand how your case is handled if my example gives you headaches. How does your solution prevent something similar to this from being a cycle?

I think you should just give the PR a try. I think you and I are describing very different edge cases to the same problem, so having reproducible code (or at least a simple directed graph diagram illustrating the problem) would help me a lot in figuring out the right solution to both cases. As it stands, both #106 and #126 work with this PR, and nothing gets leaked.

@diamondburned
Copy link
Owner Author

FWIW, I agree that this PR might not work for GStreamer specifically, and I'd be happy to draft up some kind of opt-out mechanism for GStreamer objects to not be memory-managed in such manner.

In the GTK world, there's not a lot of multi-language interop, though. It's usually just C and Go, maybe Vala.

@RSWilli
Copy link
Contributor

RSWilli commented Feb 26, 2025

I get that it fixes the specific problems, but I find the intern package hard to reason about for several reasons: Objects get resurrected, toggle refs check for GC on the C side, etc.

I have one more question: Who holds a strong reference to the box if it gets moved to the intern weak part? From what I understand, the box holds the go closure that needs to be called in the go marshal function.

I'll experiment a bit with the changes.

@diamondburned
Copy link
Owner Author

Who holds a strong reference to the box if it gets moved to the intern weak part?

If the box is intern-weaked, then that implies that the C side holds no reference to this box. The only other references would just be other Go code that still needs this object. Once that's no longer true, the box gets freed.

@RSWilli
Copy link
Contributor

RSWilli commented Feb 26, 2025

Wait - the box holds the object, not the closure?

The closure is saved in a global Slab here: https://github.com/diamondburned/gotk4/blob/4/pkg/core/gbox/box.go#L88

A slab doesn't have any weak pointers - because it mustn't have any. Meaning that you always have a strong reference to to the closure.

If the closure closes over the object, then the object is strongly referenced too, meaning that it will never leave go's heap. That means that you still easily leak the object, because it is always alive on the go heap. I tried this branch with the gst bindings, and did the following:

package main

import (
	"log"
	"runtime"
	"time"

	"github.com/go-gst/go-gst/pkg/gst"
)

func mkPipeline() *gst.Pipeline {
	ret, err := gst.ParseLaunch("audiotestsrc ! decodebin name=decodebin0 ! fakesink")

	if err != nil {
		panic(err)
	}

	pipeline := ret.(*gst.Pipeline)

	el := pipeline.ByName("decodebin0").(*gst.Bin)

	h := el.ConnectPadAdded(func(newPad *gst.Pad) {
		el.SetName("foobar") // dummy method, to make the handler close over el
	})

	runtime.AddCleanup(el, func(_ struct{}) {
		log.Println("garbage collected element")
	}, struct{}{})

	// runtime.AddCleanup(pipeline, func(el *gst.Bin) {
	// 	el.HandlerDisconnect(h)
	// }, el)

	return pipeline
}

func main() {
	gst.Init()

	for range 100 {
		pipeline := mkPipeline()

		runtime.GC()

		pipeline.State(gst.ClockTimeNone)

		runtime.GC()
		runtime.GC()
		runtime.GC()
		runtime.GC()
	}

	runtime.GC()
	time.Sleep(2 * time.Second)
	runtime.GC()
}

This creates a new pipeline, retrieves an object from it, attaches a signal handler and adds a cleanup to the object, where it should log "garbage collected element". This line never happens, because of the reference cycle. el is closed by the closure and thus always strongly referenced.

There are two ways to break the cycle here:

  • either use a weak pointer to access el in the signal handler
  • or disconnect the signal handler on el when pipeline gets cleaned up (commented out above)

In both cases the log line appears. I don't think that this PR is doing what you want to @diamondburned , and I don't know if it is even possible.

@diamondburned
Copy link
Owner Author

Wait - the box holds the object, not the closure?

The closure is saved in a global Slab here: https://github.com/diamondburned/gotk4/blob/4/pkg/core/gbox/box.go#L88

This isn't true.

The box holds both the object and the closures. See intern.Box type.

gbox is only used to save weak pointers for recalling auxiliary data.

This line never happens, because of the reference cycle. el is closed by the closure and thus always strongly referenced.

You'd want to enable log/slog with debug levels then run the program with GOTK4_DEBUG=all, then watch the allocations.

In both cases the log line appears. I don't think that this PR is doing what you want to @diamondburned , and I don't know if it is even possible.

-_-

Why are you acting like I've not done any testing myself? Have you even bothered looking into #126 and running it with this PR? I never claimed it'll work for GStreamer.

@RSWilli
Copy link
Contributor

RSWilli commented Feb 26, 2025

You'd want to enable log/slog with debug levels then run the program with GOTK4_DEBUG=all, then watch the allocations.

Relevant log section:

2025/02/26 17:28:10 DEBUG allocating new box for object stack="goroutine 1 [running]:\nruntime/debug.Stack()\n\t/nix/store/vh5d5bj1sljdhdypy80x1ydx2jx6rv2q-go-1.24.0/share/go/src/runtime/debug/stack.go:26 +0x5e\ngithub.com/diamondburned/gotk4/pkg/core/intern.newBox(0x73f658004050)\n\t/home/wbartel/projects/go-gst/gotk4/pkg/core/intern/intern.go:173 +0xb0\ngithub.com/diamondburned/gotk4/pkg/core/intern.Get(0x73f658004050, 0x0)\n\t/home/wbartel/projects/go-gst/gotk4/pkg/core/intern/intern.go:230 +0x69\ngithub.com/diamondburned/gotk4/pkg/core/glib.newObject(...)\n\t/home/wbartel/projects/go-gst/gotk4/pkg/core/glib/glib.go:637\ngithub.com/diamondburned/gotk4/pkg/core/glib.AssumeOwnership(...)\n\t/home/wbartel/projects/go-gst/gotk4/pkg/core/glib/glib.go:627\ngithub.com/go-gst/go-gst/pkg/gst.(*Bin).ByName(0xc0000e6030, {0x64a5fb, 0xa})\n\t/home/wbartel/projects/go-gst/go-gst/pkg/gst/gst.go:14143 +0x117\nmain.mkPipeline()\n\t/home/wbartel/projects/go-gst/go-gst/examples/signals/main.go:21 +0x59\nmain.main()\n\t/home/wbartel/projects/go-gst/go-gst/examples/signals/main.go:54 +0x24f\n" gobject.ptr=0x73f658004050 gobject.type=GstDecodeBin gobject.refs=2
2025/02/26 17:28:10 DEBUG Get: will introduce new box for object gobject.ptr=0x73f658004050 gobject.type=GstDecodeBin gobject.refs=2
2025/02/26 17:28:10 DEBUG Get: added toggle reference to object gobject.ptr=0x73f658004050 gobject.type=GstDecodeBin gobject.refs=3
2025/02/26 17:28:10 DEBUG Get: not taking, so unref'd the object gobject.ptr=0x73f658004050 gobject.type=GstDecodeBin gobject.refs=2
2025/02/26 17:28:10 DEBUG cleaning up object box=true gobject.ptr=0x73f658007e50 gobject.type=GstPipeline gobject.refs=1
2025/02/26 17:28:10 DEBUG makeWeak: obtained box strong=true box=true gobject.ptr=0x73f658004050 gobject.type=GstDecodeBin gobject.refs=1
2025/02/26 17:28:10 DEBUG goToggleNotify: toggled notify for object is_last=true finalizing=false gobject.ptr=0x73f658004050 gobject.type=GstDecodeBin gobject.refs=1
2025/02/26 17:28:10 DEBUG post-finalizer aftermath for box is_weak=true is_strong=false gobject.ptr=0x73f658007e50 gobject.type=GstPipeline gobject.refs=1

I think the reference cycle happens in the box itself? I'm not sure, but I can definitely say that the objects are not getting cleaned up and are still being referenced strongly by go.

I never claimed it'll work for GStreamer.

Sorry if it sounded rude, but what are you trying to solve here? A bug in the GTK bindings or a bug in the GLib bindings?

I have not looked into the original issue, because I don't know GTK. Using the gst bindings is an easy way for me to attach a signal handler that does not get detached by any side effects. The code above doesn't do anything else besides creating new object instances and attach some signal handlers. In fact it can be simplified even more:

package main

import (
	"log"
	"log/slog"
	"runtime"
	"time"

	"github.com/go-gst/go-gst/pkg/gst"
)

func mkObject() {
	el := gst.ElementFactoryMake("decodebin", "").(*gst.Bin)

	el.ConnectPadAdded(func(newPad *gst.Pad) {
		el.SetName("foobar")
		// log.Println("handler")
	})

	runtime.AddCleanup(el, func(_ struct{}) {
		log.Println("garbage collected element")
	}, struct{}{})

	return
}

func main() {
	gst.Init()

	slog.SetLogLoggerLevel(slog.LevelDebug)

	log.Println("go:")
	for range 5 {
		log.Println("loop")
		mkObject()

		runtime.GC()
		runtime.GC()
		runtime.GC()
	}

	runtime.GC()
	time.Sleep(2 * time.Second)
	runtime.GC()
}

@RSWilli
Copy link
Contributor

RSWilli commented Feb 26, 2025

I think to the go GC the reference "cycle" looks as follows:

  • the intern.strong map contains the box with the c object pointer and closure.Registry. Since this is a global map everything in it is always reachable.
  • the closures.Registry contains all the FuncStacks, aka stack traces and actual closures
  • the closure in the FuncStack references the go object if the variable is used inside of it.

Thus the GC always thinks that the go object is always reachable, even if no other scope is able to call the closure. It cannot delete the map entry, because at any time the _gotk4_marshal might decide to call the closure (it wont, but the GC doesn't know this)

@RSWilli
Copy link
Contributor

RSWilli commented Feb 26, 2025

For GTK the cycle is automatically broken if the signal handler is removed, like @sdroege said above:

Once gtk_widget_destroy() / gtk_window_destroy() is called (e.g. when closing the window), the two signal handlers are disconnected, which breaks the reference cycle in two places.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants