-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make ExtensionManagerServer.Shutdown
idempotent
#117
Conversation
} | ||
s.serverClient.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to close ?
(s.serverClient.Close()
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is it closed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh. This line looks like I should have removed it in #112
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
} | ||
s.serverClient.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh. This line looks like I should have removed it in #112
s.mutex.Lock() | ||
serverClient := s.serverClient | ||
s.mutex.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not totally sure I understand why this helps. I think it's so that this routine can happen at the same time some other thread might have called shutdown (and set s.serverClient
to nil), but I'm not totally sure I see it.
Is this about a narrow race where shutdown happens between if serverClient == nil
and s.serverClient.Ping()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's narrow.
Here's the race condition if we don't add the locks (it's between the read serverClient := s.serverClient
and the write s.serverClient = nil
in Shutdown
):
=== RUN TestNoDeadlockOnError
==================
WARNING: DATA RACE
Write at 0x00c00010f610 by goroutine 14:
github.com/osquery/osquery-go.(*ExtensionManagerServer).Shutdown()
/Users/luk/fleetdm/git/forks/osquery-go/server.go:352 +0x3d2
github.com/osquery/osquery-go.(*ExtensionManagerServer).Run()
/Users/luk/fleetdm/git/forks/osquery-go/server.go:279 +0x1d0
github.com/osquery/osquery-go.TestNoDeadlockOnError()
/Users/luk/fleetdm/git/forks/osquery-go/server_test.go:57 +0x55c
testing.tRunner()
/Users/luk/go/src/testing/testing.go:1595 +0x238
testing.(*T).Run.func1()
/Users/luk/go/src/testing/testing.go:1648 +0x44
Previous read at 0x00c00010f610 by goroutine 16:
github.com/osquery/osquery-go.(*ExtensionManagerServer).Run.func2()
/Users/luk/fleetdm/git/forks/osquery-go/server.go:259 +0x64
Goroutine 14 (running) created at:
testing.(*T).Run()
/Users/luk/go/src/testing/testing.go:1648 +0x82a
testing.runTests.func1()
/Users/luk/go/src/testing/testing.go:2054 +0x84
testing.tRunner()
/Users/luk/go/src/testing/testing.go:1595 +0x238
testing.runTests()
/Users/luk/go/src/testing/testing.go:2052 +0x896
testing.(*M).Run()
/Users/luk/go/src/testing/testing.go:1925 +0xb57
main.main()
_testmain.go:69 +0x2bd
Goroutine 16 (running) created at:
github.com/osquery/osquery-go.(*ExtensionManagerServer).Run()
/Users/luk/fleetdm/git/forks/osquery-go/server.go:255 +0x18c
github.com/osquery/osquery-go.TestNoDeadlockOnError()
/Users/luk/fleetdm/git/forks/osquery-go/server_test.go:57 +0x55c
testing.tRunner()
/Users/luk/go/src/testing/testing.go:1595 +0x238
testing.(*T).Run.func1()
/Users/luk/go/src/testing/testing.go:1648 +0x44
==================
testing.go:1465: race detected during execution of test
--- FAIL: TestNoDeadlockOnError (0.00s)
…#15017) #15022 The issue in the package is being fixed here osquery/osquery-go#117 But to not block on that we will downgrade the osquery-go version we use. - ~[ ] Changes file added for user-visible changes in `changes/` or `orbit/changes/`. See [Changes files](https://fleetdm.com/docs/contributing/committing-changes#changes-files) for more information.~ - ~[ ] Documented any permissions changes (docs/Using Fleet/manage-access.md)~ - ~[ ] Input data is properly validated, `SELECT *` is avoided, SQL injection is prevented (using placeholders for values in statements)~ - ~[ ] Added support on fleet's osquery simulator `cmd/osquery-perf` for new osquery data ingestion features.~ - ~[ ] Added/updated tests~ - [ ] Manual QA for all new/changed functionality - ~For Orbit and Fleet Desktop changes:~ - [ ] Manual QA must be performed in the three main OSs, macOS, Windows and Linux. - [ ] Auto-update manual QA, from released version of component to new version (see [tools/tuf/test](../tools/tuf/test/README.md)).
The change in dbeefc0 causes nil dereference panics when calling
ExtensionManagerServer.Shutdown
more than once.Here's a stack trace of the panic without the fix: