Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ExtensionManagerServer.Shutdown idempotent #117

Merged
merged 4 commits into from
Nov 8, 2023

Conversation

lucasmrod
Copy link
Contributor

@lucasmrod lucasmrod commented Nov 7, 2023

The change in dbeefc0 causes nil dereference panics when calling ExtensionManagerServer.Shutdown more than once.

Here's a stack trace of the panic without the fix:

=== RUN   TestShutdownBasic
--- FAIL: TestShutdownBasic (0.02s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x11ea684]

goroutine 6 [running]:
testing.tRunner.func1.2({0x121b320, 0x1438880})
        /Users/luk/go/src/testing/testing.go:1545 +0x238
testing.tRunner.func1()
        /Users/luk/go/src/testing/testing.go:1548 +0x397
panic({0x121b320?, 0x1438880?})
        /Users/luk/go/src/runtime/panic.go:914 +0x21f
github.com/osquery/osquery-go.(*ExtensionManagerServer).Shutdown(0xc00010ef00, {0xc0001544e0?, 0x0?})
        /Users/luk/fleetdm/git/forks/osquery-go/server.go:318 +0xa4
github.com/osquery/osquery-go.TestShutdownBasic(0xc0001544e0)
        /Users/luk/fleetdm/git/forks/osquery-go/server_test.go:218 +0x345
testing.tRunner(0xc0001544e0, 0x12888b0)
        /Users/luk/go/src/testing/testing.go:1595 +0xff
created by testing.(*T).Run in goroutine 1
        /Users/luk/go/src/testing/testing.go:1648 +0x3ad
exit status 2
FAIL    github.com/osquery/osquery-go   0.683s

}
s.serverClient.Close()
Copy link

@sharon-fdm sharon-fdm Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to close ?
(s.serverClient.Close())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is it closed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. This line looks like I should have removed it in #112

Copy link

@sharon-fdm sharon-fdm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}
s.serverClient.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. This line looks like I should have removed it in #112

Comment on lines +258 to +260
s.mutex.Lock()
serverClient := s.serverClient
s.mutex.Unlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not totally sure I understand why this helps. I think it's so that this routine can happen at the same time some other thread might have called shutdown (and set s.serverClient to nil), but I'm not totally sure I see it.

Is this about a narrow race where shutdown happens between if serverClient == nil and s.serverClient.Ping() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's narrow.

Here's the race condition if we don't add the locks (it's between the read serverClient := s.serverClient and the write s.serverClient = nil in Shutdown):

=== RUN   TestNoDeadlockOnError
==================
WARNING: DATA RACE
Write at 0x00c00010f610 by goroutine 14:
  github.com/osquery/osquery-go.(*ExtensionManagerServer).Shutdown()
      /Users/luk/fleetdm/git/forks/osquery-go/server.go:352 +0x3d2
  github.com/osquery/osquery-go.(*ExtensionManagerServer).Run()
      /Users/luk/fleetdm/git/forks/osquery-go/server.go:279 +0x1d0
  github.com/osquery/osquery-go.TestNoDeadlockOnError()
      /Users/luk/fleetdm/git/forks/osquery-go/server_test.go:57 +0x55c
  testing.tRunner()
      /Users/luk/go/src/testing/testing.go:1595 +0x238
  testing.(*T).Run.func1()
      /Users/luk/go/src/testing/testing.go:1648 +0x44

Previous read at 0x00c00010f610 by goroutine 16:
  github.com/osquery/osquery-go.(*ExtensionManagerServer).Run.func2()
      /Users/luk/fleetdm/git/forks/osquery-go/server.go:259 +0x64

Goroutine 14 (running) created at:
  testing.(*T).Run()
      /Users/luk/go/src/testing/testing.go:1648 +0x82a
  testing.runTests.func1()
      /Users/luk/go/src/testing/testing.go:2054 +0x84
  testing.tRunner()
      /Users/luk/go/src/testing/testing.go:1595 +0x238
  testing.runTests()
      /Users/luk/go/src/testing/testing.go:2052 +0x896
  testing.(*M).Run()
      /Users/luk/go/src/testing/testing.go:1925 +0xb57
  main.main()
      _testmain.go:69 +0x2bd

Goroutine 16 (running) created at:
  github.com/osquery/osquery-go.(*ExtensionManagerServer).Run()
      /Users/luk/fleetdm/git/forks/osquery-go/server.go:255 +0x18c
  github.com/osquery/osquery-go.TestNoDeadlockOnError()
      /Users/luk/fleetdm/git/forks/osquery-go/server_test.go:57 +0x55c
  testing.tRunner()
      /Users/luk/go/src/testing/testing.go:1595 +0x238
  testing.(*T).Run.func1()
      /Users/luk/go/src/testing/testing.go:1648 +0x44
==================
    testing.go:1465: race detected during execution of test
--- FAIL: TestNoDeadlockOnError (0.00s)

@lucasmrod lucasmrod merged commit e3cde12 into osquery:master Nov 8, 2023
3 checks passed
lucasmrod added a commit to fleetdm/fleet that referenced this pull request Nov 13, 2023
…#15017)

#15022

The issue in the package is being fixed here
osquery/osquery-go#117
But to not block on that we will downgrade the osquery-go version we
use.

- ~[ ] Changes file added for user-visible changes in `changes/` or
`orbit/changes/`.
See [Changes
files](https://fleetdm.com/docs/contributing/committing-changes#changes-files)
for more information.~
- ~[ ] Documented any permissions changes (docs/Using
Fleet/manage-access.md)~
- ~[ ] Input data is properly validated, `SELECT *` is avoided, SQL
injection is prevented (using placeholders for values in statements)~
- ~[ ] Added support on fleet's osquery simulator `cmd/osquery-perf` for
new osquery data ingestion features.~
- ~[ ] Added/updated tests~
- [ ] Manual QA for all new/changed functionality
  - ~For Orbit and Fleet Desktop changes:~
- [ ] Manual QA must be performed in the three main OSs, macOS, Windows
and Linux.
- [ ] Auto-update manual QA, from released version of component to new
version (see [tools/tuf/test](../tools/tuf/test/README.md)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants