bagpipe: Store a crossbeam GC internally, don't use global singleton #165

Open: joshlf wants to merge 1 commit into master from crossbeam-epoch

joshlf (Collaborator) commented Mar 7, 2018

  • Store a crossbeam GC collector in the BagPipe
  • Use this GC instance for pinning, eliminating the dependency on the global singleton, and thus on thread-local storage
  • Remove bsalloc dependency from elfmalloc and move it into elfc so that running elfmalloc tests is faster
  • Run elfmalloc tests with opt-level=3 in Travis
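
A minimal, std-only sketch of the ownership change in the first two bullets. It does not use crossbeam-epoch: `Collector`, `Handle`, `Guard`, and this cut-down `BagPipe` are hypothetical stand-ins that only model registration and pinning (nothing is ever reclaimed). The point is that each `BagPipe` carries a handle to its own collector, so pinning needs neither a global singleton nor a thread-local lookup, and a clone registers a new participant with the same collector.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical stand-in for an epoch GC collector (NOT the crossbeam-epoch
// API): it tracks a global epoch and the number of registered participants,
// and it never reclaims memory.
#[derive(Default)]
struct Collector {
    epoch: AtomicUsize,
    participants: AtomicUsize,
}

// A registered participant; it keeps its collector alive through the Arc.
struct Handle {
    collector: Arc<Collector>,
}

// While a Guard is alive, its owner is considered pinned at `epoch`.
struct Guard<'a> {
    _handle: &'a Handle,
    epoch: usize,
}

impl Collector {
    fn register(this: &Arc<Collector>) -> Handle {
        this.participants.fetch_add(1, Ordering::Relaxed);
        Handle { collector: Arc::clone(this) }
    }
}

impl Handle {
    fn pin(&self) -> Guard<'_> {
        let epoch = self.collector.epoch.load(Ordering::Acquire);
        Guard { _handle: self, epoch }
    }
}

impl Drop for Handle {
    // Unregister when the participant goes away.
    fn drop(&mut self) {
        self.collector.participants.fetch_sub(1, Ordering::Relaxed);
    }
}

// Hypothetical BagPipe-like type: it stores its own Handle, so pinning goes
// through this instance's collector instead of a process-wide one.
struct BagPipe {
    handle: Handle,
}

impl BagPipe {
    fn new() -> Self {
        let collector = Arc::new(Collector::default());
        BagPipe { handle: Collector::register(&collector) }
    }

    // Run `f` while pinned through this instance's own handle.
    fn with_pinned<R>(&self, f: impl FnOnce(&Guard) -> R) -> R {
        let guard = self.handle.pin();
        f(&guard)
    }

    fn participants(&self) -> usize {
        self.handle.collector.participants.load(Ordering::Relaxed)
    }
}

impl Clone for BagPipe {
    // A clone registers a new participant with the same collector; in a real
    // collector this registration step is where an allocation can happen.
    fn clone(&self) -> Self {
        BagPipe { handle: Collector::register(&self.handle.collector) }
    }
}

fn main() {
    let a = BagPipe::new();
    let b = a.clone();
    let c = BagPipe::new(); // independent collector, no shared global state
    println!("{} {} {}", a.participants(), b.participants(), c.participants());
    println!("pinned at epoch {}", a.with_pinned(|g| g.epoch));
}
```

In crossbeam-epoch itself, `Collector::new`, `Collector::register`, and `LocalHandle::pin` cover these roles; the sketch avoids the crate dependency only so it compiles on its own.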

joshlf requested a review from ezrosent on March 7, 2018 13:13
joshlf force-pushed the crossbeam-epoch branch from f4eaa34 to 0b35be4 on March 7, 2018 17:45
joshlf (Collaborator, Author) commented Mar 8, 2018

Performance numbers (on a 2015 model MacBook Pro):

Without this PR
Enqueue-Dequeue Strong No Prefilling

bp-YangCrummeyQueue: 4 threads, 1.9519816599623547 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
bp-FAAQueueLowLevel: 4 threads, 1.9508903373507258 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
FAAQueueLowLevel: 4 threads, 1.5325428922878377 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
FAAArrayQueue: 4 threads, 1.3955960059034025 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
YangCrummeyQueue: 4 threads, 1.5944080400390173 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
GeneralYC: 4 threads, 1.3022460408411143 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
TreiberStack: 4 threads, 1.1958433627164262 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
MsQueue: 4 threads, 1.3739646670334456 Mops/s 0 failed pushes 0 failed pops. Prefilled 0

Enqueue-Dequeue No Prefilling

bp-YangCrummeyQueue: 4 threads, 1.9126278783385056 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
bp-FAAQueueLowLevel: 4 threads, 2.0073112712600327 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
FAAQueueLowLevel: 4 threads, 2.8624629592473236 Mops/s 138 failed pushes 136 failed pops. Prefilled 0
FAAArrayQueue: 4 threads, 2.1795510159328493 Mops/s 139 failed pushes 138 failed pops. Prefilled 0
YangCrummeyQueue: 4 threads, 2.3053213667575205 Mops/s 190 failed pushes 190 failed pops. Prefilled 0
GeneralYC: 4 threads, 2.074998794124231 Mops/s 142 failed pushes 143 failed pops. Prefilled 0
TreiberStack: 4 threads, 1.9188421537327673 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
MsQueue: 4 threads, 1.87238776420338 Mops/s 0 failed pushes 0 failed pops. Prefilled 0

Enqueue-Dequeue Prefilling

bp-YangCrummeyQueue: 4 threads, 1.4954125643700105 Mops/s 0 failed pushes 0 failed pops. Prefilled 1024
bp-FAAQueueLowLevel: 4 threads, 1.6015368140624138 Mops/s 0 failed pushes 0 failed pops. Prefilled 1024
FAAQueueLowLevel: 4 threads, 2.442552066439123 Mops/s 21 failed pushes 21 failed pops. Prefilled 1024
FAAArrayQueue: 4 threads, 2.2718775593331975 Mops/s 28 failed pushes 28 failed pops. Prefilled 1024
YangCrummeyQueue: 4 threads, 2.388560569454438 Mops/s 76 failed pushes 76 failed pops. Prefilled 1024
GeneralYC: 4 threads, 2.1810759871006558 Mops/s 56 failed pushes 56 failed pops. Prefilled 1024

Producer-Consumer Strong No Prefilling

bp-YangCrummeyQueue: 4 threads, 2.1998970938182336 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
bp-FAAQueueLowLevel: 4 threads, 2.2791934609168933 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
FAAQueueLowLevel: 4 threads, 2.1494337518634508 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
FAAArrayQueue: 4 threads, 1.8978158600478086 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
YangCrummeyQueue: 4 threads, 1.9943881585485483 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
GeneralYC: 4 threads, 1.7031150453258077 Mops/s 0 failed pushes 0 failed pops. Prefilled 0

Producer-Consumer No Prefilling

bp-YangCrummeyQueue: 4 threads, 3.0048192546668395 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
bp-FAAQueueLowLevel: 4 threads, 3.1721368321619563 Mops/s 0 failed pushes 1 failed pops. Prefilled 0
FAAQueueLowLevel: 4 threads, 3.1755784373107208 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
FAAArrayQueue: 4 threads, 3.1496903104104965 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
YangCrummeyQueue: 4 threads, 3.025112964705967 Mops/s 0 failed pushes 27 failed pops. Prefilled 0
GeneralYC: 4 threads, 2.52991403499883 Mops/s 0 failed pushes 0 failed pops. Prefilled 0

Producer-Consumer Prefilling

bp-YangCrummeyQueue: 4 threads, 2.4498323090419944 Mops/s 0 failed pushes 0 failed pops. Prefilled 1024
bp-FAAQueueLowLevel: 4 threads, 2.8042789901583225 Mops/s 0 failed pushes 4 failed pops. Prefilled 1024
FAAQueueLowLevel: 4 threads, 3.207348378892375 Mops/s 0 failed pushes 0 failed pops. Prefilled 1024
FAAArrayQueue: 4 threads, 2.523821730251052 Mops/s 0 failed pushes 0 failed pops. Prefilled 1024
YangCrummeyQueue: 4 threads, 2.6932331802312777 Mops/s 0 failed pushes 494 failed pops. Prefilled 1024
GeneralYC: 4 threads, 3.200029297143223 Mops/s 0 failed pushes 21139 failed pops. Prefilled 1024

With this PR
Enqueue-Dequeue Strong No Prefilling

bp-YangCrummeyQueue: 4 threads, 1.9140855119841864 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
bp-FAAQueueLowLevel: 4 threads, 1.905525236498113 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
FAAQueueLowLevel: 4 threads, 1.984808842311884 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
FAAArrayQueue: 4 threads, 1.820577589984337 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
YangCrummeyQueue: 4 threads, 2.017594224538017 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
GeneralYC: 4 threads, 1.787036784555395 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
TreiberStack: 4 threads, 1.6170679678352484 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
MsQueue: 4 threads, 1.5945829169590078 Mops/s 0 failed pushes 0 failed pops. Prefilled 0

Enqueue-Dequeue No Prefilling

bp-YangCrummeyQueue: 4 threads, 1.9101245430253515 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
bp-FAAQueueLowLevel: 4 threads, 1.8455122520327654 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
FAAQueueLowLevel: 4 threads, 2.0981297284534595 Mops/s 98 failed pushes 98 failed pops. Prefilled 0
FAAArrayQueue: 4 threads, 1.9531951682180342 Mops/s 126 failed pushes 126 failed pops. Prefilled 0
YangCrummeyQueue: 4 threads, 2.1397299513698202 Mops/s 162 failed pushes 163 failed pops. Prefilled 0
GeneralYC: 4 threads, 1.8859149138124263 Mops/s 144 failed pushes 148 failed pops. Prefilled 0
TreiberStack: 4 threads, 1.5826296160458106 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
MsQueue: 4 threads, 1.3828428152362144 Mops/s 0 failed pushes 0 failed pops. Prefilled 0

Enqueue-Dequeue Prefilling

bp-YangCrummeyQueue: 4 threads, 1.9887120087641126 Mops/s 0 failed pushes 0 failed pops. Prefilled 1024
bp-FAAQueueLowLevel: 4 threads, 1.9035677281063597 Mops/s 0 failed pushes 0 failed pops. Prefilled 1024
FAAQueueLowLevel: 4 threads, 1.8953879524605721 Mops/s 26 failed pushes 26 failed pops. Prefilled 1024
FAAArrayQueue: 4 threads, 1.8369257504835048 Mops/s 17 failed pushes 17 failed pops. Prefilled 1024
YangCrummeyQueue: 4 threads, 2.0995160550780176 Mops/s 37 failed pushes 37 failed pops. Prefilled 1024
GeneralYC: 4 threads, 1.7750380617345993 Mops/s 73 failed pushes 74 failed pops. Prefilled 1024

Producer-Consumer Strong No Prefilling

bp-YangCrummeyQueue: 4 threads, 1.7520945076612027 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
bp-FAAQueueLowLevel: 4 threads, 1.6710647464759385 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
FAAQueueLowLevel: 4 threads, 2.1687702144051957 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
FAAArrayQueue: 4 threads, 1.9469605432526738 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
YangCrummeyQueue: 4 threads, 2.1090512672498236 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
GeneralYC: 4 threads, 1.9558943744366253 Mops/s 0 failed pushes 0 failed pops. Prefilled 0

Producer-Consumer No Prefilling

bp-YangCrummeyQueue: 4 threads, 1.881443555625732 Mops/s 2 failed pushes 56 failed pops. Prefilled 0
bp-FAAQueueLowLevel: 4 threads, 1.9786879522960363 Mops/s 0 failed pushes 6 failed pops. Prefilled 0
FAAQueueLowLevel: 4 threads, 2.0510475281387337 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
FAAArrayQueue: 4 threads, 1.924439342342764 Mops/s 0 failed pushes 0 failed pops. Prefilled 0
YangCrummeyQueue: 4 threads, 2.038746004464112 Mops/s 14275 failed pushes 14275 failed pops. Prefilled 0
GeneralYC: 4 threads, 1.9028492126250385 Mops/s 1795 failed pushes 1795 failed pops. Prefilled 0

Producer-Consumer Prefilling

bp-YangCrummeyQueue: 4 threads, 1.617036172875394 Mops/s 0 failed pushes 0 failed pops. Prefilled 1024
bp-FAAQueueLowLevel: 4 threads, 1.8528148242714402 Mops/s 0 failed pushes 7 failed pops. Prefilled 1024
FAAQueueLowLevel: 4 threads, 2.1076459692810503 Mops/s 0 failed pushes 0 failed pops. Prefilled 1024
FAAArrayQueue: 4 threads, 1.6452696434448366 Mops/s 0 failed pushes 0 failed pops. Prefilled 1024
YangCrummeyQueue: 4 threads, 2.050858985548702 Mops/s 0 failed pushes 0 failed pops. Prefilled 1024
GeneralYC: 4 threads, 1.8027921050821814 Mops/s 0 failed pushes 0 failed pops. Prefilled 1024
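
To compare the two runs line by line, a small parser sketch for the result format as printed (`NAME: N threads, X Mops/s P failed pushes Q failed pops. Prefilled K`). `BenchLine`, `parse_line`, and `percent_change` are hypothetical helpers, not part of the benchmark harness.

```rust
// Hypothetical helper type; not part of the benchmark harness.
#[derive(Debug, PartialEq)]
struct BenchLine {
    name: String,
    threads: u32,
    mops: f64,
    failed_pushes: u64,
    failed_pops: u64,
    prefilled: u64,
}

// Parses lines of the form
// "NAME: N threads, X Mops/s P failed pushes Q failed pops. Prefilled K".
// Returns None for section headers or anything else that does not match.
fn parse_line(line: &str) -> Option<BenchLine> {
    let (name, rest) = line.split_once(": ")?;
    let w: Vec<&str> = rest.split_whitespace().collect();
    // Expected layout (12 tokens):
    // N "threads," X "Mops/s" P "failed" "pushes" Q "failed" "pops." "Prefilled" K
    if w.len() != 12 || w[1] != "threads," || w[3] != "Mops/s" || w[10] != "Prefilled" {
        return None;
    }
    Some(BenchLine {
        name: name.trim().to_string(),
        threads: w[0].parse().ok()?,
        mops: w[2].parse().ok()?,
        failed_pushes: w[4].parse().ok()?,
        failed_pops: w[7].parse().ok()?,
        prefilled: w[11].parse().ok()?,
    })
}

// Relative throughput change from `before` to `after`, in percent.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // Enqueue-Dequeue Strong No Prefilling, without and with this PR.
    let before = parse_line(
        "bp-YangCrummeyQueue: 4 threads, 1.9519816599623547 Mops/s 0 failed pushes 0 failed pops. Prefilled 0",
    )
    .unwrap();
    let after = parse_line(
        "bp-YangCrummeyQueue: 4 threads, 1.9140855119841864 Mops/s 0 failed pushes 0 failed pops. Prefilled 0",
    )
    .unwrap();
    println!("{}: {:+.2}%", before.name, percent_change(before.mops, after.mops));
}
```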

joshlf force-pushed the crossbeam-epoch branch 3 times, most recently from e6aa5fd to a6f64c4, on March 8, 2018 10:41
joshlf force-pushed the crossbeam-epoch branch from a6f64c4 to ab62e20 on March 9, 2018 07:58
joshlf force-pushed the crossbeam-epoch branch from ab62e20 to c3f02cd on March 9, 2018 08:11
joshlf force-pushed the crossbeam-epoch branch 3 times, most recently from 5bbdb00 to ff76f84, on March 23, 2018 13:45
- Store a crossbeam GC collector in the BagPipe
- Use this GC instance for pinning, eliminating the dependency on
  the global singleton, and thus on thread-local storage
- elfmalloc: Don't set bsalloc as the global allocator because the
  new GC allocates on clone, and if the global allocator is bsalloc,
  each clone call becomes very expensive, which makes tests run
  very slowly
- elfmalloc: Use opt-level=3 for tests in Travis
- elfc: Set bsalloc as the global allocator
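
The commit message ties test speed to which global allocator is installed, because the new GC allocates on clone. A std-only sketch of how a binary swaps its global allocator with `#[global_allocator]`, using a hypothetical `CountingAlloc` that counts calls and forwards to `System`. It is not bsalloc; it only makes the per-clone allocation visible.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

// Running count of allocation calls made through the global allocator.
static ALLOCS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for a custom global allocator (NOT bsalloc): it
// counts each allocation and forwards the real work to the system allocator.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Routes every heap allocation in this binary through CountingAlloc, the
// same attribute a binary would use to install a different allocator.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

// Number of allocation calls observed while running `f`.
fn allocations_during<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCS.load(Ordering::Relaxed);
    f();
    ALLOCS.load(Ordering::Relaxed) - before
}

fn main() {
    let v: Vec<u64> = (0..1024).collect();
    // black_box keeps the optimizer from eliding the clone's allocation.
    let n = allocations_during(|| {
        black_box(v.clone());
    });
    println!("allocations during one Vec clone: {}", n);
}
```

With a slow global allocator in place, every clone that allocates pays that cost, which is consistent with the commit's explanation for why elfmalloc tests ran slowly with bsalloc installed.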