Sharing data between a pool of worker threads? #170
-
To increase performance I have implemented a pool of worker threads to compare face embeddings. I send the data to the worker threads from the main thread.
Problem: just realized that ...
Possible solutions: just came across the claim that worker threads are the only way to increase performance of Human and TensorFlow.js, but how practical is that? This is how the array looks:
Replies: 16 comments 11 replies
-
good question - and you found the right problems :)
i'm assuming we're talking about the browser environment, as nodejs workers are not worth using just yet. in nodejs i'd go with a process pool instead, but then there is really no sharing and options 3 or 4 are the only options.

option 1
assuming the array is small, there is no issue with copying it each time you call a worker. but i'm assuming it's intended to grow significantly over time, so that's a no-go.

option 2
in theory there shouldn't be a race condition when using SharedArrayBuffer, which means you need to make sure that the array is always valid in the main thread - never reinitialize/empty/reduce it once it's initialized or you'll get out-of-bounds on index access. but appending to it should be safe, and each appended record will be picked up by a worker the next time it runs.

option 3
another idea is to copy the array once to each worker upon startup and then leave it to each worker to maintain on its own, avoiding any further copying or sharing of the main array completely. when you change the array in the main thread, send a message to the workers with that single change so each worker can apply the same change inside its own thread. basically, you end up with n copies of the array, each maintained separately, without any bulk copy operations or sharing.

option 4
if the array is expected to get really big over time and memory size becomes a concern, you could use any decent database (e.g. a browser-based one).

just a few ideas...
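For option 2, here is a minimal sketch of an append-only descriptor store backed by a `SharedArrayBuffer`. The names, sizes, and layout are my own assumptions, not human's API; the key idea is that the record count is published via `Atomics` only after the descriptor data is written, so workers reading the count never see a partially-written record.

```javascript
// Sketch: append-only descriptor store in a SharedArrayBuffer (option 2).
// DESC_LEN matches human's 1024-element descriptors; MAX_RECORDS is arbitrary.
const DESC_LEN = 1024;
const MAX_RECORDS = 100;

// first 4 bytes hold the record count, the rest holds float32 descriptors
const sab = new SharedArrayBuffer(4 + MAX_RECORDS * DESC_LEN * 4);
const count = new Int32Array(sab, 0, 1);
const data = new Float32Array(sab, 4, MAX_RECORDS * DESC_LEN);

function appendDescriptor(desc) {
  const idx = Atomics.load(count, 0);
  if (idx >= MAX_RECORDS) throw new Error('store full');
  data.set(desc, idx * DESC_LEN); // write the record first...
  Atomics.add(count, 0, 1);       // ...then publish it, so readers never see a torn record
  return idx;
}
```

Each worker would receive `sab` once via `postMessage` (shared, not copied) and build its own `Int32Array`/`Float32Array` views over it, only ever reading indexes below `Atomics.load(count, 0)`.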
-
when i played with this i used multi-processing, as i run parallel detections.
there are a few options (perhaps more, these are the ones i can think of):
i use ...
once the descriptor is determined, just copy the methods
-
why? i don't know how to deal with object (in your case, array of objects) transfer to and from a buffer (chrome web workers support the concept of transferable objects, but nodejs doesn't have that) other than to manually serialize the object to a string, place the string into the buffer, and frame it with the string length. then on the receiving side, read the string length from the buffer and deserialize back into an object.

and to avoid out-of-memory when serializing and deserializing a large array, it should be done per-record, not per entire array. and yes, deserialization should also be done per-record, using each record to find a match.

so what's the required size of the buffer? or complicate even more by implementing sort-of paging so you fit n records into a single page.

i'd much rather go with option (3) to start with and have each worker maintain its copy of the array by receiving per-record update messages from the main thread
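A hedged sketch of the per-record length-framing described above, using a length prefix ahead of each record (function names and the JSON-string encoding are my own choices for illustration):

```javascript
// Sketch: length-prefixed per-record serialization into a byte buffer.
// Each record is JSON-encoded and prefixed with its byte length (uint32,
// little-endian), so the receiver can walk the buffer record by record
// instead of deserializing the whole array at once.
const enc = new TextEncoder();
const dec = new TextDecoder();

function writeRecord(bytes, offset, record) {
  const payload = enc.encode(JSON.stringify(record));
  new DataView(bytes.buffer, bytes.byteOffset).setUint32(offset, payload.length, true);
  bytes.set(payload, offset + 4);
  return offset + 4 + payload.length; // offset where the next record starts
}

function readRecord(bytes, offset) {
  const len = new DataView(bytes.buffer, bytes.byteOffset).getUint32(offset, true);
  const record = JSON.parse(dec.decode(bytes.subarray(offset + 4, offset + 4 + len)));
  return { record, next: offset + 4 + len };
}

// usage: write two records, read them back one at a time
const buf = new Uint8Array(1024);
let end = writeRecord(buf, 0, { name: 'alice', embedding: [0.1, 0.2] });
end = writeRecord(buf, end, { name: 'bob', embedding: [0.3, 0.4] });
const first = readRecord(buf, 0);
const second = readRecord(buf, first.next);
```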
-
@uzair004 question - is there interest to implement ...
-
I was recently playing with AssemblyScript, got it working cleanly, and got a functional loader for both Chrome and NodeJS. Just occurred to me that porting ... You can take a look at https://github.com/vladmandic/wasm-assemblyscript. Passing non-trivial structures is weird in WASM, but I got it working. Also memory buffers :) Also, the generated WASM file is tiny - so tiny (<10kb) it could be base64-encoded and embedded in the JS itself, so there are no external network or file requests at all. Which means you could have a worker thread with zero dependencies (Human or otherwise).
-
I played quite a lot with WASM and have it fully working, but in the end I found a way to accelerate the built-in JS methods by 10x, plus an optional ... but... it's a breaking change, as input params and output structure change in the new version of ...
and the entire similarity/match implementation is fully separate, so you can import it directly. just in case you're interested in the wasm implementation: https://github.com/vladmandic/human-match - there are some additional notes there on how to reduce descriptor dimensionality without loss of functionality, so if you're dealing with a very large database and memory becomes an issue, that is also fully solvable - the database can be compressed 8x without huge impact.
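For reference, here is roughly what a brute-force descriptor match does. This is a simplified sketch under my own assumptions, not human's actual match implementation (which also normalizes distance into a similarity score and accepts options):

```javascript
// Sketch: brute-force nearest-descriptor search by squared L2 distance.
// `desc` is the probe descriptor, `arr` is an array of stored descriptors.
function matchDescriptor(desc, arr) {
  let best = -1;
  let bestDist = Infinity;
  for (let i = 0; i < arr.length; i++) {
    let sum = 0;
    for (let j = 0; j < desc.length; j++) {
      const d = desc[j] - arr[i][j];
      sum += d * d; // accumulate squared L2 distance
    }
    if (sum < bestDist) {
      bestDist = sum;
      best = i;
    }
  }
  return { index: best, distance: Math.sqrt(bestDist) };
}
```

Since the loop only reads the descriptor array, this is exactly the kind of work that can run over a shared or per-worker copy of the data without locking.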
-
yes, it only takes an array of descriptors and options (options are also different) and returns index and similarity.

```js
const arr = annotatedArray.map((rec) => rec.embedding);
```

and when you have an index and want to get the label, just look up the old array:

```js
const name = annotatedArray[res.index].name;
```

maybe you'll end up with a manual worker thread pool implementation - create n worker threads and send messages yourself. it's pretty simple - you can take a look at my worker process implementation as a reference, the concept is the same.

you can ignore the wasm implementation completely, i noted that git repository as it documents how to perform descriptor dimension reduction.
-
when testing, i use something like this:

```js
const t0 = process.hrtime.bigint();
const res = human.match(desc, arr);
const t1 = process.hrtime.bigint();
console.log('match time:', t1 - t0);
```

this is only available in nodejs and gets time in nanoseconds, so it's very precise. that is useful to see where time is spent inside long calls, for example where did ...

that warning comes from inside the tensorflow library itself. i don't care about internal messages like that (imo, internal messages should not be printed by a library unless explicitly enabled). you can suppress them:

```sh
export TF_CPP_MIN_LOG_LEVEL=2
```

(0 is info, 1 is warning, 2 is error, 3 is fatal)

or you can set the env variable from within your app before you load it:

```js
process.env.TF_CPP_MIN_LOG_LEVEL = '2';
const tf = require('@tensorflow/tfjs-node');
```

(or before you load ...) btw, you should see how chatty ...
-
take a look at https://github.com/vladmandic/human-match/tree/main/multithread - i think i got multi-threading working nicely, with a shared buffer array and without any libraries or dependencies :)
-
First of all, memory utilization is much much better since there is only one copy of the data. Memory is also fixed - since each descriptor has 1024 elements, that's 4KB per descriptor, fixed size.

Performance-wise, calculating a match is the same as each thread having its own data, but appending additional data or creating additional workers is now near-free with shared memory. Overall, tons of benefits.

The key to making this possible was splitting the face database array of objects into a separate array of descriptors and an array of labels.
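As a sketch, splitting an annotated array into a shared descriptor buffer plus a plain labels array might look like this (field names follow the earlier `annotatedArray` example; the sizes and sample data are illustrative only):

```javascript
// Sketch: split array of { name, embedding } records into a shared
// Float32Array of descriptors plus an ordinary array of labels.
const DESC_LEN = 4; // human uses 1024 elements; kept small here for illustration
const db = [
  { name: 'alice', embedding: [0.5, 0.25, 0.75, 1.0] },
  { name: 'bob', embedding: [1.0, 0.5, 0.25, 0.125] },
];

const sab = new SharedArrayBuffer(db.length * DESC_LEN * 4); // 4 bytes per float32
const descriptors = new Float32Array(sab);
db.forEach((rec, i) => descriptors.set(rec.embedding, i * DESC_LEN));
const labels = db.map((rec) => rec.name);

// workers receive `sab` via postMessage (shared, zero-copy) and match against
// `descriptors`; the returned index is resolved via labels[index] in the main thread
```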
-
FYI, the example has been cleaned up, documented, and committed to the main branch under ...
-
If ... Plus, this is only done on worker start, never repeated. The only meaningful transfer was the descriptor itself to compare with, for each ...
-
Hi Vladimir, thanks for making this great project!

Regarding this copy of the image data buffer passed to the worker:

human/demo/multithread/index.js Line 167 in 0ea905e

The second argument is a single-item array containing a copy of the buffer, while the first argument contains the original buffer without copying. I don't understand what the 2nd argument is used for here. The documented function signature of postMessage says that the second argument is the targetOrigin:
-
the first parameter says what to send, and the second parameter says what to transfer - the targetOrigin signature applies to window.postMessage, not worker.postMessage. normally i love the MDN site but here it's wrong - see the actual specification: https://html.spec.whatwg.org/multipage/web-messaging.html#posting-messages
-
Just curious, what's the purpose of ...
-
JS engines don't care if the names are the same or not, just if the structure matches - which it does, so it's transferable.

if working with a single web worker, then ... here it's present because the same buffer data is transferred to multiple web workers in parallel, and the same data buffer can only be transferred once.

and i find it's faster to create a copy of the buffer (using slice) in the main thread and then transfer it, than to not use transferable data and let the JS engine perform serialization to achieve a deep clone
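A small sketch of that slice-then-transfer pattern (the worker and its `postMessage` call are commented out, since they need a browser/worker context; variable names are mine):

```javascript
// Sketch: copy a buffer with slice() so the copy can be transferred
// to a worker while the original stays usable in the main thread.
const original = new ArrayBuffer(8);
const view = new Uint8Array(original);
view[0] = 42;

const copy = original.slice(0); // cheap byte-level copy
// worker.postMessage({ image: copy }, [copy]); // transfer detaches `copy`, not `original`

// the copy is fully independent: mutating it leaves the original untouched
new Uint8Array(copy)[0] = 7;
```

Repeating the `slice()` once per worker is what allows the same image data to be handed to several workers in parallel, since each transferable buffer can only be transferred once.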