How to improve performance of the app? #318
-
Hi, first of all, congratulations on the great work!! I have an ElectronJS app running on an M1 Mac Mini (with an integrated webcam). Currently, this app uses the human repo for detecting and tracking people (gesture, mesh, attention, iris, emotion, description, and body). My client wants to improve performance by migrating only the computer vision features to Python (the main app will still use ElectronJS). I would like to know if there is an analogous repo implemented in Python (either TensorFlow or PyTorch), or a smart way to run these models efficiently (e.g., TensorFlow Lite). I was thinking about converting these TensorFlow.js models to TensorFlow Lite (using tensorflowjs_converter) and running them in a Python environment. Still, I'm worried about the tracking capabilities and other tasks currently implemented in JS inside the human repo (beyond loading a model and computing the prediction). What recommendations could you give me, given that I would still want to use this repo's models and tracking features? Some questions I'd like to answer:
Please let me know if I was clear with my situation and questions; otherwise, I can elaborate more. Thank you so much!
Replies: 19 comments 33 replies
-
first, you didn't say whether you're running in browser or node (since electron supports both)? but in general, m1 macs are a bit of a problem:
on converting to python in general: anyhow, talking about performance in general... the first thing i'd try is to parallelize the workflow:
for simple examples (both nodejs and browser) take a look at, and an even better example for nodejs using a thread pool would be
-
TensorFlow and tfjs-node do support M1 as of (very) recently, so that should work, but I have no idea how well it's optimized for the M1 platform (I previously said support wasn't present, but I stand corrected - it was recently added). I have seen some installation issues, as Mac will by default try to install
I don't - I said that using parallelism in general would help a lot, regardless of whether it's in the browser or in nodejs
-
some modules are not supported in the nodejs environment due to missing tf kernel ops, but you should not be affected that much. otherwise, usage should be near-identical - that was my goal
it's the same
on m1 architecture in nodejs or you should first install
-
Yes, you do need to install it. I bundle tfjs in human.esm because tfjs for browsers is a JS library, so bundling is clean.
I really hate webpack - every version they release breaks something :( In reality, you shouldn't need to import
-
it starts a local http/https server and compiles on-demand; then navigate to https://localhost:8001/demo/typescript
-
I'm benchmarking the following migration approaches (given human is used in the browser):
During experimentation, I wanted to use the demo/node/node-video.js example, but after installing all the requirements, the program neither throws errors nor processes the frames. Can you let me know if this
-
had a few min free so i took a look - something had broken the library. anyhow, i've fixed it and the updated demo is online:
2022-12-29 19:30:39 INFO: User: vlado Platform: linux Arch: x64 Node: v19.1.0
2022-12-29 19:30:39 INFO: { human: '3.0.1', tf: '4.1.0' }
2022-12-29 19:30:39 INFO: { input: '/home/vlado/downloads/nikki.mp4' }
2022-12-29 19:30:40 DATA: frame { frame: 1, size: 92185, shape: [ 720, 1280, 3 ], face: 0, body: 0, hand: 0, gesture: 0 }
2022-12-29 19:30:40 DATA: frame { frame: 2, size: 59674, shape: [ 720, 1280, 3 ], face: 0, body: 0, hand: 0, gesture: 0 }
...
2022-12-29 19:30:44 DATA: frame { frame: 70, size: 26332, shape: [ 720, 1280, 3 ], face: 1, body: 0, hand: 0, gesture: 5 }
2022-12-29 19:30:44 DATA: person { score: [ 0.85, 0.78 ], age: 25.7, gender: [ 0.28, 'male' ], emotion: { score: 0.32, emotion: 'angry' } }
2022-12-29 19:30:44 DATA: frame { frame: 71, size: 25712, shape: [ 720, 1280, 3 ], face: 1, body: 0, hand: 0, gesture: 5 }
2022-12-29 19:30:44 DATA: person { score: [ 0.79, 1 ], age: 29.3, gender: [ 0.12, 'male' ], emotion: { score: 0.3, emotion: 'happy' } }
2022-12-29 19:30:44 DATA: frame { frame: 72, size: 25325, shape: [ 720, 1280, 3 ], face: 1, body: 0, hand: 0, gesture: 5 }
2022-12-29 19:30:44 DATA: person { score: [ 0.7, 0.82 ], age: 29.3, gender: [ 0.11, 'male' ], emotion: { score: 0.67, emotion: 'sad' } }
2022-12-29 19:30:44 DATA: frame { frame: 73, size: 26022, shape: [ 720, 1280, 3 ], face: 1, body: 0, hand: 0, gesture: 5 }
-
Hi @vladmandic! Happy new year. Considering your previous response:
I'm wondering which is the best approach to optimize my current pipeline. This is my current human config:
The approach of streaming frames to a nodejs server to only run detection seems to be a bottleneck, since the nodejs process gets stuck. Given that my app currently runs the human pipeline in the browser (electronjs) and I want to run predictions faster without blocking the client, what do you recommend I do? Is there anything I can implement or modify in my app to achieve better performance? Any guidance or recommendations would be much appreciated.
-
I already suggested the approach - I really don't see any value (plus it's massive overhead) in an nginx balancer running multiple nodejs instances. Just run a single
And I've provided an example of how to manage a thread pool and use human like that. Note that in this case there is NO human running in the main instance; all processing is done in individual threads.
-
use node threads and ipc to send messages from the main process to individual threads - far more efficient than child processes, since each thread runs as just that - a lightweight thread - while each child process requires a full nodejs runtime. not to mention that for frequent communication, the event emitter used by threads is far more efficient than ipc for sending messages back and forth. take a look at
-
yes, why not. you receive the websocket message in the node main process, find the first available non-busy thread, and repost the message to it.
-
the shared buffer is there so each thread can update results directly, without sending them back to the main thread.
-
shared memory is unstructured (raw bytes). a typed-array view presents those bytes with a structure (for example, a float is 4 bytes). you don't need it.
no, sorry.
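The view concept above can be shown in plain js (nothing human-specific here):

```javascript
// a SharedArrayBuffer is just raw bytes; a typed-array view imposes structure on it
const buf = new SharedArrayBuffer(16);   // 16 raw bytes
const floats = new Float32Array(buf);    // same memory seen as 4 floats (4 bytes each)
const bytes = new Uint8Array(buf);       // same memory seen as 16 individual bytes
floats[0] = 1.5;                         // writes 4 bytes through the float view
console.log(floats.length, bytes.length); // 4 16
```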
-
Hi @vladmandic! I didn't find a performance difference between using human in the browser with WebGL vs. streaming to nodejs and using human with the TensorFlow CPU backend and worker threads. I wanted to try using human on the nodejs server but with GPU instead of CPU. I'm using a Mac Mini M1 device, and I saw you posted that:
Is this still valid? Should I try to run the nodejs server detection with GPU, or what strategy do you recommend for optimizing the processing?
-
what did you try? web workers in browser? how many? threads in node? how many? etc...
like i wrote,
-
re: browser & workers - for high resolutions, using workers is expensive, as frame data needs to be copied from gpu to cpu to be transferred. but for lower resolutions, that impact is tiny, so yes, using workers is good. how many? the browser only has one gl execution context, so using many workers doesn't help, but you do want the main thread plus two workers so one is always busy (using just one worker does not saturate the pipeline). and yes, each worker should use the humangl backend.
re: node & threads - ideally you want the same number of threads as cpu cores to fully saturate your cpu. monitor your cpu load.
re: performance - honestly, no idea what to expect from a mac mini m1
re: resolution - it seems low enough not to cause bottlenecks and high enough to have decent precision if the person is relatively close to the camera
-
i'd try both and decide - it's really not that much work.
-
From what I understand:
Currently, I have the default filter config and use
Any other suggestion is more than welcome! Thank you @vladmandic
-
That is correct for the default. Just make sure you set
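For reference, a sketch of what a browser-side configuration along these lines might look like. Key names follow human's documented config shape, but treat the specific values as illustrative assumptions, not the actual settings discussed above:

```javascript
// hypothetical human configuration sketch - values are assumptions for illustration
const config = {
  backend: 'humangl',   // webgl-based backend for browser use, as recommended in this thread
  filter: {
    enabled: true,      // pre-process (resize) input before inference
    width: 720,         // target input width; smaller input = faster inference
    height: 0,          // 0 lets the library derive height from the aspect ratio
  },
  face: { enabled: true },
  body: { enabled: false }, // disable unused modules to save inference time
  hand: { enabled: false },
};
console.log(config.backend);
```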