How to improve performance of the app? #318
-
Hi, first of all, congratulations on the great work!! I have an ElectronJS app running on an M1 Mac Mini (with an integrated webcam). Currently, this app uses the human repo for detecting and tracking people (gesture, mesh, attention, iris, emotion, description, and body). My client wants to improve performance by migrating only the computer vision features to Python (the main app will still use ElectronJS). I would like to know if there is an analogous repo implemented in Python (either TensorFlow or PyTorch), or a smart way to run these models efficiently (e.g., TensorFlow Lite). I was thinking about converting these TensorFlow.js models to TensorFlow Lite (using tensorflowjs_converter) and running them in a Python environment. Still, I'm worried about the tracking capabilities and other tasks currently implemented in JS inside the human repo (beyond loading a model and computing the prediction). What recommendations could you give me, given that I would still want to use this repo's models and tracking features? Some questions I'd like to answer:
Please let me know if I was clear with my situation and questions; otherwise, I can elaborate more. Thank you so much!
Replies: 19 comments 33 replies
-
first, you didn't say whether you're running in browser or node (since electron supports both)? but in general, m1 macs are a bit of a problem:
on converting to python in general: anyhow, talking about performance in general... the first thing i'd try is to parallelize the workflow:
for simple examples (both nodejs and browser) take a look at, and an even better example for nodejs using a thread pool would be
-
TensorFlow and tfjs-node do support M1 as of (very) recently, so that should work, but I have no idea how well it's optimized for the M1 platform (I previously said support wasn't present, but I stand corrected - it was recently added). I have seen some installation issues, as Mac will by default try to install
I don't - I said that using parallelism in general would help a lot, regardless of whether it's in the browser or in nodejs
-
some modules are not supported in the nodejs environment due to missing tf kernel ops, but you should not be affected that much. otherwise, usage should be near-identical - that was my goal
it's the same
on m1 architecture in nodejs or you should first install
-
Yes, you do need to install it. I bundle tfjs in human.esm because tfjs for browsers is a JS library, so bundling is clean.
I really hate webpack - every version they release breaks something :( In reality, you shouldn't need to import
-
it starts a local http/https server and compiles on-demand; then navigate to https://localhost:8001/demo/typescript
-
I'm benchmarking the following migration approaches (given human is used in the browser):
During experimentation, I wanted to use the demo/node/node-video.js example, but after installing all the requirements, the program neither throws errors nor processes the frames. Can you let me know if this
-
had a few min free so i took a look - something had broken the library. anyhow, i've fixed it and the updated demo is online:
2022-12-29 19:30:39 INFO: User: vlado Platform: linux Arch: x64 Node: v19.1.0
2022-12-29 19:30:39 INFO: { human: '3.0.1', tf: '4.1.0' }
2022-12-29 19:30:39 INFO: { input: '/home/vlado/downloads/nikki.mp4' }
2022-12-29 19:30:40 DATA: frame { frame: 1, size: 92185, shape: [ 720, 1280, 3 ], face: 0, body: 0, hand: 0, gesture: 0 }
2022-12-29 19:30:40 DATA: frame { frame: 2, size: 59674, shape: [ 720, 1280, 3 ], face: 0, body: 0, hand: 0, gesture: 0 }
...
2022-12-29 19:30:44 DATA: frame { frame: 70, size: 26332, shape: [ 720, 1280, 3 ], face: 1, body: 0, hand: 0, gesture: 5 }
2022-12-29 19:30:44 DATA: person { score: [ 0.85, 0.78 ], age: 25.7, gender: [ 0.28, 'male' ], emotion: { score: 0.32, emotion: 'angry' } }
2022-12-29 19:30:44 DATA: frame { frame: 71, size: 25712, shape: [ 720, 1280, 3 ], face: 1, body: 0, hand: 0, gesture: 5 }
2022-12-29 19:30:44 DATA: person { score: [ 0.79, 1 ], age: 29.3, gender: [ 0.12, 'male' ], emotion: { score: 0.3, emotion: 'happy' } }
2022-12-29 19:30:44 DATA: frame { frame: 72, size: 25325, shape: [ 720, 1280, 3 ], face: 1, body: 0, hand: 0, gesture: 5 }
2022-12-29 19:30:44 DATA: person { score: [ 0.7, 0.82 ], age: 29.3, gender: [ 0.11, 'male' ], emotion: { score: 0.67, emotion: 'sad' } }
2022-12-29 19:30:44 DATA: frame { frame: 73, size: 26022, shape: [ 720, 1280, 3 ], face: 1, body: 0, hand: 0, gesture: 5 }
-
Hi @vladmandic! Happy new year. Considering your previous response:
I'm wondering which is the best approach to optimize my current pipeline. This is my current human config:
The approach of streaming frames to a nodejs server to only run detection seems to be a bottleneck, since the nodejs process gets stuck. Given that my app currently runs the human pipeline in the browser (electronjs) and I want to run predictions faster without blocking the client, what do you recommend I do? Is there anything I can implement or modify in my app to achieve better performance? Any guidance or recommendations would be much appreciated.
-
I already suggested the approach - I really don't see any value (plus it's massive overhead) in an nginx balancer running multiple nodejs instances. Just run a single
And I've provided an example of how to manage a thread pool and use human like that. Note that in this case there is NO human running in the main instance; all processing is done in individual threads.
-
use node threads and ipc to send messages from the main process to individual threads - far more efficient than child processes, since each thread runs as just that - a lightweight thread - while each child process requires a full nodejs runtime. not to mention that for frequent communication, the event emitter used by threads is far more efficient than ipc for sending messages back and forth. take a look at
-
yes, why not. you receive the websocket message in the node main process, find the first available non-busy thread, and repost the message to it.
-
the shared buffer is there so each thread can update results directly, without sending them back to the main thread.
-
shared memory is unstructured (raw bytes). a typed-array view presents those bytes with a structure (for example, a float is 4 bytes). you don't need it.
no, sorry.
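The view concept above can be shown in plain js (nothing human-specific here):

```javascript
// a SharedArrayBuffer is just raw bytes; a typed-array view imposes structure on it
const buf = new SharedArrayBuffer(16);   // 16 raw bytes
const floats = new Float32Array(buf);    // same memory seen as 4 floats (4 bytes each)
const bytes = new Uint8Array(buf);       // same memory seen as 16 individual bytes
floats[0] = 1.5;                         // writes 4 bytes through the float view
console.log(floats.length, bytes.length); // 4 16
```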
-
Hi @vladmandic! I didn't find a performance difference between using human in the browser with WebGL vs. streaming to nodejs and using human with the TensorFlow CPU backend and worker threads. I wanted to try using human on the nodejs server but with GPU instead of CPU. I'm using a Mac Mini M1 device, and I saw you posted that:
Is this still valid? Should I try to run the nodejs server detection with GPU, or what strategy do you recommend for optimizing the processing?
-
what did you try? web workers in browser? how many? threads in node? how many? etc...
like i wrote,
-
re: browser & workers - for high resolutions, using workers is expensive, as frame data needs to be copied from gpu to cpu to be transferred. but for lower resolutions, that impact is tiny, so yes, using workers is good. how many? the browser only has one gl execution context, so using many workers doesn't help, but you do want the main thread plus two workers so one is always busy (using just one worker does not saturate the pipeline). and yes, each worker should use the humangl backend.
re: node & threads - ideally you want the same number of threads as cpu cores to fully saturate your cpu. monitor your cpu load.
re: performance - honestly, no idea what to expect from a mac mini m1
re: resolution - it seems low enough not to cause bottlenecks and high enough to have decent precision if the person is relatively close to the camera
-
i'd try both and decide - it's really not that much work.
-
From what I understand:
Currently, I have the default filter config and use
Any other suggestion is more than welcome! Thank you @vladmandic
-
That is correct for the default. Just make sure you set
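For reference, a sketch of what a browser-side configuration along these lines might look like. Key names follow human's documented config shape, but treat the specific values as illustrative assumptions, not the actual settings discussed above:

```javascript
// hypothetical human configuration sketch - values are assumptions for illustration
const config = {
  backend: 'humangl',   // webgl-based backend for browser use, as recommended in this thread
  filter: {
    enabled: true,      // pre-process (resize) input before inference
    width: 720,         // target input width; smaller input = faster inference
    height: 0,          // 0 lets the library derive height from the aspect ratio
  },
  face: { enabled: true },
  body: { enabled: false }, // disable unused modules to save inference time
  hand: { enabled: false },
};
console.log(config.backend);
```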