This is a very nebulous problem description, I'm afraid. My test program consistently runs in 32ms without instrumentation, and in 5500ms with funtrace instrumentation.
The timing is very consistent. It doesn't matter which compiler or wrapper I use; the instrumented program always runs in 5500ms.
From what I can tell using the traces that it produces, the program pauses for hundreds of milliseconds on pthread_join. I'm not sure where to start, so any suggestions are welcome.
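
One way to cross-check the pthread_join pause independently of the traces might be to time the call directly. The following is only a minimal sketch, assuming POSIX threads and clock_gettime are available; `worker`, `now_ms` and `measure_join_ms` are hypothetical names for illustration and are not part of the test program or of funtrace.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical worker: stands in for whatever the real test threads do. */
static void *worker(void *arg)
{
    volatile unsigned long *sum = arg;
    for (unsigned long i = 0; i < 100000UL; i++)
        *sum += i;
    return NULL;
}

/* Monotonic wall-clock time in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/*
 * Starts one worker thread and returns how long pthread_join blocked,
 * in milliseconds, or -1.0 if thread creation or joining failed.
 * The figure includes any time the worker was still running.
 */
double measure_join_ms(void)
{
    pthread_t thread;
    unsigned long sum = 0;

    if (pthread_create(&thread, NULL, worker, &sum) != 0)
        return -1.0;

    double start = now_ms();
    if (pthread_join(thread, NULL) != 0)
        return -1.0;
    return now_ms() - start;
}
```

Calling `measure_join_ms()` from `main` and printing the result, once in a plain build and once in an instrumented build (both linked with `-pthread`), would show whether the join itself takes longer or whether the worker is simply slower to finish.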