QGIS Server performance questions #10

Closed
lucamanga opened this issue Jun 5, 2020 · 17 comments

Comments

@lucamanga

Hello,
I'm running WFS queries against your QGIS Docker image: https://hub.docker.com/r/3liz/qgis-map-server/
I noticed some performance issues with WFS queries.
The first time, it takes 5 seconds to identify a point; the second time, the response is instantaneous. But the third time, after a pause of about 45 seconds, it takes 5 seconds again.

Logs of first REQUEST: 10:16:32 -> 10:16:37

2020-06-05 10:16:32,479 INFO    [3498]  Qgis: Server: BBOX:664037,5103713,664037,5103713,EPSG:25832
2020-06-05 10:16:32,479 INFO    [3498]  Qgis: Server: MAP:mercati3.qgs
2020-06-05 10:16:32,479 INFO    [3498]  Qgis: Server: OUTPUTFORMAT:application/json
2020-06-05 10:16:32,479 INFO    [3498]  Qgis: Server: PROPERTYNAME:*
2020-06-05 10:16:32,479 INFO    [3498]  Qgis: Server: REQUEST:GetFeature
2020-06-05 10:16:32,479 INFO    [3498]  Qgis: Server: SERVICE:WFS
2020-06-05 10:16:32,479 INFO    [3498]  Qgis: Server: TYPENAME:jlucia
2020-06-05 10:16:32,479 INFO    [3498]  Qgis: Server: VERSION:1.1.0
2020-06-05 10:16:32,480 INFO    [3498]  Qgis: Server: WFS Request parameters:
2020-06-05 10:16:32,480 INFO    [3498]  Qgis: Server:  - OUTPUTFORMAT : application/json
2020-06-05 10:16:32,480 INFO    [3498]  Qgis: Server:  - PROPERTYNAME : *
2020-06-05 10:16:32,480 INFO    [3498]  Qgis: Server:  - TYPENAME : jlucia
2020-06-05 10:16:32,480 INFO    [3498]  Qgis: Server:  - BBOX : 664037,5103713,664037,5103713,EPSG:25832
2020-06-05 10:16:32,480 INFO    [3498]  Qgis: Server:  - VERSION : 1.1.0
2020-06-05 10:16:37,515 DEBUG   [3498]  b'\x83S\xb5(\xa6J\x11\xea\xa2\x8a\x02B\xac\x1a\x00\x02': Flushing response data: (461 bytes)
2020-06-05 10:16:37,516 DEBUG   [25]    SND worker: b'\x83S\xb5(\xa6J\x11\xea\xa2\x8a\x02B\xac\x1a\x00\x02' -> client: b'OWS-SERVER-1' : b'\xa0\xf4\xec*\xa7\x15\x11\xea\x87i\x02B\xac\x1a\x00\x02'
2020-06-05 10:16:37,517 DEBUG   [3498]  b'\x83S\xb5(\xa6J\x11\xea\xa2\x8a\x02B\xac\x1a\x00\x02': Flushing response data: (4 bytes)
2020-06-05 10:16:37,517 DEBUG   [25]    SND worker: b'\x83S\xb5(\xa6J\x11\xea\xa2\x8a\x02B\xac\x1a\x00\x02' -> client: b'OWS-SERVER-1' : b'\xa0\xf4\xec*\xa7\x15\x11\xea\x87i\x02B\xac\x1a\x00\x02'
2020-06-05 10:16:37,517 DEBUG   [25]    SND worker: b'\x83S\xb5(\xa6J\x11\xea\xa2\x8a\x02B\xac\x1a\x00\x02' -> client: b'OWS-SERVER-1' : b'\xa0\xf4\xec*\xa7\x15\x11\xea\x87i\x02B\xac\x1a\x00\x02'
2020-06-05 10:16:37,517 INFO    [3498]  Qgis: Server: Request finished in 5038 ms
2020-06-05 10:16:37,518 DEBUG   [25]    READY b'\x83S\xb5(\xa6J\x11\xea\xa2\x8a\x02B\xac\x1a\x00\x02'
2020-06-05 10:16:37,582 RREQ    [1]             206     GET     ?MAP=mercati3.qgs&OUTPUTFORMAT=application%2Fjson&SERVICE=WFS&PROPERTYNAME=%2A&REQUEST=GetFeature&TYPENAME=jlucia&VERSION=1.1.0&BBOX=664037%2C5103713%2C664037%2C5103713%2CEPSG%3A25832  5104    -1
2020-06-05 10:16:37,636 REQ     [1]     192.168.10.78   200     GET     /ows/?MAP=mercati3.qgs&outputFormat=application%2Fjson&service=WFS&propertyname=%2A&request=GetFeature&typename=jlucia&version=1.1.0&bbox=664037%2C5103713%2C664037%2C5103713%2CEPSG%3A25832     5158    -1       python-requests/2.18.4

SECOND request: the response is instant

2020-06-05 10:16:51,697 INFO    [10917] Qgis: Server:  - VERSION : 1.1.0
2020-06-05 10:16:51,740 DEBUG   [10917] b'\x13\x97\x18$\xa0\xa5\x11\xea\x89\xcb\x02B\xac\x1a\x00\x02': Flushing response data: (461 bytes)

THIRD request after 30-45 seconds: 10:19:26 -> 10:19:31, again 5 seconds

2020-06-05 10:19:26,048 INFO    [10917] Qgis: Server:  - VERSION : 1.1.0
2020-06-05 10:19:31,081 DEBUG   [10917] b'\x13\x97\x18$\xa0\xa5\x11\xea\x89\xcb\x02B\xac\x1a\x00\x02': Flushing response data: (461 bytes)
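
The three gaps above can be read straight off the log timestamps. A minimal sketch in Python, assuming every log line starts with a 23-character `YYYY-MM-DD HH:MM:SS,mmm` timestamp as in the excerpts; the helper names are illustrative and not part of py-qgis-server:

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_time(line: str) -> datetime:
    """Parse the timestamp that opens a log line."""
    # Assumption: the timestamp is exactly the first 23 characters.
    return datetime.strptime(line[:23], LOG_TIME_FORMAT)


def elapsed_seconds(start_line: str, end_line: str) -> float:
    """Seconds between the timestamps of two log lines."""
    delta = parse_log_time(end_line) - parse_log_time(start_line)
    return delta.total_seconds()


# Timestamps copied from the last parameter line and the first
# "Flushing response data" line of each request shown above.
REQUESTS = {
    "first": ("2020-06-05 10:16:32,480", "2020-06-05 10:16:37,515"),
    "second": ("2020-06-05 10:16:51,697", "2020-06-05 10:16:51,740"),
    "third": ("2020-06-05 10:19:26,048", "2020-06-05 10:19:31,081"),
}

for name, (start, end) in REQUESTS.items():
    print(f"{name}: {elapsed_seconds(start, end):.3f} s")
```

Full log lines can be passed as well, since only the leading timestamp is parsed.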
@dmarteau
Member

dmarteau commented Jun 5, 2020

It is known that the first time a project has to be loaded in QGIS Server, it may take an outrageous amount of time, depending on the number of layers and the data sources involved.

Note that py-qgis-server runs QGIS Server workers in child processes to handle requests (QGIS Server by itself is not asynchronous), and that there is no shared cache between those workers (an issue that cannot be solved without rewriting a large part of the QGIS code).

Because of this, you may experience latency each time a project has to be loaded into a worker's cache, and you will only get consistently optimal response times once the project has been loaded into every worker's cache.
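
A toy model of the behaviour described above, written in Python. It assumes round-robin dispatch as a stand-in for the real dispatcher and one private cache per worker; the function name and the dispatch policy are assumptions made for illustration only:

```python
from itertools import cycle
from typing import Sequence


def count_cold_loads(num_workers: int, requests: Sequence[str]) -> int:
    """Count how many requests have to load a project from scratch.

    Each worker keeps its own private cache, so the same project is
    loaded once per worker that ends up serving it.
    """
    caches = [set() for _ in range(num_workers)]
    cold_loads = 0
    # Round-robin dispatch: request i goes to worker i % num_workers.
    for project, worker_cache in zip(requests, cycle(caches)):
        if project not in worker_cache:
            worker_cache.add(project)
            cold_loads += 1
    return cold_loads


# Ten requests for the same project, served by four workers.
print(count_cold_loads(4, ["mercati3.qgs"] * 10))
```

In this model the number of slow requests per project grows with the number of workers, which is why a single warm response does not mean every later one will be fast.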

Depending on the nature and number of the projects you are using (number of layers, big data sources...), you may have to use different proxy strategies (for example, you may implement sharding with several py-qgis-server instances). If you have only a few projects, you may also consider seeding with multiple initial requests until all workers have their projects loaded.
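
Seeding could look roughly like the Python sketch below. The base URL, the choice of a WMS GetCapabilities request as the warm-up call, and the function names are all assumptions; any request that forces the project to load should serve the same purpose:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen


def seed_url(base_url: str, project: str) -> str:
    """Build a warm-up URL for one project (hypothetical request choice)."""
    query = urlencode(
        {"MAP": project, "SERVICE": "WMS", "REQUEST": "GetCapabilities"}
    )
    return f"{base_url}?{query}"


def fetch_status(url: str, timeout: float = 60.0) -> int:
    """Send one GET request and return the HTTP status code."""
    with urlopen(url, timeout=timeout) as response:
        return response.status


def seed_workers(
    base_url: str, project: str, concurrent_requests: int, rounds: int = 3
) -> List[int]:
    """Fire several rounds of parallel requests for the same project."""
    url = seed_url(base_url, project)
    statuses: List[int] = []
    with ThreadPoolExecutor(max_workers=concurrent_requests) as pool:
        for _ in range(rounds):
            # Parallel requests make it likely that different workers
            # are busy at the same time and each loads the project.
            statuses.extend(pool.map(fetch_status, [url] * concurrent_requests))
    return statuses
```

Calling `seed_workers("http://localhost:8080/ows/", "mercati3.qgs", concurrent_requests=8)` would send three rounds of eight parallel requests. Nothing guarantees that every worker is reached, so extra rounds only raise the odds.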

@lucamanga
Author

Interesting. How would I implement sharding with several py-qgis-server instances?

@dmarteau
Member

dmarteau commented Jun 5, 2020

You spin up several py-qgis-server instances and put nginx in front as a reverse proxy, with consistent hashing on the MAP parameter.
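
To illustrate what consistent hashing on the MAP parameter buys you, here is a small Python sketch of a hash ring. In nginx this role would normally be played by the upstream `hash` directive with its `consistent` parameter; the class, the backend names and the replica count below are assumptions for illustration only:

```python
import bisect
import hashlib
from typing import Iterable


class ConsistentHashRing:
    """Map project names to backends so each project sticks to one backend."""

    def __init__(self, backends: Iterable[str], replicas: int = 100) -> None:
        ring = []
        for backend in backends:
            # Several virtual points per backend even out the distribution.
            for i in range(replicas):
                ring.append((self._hash(f"{backend}#{i}"), backend))
        if not ring:
            raise ValueError("at least one backend is required")
        ring.sort()
        self._ring = ring
        self._keys = [point for point, _ in ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def backend_for(self, map_param: str) -> str:
        """Return the backend owning the first ring point after the key."""
        index = bisect.bisect(self._keys, self._hash(map_param))
        return self._ring[index % len(self._ring)][1]


ring = ConsistentHashRing(["qgis-a:8080", "qgis-b:8080", "qgis-c:8080"])
print(ring.backend_for("mercati3.qgs"))
```

Because a given project always lands on the same instance, it only has to be loaded into that instance's workers, and removing one backend only remaps the projects that were assigned to it.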

@lucamanga
Author

I noticed that there is a QGSRV_CACHE_ROOTDIR variable. Would it help?

@dmarteau
Member

dmarteau commented Jun 5, 2020

No, QGSRV_CACHE_ROOTDIR sets the location of the project files. The configuration is not well documented and we are working on it. You may adjust the number of workers with QGSRV_SERVER_WORKERS.

@kikislater

> QGSRV_SERVER_WORKERS

Interesting discussion!
So how do workers work? Is there a rule relating CPU threads to workers?

@dmarteau
Member

They are not threads; it is real multiprocessing. Requests are distributed using fair queuing with 0MQ messaging. You may also distribute your workers across a whole cluster by running worker-only and proxy-only containers.

@kikislater

kikislater commented Jul 7, 2020

OK, understood, thank you. So it means a big infrastructure is needed to achieve good performance. But when I tried lizmap-docker-compose on a big project with 8 vCPUs and 8 GB of RAM, it never reached full machine load: RAM does not reach its maximum and CPU stays at around 25% per vCPU.

image

Even with aggressive parameters in:

  • QGIS Server (QGSRV_SERVER_WORKERS from 2 to 8 to 32, which may be silly but I was just trying to saturate the machine, and QGIS_SERVER_MAX_THREADS: 8),
  • php-fpm: pm.start_servers, pm.min_spare_servers, pm.max_spare_servers... I will play later with other fpm parameters such as PM_CHILD_PROCESS, PM_MAX_CHILDREN, PM_MAX_REQUESTS and PM_PROCESS_IDLE_TIMEOUT; setting them as environment variables does nothing at this time in the current Docker configuration. The first ones are written to the fpm configuration with the bash command sed.
  • nginx: worker_processes and worker_connections

So at this point, my question is: any idea where this limitation comes from?

  • Docker? (I don't think so, because I manage to get 100% vCPU when running stress or stress-ng in the QGIS Server, Lizmap, nginx and Redis containers)
  • Project reading? 2.2 MB
  • ...

@dmarteau
Member

dmarteau commented Jul 7, 2020

You will saturate your CPU only with computation-intensive jobs. This is highly dependent on the context and the kind of project; as a rule of thumb, you may expect jobs to spend most of their time doing I/O, which means they put almost no demand on the CPU.

Increasing the number of workers will change neither the loading time nor the time one worker spends internally processing your request: it will enable you to process more requests at the same time, scaling with your request rate.
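
As a back-of-the-envelope illustration of that point, the Python sketch below relates worker count, per-request time and request rate. It assumes every worker handles one request at a time and that the per-request time is constant, which is a simplification; the function names are illustrative:

```python
import math


def max_request_rate(workers: int, seconds_per_request: float) -> float:
    """Upper bound on sustained requests per second for a worker pool."""
    if seconds_per_request <= 0:
        raise ValueError("seconds_per_request must be positive")
    return workers / seconds_per_request


def workers_needed(request_rate: float, seconds_per_request: float) -> int:
    """Smallest number of workers able to keep up with a request rate."""
    if seconds_per_request <= 0:
        raise ValueError("seconds_per_request must be positive")
    return math.ceil(request_rate * seconds_per_request)


# Per-request time stays the same whatever the pool size;
# only the sustainable rate changes.
for workers in (2, 8, 32):
    print(workers, "workers ->", max_request_rate(workers, 0.5), "req/s")
```

With these assumptions, going from 2 to 32 workers multiplies the sustainable rate by 16 while a single request still takes the same time, so adding workers does not help if the request rate is already low.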

Because of this you must also set proper values for php-fpm, depending on your scenario.

As said before, performance depends on many factors, and the appropriate solution depends on what you want to improve.

IMHO, here are the questions to ask:

The number of workers, the PHP configuration and the cache size will play a role with:

  • What is the expected request rate?
  • How are requests distributed across projects?
  • How many different projects do I have to handle?

And the following will impact the internal performance of each worker:

  • How many layers are there in the projects?
  • How big is the data I have to handle?
  • How is access to my backend databases set up? (I have seen many performance issues caused by bad settings in PostGIS databases.)

The former questions target your infrastructure choices; the latter depend for the most part on QGIS internal performance.

@kikislater

OK, thank you for the reply. docker stats clearly shows the I/O.
Docker has one limitation regarding I/O: by default it is reduced, depending on the Linux distribution. docker.service needs to be updated from

LimitNOFILE=1048576
LimitNPROC=1048576

to

LimitNOFILE=infinity
LimitNPROC=infinity

Tested on Lizmap; it shows an improvement on pre-cached layers.
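
To check which limits a process actually sees (for example when run inside the container), a short Python sketch using the standard `resource` module can help. It relies on systemd's LimitNOFILE and LimitNPROC corresponding to the RLIMIT_NOFILE and RLIMIT_NPROC resource limits; the helper names are illustrative:

```python
import resource
from typing import Dict, Tuple


def describe_limit(value: int) -> str:
    """Render a limit value the way systemd unit files spell it."""
    return "infinity" if value == resource.RLIM_INFINITY else str(value)


def current_limits() -> Dict[str, Tuple[str, str]]:
    """Return the (soft, hard) limits of the current process."""
    limits = {}
    for name in ("RLIMIT_NOFILE", "RLIMIT_NPROC"):
        soft, hard = resource.getrlimit(getattr(resource, name))
        limits[name] = (describe_limit(soft), describe_limit(hard))
    return limits


for name, (soft, hard) in current_limits().items():
    print(f"{name}: soft={soft} hard={hard}")
```

Running it before and after changing docker.service shows whether the new values reach the process. It does not measure any performance effect.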

@dmarteau
Member

dmarteau commented Jul 7, 2020

> Tested on Lizmap and show an improvement on pre cached layers

Good to know, thanks!

@dmarteau
Member

dmarteau commented Jul 7, 2020

@kikislater

Do you have some metrics? It could be interesting to investigate the performance gain.

@kikislater

No, sorry, just visual impressions, but you can read some input about I/O on Docker, with metrics, here: moby/moby#21485

So you can test it yourself by reading and writing inside and outside your container.

PR at the end: moby/moby#24307
Consider looking at TasksMax=infinity as well, in the same systemd service, as it was not mentioned in the PR and it is related to your kernel option.

@lucamanga
Author

lucamanga commented Jul 7, 2020 via email

@kikislater

It depends on your host Linux distribution. Some distributions already have it well tuned. This may not be a final solution and needs more testing with QGIS Server.
It used to be in /usr/lib/systemd/system/docker.service

@TANK2003
Contributor

TANK2003 commented Sep 4, 2020

I see a new config option, "SERVER_RESTARTMON". Can we use it to improve the internal performance of each worker, by updating the file that "SERVER_RESTARTMON" is watching before a user makes an OWS request?

Is there a way to set a timeout before a request sends a '422 Unprocessable Entity'?

Thanks for your work !

@dmarteau
Member

dmarteau commented Sep 4, 2020

@TANK2003 SERVER_RESTARTMON is just a very simple way to ask the workers to make a graceful restart, for example when you are updating plugins; it is not really related to internal performance.
The main process broadcasts a notification to the workers: they restart as soon as they have finished their current processing, while new incoming requests are held back by the dispatcher. This ensures that no requests are lost during the restart process.
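
For completeness, requesting such a restart from a deployment script could be as simple as the Python sketch below. It assumes SERVER_RESTARTMON points at a plain file whose modification triggers the notification, as the question above suggests; the function name is illustrative and the path must match your own configuration:

```python
from pathlib import Path


def request_graceful_restart(watched_file: Path) -> float:
    """Update the watched file and return its new modification time."""
    # Create missing parent directories so a fresh setup also works.
    watched_file.parent.mkdir(parents=True, exist_ok=True)
    # touch() creates the file if needed and bumps its mtime otherwise.
    watched_file.touch()
    return watched_file.stat().st_mtime
```

A plugin update step would then end with a call such as `request_graceful_restart(Path("/path/to/watched/file"))`, where the path is whatever SERVER_RESTARTMON is set to.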

'422 Unprocessable Entity' has nothing to do with timeouts; it is sent when you have invalid layers in strict checking mode.
