Poor REST performance vs. eXist #17

Open
IanDavey opened this issue Jan 25, 2021 · 9 comments

Comments

@IanDavey

I have been benchmarking different types of requests to the RESTful API for both Fusion and eXist, using the latest USLM versions of titles 1 (a short one) and 42 (the longest one) of the US Code. Each of these tests had 100 repetitions, and both eXist and Fusion were identically configured on the same VM.

First, I tested a simple PUT of a document to /exist/rest/db/test:

usc01.xml eXist:
    Minimum: 0.43300747871398926 s
    Maximum: 1.388643503189087 s
    Mean:    0.6491883111000061 s
    Median:  0.6234605312347412 s
    Stddev:  0.15249426666857233 s
usc01.xml Fusion:
    Minimum: 0.5359818935394287 s
    Maximum: 2.7659928798675537 s
    Mean:    1.2113053154945375 s
    Median:  1.1449862718582153 s
    Stddev:  0.47974903573925054 s
usc42.xml eXist:
    Minimum: 35.580822229385376 s
    Maximum: 54.87896108627319 s
    Mean:    41.51840656995773 s
    Median:  41.32154309749603 s
    Stddev:  4.195144231762311 s
usc42.xml Fusion:
    Minimum: 188.74798727035522 s
    Maximum: 1077.4066643714905 s
    Mean:    404.10182027816774 s
    Median:  320.5594769716263 s
    Stddev:  191.0485587590224 s

Next, I tested GET:

usc01.xml eXist:
    Minimum: 0.03500247001647949 s
    Maximum: 0.2610175609588623 s
    Mean:    0.05440183639526367 s
    Median:  0.0410001277923584 s
    Stddev:  0.04378882484633538 s
usc01.xml Fusion:
    Minimum: 0.5689961910247803 s
    Maximum: 0.7550196647644043 s
    Mean:    0.6137096118927002 s
    Median:  0.6049911975860596 s
    Stddev:  0.030363894050856537 s
usc42.xml eXist:
    Minimum: 13.213871717453003 s
    Maximum: 15.583616971969604 s
    Mean:    13.67682737827301 s
    Median:  13.544020891189575 s
    Stddev:  0.40090442662817194 s
usc42.xml Fusion:
    Minimum: 264.0210030078888 s
    Maximum: 293.1509966850281 s
    Mean:    272.6948435425758 s
    Median:  269.29898858070374 s
    Stddev:  7.9907570945070505 s

Then, I POSTed some XQuery to get a specific section of a document:

1 U.S.C. 1 eXist:
    Minimum: 0.004994392395019531 s
    Maximum: 0.011027336120605469 s
    Mean:    0.006702215671539307 s
    Median:  0.0060079097747802734 s
    Stddev:  0.0012504552254434967 s
1 U.S.C. 1 Fusion:
    Minimum: 0.00799870491027832 s
    Maximum: 0.03599071502685547 s
    Mean:    0.010678362846374512 s
    Median:  0.009988903999328613 s
    Stddev:  0.0033822034856866566 s
42 U.S.C. 2000e eXist:
    Minimum: 0.012010574340820312 s
    Maximum: 0.04998445510864258 s
    Mean:    0.016210291385650635 s
    Median:  0.014001250267028809 s
    Stddev:  0.005966727038671187 s
42 U.S.C. 2000e Fusion:
    Minimum: 0.02099442481994629 s
    Maximum: 0.06598377227783203 s
    Mean:    0.026210627555847167 s
    Median:  0.02399766445159912 s
    Stddev:  0.006877559298661334 s

(The XQuery I used was an adapted version of a giant block we're using for a client that does a bunch of pre- and post-processing, but the most relevant piece is //*[@identifier=$identifier], and this is an indexed attribute.)
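As a rough illustration of the relevant piece only, the following sketch builds a request body in the same `<query>` wrapper that appears in the script later in this thread. The function name `build_query` and the example identifier `/us/usc/t1/s1` are assumptions for illustration, the redacted pre- and post-processing is not reproduced, and the XQuery is XML-escaped here rather than wrapped in CDATA.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

EXIST_NS = "http://exist.sourceforge.net/NS/exist"
USLM_NS = "http://xml.house.gov/schemas/uslm/1.0"


def build_query(identifier: str) -> str:
    """Build a REST query body that looks up one element by @identifier.

    Hypothetical helper: only the //*[@identifier=$identifier] lookup
    comes from the thread; everything else is a minimal wrapper.
    """
    # Inside an XQuery string literal, '&' starts an entity reference and
    # a double quote is escaped by doubling it.
    literal = identifier.replace("&", "&amp;").replace('"', '""')
    xquery = (
        f'declare default element namespace "{USLM_NS}";\n'
        f'declare variable $identifier := "{literal}";\n'
        "//*[@identifier=$identifier]"
    )
    # escape() protects &, < and > so the XQuery survives as XML text.
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        f'<query xmlns="{EXIST_NS}" cache="no">\n'
        f"    <text>{escape(xquery)}</text>\n"
        "</query>\n"
    )


if __name__ == "__main__":
    # The identifier below is an assumed example value, not one from the tests.
    print(build_query("/us/usc/t1/s1"))
```

The resulting string would be sent as the body of the POST; the exact identifiers used in the benchmark are not stated in the issue.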

Finally, I ran a DELETE (PUTting the document back between repetitions, but only timing the DELETE):

1 U.S.C. eXist:
    Minimum: 0.4656083583831787 s
    Maximum: 1.5618107318878174 s
    Mean:    0.6475286889076233 s
    Median:  0.5726385116577148 s
    Stddev:  0.2086682498874921 s
1 U.S.C. Fusion:
    Minimum: 2.4692959785461426 s
    Maximum: 4.827873945236206 s
    Mean:    3.485269944667816 s
    Median:  3.424243688583374 s
    Stddev:  0.5818232474143926 s
42 U.S.C. eXist:
    Minimum: 18.95114755630493 s
    Maximum: 26.387588024139404 s
    Mean:    21.3156515455246 s
    Median:  21.073147296905518 s
    Stddev:  1.509445739774873 s
42 U.S.C. Fusion:
    Minimum: 1957.4530036449432 s
    Maximum: 2816.1365151405334 s
    Mean:    2314.854492249489 s
    Median:  2259.181742668152 s
    Stddev:  224.80505757124126 s

It appears that, across the board, Fusion's running time grows faster than eXist's with respect to document size. Several of Xcential's use cases involve handling large-ish documents, so we currently can't recommend Fusion over eXist to clients. That is unfortunate, because in other POST XQuery tests, such as generating a small top-level outline for all USC titles, Fusion is faster.
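To make the growth claim concrete, here is a small sketch that divides the Fusion mean by the eXist mean for each whole-document test, using only the means reported above. The names `MEANS` and `slowdown` are illustrative, not part of the original benchmark script.

```python
# Mean times in seconds, copied from the results above: (eXist, Fusion).
MEANS = {
    "PUT": {
        "usc01": (0.6491883111000061, 1.2113053154945375),
        "usc42": (41.51840656995773, 404.10182027816774),
    },
    "GET": {
        "usc01": (0.05440183639526367, 0.6137096118927002),
        "usc42": (13.67682737827301, 272.6948435425758),
    },
    "DELETE": {
        "usc01": (0.6475286889076233, 3.485269944667816),
        "usc42": (21.3156515455246, 2314.854492249489),
    },
}


def slowdown(exist_mean: float, fusion_mean: float) -> float:
    """Return how many times slower Fusion was than eXist."""
    return fusion_mean / exist_mean


if __name__ == "__main__":
    for method, docs in MEANS.items():
        small = slowdown(*docs["usc01"])
        large = slowdown(*docs["usc42"])
        print(f"{method}: {small:.1f}x on usc01.xml, {large:.1f}x on usc42.xml")
```

On these means the ratio goes from about 1.9x to 9.7x for PUT, about 11.3x to 19.9x for GET, and about 5.4x to 108.6x for DELETE, so the gap widens with document size in all three tests.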

@joewiz

joewiz commented Jan 25, 2021

@IanDavey Could you provide a link to the data and, if possible, the scripts used for these tests?

@IanDavey
Author

The data can be found here:

https://uscode.house.gov/download/download.shtml

(you want the XML link)

@IanDavey
Author

The script was continuously changed for each test, but currently I have (with certain items redacted):

#!/usr/bin/env python

import base64
import requests
import statistics
import time

REPETITIONS = 100
EXIST_AUTH = 'Basic ' + base64.b64encode(b'*****:*******').decode()


XQUERY = '''<?xml version="1.0" encoding="utf-8"?>
<query xmlns="http://exist.sourceforge.net/NS/exist" cache="no">
    <text><![CDATA[
            declare default element namespace "http://xml.house.gov/schemas/uslm/1.0";

            declare boundary-space preserve;

            (# exist:batch-transaction #) {
                (: redacted :)
            }
    ]]></text>
</query>
'''

def timed_send(fn, port):
    """Time one DELETE of fn, then PUT the file back (untimed) for the next repetition."""
    url = f'http://localhost:{port}/exist/rest/db/test/{fn}'
    # Only the DELETE is timed; perf_counter() is monotonic, unlike time.time().
    # The document must already exist, or the DELETE assertion below fails.
    start = time.perf_counter()
    response = requests.delete(url, headers={'Authorization': EXIST_AUTH})
    end = time.perf_counter()
    assert response.status_code < 300, response.reason + '\n' + response.text
    # Restore the document so the next DELETE has something to remove.
    with open(fn, 'rb') as f:
        response = requests.put(url, f, headers={'Content-Type': 'application/xml', 'Authorization': EXIST_AUTH})
    assert response.status_code < 300, response.reason + '\n' + response.text
    return end - start

def output_stats(label, dataset):
    print(f'{label}:')
    print(f'\tMinimum: {min(dataset)} s')
    print(f'\tMaximum: {max(dataset)} s')
    print(f'\tMean:    {statistics.mean(dataset)} s')
    print(f'\tMedian:  {statistics.median(dataset)} s')
    # stdev() needs at least two samples, so check the data actually passed in.
    if len(dataset) > 1: print(f'\tStddev:  {statistics.stdev(dataset)} s')

usc01_exist, usc01_fusion, usc42_exist, usc42_fusion = [], [], [], []
for _ in range(REPETITIONS):
    usc01_exist.append(timed_send('usc01.xml', 8080))
    usc01_fusion.append(timed_send('usc01.xml', 4059))
    usc42_exist.append(timed_send('usc42.xml', 8080))
    usc42_fusion.append(timed_send('usc42.xml', 4059))

output_stats('1 U.S.C. eXist', usc01_exist)
output_stats('1 U.S.C. Fusion', usc01_fusion)
output_stats('42 U.S.C. eXist', usc42_exist)
output_stats('42 U.S.C. Fusion', usc42_fusion)
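Since the script above was edited by hand for each test, one possible way to reuse it across PUT, GET, POST and DELETE is to pass the timed request in as a callable and keep any restore step outside the timed region. This is only a sketch: `time_operation`, `summarize`, and their parameters are hypothetical names, and no HTTP calls are made here.

```python
import statistics
import time
from typing import Callable, Dict, List, Optional


def time_operation(
    operation: Callable[[], object],
    repetitions: int,
    setup: Optional[Callable[[], object]] = None,
) -> List[float]:
    """Run operation() repeatedly and return each duration in seconds.

    setup() runs before every repetition and is excluded from the timing,
    e.g. PUTting a document back before a timed DELETE.
    """
    durations = []
    for _ in range(repetitions):
        if setup is not None:
            setup()
        start = time.perf_counter()
        operation()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Compute the same statistics that output_stats prints."""
    summary = {
        "min": min(durations),
        "max": max(durations),
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
    }
    if len(durations) > 1:  # stdev() needs at least two samples
        summary["stdev"] = statistics.stdev(durations)
    return summary


if __name__ == "__main__":
    # Stand-in workload; a real run would wrap a requests call instead.
    print(summarize(time_operation(lambda: sum(range(10_000)), 5)))
```

For the DELETE test, `setup` would wrap the PUT that restores the document and `operation` would wrap the DELETE, so only the DELETE contributes to the reported times.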

@adamretter
Member

adamretter commented Jan 25, 2021

Hi @IanDavey, can you tell me which version of FusionDB you tested this against? If you are not using a nightly build, then, as recently discussed with Grant, the nightly builds should have much better performance than Alpha 3.

@IanDavey
Author

It was Alpha 3. That makes sense. I'll retest with the nightly.

@adamretter
Member

Thanks @IanDavey, much appreciated.

@IanDavey
Author

@adamretter Just to confirm — is the latest nightly from 11/25? That's what's showing up on the link you sent us.

@adamretter
Member

@IanDavey Yes, that's right. We have been working on a change which has taken more engineering than expected, but we hope to have that pushed in the next few days - so there will then be a new nightly.

@IanDavey
Author

IanDavey commented Nov 2, 2021

@adamretter Just checking in — has there been anything new on this?
