Deindexing large snapshots with reindex.py causes a TimeoutError #6291

Open
achave11-ucsc opened this issue May 24, 2024 · 2 comments
Labels
- [priority] Medium
bug [type] A defect preventing use of the system as specified
debt [type] A defect incurring continued engineering cost
infra [subject] Project infrastructure like CI/CD, build and deployment scripts
needs design [process] Solution to issue has yet to be devised
orange [process] Done by the Azul team

Comments

@achave11-ucsc
Member

achave11-ucsc commented May 24, 2024

Deindexing snapshot T2T_CHRY (currently the largest, with 309,979 subgraphs) in anvilprod took longer than 60 s to execute, exceeding the Elasticsearch client's read timeout.

Running …

❯ python scripts/reindex.py --deindex --catalogs anvil6 anvil6-it --sources 'tdr:datarepo-e5b16a5a:snapshot/ANVIL_T2T_CHRY_20240301_ANV5_202403040508:/3'

… outputted:

2024-05-24 08:12:06,585   DEBUG MainThread __main__: Source glob 'tdr:datarepo-e5b16a5a:snapshot/ANVIL_T2T_CHRY_20240301_ANV5_202403040508:/3' matched sources ['tdr:datarepo-e5b16a5a:snapshot/ANVIL_T2T_CHRY_20240301_ANV5_202403040508:/3'] in catalog 'anvil6'
2024-05-24 08:12:06,587   DEBUG MainThread __main__: Source glob 'tdr:datarepo-e5b16a5a:snapshot/ANVIL_T2T_CHRY_20240301_ANV5_202403040508:/3' matched sources ['tdr:datarepo-e5b16a5a:snapshot/ANVIL_T2T_CHRY_20240301_ANV5_202403040508:/3'] in catalog 'anvil6-it'
2024-05-24 08:12:06,593    INFO MainThread botocore.credentials: Found credentials in shared credentials file: ~/.aws/credentials
2024-05-24 08:12:06,626    INFO MainThread azul.deployment: Allocated new Boto3 client for 'secretsmanager' with ID 4416954896
2024-05-24 08:12:07,451    INFO MainThread azul.terra: Making GET request to 'https://data.terra.bio/api/repository/v1/snapshots?filter=ANVIL_T2T_CHRY_20240301_ANV5_202403040508&limit=2'
2024-05-24 08:12:07,451   DEBUG MainThread azul.terra: … without request body
2024-05-24 08:12:08,434    INFO MainThread azul.terra: Got 200 response after 0.983s from GET to https://data.terra.bio/api/repository/v1/snapshots?filter=ANVIL_T2T_CHRY_20240301_ANV5_202403040508&limit=2
2024-05-24 08:12:08,434   DEBUG MainThread azul.terra: … with response headers HTTPHeaderDict({'Date': 'Fri, 24 May 2024 15:12:08 GMT', 'Server': 'Apache', 'X-Frame-Options': 'SAMEORIGIN', 'Access-Control-Allow-Headers': 'DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization,Accept,Referer,X-App-Id,Origin', 'Access-Control-Allow-Methods': 'GET,POST,DELETE,PUT,PATCH,OPTIONS,HEAD', 'X-Content-Type-Options': 'nosniff', 'Strict-Transport-Security': 'max-age=31536000;includeSubDomains', 'Cache-Control': 'no-cache,no-store,must-revalidate', 'X-Request-ID': 'aErjmGxe', 'Content-Type': 'application/json', 'Content-Length': '891', 'Vary': 'Accept-Encoding,Origin', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000'})
2024-05-24 08:12:08,434   DEBUG MainThread azul.terra: … with response body b'{"total":1881,"filteredTotal":1,"items":[{"id":"f4accfc6-d9e4-49b1-a590-6a580b4d305f","name":"ANVIL_T2T_CHRY_20240301_ANV5_20...'
2024-05-24 08:12:08,435    INFO MainThread azul.terra: Making GET request to 'https://data.terra.bio/api/repository/v1/snapshots/f4accfc6-d9e4-49b1-a590-6a580b4d305f'
2024-05-24 08:12:08,435   DEBUG MainThread azul.terra: … without request body
2024-05-24 08:12:09,363    INFO MainThread azul.terra: Got 200 response after 0.927s from GET to https://data.terra.bio/api/repository/v1/snapshots/f4accfc6-d9e4-49b1-a590-6a580b4d305f
2024-05-24 08:12:09,363   DEBUG MainThread azul.terra: … with response headers HTTPHeaderDict({'Date': 'Fri, 24 May 2024 15:12:08 GMT', 'Server': 'Apache', 'X-Frame-Options': 'SAMEORIGIN', 'Access-Control-Allow-Headers': 'DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization,Accept,Referer,X-App-Id,Origin', 'Access-Control-Allow-Methods': 'GET,POST,DELETE,PUT,PATCH,OPTIONS,HEAD', 'X-Content-Type-Options': 'nosniff', 'Strict-Transport-Security': 'max-age=31536000;includeSubDomains', 'Cache-Control': 'no-cache,no-store,must-revalidate', 'X-Request-ID': 'pozGvk8g', 'Content-Type': 'application/json', 'Content-Length': '37926', 'Vary': 'Accept-Encoding,Origin', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000'})
2024-05-24 08:12:09,364   DEBUG MainThread azul.terra: … with response body b'{"id":"f4accfc6-d9e4-49b1-a590-6a580b4d305f","name":"ANVIL_T2T_CHRY_20240301_ANV5_202403040508","description":"Full view snap...'
2024-05-24 08:12:09,381    INFO MainThread azul.deployment: Allocated new Boto3 client for 'es' with ID 4425261456
2024-05-24 08:12:09,943   DEBUG MainThread azul.es: Creating ES client [vpc-azul-index-anvilprod-ggipah4skn2ftt47u4xgvydzqm.us-east-1.es.amazonaws.com:443]
2024-05-24 08:12:09,951    INFO MainThread azul.deployment: Allocated new Boto3 client for 'sts' with ID 4425729680
2024-05-24 08:12:09,961    INFO MainThread botocore.credentials: Found credentials in environment variables.
2024-05-24 08:12:09,962    INFO MainThread azul.azulclient: Deindexing sources {'tdr:datarepo-e5b16a5a:snapshot/ANVIL_T2T_CHRY_20240301_ANV5_202403040508:/3'} from catalog 'anvil6'
2024-05-24 08:12:09,962   DEBUG MainThread azul.azulclient: Using query: {'query': {'bool': {'should': [{'terms': {'sources.id.keyword': ['f4accfc6-d9e4-49b1-a590-6a580b4d305f']}}, {'terms': {'source.id.keyword': ['f4accfc6-d9e4-49b1-a590-6a580b4d305f']}}]}}}
2024-05-24 08:12:09,962    INFO MainThread elasticsearch: Making POST request to https://vpc-azul-index-anvilprod-ggipah4skn2ftt47u4xgvydzqm.us-east-1.es.amazonaws.com:443/azul_v2_anvilprod_anvil6_activities,azul_v2_anvilprod_anvil6_activities_aggregate,azul_v2_anvilprod_anvil6_biosamples,azul_v2_anvilprod_anvil6_biosamples_aggregate,azul_v2_anvilprod_anvil6_bundles,azul_v2_anvilprod_anvil6_bundles_aggregate,azul_v2_anvilprod_anvil6_datasets,azul_v2_anvilprod_anvil6_datasets_aggregate,azul_v2_anvilprod_anvil6_diagnoses,azul_v2_anvilprod_anvil6_diagnoses_aggregate,azul_v2_anvilprod_anvil6_donors,azul_v2_anvilprod_anvil6_donors_aggregate,azul_v2_anvilprod_anvil6_files,azul_v2_anvilprod_anvil6_files_aggregate,azul_v2_anvilprod_anvil6_replica/_delete_by_query?slices=auto
2024-05-24 08:12:09,962    INFO MainThread elasticsearch: … with request body b'{"query":{"bool":{"should":[{"terms":{"sources.id.keyword":["f4accfc6-d9e4-49b1-a590-6a580b4d305f"]}},{"terms":{"source.id.ke...'
2024-05-24 08:13:10,328 WARNING MainThread elasticsearch: Got no response after 60.366s from POST to https://vpc-azul-index-anvilprod-ggipah4skn2ftt47u4xgvydzqm.us-east-1.es.amazonaws.com:443/azul_v2_anvilprod_anvil6_activities,azul_v2_anvilprod_anvil6_activities_aggregate,azul_v2_anvilprod_anvil6_biosamples,azul_v2_anvilprod_anvil6_biosamples_aggregate,azul_v2_anvilprod_anvil6_bundles,azul_v2_anvilprod_anvil6_bundles_aggregate,azul_v2_anvilprod_anvil6_datasets,azul_v2_anvilprod_anvil6_datasets_aggregate,azul_v2_anvilprod_anvil6_diagnoses,azul_v2_anvilprod_anvil6_diagnoses_aggregate,azul_v2_anvilprod_anvil6_donors,azul_v2_anvilprod_anvil6_donors_aggregate,azul_v2_anvilprod_anvil6_files,azul_v2_anvilprod_anvil6_files_aggregate,azul_v2_anvilprod_anvil6_replica/_delete_by_query?slices=auto
Traceback (most recent call last):
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 462, in _make_request
    httplib_response = conn.getresponse()
                       ^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/.pyenv/versions/3.11.9/lib/python3.11/http/client.py", line 1395, in getresponse
    response.begin()
  File "/Users/achave11/.pyenv/versions/3.11.9/lib/python3.11/http/client.py", line 325, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/.pyenv/versions/3.11.9/lib/python3.11/http/client.py", line 286, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/.pyenv/versions/3.11.9/lib/python3.11/socket.py", line 706, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/.pyenv/versions/3.11.9/lib/python3.11/ssl.py", line 1314, in recv_into
    return self.read(nbytes, buffer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/.pyenv/versions/3.11.9/lib/python3.11/ssl.py", line 1166, in read
    return self._sslobj.read(len, buffer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/elasticsearch/connection/http_urllib3.py", line 255, in perform_request
    response = self.pool.urlopen(
               ^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/src/azul/es.py", line 198, in urlopen
    return self._inner.urlopen(method, url, body, headers=request.headers, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/urllib3/util/retry.py", line 525, in increment
    raise six.reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 469, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 358, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='vpc-azul-index-anvilprod-ggipah4skn2ftt47u4xgvydzqm.us-east-1.es.amazonaws.com', port=443): Read timed out. (read timeout=60)
2024-05-24 08:13:10,341 WARNING MainThread elasticsearch: … without response body
Traceback (most recent call last):
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 462, in _make_request
    httplib_response = conn.getresponse()
                       ^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/.pyenv/versions/3.11.9/lib/python3.11/http/client.py", line 1395, in getresponse
    response.begin()
  File "/Users/achave11/.pyenv/versions/3.11.9/lib/python3.11/http/client.py", line 325, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/.pyenv/versions/3.11.9/lib/python3.11/http/client.py", line 286, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/.pyenv/versions/3.11.9/lib/python3.11/socket.py", line 706, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/.pyenv/versions/3.11.9/lib/python3.11/ssl.py", line 1314, in recv_into
    return self.read(nbytes, buffer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/.pyenv/versions/3.11.9/lib/python3.11/ssl.py", line 1166, in read
    return self._sslobj.read(len, buffer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/elasticsearch/connection/http_urllib3.py", line 255, in perform_request
    response = self.pool.urlopen(
               ^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/src/azul/es.py", line 198, in urlopen
    return self._inner.urlopen(method, url, body, headers=request.headers, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/urllib3/util/retry.py", line 525, in increment
    raise six.reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 469, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 358, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='vpc-azul-index-anvilprod-ggipah4skn2ftt47u4xgvydzqm.us-east-1.es.amazonaws.com', port=443): Read timed out. (read timeout=60)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/achave11/Pycharm/Azul/azul.stable/scripts/reindex.py", line 206, in <module>
    main(sys.argv[1:])
  File "/Users/achave11/Pycharm/Azul/azul.stable/scripts/reindex.py", line 170, in main
    azul.deindex(catalog, sources)
  File "/Users/achave11/Pycharm/Azul/azul.stable/src/azul/azulclient.py", line 441, in deindex
    response = es_client.delete_by_query(index=indices, body=query, slices='auto')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/elasticsearch/client/utils.py", line 347, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/elasticsearch/client/__init__.py", line 738, in delete_by_query
    return self.transport.perform_request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/elasticsearch/transport.py", line 466, in perform_request
    raise e
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/elasticsearch/transport.py", line 427, in perform_request
    status, headers_response, data = connection.perform_request(
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/src/azul/es.py", line 78, in perform_request
    return super().perform_request(method, url, params, body, timeout, ignore, headers)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/elasticsearch/connection/http_urllib3.py", line 279, in perform_request
    raise ConnectionTimeout("TIMEOUT", str(e), e)
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='vpc-azul-index-anvilprod-ggipah4skn2ftt47u4xgvydzqm.us-east-1.es.amazonaws.com', port=443): Read timed out. (read timeout=60))
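
One way to keep the client from waiting on a long delete is to submit it as a background task and poll for the result. The following is only a sketch under assumptions, not how `azulclient.py` currently works: the function name `delete_by_query_as_task` and the `poll_interval` and `sleep` parameters are hypothetical, and it assumes the cluster retains task results so that `tasks.get` can report completion.

```python
import time


def delete_by_query_as_task(es_client, indices, query, poll_interval=5.0, sleep=time.sleep):
    """
    Start a delete-by-query as a background task and poll until it finishes.
    Hypothetical helper; `es_client` is expected to be an Elasticsearch client.
    """
    # With wait_for_completion=False the request returns right away with a
    # task ID instead of blocking until every matching document is deleted.
    response = es_client.delete_by_query(index=indices,
                                         body=query,
                                         slices='auto',
                                         wait_for_completion=False)
    task_id = response['task']
    while True:
        # Each poll is a short request, so the 60 s read timeout should not apply
        status = es_client.tasks.get(task_id=task_id)
        if status['completed']:
            if 'error' in status:
                raise RuntimeError(f'Task {task_id} failed: {status["error"]}')
            return status['response']
        sleep(poll_interval)
```

The returned dictionary would have the same shape as a synchronous delete-by-query response (`total`, `deleted`, `version_conflicts` and so on), so the calling code could stay largely unchanged.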

Retrying this command shortly after the first run returned 409 responses for each of the indices in Elasticsearch:

2024-05-24 08:13:51,015    INFO MainThread elasticsearch: Making POST request to https://vpc-azul-index-anvilprod-ggipah4skn2ftt47u4xgvydzqm.us-east-1.es.amazonaws.com:443/azul_v2_anvilprod_anvil6_activities,azul_v2_anvilprod_anvil6_activities_aggregate,azul_v2_anvilprod_anvil6_biosamples,azul_v2_anvilprod_anvil6_biosamples_aggregate,azul_v2_anvilprod_anvil6_bundles,azul_v2_anvilprod_anvil6_bundles_aggregate,azul_v2_anvilprod_anvil6_datasets,azul_v2_anvilprod_anvil6_datasets_aggregate,azul_v2_anvilprod_anvil6_diagnoses,azul_v2_anvilprod_anvil6_diagnoses_aggregate,azul_v2_anvilprod_anvil6_donors,azul_v2_anvilprod_anvil6_donors_aggregate,azul_v2_anvilprod_anvil6_files,azul_v2_anvilprod_anvil6_files_aggregate,azul_v2_anvilprod_anvil6_replica/_delete_by_query?slices=auto
2024-05-24 08:13:51,015    INFO MainThread elasticsearch: … with request body b'{"query":{"bool":{"should":[{"terms":{"sources.id.keyword":["f4accfc6-d9e4-49b1-a590-6a580b4d305f"]}},{"terms":{"source.id.ke...'
2024-05-24 08:13:52,234 WARNING MainThread elasticsearch: Got 409 response after 1.218s from POST to https://vpc-azul-index-anvilprod-ggipah4skn2ftt47u4xgvydzqm.us-east-1.es.amazonaws.com:443/azul_v2_anvilprod_anvil6_activities,azul_v2_anvilprod_anvil6_activities_aggregate,azul_v2_anvilprod_anvil6_biosamples,azul_v2_anvilprod_anvil6_biosamples_aggregate,azul_v2_anvilprod_anvil6_bundles,azul_v2_anvilprod_anvil6_bundles_aggregate,azul_v2_anvilprod_anvil6_datasets,azul_v2_anvilprod_anvil6_datasets_aggregate,azul_v2_anvilprod_anvil6_diagnoses,azul_v2_anvilprod_anvil6_diagnoses_aggregate,azul_v2_anvilprod_anvil6_donors,azul_v2_anvilprod_anvil6_donors_aggregate,azul_v2_anvilprod_anvil6_files,azul_v2_anvilprod_anvil6_files_aggregate,azul_v2_anvilprod_anvil6_replica/_delete_by_query?slices=auto
2024-05-24 08:13:52,235 WARNING MainThread elasticsearch: … with response body '{"took":261,"timed_out":false,"total":215899,"deleted":0,"batches":1,"version_conflicts":1000,"noops":0,"retries":{"bulk":0,"se…'
Traceback (most recent call last):
  File "/Users/achave11/Pycharm/Azul/azul.stable/scripts/reindex.py", line 206, in <module>
    main(sys.argv[1:])
  File "/Users/achave11/Pycharm/Azul/azul.stable/scripts/reindex.py", line 170, in main
    azul.deindex(catalog, sources)
  File "/Users/achave11/Pycharm/Azul/azul.stable/src/azul/azulclient.py", line 441, in deindex
    response = es_client.delete_by_query(index=indices, body=query, slices='auto')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/elasticsearch/client/utils.py", line 347, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/elasticsearch/client/__init__.py", line 738, in delete_by_query
    return self.transport.perform_request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/elasticsearch/transport.py", line 466, in perform_request
    raise e
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/elasticsearch/transport.py", line 427, in perform_request
    status, headers_response, data = connection.perform_request(
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/src/azul/es.py", line 78, in perform_request
    return super().perform_request(method, url, params, body, timeout, ignore, headers)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/elasticsearch/connection/http_urllib3.py", line 291, in perform_request
    self._raise_error(response.status, raw_data)
  File "/Users/achave11/Pycharm/Azul/azul.stable/.venv/lib/python3.11/site-packages/elasticsearch/connection/base.py", line 328, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.ConflictError: ConflictError(409, '{"took":261,"timed_out":false,"total":215899,"deleted":0,"batches":1,"version_conflicts":1000,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[{"index":"azul_v2_anvilprod_anvil6_files_aggregate","type":"_doc","id":"60b15ba7-1ad7-40f9-9ef9-e79ec57310d6","cause":{"type":"version_conflict_engine_exception","reason":"[60b15ba7-1ad7-40f9-9ef9-e79ec57310d6]: version conflict, required seqNo [2029195], primary term [1]. but no document was found","index_uuid":"cbM2JoAUSUa6jHuNA5mAmA","shard":"0","index":"azul_v2_anvilprod_anvil6_files_aggregate"},"status":409},{"index":"azul_v2_anvilprod_anvil6_files_aggregate","type":"_doc","id":"5c20fd4d-5e3d-46e7-9aed-cf925159e36a","cause":{"type":"version_conflict_engine_exception","reason":"[5c20fd4d-5e3d-46e7-9aed-cf925159e36a]: version conflict, required seqNo [2029196], primary term [1]. but no document was found","index_uuid":"cbM2JoAUSUa6jHuNA5mAmA","shard":"0","index":"azul_v2_anvilprod_anvil6_files_aggregate"},"status":409},{"index":"azul_v2_anvilprod_anvil6_files_aggregate","type":"_doc","id":"60b3a527-cd46-4dbf-9bd6-d85549754369","cause":{"type":"version_conflict_engine_exception","reason":"[60b3a527-cd46-4dbf-9bd6-d85549754369]: version conflict, required seqNo [2029197], primary term [1]. but no document was found","index_uuid":"cbM2JoAUSUa6jHuNA5mAmA","shard":"0","index":"azul_v2_anvilprod_anvil6_files_aggregate"},"status":409},{"index":"azul_v2_anvilprod_anvil6_files_aggregate","type":"_doc","id":"5ee75279-6373-4ee0-b148-b39baa8b40b5","cause":{"type":"version_conflict_engine_exception","reason":"[5ee75279-6373-4ee0-b148-b39baa8b40b5]: version conflict, required seqNo [2029198], primary term [1]. 
but no document was found","index_uuid":"cbM2JoAUSUa6jHuNA5mAmA","shard":"0","index":"azul_v2_anvilprod_anvil6_files_aggregate"},"status":409},{"index":"azul_v2_anvilprod_anvil6_files_aggregate","type":"_doc","id":"601cd728-fed4-4336-9c4d-2caa91172d0c","cause":{"type":"version_conflict_engine_exception","reason":"[601cd728-fed4-4336-9c4d-2caa91172d0c]: version conflict, required seqNo [2029199], primary term [1]. but no document was found","index_uuid":"cbM2JoAUSUa6jHuNA5mAmA","shard":"0","index":"azul_v2_anvilprod_anvil6_files_aggregate"},"status":409},{"index":"azul_v2_anvilprod_anvil6_files_aggregate","type":"_doc","id":"5c28569a-eb57-484b-9363-5c7adb8b5eb5","cause":{"type":"version_conflict_engine_exception","reason":"[5c28569a-eb57-484b-9363-5c7adb8b5eb5]: version conflict, required seqNo [2029200], primary term [1]. but no document was found","index_uuid":"cbM2JoAUSUa6jHuNA5mAmA","shard":"0","index":"azul_v2_anvilprod_anvil6_files_aggregate"},"status":409},…'))
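
The 409 body above reports `"total":215899`, `"deleted":0` and `"version_conflicts":1000`, with every listed failure saying "but no document was found". That would be consistent with the first request still deleting documents on the server after the client gave up, although the log alone does not prove it. To make such a response easier to read, the failures can be tallied per index. This is a sketch only; `summarize_conflicts` is a hypothetical helper that takes the parsed response body as a dictionary.

```python
from collections import Counter


def summarize_conflicts(response):
    """
    Condense a delete-by-query response body into its headline counters
    and a per-index tally of version conflicts. Hypothetical helper.
    """
    conflicts = Counter(
        failure['index']
        for failure in response.get('failures', [])
        if failure.get('cause', {}).get('type') == 'version_conflict_engine_exception'
    )
    return {
        'total': response.get('total', 0),
        'deleted': response.get('deleted', 0),
        'version_conflicts': response.get('version_conflicts', 0),
        'conflicts_by_index': dict(conflicts),
    }
```

Delete-by-query also accepts `conflicts='proceed'`, which counts version conflicts instead of aborting on them. Whether that is appropriate here is a design question for the assignee.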
@achave11-ucsc added the label "orange [process] Done by the Azul team" on May 24, 2024
@achave11-ucsc changed the title from "Deindexing large snapshots with reindex.py causes TimeoutError" to "Deindexing large snapshots with reindex.py causes a TimeoutError" on May 24, 2024
@achave11-ucsc
Member Author

Re-running the command hours after the second attempt actually succeeded:

❯ python scripts/reindex.py --deindex --catalogs anvil6 --sources 'tdr:datarepo-e5b16a5a:snapshot/ANVIL_T2T_CHRY_20240301_ANV5_202403040508:/3'
2024-05-24 14:00:17,352   DEBUG MainThread __main__: Source glob 'tdr:datarepo-e5b16a5a:snapshot/ANVIL_T2T_CHRY_20240301_ANV5_202403040508:/3' matched sources ['tdr:datarepo-e5b16a5a:snapshot/ANVIL_T2T_CHRY_20240301_ANV5_202403040508:/3'] in catalog 'anvil6'
2024-05-24 14:00:17,358    INFO MainThread botocore.credentials: Found credentials in shared credentials file: ~/.aws/credentials
2024-05-24 14:00:17,394    INFO MainThread azul.deployment: Allocated new Boto3 client for 'secretsmanager' with ID 4379854800
2024-05-24 14:00:18,191    INFO MainThread azul.terra: Making GET request to 'https://data.terra.bio/api/repository/v1/snapshots?filter=ANVIL_T2T_CHRY_20240301_ANV5_202403040508&limit=2'
2024-05-24 14:00:18,192   DEBUG MainThread azul.terra: … without request body
2024-05-24 14:00:22,369    INFO MainThread azul.terra: Got 200 response after 4.177s from GET to https://data.terra.bio/api/repository/v1/snapshots?filter=ANVIL_T2T_CHRY_20240301_ANV5_202403040508&limit=2
2024-05-24 14:00:22,369   DEBUG MainThread azul.terra: … with response headers HTTPHeaderDict({'Date': 'Fri, 24 May 2024 21:00:22 GMT', 'Server': 'Apache', 'X-Frame-Options': 'SAMEORIGIN', 'Access-Control-Allow-Headers': 'DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization,Accept,Referer,X-App-Id,Origin', 'Access-Control-Allow-Methods': 'GET,POST,DELETE,PUT,PATCH,OPTIONS,HEAD', 'X-Content-Type-Options': 'nosniff', 'Strict-Transport-Security': 'max-age=31536000;includeSubDomains', 'Cache-Control': 'no-cache,no-store,must-revalidate', 'X-Request-ID': 'pV5Mb5bB', 'Content-Type': 'application/json', 'Content-Length': '891', 'Vary': 'Accept-Encoding,Origin', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000'})
2024-05-24 14:00:22,370   DEBUG MainThread azul.terra: … with response body b'{"total":1737,"filteredTotal":1,"items":[{"id":"f4accfc6-d9e4-49b1-a590-6a580b4d305f","name":"ANVIL_T2T_CHRY_20240301_ANV5_20...'
2024-05-24 14:00:22,371    INFO MainThread azul.terra: Making GET request to 'https://data.terra.bio/api/repository/v1/snapshots/f4accfc6-d9e4-49b1-a590-6a580b4d305f'
2024-05-24 14:00:22,371   DEBUG MainThread azul.terra: … without request body
2024-05-24 14:00:42,374 WARNING MainThread urllib3.connectionpool: Retrying (_LimitedRetry(total=None, connect=2, read=2, redirect=0, status=2)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='data.terra.bio', port=443): Read timed out. (read timeout=20)")': /api/repository/v1/snapshots/f4accfc6-d9e4-49b1-a590-6a580b4d305f
2024-05-24 14:01:02,554 WARNING MainThread urllib3.connectionpool: Retrying (_LimitedRetry(total=None, connect=2, read=1, redirect=0, status=2)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='data.terra.bio', port=443): Read timed out. (read timeout=20)")': /api/repository/v1/snapshots/f4accfc6-d9e4-49b1-a590-6a580b4d305f
2024-05-24 14:01:20,613    INFO MainThread azul.terra: Got 200 response after 58.242s from GET to https://data.terra.bio/api/repository/v1/snapshots/f4accfc6-d9e4-49b1-a590-6a580b4d305f
2024-05-24 14:01:20,613   DEBUG MainThread azul.terra: … with response headers HTTPHeaderDict({'Date': 'Fri, 24 May 2024 21:01:19 GMT', 'Server': 'Apache', 'X-Frame-Options': 'SAMEORIGIN', 'Access-Control-Allow-Headers': 'DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization,Accept,Referer,X-App-Id,Origin', 'Access-Control-Allow-Methods': 'GET,POST,DELETE,PUT,PATCH,OPTIONS,HEAD', 'X-Content-Type-Options': 'nosniff', 'Strict-Transport-Security': 'max-age=31536000;includeSubDomains', 'Cache-Control': 'no-cache,no-store,must-revalidate', 'X-Request-ID': 'a8q1P7JK', 'Content-Type': 'application/json', 'Content-Length': '37926', 'Vary': 'Accept-Encoding,Origin', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000'})
2024-05-24 14:01:20,614   DEBUG MainThread azul.terra: … with response body b'{"id":"f4accfc6-d9e4-49b1-a590-6a580b4d305f","name":"ANVIL_T2T_CHRY_20240301_ANV5_202403040508","description":"Full view snap...'
2024-05-24 14:01:20,632    INFO MainThread azul.deployment: Allocated new Boto3 client for 'es' with ID 4381189520
2024-05-24 14:01:21,197   DEBUG MainThread azul.es: Creating ES client [vpc-azul-index-anvilprod-ggipah4skn2ftt47u4xgvydzqm.us-east-1.es.amazonaws.com:443]
2024-05-24 14:01:21,205    INFO MainThread azul.deployment: Allocated new Boto3 client for 'sts' with ID 4381725392
2024-05-24 14:01:21,215    INFO MainThread botocore.credentials: Found credentials in environment variables.
2024-05-24 14:01:21,215    INFO MainThread azul.azulclient: Deindexing sources {'tdr:datarepo-e5b16a5a:snapshot/ANVIL_T2T_CHRY_20240301_ANV5_202403040508:/3'} from catalog 'anvil6'
2024-05-24 14:01:21,215   DEBUG MainThread azul.azulclient: Using query: {'query': {'bool': {'should': [{'terms': {'sources.id.keyword': ['f4accfc6-d9e4-49b1-a590-6a580b4d305f']}}, {'terms': {'source.id.keyword': ['f4accfc6-d9e4-49b1-a590-6a580b4d305f']}}]}}}
2024-05-24 14:01:21,216    INFO MainThread elasticsearch: Making POST request to https://vpc-azul-index-anvilprod-ggipah4skn2ftt47u4xgvydzqm.us-east-1.es.amazonaws.com:443/azul_v2_anvilprod_anvil6_activities,azul_v2_anvilprod_anvil6_activities_aggregate,azul_v2_anvilprod_anvil6_biosamples,azul_v2_anvilprod_anvil6_biosamples_aggregate,azul_v2_anvilprod_anvil6_bundles,azul_v2_anvilprod_anvil6_bundles_aggregate,azul_v2_anvilprod_anvil6_datasets,azul_v2_anvilprod_anvil6_datasets_aggregate,azul_v2_anvilprod_anvil6_diagnoses,azul_v2_anvilprod_anvil6_diagnoses_aggregate,azul_v2_anvilprod_anvil6_donors,azul_v2_anvilprod_anvil6_donors_aggregate,azul_v2_anvilprod_anvil6_files,azul_v2_anvilprod_anvil6_files_aggregate,azul_v2_anvilprod_anvil6_replica/_delete_by_query?slices=auto
2024-05-24 14:01:21,216    INFO MainThread elasticsearch: … with request body b'{"query":{"bool":{"should":[{"terms":{"sources.id.keyword":["f4accfc6-d9e4-49b1-a590-6a580b4d305f"]}},{"terms":{"source.id.ke...'
2024-05-24 14:01:21,782    INFO MainThread elasticsearch: Got 200 response after 0.566s from POST to https://vpc-azul-index-anvilprod-ggipah4skn2ftt47u4xgvydzqm.us-east-1.es.amazonaws.com:443/azul_v2_anvilprod_anvil6_activities,azul_v2_anvilprod_anvil6_activities_aggregate,azul_v2_anvilprod_anvil6_biosamples,azul_v2_anvilprod_anvil6_biosamples_aggregate,azul_v2_anvilprod_anvil6_bundles,azul_v2_anvilprod_anvil6_bundles_aggregate,azul_v2_anvilprod_anvil6_datasets,azul_v2_anvilprod_anvil6_datasets_aggregate,azul_v2_anvilprod_anvil6_diagnoses,azul_v2_anvilprod_anvil6_diagnoses_aggregate,azul_v2_anvilprod_anvil6_donors,azul_v2_anvilprod_anvil6_donors_aggregate,azul_v2_anvilprod_anvil6_files,azul_v2_anvilprod_anvil6_files_aggregate,azul_v2_anvilprod_anvil6_replica/_delete_by_query?slices=auto
2024-05-24 14:01:21,782    INFO MainThread elasticsearch: … with response body '{"took":42,"timed_out":false,"total":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},…'
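
The final response reports `"total":0` and `"deleted":0`, so by then there was nothing left to delete. The manual retries above could be automated along these lines. This is a sketch under assumptions: `retry_until_ok` and its parameters are hypothetical, and the default attempt count and delay are placeholders rather than values derived from this incident.

```python
import time


def retry_until_ok(attempt, max_attempts=5, delay=60.0, retry_on=(Exception,), sleep=time.sleep):
    """
    Call `attempt` until it returns without raising one of the exceptions in
    `retry_on`, waiting `delay` seconds between tries. Hypothetical helper.
    """
    for n in range(1, max_attempts + 1):
        try:
            return attempt()
        except retry_on:
            if n == max_attempts:
                # Out of attempts, let the last error propagate
                raise
            sleep(delay)
```

In `reindex.py` this might wrap the existing call, for example `retry_until_ok(lambda: azul.deindex(catalog, sources), retry_on=(ConnectionTimeout, ConflictError))`, using the two exception types from `elasticsearch.exceptions` that appear in the tracebacks above.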

@dsotirho-ucsc
Contributor

@hannes-ucsc: "The solution is most likely to partition the deletion requests so that no request takes longer than 30 seconds, which is a safe margin away from the client timeout of one minute. There may be other solutions. Assignee to consider those. At the moment, the work-around is to retry until the request returns a 200."
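
One possible reading of "partition the deletion requests" is to cap the number of documents each request may delete and loop until nothing matches. The sketch below is an illustration, not a worked-out design: `delete_in_batches`, `batch_size` and `max_rounds` are hypothetical, the batch size would need tuning so that a request stays under the suggested 30 seconds, and it assumes the deployed cluster and client version support the `max_docs` parameter of delete-by-query.

```python
def delete_in_batches(es_client, indices, query, batch_size=10_000, max_rounds=1_000):
    """
    Delete documents matching `query` in bounded batches so that no single
    request runs for long. Returns the number of deleted documents.
    Hypothetical helper; the defaults are untuned placeholders.
    """
    total_deleted = 0
    for _ in range(max_rounds):
        response = es_client.delete_by_query(index=indices,
                                             body=query,
                                             max_docs=batch_size,
                                             # Count version conflicts instead of aborting
                                             conflicts='proceed',
                                             # Make deletions visible to the next round's search
                                             refresh=True)
        total_deleted += response['deleted']
        if response['total'] == 0:
            # Nothing matched the query anymore, so the deletion is complete
            return total_deleted
    raise RuntimeError(f'Documents still matched the query after {max_rounds} rounds')
```

Other partitioning schemes, such as one request per index or manual slicing, may fit better. The cap on rounds is there so that persistent conflicts cannot keep the loop running forever.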

@hannes-ucsc added the following labels on Sep 4, 2024: needs design [process] Solution to issue has yet to be devised; bug [type] A defect preventing use of the system as specified; debt [type] A defect incurring continued engineering cost; infra [subject] Project infrastructure like CI/CD, build and deployment scripts; - [priority] Medium