Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry count has no effect in thin mode when using an address list and running into a timeout #420

Open
quantuminternet opened this issue Nov 12, 2024 · 2 comments
Labels
bug Something isn't working patch available

Comments

@quantuminternet
Copy link

  1. What versions are you using?

Database: 19.0.0.0.0
platform.platform: Linux-5.4.0-148-generic-x86_64-with-glibc2.31
sys.maxsize > 2**32: True
platform.python_version: 3.12.6
oracledb.version: 2.4.1

I have not seen anything in the changelog that suggests that this has been addressed in 2.5.0

  1. Is it an error or a hang or a crash?
    error

  2. What error(s) or behavior you are seeing?
    We have a list of alternative IP addresses for an Oracle database. When one of them fails on connect with a timeout and retry_count is set to something greater than zero, we expect that the connection is attempted again with one of the other addresses. This works as expected in thick mode, but fails in thin mode: In thin mode, an exception is raised after the first attempt.

import oracledb
import time

# connection details have been redacted
working_ip = '[IP of Oracle database]'
broken_ip = '[IP that produces a timeout]'
service_name = '[Oracle service name]'
user = '[my user]'
password = '[my_password]'

dsn = f'(DESCRIPTION=(RETRY_COUNT=3)(CONNECT_TIMEOUT=2)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST={broken_ip})(PORT=33001))(ADDRESS=(PROTOCOL=TCP)(HOST={working_ip})(PORT=33001))(LOAD_BALANCE=yes))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME={service_name})))'

for i in range(10):
    start = time.monotonic()
    success = False
    try:
        with oracledb.connect(dsn=dsn, user=user, password=password, tcp_connect_timeout=2, retry_count=3) as con:
            cursor = con.cursor()
            cursor.execute('select * from dual')
            r = cursor.fetchall()
            success = True
    except Exception as ex:
        print(ex)
    print(f'success: {success}, time: {time.monotonic() - start}')

This gives the following result:

DPY-6005: cannot connect to database (CONNECTION_ID=zcENDxX4Xx2538CUve3T2Q==).
timed out
success: False, time: 2.003529854118824
success: True, time: 0.15504495054483414
success: True, time: 0.15987224131822586
DPY-6005: cannot connect to database (CONNECTION_ID=MF8hAFPk2DrulkI9T3Xf1Q==).
timed out
success: False, time: 2.001703642308712
DPY-6005: cannot connect to database (CONNECTION_ID=v6Zs3XftGMih6hB45cJklA==).
timed out
success: False, time: 2.0028985515236855
success: True, time: 0.1483408883213997
DPY-6005: cannot connect to database (CONNECTION_ID=pa6I5LldNbNobwc2RXNuUg==).
timed out
success: False, time: 2.0020845904946327
DPY-6005: cannot connect to database (CONNECTION_ID=FE43mMP9hjofXzAcIm5P+g==).
timed out
success: False, time: 2.0028276592493057
DPY-6005: cannot connect to database (CONNECTION_ID=jJKdT0p8T+nOOH5j0Qu0ww==).
timed out
success: False, time: 2.002876453101635
success: True, time: 0.40784119069576263

So it looks like sometimes it suceeds on first try, but sometimes it fails on first try and immediately raises an exception, instead of trying a second time. Since it takes 2 seconds until the exception is raised (instead of 8), we can exclude that the first IP is tried multiple times

Running the same code in thick mode gives the expected result: Sometimes it succeeds on first try, sometimes it runs into a timeout after 2 seconds, but succeeds on second try

success: True, time: 0.18361841142177582
success: True, time: 0.19042250514030457
success: True, time: 2.1918497011065483
success: True, time: 2.1891980096697807
success: True, time: 0.18908774107694626
success: True, time: 2.182748392224312
success: True, time: 0.18147382885217667
success: True, time: 0.18108884245157242
success: True, time: 2.1901468336582184
success: True, time: 0.17620021849870682

Removing (LOAD_BALANCE=yes) results in the connection always failing after 2 seconds in thin mode and the connection always succeeding after 2 seconds in thick mode.

Putting (RETRY_COUNT=3) in the DSN or passing it directly as parameter to connect() does not seem to make a difference

  1. Does your application call init_oracle_client()?

The problem occurs in thin mode. If we call init_oracle_client, it works as expected

@quantuminternet quantuminternet added the bug Something isn't working label Nov 12, 2024
@anthony-tuininga
Copy link
Member

I can confirm the issue and will work on a solution. Thanks for reporting it!

anthony-tuininga added a commit that referenced this issue Nov 14, 2024
attempts, subject to the connection "retry_count" and "retry_delay"
parameters (#420).
@anthony-tuininga
Copy link
Member

anthony-tuininga commented Nov 14, 2024

I have pushed a patch that corrects this issue and have initated a build from which you can download pre-built development wheels once it completes. You can also build from source if you prefer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working patch available
Projects
None yet
Development

No branches or pull requests

2 participants