Connection Reset Error Occurs Occasionally with Client-v2 0.7.2 #2070

Open
lelewolf opened this issue Jan 7, 2025 · 4 comments

lelewolf commented Jan 7, 2025

Describe the bug

There is an issue where the ClickHouse client fails to execute a query, resulting in a “Connection reset” error. The request is being terminated unexpectedly.

Steps to reproduce

1.	Run the query on the ClickHouse client with a high load or specific network conditions.
2.	Observe that the connection resets with a SocketException: Connection reset error.
3.	The issue happens after the socket connection times out or gets interrupted.
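
For readers unfamiliar with the error itself, here is a minimal sketch (not taken from the reporter's application) of how a `SocketException: Connection reset` arises at the socket level: the peer aborts the TCP connection with an RST while the client is waiting to read. The class name `ConnectionResetDemo` and its method are hypothetical, and the local server only stands in for whatever closed the connection in the real setup (the ClickHouse server, a proxy, or a load balancer).

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;

/** Hypothetical demo class; it is not part of clickhouse-java. */
final class ConnectionResetDemo {

    private ConnectionResetDemo() {}

    /**
     * Connects to a local server that aborts the connection, then tries to read.
     * Returns the SocketException message seen by the client, or null if the
     * read finished without a SocketException.
     */
    static String readFromResettingPeer() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // SO_LINGER with a zero timeout makes close() abort the connection
            // (TCP RST) instead of performing an orderly shutdown (FIN).
            accepted.setSoLinger(true, 0);
            accepted.close();

            // Safety net so the demo cannot block forever.
            client.setSoTimeout(5_000);
            // The client is waiting for a response, as in the stack trace,
            // where the reset is hit while reading the response header.
            client.getInputStream().read();
            return null;
        } catch (SocketException e) {
            return e.getMessage();
        }
    }
}
```

Calling `ConnectionResetDemo.readFromResettingPeer()` typically returns a message containing "Connection reset". This only shows where the exception comes from; it does not say which component resets the connection in the reported environment.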

Expected behaviour

The client should execute the query successfully without encountering a connection reset error, even with high traffic or under timeout conditions.

Code example

package com.opay.finder.analysis.config;

import com.clickhouse.client.api.Client;
import com.clickhouse.client.config.ClickHouseClientOption;
import com.clickhouse.client.config.ClickHouseHealthCheckMethod;
import java.time.temporal.ChronoUnit;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * @author lang
 * @description ClickHouse client configuration class
 * @date 2024/10/24 10:58
 */
@Configuration
public class ClickHouseClientConfig {

  private final ClickHouseConfig clickHouseConfig;

  public ClickHouseClientConfig(ClickHouseConfig clickHouseConfig) {
    this.clickHouseConfig = clickHouseConfig;
  }

  @Bean
  public Client clickhouseClient() {
    return new Client.Builder()
        .addEndpoint(clickHouseConfig.getUrl())
        .setUsername(clickHouseConfig.getUsername())
        .setPassword(clickHouseConfig.getPassword())
        .setSocketTimeout(clickHouseConfig.getSocketTimeout(), ChronoUnit.HOURS)
        .setSocketKeepAlive(Boolean.TRUE)
        .setConnectTimeout(clickHouseConfig.getConnectionTimeout(), ChronoUnit.HOURS)
        .setConnectionTTL(clickHouseConfig.getConnectionTtl(), ChronoUnit.MINUTES)
        .setMaxConnections(clickHouseConfig.getMaxConnection())
        .enableConnectionPool(Boolean.TRUE)
        .setMaxRetries(3)
        .setOption(ClickHouseClientOption.ASYNC.getKey(), "false")
        .setOption(ClickHouseClientOption.AUTO_DISCOVERY.getKey(), "true")
//        .setSocketKeepAlive(true)
        .setOption(ClickHouseClientOption.LOAD_BALANCING_POLICY.getKey(), "roundRobin")
        .setOption(ClickHouseClientOption.HEALTH_CHECK_INTERVAL.getKey(), "60000")
        // TODO: look into this and switch to a check based on the current system load; the default is SELECT 1
        .setOption(ClickHouseClientOption.HEALTH_CHECK_METHOD.getKey(), ClickHouseHealthCheckMethod.SELECT_ONE.name())
        .setConnectionRequestTimeout(clickHouseConfig.getConnectionRequestTimeout(), ChronoUnit.MINUTES)
        .build();
  }
}

Error log

2025-01-07 07:10:43.143 opay-finder-web  [ForkJoinPool.commonPool-worker-129] DEBUG org.apache.hc.client5.http.wire [wire:106]- http-outgoing-399 << "[read] I/O error: Connection reset"
2025-01-07 07:10:43.144 opay-finder-web  [ForkJoinPool.commonPool-worker-129] DEBUG o.a.h.c.h.i.i.DefaultManagedHttpClientConnection [close:155]- http-outgoing-399 Close connection
2025-01-07 07:10:43.144 opay-finder-web  [ForkJoinPool.commonPool-worker-129] DEBUG o.a.h.c.h.i.c.InternalHttpClient [discardEndpoint:261]- ep-0000001120 endpoint closed
2025-01-07 07:10:43.144 opay-finder-web  [ForkJoinPool.commonPool-worker-129] DEBUG o.a.h.c.h.i.c.InternalHttpClient [discardEndpoint:265]- ep-0000001120 discarding endpoint
2025-01-07 07:10:43.144 opay-finder-web  [ForkJoinPool.commonPool-worker-129] DEBUG o.a.h.c.h.i.i.PoolingHttpClientConnectionManager [release:424]- ep-0000001120 releasing endpoint
2025-01-07 07:10:43.144 opay-finder-web  [ForkJoinPool.commonPool-worker-129] DEBUG o.a.h.c.h.i.i.PoolingHttpClientConnectionManager [release:455]- ep-0000001120 connection is not kept alive)
2025-01-07 07:10:43.144 opay-finder-web  [ForkJoinPool.commonPool-worker-129] DEBUG o.a.h.c.h.i.i.PoolingHttpClientConnectionManager [release:465]- ep-0000001120 connection released [route: {}->[http://10.166.16.117:8123]][total available: 1; route allocated: 3 of 15; total allocated: 3 of 15]
2025-01-07 07:10:43.145 opay-finder-web  [ForkJoinPool.commonPool-worker-129] ERROR c.o.f.a.b.i.QuerySchedulerServiceImpl [lambda$executeScheduledQuery$0:221]- Query [FX_PRO_10000009_FUNNEL_dF717CsZbAzhEql174CG9AUCYYSWXUWI_t00] execution failed: Failed to execute request com.clickhouse.client.api.ClientException: Failed to execute request
	at com.clickhouse.client.api.internal.HttpAPIClientHelper.executeRequest(HttpAPIClientHelper.java:404)
	at com.clickhouse.client.api.Client.lambda$query$11(Client.java:1706)
	at com.clickhouse.client.api.Client.runAsyncOperation(Client.java:2116)
	at com.clickhouse.client.api.Client.query(Client.java:1782)
	at com.clickhouse.client.api.Client.query(Client.java:1647)
	at com.opay.finder.analysis.biz.impl.ClickHouseQueryServiceImpl.executeQuery(ClickHouseQueryServiceImpl.java:57)
	at com.opay.finder.analysis.biz.impl.ClickHouseQueryServiceImpl.executeSQLQuery(ClickHouseQueryServiceImpl.java:165)
	at com.opay.finder.analysis.biz.impl.ClickHouseQueryServiceImpl.executeSQL(ClickHouseQueryServiceImpl.java:125)
	at com.opay.finder.analysis.biz.impl.QuerySchedulerServiceImpl.lambda$executeScheduledQuery$0(QuerySchedulerServiceImpl.java:217)
	at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)
	at java.base/java.util.concurrent.CompletableFuture$AsyncRun.exec(CompletableFuture.java:1796)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
Caused by: java.net.SocketException: Connection reset
	at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:318)
	at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:346)
	at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:796)
	at java.base/java.net.Socket$SocketInputStream.read(Socket.java:1099)
	at org.apache.hc.client5.http.impl.io.LoggingInputStream.read(LoggingInputStream.java:83)
	at org.apache.hc.core5.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:149)
	at org.apache.hc.core5.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
	at org.apache.hc.core5.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:250)
	at org.apache.hc.core5.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:56)
	at org.apache.hc.core5.http.impl.io.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:331)
	at org.apache.hc.core5.http.impl.io.HttpRequestExecutor.execute(HttpRequestExecutor.java:193)
	at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.lambda$execute$0(InternalExecRuntime.java:236)
	at org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager$InternalConnectionEndpoint.execute(PoolingHttpClientConnectionManager.java:791)
	at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.execute(InternalExecRuntime.java:233)
	at org.apache.hc.client5.http.impl.classic.MainClientExec.execute(MainClientExec.java:121)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.ConnectExec.execute(ConnectExec.java:199)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.ProtocolExec.execute(ProtocolExec.java:192)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.ContentCompressionExec.execute(ContentCompressionExec.java:150)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.HttpRequestRetryExec.execute(HttpRequestRetryExec.java:113)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:174)
	at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:87)
	at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:55)
	at org.apache.hc.client5.http.classic.HttpClient.executeOpen(HttpClient.java:183)
	at com.clickhouse.client.api.internal.HttpAPIClientHelper.executeRequest(HttpAPIClientHelper.java:377)
	... 15 common frames omitted

Configuration

Environment

  • Client version: client-v2 0.7.2
  • Language version: Java version “21.0.5” (LTS, 2024-10-15)
  • OS: CentOS Linux 7 (Core)

ClickHouse server

  • ClickHouse Server version: version 23.3.2.1
  • ClickHouse Server non-default settings, if any:
  • CREATE TABLE statements for tables involved:
  • Sample data for all these tables, use clickhouse-obfuscator if necessary

lelewolf added the bug label Jan 7, 2025

chernser commented Jan 7, 2025

Good day, @lelewolf!
Thank you for reporting! I will look into it.

chernser self-assigned this Jan 7, 2025

chernser commented Jan 7, 2025

@lelewolf
I have a few questions:

  • Does the problem occur only when the client is under load? Do you have a GC activity graph, by any chance?
  • In what environment is the application running?

Note:
The client cannot prevent the system from resetting the connection, but it can retry. It seems that the retry is not triggered on timeout.
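
As an illustration of the retry idea in the note above, here is a minimal application-level sketch, assuming the failure surfaces as an exception whose cause chain contains a `java.net.SocketException` with "Connection reset" in its message (as in the stack trace in this issue). The class `RetryOnReset` and its methods are hypothetical names introduced here; they are not part of clickhouse-java and do not describe how the client retries internally.

```java
import java.net.SocketException;
import java.util.concurrent.Callable;

/** Hypothetical helper; it is not part of clickhouse-java. */
final class RetryOnReset {

    private RetryOnReset() {}

    /** Returns true if the cause chain contains a SocketException mentioning "Connection reset". */
    static boolean isConnectionReset(Throwable error) {
        int depth = 0;
        // The depth limit guards against pathological self-referencing cause chains.
        for (Throwable t = error; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof SocketException
                    && t.getMessage() != null
                    && t.getMessage().contains("Connection reset")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the action and retries it only when it fails with a connection reset.
     * Waits backoffMillis * attempt between attempts; any other failure is rethrown at once.
     */
    static <T> T call(Callable<T> action, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (!isConnectionReset(e)) {
                    throw e;
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        // All attempts ended in a connection reset; surface the last one.
        throw last;
    }
}
```

A caller would wrap the query call, for example `RetryOnReset.call(() -> runQuery(sql), 3, 200L)`, where `runQuery` is a placeholder for the application's own query method. Retrying this way is only safe for operations that can be repeated without side effects, such as read-only SELECT queries.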

chernser added this to the Priority Backlog milestone Jan 7, 2025

lelewolf commented Jan 8, 2025

Hi @chernser,

I’m running the application in a Spring Boot environment, and the issue does not occur during client loading. Below, I’ve provided the gc.log, although I’m not sure if the problem is GC-related.

Currently, there’s a specific SQL query that always triggers this issue. However, when I execute the same SQL in DBeaver, it works fine and returns results as expected. This leads me to suspect that the issue might be related to the client or its configuration.

SELECT
  time_index,
  level_index,
  count(DISTINCT user_id) as event_users
FROM (
  SELECT
    (toUInt32((toUInt64(server_time / 1000) - 1733007600) / 86400)) AS time_index,
    hash_uid AS user_id,
    windowFunnel(86400)(
      toUInt64(server_time / 1000),
      event = 'ac_home_show',
      (
        (event = 'ac_home_search_success_bene_click')
        OR (event = 'ac_home_recent_bene_click')
        OR (event = 'ac_home_saved_bene_click')
        OR (event = 'ac_home_next_click')
        OR (event = 'ac_bene_list_recent_click')
        OR (event = 'ac_bene_list_saved_click')
        OR (event = 'ac_bene_list_search_suc_click')
      ),
      event = 'ac_enter_amount_show',
      event = 'COMMON_pay_window_show',
      event = 'COMMON_order_response'
    ) AS level,
    arrayJoin(arrayEnumerate(arrayWithConstant(level, 1))) AS level_index
  FROM events_all
  WHERE app_id IN (10000009)
    AND (
      event_date >= '2024-12-01'
      AND event_date <= '2024-12-31'
      AND toUInt64(server_time / 1000) >= 1733007600
      AND toUInt64(server_time / 1000) <= 1735689599
    )
    AND (
      (event = 'ac_home_show' AND (ifNull(string_params['user_type'], 'null') IN ('old_ac_user')))
      OR (
        (event = 'ac_home_search_success_bene_click')
        OR (event = 'ac_home_recent_bene_click')
        OR (event = 'ac_home_saved_bene_click')
        OR (event = 'ac_home_next_click')
        OR (event = 'ac_bene_list_recent_click')
        OR (event = 'ac_bene_list_saved_click')
        OR (event = 'ac_bene_list_search_suc_click')
        OR (event = 'ac_enter_amount_show')
        OR (
          event = 'COMMON_pay_window_show'
          AND (ifNull(string_params['service_type'], 'null') IN ('bank'))
        )
        OR (
          event = 'COMMON_order_response'
          AND (ifNull(string_params['st'], 'null') IN ('0'))
        )
      )
    )
  GROUP BY user_id, time_index
)
GROUP BY time_index, level_index
ORDER BY level_index, time_index ASC;

Here’s the gc.log file and the analysis result:
GC log:
gc.log

Analysis Result:

Let me know if you need additional details!

lelewolf commented Jan 8, 2025

Hi @chernser,

After comparing DBeaver’s configuration with my own client configuration, I made some adjustments, and the “Connection reset” issue no longer occurs.

DBeaver’s configuration:
[Image: screenshot of the DBeaver configuration]

My latest configuration:
@Bean
public Client clickhouseClient() {
  return new Client.Builder()
      .addEndpoint(clickHouseConfig.getUrl())
      .setUsername(clickHouseConfig.getUsername())
      .setPassword(clickHouseConfig.getPassword())
      .setSocketTimeout(clickHouseConfig.getSocketTimeout(), ChronoUnit.HOURS)
      .setSocketKeepAlive(Boolean.TRUE)
      .setConnectTimeout(clickHouseConfig.getConnectionTimeout(), ChronoUnit.HOURS)
      .setConnectionTTL(clickHouseConfig.getConnectionTtl(), ChronoUnit.MINUTES)
      .setMaxConnections(clickHouseConfig.getMaxConnection())
      .enableConnectionPool(Boolean.TRUE)
      .setMaxRetries(3)
      .setOption(ClickHouseClientOption.ASYNC.getKey(), "false")
      .setOption(ClickHouseClientOption.AUTO_DISCOVERY.getKey(), "true")
      .setOption(ClickHouseClientOption.LOAD_BALANCING_POLICY.getKey(), "roundRobin")
      .setOption(ClickHouseClientOption.HEALTH_CHECK_INTERVAL.getKey(), "60000")
      .useHttpCompression(Boolean.TRUE)
      .compressClientRequest(Boolean.TRUE)
      .setOption(ClickHouseHttpOption.CONNECTION_PROVIDER.getKey(), HttpConnectionProvider.HTTP_URL_CONNECTION.name())
      // TODO: look into this and switch to a check based on the current system load; the default is SELECT 1
      .setOption(ClickHouseClientOption.HEALTH_CHECK_METHOD.getKey(), ClickHouseHealthCheckMethod.SELECT_ONE.name())
      .setConnectionRequestTimeout(clickHouseConfig.getConnectionRequestTimeout(), ChronoUnit.MINUTES)
      .build();
}

I downgraded the clickhouse-java version from 0.7.2 to 0.7.1-patch1.

Let me know if you need further details or assistance!
