Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: an edge case of kill connctions casued by unnecessary shared_ptr ref count #2904

Merged

Conversation

cheniujh
Copy link
Collaborator

@cheniujh cheniujh commented Sep 20, 2024

观察到一个edge case
最后发生活动那个conn,在发生timeout kill,或者被client kill命令杀连接之后,client确实无法继续通信了,但是server端也没有及时close fd,需要等到后面有新的网络收发包发生,前面这个conn的fd才会被关闭,进一步观察发现,最后发生活动的这个conn其会拥有一个额外的shared_ptr引用计数, 导致pika无法及时将其关闭(目前pika判断shared_ptr的ref count为1后才会close connction.fd)

原因
发现罪魁祸首是epoll循环中的std::shared_ptr in_conn,该对象用于每次有事件就绪时,临时去指向就绪的conn对象,但处理完事件之后又不置空,所以哪怕离开了事件处理分支(处理完了本次事件),他也会指向最后发生事件的那个conn对象,导致其引用计数加1。

解决
更改in_conn定义的位置,该变量只在事件处理分支中会被用到,那就在该分支内定义它即可,确保离开该分支后in_conn对象也会析构,释放引用计数。

Observation of an edge case: After a connection (conn) that was last active experiences a timeout or is killed by a client command, the client indeed cannot continue communication. However, the server side does not immediately close the file descriptor (fd); it has to wait until there is a new connection or network packet transfer, at which point the fd for the previous conn will be closed. Further observation reveals that the last active conn has an additional shared_ptr reference count, which prevents Pika from closing it promptly (currently, Pika only closes the fd when the ref count is 1).

Cause: The culprit was identified as the std::shared_ptr<NetConn> in_conn in the epoll loop. This object is used to temporarily point to the ready conn object whenever an event is ready. However, it is not reset after processing the event, so even after leaving the event handling branch (after processing the current event), it continues to point to the last active conn, increasing its reference count by 1.

Solution: Change the definition of in_conn to be within the event handling branch. This way, it will only be used in that branch, ensuring that once it exits the branch, the in_conn object will be destructed, releasing the reference count.

@github-actions github-actions bot added the ☢️ Bug Something isn't working label Sep 20, 2024
Copy link

coderabbitai bot commented Sep 20, 2024

Walkthrough

The changes involve the modification of the in_conn variable within the WorkerThread::ThreadMain function in src/net/src/worker_thread.cc. The variable, which is a shared pointer to NetConn, has been moved from a broader scope at the beginning of the function to a more localized scope within an else block. This adjustment affects the visibility and lifetime of the variable without altering the overall logic of the function.

Changes

File Path Change Summary
src/net/src/worker_thread.cc std::shared_ptr<NetConn> in_conn declaration moved from the beginning of the function to within an else block.

Sequence Diagram(s)

sequenceDiagram
    participant WorkerThread
    participant NetConn

    WorkerThread->>NetConn: Create instance (previous flow)
    Note over WorkerThread: `in_conn` is accessible throughout the function
    WorkerThread->>NetConn: Create instance (new flow)
    Note over WorkerThread: `in_conn` is only accessible within the `else` block
Loading

Poem

In the code where bunnies hop,
in_conn now won't drop.
Scoped within an else gleam,
A shared pointer, a coder's dream!
Hopping through the logic flow,
A cleaner path for all to know! 🐇✨


Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between b2bde42 and 1186231.

Files selected for processing (1)
  • src/net/src/worker_thread.cc (2 hunks)
Additional comments not posted (1)
src/net/src/worker_thread.cc (1)

158-158: LGTM! This change improves resource management.

Moving the declaration of in_conn within the event handling block ensures that the shared pointer is destructed and the reference count is released once the event handling is complete. This allows Pika to close inactive connections promptly, addressing the issue of delayed resource cleanup.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    -- I pushed a fix in commit <commit_id>, please review it.
    -- Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    -- @coderabbitai generate unit testing code for this file.
    -- @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    -- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    -- @coderabbitai read src/utils.ts and generate unit testing code.
    -- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    -- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@chejinge chejinge merged commit 6d4c9ca into OpenAtomFoundation:unstable Oct 14, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
☢️ Bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants