dapr stop does not kill additional processes spun up by an app process #1184

Open
mukundansundar opened this issue Jan 26, 2023 · 15 comments
Labels: kind/bug (Something isn't working), pinned

@mukundansundar
Collaborator

mukundansundar commented Jan 26, 2023

Version

This occurs with the latest master build of the CLI.

Expected Behavior

When a Go application is run with the go run command and started using dapr run -f, dapr stop -f is expected to interrupt and kill both the go run process and the app process it spawns.

Actual Behavior

dapr stop -f stops the go run process, but another process that was started by go run keeps running, e.g.:
/var/folders/8n/vhq7f8w1419g3t4ww_46_tlr0000gn/T/go-build2324290313/b001/exe/app

Steps to Reproduce the Problem

Run the distributed calculator application from the quickstarts using the new dapr run -f template:

version: 1
apps:
  - appDirPath: ./go/
    appID: addapp
    appPort: 6000
    daprHTTPPort: 3503
    command: ["go", "run", "app.go"]
  - appID: divideapp
    appDirPath: ./node/
    appPort: 4000
    daprHTTPPort: 3502
    command: ["node", "app.js"]
  - appID: multiplyapp
    appDirPath: ./python/
    appPort: 5001
    daprHTTPPort: 3501
    command: ["flask", "run"]
    env:
      FLASK_RUN_PORT: 5001
  - appID: subtractapp
    appDirPath: ./csharp/
    appPort: 7001
    daprHTTPPort: 3504
    env:
      ASPNETCORE_URLS: 'http://localhost:7001'
    command: ["dotnet", "./bin/Debug/netcoreapp7.0/Subtract.dll"]
  - appID: frontendapp
    appDirPath: ./react-calculator/
    appPort: 8080
    daprHTTPPort: 3507
    command: ["node", "server.js"]

Note: netcoreapp3.1 needs to be changed to netcoreapp7.0 for the existing quickstart to work.

In the above scenario, go run app.go starts one process, which then starts the compiled app binary as a separate child process.

When dapr stop -f is called with the run template file, it kills only the go run app.go process and not the app binary process spawned from it.

However, when a binary is built with, say, go build -o test-app and that binary is run as ./test-app, dapr stop -f kills the application process.

I have tried sending os.Interrupt and syscall.SIGTERM, but neither works as expected.

dapr stop -f uses the kill command with the process ID to kill the process.

ps output with PID and PPID (the columns of ps aj on macOS are USER, PID, PPID, PGID, SESS, JOBC, STAT, TT, TIME, COMMAND):

ps aj | grep go
user 94976 37431 94975      0    2 S+   s001    0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=.idea --exclude-dir=.tox go
user 94930 94924 94924      0    1 S+   s005    0:00.21 go run app.go
user 94952 94930 94924      0    1 S+   s005    0:00.01 /var/folders/8n/vhq7f8w1419g3t4ww_46_tlr0000gn/T/go-build2832959224/b001/exe/app

Note that both go run app.go and ....../exe/app belong to the same process group, 94924.
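
The shared process group seen in the ps output can be checked from Go as well. The following is only a minimal sketch for Linux/macOS, not code from the Dapr CLI: the helper name childSharesGroup is hypothetical, and sleep merely stands in for an app process. It starts a child without any special attributes and compares the child's process group ID with the caller's.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// childSharesGroup (hypothetical helper) starts a child process with default
// attributes and reports whether it ended up in the caller's process group.
func childSharesGroup() (bool, error) {
	cmd := exec.Command("sleep", "5") // stand-in for an app process
	if err := cmd.Start(); err != nil {
		return false, err
	}
	// Clean up the stand-in child once the comparison is done.
	defer func() {
		_ = cmd.Process.Kill()
		_ = cmd.Wait()
	}()

	parentPgid, err := syscall.Getpgid(os.Getpid())
	if err != nil {
		return false, err
	}
	childPgid, err := syscall.Getpgid(cmd.Process.Pid)
	if err != nil {
		return false, err
	}
	return parentPgid == childPgid, nil
}

func main() {
	same, err := childSharesGroup()
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("child shares the parent's process group:", same)
}
```

A child started this way inherits the group of its parent, which matches what the ps output shows for go run app.go and the binary it launches.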

Release Note

RELEASE NOTE:

@mukundansundar mukundansundar added the kind/bug Something isn't working label Jan 26, 2023
@akhilac1

@mukundansundar Just a suggestion: could you please add ps output for the different stages of this issue, showing the PID and parent PID, to make the issue easier to understand?

@mukundansundar
Collaborator Author

@akhilac1 updated

@mukundansundar
Collaborator Author

This also exists in the normal dapr stop.

Run dapr run --app-id test -- go run app.go
The following is the output of various commands:

➜ ps aj | grep go
user 97405 37431 97405      0    1 S+   s001    0:00.06 dapr run --app-id test -- go run app.go
user 97416 97405 97405      0    1 S+   s001    0:00.30 go run app.go
user 97437 97416 97405      0    1 S+   s001    0:00.02 /var/folders/8n/vhq7f8w1419g3t4ww_46_tlr0000gn/T/go-build3898998650/b001/exe/app
user 97449 16232 97448      0    2 S+   s005    0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=.idea --exclude-dir=.tox go
➜ dapr list                              
  APP ID  HTTP PORT  GRPC PORT  APP PORT  COMMAND        AGE  CREATED              DAPRD PID  CLI PID  
  test    54962      54963      0         go run app.go  19s  2023-01-26 11:37.26  97410      97405    
➜  dapr stop test            
✅  app stopped successfully: test
➜   ps aj | grep go
user 97437     1 97405      0    0 S    s001    0:00.02 /var/folders/8n/vhq7f8w1419g3t4ww_46_tlr0000gn/T/go-build3898998650/b001/exe/app
user 97561 16232 97560      0    2 S+   s005    0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=.idea --exclude-dir=.tox go

@mukundansundar mukundansundar changed the title dapr stop -f does not kill a go run process correctly dapr stop does not kill a go run process correctly Jan 27, 2023
@mukundansundar mukundansundar changed the title dapr stop does not kill a go run process correctly dapr stop does not kill additional processes spun up by an app process Jan 27, 2023
@pravinpushkar
Contributor

@mukundansundar Right now we use the kill command with the process PID, which does not stop the grandchildren (/var/folders/8n/vhq7f8w1419g3t4ww_46_tlr0000gn/T/go-build3898998650/b001/exe/app in our case). We can use kill with the process group ID to stop all children. This will behave similarly to Ctrl+C.

Screenshot 2023-01-27 at 2 08 56 PM

This change should also be fine for the normal dapr stop, but a few tests started failing when I made the change there. This needs further investigation for the normal dapr stop.
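
The idea of signalling the process group rather than a single PID can be sketched in Go as follows. This is an illustration under stated assumptions, not the change made in the CLI: signalGroup and stopShellAndGrandchild are hypothetical names, the sh command stands in for go run app.go spawning a binary, and the code is Unix-only. The demo puts the shell in its own group (Setpgid) so that the signal cannot reach the program running the demo.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os/exec"
	"syscall"
	"time"
)

// signalGroup (hypothetical helper) sends sig to every process in the
// process group that pid belongs to.
func signalGroup(pid int, sig syscall.Signal) error {
	pgid, err := syscall.Getpgid(pid)
	if err != nil {
		return err
	}
	// A negative PID makes kill(2) target the whole process group.
	return syscall.Kill(-pgid, sig)
}

// stopShellAndGrandchild starts a shell that forks a long-lived grandchild,
// signals the shell's group, and returns how long both took to exit.
func stopShellAndGrandchild() (time.Duration, error) {
	cmd := exec.Command("sh", "-c", "sleep 30 & echo ready; wait")
	// Own process group, so the group signal stays away from this program.
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return 0, err
	}
	if err := cmd.Start(); err != nil {
		return 0, err
	}

	// Wait until the shell reports that the grandchild has been started.
	r := bufio.NewReader(stdout)
	if _, err := r.ReadString('\n'); err != nil {
		return 0, err
	}

	start := time.Now()
	if err := signalGroup(cmd.Process.Pid, syscall.SIGTERM); err != nil {
		return 0, err
	}
	// The pipe reaches EOF only after every process holding its write end
	// (the shell and the grandchild) has exited.
	if _, err := io.Copy(io.Discard, r); err != nil {
		return 0, err
	}
	_ = cmd.Wait() // reap the shell; it was terminated by the signal
	return time.Since(start), nil
}

func main() {
	elapsed, err := stopShellAndGrandchild()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("shell and grandchild exited %.2fs after the group signal\n", elapsed.Seconds())
}
```

If only the shell's PID were signalled, the grandchild would keep the pipe open until its 30-second sleep finished, which mirrors the orphaned app process in the ps output above.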

@mukundansundar
Collaborator Author

It should be the same for the normal stop as well. We also need to make the changes for Windows, where we use taskkill, which has a similar parameter to kill the process tree.
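
For the Windows side, a rough sketch of invoking taskkill with its tree option from Go might look like this. It is an assumption about how such a call could be wired up, not the CLI's actual code: taskkillArgs and killTree are hypothetical helpers, and adding /F (force) next to /T (tree) is a choice made here for illustration.

```go
package main

import (
	"fmt"
	"os/exec"
	"runtime"
	"strconv"
	"strings"
)

// taskkillArgs (hypothetical helper) builds the taskkill arguments:
// /T ends the process and its child processes, /F forces termination,
// /PID selects the root process.
func taskkillArgs(pid int) []string {
	return []string{"/T", "/F", "/PID", strconv.Itoa(pid)}
}

// killTree (hypothetical helper) runs taskkill for pid; it refuses to run
// on other operating systems, where taskkill does not exist.
func killTree(pid int) error {
	if runtime.GOOS != "windows" {
		return fmt.Errorf("taskkill is only available on Windows (running on %s)", runtime.GOOS)
	}
	return exec.Command("taskkill", taskkillArgs(pid)...).Run()
}

func main() {
	// Print the command line only; 1234 is a placeholder PID.
	fmt.Println("taskkill", strings.Join(taskkillArgs(1234), " "))
}
```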

@pravinpushkar
Contributor

For normal stop also it should be the same thing

Yes, it should be the same. But the tests failed consistently both locally and on the GitHub runner. The problem may also lie in the tests themselves.

@pravinpushkar
Contributor

I figured out the failure scenario. It has nothing to do with the logic: in the tests, the processes are started by the Makefile, so the process group contains every process, including the test starter script, and killing the group kills the whole test run. We need to create a new process group whenever we start dapr run -f or the normal dapr run.

I have made the changes only for dapr run -f for now, in PR #1181.
We will check the normal dapr stop separately, as that has to be fixed for Windows as well.
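
Creating a new process group at start time can be sketched in Go as below. This is a Unix-only illustration with hypothetical helper names (startInNewGroup, groupIDs), not the code from the PR; sleep stands in for the started app. With Setpgid set, the child becomes the leader of a fresh group, so a later group-wide signal cannot reach the parent or a test runner that launched it.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// startInNewGroup (hypothetical helper) starts the command as the leader of
// a new process group instead of letting it inherit the caller's group.
func startInNewGroup(name string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}

// groupIDs (hypothetical helper) returns the caller's process group ID and
// the process group ID of the started command.
func groupIDs(cmd *exec.Cmd) (parent, child int, err error) {
	parent, err = syscall.Getpgid(os.Getpid())
	if err != nil {
		return 0, 0, err
	}
	child, err = syscall.Getpgid(cmd.Process.Pid)
	if err != nil {
		return 0, 0, err
	}
	return parent, child, nil
}

func main() {
	cmd, err := startInNewGroup("sleep", "5") // stand-in for the app
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	// Clean up the stand-in child when the demo ends.
	defer func() {
		_ = cmd.Process.Kill()
		_ = cmd.Wait()
	}()

	parent, child, err := groupIDs(cmd)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	fmt.Printf("parent group: %d, child group: %d (child PID: %d)\n", parent, child, cmd.Process.Pid)
}
```

The child's group ID equals its own PID here, so signalling that group affects only the child and whatever it spawns.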

@mukundansundar
Collaborator Author

Partially fixed for the dapr run -f feature in PR #1181.

@mukundansundar mukundansundar added this to the v1.11 milestone Feb 5, 2023
@dapr-bot
Collaborator

dapr-bot commented Mar 7, 2023

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@dapr-bot dapr-bot added the stale label Apr 13, 2023
@mukundansundar mukundansundar modified the milestones: v1.11, v1.12 Apr 20, 2023
@dapr-bot
Collaborator

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as pinned, good first issue, help wanted or triaged/resolved. Thank you for your contributions.

@dapr-bot dapr-bot added the stale label May 27, 2023
@dapr-bot
Collaborator

dapr-bot commented Jun 3, 2023

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as pinned, good first issue, help wanted or triaged/resolved. Thank you for your contributions.

@dapr-bot dapr-bot closed this as completed Jun 3, 2023
@mukundansundar mukundansundar reopened this Jun 4, 2023
@mukundansundar mukundansundar modified the milestones: v1.12, v1.13 Sep 15, 2023
@mukundansundar mukundansundar modified the milestones: v1.13, v1.14 Feb 8, 2024
@antontroshin antontroshin modified the milestones: v1.14, v1.15 Jul 30, 2024
@piotr-przebieracz

Hello,
My team and I are currently testing various building blocks of Dapr, and we believe we came across this bug. We were wondering how we should stop Dapr instances and their associated apps. We tried the dapr stop CLI command, but unfortunately its behavior appears inconsistent, and there are situations where it does not kill the associated application. To be more specific, we noticed two issues with the dapr stop CLI command.

  1. We prepared a sample application written in JavaScript. When we ran it using the dapr run CLI command on Windows and then tried to stop it using the dapr stop CLI command, we could see that the node process with our application was still running, even though we got the following messages in the console output:

    Exited Dapr successfully.
    Exited App successfully.
    

    This behavior can even be reproduced by running the sample applications from the Dapr Quickstarts (for example, the order processor application from this sample: https://docs.dapr.io/getting-started/quickstarts/pubsub-quickstart/). It doesn't matter whether we run Dapr using a Multi-App Run template file or from the console by invoking the dapr run CLI command. We can see that this issue is more than 1.5 years old (the milestone has been moved from v1.12 to v1.15 so far). Are there any plans to resolve it? I think it is a serious problem even in a development environment (when the NodeJS process isn't stopped, resources such as network ports stay blocked and running the process again fails), not to mention a production environment. We also reproduced the same issue with the same source code on Linux to ensure it is not related to the operating system.

  2. Another issue is that if we use Multi-App Run (with a dapr.yaml file) and then try to stop one of the applications (using the --app-id command line argument or directly from the dapr dashboard), all the apps from the original dapr.yaml file are stopped (even though the NodeJS processes are still running; see issue number 1). We also noticed that the documentation for the run CLI command (https://docs.dapr.io/reference/cli/dapr-run/) states that the --run-file (or -f) argument is in alpha state and supported only on Linux/MacOS. Is that true, and is our issue somehow related to this? We got the impression from multiple samples and quickstarts that using dapr.yaml is the preferred solution; there is no mention that this option should not be used on Windows.

    What's more, we tried to reproduce the same issue on Linux and could not: it looks like all the apps from the original dapr.yaml are stopped only on Windows. However, we noticed another issue related to the Multi-App Run template file on Linux while stopping the checkout-sdk application from the sample mentioned above (https://docs.dapr.io/getting-started/quickstarts/pubsub-quickstart/). We get the following output from the dapr stop CLI command:
    ✅ app stopped successfully: checkout-sdk
    but we can see that the application is still trying to publish data using the pub/sub API:

    == APP - checkout-sdk == Published data: {"orderId":38}
    == APP - checkout-sdk == 2024-08-20T15:47:24.748Z ERROR [HTTPClient, PubSub] publish failed: FetchError: request to http://127.0.0.1:43239/v1.0/publish/orderpubsub/orders? failed, reason: connect ECONNREFUSED 127.0.0.1:43239
    

    This doesn't happen on Linux if we are not using Multi-App Run file.

Thanks in advance for all your help.
