Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Odd behavior with job status updates on long-running scripts #4177

Open
bt-ndollimount opened this issue Dec 24, 2024 · 0 comments
Open

Odd behavior with job status updates on long-running scripts #4177

bt-ndollimount opened this issue Dec 24, 2024 · 0 comments
Labels
bug Something isn't working PowerShell Universal Issue relates to PowerShell Universal. requires triage Issue has not yet been verified by the development team. v5 Version 5 issue.

Comments

@bt-ndollimount
Copy link

bt-ndollimount commented Dec 24, 2024

Description of Issue

We have a few scripts that take an hour or more to complete. These scripts are written to use parallel jobs and progress is written back while running. We've noticed since a while now that sometimes (more often in later versions) the job output will hit a point where it looks like the script has restarted. We will get the output written as if it restarted and we then start to get 2 progress amounts and it will flip-flop back and forth between the two. Sometimes if it goes long enough, there will be a third. Also, the job start time restarts to reflect the time the issue occurs and counts from there.

Checking in the Hangfire diagnostics, the current job will show that it started recently as opposed to the original job start time. When looking at the details, it displays at the top, "Looks like the job was aborted...".

My suspicion is that Hangfire is losing connection to the original process and assumes that it was aborted so it kicks off a new instance as a retry, but then you have processes running the same script, updating the output and progress to the same job, resulting in the job output showing the output of both scripts and the progress bouncing back and forth between the two.

We can see the two processes in Task Manager as well. The PID reported remains that of the first one; checking the start times of the processes, 15544 was exactly a half hour prior to 29260.

I never had enough details previously to confidently report it but it seems pretty obvious now what is happening.

UPDATE
We now have a third one spawned and it's also exactly a half hour after the second one spawned so that seems to be more than a coincidence. Added screenshots of the process properties showing the start times.

Version

5.1.12

Severity

Medium

Hosting Method

MSI (Windows Service)

Operating System

Windows

Database

PostgreSQL

Licensed

Yes

Features

Git Sync

Additional Environment data

Amazon Linux
PowerShell 7.4.6
PostgreSQL 16

Screenshots/Animations

image
image
image
image

image
image
image

2024-12-24_11-29-45.mp4
@bt-ndollimount bt-ndollimount added bug Something isn't working PowerShell Universal Issue relates to PowerShell Universal. requires triage Issue has not yet been verified by the development team. v5 Version 5 issue. labels Dec 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working PowerShell Universal Issue relates to PowerShell Universal. requires triage Issue has not yet been verified by the development team. v5 Version 5 issue.
Projects
None yet
Development

No branches or pull requests

1 participant