mirror of
https://gitea.com/gitea/act_runner.git
synced 2026-06-15 14:24:22 +02:00
Cancelling a job on a Linux/macOS host runner can leave the spawned process tree running and hang the runner — the same failure mode fixed for Windows in #1011, just on the other platforms. Steps are launched as process-group leaders (`Setpgid`, or `Setsid` for the PTY path), but the default `exec.CommandContext` cancellation only kills the **direct child**. When a step launches a shell that starts a child which in turn spawns further background processes, cancelling the job leaves the descendants running. Because those orphans inherited the step's stdout/stderr pipe, the read end never hits EOF and `cmd.Wait()` blocks forever. Because the step executor never returns: - the orphaned processes keep running (the cancelled work is not actually stopped), and - end-of-job cleanup is never reached, so the runner appears to go offline / stop picking up jobs. ## Fix Apply the same tree-kill approach as Windows, using the Unix counterpart of a Job Object: the **process group**. - Add a Unix `processKiller` (`process_unix.go`) that captures the step's PGID (== PID, since the step is launched as a group leader) and sends `SIGKILL` to the whole group on cancellation. This also closes the inherited pipe handles so `cmd.Wait()` can return. `ESRCH` (group already gone) is not treated as an error. - Restrict the previous no-op stub (`process_other.go`) to `plan9` and have it fall back to a single-process kill, preserving plan9's prior behaviour. - Wire `cmd.Cancel` (tree kill) and `cmd.WaitDelay` (10s) **unconditionally** in `exec()` instead of Windows-only. `WaitDelay` also covers a step that backgrounds a process holding the pipe open after the main process exits. Reviewed-on: https://gitea.com/gitea/runner/pulls/1025 Reviewed-by: Zettat123 <39446+zettat123@noreply.gitea.com>
2.1 KiB
2.1 KiB