mirror of
https://gitea.com/gitea/act_runner.git
synced 2026-06-09 18:44:23 +02:00
## Problem

Cancelling a job on a Windows host runner can leave the spawned process tree running and hang the runner.

When a step launches a shell that starts a child which in turn spawns further GUI/background processes, cancelling the job kills only the direct child (the default `exec.CommandContext` behaviour). The surviving descendants inherited the step's stdout/stderr pipe, so the read end never hit EOF and `cmd.Wait()` blocked forever. Because the step executor never returned:

- the orphaned processes kept running (the cancelled work was not actually stopped), and
- end-of-job cleanup (`Remove` → `terminateRunningProcesses`) was never reached, so the runner appeared to go offline and stopped picking up jobs.

`CREATE_NEW_PROCESS_GROUP` does not help here: it affects Ctrl-C signal delivery, not handle inheritance or tree termination.

## Fix

- Assign each Windows step process to a **Job Object** immediately after `cmd.Start()`. Descendants created afterwards are automatically part of the job.
- Override `cmd.Cancel` to call `TerminateJobObject`, so cancellation kills the **entire descendant tree** atomically. This also closes the inherited pipe handles, so `cmd.Wait()` can return.
- Set `cmd.WaitDelay` (10s) as a safety net: once the context is done or the process has exited, `Wait` waits at most that long before force-closing the pipes and returning, rather than blocking forever. This covers the case where the job-object setup fails (e.g. nested-job restrictions), in which we fall back to the previous single-process kill.
- The Job Object is created **without** `JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE`, so closing the handle on normal completion does not kill legitimate background processes; the tree is only torn down on explicit cancel.

Implemented behind `runtime.GOOS == "windows"` with a Windows-only `processKiller` (Job Object) and no-op stubs elsewhere, so non-Windows behaviour (default cancellation + `Setpgid`) is unchanged.
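The wiring described in the Fix list can be sketched roughly as follows. This is a minimal, portable illustration and not the actual `exec()` code in `host_environment.go`: the `treeKiller` interface, `runStep`, `fakeKiller`, and `demo` are hypothetical names, the fake killer stands in for the Windows Job Object, and the demo assumes a Unix-like `sleep` binary is on `PATH`. `cmd.Cancel` and `cmd.WaitDelay` require Go 1.20 or later.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"sync"
	"sync/atomic"
	"time"
)

// treeKiller is a hypothetical abstraction over the Windows processKiller.
type treeKiller interface {
	Kill() error
	Close() error
}

// runStep starts the command and wires cancellation as the Fix list describes.
// attach is called right after Start; if it fails, cancellation falls back to
// killing only the direct child.
func runStep(ctx context.Context, attach func(*os.Process) (treeKiller, error), name string, args ...string) error {
	cmd := exec.CommandContext(ctx, name, args...)

	var (
		mu     sync.Mutex
		killer treeKiller
	)
	// Cancel runs on a goroutine owned by os/exec once ctx is done, so the
	// killer it reads is guarded by a mutex.
	cmd.Cancel = func() error {
		mu.Lock()
		defer mu.Unlock()
		if killer != nil {
			return killer.Kill() // whole-tree kill
		}
		return cmd.Process.Kill() // fallback: direct child only
	}
	// Safety net: bound how long Wait may block on inherited pipes.
	cmd.WaitDelay = 10 * time.Second

	if err := cmd.Start(); err != nil {
		return err
	}
	if k, err := attach(cmd.Process); err == nil && k != nil {
		mu.Lock()
		killer = k
		mu.Unlock()
		defer k.Close()
	}
	return cmd.Wait()
}

// fakeKiller is a stand-in for the Job Object: it records the call and kills
// only the process it was given.
type fakeKiller struct {
	p      *os.Process
	killed atomic.Bool
}

func (f *fakeKiller) Kill() error  { f.killed.Store(true); return f.p.Kill() }
func (f *fakeKiller) Close() error { return nil }

// demo cancels a long-running command via a short timeout and reports whether
// the killer was invoked, plus the error returned by Wait.
func demo() (bool, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	fk := &fakeKiller{}
	err := runStep(ctx, func(p *os.Process) (treeKiller, error) {
		fk.p = p
		return fk, nil
	}, "sleep", "5") // assumption: a Unix-like sleep binary exists
	return fk.killed.Load(), err
}

func main() {
	killed, err := demo()
	fmt.Println("killer invoked:", killed, "wait error:", err)
}
```

Because the killer can only be attached after `Start`, there is a short window in which a cancellation still uses the single-process fallback; the sketch accepts that window rather than trying to close it.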
## Changes

- `act/container/process_windows.go`: Job Object `processKiller` (create / assign / terminate).
- `act/container/process_other.go`: no-op stubs (`//go:build !windows`).
- `act/container/host_environment.go`: wire `cmd.Cancel` (tree kill) and `cmd.WaitDelay` into `exec()`.
- `go.mod` / `go.sum`: promote `golang.org/x/sys` to a direct dependency.

## Testing

I fully tested it already.

## Notes

Follow-up to the Windows leftover-process reaping in #996: that sweep now actually runs on cancellation because the step no longer hangs before reaching it.

Reviewed-on: https://gitea.com/gitea/runner/pulls/1011
Reviewed-by: techknowlogick <9+techknowlogick@noreply.gitea.com>
72 lines
2.2 KiB
Go
// Copyright 2026 The Gitea Authors. All rights reserved.
// SPDX-License-Identifier: MIT

package container

import (
	"os"

	"golang.org/x/sys/windows"
)

// processKiller terminates a step process together with its entire descendant
// tree via a Windows Job Object.
//
// Background: a step often launches a process tree (a shell that starts a
// child which in turn spawns further GUI or background processes). The default
// exec.CommandContext cancellation only kills the direct child, so cancelling a
// job left the rest of the tree running. Because those orphans inherited the
// step's stdout/stderr pipe, cmd.Wait() also blocked forever and the runner hung.
//
// Assigning the step process to a Job Object lets us kill the whole tree
// atomically on cancellation (TerminateJobObject), which also closes the
// inherited pipe handles so cmd.Wait() can return.
type processKiller struct {
	job windows.Handle
}

// newProcessKiller creates a Job Object and assigns p (an already-started
// process) to it. Children spawned by p afterwards are automatically part of
// the job. The job does NOT use JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE, so closing
// the handle on normal completion does not kill legitimate background
// processes; the tree is only torn down by an explicit Kill (cancellation).
func newProcessKiller(p *os.Process) (*processKiller, error) {
	job, err := windows.CreateJobObject(nil, nil)
	if err != nil {
		return nil, err
	}

	h, err := windows.OpenProcess(windows.PROCESS_SET_QUOTA|windows.PROCESS_TERMINATE, false, uint32(p.Pid))
	if err != nil {
		windows.CloseHandle(job)
		return nil, err
	}
	defer windows.CloseHandle(h)

	if err := windows.AssignProcessToJobObject(job, h); err != nil {
		windows.CloseHandle(job)
		return nil, err
	}

	return &processKiller{job: job}, nil
}

// Kill terminates every process currently assigned to the job (the step process
// and all of its descendants).
func (k *processKiller) Kill() error {
	if k == nil || k.job == 0 {
		return nil
	}
	return windows.TerminateJobObject(k.job, 1)
}

// Close releases the job handle. It does not terminate the processes.
func (k *processKiller) Close() error {
	if k == nil || k.job == 0 {
		return nil
	}
	h := k.job
	k.job = 0
	return windows.CloseHandle(h)
}
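For comparison, the non-Windows counterpart mentioned in the commit message (`process_other.go`, no-op stubs) might look roughly like the sketch below. Its actual contents are not shown here, so this is an assumption: in particular, returning a nil killer from `newProcessKiller` and the `stubIsInert` helper are illustrative choices, and the sketch is written as a standalone `package main` so it can be run directly.

```go
package main

import (
	"fmt"
	"os"
)

// In the real tree this file would live in package container and carry a
// "not windows" build constraint; both are omitted so the sketch runs anywhere.

// processKiller is a hypothetical stub with no state: on non-Windows hosts
// the default cancellation behaviour is left unchanged.
type processKiller struct{}

// newProcessKiller does nothing and returns a nil killer (an assumption).
func newProcessKiller(_ *os.Process) (*processKiller, error) { return nil, nil }

// Kill is a no-op and is safe to call on a nil receiver.
func (k *processKiller) Kill() error { return nil }

// Close is a no-op and is safe to call on a nil receiver.
func (k *processKiller) Close() error { return nil }

// stubIsInert reports whether every stub call succeeds without side effects.
func stubIsInert() bool {
	k, err := newProcessKiller(nil)
	return err == nil && k == nil && k.Kill() == nil && k.Close() == nil
}

func main() {
	fmt.Println(stubIsInert()) // prints "true"
}
```

Keeping the same method set on both platforms lets the caller use `Kill` and `Close` unconditionally, which mirrors the nil-receiver guards in the Windows implementation above.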