Commit Graph

8 Commits

Author SHA1 Message Date
Nicolas
c749e52bb7 fix(cleanup): kill Windows step process tree on cancel to avoid hang (#1011)
## Problem

Cancelling a job on a Windows host runner can leave the spawned process
tree running and hang the runner. When a step launches a shell that
starts a child which in turn spawns further GUI/background processes,
cancelling the job kills only the direct child (the default
`exec.CommandContext` behaviour). The surviving descendants inherited
the step's stdout/stderr pipe, so the read end never hit EOF and
`cmd.Wait()` blocked forever.

Because the step executor never returned:
- the orphaned processes kept running (the cancelled work was not
  actually stopped), and
- end-of-job cleanup (`Remove` → `terminateRunningProcesses`) was never
  reached, so the runner appeared to go offline / stop picking up jobs.

`CREATE_NEW_PROCESS_GROUP` does not help here — it affects Ctrl-C signal
delivery, not handle inheritance or tree termination.

## Fix

- Assign each Windows step process to a **Job Object** immediately after
  `cmd.Start()`. Descendants created afterwards are automatically part
  of the job.
- Override `cmd.Cancel` to `TerminateJobObject`, so cancellation kills
  the **entire descendant tree** atomically. This also closes the
  inherited pipe handles, so `cmd.Wait()` can return.
- Set `cmd.WaitDelay` (10s) as a safety net: once the process has
  exited, Wait force-closes the pipes and returns rather than blocking
  forever — covering the case where the job-object setup fails (e.g.
  nested-job restrictions), in which we fall back to the previous
  single-process kill.
- The Job Object is created **without** `JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE`,
  so closing the handle on normal completion does not kill legitimate
  background processes; the tree is only torn down on explicit cancel.

Implemented behind `runtime.GOOS == "windows"` with a Windows-only
`processKiller` (Job Object) and no-op stubs elsewhere, so non-Windows
behaviour (default cancellation + `Setpgid`) is unchanged.

## Changes

- `act/container/process_windows.go` — Job Object `processKiller`
  (create / assign / terminate).
- `act/container/process_other.go` — no-op stubs (`//go:build !windows`).
- `act/container/host_environment.go` — wire `cmd.Cancel` (tree kill)
  and `cmd.WaitDelay` into `exec()`.
- `go.mod` / `go.sum` — promote `golang.org/x/sys` to a direct
  dependency.

## Testing

I fully tested it already

## Notes

Follow-up to the Windows leftover-process reaping in #996: that sweep
now actually runs on cancellation because the step no longer hangs
before reaching it.

Reviewed-on: https://gitea.com/gitea/runner/pulls/1011
Reviewed-by: techknowlogick <9+techknowlogick@noreply.gitea.com>
2026-06-02 16:53:27 +00:00
Nicolas
0b9f251b6a fix: deliver cancel ack and reap leftover Windows job processes (#996)
## Summary

- When Gitea cancels a job, the reporter cancels its own task context; the final Close() flush then aborted on that same cancelled context and Gitea never received the runner's acknowledgement (missing tail logs and final state).
- On Windows the cancelled context also neutralised terminateRunningProcesses, leaving step grandchildren alive in the workspace, holding file handles, so the runner could no longer clean up and pick up new work.
- Reporter.Close() now flushes on a detached, bounded context via a new rpcCtx() helper and configurable Runner.ReportCloseTimeout (default 10s).
- terminateRunningProcesses now PowerShell-enumerates Win32_Process and taskkill /T /F's every process whose ExecutablePath or CommandLine references the job's workspace directories, on a detached context.
- The daemon heartbeat loop still exits on <-r.ctx.Done(): the runner is intentionally seen as offline by Gitea during cleanup so it isn't handed a new task overlapping the in-progress teardown.

## Test plan

- [x] go test ./internal/pkg/report/... ./act/container/ -run 'TestReporter_ServerCancelStillFlushesFinal|TestBuildWindowsWorkspaceKillScript'
- [x] make fmt && make lint-go - 0 issues
- [x] GOOS=windows go build ./... - clean
- [x] Manual on a Windows runner: trigger a long-running workflow, cancel from Gitea UI; verify (a) the job ends with tail logs + cancelled state in Gitea, (b) workspace cleans up, (c) the runner picks up a new job without restart.

Authored-by: bircni
🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: silverwind <me@silverwind.io>
Reviewed-on: https://gitea.com/gitea/runner/pulls/996
Reviewed-by: silverwind <2021+silverwind@noreply.gitea.com>
2026-05-24 10:01:01 +00:00
Nicolas
8a99506fed Fix host cleanup, volume allowlist, cache upload, and action host edge cases (#970)
## Summary
- prevent host-mode execution from deleting caller-owned workdirs
- harden `valid_volumes` checks against `..` and symlink escapes
- return immediately after artifact cache upload write failures
- default implicit remote action clone hosts to `GitHubInstance`/`github.com`

Authored with assistance from OpenAI Codex GPT-5.

---------

Co-authored-by: silverwind <me@silverwind.io>
Reviewed-on: https://gitea.com/gitea/runner/pulls/970
Reviewed-by: silverwind <2021+silverwind@noreply.gitea.com>
2026-05-17 12:53:04 +00:00
silverwind
3c5f03ff8f feat: make pseudo-TTY allocation opt-in (#961)
Fixes #956.

Pseudo-TTY allocation is now an explicit, runner-wide opt-in via `runner.allocate_pty`, applied to both host and docker backends. Default is off, matching GitHub `actions/runner`.

```yaml
runner:
  allocate_pty: false  # default
```

**Before:** the host backend hardcoded `if true /* allocate Terminal */` and the docker backend used `term.IsTerminal(os.Stdout.Fd())`. As a result, `docker build` (and other TTY-aware tools) saw a TTY and emitted cursor-control redraw frames that flooded captured logs with thousands of duplicate-looking progress lines — only on host-mode runners in production, and on docker-mode runners when the daemon happened to be launched from a shell rather than a service.

**After:** both backends consult `Config.AllocatePTY`. The `term.IsTerminal` heuristic is gone, so behavior no longer depends on whether the daemon has a controlling terminal.

**Reproduction:** running `docker build` through `HostEnvironment.Exec` with output captured to a buffer:

| | Before (`if true`) | After (`AllocatePTY=false`) |
|---|---:|---:|
| bytes captured | 18,167 | 1,048 |
| ANSI CSI sequences | 556 | 0 |
| cursor-up `\e[1A` | 181 | 0 |

**Side fix:** `ptyWriter.AutoStop` is now `atomic.Bool`. The field is written from the exec goroutine after `cmd.Wait()` and read from the `copyPtyOutput` goroutine via `ptyWriter.Write`; existing tests never tripped the race detector because their commands produced no output before exit. The new host-mode test does.

---
This PR was written with the help of Claude Opus 4.7

---------

Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
Co-authored-by: Nicolas <bircni@icloud.com>
Reviewed-on: https://gitea.com/gitea/runner/pulls/961
Reviewed-by: Nicolas <bircni@icloud.com>
Co-authored-by: silverwind <2021+silverwind@noreply.gitea.com>
Co-committed-by: silverwind <2021+silverwind@noreply.gitea.com>
2026-05-15 18:11:39 +00:00
silverwind
594c9ade7c Align step failure log output with GitHub Actions (#927)
Fixes #926.

Before:

<img src="/attachments/a5ae9221-eee2-410a-964e-6103ce126df4" alt="image.png" width="400">

After:

<img width="400" alt="image.png" src="attachments/2f2d67c4-6080-4ec3-9ae5-df33e6479920">

Also gets rid of a bunch of emojis in the logging and the obsolete link to `nektos/act` and align some other error messages.

---
This PR was written with the help of Claude Opus 4.7

---------

Co-authored-by: Nicolas <bircni@icloud.com>
Reviewed-on: https://gitea.com/gitea/runner/pulls/927
Reviewed-by: Nicolas <bircni@icloud.com>
Co-authored-by: silverwind <me@silverwind.io>
Co-committed-by: silverwind <me@silverwind.io>
2026-05-05 20:17:32 +00:00
Nicolas
a22119cf88 fix(host): correct host workspace cleanup on Windows (#883)
## Summary
- Fix host-mode cleanup to remove the job **workspace** directory after a run (instead of leaving checkouts behind).
- On Windows, track step process PIDs and terminate remaining process trees during teardown before attempting workspace deletion (prevents file-lock failures).
- Skip workspace deletion when `bind_workdir` is enabled to avoid conflicting with runner-level task directory cleanup.

## Implementation details
- `HostEnvironment` now records PIDs for started commands and best-effort terminates them on Windows during `Remove()`.
- Workspace removal uses a small retry loop on Windows to handle transient locks.
- `BindWorkdir` is propagated into `HostEnvironment` so cleanup behavior matches runner configuration.

---------

Co-authored-by: silverwind <me@silverwind.io>
Co-authored-by: silverwind <2021+silverwind@noreply.gitea.com>
Reviewed-on: https://gitea.com/gitea/runner/pulls/883
Reviewed-by: silverwind <2021+silverwind@noreply.gitea.com>
2026-05-05 18:28:12 +00:00
Lunny Xiao
13dc9386fe Rename act_runner to runner (#850)
## Consumer-facing breaking changes

- **Go module path**: `gitea.com/gitea/act_runner` → `gitea.com/gitea/runner`. Anything importing `act/...` or `internal/...` packages (notably Gitea itself) must update imports.
- **Binary name**: `act_runner` → `gitea-runner`. Wrapper scripts, systemd units, init scripts, and documentation referencing the binary by `act_runner` will break.
- **Docker image**: `gitea/act_runner` → `gitea/runner` (incl. `*-dind-rootless` variants). Users pulling `gitea/act_runner:nightly` etc. will get stale images. Note: the image name is `gitea/runner`, not `gitea/gitea-runner`.
- **Release artifact paths**: S3 directory `act_runner/{{.Version}}` → `gitea-runner/{{.Version}}`, and artifact filenames change with the new project name. Existing download URLs break.
- **Metrics namespace**: changed from `act_runner` to `gitea_runner` (e.g. `act_runner_jobs_total` → `gitea_runner_jobs_total`); existing monitors/dashboards must be updated.
- **ldflags version path**: `gitea.com/gitea/act_runner/internal/pkg/ver.version` → `gitea.com/gitea/runner/internal/pkg/ver.version`. Affects anyone building with custom ldflags.
- **Kubernetes example resource names**: `act-runner` / `act-runner-vol` → `runner` / `runner-vol`. Users who copied the manifests verbatim will see resource churn on apply.
- **s6 service name**: `scripts/s6/act_runner/` → `scripts/s6/gitea-runner/` (image-internal; only matters for downstream image overrides).

Unchanged: YAML config field names, env vars (`GITEA_*`), CLI flags/subcommands, registration file format.
---------

Co-authored-by: silverwind <me@silverwind.io>
Reviewed-on: https://gitea.com/gitea/runner/pulls/850
Reviewed-by: Zettat123 <39446+zettat123@noreply.gitea.com>
Reviewed-by: silverwind <2021+silverwind@noreply.gitea.com>
Reviewed-by: Nicolas <bircni@icloud.com>
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
Co-committed-by: Lunny Xiao <xiaolunwen@gmail.com>
2026-04-30 20:12:51 +00:00
silverwind
fab2d6ae04 Merge gitea/act into act/
Merges the `gitea.com/gitea/act` fork into this repository as the `act/`
directory and consumes it as a local package. The `replace github.com/nektos/act
=> gitea.com/gitea/act` directive is removed; act's dependencies are merged
into the root `go.mod`.

- Imports rewritten: `github.com/nektos/act/pkg/...` → `gitea.com/gitea/act_runner/act/...`
  (flattened — `pkg/` boundary dropped to match the layout forgejo-runner adopted).
- Dropped act's CLI (`cmd/`, `main.go`) and all upstream project files; kept
  the library tree + `LICENSE`.
- Added `// Copyright <year> The Gitea Authors ...` / `// Copyright <year> nektos`
  headers to 104 `.go` files.
- Pre-existing act lint violations annotated inline with
  `//nolint:<linter> // pre-existing issue from nektos/act`.
  `.golangci.yml` is unchanged vs `main`.
- Makefile test target: `-race -short` (matches forgejo-runner).
- Pre-existing integration test failures fixed: race in parallel executor
  (atomic counters); TestSetupEnv / command_test / expression_test /
  run_context_test updated to match gitea fork runtime; TestJobExecutor and
  TestActionCache gated on `testing.Short()`.

Full `gitea/act` commit history is reachable via the second parent.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
2026-04-22 22:29:06 +02:00