perf: reduce runner-to-server connection load with adaptive reporting and polling

- Replace fixed 1s RunDaemon timer with event-driven select loop using
  separate log (3s) and state (5s) tickers for periodic flush
- Add batch-size threshold (default 100 rows) to flush logs immediately
  during bursty output like npm install
- Add max-latency timer (default 5s) to guarantee single log lines are
  delivered within a bounded time
- Trigger immediate flush on step transitions (start/stop) and job
  result for responsive frontend UX
- Skip ReportLog when no pending rows and ReportState when state is
  unchanged to eliminate no-op HTTP requests
- Replace fixed-rate polling with exponential backoff and jitter to
  prevent thundering herd on idle runners
- Tune HTTP client with MaxIdleConnsPerHost=10 and share a single
  http.Client between Ping and Runner service clients
- Add configurable options: log_report_interval, log_report_max_latency,
  log_report_batch_size, state_report_interval, fetch_interval_max

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bo-Yi Wu
2026-04-10 22:41:38 +08:00
parent 90c1275f0e
commit ec07b8c00b
8 changed files with 304 additions and 83 deletions

go.mod (2 lines changed)

@@ -16,7 +16,7 @@ require (
 	github.com/stretchr/testify v1.11.1
 	go.yaml.in/yaml/v4 v4.0.0-rc.3
 	golang.org/x/term v0.40.0
-	golang.org/x/time v0.14.0
+	golang.org/x/time v0.14.0 // indirect
 	google.golang.org/protobuf v1.36.11
 	gopkg.in/yaml.v3 v3.0.1
 	gotest.tools/v3 v3.5.2