Troubleshooting¶
Plugin not starting¶
Symptom: The kubefence DaemonSet pods are not running, or the plugin
container is crash-looping.
Step 1 — Check kernel version
The kernel must be 5.13 or later. If the kernel is older, kubefence logs a kernel-version error and refuses to start.
Upgrade the node kernel or use a node image with a compatible kernel.
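To check kernel versions across nodes (plain kubectl, nothing kubefence-specific):
kubectl get nodes -o custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion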
Step 2 — Check NRI socket
The NRI socket must exist. If it is missing, NRI is not enabled in containerd.
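To confirm the socket is present on a node, a throwaway debug pod works; this sketch assumes the default containerd NRI socket path /var/run/nri/nri.sock (the node name is a placeholder, and kubectl debug mounts the host filesystem under /host):
kubectl debug node/<node-name> -it --image=busybox -- ls -l /host/var/run/nri/nri.sock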
The kubefence-node-setup DaemonSet should enable it automatically; check
whether node-setup has completed successfully:
kubectl rollout status daemonset/kubefence-node-setup -n kube-system
kubectl logs -n kube-system -l app.kubernetes.io/component=node-setup --tail=50
Step 3 — Check plugin logs
Look for startup errors. Common messages:
- "reading config: ..." — config file not found or not mounted
- "runtime_classes must not be empty" — config.runtimeClasses is empty in Helm values
- "nono_bin_path must not be empty" — config.nonoBinPath not set
- "stat /opt/nono-nri/nono: no such file or directory" — nono binary not copied by node-setup
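To pull the plugin logs, something like the following works; the DaemonSet name kubefence is an assumption, so adjust it to your install:
kubectl logs -n kube-system ds/kubefence --tail=50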
Tip
Check node-setup logs first. If node-setup has not completed, the nono binary will not be present at the expected host path.
Containers not sandboxed¶
Symptom: Pods are running but nono is not injecting — /proc/1/cmdline
shows the original command without the nono prefix.
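To inspect PID 1 from outside the container (my-agent is the example pod name; the tr converts cmdline's NUL separators to spaces):
kubectl exec my-agent -- cat /proc/1/cmdline | tr '\0' ' '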
Step 1 — Verify RuntimeClass is set
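To read the pod's RuntimeClass (my-agent is a placeholder pod name):
kubectl get pod my-agent -o jsonpath='{.spec.runtimeClassName}'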
If this is empty, the pod is using the default runtime and will not be
intercepted by kubefence. Add runtimeClassName: kata-nono-sandbox (Kata) or
runtimeClassName: nono-runc (runc) to the pod spec.
Step 2 — Check plugin logs for the pod
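To find the plugin's decision for a specific pod, grep the structured logs; this sketch assumes the DaemonSet name kubefence and the example pod my-agent. On multi-node clusters, target the kubefence pod running on the same node as the workload:
kubectl logs -n kube-system ds/kubefence | grep '"pod":"my-agent"'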
If you see a log entry with "decision":"skip", the plugin received the event
but chose not to inject. The "reason" field explains why:
- "runtime class not in config" — the pod's RuntimeClass handler is not in config.runtimeClasses
Verify that the RuntimeClass handler exactly matches one of the entries in
config.runtimeClasses. The match is case-sensitive.
Step 3 — Verify nono binary on host
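To check from a debug pod (node name is a placeholder; the path matches the startup error above, and kubectl debug exposes the host filesystem under /host):
kubectl debug node/<node-name> -it --image=busybox -- ls -l /host/opt/nono-nri/nono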
If the binary is absent, node-setup has not completed. Check node-setup status and logs as described in the previous section.
Kata VM issues¶
Symptom: Kata pods fail to start, or nono fails inside the VM with
"Landlock is not supported by the current kernel".
Step 1 — Check KVM availability
If /dev/kvm does not exist, the node does not have KVM hardware acceleration
or it is not enabled. Kata requires KVM.
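A quick existence check via a node debug pod (node name is a placeholder):
kubectl debug node/<node-name> -it --image=busybox -- ls -l /host/dev/kvm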
Step 2 — Check /dev/shm size
Kata uses /dev/shm as a memory backend for NUMA configuration. The default
64 MB is too small for typical Kata VM sizes.
If /dev/shm is smaller than the VM memory size, Kata pods will fail to
start. On Kind nodes, remount it with a larger size, as in the sketch below.
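Kind nodes are Docker containers, so the remount runs via docker exec; the node name and the 4G size are placeholders, and the size should be at least the VM memory:
docker exec <kind-node-name> mount -o remount,size=4G /dev/shm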
Warning
This change is not persistent across node restarts on Kind clusters.
Step 3 — Check kata-deploy rollout
The kubefence kata-setup DaemonSet waits for kata-deploy to complete before proceeding. If kata-deploy is not rolled out, the Landlock kernel will not be installed.
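To verify (assuming the standard kata-deploy DaemonSet name and the kube-system namespace; adjust if your install differs):
kubectl rollout status daemonset/kata-deploy -n kube-system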
Step 4 — Check kata-setup logs
Look for errors in kernel or rootfs installation.
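To fetch them, mirroring the label convention used by node-setup above (the component=kata-setup label value is an assumption; adjust to your deployment's labels):
kubectl logs -n kube-system -l app.kubernetes.io/component=kata-setup --tail=50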
Step 5 — Nested KVM (Kind clusters)
If running Kata inside a Kind cluster (nested KVM), additional configuration is required for nested-KVM stability. Without it, Kata VMs may crash or hang intermittently.
Log interpretation¶
The kubefence plugin emits structured JSON logs for every container event.
Key fields:
| Field | Description |
|---|---|
| msg | Event type: "injected", "skip", "container-stopping", "container-removed" |
| decision | Either "inject" or "skip" |
| container_id | The container ID (truncated) |
| pod | Pod name |
| namespace | Pod namespace |
| profile | nono profile applied (or would have been applied) |
| runtime_handler | RuntimeClass handler name from the pod spec |
| reason | For "skip" decisions, the reason the container was not sandboxed |
| level | INFO for normal decisions; WARN for non-critical errors |
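These fields make the logs easy to filter. For example, to list recent skip decisions (the DaemonSet name kubefence is an assumption):
kubectl logs -n kube-system ds/kubefence | grep '"decision":"skip"'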
Example — injection:
{
"time": "2025-01-15T10:23:45Z",
"level": "INFO",
"msg": "injected",
"decision": "inject",
"container_id": "abc123def456",
"pod": "my-agent",
"namespace": "default",
"profile": "claude-code",
"runtime_handler": "kata-nono-qemu"
}
Example — skip:
{
"time": "2025-01-15T10:23:46Z",
"level": "INFO",
"msg": "skip",
"decision": "skip",
"container_id": "xyz789",
"pod": "nginx-pod",
"namespace": "default",
"profile": "",
"runtime_handler": "runc",
"reason": "runtime class not in config"
}
Example — non-critical warning:
{
"time": "2025-01-15T10:24:00Z",
"level": "WARN",
"msg": "failed to write state metadata",
"container_id": "abc123",
"error": "mkdir /var/run/nono-nri/...: permission denied"
}
Note
WARN level entries indicate non-critical errors. The container is still
sandboxed when a state write fails — the failure only affects audit metadata
and cleanup, not the nono injection itself.
NRI SDK internals emit their own logs in logrus format
(time="..." level=info msg="..."). These are from the SDK, not the
kubefence plugin.