Troubleshoot Networking
MetalLB Not Assigning IPs
Symptoms: LoadBalancer Services stuck in Pending with no external IP.
Diagnosis:
# On the tenant cluster
kubectl get pods -n metallb-system
kubectl get ipaddresspool -n metallb-system
kubectl get l2advertisement -n metallb-system
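To confirm which Services are affected, the default kubectl table can be filtered for LoadBalancer Services whose EXTERNAL-IP column still reads <pending>. The following is a minimal sketch: the function name pending_lb_services is illustrative and not part of Butler or kubectl, and it assumes the default column layout of kubectl get svc -A --no-headers.

```shell
# pending_lb_services: hypothetical helper, not part of Butler or kubectl.
# Reads `kubectl get svc -A --no-headers` output on stdin and prints
# "namespace/name" for every LoadBalancer Service without an external IP.
# Assumes the default columns: NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP ...
pending_lb_services() {
  awk '$3 == "LoadBalancer" && $5 == "<pending>" { print $1 "/" $2 }'
}

# Usage on the tenant cluster:
#   kubectl get svc -A --no-headers | pending_lb_services
```

An empty result suggests that MetalLB is assigning addresses and the problem lies elsewhere.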
Solutions:
- No IP pool configured -- Verify the TenantCluster has a loadBalancerPool or that IPAM allocated a range. Check IPAllocation:
  kubectl get ipallocation -n butler-system -l butler.butlerlabs.dev/tenant=<cluster-name>
- IP pool exhausted -- All IPs in the MetalLB pool are assigned. For elastic IPAM, Butler detects the Pending Service and creates a growth allocation automatically. If growth does not fire, check whether the per-tenant quota (maxLoadBalancerIPs) has been reached, or whether the controller can reach the tenant API server.
- L2 advertisement missing -- MetalLB L2 mode requires an L2Advertisement resource matching the IPAddressPool. Butler creates this automatically; if missing, check the butler-controller logs.
- MetalLB pool drift -- If the MetalLB default-pool on the tenant does not match the management-side IPAllocations, the controller corrects it automatically on the next reconcile via server-side apply. Manual edits to default-pool are overwritten. Operators who need custom pools should create additional IPAddressPool resources with different names.
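As an illustration of the last point, an additional pool could be rendered and applied along these lines. This is a sketch under assumptions: render_custom_pool, the name custom-pool, and the 192.0.2.x range are placeholders, and only the MetalLB IPAddressPool kind (metallb.io/v1beta1) is taken as given. Whether the Butler-created L2Advertisement also covers extra pools is not stated in this guide, so a separate L2Advertisement may be needed.

```shell
# render_custom_pool: hypothetical helper that prints an IPAddressPool manifest.
# $1 = pool name (anything other than default-pool), $2 = address range or CIDR.
render_custom_pool() {
  name=$1
  range=$2
  # default-pool is reconciled by the controller, so refuse to render it.
  if [ "$name" = "default-pool" ]; then
    echo "default-pool is managed by Butler; choose another name" >&2
    return 1
  fi
  cat <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: ${name}
  namespace: metallb-system
spec:
  addresses:
    - ${range}
EOF
}

# Usage on the tenant cluster (name and range are placeholders):
#   render_custom_pool custom-pool 192.0.2.200-192.0.2.210 | kubectl apply -f -
```

The range should be one that Butler's IPAM does not hand out, otherwise two pools could claim the same address.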
IPAM Allocation Failures
Symptoms: IPAllocation stuck in Pending or transitions to Failed.
Diagnosis:
kubectl describe ipallocation -n butler-system -l butler.butlerlabs.dev/tenant=<cluster-name>
kubectl describe networkpool -n butler-system <pool-name>
Solutions:
- Pool exhausted -- Check the NetworkPool status for allocatedIPs vs totalIPs. Check the CapacityExhausted condition on the pool. Create a new NetworkPool or expand the CIDR.
- Range conflict -- The requested range overlaps with an existing allocation or reserved range. Check the NetworkPool's reserved list.
- Quota reached -- The tenant has reached quotaPerTenant.maxLoadBalancerIPs. Check the ProviderConfig for quota settings and the total allocated IPs for the tenant:
  kubectl get ipallocation -n butler-system \
    -l butler.butlerlabs.dev/tenant=<cluster-name> \
    -o custom-columns='NAME:.metadata.name,COUNT:.spec.count,PHASE:.status.phase'
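To compare the tenant's usage against the quota, the COUNT column from that listing can be summed. The helper below is a rough sketch, not a Butler tool: allocated_ip_total is a made-up name, and it assumes that only rows in the Allocated phase count toward the quota, which this guide does not confirm.

```shell
# allocated_ip_total: hypothetical helper, not part of Butler.
# Reads "NAME COUNT PHASE" rows on stdin and prints the sum of COUNT for rows
# in the Allocated phase. Header rows and non-numeric counts are skipped.
# Assumption: Pending and Failed rows do not count toward the quota.
allocated_ip_total() {
  awk '$3 == "Allocated" && $2 ~ /^[0-9]+$/ { sum += $2 } END { print sum + 0 }'
}

# Usage on the management cluster:
#   kubectl get ipallocation -n butler-system \
#     -l butler.butlerlabs.dev/tenant=<cluster-name> \
#     -o custom-columns='NAME:.metadata.name,COUNT:.spec.count,PHASE:.status.phase' \
#     | allocated_ip_total
```

If the printed total equals maxLoadBalancerIPs, the quota is the likely cause.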
Growth Allocation Not Firing
Symptoms: Tenant LB Service stuck Pending, but no growth IPAllocation created.
Diagnosis:
# Confirm elastic IPAM mode
kubectl get providerconfig <name> -n butler-system \
  -o jsonpath='{.spec.network.loadBalancer.allocationMode}'
# Check quota vs current allocations
kubectl get providerconfig <name> -n butler-system \
  -o jsonpath='{.spec.network.quotaPerTenant.maxLoadBalancerIPs}'
kubectl get ipallocation -n butler-system \
  -l butler.butlerlabs.dev/tenant=<cluster-name>
# Check controller logs for the tenant
kubectl logs -n butler-system -l app.kubernetes.io/name=butler-controller --tail=100 \
  | grep <cluster-name>
Solutions:
- Not elastic mode -- Growth only fires with allocationMode: elastic. Static mode allocates once at creation.
- Quota reached -- Total allocated IPs equal maxLoadBalancerIPs. Increase the quota or delete unused Services.
- Service too new -- The controller waits 30 seconds after Service creation before treating it as a demand signal. Wait and check again.
- In-flight supply covers demand -- If a growth allocation was recently created and is still Pending or Allocated but not yet consumed by a Service, the controller deducts it from the demand count. This is normal during MetalLB propagation (up to ~37 seconds). Check kubectl get ipallocation -n butler-system -l butler.butlerlabs.dev/tenant=<cluster-name> for recent growth allocations in Pending or Allocated phase.
- Tenant API unreachable -- The controller skips elastic IPAM when it cannot connect to the tenant. Check tenant control plane health.
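The two configuration checks above (mode and quota) can be combined into a quick triage step before looking at timing or connectivity. This is a sketch only: growth_blocker and its output strings are invented for illustration, and it expects the values read from the ProviderConfig and the IPAllocation listing to be passed in by hand.

```shell
# growth_blocker: hypothetical triage helper, not part of Butler.
# $1 = allocationMode, $2 = maxLoadBalancerIPs, $3 = IPs already allocated.
growth_blocker() {
  mode=$1
  quota=$2
  allocated=$3
  if [ "$mode" != "elastic" ]; then
    echo "not-elastic"
  elif [ "$allocated" -ge "$quota" ]; then
    echo "quota-reached"
  else
    # Neither configuration check explains it: look at Service age,
    # in-flight allocations, and tenant API reachability instead.
    echo "check-timing-and-connectivity"
  fi
}

# Usage, with values read from the Diagnosis commands:
#   growth_blocker elastic 8 8
```

A result of check-timing-and-connectivity points to the last three causes in the list rather than to the ProviderConfig.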
Console Not Loading
Symptoms: Blank page or API errors in the browser.
Diagnosis:
kubectl get pods -n butler-system -l app=butler-console
kubectl get pods -n butler-system -l app=butler-server
kubectl logs -n butler-system deploy/butler-server --tail=50
Solutions:
- Server pod not running -- Check butler-server pod events for crash reasons.
- Ingress misconfiguration -- Verify that the Ingress or LoadBalancer Service for the console has an external IP.
- CORS errors -- If you access the console from a hostname other than the expected one, check the butler-server CORS configuration.