Kubernetes - SIG Autoscaling Bi-Weekly Karpenter Working Group for 2024-12-19T21:59:31Z.mp4
The Karpenter working group meeting on December 19th covered updates on community triage sessions for the upstream repo and the AWS provider, in which open PRs and issues are reviewed with the community. These sessions are recorded and will be held regularly, though not during the holidays. A significant discussion point was an issue with the Azure provider related to supporting budgets and asynchronous drift. Long-running operations in Azure delay the return of provider IDs, which interferes with asynchronous instance launches. The group discussed potential solutions, including modifying the Azure provider's operation model so that errors are handled before the registration TTL expires. Another topic was the introduction of a degraded status condition for node pools to improve observability of failures, particularly those caused by external misconfigurations. By tracking historical success and failure data, the condition aims to help users understand why nodes fail to join clusters. The meeting concluded with a brief mention of the node overlay PR, which needs to be revisited.
Key Points:
- Community triage sessions for the upstream repo and the AWS provider are ongoing and recorded for future reference.
- Azure provider issue involves long-running operations delaying provider ID returns, affecting asynchronous instance launches.
- Potential solution for Azure involves handling errors before registration TTL expires to improve operation efficiency.
- Introduction of degraded status condition for node pools to enhance observability of failures due to external misconfigurations.
- Node overlay PR needs to be revisited to address unresolved issues.
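
The idea behind the degraded status condition, tracking recent successes and failures per node pool, can be illustrated with a small sketch. This is not Karpenter's implementation (Karpenter is written in Go, and the meeting summary does not specify a design). The class name `LaunchHistory`, the window size, the minimum sample count, and the failure threshold are all assumptions chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LaunchHistory:
    """Hypothetical rolling record of node registration outcomes for one node pool."""

    window: int = 10                # assumed: how many recent launches to remember
    min_samples: int = 3            # assumed: make no judgement with fewer outcomes
    failure_threshold: float = 0.5  # assumed: degraded at or above this failure ratio
    outcomes: deque = field(default_factory=deque)

    def record(self, registered: bool) -> None:
        """Store whether a launched node joined the cluster, dropping old entries."""
        self.outcomes.append(registered)
        while len(self.outcomes) > self.window:
            self.outcomes.popleft()

    def degraded(self) -> bool:
        """Report degraded when enough recent launches failed to register."""
        if len(self.outcomes) < self.min_samples:
            return False
        failures = sum(1 for ok in self.outcomes if not ok)
        return failures / len(self.outcomes) >= self.failure_threshold


history = LaunchHistory()
for joined in [True, False, False]:
    history.record(joined)
print("degraded:", history.degraded())
```

A rolling window is used here so that an old misconfiguration stops affecting the condition once enough recent launches succeed. Whether the actual proposal uses a window, a counter, or something else is not stated in the summary.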