KubeCon + CloudNativeCon NA 2024 - Salt Lake City

AI & ML (19 videos)
API & ML (1 video)
Accessibility (1 video)
Community (2 videos)
Compute (55 videos)
Containers (1 video)
Contributor Experience (1 video)
DNS (1 video)
Data & Analytics (14 videos)
Database (1 video)
Developer Experience (56 videos)
Edge Computing (1 video)
GitOps (4 videos)
Governance (1 video)
Keynote (18 videos)
Kubernetes (1 video)
Networking (4 videos)
Observability (51 videos)
Open Source (2 videos)
Scaling (57 videos)
Scheduling (15 videos)
Security (57 videos)
Serverless (2 videos)
Storage (1 video)
Sustainability (3 videos)
Uncategorized (1 video)

AI & ML

Keynote: Centralizing & Simplifying Enterprise AI Workflows with Envoy AI Gateway - Alexa Griffith

The Envoy AI Gateway is a new open-source project that simplifies and centralizes enterprise AI workflows by providing a unified API, usage limiting, and authentication management across multiple cloud environments. This collaborative effort between Bloomberg and Tetrate leverages the robust and scalable foundation of the Envoy Gateway to optimize and handle AI workloads, bridging the gap between AI infrastructure and innovation.

Harnessing the Power of Envoy Proxy for Building an LLM Gateway - Idit Levine, Solo.io

This talk explores the use of Envoy Proxy as an AI Gateway to address the unique challenges of deploying and managing large language models (LLMs) in enterprise environments. The speakers discuss credential management, prompt governance, cost control, and other key features enabled by Envoy that help organizations harness the power of LLMs while maintaining security, efficiency, and scalability.

Tutorial: Get the Most Out of Your GPUs on Kubernetes with the GPU Operator

The tutorial provides a hands-on introduction to the NVIDIA GPU Operator, which simplifies the deployment and management of GPU-accelerated applications on Kubernetes. Participants learn how to set up GPU time-sharing and resource management strategies, and how to deploy and interact with a large language model running on NVIDIA GPUs.

Production AI at Scale: Cloudera’s Journey in Building a Robust Inference Pl... Z. Thanga, P. Ableda

The presentation outlines Cloudera's journey in building a robust and scalable AI inference platform, highlighting the key requirements, the evaluation of open-source options, and the architecture of the resulting solution based on Kubeflow. The platform aims to provide enterprise-grade scalability, security, and flexibility to serve a variety of machine learning models, including traditional models and large language models, while enabling seamless integration with Cloudera's existing data and AI ecosystem.

Architecting the Future of AI: From Cloud-Native Orchestration to Advanced LLMOps - L (Xiaoxuan) Liu

This talk presents a unified approach to efficiently deploying and managing large language models (LLMs) using a combination of Kubernetes, Ray, and vLLM. The speaker discusses the challenges of scaling, complexity, and serving efficiency in generative AI workflows, and introduces solutions using Kubernetes orchestration, Ray AI libraries, and the vLLM inference engine.

AI and ML: Let’s Talk About the Boring (yet Critical!) Operational Side- Rob Koch & Milad Vafaeifard

This talk explores the critical operational aspects of AI and machine learning, highlighting the importance of reliable infrastructure, including compute resources, data separation, and observability. The speakers discuss how service meshes, such as Linkerd, can simplify these challenges and allow engineers to focus on AI and ML innovations, while also addressing security and hardware management concerns.

Advanced Model Serving Techniques with Ray on Kubernetes - Andrew Sy Kim & Kai-Hsun Chen

The talk covers advanced model serving techniques with Ray on Kubernetes, including online and offline inference, model sharding, and the new Ray Compiled Graphs feature. The speakers discuss how Ray and Kubernetes integration can unlock new ways to optimize inference performance, utilization, and cost.

Keynote: NVIDIA Case Study: The Many Facets of Building + Delivering AI in the Cloud N... Chris Lamb

This keynote presentation explores the multifaceted collaboration between NVIDIA and the cloud-native community in delivering AI solutions. It highlights the complex challenges of optimizing heterogeneous distributed systems for training and deploying AI applications, and the opportunities for the ecosystem to work together to solve these problems.

Keynote: Paving the Way for AI Through Platform Engineering - Kasper Borg Nissen

This keynote discusses how platform engineering can pave the way for democratizing generative AI within organizations. The speaker shares Lunar's experience in leveraging platform engineering principles to integrate generative AI capabilities into their platform, enabling developers to access and utilize AI in a controlled and scalable manner.

Managing and Distributing AI Models Using OCI Standards and Harbor - Steven Zou & Steven Ren

This talk presents the idea of using the OCI standards and the Harbor project to manage and distribute AI models within the Kubernetes ecosystem. The speaker discusses the motivations, background, and implementation details of this approach, including a live demo showcasing the capabilities of Harbor for managing AI model artifacts.

Democratizing AI Model Training on Kubernetes with Kubeflow TrainJob and... A. Velichkevich, Y. Iwai

The presentation discusses a new project called Kubeflow Training V2 that aims to democratize AI model training on Kubernetes by providing a simple and scalable Python-based interface for data scientists to train and fine-tune models, while consolidating efforts between the Kubernetes and Kubeflow communities to streamline the infrastructure and management of these workloads.

Cloud-Native AI: Wasm in Portable, Secure AI/ML Workloads - Miley Fu, Second State

The talk explores the use of WebAssembly (Wasm) to run AI and machine learning workloads in a portable, secure, and lightweight manner. The presenters demonstrate how Wasm-based runtimes can support multiple AI models, including large language models, on low-end hardware, and discuss the advantages of Wasm over traditional Python-based frameworks for inference tasks.

Building Massive-Scale Generative AI Services with Kubernetes and Open Source - John McBride

This talk discusses the challenges and lessons learned in building a massive-scale generative AI service on Kubernetes, focusing on the use of open-source language models, GPU management, and cost optimization strategies. The speaker shares insights on the architectural choices, such as using the GitHub events fire hose, time series databases, and vector search, to enable the AI-powered features of the 'Star Search' application.

Best Practices for Deploying LLM Inference, RAG and Fine Tuning Pipelines... M. Kaushik, S.K. Merla

The presentation discusses best practices for deploying and managing large language model (LLM) inference, retrieval-augmented generation (RAG), and fine-tuning pipelines on Kubernetes. It covers topics such as model management, scheduling, observability, and autoscaling, as well as emerging technologies like multi-adapter serving and AI agents.

CRI-O: First Class AI Model Teleportation - OCI Volume Mounts in CRI-O and Kubernetes | PLT

The video discusses the CRI-O runtime and its integration with Kubernetes, focusing on a new feature that allows AI models to be transported between containers using OCI images. The speaker highlights the advantages of this approach, including the ability to leverage existing OCI tooling and Kubernetes features, and encourages the audience to try out the feature and provide feedback.

WasmEdge: Cross-Platform, High-Performance, Lightweight, Embeddable Multi-Modal LLM Runtime | PLT

WasmEdge is a WebAssembly runtime that enables cross-platform, high-performance, and lightweight deployment of multi-modal large language models on edge devices and production servers. The talk discusses the limitations of existing solutions like Python, Lama, and C++ frameworks, and presents WasmEdge as a Rust-based, cloud-native alternative that provides a wide selection of AI models, easy embeddability, and upstream support in major Linux distributions.

From Vectors to Pods: Integrating AI with Cloud Native - Panel

The panel discussion explores how the cloud-native ecosystem can learn from AI to improve resource scheduling, automated testing, and cost optimization. Challenges around data ethics, software supply chain, and integrating AI workflows with Kubernetes are also discussed, highlighting the need for collaboration between data scientists and infrastructure teams.

API & ML

Still Don't Do What Charlie Don't Does - Making CRD Changes Safer - Nick Young, Isovalent

The talk discusses strategies for making safe and backwards-compatible changes to Kubernetes Custom Resource Definitions (CRDs). It covers important concepts like versioning, storage versions, and validation rules, and provides practical advice to avoid common pitfalls that can break existing CRD users.

Accessibility

TAG Contributor Strategy: Beyond the Checkbox: Humanizing Accessibility | TAG Lightning Talk

The video discusses the importance of accessibility and inclusivity in technology and communities, highlighting the speaker's personal journey and the empathetic approach needed to create truly inclusive environments. The speaker challenges the audience to foster a culture of inclusion and respect, where everyone feels valued and belongs, beyond just meeting technical standards.

Community

Meet the CNCF Code of Conduct Committee - Panel

The CNCF Code of Conduct Committee, comprising elected community members and staff, is responsible for addressing incidents and promoting inclusivity within the CNCF community. The committee's processes, including investigation, mediation, and determination of outcomes, aim to restore and support the community while upholding the principles of the CNCF Code of Conduct.

Project Lightning Talk: Closing - Jorge Castro, CNCF

In this lightning talk, Jorge Castro from the CNCF (Cloud Native Computing Foundation) provides a concise wrap-up of the conference session, highlighting the key takeaways and inviting the audience to explore the Project Pavilion at the upcoming KubeCon event. The talk emphasizes the interactive nature of the pavilion, where attendees can engage with various projects and attend demonstrations on the demo stage.

Compute

Day 3 - KubeCon + CloudNativeCon North America Highlights

The video highlights the key aspects of the KubeCon + CloudNativeCon North America conference, emphasizing the vibrant networking opportunities, the open and welcoming community, and the potential for sparking innovative ideas. The video also mentions the upcoming KubeCon + CloudNativeCon event in London next year, suggesting a sense of excitement and anticipation for the future of the conference.

Elastic Data Streaming: Autoscaling Apache Kafka - Jakub Scholz, Red Hat

The talk discusses the challenges and solutions for autoscaling Apache Kafka on Kubernetes, including the use of tiered storage and rebalancing techniques to efficiently scale the cluster up and down. The speaker highlights the importance of proper configuration and monitoring to ensure smooth autoscaling without disrupting the service.

All-Your-GPUs-Are-Belong-to-Us: An Inside Look at NVIDIA's Self-Healin... R. Hallisey & P. Prokop PL

This talk provides an inside look at how NVIDIA maintains the GeForce NOW infrastructure, which involves managing a large fleet of GPUs and Kubernetes clusters. The presenters discuss their approach to maintaining this infrastructure, including the development of a Notify Maintenance API that enables coordinated and automated maintenance workflows across their data centers.

Distributed Cache Empowers AI/ML Workloads on Kubernetes Cluster - Yuichiro Ueno & Toru Komatsu

The presenters discuss a distributed cache system, called Simple Cache Service (SCS), that they developed to address the challenges of AI and ML workloads on a Kubernetes cluster. They describe how SCS leverages Kubernetes features, consistent hashing, and other optimization techniques to achieve high performance and scalability for their on-premises AI and ML use cases.
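The consistent hashing mentioned in this summary can be illustrated with a minimal sketch. This is not the presenters' SCS implementation; the HashRing class, the node names, and the choice of 100 virtual nodes per cache server are all assumptions made for illustration. The idea is that each cache node owns many points on a hash ring and a key is served by the first point at or after its own hash, so removing a node only remaps the keys that node owned.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Map a string to a 64-bit position on the ring.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Place `vnodes` points for this node on the ring.
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            if point in self._owner:
                continue  # skip the (very unlikely) position collision
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node):
        # Drop every point owned by this node.
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        # Return the node owning the first point after the key's hash,
        # wrapping around to the start of the ring if needed.
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]
```

With a ring built from three hypothetical nodes, for example HashRing(["cache-0", "cache-1", "cache-2"]), calling remove("cache-2") leaves every key that was on cache-0 or cache-1 where it was, which is the property that makes this scheme attractive for a cache whose servers come and go.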

Elevating Kubeflow Spark Operator's Future: Best Practices and Enhancements- Vara Bonthu, Chaoran Yu

This talk covers the evolution of the Kubeflow Spark Operator, including its migration from Google to the Kubeflow community, the addition of new features and enhancements, and best practices for running Spark on Kubernetes. The speakers discuss the internal components of the Spark Operator, customization options like pod templates and Kubernetes scheduling, and provide insights on security, multi-tenancy, and observability considerations.

How Google Built a New Cloud on Top of Kubernetes - Jie Yu & Prashanth Venugopal, Google

The presentation discusses how Google built a new private cloud platform, called Google Distributed Cloud Air Gap (GDC Air Gap), on top of open-source technologies like Kubernetes. The key design principles include using multi-cluster and namespace-based architecture, embracing Kubernetes resource models, and treating containers and VMs equally in the networking layer.

Navigating the Cgroup Transition: Bridging the Gap Between Kubernetes and User Expec... S. Kunkerkar

This talk provides a comprehensive overview of the transition from cgroup v1 to cgroup v2 in the Kubernetes ecosystem. It discusses the benefits of cgroup v2, the migration process, the impact on the ecosystem, and the future outlook, highlighting the importance of this transition for managing advanced workloads in Kubernetes.

Understanding Kubernetes Networking in 30 Minutes - Ricardo Katz & James Strong

This talk provides a comprehensive introduction to Kubernetes networking, covering the underlying Linux networking concepts, the role of the pause container, container networking, and the Kubernetes networking abstractions such as Services and DNS. The presenters demonstrate these concepts through live demos, highlighting the key components like CNI, kube-proxy, and network policies that make up the Kubernetes networking ecosystem.

What's New in SIG-Windows - Mark Rossetti, Microsoft & Aravindh Puthiyaparambil, Softdrive

The presentation covers the history and recent developments in the SIG-Windows community, including new features like memory pressure eviction, CPU and memory affinity, and graceful node shutdown support for Windows nodes. The speakers also highlight key contributors and encourage the audience to get involved in the SIG-Windows efforts.

Bare Metal Kubernetes with KOps: Gathering Community Wisdom - Justin Santa Barbara & Ciprian Hacman

The presentation discusses the challenges and solutions for running Kubernetes on bare metal, covering topics such as etcd management, service discovery, networking, and storage. The speakers propose forming a working group within the SIG Cluster Lifecycle community to address these issues in a vendor-neutral manner, inviting audience participation and feedback.

Intro & Deep Dive - Kubernetes Infrastructure - Arnaud Meukam, Independent & Mahamed Ali, Cisco

The talk provides an overview of the Kubernetes infrastructure, including the critical services it manages, such as the project's image registry, the custom-built CI system, and the infrastructure for serving binary assets. It also discusses the recent migration of the CI system to community-owned accounts, the plans for migrating the image registry, and the ongoing efforts to improve the observability and cost-efficiency of the infrastructure.

Kubernetes WG Device Management - Advancing K8s Support for GPUs - J. Belamaric, P. Ohly, K. Klues

The video discusses the new Kubernetes Working Group for Device Management, which aims to enable simple and efficient configuration, sharing, and allocation of accelerators and other specialized devices. The main focus is on Dynamic Resource Allocation (DRA), a new API that provides a richer way to describe and request devices, allowing for more flexible and efficient scheduling of GPU, FPGA, and network resources.

SIG-Node: Intro and Deep Dive - Sergey Kanzhelev & Dawn Chen, Google; Mrunal Patel, Red Hat

The video discusses the latest developments and future plans for SIG-Node, a special interest group within the Kubernetes community. It covers major feature enhancements such as device resource management, in-place pod resizing, and pod-level resource specification, as well as smaller improvements to enhance the overall experience for Kubernetes administrators and users.

Building Resilience for Large-Scale AI Training: GPU Management, Fa... G. Ashokavardhanan, A. Eldeib

The talk discusses building resilience for large-scale AI training, focusing on GPU management, failure detection, and mitigation strategies. The presenters cover application-level checkpointing, infrastructure-level health checks, and emerging technologies like CUDA checkpoint to enable seamless recovery from GPU failures and maintain high cluster utilization.

Platform Performance Optimization for AI - a Resource Management Perspective- A. Kervinen, D. Narang

This presentation explores platform performance optimization for AI workloads, particularly focused on optimizing model inference from a resource management perspective. The speakers demonstrate techniques for instrumenting Python libraries, collecting system metrics, and leveraging resource management policies to maximize hardware utilization and balance latency, throughput, and resource usage for AI inference tasks.

Container Image Workflows at Scale with Buildpacks - Jesse Brown & Terence Lee, Heroku

This talk explores how Buildpacks can enable efficient container image workflows at scale, focusing on their ability to provide standardized, reproducible builds and facilitate software supply chain security. The speakers discuss Buildpacks' benefits in terms of build efficiency, OS updates, and software bill of materials generation, as well as the broader Buildpacks ecosystem and how to leverage it in Kubernetes using the kpack operator.

A Tale of 2 Drivers: GPU Configuration on the Fly Using DRA- Alay Patel & Varun Ramachandra Sekar US

This talk presents a novel approach to GPU configuration on the fly using Dynamic Resource Allocation (DRA) in the Kubernetes ecosystem. The speakers discuss the limitations of traditional device plugins and how DRA addresses them, enabling more efficient and flexible GPU management in their cloud gaming platform, GeForce NOW.

Deep Dive Into Generic Control Planes and Kcp - Stefan Schimanski & Mangirdas Judeikis

This talk provides a deep dive into the concept of generic control planes and the kcp project. It covers the foundational work of generic control planes, the architecture and components of kcp, and how it enables the creation of customizable and composable Kubernetes-like API servers.

Making Kubernetes Simpler for Accelerated Workloads - Panel

The panel discusses the challenges and strategies for simplifying Kubernetes for accelerated workloads, including managing specialized hardware, optimizing resource utilization, and ensuring secure multi-tenancy. The panelists share their experiences and insights on adapting Kubernetes to support AI and machine learning workloads in enterprise environments.

Architecting Tomorrow: The Heterogeneous Compute Resources for New Types of Workloads - A. Kanevskiy

This talk explores the evolving hardware landscape, including the increasing heterogeneity of compute resources, and how it impacts the design and optimization of modern workloads. The speaker delves into the nuances of CPU architectures, memory hierarchies, and emerging interconnect technologies, highlighting the need for more granular and adaptive approaches in the Kubernetes ecosystem to effectively leverage these hardware advancements.

Building Reliable Cross-Cloud Kubernetes Clusters on Spot Instances with Drafter and.. F. Pojtinger

The talk presents a solution for building reliable cross-cloud Kubernetes clusters on spot instances, leveraging technologies like PVM, Silo, Drafter, and Conduit to enable live migration of virtual machines between cloud providers with zero downtime. The speaker demonstrates how this approach can be used to run a Kubernetes cluster across different cloud environments, allowing for seamless migration and scaling of workloads based on cost and availability.

Operationalizing High-Performance GPU Clusters in Kubernetes: Lessons Learned fr... W. Gleich, W. Wu

This talk discusses the operational challenges of running large-scale GPU clusters for training large language models in Kubernetes. The presenters share lessons learned from their experience at Databricks, including strategies for passive monitoring, RDMA network management, and active health checks to mitigate node failures and improve training efficiency.

Keynote: Engineering the Future of Generative AI Platforms on Kubernetes - Aparna Sinha

This keynote discusses the engineering challenges and opportunities in building a generative AI platform on Kubernetes. The speaker highlights the need for a modular and extensible platform that can cater to the diverse requirements of data analysts, data scientists, software engineers, and operations teams, while also addressing the unique constraints and risks associated with generative AI models.

Tackling GPU Shortages and High Costs by Harnessing Hybrid Kubernetes Clusters - X. Dong, A. Pucher

This talk presents a novel approach to building scalable and cost-effective AI infrastructure using hybrid Kubernetes clusters. By leveraging WireGuard overlay networking and harnessing GPU resources from various cloud providers, the speakers demonstrate how to overcome GPU shortages and high costs while maintaining the benefits of a unified Kubernetes platform.

What if Kubernetes Was a Compiler Target? - David Morrison & Tim Goodwin

This talk explores the idea of treating Kubernetes as a compiler target, allowing developers to write their applications as a single program that can then be compiled into a distributed system. The presenters demonstrate a prototype implementation in Go that translates goroutines and channels into Kubernetes pods and network requests, providing a higher-level abstraction for building distributed applications.

Which GPU Sharing Strategy Is Right for You? A Comprehensive Benchmark Study Us... K. Klues, Y. Chen

This talk presents a comprehensive benchmark study on different GPU sharing strategies, including time slicing, MPS (Multi-Process Service), and MIG (Multi-Instance GPU), to help determine the optimal strategy for various workloads. The study evaluates the performance, isolation, and fault tolerance of these strategies, providing insights on when to use each approach based on the specific requirements of the application and deployment environment.

Running WebAssembly (Wasm) Workloads Side-by-Side with Container Workloads - Jiaxiao Zhou, Microsoft

This talk explores the potential of running WebAssembly (Wasm) workloads side-by-side with container workloads in Kubernetes. The speaker discusses the advantages of Wasm, such as its small footprint and fast startup times, and demonstrates how Wasm can be integrated into the Kubernetes ecosystem using standardized OCI artifact formats and container runtime shims.

Load-Aware GPU Fractioning for LLM Inference on Kubernetes - Olivier Tardieu & Yue Zhu, IBM

This talk presents a method for predicting the performance of large language models (LLMs) on GPUs, and demonstrates how this can be used to automatically scale and size LLM deployments on Kubernetes. The approach combines analytical modeling of LLM performance with GPU fractioning techniques to efficiently utilize GPU resources.

Unlocking Potential of Large Models in Production - Yuan Tang, Red Hat & Adam Tetelman, NVIDIA

The talk discusses the challenges and solutions for deploying and managing large language models (LLMs) in production environments. It provides an overview of the key components and architecture required for a production-ready LLM system, and introduces the open-source project KServe as a platform for addressing these challenges.

Modernization of Intuit Payroll Enterprise Using Event Driven Architect... H. Maarimuthu, V. Maurice

This presentation discusses the modernization of Intuit's online payroll system using an event-driven architecture built on the Numaflow platform. It highlights the benefits of Numaflow, including improved scalability, resilience, and cost optimization, as well as the integration of the system with various message sources and the ability to modularize the pipeline based on application needs.

Platform Engineering in Financial Institutions: The Practitioner Panel - Panel

The panel discussion explores how platform engineering is implemented in financial institutions, addressing challenges such as regulatory compliance, developer happiness, and measuring success. The panelists share their experiences and strategies for navigating the balance between innovation, standardization, and meeting regulatory requirements within their organizations.

Goodbye etcd! Running Kubernetes on Distributed PostgreSQL - Denis Magda, Yugabyte

This talk demonstrates how to run Kubernetes on a distributed PostgreSQL database, using the PostgreSQL-compatible database YugabyteDB. The speaker discusses the limitations of etcd, the default metadata store for Kubernetes, and how the Kine compatibility layer can be used to replace etcd with a scalable, highly available PostgreSQL-compatible database.

Love thy (Noisy) Neighbor: Strategies for Mitigating Performance Interference in Cloud-N... J. Perry

This talk provides an overview of the memory noisy-neighbor problem in cloud-native applications, its impact on performance and efficiency, and strategies for mitigating it using techniques like vertical pod autoscaling and resource control. The speaker highlights the need for practical systems that can generalize across the Kubernetes ecosystem and proposes the development of a memory collector tool to improve observability and enable the community to tackle this challenge.

What Containerd 2.0 Means for You - Samuel Karp, Google

This presentation provides an overview of the new features and changes in containerd 2.0, including the stabilization of experimental features, new defaults, and support for node-level customizations. The speaker also discusses the upgrade process, deprecations, and the upcoming roadmap for containerd 2.1.

Faster Containerized LLM Serving via Knowledge Sharing - Junchen Jiang & Zhou Sun

This video presents a system called LMCache that enables faster and more efficient serving of large language models (LLMs) for long-context applications. The key insight is to use a knowledge-sharing abstraction based on the KV cache, which allows the model's internal representation of the long context to be stored and reused, leading to significant speed-ups and cost savings compared to recomputing the context for every request.
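The store-and-reuse idea in this summary can be sketched in a few lines. This is a toy illustration, not the API of the system presented in the talk: the PrefillCache class and the fake_prefill function are hypothetical, and the "state" stored here is just a list of token lengths standing in for the real per-layer key/value tensors a model would produce.

```python
import hashlib


class PrefillCache:
    """Hypothetical cache that reuses the result of an expensive context pass."""

    def __init__(self, prefill):
        self._prefill = prefill  # function: context text -> reusable state
        self._store = {}         # context hash -> cached state
        self.hits = 0
        self.misses = 0

    def get(self, context: str):
        # Key the cache on a hash of the context text.
        key = hashlib.sha256(context.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            # First time this context is seen: pay the prefill cost once.
            self.misses += 1
            self._store[key] = self._prefill(context)
        return self._store[key]


def fake_prefill(context: str):
    # Stand-in for the model's pass over the context (an assumption for
    # illustration); a real system would return KV tensors instead.
    return [len(token) for token in context.split()]
```

Two requests that share the same long document then trigger only one call to the prefill function; the second request is answered from the stored state, which is where the speed-up described in the summary would come from.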

Discover CNCF TAG Runtime: From AI, WASM, OS, Edge to Workloads in the Heart of Salt Lake City

The presentation provides an overview of the CNCF TAG Runtime, highlighting its role in scaling contributions from the community and advising the CNCF Technical Oversight Committee (TOC) on runtime-related projects. It covers the various working groups within TAG Runtime, such as Cloud Native AI, WebAssembly, and Special Purpose OS, and encourages audience participation in the group's activities.

Introduction to Distributed ML Workloads with Ray on Kubernetes - Mofi Rahman & Abdel Sghiouar

This talk provides an introduction to distributed machine learning workloads using the Ray open-source framework on Kubernetes. It covers the key concepts of Ray, such as tasks and actors, as well as the benefits and challenges of running distributed computing on Kubernetes, and demonstrates how to deploy Ray-based applications on a Kubernetes cluster.

KubeVirt: Enhancements and the Road Ahead - Vladik Romanovsky & David Vossel, Red Hat

The video discusses the recent enhancements and future roadmap of the KubeVirt project, which allows running virtual machines alongside Kubernetes pods. It highlights the project's focus on improving code quality, community engagement, feature development, and integration with the broader Kubernetes ecosystem.

CRI-O Features for Fun and Profit - Peter Hunt & Sohan Kunkerkar, Red Hat

The talk covers the latest features and improvements in the CRI-O container runtime, including support for Sigstore image verification, a more efficient solution for handling corrupted container images, and the decision to make crun the default runtime. Additionally, the speakers discuss the new image volume source feature that allows mounting OCI images as volumes in Kubernetes pods.

Vitess: Introduction, New Features & Running in Production- D. Sigireddi, D. Perkins, S. Vijayakumar

This video provides an introduction to Vitess, a database sharding solution, and discusses its new features and production deployment. The speakers cover Vitess' key benefits, such as improved scalability, high availability, and ease of operations, as well as its architecture and upcoming enhancements.

WG Serving: Accelerating AI/ML Inference Workloads on Kubernetes - E.A. Gutierrez, Y. Tang

The video presentation discusses the work of the Kubernetes Working Group Serving, which aims to enhance workload controllers, investigate orchestration for scalability, and optimize resource sharing for AI/ML inference workloads on Kubernetes. The working group is divided into several work streams, including orchestration, multi-host, auto-scaling, and disaster recovery, and is actively collaborating with other ecosystem projects like KServe to address the challenges of running large language models in production.

Testing Kubernetes Without Kubernetes: A Networking Deep Dive - John Howard, Solo.io

The speaker discusses a novel approach to testing Kubernetes-based applications without actually deploying on a Kubernetes cluster. By leveraging Linux namespaces and custom networking setups, the speaker demonstrates how to create a fast, debuggable, and setup-free testing environment that closely mimics the Kubernetes environment.

Exploring KubeEdge: Architecture, Use Cases, and Project Graduation Updates - Y. Ding, H. Zhang

This talk provides an overview of the KubeEdge project, including its architecture, use cases, and recent updates. The presentation highlights KubeEdge's journey to CNCF graduation, its support for edge computing scenarios, and its advancements in areas like device management, OTA upgrades, and edge AI capabilities.

The Future of DBaaS on Kubernetes - M. Logan, S. Pronin, D. Sigireddi, G. Bartolini

This video discusses the increasing adoption of databases on Kubernetes, highlighting the maturity of the ecosystem, the benefits of standardization and cost savings, and the emergence of database-as-a-service (DBaaS) solutions. The panelists share customer stories, best practices, and potential challenges, as well as their insights on the future direction of databases on Kubernetes.

Kubernetes (SIG Storage): What's Coming in Kubernetes Storage | Project Lightning Talk

The video covers three projects from the Kubernetes SIG Storage team: CSI for block and file storage, COSI for object storage, and changed block tracking for efficient volume backups. These initiatives aim to enhance storage capabilities and management within the Kubernetes ecosystem.

Buildpacks: Container Builds at Scale with Buildpacks | Project Lightning Talk

This talk introduces Buildpacks, a technology that allows turning source code into container images without the need for a Dockerfile. The speakers discuss how Buildpacks are used by companies like Heroku, Salesforce, Google, and VMware to manage container images at scale, and demonstrate how to get started with Buildpacks using the 'pack' CLI tool.

wasmCloud: Declarative WebAssembly Orchestration for Cloud Native Applications | PLT

The video presents wasmCloud, a declarative WebAssembly orchestration platform for cloud-native applications. It highlights wasmCloud's key features, including its adoption by the CNCF, community growth, support for standards, and the ability to deploy applications across various cloud and edge environments.

Metal3: Metal3 Magics! What's New and Exciting? | Project Lightning Talk

Metal3 is a project that brings Kubernetes to bare metal, enabling provisioning and lifecycle management of bare metal servers. The presentation highlights Metal3's recent developments, including encrypted provisioning, a standalone operator for easier configuration, UI support, and the ability to handle workloads of up to 1,000 bare metal nodes.

Kuma: What’s New in Kuma? | Project Lightning Talk

Kuma, an open-source service mesh, has recently released version 2.9.0 with numerous improvements, including namespaced policies, MeshService, MeshPassthrough, and MeshTLS. The presentation highlights these new features and how they enhance Kuma's capabilities in multi-cloud, multi-cluster, and multi-mesh environments.

Linkerd: Adding Federated Services to Linkerd - Design Considerations and Debates | PLT

This talk discusses the design considerations and debates involved in adding federated services to the Linkerd service mesh. The speaker highlights the key features of Linkerd, such as its simplicity, security, and lightweight nature, and how these principles guided the development of the federated services functionality.

CNCF Runtime TAG: CNCF Runtime TAG and the Cloud Native Runtime Landscape: AI, WASM, OS, Edge, Wo...

The CNCF Runtime TAG is responsible for scaling contributions and providing domain expertise across the diverse CNCF project landscape, with a focus on runtimes, workloads, and emerging technologies like AI, WebAssembly, and edge computing. The TAG holds regular public meetings, collaborates with the CNCF TOC, and encourages community involvement through working groups, white papers, and project engagement.

Lightning Talk: Running Kind Clusters with GPU Support Using Nvkind - Evan Lezar, NVIDIA

This talk introduces nvkind, a tool that extends the popular kind (Kubernetes in Docker) tool to enable running Kubernetes clusters with GPU support. The speaker discusses the motivation for nvkind, its key features, and future improvements, highlighting the challenges of managing GPU access in containerized environments.

Accelerating Adoption: What’s New in Envoy Gateway and Why It Matters | Project Lightning Talk

The presentation discusses the latest developments in the Envoy Gateway project, including new features such as active-passive failover, request authorization, and standalone mode. The adoption of Envoy Gateway is growing, with increasing Helm chart downloads, GitHub activity, and integration with other CNCF projects, showcasing its importance in the cloud-native ecosystem.

Conquering Configuration Constraints: Real-World Patterns for Distributing Data... Daniel Hrabovcak

The presentation discusses the challenges and best practices in managing configurations for applications running on Kubernetes. It explores the use of ConfigMaps, Secrets, and Custom Resources, highlighting the pitfalls and solutions for distributing data at scale while overcoming configuration constraints.

ARM-Wrestling: Overcoming CPU Migration Challenges to Reduce Costs- Laurent Bernaille, Eric Mountain

This talk discusses how Datadog migrated their infrastructure to ARM-based instances to reduce costs, the challenges they faced, and their strategies for scaling the migration across their organization. They highlight the importance of investing in CI/CD early, the need for comprehensive benchmarking, and the ongoing evolution of the ARM ecosystem.

Containers

What’s Going on in the Containerd Neighborhood? - P. Estes, S. Karp, A. Suda, M. Brown, K. Ashok

This presentation provides an overview of the containerd project, its evolution, and the broader ecosystem it has enabled. The panel discussion covers the motivations behind containerd, its value proposition, the extension points it provides, and the ongoing development and community involvement around the project.

Contributor Experience

Contributing to Kubernetes in Its Second Decade - SIG ContribEx Style! - Panel

This panel discusses the growth and organization of the Kubernetes community, highlighting the various special interest groups (SIGs) and contribution opportunities. It also introduces the new contributor orientation program and the social media channels managed by the SIG ContribEx team to engage with the community.

DNS

DNS Deep Dive in Kubernetes with CoreDNS - Jingming Guo, Airbnb

This talk provides a deep dive into the DNS functionality in Kubernetes, focusing on the use of CoreDNS as the cluster DNS server. The speaker discusses the DNS resolution process, CoreDNS configurations, and various use cases and solutions implemented at Airbnb, including multi-cluster DNS scenarios and challenges.

Data & Analytics

Measuring All the Costs with OpenCost Plugins - Alex Meijer, Stackwatch

The presentation discusses the open-source project OpenCost and its recent developments, particularly the introduction of OpenCost Plugins. The speaker highlights the vision of OpenCost becoming the open standard for visualizing all cloud-native spending, enabled by the community-driven FOCUS specification for consumption-based billing data.

Cloud Native Storage: The CNCF Storage TAG Projects, Technology & Landscape- A. Chircop, R. Spazzoli

This talk provides an overview of the CNCF Storage Technical Advisory Group (TAG), its role in the CNCF ecosystem, and the key cloud-native storage projects and technologies it covers. It also delves into the group's work on storage landscape, performance, and disaster recovery white papers, highlighting the importance of understanding storage attributes and patterns for running different types of data workloads on Kubernetes.

Kubernetes Data Protection WG Deep Dive - Dave Smith-Uchida, Veeam

The video discusses the Kubernetes Data Protection Working Group, which aims to address the limitations in day-two operations for stateful workloads in Kubernetes, such as data protection. The group is working on projects like consistent group snapshots, backup repositories, and changed block tracking to enhance data protection capabilities in the Kubernetes ecosystem.

Effective Data Platforming with Open Source Tools For Faster Insights - Priyanka J. Naik

The presentation outlines an effective data platforming approach using open-source tools from the CNCF ecosystem, focusing on modularity, scalability, and stakeholder requirements. The speaker discusses the journey of data engineering, architectural aims, tool selection, and the various stages of platform development, emphasizing the importance of understanding data, leveraging stream processing, and ensuring data quality throughout the pipeline.

AIStore as a Fast Tier Storage Solution: Enhancing Petascale Deep Learning... A. Gaikwad & A. Wilson

This presentation introduces AIStore, a fast-tier storage solution that enhances petascale deep learning workloads across remote cloud backends. The talk covers the current challenges with loading data from the cloud, the key features and architecture of AIStore, and benchmarks demonstrating its scalability and performance advantages over direct cloud storage access.

Building Resilience: Effective Backup and Disaster Recovery for Vec... P. Navarathna, S. Subramanian

This talk discusses the importance of building resilience in AI applications through effective backup and disaster recovery strategies for vector databases deployed on Kubernetes. The presenters give a practical demonstration using the open-source project Kanister to protect and restore a book recommendation chatbot's vector database, highlighting the role of data protection in business continuity for AI-powered applications.

Object Storage Is All You Need - Justin Cormack, Docker

The talk explores the exciting developments in object storage, a foundational technology in cloud-native infrastructure. It highlights how object storage is enabling the creation of innovative applications, databases, and storage solutions that leverage its simplicity, scalability, and reliability.

Medical Research Computing Infrastructure on Hybrid Kubernetes - Jennings Zhang

This talk presents the development of a medical research computing infrastructure on a hybrid Kubernetes platform at Boston Children's Hospital. The infrastructure, called ChRIS, enables clinicians and researchers to easily leverage cloud-based compute resources to run advanced image analysis pipelines, improving patient care through better fetal brain MRI reconstruction and analysis.

Rook: Intro and Deep Dive with Ceph Storage - T. Nielsen, A. Clewett, B. Gardner, S. Rai

Rook is an open-source operator that brings Ceph storage into Kubernetes, allowing users to manage storage as a Kubernetes application. The presentation covers Rook's features, deployment considerations, maintenance best practices, community involvement, and upcoming developments, as well as a deep dive into using Rook and Ceph for application-level disaster recovery.

Strimzi: Data Streaming on Kubernetes with Apache Kafka - Jakub Scholz & Yaodong Yang

This talk provides an overview of the Strimzi project, which focuses on running Apache Kafka on Kubernetes. The presentation covers the latest features and updates, including the migration from ZooKeeper to KRaft (Kafka Raft), support for tiered storage, and plans for future enhancements like improved certificate management and integration with Gateway API.

Effective Data Platforming with Open Source Tools For Faster Insights - Priyanka J. Naik

The presentation covers the evolution of data engineering, the need for efficient and robust data platform architectures, and the use of open-source tools from the CNCF ecosystem to build a scalable, modular, and secure data platform. The speaker discusses the key architectural principles, stakeholder requirements, and the step-by-step approach to implementing a data platform using tools like Kafka, Flink, and Strimzi.

Strimzi: Strimzi and the Future of Apache Kafka on Kubernetes | Project Lightning Talk

The video discusses Strimzi, an incubating project that focuses on running Apache Kafka on Kubernetes using the operator pattern. It highlights the project's efforts to manage the entire application lifecycle, including installation, upgrades, reconfiguration, monitoring, and security, while integrating with other cloud-native projects.

TAG Storage: CNCF Storage TAG and the Cloud Native Storage Landscape | Project Lightning Talk

The video discusses the CNCF Storage Technical Advisory Group (TAG), which helps scale the CNCF Technical Oversight Committee (TOC) by providing subject matter expertise, engaging with the user community, and producing educational materials on the cloud-native storage landscape. The TAG Storage group is open to all interested parties, including project maintainers, builders, and users, and holds regular open calls to discuss the latest developments and challenges in the cloud-native storage ecosystem.

Object, Block, or File Storage? Choosing the Right Cloud Storage to Integr... M. Becker, T. McDonald

This presentation explores the considerations and tradeoffs involved in choosing the right cloud storage solution for your applications, whether it be object, block, or file storage. The speakers provide guidance on leveraging the container storage interface (CSI) and community resources to make informed decisions that optimize performance, cost, and reliability for your persistent workloads.

Database

The Hard Truth About GitOps and Database Rollbacks - Rotem Tamir, Ariga

This talk discusses the challenges of implementing database rollbacks in a GitOps workflow, and presents the Atlas operator as a solution that leverages Kubernetes custom resources and controllers to manage schema changes and provide a robust rollback mechanism. The key takeaways are the importance of rollbacks for maintaining low mean time to recovery (MTTR), the limitations of traditional migration tools, and how the operator pattern can address the incompatibility between stateful resources and the GitOps philosophy.

Developer Experience

Keynote: Kubernetes Family Feud: A Decade of Architecture and Evolution - Hosted by Tim Hockin

In this keynote, Tim Hockin hosts a Kubernetes Family Feud game, where two teams of Kubernetes experts compete to guess the community's top answers to various Kubernetes-related questions. The game showcases the evolution and architecture of Kubernetes over the past decade, highlighting the challenges and advancements in the ecosystem.

Keynote: Honoring the Past to Forge Ahead - Gail Frederick, CTO and SVP, Heroku

The keynote speech highlights the evolution of cloud computing and the need to update the 12-Factor App Manifesto, a set of best practices for cloud-native application development. The speaker invites the audience to participate in the open-sourcing of the 12-Factor App project, aiming to refine and modernize these principles to address the changing technology landscape and the rise of new development practices.

Keynote: Above the Clouds: Mountainous Achievements with End Users - Taylor Dolezal

The video discusses the challenges of navigating the complex cloud-native landscape and the efforts of the CNCF End User Technical Advisory Board to address them. The board is working on publishing reference architectures, establishing project health metrics, and facilitating collaboration between end-users and project maintainers to strengthen the connection between the end-user community and the CNCF ecosystem.

Database DevOps: CD for Stateful Applications - Stephen Atwell & Christopher Crow

This talk demonstrates how to leverage continuous delivery for stateful applications, such as databases, by automating database schema changes, data migration, and application deployment within a Kubernetes-based CI/CD pipeline. The presenters showcase techniques for ensuring zero downtime, enabling backwards compatibility, and providing robust rollback capabilities to manage the complexity of database-driven applications.

Keynote: Cloud Native’s Next Decade: Stable, Secure, and...Ready for Disruption? - Nikhita Raghunath

The talk explores the next decade of cloud-native technologies, highlighting the need to address emerging challenges in security, AI, and quantum computing. It emphasizes the importance of proactive preparation and collaboration within the industry to ensure the stability, security, and continued disruption of cloud-native ecosystems.

Engineering a Kubernetes Operator: Lessons Learned from Versions 1 to 5 - Andrew L'Ecuyer

This talk explores the lessons the speaker learned in building five versions of a Kubernetes operator for PostgreSQL, focusing on key areas such as high availability, upgrades, and disaster recovery. The presentation highlights how the operator's architecture evolved to leverage existing solutions within the PostgreSQL and Kubernetes ecosystems, ultimately enabling robust management of PostgreSQL deployments at scale.

From Chaos to Harmony, Transforming ML Engineering: A Kubernetes Adoption Journey - P.N. Kejser

The presentation outlines the journey of JP/Politiken Media Group in adopting Kubernetes to build a robust machine learning engineering platform. By leveraging Kubernetes, the team was able to empower their ML specialists, streamline their development process, and achieve greater flexibility, scalability, and security in their infrastructure.

Evolving Reddit’s Infrastructure via Principled Platform Abstractions - Karan Thukral & Harvey Xia

The talk discusses how Reddit's infrastructure team evolved their platform by leveraging Kubernetes-based abstractions and automation to address the challenges of rapid growth and global expansion. The team developed a principled approach to platform design, including custom resource definitions, multi-cluster management, and an open-source SDK called Achilles, which enabled their infrastructure engineers to build reliable and scalable automation.

Creating Paved Paths for Platform Engineers - Panel

This panel discusses the challenges platform engineering teams face in navigating the expansive CNCF landscape and building effective reference architectures. The panelists emphasize the importance of focusing on business value, treating platform development as a product, and enabling a culture of contribution and self-service rather than mandating usage.

End User TAB Town Hall - Moderated by Taylor Dolezal

This video discusses the End User TAB (Technical Advisory Board) of the Cloud Native Computing Foundation (CNCF) and its initiatives to better engage with end users. The TAB aims to facilitate knowledge sharing, provide project feedback, and assess project health to support the adoption and success of cloud-native technologies by end-user organizations.

Scratching the Surface: Simulating K8s in MIT Scratch - Mitch Connors, Microsoft & Jude Connors

This talk presents a creative approach to teaching the fundamental concepts of Kubernetes using the MIT Scratch programming language and its derivative, Snap! from Berkeley. The presenters demonstrate how to simulate the scheduler, replicator, and deployment controller of Kubernetes in a visual and interactive way, providing a new perspective for both newcomers and experienced community members.

You're Overpaying for CI - Kyle Penfound, Dagger

The talk discusses how developers are overpaying for continuous integration (CI) services and proposes a solution to run CI pipelines directly on developer machines, reducing costs and improving developer productivity. The speaker presents a three-phase approach to transition from the current CI setup to a more efficient and portable pipeline that can be run locally and in the CI environment with the same guarantees.

Bring the Joy Back to Deployments! - Murriel McCabe, Google Cloud & Elizabeth Ponce, Airbnb

This talk provides a comprehensive overview of the various tools and approaches for streamlining the deployment of containerized applications to Kubernetes. The presenters cover fundamental concepts of DevOps and CI/CD, and popular open-source tools like Skaffold, Kustomize, Helm, Jenkins, Tekton, Argo, and Jenkins X, highlighting their unique features and considerations for choosing the right tool for your deployment needs.

Kubernetes SIG Storage: Intro & Deep Dive - Michelle Au, Xing Yang & Hemant Kumar

The video provides an in-depth overview of Kubernetes SIG Storage, covering its role, recent feature releases, and upcoming developments. It highlights the group's efforts to enhance storage capabilities, improve performance, and address long-standing issues, with a focus on features such as volume reconstruction, SELinux relabeling, and volume expansion.

SIG-Apps: Powering Applications with High-Volume Data and APIs - Maciej Szulik & Janet Kuo

This talk provides a comprehensive overview of the history and evolution of the Kubernetes SIG-Apps group, which has been instrumental in developing and maintaining the key application-related APIs and controllers within the Kubernetes ecosystem over the past decade. The speakers highlight the group's significant contributions, including the development of critical workload controllers, the introduction of custom resource definitions, and the ongoing efforts to improve performance, scalability, and usability of Kubernetes for a wide range of application workloads.

The Path to Helm 4 - Matt Farina, SUSE & Andrew Block, Red Hat

The presenters discuss the evolution of Helm, the popular Kubernetes package manager, from its beginnings in 2015 to the upcoming release of Helm 4. They outline the key drivers for Helm 4, including technical debt, API versioning, and the need to leverage new ecosystem capabilities, while ensuring backwards compatibility and a smooth migration path for users.

Navigate Cross SIG Collaborations with SIG Docs - Panel

The panel discusses the structure of the Kubernetes community, the role of SIG Docs in maintaining and updating the Kubernetes documentation, and the collaboration between SIG Docs and other SIGs, such as SIG Release and SIG Security, to ensure the documentation is comprehensive and up-to-date. The panel also provides information on how to get involved in contributing to the Kubernetes documentation.

Applications, Platforms, and Infrastructure Oh My! What Is the TAG App Delivery Doing to Support You

The video discusses the various working groups under the TAG App Delivery, including Application Development, Platforms, and Infrastructure Lifecycle, and how they collaborate to address the challenges faced by developers, platform engineers, and infrastructure teams in the cloud-native ecosystem. The panelists highlight the importance of cross-group collaboration, the development of standards and best practices, and the potential for new working groups to emerge as needed within the TAG App Delivery.

Artifact Hub: Discover, Analyze, and Share Cloud Native Artifacts - Matt Farina, SUSE

Artifact Hub is a centralized platform that allows users to discover, analyze, and share a wide range of cloud-native artifacts, including Helm charts, container images, policies, and more. The presentation covers the key features of Artifact Hub, such as search and discovery, artifact analysis, and integration with other tools, highlighting how it addresses the challenges of finding and evaluating distributed cloud-native resources.

Guiding Kubernetes: The Steering Committee's Role in Project Evolution - Maciej Szulik

The presentation provides an overview of the Kubernetes Steering Committee's role in guiding the project's evolution. It highlights the diverse community, technical challenges, non-technical challenges, and the importance of sustainability in ensuring the project's long-term success.

XRegistry - Looking Beyond CloudEvents - Calum Murray, University of Toronto

The presentation discusses the challenges of event-driven systems and how CloudEvents and XRegistry address them. It explains the key concepts of CloudEvents, CloudEvents SQL, and the extensible metadata management capabilities of XRegistry.

How to Get Started Contributing in the CNCF - Destiny O'Connor & Riaan Kleinhans

This talk provides a comprehensive overview of how to get started contributing to open-source projects within the Cloud Native Computing Foundation (CNCF) ecosystem. The presenters, Destiny O'Connor and Riaan Kleinhans, discuss the benefits of open-source contribution, the various CNCF community groups, and practical steps for new contributors to find and engage with projects, including GitHub, project boards, and pull requests.

What's New with Kubectl and Kustomize … and How You Can Help! - Eddie Zaneski & Arda Guclu

This talk provides an overview of recent developments in the kubectl and Kustomize tools, including improvements to API discovery, interactive delete functionality, and the introduction of a new kubectl user preferences file (kuberc). The presenters also discuss future plans, such as support for multiple kubeconfig files and enhancements to server-side apply, and encourage audience participation and contributions to the Kubernetes project.

How to Expand Your IDP: The New Building Blocks of Backstage- Ben Lambert & Patrik Oldsberg, Spotify

The talk covers the latest updates and developments in the Backstage project, including new project areas, SIGs, and community initiatives. The presenters also discuss the goals and strategies for building plugins, the new frontend system, and the backend service interfaces to enhance the plugin builder experience.

Extending the Gateway API: The Power and Challenges of Policies - Kate Osborn, NGINX

The talk discusses the extensibility features of the Gateway API, particularly the policy attachment mechanism, which allows for consistent and portable customization of routing behavior across different Gateway API implementations. It highlights the challenges of complexity and discoverability in the policy attachment system and the efforts of the Gateway API community to address them.

Taming Your Application’s Environments - Marcos Lilljedahl & Mauricio "Salaboy" Salatino

This presentation explores how to tame application environments by abstracting infrastructure with tools like Dapr and building programmable CI/CD pipelines with Dagger. The speakers demonstrate how to create a unified experience for developers to deploy and manage their applications across different environments, from local development to production.

Tutorial: A Mad Scientist's Guide to Automating CNI with Generative AI - Doug Smith, Red Hat, Inc

The presenter, Doug Smith from Red Hat, gives a tutorial on automating the Container Network Interface (CNI) using generative AI. He showcases a science experiment where he uses large language models to generate CNI configurations, which are then tested and validated in a Kubernetes environment using tools like kind, multus-cni, and whereabouts.

Beyond 'Can You Mentor Me?' - Crafting the Contribution Ladder - Panel

This panel discussion explores strategies for retaining open-source contributors and fostering their growth within the community. The speakers share their personal experiences with mentorship programs, offer guidance on selecting the right project to contribute to, and discuss common challenges faced by contributors as they progress through the contributor ladder.

Create & Distribute a Plugin for Kubernetes (Kubectl) in Few Minutes? Easy! 🙂 - A. Vache, G. Acas

This talk discusses the creation and distribution of plugins for kubectl, the Kubernetes command-line interface. The presenters demonstrate how to easily create a plugin, package it using the Krew plugin manager, and share it with the community or within a private organization.

Gateway API: What's New, What's Next? - C. Kim, N. Young, M. Lavacca, G. Cassolato

The Gateway API maintainers provide an update on the latest developments, including the new release cycle, policy enhancements, and user experience improvements. They also discuss upcoming features and seek feedback from the community to better address its needs.

Shifting Gears: Leveraging CNCF Tools to Streamline Operations at Toyota C... B. Phillips, R. Heckel

This talk discusses how Toyota Connected North America (TCNA) leveraged CNCF tools to streamline their operations and address organizational and technical challenges. The team adopted a unified platform called Maestro, which utilizes Backstage and Argo to enable collaboration, standardization, and self-service for their developers across multiple autonomous teams.

Pick My Project! Lessons Learned from Interviewing 20+ End Users for Clo... S. Akintayo, B. Mulligan

This talk discusses the lessons learned from interviewing over 20 end users for cloud-native case studies, focusing on how projects can drive adoption by solving specific business challenges and setting up users for future needs. The speakers share insights on creating compelling case studies and effectively promoting them to showcase the real-world impact of the project.

The Maintainer Monologues - Sarah Christoff, Jason Hall, Scott Rigby, Karen Chu & Ryan Nowak

The panel discusses their experiences as maintainers of open-source projects, including navigating project dynamics, avoiding burnout, and building relationships with other projects. The panelists share practical tips and insights on fostering healthy open-source communities and sustaining long-term involvement in these projects.

Tutorial: Simplify and Optimize Your YAML with YAMLScript - Ingy döt Net, YAML LLC

The video discusses the development of YAMLScript, a programming language built on top of YAML that aims to simplify and optimize YAML usage, particularly in the context of Kubernetes and Helm charts. The presenter highlights YAMLScript's features, such as its ability to integrate with existing YAML-based tools and its potential to improve the maintainability and readability of YAML configurations.

Gamifying Cloud Native: How to Design and Build an Educational Game for Your... C. Murray, Z. Husain

This talk discusses how to design and build an educational game to explain abstract cloud-native concepts, particularly in the context of Knative Eventing. The speakers share their process of using co-design and information design techniques, such as analogies and visual representations, to create a game that helps newcomers understand the architecture and components of Knative Eventing.

Platform Engineering for Software Developers and Architects - Daniel Bryant, Syntasso

The presentation explores the evolution of platform engineering, highlighting the symbiotic relationship between platform architecture and software architecture. It emphasizes the importance of a product-focused approach, effective APIs, abstractions, and automation in building platforms that enable developers to deliver features faster with reduced coordination.

This Platform Goes to 11: Boost Developer Productivity with Lessons from Salesforce - Joe Kutner

The speaker discusses how Salesforce's internal developer platform, Hyperforce, was rebuilt to improve developer productivity by adopting principles from Heroku, such as unified interfaces, extension points, meeting developers where they are, and ephemeralization. The talk also covers how Salesforce measures developer productivity using the SPACE framework and the 12-Factor App as a guide for cloud-native development.

Navigating the Future: Exploring the Latest in Kubernetes Dashboard Dev... M. Maciaszczyk, S. Florek

The presentation provides an overview of the recent developments in the Kubernetes Dashboard project, including the new architecture, the Standalone API, the resource list cache, and various user experience enhancements. The speakers also outline the project's future roadmap, which includes updates to the Angular framework, support for the Gateway API, and integration of additional metrics providers.

Crossplane Intro and Deep Dive - The Cloud Native Control Plane Fram... J. Watts, M. Anderson-Trocme

Crossplane is a cloud-native control plane that helps organizations provision and manage infrastructure resources across multiple cloud providers and on-premises environments. The presentation covers the basics of Crossplane, including how it extends the Kubernetes API to represent infrastructure resources, and then delves into more advanced topics such as building higher-level abstractions, using functions for composing resources, and improvements to the package management system.

What's New in Operator Framework?! - B. Palmer, R. Gottipati, L. Mohanty, A. Meszaros

This talk provides an overview of the upcoming Operator Framework version 1.0, highlighting key design decisions such as simplifying the API, improving security, and making the installation and upgrade process more declarative and predictable. The presenters also discuss plans for future enhancements, including support for additional packaging formats and improved tooling for operator authors and cluster administrators.

Ten Years of gRPC: Looking Back and Looking Forward - Kevin Nilson & Israel Shapiro

This talk provides a comprehensive overview of the past, present, and future of gRPC, a popular open-source remote procedure call (RPC) framework. It highlights the rapid growth and adoption of gRPC across various industries, the ongoing efforts to improve documentation and community engagement, and the exciting developments in language support, observability, and governance that are shaping the future of this powerful technology.

The Missing Talk About API Versioning & Evolution in Your Developer Pl... S. Schimanski, S. Urbaniak

This talk provides a comprehensive overview of the challenges and best practices for managing API versioning and evolution in Kubernetes-based developer platforms. It covers the intricacies of Kubernetes' versioning model, the importance of maintaining compatibility, and various patterns and tools to navigate the complexities of evolving APIs without breaking existing users.

With Great Flexibility Comes Great Complexity: Inspect Your Gateway API... M. Lavacca, G. Ghildiyal

The presentation explores the complexities and challenges of working with the Kubernetes Gateway API, a powerful and flexible solution that supersedes the Ingress API. The speakers introduce gwctl, a command-line tool designed to simplify the management and inspection of Gateway API resources, giving users the tools and knowledge to navigate the Gateway API ecosystem effectively.

Nothing but NATS - Going Beyond Cloud Native - Byron Ruth & Kevin Hoffman, Synadia

NATS, a simple and scalable messaging system, has evolved beyond a traditional message bus to provide a foundation for building modern, cloud-native applications. The presentation introduces Nex, a NATS-based execution engine that enables developers to deploy and manage applications seamlessly across cloud and edge environments, leveraging NATS' inherent connectivity and communication capabilities.

Welcome & Introduction: A Hitchhiker's Guide to the CNCF Landscape - Katherine Druckman, Lori Lorusso

This talk provides a comprehensive introduction to the CNCF landscape, highlighting the importance of navigating the vast ecosystem of cloud-native projects beyond just Kubernetes. The speakers offer practical tips and strategies for exploring the CNCF landscape, evaluating project health, and finding hidden gems that can solve specific problems.

Vitess: Arewefastyet: Benchmarking Vitess and Mentorship Stories | Project Lightning Talk

This talk provides an overview of the Vitess project, a cloud-native database system built around MySQL, and its benchmarking tool, arewefastyet. It also discusses the mentorship experience of a software engineer who contributed to the revamp of the arewefastyet website as part of the LFX mentorship program.

gRPC: The gRPC "Standard Library" | Project Lightning Talk

This talk provides an overview of the gRPC 'standard library', a collection of standardized RPC services with pre-built implementations that can be easily integrated into gRPC-based applications. The services discussed include health checking, reflection, channel debugging, and client status discovery, all of which enhance the functionality and observability of gRPC-based systems.

Crossplane: The Many Layers of Crossplane - A Lightning Tour | Project Lightning Talk

Crossplane is a control plane framework that allows developers to build platforms based on Kubernetes and integrate with various cloud providers, source control management tools, and observability tools. The talk highlights the benefits of using Crossplane, such as faster deployments, maintained security and cost standards, and simplified continuous deployment, as well as the exciting new features and community growth around Crossplane.

SlimToolkit: Improving DX with Containers - Making it Easy to Understand, Optimize, and Debug You...

SlimToolkit is a tool that helps developers create minimal, production-ready container images by automating the process of discovering and including only the necessary components. The tool also provides a unique debugging capability for these minimal containers, allowing developers to easily understand, optimize, and debug their container-based applications.

Kubernetes (SIG-CLI): How Do We Improve kubectl Without Breaking Users? | Project Lightning Talk

This talk discusses the challenges of improving the Kubernetes command-line interface (kubectl) without breaking existing user workflows and pipelines. The speaker highlights the team's efforts to maintain a stable API surface, introduce opt-in features, and leverage the new kuberc file to separate user preferences from cluster configuration.

Open Policy Agent (OPA): That's One Small Bump for OPA, but One Giant Leap for Policy as Code | PLT

Open Policy Agent (OPA) is a powerful tool that allows you to define and manage policy as code, providing a consistent and expressive way to enforce rules across applications and platforms. The upcoming OPA 1.0 release represents a significant milestone for the project, introducing new features and improvements that will further enhance the capabilities of policy as code.

Lightning Talk: `Kubectl Debug` Lacks an `IDE` Option. Let’s Fix That! - Mario Loriedo, Red Hat

The talk discusses the limitations of the `kubectl debug` command, specifically the lack of an IDE option for debugging applications running in Kubernetes pods. The speaker introduces a kubectl plugin called 'Kubectl Debug IDE' that aims to address this issue by providing an IDE-based debugging experience within the browser.

Lightning Talk: CloudEvents as APIs - Evan Anderson, Stacklok

The talk explores the use of CloudEvents, a CNCF project that standardizes asynchronous messaging, in various software engineering patterns such as event distribution, work queues, and audit logs. The speaker highlights how CloudEvents can help decouple systems, manage asynchronous tasks, and provide a standardized way to record and share event data.

Project Overview: A Hitchhiker's Guide to the CNCF Landscape - Katherine Druckman and Lori Lorusso

This talk provides a comprehensive overview of the CNCF (Cloud Native Computing Foundation) landscape, highlighting the importance of understanding this vast ecosystem of cloud-native projects. The presenters guide the audience through navigating the CNCF landscape, offering insights on project stages, filtering options, and opportunities for involvement, ultimately empowering attendees to make informed decisions and contribute to the growth of the cloud-native community.

Edge Computing

Experience in Designing & Implementing a Cloud Native Framework... Braulio Dumba & Gloire Rubambiza

The presenters share their experience in designing and implementing a cloud-native framework for digital agriculture, leveraging technologies like Kubernetes, KubeStellar, and the Software-Defined Farm paradigm. They discuss the challenges they faced, such as managing multiple farms, intermittent connectivity, and privacy concerns, and how they addressed these issues to enable scalable and secure data analytics across different farm environments.

GitOps

GitOps at Production Scale with Flux - Leigh Capili, Flox & Priyanka Ravi, G-Research

This talk provides an overview of the GitOps platform Flux, including its architecture, extensibility, and scaling strategies. The speakers discuss how Flux can be customized and scaled to meet the needs of large organizations, with a focus on security, performance, and multi-tenancy.

Tutorial: No Mess Rollouts with Gateway API: Leveraging Gateway API and... N. Polshakova, L. Gadban

This tutorial gives a comprehensive overview of using Argo Rollouts to implement no-mess rollouts, including integration with traffic routing providers such as Istio and the Gateway API. The presentation demonstrates how to leverage Argo Rollouts' advanced deployment strategies, such as canary and blue-green, as well as automatic rollback based on custom metrics analysis.

Mastering ApplicationSet: Advanced Argo CD Automation - Alexander Matyushentsev, Akuity

This talk explores ApplicationSet, an advanced Argo CD automation feature that provides a powerful and flexible way to manage large numbers of Argo CD applications. The speaker discusses the challenges of managing thousands of applications, the various solutions that have been developed, and the strengths and trade-offs of the ApplicationSet approach, including its declarative nature, support for complex use cases, and the need for improved tooling and troubleshooting capabilities.

Flux: What's Flux and What's New? | Project Lightning Talk

The video discusses Flux, a project that enables GitOps and progressive delivery at scale. It highlights Flux's features, including its security, scalability, and integration with various platforms, as well as its sub-project Flagger, which implements advanced deployment patterns like canaries and A/B testing.

Governance

Public Technical Oversight Committee (TOC) Meeting - Moderated by Chris Aniszczyk

This video provides an overview of the Technical Oversight Committee (TOC) of the Cloud Native Computing Foundation (CNCF), including its role, responsibilities, and the process for projects to progress through the CNCF landscape. The discussion covers topics such as the evolution of the CNCF Technical Advisory Groups (TAGs), the importance of community involvement, and the challenges of scaling the TOC's work as the number of CNCF projects continues to grow.

Keynote

Keynote: Closing Remarks

The keynote provided an overview of the exciting events and activities planned for the remainder of the conference. It highlighted the premiere of a documentary, the celebration of a new project graduate, and the introduction of a community hub aimed at supporting underrepresented attendees.

Keynote: Opening Remarks - Chris Aniszczyk, CTO, Cloud Native Computing Foundation

The keynote address by Chris Aniszczyk, CTO of the Cloud Native Computing Foundation, provides an overview of the organization's initiatives and the growth of the cloud-native community. The talk highlights new partnerships, training programs, and upcoming conferences that showcase the global expansion and continued evolution of the cloud-native ecosystem.

Keynote: Closing Remarks

The keynote speaker expresses gratitude to the community, co-chair Nikita, and all the speakers and sponsors who made the conference a success. They announce the closing of the solution showcase and look forward to welcoming the new co-chair, while encouraging attendees to continue participating in future events.

Keynote: Four Cloud Native Technology Areas to Watch For - Lin Sun & Karena Angell

The keynote explores four key areas in the cloud-native technology landscape that are expected to see significant developments: cloud-native AI, cost optimization and sustainability, multi-cluster and multi-tenancy, and project simplification. The speakers highlight the need for projects to focus on production readiness, security, observability, and abstraction of complexity to drive greater adoption and enable the next decade of cloud-native evolution.

Keynote: Welcome + Opening Remarks - Priyanka Sharma, Chris Aniszczyk, Joanna Lee & Jim Zemlin

The keynote session welcomes the audience to KubeCon, the largest open-source global conference, and discusses the growth of the cloud-native ecosystem. It also covers emerging legal challenges, such as patent trolls, that the community must collectively confront to protect the success of open-source technologies.

Keynote: Opening Remarks Day 2 - Kasper Borg Nissen, Staff Platform Engineer, Lunar

This keynote address by Kasper Borg Nissen, a Staff Platform Engineer at Lunar, provides an overview of the second day of KubeCon + CloudNativeCon. It recaps the events of the previous night, including KubeCrawl and CloudNativeFest, and introduces the first keynote speaker of the day, Taylor Dolezal, who takes the audience on a journey through the CNCF ecosystem to uncover end-user insights, progress, and the collaborative strength of the cloud-native community.

Keynote: Awards Ceremony

The keynote presents the CNCF (Cloud Native Computing Foundation) Awards Ceremony, highlighting the contributions and achievements of various individuals and organizations within the CNCF community. The awards recognize top contributors, maintainers, documentarians, and those who have made significant behind-the-scenes efforts, as well as a new Lifetime Achievement Award presented to Tim Hockin for his long-standing and impactful work in the community.

Day 1 - KubeCon + CloudNativeCon North America Highlights

The KubeCon + CloudNativeCon North America conference in Salt Lake City kicked off with a focus on leveraging cloud-native technology to drive advancements in the AI industry, defend against patent trolls, and ensure long-term security. The keynotes set the tone for the event, emphasizing the importance of embracing individuality and bringing unique perspectives to the community.

Day 0 - KubeCon + CloudNativeCon North America

The video content provides a glimpse into the opening day of KubeCon + CloudNativeCon North America 2024, where the excitement and anticipation for the upcoming week's events are palpable. The video sets the stage for the conference, inviting attendees to dive right into the collocated events and explore the cutting-edge technologies and innovations in the cloud-native ecosystem.

Keynote: Closing Remarks

The keynote speaker expressed deep gratitude towards the lineup of keynote speakers, sponsors, and the program committee for their contributions in making the first day of KubeCon + CloudNativeCon a success. The speaker also shared important announcements, including the evening's activities and a request for attendees to provide feedback on the sessions, emphasizing the organizers' commitment to delivering an exceptional event experience.

Day 2 - KubeCon + CloudNativeCon North America Highlights

The KubeCon + CloudNativeCon North America conference has been a transformative experience, offering attendees the opportunity to engage with industry leaders, connect with maintainers, and contribute to open-source projects. The connections and insights gained at this event have the potential to significantly elevate one's career and deepen their involvement within the vibrant cloud-native community.

Day 0 - KubeCon + CloudNativeCon North America Highlights

The video highlights the excitement and anticipation surrounding the KubeCon + CloudNativeCon North America 2024 conference. The opening day sets the stage for the rest of the week, with a focus on collocated events that promise to engage and inspire the cloud-native community.

Keynote: Honoring the Past to Forge Ahead - Gail Frederick, CTO and SVP, Heroku

The keynote address highlights the evolution of the 12-Factor App Manifesto, a set of best practices for cloud-native application development, and the decision to open-source the project to involve the broader community in its modernization. The speaker invites attendees to contribute to the project, recognizing the need to update the principles to reflect the changing technology landscape and the rise of practices like telemetry, secrets management, and multi-application deployment.

Project Lightning Talk: Opening + Welcome - Jorge Castro, CNCF

The video presents an introduction to the Project Lightning Talks at the KubeCon conference, highlighting the chill Ops atmosphere, the schedule of 7-minute talks, and the importance of engaging with the open-source maintainers and projects in the cloud-native ecosystem. The speaker encourages attendees to explore the conference, seek out maintainers, and visit the Project Pavilion to connect with the right people and have high-level conversations about the technology.

Keynote: Graduated Project Updates

This video features updates from various CNCF graduated projects, including Istio, Harbor, cert-manager, Falco, Sysdig, SPIRE, Argo, Flux, Linkerd, Envoy, Helm, Jaeger, and Open Policy Agent. The maintainers of these projects showcase the latest exciting features and improvements, highlighting the continued evolution and growth of the cloud-native ecosystem.

Keynote: A Decade of Kubernetes and Cloud Native – Are We There Yet? - Panel

This panel discussion reflects on the remarkable journey of Kubernetes and the cloud-native ecosystem over the past decade. The speakers highlight key milestones, challenges, and opportunities for the future, emphasizing the importance of simplicity, security, and adaptability as Kubernetes continues to evolve and support diverse workloads.

Kubernetes

Kubernetes SIG Architecture Intro and Updates - John Belamaric, Google & David Eads, Red Hat

This video discusses the role of the Kubernetes SIG Architecture group, which is responsible for maintaining the design principles, architectural guidelines, and development policies for the Kubernetes project. It covers topics such as API review, conformance testing, code organization, feature enhancement process, and production readiness reviews.

Networking

Solving the Kubernetes Networking API Rubik's Cube - D. Smith, S. Seetharaman, S. Utt, L. Lieberman

This talk provides an overview of the Kubernetes networking ecosystem, including the SIG Network working groups, the Container Network Interface (CNI), and emerging technologies like Dynamic Resource Allocation (DRA) and multi-networking. It highlights how these developments are driven by the growing demands of AI/ML workloads and the need for more sophisticated networking capabilities in Kubernetes.

SIG Network Intro & Updates - Daman Arora, Shaun Crampton, Nadia Pinaeva, Dan Winship, Antonio Ojea,

This video provides an overview of the latest updates and developments in the Kubernetes SIG Network, including the introduction of the Admin Network Policy and Baseline Admin Network Policy APIs, improvements to the kube-proxy service proxy, and the Policy Assistant tool for analyzing network policies. The presentation covers the motivations, implementations, and future plans for these features, highlighting the ongoing collaboration within the SIG Network community to address the evolving networking needs of Kubernetes users.

Seeing Double? Implementing Multicast with eBPF and Cilium - Louis DeLosSantos, Isovalent at Cisco

The talk discusses the implementation of multicast functionality using eBPF and Cilium in a Kubernetes cluster. The speaker covers the fundamentals of multicast, the challenges of implementing it in the eBPF data path, and the techniques used to achieve local and remote multicast delivery within the cluster.

CNI Updates and Direction! - Michael Zappa, Microsoft & Lionel Jouin, Ericsson

The presentation provides an overview of the Container Network Interface (CNI) updates and future direction, including the introduction of new maintainers, the addition of new verbs (status and GC), and the exploration of potential improvements such as the adoption of gRPC and better integration with Kubernetes. The presenters also discuss the challenges and limitations of the current CNI model and the plans to bring CNI and the Network Plumbing Working Group under SIG Network as an official sub-project to improve organization and collaboration.

Observability

Mastering OpenTelemetry Collector Configuration - Steve Flanders, Cisco

The speaker provides a comprehensive overview of the OpenTelemetry Collector, its key components, and how to configure it effectively. He emphasizes the importance of understanding the project's structure, validating configurations, and leveraging the available documentation to ensure a successful deployment.

Now You See Me: Tame MTTR with Real-Time Anomaly Detection - Kruthika Prasanna Simha, Raj Bhensadadia

The talk explores how real-time anomaly detection, extended to provide root cause analysis, can improve mean time to respond (MTTR). The presenters discuss various statistical and machine learning models for anomaly detection, as well as open-source tools and frameworks that can be used to build an end-to-end anomaly detection pipeline.

Low-Overhead, Zero-Instrumentation, Continuous Profiling for OpenTelemetry - Christos Kalkanis

This talk presents a low-overhead, zero-instrumentation continuous profiling solution for OpenTelemetry, built using eBPF technology. The profiler supports a wide range of programming languages, including both native and high-level runtimes, and provides seamless integration with the OpenTelemetry observability ecosystem.

Cognitive and Self-Adaptive System for Effective Distributed-Tracing in Appl... M. Tandon, A. Gusain

The talk presents a cognitive and self-adaptive system for effective distributed tracing in modern distributed systems. The proposed solution utilizes a tail-based sampling approach with a density-based clustering algorithm to intelligently identify and retain the most valuable traces, leading to improved observability and reduced storage costs.

Lessons Learned Adopting OpenTelemetry at Scale - Alex Arnell, Heroku / Salesforce

The talk discusses the challenges and lessons learned in adopting OpenTelemetry at scale within a large organization like Heroku, which processes 60 billion requests per day across 844 public repositories and multiple programming languages. The speaker shares insights on driving adoption through principles of influence, navigating semantic conventions, handling histograms, and leveraging tools like Terraform to manage the migration.

Multi-Zone Clusters Inside and Out - Tom Dean & Phil Henderson, Buoyant

This talk discusses the use of topology-aware routing in Kubernetes clusters to reduce cross-availability zone traffic and associated costs. The presenters highlight the importance of observability and metrics, using Linkerd service mesh, to monitor and manage the behavior of topology-aware routing under various failure scenarios.

One Gateway API to Rule Them All (and in the Cluster Configure Them) - Flynn, Buoyant

This talk explores the use of Gateway API to unify the management of different types of traffic in a Kubernetes cluster, including ingress, egress, and service-to-service communication. The speaker demonstrates how Linkerd, a service mesh, can leverage the Gateway API to provide a consistent approach to configuring and controlling traffic flows within the cluster.

How We Made OpenTelemetry Be Our Fitness Tracker for Your CI/CD Pipelines! - N. Woerner, A. Grabner

This talk showcases how the presenters used OpenTelemetry to gain observability into their CI/CD pipelines, enabling them to track deployment metrics, identify performance issues, and correlate pipeline data with application traces. By leveraging OpenTelemetry's flexibility and the GitLab webhook integration, they created a comprehensive solution to monitor the health of their software delivery lifecycle.

Scaling and Safeguarding the Heart of Kubernetes: Deep Dive Into etcd - Panel

The video discusses the scaling and safeguarding of the heart of Kubernetes, etcd. It covers the introduction of the etcd project, the mentorship program, the new operator working group, feature gates, and the compaction revision issue that was discovered and resolved.

Cilium: Connecting, Observing, and Securing Kubernetes and Beyond with eBPF - Panel

The video provides an overview of the Cilium project, a cloud-native networking, observability, and security solution built on top of eBPF. The presentation covers Cilium's growth, new features, and use cases from the perspectives of maintainers and users in the Kubernetes ecosystem.

CNCF TAG Network: Intro & Deep Dive - Lee Calcote, Layer5

The CNCF TAG Network aims to provide a vendor-neutral space for exploring innovative cloud-native networking solutions. The group discusses and promotes best practices, patterns, and standards to help the community navigate the evolving cloud-native networking ecosystem.

Fluent Bit: Better Pipelines for Observability - Eduardo Silva, Chronosphere

Fluent Bit is a high-performance, vendor-neutral, and scalable observability agent that collects, processes, and routes various types of telemetry data, including logs, metrics, and traces, to multiple destinations. The presentation covers the latest developments in Fluent Bit, including optimizations for JSON encoding, support for YAML configuration, and the introduction of experimental features like eBPF and profiling, which aim to enhance the observability capabilities of the tool.

Cortex Intro: Multi-Tenant Scalable Prometheus - Charlie Le, Apple & Daniel Blando, Amazon

This talk introduces Cortex, a multi-tenant scalable Prometheus solution, and discusses its key components, including the distributor, ingester, querier, and compactor. It also highlights recent updates and features, such as the partition compactor, which aims to improve the efficiency of long-term storage and querying.

Celebrating Prometheus 3.0: A Deep Dive with the Maintainers - Richard Hartman & Josh Abreu

This video provides a deep dive into Prometheus 3.0, with a focus on the new UI, native histograms, and the overhaul of the Remote Write specification. The maintainers, Richard Hartman and Josh Abreu, discuss the key improvements and the rationale behind the design decisions, highlighting the importance of the Prometheus monitoring system in the cloud-native ecosystem.

Observability TAG Round-up and What’s New for AI Observability - Alolita Sharma & Chris Larsen

This talk provides an overview of the Observability Technical Advisory Group (TAG) within the Cloud Native Computing Foundation (CNCF), including its structure, responsibilities, and ongoing initiatives. The speakers discuss the growth of the CNCF, the evolution of observability technologies, and the TAG's efforts to support the development and adoption of open-source observability solutions, particularly in the context of AI and machine learning applications.

OpenTelemetry Project Update - A. Sharma, J. Paixão Kröhling, T. Young, M. Mclean, D. Dyla

The OpenTelemetry project provided updates on their community growth, roadmap priorities, and upcoming certification. They highlighted the significant contributions and adoption by major organizations, as well as their focus on stabilizing client instrumentation, logs, semantic conventions, and the OpenTelemetry Collector.

Distributed Tracing with Jaeger and OpenTelemetry - Jonah Kowall, Paessler & Pavol Loffay, Red Hat

This session provides an introduction to distributed tracing with Jaeger and OpenTelemetry, highlighting the importance of understanding the complex web of microservices in modern applications. The presenters demonstrate Jaeger's features, including trace visualization, error identification, and the integration of tracing with Prometheus metrics, as well as the upcoming Jaeger v2 release that is built on top of OpenTelemetry.

Emissary-Ingress: Version 4 and the Road Ahead - Flynn, Buoyant

Emissary-Ingress, a CNCF incubating project, is a developer-centric, self-service, and opinionated API Gateway that solves the ingress problem in Kubernetes clusters. The talk covers the project's history, its transition to a community-driven project, and the upcoming roadmap, including the release of version 4 and the ongoing efforts to improve the project's documentation and testing.

Life of a Packet: Ambient Edition - John Howard, Solo.io & Keith Mattix, Microsoft

The presenters discuss the architecture and inner workings of the ambient service mesh, focusing on how traffic flows through the system, the use of ztunnel and waypoint proxies, and the benefits of this approach compared to traditional sidecar models. They provide a detailed technical overview of the components and protocols involved in the ambient mesh, highlighting the flexibility and performance advantages of the design.

Understanding How OpenTelemetry Network Uses eBPF for Network Observab... S.R. Shrivastava, J. Perry

The presentation discusses how OpenTelemetry Network uses eBPF for network observability, including the importance of network observability, the benefits of using eBPF, the architecture of OpenTelemetry Network, and the individual components that make up the system. The presenters also provide information on how to get involved with the project and demonstrate the capabilities of the system.

SIG Instrumentation Introduction and Deep Dive - H. Kang, D. Ashpole, R. Banker, D. Grisonnet

This video introduces the SIG Instrumentation group, which is responsible for observability signals in Kubernetes, including logs, metrics, and traces. The video covers the group's work on structured logging, metrics stability, and distributed tracing, as well as their plans for future improvements and how to get involved.

Watching the Watchers: How We Do Continuous Reliability at Grafana Labs - Nicole van der Hoeven

The talk explores how Grafana Labs, a leading observability company, has shifted its focus from just observability to continuous reliability. The speaker highlights Grafana's approach, which involves a resilient infrastructure design, comprehensive observability, effective recovery mechanisms, and a culture of engagement, to ensure the reliability and resilience of their systems.

The OTTL Cookbook: A Collection of Solutions to Common Problems - Tyler Helmuth & Evan Bradley

The OTTL Cookbook presentation covers the OpenTelemetry Collector, a flexible observability pipeline middleware, and how to use its domain-specific language, the OpenTelemetry Transformation Language (OTTL), to solve common problems in processing and transforming log and metric data. The presenters demonstrate several real-world examples and invite the audience to provide their own use cases for live troubleshooting and exploration of OTTL's capabilities.

From Observability to Performance - Nadia Pinaeva, Red Hat & Antonio Ojea, Google

This talk discusses the transition from observability to performance-driven metrics for services in Kubernetes. The speakers present a methodology for measuring user-facing performance metrics, such as programming latency, first packet latency, connection latency, and throughput, using conntrack events in the kernel to provide a more accurate picture of service performance.

Using OpenTelemetry for Deep Observability Within Messaging Queues - S.R. Shrivastava, E. Gupta

This talk discusses the use of OpenTelemetry for deep observability within messaging queues, such as Kafka and RabbitMQ. The presenters highlight the challenges faced in monitoring and troubleshooting messaging queues, and demonstrate how OpenTelemetry can help correlate traces, metrics, and logs to provide better insights and visibility into the system.

Unifying Observability: Correlating Metrics, Traces, and Logs with Exemplars an... K.P. Simha, C. Le

This talk introduces exemplars, a feature that allows correlating metrics, traces, and logs to provide deeper insights for troubleshooting and performance analysis. The speakers demonstrate how exemplars can be enabled using OpenTelemetry, configured in Prometheus, and visualized in Grafana to streamline root cause analysis and enhance overall observability.

Unlocking Cost Savings & New Possibilities: Your Guide to Prometheus Remote W... C. Styan, B. Płotka

The presentation discusses the new version 2.0 of the Prometheus Remote Write protocol, which aims to improve efficiency and add new features while maintaining the protocol's stateless and simple design. The key improvements include better handling of partial writes, support for new Prometheus features like native histograms and metadata, and a reduction in payload size through string interning.

Towards Zero Change Incidents: Intuit's Strategy for Implementing AI... A. Basu, S. Balasubramanian

The talk presents Intuit's strategy for implementing an AI-driven progressive delivery system using open-source tools to detect and prevent change-induced incidents in production. The key aspects covered include multivariate anomaly detection, real-time streaming data processing, and integrating the anomaly scores with Argo Rollouts for automated canary deployments.

Optimizing LLM Performance in Kubernetes with OpenTelemetry - Ashok Chandrasekar & Liudmila Molkova

The talk discusses techniques for optimizing the performance of large language models (LLMs) running on Kubernetes, using OpenTelemetry for observability. It covers client-side and server-side metrics, as well as strategies for intelligent auto-scaling to handle varying workloads effectively.

Linkerd Update: Ingress, Egress, IPv6, Enhanced Multicluster, Rust, and More - William Morgan

The talk provides an update on the Linkerd open-source project, covering new features such as egress metrics, egress controls, rate limiting, and federated services. The speaker also discusses Linkerd's design principles, the use of Rust for its micro-proxies, and the project's sustainable business model.

Can Your Kubernetes Network Handle the Heat? Building Resilience wit... L. Lieberman, S. Seetharaman

This talk explores how AI can enhance existing chaos testing tools to improve the resilience of Kubernetes networks. The presenters demonstrate using AI-generated scenarios to test unexpected failure points and provide step-by-step instructions to inject controlled chaos into Kubernetes clusters.

Choose Your Own Adventure: The Observability Odyssey - Whitney Lee & Viktor Farcic

This talk explores the journey of observability in a production environment, covering the use of Prometheus and Pixie for metrics, Jaeger and Zipkin for tracing, and Argo Rollouts and Flagger for progressive delivery. The presenters engage the audience in an interactive session, allowing them to vote on the tools to be implemented in a live demo environment.

Tutorial: Live with Gateway API V1.2 - Flynn, Buoyant & Mike Morris, Microsoft

This tutorial provides an introduction to the Gateway API, a new API for Ingress traffic management in Kubernetes. The presenters demonstrate how Gateway API can be used to configure routing and traffic splitting for both HTTP and gRPC services, showcasing its flexibility and ease of use across different service mesh implementations.

Tutorial: OpenTelemetry Hands-on - Automatic and Manual Instrumentatio... T. Angerstein, T. Jernigan

This tutorial provides a hands-on introduction to OpenTelemetry, covering both automatic and manual instrumentation techniques for Java and Python applications. The presenters demonstrate how to set up observability using OpenTelemetry, including tracing with Jaeger and metrics with Prometheus, and show how to enrich the collected data with custom information to improve visibility into application behavior.

Shopify’s Open Source Approach to Network Monitoring with eBPF, Vector... S. Rabenhorst, M. Franklin

Shopify has developed an open-source approach to network monitoring using eBPF, Vector, and ClickHouse to capture and analyze network events, DNS queries, and TCP queue metrics across their Kubernetes infrastructure. The presented system provides a customized Grafana-based visualization tool that enables quick troubleshooting and insights into network performance within Shopify's environment.

OpenSearch: Navigating Innovation and Community Collabor... A. Bumstead, A. Jadhav, P. Priyadarshini

This video provides an overview of the OpenSearch project, highlighting its innovations in search performance, benchmarking, and observability features. The presenters discuss how OpenSearch enables data exploration, query processing, and visualization capabilities for a range of use cases, including search analytics, observability, and generative AI applications.

Achieving and Maintaining a Healthy CI with Zero Test Flakes - A. Ojea, M. Shepardson & B. Elder

The presenters discuss strategies for achieving and maintaining a healthy Continuous Integration (CI) pipeline with zero test flakes. They cover topics such as collaboration within the Kubernetes project, test design, infrastructure monitoring, and tooling to identify and resolve flaky tests.

Thanos: Intro and Updates - Ben Ye, Amazon Web Services

This talk provides an introduction to the Thanos project, a set of components that can be used with Prometheus to provide a scalable and highly available monitoring solution. The speaker covers the various components of Thanos, including the Sidecar, Querier, Ruler, and Receiver, as well as common pitfalls and updates to the project.

CoreDNS Plugins: A Deep Dive - John Belamaric, Google & Yong Tang, Ivanti

CoreDNS is a flexible, plugin-based DNS server that has become a key component of Kubernetes clusters. This talk provides a deep dive into the architecture of CoreDNS, its plugin ecosystem, and demonstrates how developers can create their own custom plugins to extend its functionality.

Live with the Experts! CNCF Ambassadors & the Best of KubeCon

This video recap of KubeCon 2024 in Salt Lake City features CNCF Ambassadors discussing the major trends and announcements from the conference, including the growth of AI/ML initiatives, the maturation of the cloud native ecosystem, and the expanding role of platform engineering. The panel also covers the importance of the CNCF community and the upcoming global KubeCon events.

Jaeger: Distributed Tracing with Jaeger and OpenTelemetry | Project Lightning Talk

Jaeger is a graduated CNCF project focused on observability and distributed tracing, which helps developers debug and optimize their distributed applications. The latest version of Jaeger, version 2, is now completely based on the OpenTelemetry collector, providing better sampling, support for Kafka, and new backend storage options.

Prometheus: Celebrating Prometheus 3.0: All You Need To Know! | Project Lightning Talk

The video discusses the release of Prometheus 3.0, highlighting key new features such as a refreshed UI, an improved metric streaming protocol, and support for OpenTelemetry. The presentation covers performance improvements, breaking changes, and future plans for the project, inviting the audience to engage with the community and attend upcoming sessions.

OpenTelemetry: The OpenTelemetry Hero’s Journey: Working with Open Source Observability | PLT

The video discusses the challenges and opportunities of adopting OpenTelemetry, an open-source observability project. The speaker emphasizes the importance of OpenTelemetry becoming a ubiquitous standard, like USB, where users can easily plug in and leverage its capabilities without extensive knowledge of the underlying implementation.

OpenTelemetry: OpenTelemetry in Five Minutes | Project Lightning Talk

OpenTelemetry is a vendor-neutral, implementation-neutral observability framework that enables effective observability by making high-quality, portable telemetry ubiquitous. The project aims to provide a unified, interrelated braid of telemetry data that is self-referential, structured, and portable, allowing developers to instrument their applications and systems without worrying about the underlying implementation or vendor-specific requirements.

OpenTelemetry: The Future of Network Monitoring eBPF for Low-Level Insights | Project Lightning Talk

OpenTelemetry's use of eBPF for low-level network monitoring provides a powerful tool for observing and analyzing network traffic. The presentation covers the key components of the system, including the Kernel Collector, Kubernetes Collector, and Cloud Collector, and demonstrates how the data is aggregated and enriched in the Reducer component to provide comprehensive network insights.

Inspektor Gadget: eBPF for Observability, Made Easy and Approachable | Project Lightning Talk

Inspektor Gadget is a set of tools and a framework that makes eBPF for observability easy and approachable. It provides a package manager, orchestrator, and various pre-built 'gadgets' that simplify the process of collecting and exporting data from Kubernetes clusters and Linux hosts using eBPF.

Knative: Eventing Advances | Project Lightning Talk

Knative Eventing is a platform that enables the development of event-driven architectures on Kubernetes. This talk discusses three advanced Eventing features: security, JobSink, and advanced event filtering, which address key aspects of modern cloud-native event architectures.

Istio: Why Choose Istio in 2025 | Project Lightning Talk

Istio is a secure networking fabric for distributed applications, providing security, observability, and traffic control without modifying the applications. The talk highlights the evolution of Istio, addressing user feedback to make the system more efficient, cost-effective, and simpler to operate, with the introduction of the Ambient Mesh feature.

Meshery: Visualizing Kubernetes Resource Relationships with Meshery | Project Lightning Talk

Meshery, a cloud-native management platform, offers extensible capabilities to visualize and manage Kubernetes resource relationships. The presentation highlights Meshery's ability to model and evaluate these relationships, providing a powerful tool for understanding and managing complex cloud-native infrastructure.

Lightning Talk: Minimizing Data Loss Within the OpenTelemetry (OTel) Collector - Alex Kats

This talk discusses the use of the OpenTelemetry (OTel) Collector's failover connector to minimize data loss in distributed data pipelines. The failover connector provides a health-based routing mechanism that allows the collector to automatically route data to the highest priority healthy pipeline, ensuring that data is not lost even in the face of prolonged downstream failures.
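
The health-based routing rule described above can be sketched in a few lines. This is an illustration of the selection logic only, not the failover connector's implementation or configuration; the function name, the pipeline names, and the representation of health as a set are all hypothetical.

```python
def select_pipeline(pipelines_by_priority, healthy):
    """Return the highest-priority pipeline that is currently healthy.

    pipelines_by_priority: pipeline names ordered from highest to lowest priority.
    healthy: set of pipeline names currently considered healthy.
    Returns None when no pipeline is healthy, so the caller can decide
    whether to buffer, retry, or drop.
    """
    for name in pipelines_by_priority:
        if name in healthy:
            return name
    return None


if __name__ == "__main__":
    priorities = ["primary", "secondary", "tertiary"]  # hypothetical pipeline names

    # Primary is down, so data is routed to the next healthy level.
    print(select_pipeline(priorities, {"secondary", "tertiary"}))

    # Once primary recovers, routing returns to it.
    print(select_pipeline(priorities, {"primary", "secondary", "tertiary"}))
```

Because the choice is re-evaluated against current health each time, traffic returns to the preferred pipeline as soon as it is marked healthy again.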

Divide and Conquer: Master GPU Partitioning and Visualize Savings with OpenCost - K. Yu & A. Ford

This presentation discusses the challenges of GPU utilization and cost optimization in Kubernetes environments. It covers GPU monitoring using NVIDIA DCGM, visualizing GPU costs with OpenCost, and implementing GPU partitioning techniques like time slicing to improve efficiency and reduce costs.

Open Source

Open Source 2.0: The Maintainers' Perspective - Panel

This panel discussion explores the evolving landscape of open source software, focusing on the role of companies in funding and sustaining open source projects. The panelists discuss the importance of aligning the interests of the community and the companies behind the projects, as well as the challenges of maintaining financial sustainability for open source maintainers.

Can You Put a Price Tag on Open Source? - Mario Fahlandt, Kubermatic & Bob Killen, CNCF

This talk discusses the value and challenges of contributing to open-source projects, highlighting the need for companies to develop a strategic approach to open-source engagement. The speakers provide insights on quantifying the benefits, aligning open-source efforts with business goals, and effectively communicating the value of open-source contributions to leadership.

Scaling

Better Pod Availability: A Survey of the Many Ways to Manage Workload Disruptions - Zach Loafman

The talk discusses the various ways to manage workload disruptions in Kubernetes, including involuntary disruptions like hardware or software failures, and voluntary disruptions like application owner actions or cluster administrative actions. The speaker highlights the challenges faced by applications that cannot tolerate disruptions, such as game servers or AI training jobs, and provides best practices for managing these 'slow yielding' applications.

Cash App's Journey Into a Multi-Cluster Ecosystem - Rachel Sheikh, Cash App

The presentation discusses Cash App's journey into a multi-cluster Kubernetes ecosystem, highlighting the challenges and strategies involved in migrating services across multiple clusters while maintaining zero downtime and service reliability. The speaker covers topics such as cluster provisioning, traffic management, observability, and automation to streamline the migration process and improve platform operations.

Cooperative Scheduling for Stateful Systems - Michael Youssef & Zhantong Shang, LinkedIn

This talk presents a cooperative scheduling approach for managing stateful systems on Kubernetes, where an operator handles the infrastructure-level tasks and application-specific logic is encapsulated in an Application Cluster Manager (ACM) microservice. The approach aims to provide a more unified and abstracted compute infrastructure that allows application teams to focus on their business logic without having to deal with the complexities of Kubernetes.

Engaging the KServe Community, The Impact of Integrating a Solutions with Standardized CNCF Projects

The panel discusses the current state and future plans of the KServe community, highlighting the challenges and accomplishments in integrating standardized CNCF projects for large language model (LLM) inference. The panelists emphasize the importance of community collaboration, open-source contributions, and addressing emerging issues such as cost optimization, performance, and security in the rapidly evolving LLM space.

Automated Multi-Cloud Large Scale K8s Cluster Lifecycle Management - Sourav Khandelwal, Databricks

This talk discusses how Databricks has built an automated way to manage the lifecycle of their Kubernetes clusters in a multi-cloud environment. The speaker explains how they leveraged Kubernetes operators and custom resource definitions to provision, upgrade, and manage their large-scale Kubernetes infrastructure in a scalable and reliable manner.

Orchestrating Quasi-Real Time Data Processing in the Computing Farm of the ATLAS Experi... G. Avolio

The talk presents the journey of the ATLAS experiment at CERN in orchestrating quasi-real-time data processing using Kubernetes. It highlights the challenges, performance optimizations, and lessons learned in scaling the system to handle the massive data throughput and low-latency requirements of the ATLAS experiment.

How the Tables Have Turned: Kubernetes Says Goodbye to Iptables - Casey Davenport & Dan Winship

The presenters discuss the transition from iptables to nftables in the Kubernetes ecosystem, highlighting the limitations of iptables and the advantages of nftables, including better performance, scalability, and feature set. They also cover the process of migrating Calico and kube-proxy to nftables and the initial results, which show significant improvements in update latency and packet processing.

One Inventory to Rule Them All: Standardizing Multicluster Management - Corentin Debains, Ryan Zhang

This talk presents a new standardized API called 'cluster profile' that aims to unify the management of multiple Kubernetes clusters across different tools and platforms. The speakers discuss the motivation, design, and implementation of this API, as well as demonstrate its integration with existing tools like Argo and provide a roadmap for future work and community engagement.

Yahoo’s Kubernetes Journey from on-Prem to Multi-Cloud at Scale - N. Venkatachalam, P. Patel

This talk presents Yahoo's journey in adopting Kubernetes from on-premises to a multi-cloud environment at scale. It covers the development of the Omega platform, which abstracts away the complexity of Kubernetes deployment, and the challenges faced in networking, security, and cluster management during the migration to the public cloud.

Kubernetes Multi-Cluster Networking 101 - Niranjan Shankar, Microsoft & Ram Vennam, Solo.io

This talk provides a comprehensive overview of the challenges and solutions for Kubernetes multi-cluster networking, covering topics such as connectivity, service discovery, and security. The speakers discuss various approaches, including flat networks, multi-network topologies, service meshes, and Kubernetes-native solutions, highlighting the trade-offs and best practices for navigating the multi-cluster networking landscape.

How to Move from Ingress to Gateway API with Minimal Hassle - Keith Mattix, Microsoft

This talk provides a strategic approach for migrating from the Ingress API to the Gateway API in Kubernetes, emphasizing the importance of planning for failure, gradual transitions, and leveraging the Kubernetes community for best practices. The speaker offers specific tools and techniques to help make the migration process as smooth and seamless as possible.

Pod Power: Liberating Kubernetes Users from Container Resource Micromanagement - D. Narang, P. Hunt

This talk introduces pod-level resource specifications, a new feature that aims to simplify Kubernetes resource management by allowing users to specify resource requests and limits at the Pod level, rather than at the individual container level. This approach provides greater flexibility and better resource utilization by enabling containers within a Pod to dynamically share unused resources.

Per-Node Api-Server Proxy: Expand the Cluster's Scale and Stability - Weizhou Lan & Iceber Gu

This talk presents a solution for expanding the scale and stability of Kubernetes clusters by introducing a transparent API server proxy. The proxy leverages eBPF to redirect API server requests, reducing the load on the API server and enabling horizontal scaling across nodes or zones.

Mastering Cell-Based Architecture: Practical Solutions and Best Practices - S. Vohra, A. Abeysinghe

This talk presents the concept of cell-based architecture, an architectural style that addresses application deployment and team architecture. The speakers discuss the practical implementation of cell-based architecture, including the challenges of cell coordination, security, isolation, and operational overhead, as well as best practices for defining cell boundaries and handling both greenfield and brownfield scenarios.

From Chaos to Calm: Building a Unified and Scalable CI/CD Pipeline at Akamai - Tomer Patel

This talk presents how Akamai built a unified and scalable CI/CD pipeline to address their challenges of long deployments, manual processes, and lack of visibility. The speaker discusses the key components of their solution, including local development environments, automated testing and approvals, and a comprehensive configuration management approach, which enabled them to achieve faster time-to-market and cost reductions.

Navigating Failures in Pods with Devices: Challenges and Solutions - Sergey Kanzhelev & Mrunal Patel

This talk discusses the challenges and solutions for navigating device failures in Kubernetes pods, focusing on the impact on AI/ML workloads. The speakers propose a gradual approach to address these issues, including improving Kubernetes infrastructure reliability, surfacing device failure information, and exploring in-place container restart and unscheduling capabilities.

SIG Autoscaling Projects Update - Jack Francis, Microsoft

This talk provides an overview of the projects under Kubernetes SIG Autoscaling, including Cluster Autoscaler, Karpenter, Horizontal Pod Autoscaler, Vertical Pod Autoscaler, and multi-dimensional pod autoscaling. The talk highlights the complementary nature of these projects in achieving optimal cluster and workload scaling, and invites the community to contribute to their ongoing development and improvement.

The State of Cloud Native Business Value in 2024 - Panel

This panel discussion explores the state of cloud native business value in 2024, focusing on the challenges in communicating the value of cloud native technologies to business stakeholders, the importance of measuring ROI and key metrics, and the role of leadership in driving cultural and organizational change. The panel also discusses the intersection of cloud native and AI, and provides advice for organizations on their cloud native adoption journey.

Cluster API Deep Dive - Roadmap to API Graduation - Christian Schlotter & Vince Prignano

This video provides a deep dive into the Cluster API project, its vision, and the roadmap for its API graduation. The presenters discuss the project's goals, the core concepts, the focus on production readiness, and the upcoming changes in the v1beta2 release, including improvements to the status fields and the drain behavior.

SIG-Multicluster Intro & Deep Dive - Jeremy Olmsted-Thompson, Laura Lorenz, Ryan Zhang, Stephen Kitt

The video introduces the SIG-Multicluster group, which focuses on addressing the challenges of managing multiple Kubernetes clusters. The video covers the group's current projects, such as the About API, Multicluster Services API, and the Cluster Profile API, and invites the audience to get involved and contribute to the group's efforts.

Does My K8s Application Need CPR? Performance Evaluation of a Multi-Cluster... B. Dumba, E. Silvera

This talk presents a performance evaluation of a multi-cluster workload management application, using the KubeStellar project as a case study. The authors share their experience in conducting performance experiments, identifying issues such as memory leaks and controller bottlenecks, and provide recommendations for building a robust performance testing framework for such complex distributed applications.

Setting New Standards for Reliability in Cloud Native Multi-Region Applications - Trey Caliva

The presentation discusses how Global Payments, a leading payment processing company, has leveraged cloud-native technologies and CockroachDB to build a highly reliable and scalable payment gateway that can withstand regional outages and provide seamless global operations. The key aspects covered include microservices architecture, containerization, multi-region deployments, and the unique features of CockroachDB that enable consistent, fault-tolerant, and scalable transactions across multiple regions.

Kubernetes at Scale: Practical Solutions for Enhanced CNI and Kubelet P... H. Santana, B.G. da Silva

This talk discusses practical solutions for enhancing Container Network Interface (CNI) and Kubelet performance at scale in Kubernetes clusters. The presenters share their experiences in troubleshooting and resolving issues encountered when scaling a Kubernetes cluster to 1,000 nodes, including lessons learned about default configurations, multiple error sources, and the importance of proactive monitoring and best practices.

Misadventures in Large Scale Cluster Performance - Shane Corbett, AWS & Dima Ilchenko, Lacework

This talk explores the challenges of achieving large-scale Kubernetes cluster performance, delving into the intricacies of the control plane, the impact of workloads on scalability, and the importance of understanding the underlying metrics and logs to identify and resolve various bottlenecks. The presenters share their real-world experiences and provide practical guidance on optimizing Kubernetes clusters for high-performance workloads.

Automated Multi-Cloud, Multi-Flavor Kubernetes Cluster Upgrades Using Operators - Ziyuan Chen

The presentation discusses how Databricks automated multi-cloud, multi-flavor Kubernetes cluster upgrades using Kubernetes operators, addressing the challenges of frequent upgrades at scale across different cloud providers and Kubernetes flavors, and achieving consistent upgrade behavior with reduced downtime and improved rollback capabilities.

When Life Gives You Containers, Make an Open Source RDS: A Kubernetes Love Story - Sergey Pronin

This talk explores the journey of running databases on Kubernetes, from initial enthusiasm and complexity to the development of an open-source database-as-a-service solution called Percona Everest. The speaker highlights the importance of balancing technical and emotional aspects when adopting cloud-native technologies, emphasizing the need to choose the right tools and approach to empower developers and focus on application development rather than database management.

Optimizing Load Balancing and Autoscaling for Large Language Model (LLM) Inference on Kub... D. Gray

The talk discusses optimizing load balancing and autoscaling for large language model (LLM) inference on Kubernetes. It presents performance comparisons between different load balancing strategies, including request-based load balancing with KServe and custom load balancing based on key-value cache utilization and queue size, and discusses techniques to speed up the scale-up process of new model replicas.

Perform Laser Focused Deployments by Deciding in Advance the Blast Radius - Kostis Kapelonis

This talk discusses how to perform laser-focused deployments using Argo Rollouts, a Kubernetes-native progressive delivery tool. The speaker presents two methods for controlling the blast radius of canary deployments: static URL routing and dynamic header-based routing, demonstrating how to selectively expose the new version to specific user groups.

All Your Routes Are Ready, More or Less - Dave Protasowski

This talk covers the challenges and solutions in building a reliable and consistent abstraction layer for managing Kubernetes networking and routing using the open-source project Knative and the emerging Gateway API standard. The speaker discusses the importance of providing clear and actionable status conditions to clients, the complexities of handling route updates safely, and the need for better integration and guarantees from the underlying proxy implementations.

Keynote: Take a Peek Under the Hood of Cloud-Native AI at Scale - Chen Goldberg & Peter Salanki

The talk explores the challenges of running large-scale, cloud-native AI training workloads on Kubernetes, focusing on the importance of observability, automation, and fleet management to ensure reliable and performant clusters. The speakers share their experiences and strategies in building a platform that can handle the complexities of AI infrastructure, from hardware provisioning to real-time monitoring and remediation.

Tutorial: Kubernetes Smart Scaling: Getting Started with Karpenter - Panel

This tutorial provides a comprehensive introduction to Karpenter, a smart Kubernetes autoscaler that dynamically provisions instances based on workload requirements, without the need for pre-configured node groups. The session covers Karpenter's key features, including cost optimization, diverse workload support, and seamless day-two operations like upgrades and patching.

What Istio Got Wrong: Learnings from the Last Seven Years of Service Mesh - C. Posta, L. Ryan

The talk discusses the lessons learned from the last seven years of the Istio service mesh project, including issues with focus, vision, quality, and architectural complexity. The speakers highlight the importance of delivering a simple, maintainable, and user-friendly product, and discuss the introduction of Ambient Mesh as a solution to address these challenges.

Kubernetes on Multisites – A Story About Stateful App, Hybrid Clouds, a... F. Coulombel, J. Šafránek

The presentation discusses the challenges of running stateful applications on Kubernetes across multiple sites, both on-premises and in the cloud. The key takeaways are to leverage the built-in high availability capabilities of the application as much as possible, and to carefully consider the trade-offs between different storage replication strategies when the application cannot handle the replication itself.

Kubernetes Upgrades: Less Pain, More Gain (and Maybe a Little Swearing) - Jago Macleod, Google

The talk discusses the importance of frequent Kubernetes upgrades, the complexities involved due to the ecosystem's dependencies, and strategies for managing upgrades safely, such as progressive rollouts and release channels. The speaker emphasizes the need to address the risk of fragmentation and the importance of keeping up with the latest features and security improvements in Kubernetes.

How We Scale a Distributed SQL Database to 1 PB - Jinpeng Zhang, PingCAP

This talk presents how PingCAP has scaled their distributed SQL database, TiDB, to handle petabyte-scale data. It covers the key architectural and operational challenges they faced, such as metadata management, hotspot handling, observability, and noisy neighbor issues, and the solutions they implemented to address these challenges.

Tick, TAG, TOC - Keeping Cloud Native Running - Panel

This panel discussion explores the evolution of the Technical Advisory Groups (TAGs) in the Cloud Native Computing Foundation (CNCF) ecosystem. The panelists share their experiences in leading and contributing to various TAGs, such as security, runtime, storage, and app delivery, and discuss how the TAGs have played a crucial role in guiding the maturation of cloud-native projects through the sandbox, incubation, and graduation stages.

Supercharge Your Kubernetes Autoscaling with Custom Metrics - V. Krishna Samudrala, S. Akinapally

This talk explores how to supercharge Kubernetes autoscaling using custom metrics. It covers the use of Prometheus Adapter and KEDA to drive the Horizontal Pod Autoscaler (HPA) from application-specific metrics, providing insights into best practices and live demos.

The Key Value of etcd Over Custom Resources: Scalability - Jef Spaleta, Isovalent at Cisco

The talk discusses the scalability challenges faced by the etcd key-value store used by Kubernetes, particularly in the context of custom resource definitions (CRDs) used by projects like Cilium. The speaker highlights the need for better scalability testing and collaboration between project maintainers and users to address these issues.

Share the Ride: Robust Multi-Tenancy in Kubernetes at Uber - Sashank Appireddy & Apoorva Jindal

The video presents Uber's implementation of a multi-tenant single cluster architecture for Kubernetes, addressing challenges faced with a multi-cluster approach. The solution leverages namespace-based isolation, custom controllers, and capacity management techniques to provide robust multi-tenancy while improving manageability, user experience, and resource efficiency.

Upgrade Safely: Avoid the Pitfalls of Kubernetes Versioning - Rob Scott, Google

This talk discusses the challenges and best practices for safely upgrading Kubernetes versions, focusing on the pitfalls of Kubernetes versioning and the complexities of managing Custom Resource Definitions (CRDs). The speaker provides insights into managing API version changes, understanding storage versions, and navigating the experimental and standard channels in the Gateway API, aiming to help attendees avoid future upgrade-related pains.

Zero Downtime Upgrades at Scale: How Okta Manages Hundreds of Clusters Daily - J. Albuixech, K. Lei

The video discusses how Okta manages hundreds of Kubernetes clusters daily with zero downtime upgrades using a 'red-black' deployment strategy. The presentation covers Okta's platform architecture, the technical details of their deployment process, and the various challenges and solutions they have implemented to achieve reliable and scalable infrastructure upgrades.

Migratory Patterns: Making Architectural Transitions with Confidence and Grace - Pete Hodgson

This talk discusses the challenges of big-bang migrations and presents the 'expand-contract' pattern as an alternative approach. The speaker showcases real-world examples of using this pattern to migrate architectural components, such as databases and APIs, without downtime and with increased confidence in the migration process.

Bloomberg's Journey to Manage a Multi-Cluster Training Application with Karmada - Y. Zhang, W. Lai

The video discusses Bloomberg's journey in managing a multi-cluster training application using Karmada, a multi-cluster and multi-cloud Kubernetes management system. It highlights the challenges faced by Bloomberg's data science platform and how Karmada helped address them, including automatic failover, configuration management, and balanced scheduling across clusters.

Are Your Microservices Truly Scaling? A Framework for Unlocking the Stateful Backend - M. Penaroza

This talk discusses the challenges of scaling stateful backends in microservices architectures and proposes a framework for addressing these challenges. The speaker highlights the importance of scalability, reliability, and operability in data systems, and introduces TiDB as a distributed SQL database that can provide the necessary scalability and performance to support large-scale applications.

Building a More Resilient Future with Advanced Cloud Provider Testing - M. McCune, B. Kromhout

This talk discusses the history of cloud providers in Kubernetes and the challenges of testing cloud provider functionality. The presenters propose a new approach to cloud provider testing that involves separating the test code from the core Kubernetes repository and allowing each cloud provider to maintain their own test implementation, while providing a common set of tests that can be run across all providers.

Thousands of Gamers, One Kubernetes Network - Surya Seetharaman, Red Hat & Girish Moodalbail

This talk discusses the challenges of achieving end-to-end quality of service (QoS) in Kubernetes networks, using a cloud gaming environment as a use case. The presenters share how they leveraged custom resources and differentiated services code point (DSCP) to prioritize and control network traffic, ensuring a seamless gaming experience for users.
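For readers unfamiliar with DSCP, the sketch below shows what a DSCP mark is at the packet level. It is not the mechanism from the talk, which applies marks through Kubernetes custom resources; here an application marks its own socket, and the helper names are hypothetical. The only facts relied on are that DSCP occupies the upper six bits of the IP TOS/DS byte and that 46 is the Expedited Forwarding code point.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding, commonly used for latency-sensitive traffic


def dscp_to_tos(dscp):
    """Shift a 6-bit DSCP value into the upper six bits of the TOS/DS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_marked_udp_socket(dscp=DSCP_EF):
    """Open a UDP socket whose outgoing IPv4 packets carry the given DSCP mark."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

Whether a mark is honored end to end depends on every network hop preserving and acting on it, which is the problem the talk addresses.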

Topology Aware Routing: Understanding the Tradeoffs - Rob Scott, Google

The talk discusses the challenges and trade-offs involved in implementing topology-aware routing in Kubernetes, including the need to balance proximity, cost, and availability. The speaker presents the evolution of their approach, from topology hints to a simpler traffic distribution model, and highlights ongoing work to integrate feedback loops and support for different data planes.
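The proximity-versus-availability trade-off can be illustrated with a toy endpoint selector. This is a simplified sketch, not the kube-proxy or EndpointSlice algorithm; the endpoint dictionaries and the `pick_endpoints` name are assumptions made for the example.

```python
def pick_endpoints(endpoints, client_zone):
    """Prefer ready endpoints in the client's zone; otherwise use any ready endpoint."""
    ready = [e for e in endpoints if e["ready"]]
    local = [e for e in ready if e["zone"] == client_zone]
    # Falling back to all zones trades proximity and cost for availability.
    return local if local else ready
```

A strict same-zone preference like this one can overload a zone that has few endpoints, which is the kind of imbalance a feedback loop would need to correct.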

The Node Tetris Rabbit Hole: Why Your Binpacking Might Be Underperforming - Hannah Taub, Adobe Inc.

This talk explores the challenges of efficient resource management in Adobe's large-scale Kubernetes platform, Ethos, and the strategies the team employed to address underperforming bin packing, client behavior, and platform changes. The presentation follows the team's journey through the 'Node Tetris Rabbit Hole': their attempts to improve cost efficiency, the client pushback they faced, and the solutions they ultimately adopted, including Karpenter and adjustments to the maxPods configuration.

The State of Kubernetes Optimization and the Role of AI - Panel

This panel discusses the evolution of Kubernetes autoscaling, from simple rule-based approaches to more advanced model-based and AI-driven techniques. The panelists share their experiences and insights on leveraging tools like Karpenter, Kube-prom, and AI to optimize Kubernetes workloads for efficiency, cost savings, and sustainability.

KubeSlice: Migrate Kubernetes Services With Confidence! | Project Lightning Talk

KubeSlice is a multi-cloud, multi-cluster, and multi-tenant secure service connectivity solution that enables seamless migration of Kubernetes services across different environments without exposing private endpoints. The project is also expanding its capabilities to include dynamic GPU allocation and resource usage optimization across multiple slices and clusters.

k8gb: Global Load Balancing, the Kubernetes Way | Project Lightning Talk

k8gb is a Kubernetes-native global load balancing solution that addresses the challenge of managing traffic across multiple clusters in different regions. It leverages DNS, CoreDNS, and ExternalDNS to provide a vendor-agnostic and highly available solution for routing users to the most appropriate cluster based on various load balancing strategies.

KubeStellar: Multi-Cluster Configuration Management with KubeStellar | Project Lightning Talk

KubeStellar is a multi-cluster configuration management solution that simplifies the deployment and scaling of workloads across diverse Kubernetes environments, enabling centralized control, customization, and seamless integration with cloud-native projects. The presentation highlights KubeStellar's ability to overcome the challenges of managing multiple clusters, such as scaling, configuration variations, and direct compatibility with standard Kubernetes resources.

Lightning Talk: Is Everyone O-KEDA? “Exciting” Lessons Learned in Our Journey to Use KED... B. Davis

The talk discusses the lessons learned in implementing the Kubernetes Event-Driven Autoscaler (KEDA) within an organization. It highlights the challenges faced, such as scaling issues, rate limits, and the need to balance message processing, and provides insights into the unexpected surprises that can arise during the KEDA adoption journey.
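As background for the scaling issues mentioned above, queue-driven autoscaling usually reduces to a calculation like the following. This is a minimal sketch of the average-value style computation, not KEDA's actual code; the function name, parameters, and default bounds are assumptions.

```python
import math


def desired_replicas(queue_length, target_per_replica, min_replicas=0, max_replicas=10):
    """Return enough replicas so each handles about target_per_replica messages."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(queue_length / target_per_replica)
    # Clamp to the configured bounds; the upper bound also protects downstream rate limits.
    return max(min_replicas, min(max_replicas, wanted))
```

For example, 25 queued messages with a target of 10 per replica yields 3 replicas, while a very deep queue is capped at `max_replicas`.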

Lightning Talk: Future-Proofing Kubernetes: Impact of Storage Version Migration and... N. Chaudhari

This talk discusses the impact of the storage version migration in Kubernetes, highlighting the importance of the storage version migrator feature and the potential implications of the upcoming change in the resource version schema. The speaker emphasizes the need for the community to be aware of this change and encourages feedback and collaboration to ensure a smooth transition.

Lightning Talk: Safer Cluster Upgrades with Mixed Version Proxy - Richa Banker, Google

The video discusses a new feature in Kubernetes 1.28 called Mixed Version Proxy, which enables safer and more reliable upgrades of Kubernetes clusters by intelligently routing requests to the appropriate API server based on the requested resource and the API server's version. The feature addresses the challenges of managing a mixed-version cluster during an upgrade process, ensuring that requests are directed to the correct API server and minimizing the risk of application disruptions, deployment failures, and data inconsistencies.

Enabling Fault Tolerance for GPU Accelerated AI Workloads in Kubernetes - A. Singh & A. Paithankar

The talk discusses the need for fault tolerance in GPU-accelerated AI workloads running on Kubernetes, covering the complexities of AI infrastructure, the sources of failures, and the four pillars of fault tolerance: monitoring, propagation, recovery, and remediation. The presenters also highlight the challenges in scaling to large GPU clusters and the scope for enhancements in the ecosystem.

Distributed Multi-Node Model Inference Using the LeaderWorkerSet API- Abdullah Gharaibeh, Rupeng Liu

This talk presents the LeaderWorkerSet API, a novel approach for distributed multi-node model inference that addresses the challenge of serving increasingly large language models. The API provides features such as horizontal scaling, group restart, rolling updates, and topology-aware scheduling to enable efficient and automated deployment of these models across multiple nodes.

Scheduling

Gödel Scheduler: A Unified Scheduler for Online and Offline Workloads - B. Li, Y. Yin, L. Jiang

Gödel Scheduler is a unified scheduler for online and offline workloads, designed to address the challenges of large-scale infrastructure with heterogeneous workloads and resource requirements. The presentation highlights key features like job-level affinity, resource reservation, and real-time load-aware scheduling, along with the system's achievements in improving resource utilization and elasticity and in reducing fragmentation.

SIG Scheduling Intro & Updates - Aldo Culquicondor, Google & Kensei Nakada, Tetrate.io

The video provides an overview of the Kubernetes Scheduling Special Interest Group (SIG Scheduling), including updates on kube-scheduler, the Kueue project, and the kube-scheduler-simulator. The presentation covers recent performance improvements, new features like Queueing Hints and Asynchronous Preemption, as well as the roadmap for these projects.

Scale Job Triggering with a Distributed Scheduler - Cassie Coyle & Artur Souza, Diagrid

The presentation discusses how Dapr, a distributed application runtime, scales job triggering using a distributed scheduler. The key features highlighted include the use of an embedded etcd database, a cron scheduling library for dynamic job partitioning, and performance improvements in handling actor reminders and workflows.

Behind Schedule: Pod Resource Configuration from Beginning to... Huh? - Joe Thompson, Independent

This talk provides a comprehensive overview of the complexities and nuances involved in pod resource configuration in Kubernetes, including concepts like requests, limits, quality of service, priority, eviction, and preemption. The speaker offers practical advice and strategies for managing these challenges, emphasizing the importance of simplicity, testing, and leveraging autoscaling tools to ensure reliable and efficient resource allocation.
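The quality-of-service concept mentioned above follows a small set of rules that can be sketched in a few lines. This is a simplified illustration, not the kubelet's implementation: the container dictionaries and the `qos_class` name are assumptions, quantities are compared as plain strings (so "500m" and "0.5" would not match), and only CPU and memory are considered.

```python
def qos_class(containers):
    """Classify a pod as Guaranteed, Burstable, or BestEffort from its containers.

    Each container is a dict with optional 'requests' and 'limits' dicts
    keyed by 'cpu' and 'memory'.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # An unset request defaults to the limit when a limit is present.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

The class matters because it influences which pods are evicted first under node resource pressure.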

Keynote: Multicluster Batch Jobs Dispatching with Kueue at CERN - Ricardo Rocha & Marcin Wielgus

The video presents how CERN uses Kueue for multicluster batch job dispatching, addressing the challenges of running complex scientific workloads on Kubernetes with features like cluster quotas, fair sharing, and automatic resource provisioning across multiple clusters. The speakers demonstrate how Kueue simplifies the management of such workloads and enables seamless execution across on-premises and cloud-based Kubernetes clusters.

Kubernetes Workspaces: Enhancing Multi-Tenancy with Intelligent Apiserver... J. Munnelly, A. Tosatto

This presentation introduces Kubernetes Workspaces, a concept that provides a view over a set of resources running across multiple namespaces in a Kubernetes cluster. The goal is to promote security best practices and standardization around multi-tenancy operations, by allowing teams to manage resources across namespaces through a scoped API-level abstraction.

Unlocking the Future of GPU Scheduling in Kubernetes with Reinforcement Learning- N. Goyal, A. Gupta

This presentation outlines a reinforcement learning-based solution for efficient GPU scheduling in Kubernetes, addressing limitations of the default Kubernetes scheduler. The proposed approach aims to improve GPU utilization, workload prioritization, and scalability by dynamically allocating GPU resources based on real-time demand and application characteristics.

WASM + KWOK Wizardry: Writing and Testing Scheduler Plugins at Scale - D. Pejchev, J. Giannuzzi

This talk explores the use of WebAssembly (WASM) and the Kubernetes scheduler to build and test scheduling plugins at scale. The presenters demonstrate how to create custom scheduling plugins using different frameworks and tools, and how to leverage the Kubernetes scheduler simulator and the KWOK (Kubernetes WithOut Kubelet) toolkit to test and optimize the performance of these plugins in a controlled environment.

Best of Both Worlds: Integrating Slurm with Kubernetes in a Kubernetes... E.A. Gutierrez, A. Beltre

This presentation discusses the integration of Slurm, a popular HPC job scheduler, with Kubernetes, a leading container orchestration platform. The key focus is on the development of K-Foundry, a project that aims to bridge the gap between traditional HPC environments and cloud-native technologies, enabling seamless execution of workloads across both platforms.

Better Together! GPU, TPU and NIC Topological Alignment with DRA - John Belamaric & Patrick Ohly

This talk covers the evolution of Dynamic Resource Allocation (DRA) in Kubernetes, a feature that provides a richer API for expressing complex device resource requirements. The speakers discuss how DRA enables topological alignment of GPU, TPU, and NIC resources to optimize performance, and propose future enhancements to simplify the user experience while managing the underlying complexity.

Bloomberg’s Journey to Improve Resource Utilization in a Multi-Cluster Platform- Yao Weng, Leon Zhou

This talk discusses Bloomberg's journey to improve resource utilization in a multi-cluster platform. The key challenges addressed include overbudgeting, unbalanced usage, and difficulty with multi-cluster federation, which were solved by adopting Karmada, a Kubernetes-native multi-cluster management system, to enable time-based quota management, even job distribution, and automatic configuration propagation.

WG Batch Updates: What’s New and What Is Next - Marcin Wielgus, Google & Kevin Hannon, Red Hat

The video discusses updates from the Kubernetes Batch Working Group, including the JobSet and Kueue projects. JobSet aims to provide a unified API for running distributed batch workloads, while Kueue is a queueing and admission system that helps manage and prioritize batch jobs on Kubernetes.

Service Profiling Based Management and Scheduling in K8- Jia Deng, Cong Xu & Mingmeng Luo, Bytedance

The video discusses a service profiling-based resource management and scheduling system developed by ByteDance, called Katalyst. The system uses historical data and machine learning models to predict resource usage, aiming to optimize resource utilization and reduce performance bottlenecks in Kubernetes clusters.

Open Cluster Management: Scheduling AI Workload Among Multiple Clusters | Project Lightning Talk

Open Cluster Management provides a solution for scheduling AI workloads across multiple clusters, utilizing a Placement Prioritizer API to dynamically place workloads on clusters with available GPU resources. This approach aims to democratize access to AI and enable small research facilities, educators, and individuals to leverage the power of AI and machine learning.

Lightning Talk: Evaluating Scheduler Efficiency for AI/ML Jobs Using Custom Resourc... D. Shmulevich

The talk discusses the efforts to compare and assess the performance of different scheduler systems for AI workloads on Kubernetes. The speaker presents the use of a custom resource exporter to measure GPU occupancy and the results of testing different scheduling scenarios, highlighting the potential benefits and limitations of the approach.

Security

Keynote: Open Source Security Is Not A Spectator Sport - Justin Cappos & Santiago Torres Arias

This keynote by Justin Cappos and Santiago Torres Arias highlights the importance of active involvement in open-source security, emphasizing that anyone can make a difference and showcasing the ripple effects of their work on projects like Git and the in-toto framework, which protects against sophisticated software supply chain attacks.

From Silicon to Service: Ensuring Confidentiality in Serverless GPU Cloud Functions - Zvonko Kaiser

This talk discusses the importance of ensuring confidentiality in serverless GPU cloud functions, focusing on the need for a full-stack approach to attestation and security, from the silicon to the service. The speaker explores various use cases, such as federated learning and generative AI, and the challenges of providing a confidential clean room for multiple parties in a serverless environment.

It's Dangerous to Build It Alone, Take This. - Jeremy Rickard & Ashna Mehrotra, Microsoft

This talk discusses best practices and tools for securing the open-source software supply chain. It covers strategies for ingesting, scanning, updating, and rebuilding open-source dependencies, highlighting the use of CNCF projects like ORAS, Trivy, and Copa to address vulnerabilities and maintain control over the software components used in development and deployment.

From Standards to Practice: The Journey to Container Maturity - Carmen Chow & Thomas Robinson, Yelp

The talk presents a container maturity model developed by Yelp to improve the security of their container infrastructure. The model focuses on defining a set of security standards, integrating risk management practices, automating compliance checks, and providing a container scoring mechanism to prioritize remediation efforts.

What Agent to Trust with Your K8s: Falco, Tetragon or KubeArmor? - Henrik Rexed, Dynatrace

This presentation compares the runtime security agents Falco, Tetragon, KubeArmor, and Tracee, evaluating their components, capture and filtering capabilities, observability features, and performance impact. The speaker provides a detailed analysis of the strengths and trade-offs of each solution, helping the audience choose the best fit for their Kubernetes environment.

Exceeded Your Validation Cost Budget? Now What? - Joel Speed, Red Hat

This talk explores the concept of validation cost budgets in Kubernetes custom resource definitions (CRDs) and how to avoid exceeding them. It discusses the factors that contribute to the cost of validation, such as rule cost and cardinality, and provides practical advice on how to design CRDs to stay within the budget.

Mish-Mesh: Abusing the Service Mesh to Compromise Kubernetes Environments - H. Ben-Sasson, N. Ohfeld

This talk discusses how attackers can abuse the service mesh, a critical component in many Kubernetes environments, to compromise production environments. The presenters demonstrate techniques to map internal networks, bypass security rules, and gain unauthorized access to sensitive infrastructure and data using legitimate features of service mesh solutions like Linkerd and Istio.

Multi-Tier Security in WasmCloud: From Developer Constraints to Platform Extensibility - B. Townsend

The talk explores the multi-tier security model of the WasmCloud platform, which leverages WebAssembly's unique capabilities to provide a secure and extensible platform for deploying applications. It showcases how WasmCloud's capability-driven permissions and pluggable security components enable developers to build secure applications while allowing platform engineers to maintain control over the runtime environment.

Microsegment Your Network Like Mastercard with AdminNetworkP... J. Zaiss, D. Ruggeri, S. Seetharaman

This talk discusses Mastercard's journey in implementing microsegmentation in their multi-tenant Kubernetes clusters using the AdminNetworkPolicy API, a new feature in the Kubernetes networking ecosystem. The presenters highlight the challenges they faced, the solutions they developed, and the lessons learned in their efforts to achieve secure-by-default network isolation for their containerized workloads.

TLS and MTLS: Introduction to Modern Security - Andrew Davis & Sandeep Kanabar

This talk provides a comprehensive introduction to TLS and MTLS, the fundamental security protocols that underpin modern digital communication. It covers the core concepts of cryptography, public key infrastructure, and how TLS and MTLS enable secure and authenticated communication, especially in the context of zero-trust architectures and microservices environments.

Peak Innovation and Cloud Tweaks: Falco’s Ongoing Runtime Security Development - Panel

This video covers the latest developments in the Falco runtime security project, including new features like more expressive rule definitions, customizable outputs, and improvements to the plugin ecosystem. The speakers also introduce Falco Talon, a new response engine that integrates with Falco to provide automated mitigation of detected threats.

Securing the Future of Ingress-Nginx - James Strong, Isovalent & Marco Ebert, Giant Swarm

This presentation discusses the security improvements and feature deprecations in Ingress-Nginx 1.12, as well as the plans to transition the project to the Gateway API. The speakers highlight the challenges faced by the maintainers and the community's involvement in shaping the future of the project.

SPIRE: Intro & In-Depth Exploration of the Upcoming Forced Rotation and Revoc... A.M. Fayó, M. Yacob

This presentation provides an in-depth exploration of the upcoming forced rotation and revocation features in the SPIRE project, an implementation of the SPIFFE framework. It covers the handling and life cycle of signing keys, and includes a live demo showcasing the new local authority server API and CLI commands for managing key rotation and revocation.

Secure Release Processes with in-Toto Policy Verification - John Kjell, Aditya Sirish A Yelgundhalli

The talk discusses secure release processes using the in-toto policy verification framework. It covers the challenges of software supply chain attacks, the need for compliance with security standards, and the use of in-toto to automate and verify the entire software development and release pipeline.

Elevate Your Kubernetes Policy Game with Kyverno! - C. Breteche, L. Chiang, K. Tu

This talk explores the experience of integrating Kyverno, a Kubernetes policy engine, into the Robinhood platform. The speakers discuss the challenges they faced with existing policy solutions, the evaluation process that led them to choose Kyverno, and the detailed implementation steps, including policy migration and testing strategies. The talk also delves into recent improvements to Kyverno's reporting system, including the introduction of the Kyverno Report Server to address scalability and operational concerns.

Best Friends Keep No Secrets: Going Secretless with cert-manager - Ashley Davis & Tim Ramlot, Venafi

This talk discusses how to use cert-manager, a tool for provisioning X.509 certificates, in a secretless manner. It covers the issuance process, where cert-manager can use Kubernetes service account tokens instead of static secrets, and the provisioning process, where cert-manager can use a CSI driver to mount certificates directly into containers without storing them in Kubernetes Secrets.

Using Notary Project to Ensure Authenticity and Integrity of Artifacts Wit... T. Mladenov, T. Rasche

This presentation discusses how the Notary Project can be used to ensure the authenticity and integrity of software artifacts in enterprise environments, even with complex and heterogeneous infrastructure. The presenters share their experience in implementing Notary Project at Mercedes-Benz, highlighting the flexibility and pluggability of the tool, as well as the roadmap for future developments, including support for signing arbitrary artifacts and attestations.

Tutorial: Confidential Containers 101: A Hands-on Workshop - Archana Choudhary & Suraj Deshmukh

This tutorial provided a hands-on workshop on securing containerized workloads using confidential containers. The session covered the concepts of confidential compute, the confidential containers project, and demonstrated how to deploy and use encrypted container images and enforce security policies to protect containerized applications.

Expanding the Capabilities of Kubernetes Access Control - Jimmy Zelinskie & Lucas Käldström

This talk explores the challenges and limitations of Kubernetes' current authorization mechanisms, and presents emerging solutions like relationship-based access control (ReBAC) and the integration of policy engines like Cedar with Kubernetes. The speakers emphasize the complexity of authorization and the importance of leveraging existing tools and models rather than building custom solutions.

From Observability to Enforcement: Lessons Learned Implementing eBPF R... A. Kapuścińska, K. Kourtis

This talk discusses the use of eBPF for implementing runtime security observability and enforcement policies, with a focus on the Tetragon tool. The speakers cover the challenges of writing effective enforcement policies and the integration of Tetragon with Kubernetes, highlighting the need for improved user experience and community support for building specialized security tools on top of the Tetragon framework.

Squashing Trampoline Pods: The Future of Securely Enabling Hardware Extensions- Joe Betz, David Eads

This talk presents a novel approach to securely enabling hardware extensions in Kubernetes clusters by squashing trampoline pod attacks. The speakers discuss techniques such as service account node claims, validating admission policies, and structured authorization configurations to restrict the privileges and access of container breakouts and prevent the escalation of attacks across the cluster.

CEL-Ebrating Simplicity: Mastering Kubernetes Policy Enforcement - Kevin Conner & Anish Ramasekar

This talk provides a comprehensive overview of the Common Expression Language (CEL) in Kubernetes, a powerful policy enforcement mechanism. It covers the key concepts of validating and mutating admission policies, demonstrating how to use CEL expressions to define flexible and efficient policy rules within the Kubernetes API server.

Bridging Clouds: TikTok’s Blueprint for Unified OIDC Access on Multi-Cloud Kubernetes - N. Mogulla

This presentation discusses TikTok's blueprint for unified OIDC access management on multi-cloud Kubernetes clusters. The solution involves a reverse proxy (Envoy), an authorization server, and cluster metadata management to provide a consistent user experience across on-premises and cloud-based Kubernetes clusters.

Breaking Free from Vulnerability Scanning Noise: Automated VEX Aggregation for Accuracy - T. Fukuda

The presentation discusses a solution called 'VEX Hub' to address the challenges of vulnerability management, including the problem of false positives and the need to automate the discovery and distribution of VEX (Vulnerability Exploitability eXchange) documents. The speaker proposes a standardized 'VEX Repository Specification' that lets anyone publish their own VEX repository, allowing security scanners to consume and apply these documents to filter out non-exploitable vulnerabilities.

AuthZEN: The “OpenID Connect” for Authorization - Omri Gazitt, Aserto

The video discusses the need for a standardized authorization framework, similar to OpenID Connect for authentication, and introduces the work being done by the OpenID Foundation's AuthZEN working group to create a specification for cloud-native, fine-grained, and real-time authorization. The video also showcases a demo application and the interoperability efforts among various authorization platforms to adopt the AuthZEN specification.

Harbor Project - The Maintainers Session What We Have Accomplished! - Orlin Vasilev & Vadim Bauer

The Harbor project, a graduated CNCF project, is a self-hosted container registry that provides features like Notary, replication, vulnerability scanning, and policy management. The session covers the recent Harbor 2.12 release, the ecosystem around Harbor, and the future plans for improving the project's audit logs, AI model management, and community engagement.

Cilium, eBPF, WireGuard: Can We Tame the Network Encryption Performanc... D. Borkmann, A. Protopopov

The talk discusses the integration of Cilium, eBPF, and WireGuard technologies to optimize network encryption performance in Kubernetes clusters. The presenters demonstrate various techniques, such as inline encryption, source port hashing, and multi-device configurations, to achieve significant performance improvements for both bulk and request-response workloads.

Workload Identity Federation – Stop Using Long-Lived Credentials - Benjamin Dronen & Anjali Telang

The video discusses the benefits of using workload identity federation to authenticate to cloud resources instead of long-lived credentials. It introduces SPIRE, an open-source project that provides a scalable and federated identity solution for Kubernetes and other platforms.

TUF: Secure Distribution Beyond Software - Marina Moore, Independent

This talk discusses how The Update Framework (TUF) can be used to securely distribute not just software, but also critical metadata such as software bills of materials, attestations, and vulnerability information. The speaker explains how TUF's features, such as compromise resilience and delegation, can help ensure the integrity and freshness of this metadata, which is crucial for securing the software supply chain.

Secure by Design CI/CD: Practical Insights from Adobe and Autodesk - Vikram Sethi & Jesse Sanford

The presenters discuss practical insights on securing the CI/CD pipeline, emphasizing the importance of separating CI and CD concerns, and leveraging tools like SLSA, SBOM, and attestations to build trust and visibility across the software supply chain.

AI for Policy and Policy for AI! - P. Lamba, B. Kurktchiev, A. Suderman, R. Petty, J. Ray

This panel discussion explores the intersection of AI and policy in the context of Kubernetes, covering topics such as the role of AI in policy authoring, the challenges of managing AI/ML workloads, and the emerging concepts around policy enforcement within AI models. The panel provides insights on the current state of tools and techniques, as well as the future directions and considerations for effectively integrating AI and policy in the Kubernetes ecosystem.

GitOops... I Did It Again! Protecting Your GitOps System from Being Used for... O. Livni, E. Pticha

This talk explores the security implications of GitOps, a popular DevOps and cloud-native approach, from an attacker's perspective. The presenters share their research on discovering a critical vulnerability in the Argo CD GitOps tool and provide a comprehensive security checklist to help organizations strengthen their GitOps systems.

Running Quantum-Safe Applications on Kubernetes - Paul Schweigert & Michael Maximilien, IBM Quantum

This presentation discusses the importance of preparing for the threat of quantum computing to current cryptographic systems, and outlines steps organizations can take to become 'quantum-safe', including discovering where cryptography is used, observing its usage, and transforming applications to use post-quantum cryptographic algorithms. The speakers also demonstrate how IBM has implemented quantum-safe cryptography in their Quantum platform running on Kubernetes.

Micro-Segmentation and Multi-Tenancy: The Brown M&Ms of Platform Engine... J. Bugwadia, R. Wonnacott

The presentation discusses the importance of micro-segmentation and multi-tenancy in platform engineering, using tools like Cilium and Kyverno to automate security and provide the right guardrails for developers. The speakers demonstrate a live example of how these concepts can be implemented to achieve secure self-service and application isolation within a Kubernetes-based platform.

Powering Automatic Authorization in Envoy Through Live Traffic Inspection - Dom Del Nano

This talk explores how to leverage observability tools like Pixie to automatically generate and update authorization policies in Envoy, ensuring they stay in sync with the evolving microservice environment. The approach combines live traffic inspection, authentication details extraction, and policy generation via Apache Spark, providing a foundation for least-privilege access enforcement at the layer 7 level.

Rogue No More: Securing Kubernetes with Node-Specific Restrictions - Anish Ramasekar, James Munnelly

This talk presents a comprehensive approach to securing Kubernetes by introducing node-specific restrictions and validating admission policies. The speakers demonstrate how to mitigate the risk of privilege escalation and lateral movement within a Kubernetes cluster, leveraging recent advancements in service account token management and admission control mechanisms.

Privacy in the Age of Big Compute - Sal Kimmich, Confidential Computing Consortium, Linux Foundation

The talk explores the concept of confidential computing, which aims to protect sensitive data and computations even during processing, using secure hardware-based primitives. It discusses various use cases, including protecting data in regulated environments, enabling human rights initiatives, and ensuring data sovereignty, highlighting the importance of privacy in the age of big compute.

SPIFFE the Easy Way: Universal X509 and JWT Identities Using cert-manager - Tim Ramlot, Ashley Davis

This talk explores an alternative approach to implementing the SPIFFE (Secure Production Identity Framework for Everyone) identity standard using the cert-manager project in Kubernetes. The presenters demonstrate how to leverage cert-manager's features to simplify the setup and management of SPIFFE identities, including the ability to issue both X.509 and JWT-based identities, and integrate with cloud provider authentication mechanisms.

Seccomp and eBPF; What’s the Difference? Why Do I Need to Know? - Natalia Reka Ivanko, Duffie Cooley

The presentation explores the differences between Seccomp and eBPF, two powerful kernel-level technologies used for securing and observing Linux systems. It highlights the strengths and limitations of each approach, as well as how they can be used together to provide a comprehensive security solution for modern, dynamic container-based environments.

SPIFFE Deployments in Non-Kubernetes Environments - Nadin El-Yabroudi & Eli Nesterov, SPIRL

This talk provides an overview of SPIFFE deployments in non-Kubernetes environments, highlighting the differences and considerations when using SPIFFE in Linux-based systems compared to Kubernetes. The speakers discuss the key concepts of SPIFFE, such as SPIFFE IDs and attestation, and how they can be implemented and leveraged in various deployment scenarios.
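As a small illustration of the SPIFFE ID concept mentioned above: a SPIFFE ID is a URI of the form `spiffe://<trust domain>/<path>`. The sketch below splits one into its parts using only the Python standard library. The `parse_spiffe_id` helper and the example ID are hypothetical, and the checks are deliberately simpler than the full validation rules in the SPIFFE specification.

```python
from urllib.parse import urlparse


def parse_spiffe_id(spiffe_id):
    """Split a SPIFFE ID into (trust_domain, path); raise ValueError if malformed."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe":
        raise ValueError("SPIFFE IDs must use the spiffe:// scheme")
    if not parsed.netloc:
        raise ValueError("SPIFFE IDs must include a trust domain")
    return parsed.netloc, parsed.path


# Hypothetical workload identity for a service running on a Linux host.
trust_domain, path = parse_spiffe_id("spiffe://example.org/host/db-01/postgres")
```

Attestation is the step that decides which workload is entitled to receive a given ID; it is not modeled here.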

The Policy Engines Showdown - G.L. Manor, A. Aguiar, O. Gazitt, P. Jamin, T. Schade, J. Scharmen

This panel discussion explores the various policy engines available for authorization, including Open Policy Agent, OpenFGA, Cedar, and Topaz. The panelists discuss the trade-offs between data-driven and policy-driven approaches, centralized vs. decentralized deployment, stateful vs. stateless architectures, verification vs. correctness, and other key considerations when choosing a policy engine for your use case.

Why Perfect Compliance Is the Enemy of Good Kubernetes Security - Michele Chubirka, Google

The presentation discusses the challenges of achieving perfect compliance in Kubernetes security and emphasizes the importance of focusing on key security principles and architectural decisions to build a more secure and resilient platform. The speaker provides practical advice on navigating identity management, multi-tenancy, container security, and monitoring, emphasizing the need for collaboration between security and platform teams.

Pushing Authorization Further: CEL, Selectors and Maybe RBAC++ - Mo Khan, Rita Zhang, Jordan Liggitt

The talk covers recent and upcoming features in Kubernetes authorization, including structured authorization and authentication configuration, label-based authorization, and efforts to further restrict service account permissions. The presenters also discuss an experimental approach called 'Conditional RBAC++' that aims to provide a more declarative way to express fine-grained authorization policies.

Working Together to Improve Security Visibility in Kubernetes - Rita Zhang & Jeremy Rickard

This talk discusses how different Kubernetes Special Interest Groups (SIGs) and committees work together to improve security visibility and responsiveness to vulnerabilities. It highlights the key roles of the Security Response Committee, SIG Release, SIG Security, and the Kubernetes bug bounty program in the vulnerability disclosure and patching process.

Running a Highly Available Identity and Access Management with Keycloak - R. Emerson, K. Akella

The presentation covers the journey of making Keycloak, an open-source identity and access management solution, highly available across multiple availability zones. It discusses the challenges faced, the architectural solutions implemented, and the future roadmap for enhancing Keycloak's resilience and scalability.

Tutorial: Stop Kubernetes' Revolving Door: A Hands-on Tutoria... S. Raghunathan, R. Lejano, M. Tardy

This tutorial provides a hands-on guide to securing a Kubernetes cluster, covering topics such as authentication and authorization, network security, pod security, and pod placement. The presentation covers best practices and practical steps to reduce the attack surface and ensure the security of your Kubernetes environment.

Open Policy Agent (OPA) Intro, Deep Dive & V1.0 Update - Charlie Egan, Styra & Rita Zhang, Microsoft

This video provides an introduction to Open Policy Agent (OPA), a general-purpose policy engine for building policy-as-code solutions across the stack. It covers the key features of OPA, including its policy language, policy server, and decision logging, as well as an overview of the upcoming OPA 1.0 release and the Gatekeeper project, which integrates OPA with Kubernetes.

SIG Auth & SIG Storage: Secret Guardians - (Secrets Store) CSI Driver and Sync Controller | PLT

The video discusses the Secrets Store CSI Driver and the new Secrets Store Sync Controller, two projects that help manage secrets securely in Kubernetes. The Secrets Store CSI Driver allows Kubernetes to mount multiple secrets from external secret stores into pods, while the Sync Controller decouples the syncing of external secrets into Kubernetes Secrets, providing offline support and reducing exposure to provider rate limits.

Eraser: Cleaning Up Vulnerable Images from Kubernetes Nodes | Project Lightning Talk

Eraser is a CNCF Sandbox project that helps remove non-running, vulnerable images from Kubernetes nodes. The tool provides customizable control over the removal process, including scheduling, image scanning, and targeted cleanup, addressing common issues with stale and vulnerable container images in Kubernetes clusters.

Harbor: Harbor and the World of SBOMs | Project Lightning Talk

This talk introduces Harbor, an open-source project that provides a registry for container images and other artifacts, and discusses the importance of Software Bill of Materials (SBOMs) in software supply chain security and transparency. The speaker demonstrates how Harbor integrates with the SBOM generation tool Trivy, allowing users to easily create and manage SBOMs for their container images.

Kyverno: Level Up Your Cluster - 5 Kyverno Policies You Need Now! | Project Lightning Talk

The video presents five key Kyverno policies that can significantly enhance the management and security of Kubernetes clusters. These policies cover areas such as pod security, just-in-time provisioning of critical resources, image verification, dynamic resource tuning, and automated resource cleanup, providing a comprehensive approach to cluster governance and optimization.

Copa: Project Copacetic - Directly Patch Container Image Vulnerabilities | Project Lightning Talk

Copa is a CNCF Sandbox project that directly patches container image vulnerabilities by updating outdated packages using a scan report. The project has added new features like the Grype scanner plugin, the ability to discard patch layers, and integrations with Docker Desktop and GitHub Actions.

OpenFGA: The Cloud Native Way to Implement Fine Grained Authorization | Project Lightning Talk

OpenFGA is a cloud-native authorization system that implements relationship-based access control, inspired by Google's Zanzibar system. The talk introduces OpenFGA's key features, including its ability to define complex authorization policies that traverse resource hierarchies, and highlights its adoption by various open-source projects.
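The idea of relationship-based access control that traverses a resource hierarchy can be sketched in a few lines. This is a plain Python illustration under stated assumptions, not OpenFGA's API or modeling language: the tuple layout, the `parent` relation name, and the `check` function are hypothetical.

```python
# Relationship tuples in the form (object, relation, subject). All names are hypothetical.
TUPLES = {
    ("folder:eng", "viewer", "user:anne"),    # anne can view the folder
    ("doc:roadmap", "parent", "folder:eng"),  # the doc lives in that folder
    ("doc:budget", "viewer", "user:bob"),     # bob is a direct viewer of one doc
}


def check(user, relation, obj, tuples=TUPLES, seen=None):
    """Return True if user has relation on obj directly or via a parent object."""
    if seen is None:
        seen = set()
    if obj in seen:  # guard against cycles in the hierarchy
        return False
    seen.add(obj)
    if (obj, relation, user) in tuples:
        return True
    # Walk up the hierarchy: a relation on the parent is inherited by the child.
    for o, r, parent in tuples:
        if o == obj and r == "parent" and check(user, relation, parent, tuples, seen):
            return True
    return False
```

Here `check("user:anne", "viewer", "doc:roadmap")` succeeds only because of the folder relationship, which is the kind of hierarchy traversal the summary describes.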

Falco: Evolution of Real Time Cloud Security with Falco | Project Lightning Talk

Falco, a CNCF graduated project, is an open-source runtime security solution that monitors infrastructure, including hosts, Kubernetes, and containers, for security events. The presentation highlights Falco's evolution, including its simple rule language, integration capabilities, and the recent addition of Falco Talon, a response engine that allows users to take automated actions in response to detected security events.

Lightning Talk: Effortless, Sidecar-Less Mutual TLS and Rich Authorization Policies up and... L. Sun

This talk demonstrates how to achieve effortless, sidecar-less mutual TLS and rich authorization policies using Istio's ambient mode. The speaker showcases a live demo that automatically configures mutual TLS and applies layer 7 authorization policies without modifying the client or server applications.

Securing Outgoing Traffic: Building a Powerful Internet Egress Gateway for Re... E. Yang, A. Agarwal

This talk presents the design and implementation of a powerful internet egress gateway at Airbnb, which leverages Envoy proxy to provide secure and transparent outgoing traffic control. The speakers explore different architectural approaches, highlighting the challenges and trade-offs, and ultimately describe their hybrid solution that combines a transparent HTTPS egress and an HTTP CONNECT egress to address the organization's security and observability requirements.

Serverless

Why Serverless Is Trending Again - Matt Butcher, Fermyon & Jay Jenkins, Akamai

The talk explores the resurgence of serverless computing, highlighting how the evolution of the computing continuum, from edge to cloud, has enabled new possibilities for building and deploying serverless applications. The presenters discuss the challenges and opportunities of this new paradigm, including the development tool Spin and the SpinKube platform, which facilitate selective deployments and make it possible to run serverless functions across diverse environments.

0.1 to 1.16: How Has Knative Fulfilled Its Vision? - Calum Murray & Evan Anderson

Knative, an open-source project, has evolved significantly since its initial release over six years ago. The talk covers the project's journey, its key components (Serving, Eventing, and Functions), and how they have addressed the requirements of modern serverless and event-driven architectures, while also highlighting the project's recent advancements and future plans.

Storage

Longhorn: Intro, Deep Dive and Q+A - Phan Le, SUSE

Longhorn is a reliable, scalable, and easy-to-use storage solution for Kubernetes stateful workloads. The presentation covers Longhorn's architecture, features, and performance, highlighting the upcoming V2 data engine with improved performance and new capabilities.

Sustainability

The Spice Must Flow Green: CNCF's Environmental Sustainability TAG - M. Warnicke (Weston), S. Pathak

The video discusses the CNCF's Environmental Sustainability Technical Advisory Group (TAG), which aims to advocate, develop, and support sustainability initiatives in the cloud-native ecosystem. The talk covers the TAG's mission, the landscape of sustainability tools and projects, and the working groups focused on areas like sustainability measurement, advocacy, and AI.

Cloud Native Sustainability Speedrun: Tools from Infrastructure to Applicati... S. Pathak, S. Narang

This talk discusses the important role of the software industry in sustainability, highlighting the significant carbon emissions associated with cloud-native infrastructure and applications. It introduces several open-source tools and projects from the CNCF ecosystem that can help enterprises observe, measure, and reduce their environmental impact, including Kepler, kube-green, and vCluster.

Kepler: How's Things Going in Kepler? | Project Lightning Talk

The Kepler project, a collaboration between Red Hat and IBM, measures energy consumption at the container, pod, and application level in real time, enabling further optimizations. The presentation highlights the project's growth, community engagement, and ongoing research efforts to improve power models and expand the energy matrix for various hardware devices.

Uncategorized

Day 2 - KubeCon + CloudNativeCon North America Highlights

The conference has been a resounding success, offering attendees the opportunity to attend keynotes, showcases, and discussion forums, as well as connect with maintainers and contribute to open-source projects. The connections made at KubeCon + CloudNativeCon North America have been invaluable, fostering a collaborative environment and elevating the careers of attendees.
