
PyTorch Conference 2024


Table of Contents

AI & ML

Sponsored Keynote: The Lightning AI OSS Stack for Accelerating the AI Lifecycle - Luca Antiga

The video discusses the PyTorch Lightning open-source software (OSS) stack, which has seen significant growth in adoption over the past five years. The speaker highlights how PyTorch Lightning simplifies the training and scaling of models, and how the company has expanded its offerings to include tools for serving models, processing data, and running infrastructure on the cloud.

Sponsored Keynote: Optimizing AI Inference for Large Language Models - Mudhakar Srivatsa, IBM

The talk discusses IBM's journey in optimizing AI inference for large language models, focusing on their work on distributed training, cloud-native training, and Triton kernels. The speaker also highlights their contributions to PyTorch, including speculative decoding, extensions to PyTorch FSDP, and the development of the IBM Spyre AI accelerator as a PyTorch backend for both inference and training.

Keynote: Building an Advanced Knowledge Assistant - Jerry Liu, Co-Founder & CEO, LlamaIndex

The talk discusses the key components for building an advanced knowledge assistant, including high-quality data and retrieval modules, agentic reasoning, and agentic decision-making and output generation. The speaker also highlights the importance of deploying these agents in production using the right infrastructure and architecture to ensure scalability, standardized communication, and support for features like human-in-the-loop and observability.

Unlocking the Enigma: Crafting Unbiased, Transparent, and Explainable Large Languag... Rashmi Nagpal

The talk discusses the challenges and approaches to crafting unbiased, transparent, and explainable large language models. It highlights the need for responsible AI development, emphasizing the importance of fairness evaluation, model explainability, and mitigating biases and hallucinations in these powerful language models.

Building PyTorch Computer Vision Algorithms for 100 Skin Shades - Emmanuel Acheampong, roboMUA

This talk discusses the use of PyTorch for building computer vision algorithms that can accurately handle a wide range of skin tones, from very fair to very dark. The speaker, Emmanuel Acheampong, co-founder of the company yShade.AI, shares their approach to curating a large dataset of 12 million skin tone images and leveraging PyTorch's flexibility and community support to develop predictive and generative models for various beauty and fashion applications.

Sponsored Keynote: Accelerating AI: How AMD and PyTorch Drive Innovation with Sea... Anush Elangovan

AMD's commitment to the PyTorch ecosystem is showcased through its powerful hardware offerings, such as the MI300 and MI325 series, which provide ample memory capacity and bandwidth. The speaker emphasizes AMD's focus on open-source innovation, seamless integration with PyTorch models, and a relentless pursuit of performance and efficiency to empower the AI community.

Keynote Panel Discussion: Scaling & Benchmarking

The panel discussion explores the challenges and opportunities in scaling and benchmarking large language models, including the need for flexible and adaptive evaluation tools, the race for computational resources, and the potential societal impact of these rapidly advancing AI systems. The panelists offer diverse perspectives on the path forward, highlighting the importance of measurement, innovation, and community collaboration in navigating this rapidly evolving landscape.

Lightning Talk: Empowering Developers: Tools and Resources for Running Generative A... Pareena Verma

This talk explores the tools and resources available to developers for running generative AI on Arm-based CPUs. It highlights Arm's efforts to empower developers by providing learning resources, software advancements, and optimized performance for AI inference on Arm-based cloud and edge devices.

The Rise of `Transformers` in the Growing PyTorch Ecosystem - Arthur Zucker, Hugging Face

The talk covers the history and philosophy of the Transformers library in the PyTorch ecosystem, highlighting the transition from a PyTorch-centric focus to a more modular and community-driven approach. The speaker discusses the challenges faced with the rapid growth of models and the introduction of a new modular inheritance-based approach to address these issues while preserving the core principles of the library.

Lightning Talk: FlexAttention - The Flexibility of PyTorch + The Performa... Yanbo Liang & Horace He

FlexAttention is a new API that aims to combine the flexibility of PyTorch with the performance of FlashAttention, allowing various attention variants to be implemented in a few lines of code while still achieving high performance. The API introduces a `score_mod` function that enables a wide range of attention mechanisms, including relative positional encodings, soft capping, causal masking, and document-level masking, all of which are efficiently lowered into fused CUDA kernels.
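The idea is easiest to see in miniature: attention computes query-key scores, and a user-supplied hook rewrites each score before the softmax. The sketch below is a pure-Python illustration of that score-mod concept, not the real FlexAttention API (which also passes batch and head indices, operates on batched tensors, and compiles to fused kernels):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(q, k, v, score_mod):
    """Naive attention over lists of vectors, with a score_mod hook per score."""
    out = []
    for q_idx, qvec in enumerate(q):
        scores = []
        for kv_idx, kvec in enumerate(k):
            s = sum(a * b for a, b in zip(qvec, kvec))   # query-key dot product
            scores.append(score_mod(s, q_idx, kv_idx))   # user hook rewrites the score
        w = softmax(scores)
        out.append([sum(wj * v[j][d] for j, wj in enumerate(w))
                    for d in range(len(v[0]))])
    return out

# Causal masking expressed as a score_mod: future positions get -inf,
# which the softmax turns into zero attention weight.
causal = lambda score, q_idx, kv_idx: score if kv_idx <= q_idx else float("-inf")
```

Swapping in a different one-line `score_mod` (e.g. adding a relative-position bias) is what gives the approach its range of attention variants.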

Lightning Talk: LLMs on Edge with AI Accelerators - Chen Lai, Kimish Patel & Cemal Bilgin, Meta

This talk discusses the challenges and solutions for running large language models (LLMs) on edge devices with AI accelerators. The speakers cover techniques like quantization, memory optimization, and hardware acceleration with partners like Apple, Qualcomm, and MediaTek to enable efficient LLM execution on a wide range of edge devices.

Sponsored Session: Torchchat: A Showcase of PyTorch LLM Ubiquity - Jack Khuu & Jesse White, Meta

Torchchat is a PyTorch-based solution for large language model (LLM) inference, providing an end-to-end pipeline for acquiring, constructing, and deploying LLMs on a variety of platforms, including edge devices. The presenters showcase Torchchat's features, such as eager execution, Torch Compile, and C++ support, as well as plans for future improvements, including distributed computing and new file formats.

Meta Llama 3 and the Future of Responsible AI Development - Spencer Whitman & Vincent Gonguet, Meta

Meta presents its open-source Llama 3 language model and the tools it has developed to enable responsible AI development, focusing on providing powerful models, comprehensive safety testing, and easy-to-use safeguard systems for developers. The talk covers the trade-offs between open-sourcing and responsible development, the technical details of Meta's approach, and a live demonstration of how to deploy a secure coding assistant using Llama and the accompanying safety tools.

Building Scientific Computing Infrastructure Software with the PyTorch Ecosystem - Bharath Ramsundar

The talk discusses the development of scientific computing infrastructure software using the PyTorch ecosystem. The speaker highlights the challenges faced by the DeepChem community in transitioning from legacy frameworks to PyTorch, the importance of long-term governance and stability in scientific software, and the unique requirements of scientific machine learning applications.

Running State-of-Art Gen AI Models on-Device with NPU Acceleration - Felix Baum, Qualcomm

This talk discusses the challenges and solutions in running state-of-the-art generative AI models on mobile devices using Qualcomm's hardware acceleration and software stack. It highlights the importance of optimizing models for on-device deployment, leveraging techniques like quantization, weight sharing, and block quantization to balance performance, accuracy, and power consumption.

Welcome to the PyTorch Ecosystem for LLM Fine-tuning Mini Summit - Kartikay Khandelwal, Meta

This video is a summary of the PyTorch Ecosystem for LLM Fine-tuning Mini Summit, featuring Kartikay Khandelwal from Meta. The speaker introduces the event, highlighting the importance of LLMs and the fine-tuning community, and introduces the first speaker, Joe, who will be discussing Llama.

The State of the Llama Ecosystem - Joe Spisak, Meta

The talk provides an overview of the state of the Llama ecosystem, highlighting the rapid adoption and evolution of the Llama language model. It discusses the various releases, features, and community engagement around Llama, as well as the efforts to turn Llama into a more agentic and open-source system.

Sponsored Session: NeMo-Aligner: A Scalable Toolkit for Model Alignment - Gerald Shen & Jimmy Zhang

This talk presents NeMo-Aligner, a scalable toolkit for model alignment developed by the NVIDIA team. The toolkit implements state-of-the-art alignment techniques while focusing on performance and scalability, with optimizations such as distributed architecture, TensorRT-LLM-based inference, and dynamic load balancing.

Panel Discussion - T. Dettmers, H. Schoelkopf, A. Chowdhery, A. Conneau, Moderated by K. Khandelwal

The panel discussed the current trends and future directions in large language models (LLMs) and multimodal AI systems. Key topics included the importance of test-time compute, the rise of open-source models, the potential for multimodal AI to revolutionize human-computer interaction, and the challenges of evaluating and deploying these complex systems.

Sponsored Session: Accelerating AI Innovation: High Performance PyT... Robert Suderman & Ian Nordeng

The video discusses how AMD is accelerating AI innovation by providing high-performance PyTorch support through the SHARK ecosystem and the IREE compiler. It highlights the benefits of custom operator specialization, profiling, and optimization techniques to achieve significant performance improvements on AMD hardware.

State of PyTorch - Ji Li & Damien Sereni, Meta

The talk provides an overview of the growth and development of PyTorch, highlighting its widespread adoption, active contributor community, and the core maintainers' efforts. The presenters also outline the upcoming features and roadmap for PyTorch, focusing on performance improvements, support for large language models, and advancements in the PyTorch ecosystem.

Lightning Talk: Low Precision Dtypes in PyTorch - Vasiliy Kuznetsov, Meta

This talk discusses the challenges of expressing low-precision data types in existing tensor systems, and how tensor subclasses in PyTorch can be used to build custom low-precision data types with flexible operations and conversions. The speaker also introduces the Torch AO library, which provides native support for low-precision workflows and various optimization techniques for both inference and training.

Sponsored Session: Democratizing AI: Powering the Future with Arm’s Global Comp... Gian Marco Iodice

This talk explores how Arm is democratizing AI by enabling efficient AI deployment across cloud, edge, and mobile devices. The presentation highlights Arm's software ecosystem, performance optimization techniques, and collaborative efforts with major AI frameworks to empower developers in unlocking the potential of AI, including the latest advancements in large language models and generative AI.

Lightning Talk: Distributing a Million Open Models in the Wild: Lessons Learned f... Omar Sanseviero

This lightning talk discusses the challenges and lessons learned in distributing over 1 million open machine learning models through the Hugging Face Hub. The speaker covers topics such as model security, usage trends, and strategies for improving model adoption and long-term usage.

ExecuTorch Beta and on-Device Generative AI Support - Mergen Nachin & Mengtao (Martin) Yuan, Meta

ExecuTorch Beta brings on-device generative AI support for deploying PyTorch models on mobile and edge devices, focusing on Android, iOS, and embedded targets. The talk covers the challenges of deploying AI models on devices, the features of ExecuTorch, and demonstrations of running large language models like Llama and multimodal models like LLaVA on mobile devices.

The Impact and Challenges of Open Source Generative Datasets and Models - Aaron Gokaslan

This talk explores the impact and challenges of open-source generative datasets and models, highlighting the importance of accessibility, transparency, and innovation in the field of AI. The speaker discusses strategies to make open-source models more accessible, efficient, and legally compliant, including the development of the Common Canvas and Common Catalog initiatives.

Lightning Talk: Debiasing the Data Lifecycle - Shailvi Wakhlu, Shailvi Ventures LLC

This lightning talk discusses the importance of addressing data bias throughout the data lifecycle, from data collection to decision-making and application. The speaker outlines various types of data bias, the consequences of ignoring them, and the core principles and practical steps to debias the data lifecycle.

Lightning Talk: d-Matrix LLM Compression Flow Based on Torch.Fx: Simplify... Zifei Xu & Tristan Webb

DMX Compressor is a user-friendly model compression toolkit focused on the fake quantization of large language models, built on top of Torch.FX. It simplifies the process of post-training quantization and quantization-aware training, addressing the challenges of using FX for quantizing LLMs, such as the loss of model attributes and methods, and the dynamic nature of LLM computation graphs.
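Fake quantization, the core operation such flows simulate, keeps tensors in floating point while reproducing the rounding and clamping error of a low-precision integer format. A minimal per-value sketch of the idea (the int8 range and scale below are illustrative assumptions, not d-Matrix's actual flow):

```python
def fake_quantize(x: float, scale: float, zero_point: int = 0,
                  qmin: int = -128, qmax: int = 127) -> float:
    """Simulate int8 quantization error while staying in float."""
    q = round(x / scale) + zero_point   # snap to the integer grid
    q = max(qmin, min(qmax, q))         # clamp to the representable range
    return (q - zero_point) * scale     # dequantize back to float

# With scale 0.01, 0.123 snaps to the grid point 0.12, and values
# beyond the range saturate at qmax * scale = 1.27.
```

Because the output stays in float, the operation can be dropped into an existing forward pass (e.g. via a traced `torch.fx` graph) so training or evaluation sees the precision loss the real integer hardware would introduce.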

Training MoEs at Scale with PyTorch - Mihir Patel & Brian Chu, Databricks

This talk discusses techniques for training large-scale Mixture of Experts (MoE) models using PyTorch, including expert parallelism, hybrid sharded data parallelism, and techniques for handling failure modes and scaling to thousands of GPUs. The presenters share their learnings and open-source tools developed at Databricks to address the challenges of training large AI models at scale.

Compute

Keynote: Enabling Generative AI on the Edge - Cormac Brick, Principal Engineer, Google

The talk highlights the advancements in enabling generative AI on the edge, including the increasing compute power on mobile devices, the rapid progress in smaller generative AI models, and the tools and frameworks developed by the PyTorch community to streamline the deployment of these models on edge devices. The speaker showcases the AI Edge Torch library and the Model Explorer tool, which facilitate the development and optimization of generative AI applications for the edge.

Keynote: Ray: A Distributed Framework for Heterogeneous Computing - Ion Stoica, UC Berkeley

The talk discusses Ray, a distributed framework for heterogeneous computing, which aims to simplify the scaling of AI workloads on complex, heterogeneous infrastructures. The speaker highlights Ray's key features, such as its computation model, support for heterogeneous resources, and recent advancements like accelerated DAGs (aDAG) to improve the efficiency of smaller tasks and GPU-to-GPU data transfers.

Lightning Talk: Optimized PyTorch Inference on aarch64 Linux CPUs - Sunita Nadampalli, Amazon (AWS)

This talk discusses the optimization techniques used to improve PyTorch inference performance on aarch64 Linux CPUs, particularly on AWS Graviton processors. The speaker covers the hardware features, software optimizations, and benchmarks that demonstrate up to 3.5x performance improvements for eager mode and 1.5-2x improvements for TorchScript models.

Understanding the LLM Inference Workload - Mark Moyou, NVIDIA

This talk provides a comprehensive overview of the key concepts and challenges involved in running large language model (LLM) inference at scale. The speaker covers topics such as tokenization, attention mechanisms, memory management, and optimization techniques, offering practical insights and recommendations for effectively deploying and monitoring LLM inference systems.
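A recurring sizing exercise in this workload is estimating KV-cache memory, which often dominates GPU memory at long context lengths. A back-of-the-envelope helper (the model shapes in the example are assumptions, roughly matching a 7B-class model with full multi-head attention):

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV-cache size; the factor of 2 covers keys plus values."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

# Assumed shapes: 32 layers, 32 KV heads, head_dim 128, fp16 (2 bytes).
# A single 4096-token sequence then needs 2 GiB of cache.
size = kv_cache_bytes(32, 32, 128, 4096, 1)
print(size / 2**30)  # → 2.0
```

The same arithmetic explains why grouped-query attention (fewer KV heads) and quantized caches are such effective levers for serving throughput.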

Torchtitan: Large-Scale LLM Training Using Native PyTorch 3D Parallel... Wanchao Liang & Linsong Chu

Torchtitan is a large-scale LLM training library built on native PyTorch 3D parallelism, addressing key challenges in scaling language models. The presentation showcases Torchtitan's composable parallelism techniques, training efficiency optimizations, and ease of adoption demonstrated through a collaboration with IBM Research.

Slaying OOMs - Mark Saroufim & Jane Xu, Meta

This talk presents a comprehensive set of techniques to address the common problem of out-of-memory (OOM) errors during model training. The speakers cover strategies for optimizing parameters, optimizer state, and activations, as well as the use of data parallelism and tensor sharding, to maximize memory usage and enable larger batch sizes for improved throughput.

Intel GPU in Upstream PyTorch: Expanding GPU Choices and Enhancing Back... Eikan Wang & Min Jean Cho

The presentation introduces the integration of Intel GPU support in PyTorch, enabling users to access a wider range of GPU hardware choices and providing a more convenient way to utilize Intel GPUs on both Windows and Linux platforms. The speakers highlight the key pillars of the integration, including runtime support, eager mode, and torch.compile support, and discuss the ongoing efforts to address the challenges of device-specific runtime APIs and achieve runtime generalization for better support of new hardware.

Lightning Talk: Making the Most of Heterogeneous Memory Capacity Using PyTorch - Syed Ahmed, NVIDIA

This talk discusses how PyTorch's memory pools can be leveraged to enable the use of different CUDA allocators within the same program, unlocking features like extended GPU memory-based all-gathers and NVLink-based reductions. The speaker emphasizes that evolving computer systems require programming models to evolve, and PyTorch's memory pool abstraction can help accommodate these changes.

Torch.Compile for Autograd, DDP and FSDP - Will Feng, Chien-Chin Huang & Simon Fan, Meta

The video discusses the integration of PyTorch's Torch.Compile with Autograd, Distributed Data Parallel (DDP), and Fully Sharded Data Parallelism (FSDP) to optimize the performance of distributed training. The presenters explain the challenges and benefits of these integrations, including graph capture, communication-computation overlap, and automated optimization techniques.

Lightning Talk: PyTorch/XLA Auto-Sharding - Yeounoh Chung, Google

This talk introduces PyTorch/XLA Auto-Sharding, a new experimental feature that automatically optimizes the sharding of tensors in distributed PyTorch workloads, leading to significant performance improvements without requiring manual sharding annotations. The speaker demonstrates the benefits of Auto-Sharding on popular language models and the Stable Diffusion training script, while also discussing the trade-offs around increased compilation times.

[HALIDE] A Halide Backend for TorchInductor - Jason Ansel, Meta

This talk presents a new Halide backend for TorchInductor, a PyTorch compiler. The Halide backend aims to serve as a reference for others interested in extending PyTorch Dynamo and TorchInductor with their own kernel-level domain-specific languages, and the speaker discusses the challenges and ongoing work to improve the performance of the Halide backend.

[MLIR] Enabling Composition of Kernels and Compilers - Jacques Pienaar, Google

This talk explores the challenges and solutions in enabling the composition of kernels and compilers for efficient machine learning deployment. It highlights the importance of microkernel design, custom kernel support, and the integration of these components within a compiler pipeline to achieve optimal performance.

[TVM] Universally Deploy Large-language Models via ML Compilation - Tianqi Chen, CMU & OctoAI

The talk discusses the challenges of deploying large language models and proposes a machine learning compilation approach that aims to make the development process more productive and adaptable. The speaker showcases the TVM-based MLC LLM compiler, which enables universal deployment of large language models across diverse hardware platforms, including web browsers, mobile devices, and low-power edge devices.

Sponsored Session: PyTorch Support by Google Enabling Perform... Mark Sherwood & Shauheen Zahirazami

The video discusses the support for PyTorch on Google Cloud, including the use of Cloud TPUs and the open XLA compiler, as well as the integration of PyTorch with Google's AI Edge platform for running machine learning models on mobile and edge devices.

Computer Vision

Lightning Talk: HieroGlyph2Text: A PyTorch-Powered Pipeline for Automated Egyptian H... Susi Gentsch

This talk presents a PyTorch-powered pipeline for automated translation of Egyptian hieroglyphs from images. The pipeline combines object detection, classification, and language modeling techniques to produce interpretive translations of hieroglyphic imagery, laying the groundwork for Egyptologists to share their knowledge and achieve automated, faithful translation of Egyptian hieroglyphs.

Developer Experience

Keynote: Why You Should Think Twice Before Paying for an Evaluation Tool - Chip Huyen, Voltron Data

The talk discusses the challenges of evaluating AI systems and the importance of understanding the failures before adopting evaluation tools. The speaker emphasizes the need to define clear criteria for success and failure, identify the root causes of failures, and design the system to mitigate those failures before relying on evaluation tools.

Lightning Talk: Fast, Scalable Distributed Training with StreamingDataset - Saaketh Narayan

The talk presents a PyTorch-based streaming dataset, designed for fast and scalable distributed training. The system leverages cloud storage, deterministic sample partitioning, and efficient data loading to enable high-throughput training on large datasets across multiple GPUs.

Implementing a Custom Torch.Compile Backend - A Case Study - Maanav Dalal & Yulong Wang, Microsoft

This talk explores the experience of implementing a custom Torch.Compile backend, highlighting the challenges and benefits encountered during the process. The speaker shares insights into the ease of using Torch.Compile, the surprises encountered during model conversion, and the performance gains achieved through custom kernel fusion and optimization.

Lightning Talk: AOTriton: Ahead of Time Triton Kernel Libraries on ROCm - Jeff Daily, AMD

The presentation discusses AOTriton, a project that leverages the Triton compiler to generate ahead-of-time optimized kernel libraries for the PyTorch framework, with a focus on the Flash Attention kernel. The project aims to expand Triton's use case beyond just-in-time compilation, providing a C++ library for efficient eager-mode execution of attention kernels on AMD GPUs.

vLLM: Easy, Fast, and Cheap LLM Serving for Everyone - Woosuk Kwon & Xiaoxuan Liu, UC Berkeley

The video presents vLLM, an open-source project that provides a fast, easy-to-use, and cost-effective large language model (LLM) serving engine. The project leverages PyTorch to offer a wide range of features, including support for various quantization methods, automatic prefix caching, pipeline parallelism, and speculative decoding, making it a versatile and efficient solution for LLM inference.

Lightning Talk: New Activation Checkpointing APIs in PyTorch - Jeffrey Wan & Horace He, Meta

This presentation introduces new activation checkpointing APIs in PyTorch, including selective activation checkpointing and a memory budget-based optimization approach. The speakers discuss the trade-offs between memory usage and computation speed, and how these new APIs provide more flexibility and automation in managing these trade-offs.

Lightning Talk: PyTorch Release Process - Andrey Talman, Meta

The talk provides an in-depth overview of the PyTorch release process, including the rationale, distribution channels, and the various phases involved, such as feature submission, launch readiness, and post-release activities. The speaker also outlines future plans to improve the release process, such as eliminating dependencies on external projects and enhancing the release testing and validation pipeline.

Lightning Talk: What's New for PyTorch Developer Infrastructure - Sahan Paliskara & Catherine Lee

This lightning talk provides an overview of the PyTorch developer infrastructure team and the tools and processes they have developed to enhance the contributor experience for the PyTorch project. The talk covers new features like log search, test dashboards, target termination, and the migration to the Linux Foundation CI, as well as other organizational and infrastructure changes aimed at making contributing to PyTorch easier and more efficient.

Data-Dependent Shapes in PT2 - Edward Yang, Meta

The talk discusses the challenges of handling data-dependent shapes in PyTorch's new compiler stack, PT2. It presents strategies and techniques for compiling models with data-dependent shapes, including using size-oblivious semantics, patching PyTorch code, and leveraging custom operators.

DL Compiler Panel Discussion - P. Tillet, J. Ansel, J. Pienaar, T. Chen, M. Zolotukhin, P. Wu

The panel discussion explores the evolution of deep learning compilers, highlighting the challenges of maintaining generality while achieving high performance across diverse hardware architectures and model architectures. The panelists share their perspectives on the future directions of ML compilers, emphasizing the need for adaptability, customization, and integration at multiple levels of the deployment stack.

A Distributed Stateful Dataloader for Large-Scale Pretraining - Davis Wertheimer & Linsong Chu

The presenters discuss the development of a distributed, stateful data loader for large-scale pre-training tasks, focusing on key features such as checkpoint-ability, auto-rescalability, and efficient shuffling and streaming capabilities. The data loader is designed to be flexible, extensible, and optimized for performance, addressing common challenges encountered in modern large-scale language model training.
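The checkpoint-ability described above comes down to an iterator that can externalize its position and restore it after a restart. A toy illustration of the pattern (the `state_dict`/`load_state_dict` names mirror PyTorch convention and are not the presenters' actual implementation, which must also coordinate state across distributed ranks and workers):

```python
class StatefulLoader:
    """A data loader whose iteration position can be checkpointed and restored."""

    def __init__(self, samples):
        self.samples = list(samples)
        self.pos = 0  # index of the next sample to yield

    def __iter__(self):
        while self.pos < len(self.samples):
            sample = self.samples[self.pos]
            self.pos += 1
            yield sample

    def state_dict(self):
        return {"pos": self.pos}

    def load_state_dict(self, state):
        self.pos = state["pos"]

# Consume two samples, checkpoint, then resume in a fresh loader:
loader = StatefulLoader("abcd")
it = iter(loader)
first_two = [next(it), next(it)]   # ['a', 'b']
ckpt = loader.state_dict()

resumed = StatefulLoader("abcd")
resumed.load_state_dict(ckpt)
rest = list(resumed)               # ['c', 'd'] - no repeats, no skips
```

The hard part at scale is that "position" must stay meaningful under shuffling and rescaling, which is why the presenters emphasize deterministic shuffles and auto-rescalable sharding.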

Lightning Talk: Introduction to Torch.Distributed.Pipelining - Howard Huang & Ke Wen, Meta

This talk introduces Torch.Distributed.Pipelining, a new library in PyTorch that enables efficient model parallelism through pipeline parallelism. The library provides a rich set of scheduling algorithms, automatic model cutting capabilities, and support for custom model representations, making it easier to scale large-scale training on distributed hardware.

[TRITON] Maximizing Kernel Development Productivity Under Performance Constraints - Philip Tillet

Triton, a language developed by the presenter, aims to strike a balance between the flexibility of CUDA and the simplicity of graph compilers, offering increased productivity and performance portability for kernel development. The presenter discusses how Triton has helped his team at OpenAI, allowing researchers to write their own kernels and reducing the maintenance burden on the kernel team.

[MOJO] Lifting PT to New Heights with MAX and Mojo - Mikhail Zolotukhin, Modular

This talk introduces Modular's technology stack, including their custom programming language Mojo and their efforts to build a fast and efficient inference solution for machine learning models. The speaker discusses the challenges of integrating with torch.compile and shares lessons learned, as well as recommendations for developers working on custom backends.

Lightning Talk: Mobile Computational Photography with PyTorch: Low-Light Denoising - Alexis Baudron

This lightning talk provides an overview of mobile computational photography, focusing on low-light denoising using PyTorch. The speaker discusses the challenges of mobile camera hardware, the role of the image signal processor (ISP), and how PyTorch enables rapid iteration on network architectures for tasks like depth-of-field manipulation, super-resolution, and denoising.

Lightning Talk: Extending PyTorch with Custom Python/C++/CUDA Operators - Richard Zou, Meta

The talk discusses how to extend PyTorch with custom Python, C++, and CUDA operators, highlighting the differences between kernels and operators, and the various APIs available for registering custom operators. The speaker also covers the integration of custom operators with PyTorch's subsystems, such as torch.compile and autograd, and the advantages of using Triton kernels over traditional CUDA kernels.

The Challenges of Building an Opinionated Open Source LLM Framework - Wing Lian, Axolotl AI

The presentation discusses the challenges of building an opinionated open-source LLM framework, Axolotl AI, which aims to provide a no-code approach for users interested in fine-tuning large language models. The speaker highlights the importance of focusing on the majority of use cases, leveraging community feedback, and managing dependencies to ensure a reliable and efficient framework.

torchtune: Easy and Accessible Finetuning in Native PyTorch - Evan Smothers, Meta

The video presents an overview of the design and features of the torchtune library, a PyTorch-based tool for efficient and accessible fine-tuning of large language models. It showcases various memory optimization techniques, such as activation checkpointing, reduced-precision optimizers, and tensor offloading, as well as performance improvements through the use of torch.compile and other specialized techniques.

Lightning Talk: A Whirlwind Tour of PyTorch Extension Points - Alban Desmaison, Meta

This talk provides a comprehensive overview of the various extension points available in PyTorch, covering user code, core libraries, the dispatcher, and kernels. The presenter highlights the powerful capabilities of these extension points, including hooks, reparameterization, and custom ops, which enable developers to customize and extend the functionality of PyTorch to suit their specific needs.

Lightning Talk: Building and Supporting the Chinese PyTorch Community: Resources, Tu... Zong Zesheng

The talk presents the efforts to build an inclusive and supportive PyTorch community in China, including creating localized learning materials, engaging the community through various channels, and plans to further enrich the community. The key focus is on reducing the barriers for Chinese developers to access PyTorch resources and fostering a collaborative ecosystem.

Lightning Talk: What’s New in Ex... Angela Yi, Tugsbayasgalan Manlaibaatar, Avik Chaudhuri & Yidi Wu

The talk discusses the evolution of PyTorch's export mechanism, Torch Export, which provides a stable IR representation of PyTorch models for deployment in various environments. The key enhancements include improved tracing, support for dynamic control flow, automatic shape inference, and seamless integration with various runtime platforms.

Lightning Talk: Implementing and Using Iterable Datasets: What Could Go Wrong? - Nicolas Hug, Meta

The talk explores the challenges and potential pitfalls of implementing and using iterable datasets in PyTorch, highlighting issues with parallelism, shuffling, and the need to manage separate random number generator streams. The speaker recommends using existing solutions instead of implementing these complex features from scratch to avoid common mistakes.
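The parallelism pitfall mentioned above is concrete: if every worker process iterates the same stream, each sample is yielded `num_workers` times. A common fix is strided sharding per worker (a sketch of the idea, not torchdata's actual API):

```python
from itertools import islice

def shard_for_worker(stream, worker_id: int, num_workers: int):
    """Give each worker every num_workers-th sample, offset by its id,
    so the union over all workers covers the stream exactly once."""
    return islice(stream, worker_id, None, num_workers)

# With 2 workers over range(10):
# worker 0 sees [0, 2, 4, 6, 8], worker 1 sees [1, 3, 5, 7, 9]
```

Even this simple scheme illustrates the talk's warning: shuffling, uneven shard lengths, and per-worker random number generator streams all interact with it in subtle ways, which is why the speaker recommends reaching for an existing implementation rather than rebuilding one.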

Startup Showcase

The video showcases a startup showcase event where eight startups pitch their AI-powered solutions to a panel of venture capitalists. The startups cover a wide range of applications, including teleportation, multilingual language models, AI-assisted product photography, and AI workflow platforms, highlighting the diverse innovations in the AI industry.

PyTorch Conference 2024 Highlights

The PyTorch Conference 2024 showcased the latest advancements in the popular machine learning framework, highlighting the vibrant community and the importance of in-person collaboration. The event provided a valuable opportunity for attendees to connect with industry experts, share insights, and explore new opportunities in the rapidly evolving field of artificial intelligence.

Ethics

Keynote Panel Discussion: Responsible AI - K. Rooney, K. Varshney, S. Hooker, A. Madry, R. Bommasani

The panel discusses the impact of AI technologies, particularly ChatGPT, and the role of academia and regulation in ensuring responsible development of AI. The panelists share their perspectives on the challenges and opportunities of transparency, data sharing, and aligning AI systems with societal values.

Keynote

Keynote: Welcome Back & Opening Remarks

The keynote address welcomes attendees back to the second day of the PyTorch Conference 2024, highlighting the exciting lineup of speakers, the startup showcase, and the PyTorch Flare Party. The address also announces the winner of the poster award and sets the stage for a day filled with engaging presentations and networking opportunities.

Sponsored Keynote: From Containers to Cognition: Conducting the AI Orchestra - Taylor Dolezal

This talk explores the convergence of containers, cloud-native computing, and artificial intelligence, highlighting the need for common languages and runtimes to enable seamless collaboration and scalable AI development. The speaker showcases a demo that integrates React, Flask, and PyTorch, containerized and deployed on Kubernetes, as an example of the power of this approach.

Keynote: Open Language Models (OLMo): Accelerating the Science of Language Modeling Hanna Hajishirzi

The talk presents the Open Language Models (OLMo) project, which aims to build a fully open ecosystem for advancing the science of language modeling. The project focuses on open data, training, adaptation, and evaluation, with the goal of providing public literacy about AI and closing the gap to proprietary language models.

Keynote: Welcome & Opening Remarks - Matt White, Executive Director, PyTorch Foundation

The keynote address provides an overview of the PyTorch Conference 2024, highlighting the event's growth, the PyTorch Foundation's mission, and the remarkable expansion of the PyTorch ecosystem. The speaker expresses gratitude to the sponsors, organizers, and the PyTorch community, emphasizing their crucial role in the platform's success and the future of AI innovation.

Sponsored Keynote: Enabling AI Everywhere with PyTorch and Intel - Kismat Singh, Intel

This talk provides an update on Intel's efforts to enable PyTorch and AI everywhere, including optimizations for Intel hardware, support for Intel GPUs in PyTorch, and OPEA, an open platform to simplify enterprise adoption of AI. The speaker highlights Intel's commitment to contributing to PyTorch and providing seamless integration and best performance for AI workloads on Intel hardware.

Keynote: PyTorch Technical Deep Dive - P. Bialecki, P. Wu, W. Constable, K. Khandelwal & M. Yuan

The keynote presentation provides a comprehensive overview of PyTorch's evolution, from its early days to the current 2.4 release, highlighting the framework's core values of simplicity, debuggability, and hackability. The presentation also covers the advancements in PyTorch's compiler technology, distributed training capabilities, fine-tuning workflows, and deployment solutions, showcasing the framework's versatility and its ability to support the growing demands of the AI research and development community.

Keynote: Community Awards

The video presents the 2024 PyTorch Community Awards, celebrating the contributions of various individuals to the PyTorch ecosystem. The awards recognize diverse contributions, including new contributors, code reviewers, problem solvers, innovators, trailblazers, and ecosystem champions, highlighting the breadth and depth of the PyTorch community.

Keynote: Navigating the Architectural Timeline of LLMs - Sebastian Raschka, Lightning AI

This talk provides an overview of the architectural timeline of large language models (LLMs) from GPT-1 to the recent Llama 3.1 release. It highlights the key changes in model size, dataset size, training techniques, and architectural innovations that have shaped the evolution of LLMs over the past six years.

Performance Engineering

Pushing the Performance Envelope: An Optimization Study for 3... Suvaditya Mukherjee & Shireen Chand

This study explores optimization techniques to reduce the time taken for 3D generative modeling from text input. The researchers evaluated various methods, including bfloat16, forward chunking, PyTorch compilation, and quantization, and achieved a significant reduction in the overall processing time.
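Of the techniques named above, bfloat16 execution is the simplest to try. A minimal sketch using CPU autocast (the model here is a small stand-in, not the authors' 3D generation pipeline):

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.GELU())
x = torch.randn(8, 256)

# Run matmul-heavy layers in bfloat16 via autocast; the weights stay
# in float32, trading a little precision for speed and memory.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # torch.bfloat16
```

The other techniques in the study compose with this: `torch.compile` wraps the same model, and quantization replaces the float32 weights outright.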

Lightning Talk: On-Device Profiling and Debugging with ExecuTorch - Olivia Liu & Vaun Puri, Meta

This talk presents the ExecuTorch developer tools, which provide on-device profiling and debugging capabilities for PyTorch models running on constrained edge devices. The tools enable developers to identify performance bottlenecks, analyze numerical accuracy issues, and optimize memory usage, facilitating the deployment of PyTorch models on mobile and embedded platforms.

Lightning Talk: In-Transit Machine Learning Using PyTorch on Frontier Exascale System - Vineeth Gutta

This talk presents a workflow for in-transit machine learning using PyTorch on the Frontier exascale system, a collaborative effort to address the challenges of fast and predictive simulations for next-generation laser plasma accelerators. The key aspects include using in-memory data streaming to bypass disk storage constraints, scaling PyTorch models on AMD GPUs, and extracting physics insights from the simulation data through unsupervised learning techniques.

Lightning Talk: Sparsifying Vision Transformers with Minimal Accuracy Loss - Jesse Cai, Meta

This talk discusses techniques for sparsifying Vision Transformers to achieve significant performance improvements with minimal accuracy loss. The speaker covers topics such as acceleration through sparse kernel optimization, techniques for maintaining accuracy through structured pruning and sparse training, and the benefits of composing sparsity with quantization.
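As a rough illustration of the pruning side of this, using PyTorch's generic `torch.nn.utils.prune` utilities rather than the speaker's specific tooling:

```python
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(64, 64)

# Zero out the 50% of weights with smallest magnitude (unstructured
# L1 pruning). Structured variants (prune.ln_structured) remove whole
# rows or channels instead, which is easier to accelerate.
prune.l1_unstructured(layer, name="weight", amount=0.5)

sparsity = (layer.weight == 0).float().mean().item()
print(f"{sparsity:.2f}")  # 0.50
```

The hardware-accelerated semi-structured (2:4) sparsity discussed in the talk additionally constrains the mask pattern so that sparse kernels can exploit it, which is where the speed-ups come from.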

Maximizing Training Throughput Using Torch.Compile and FSDP - L. Chu, A. Viros i Martin, B. Vaughan

This talk discusses techniques for maximizing training throughput with torch.compile and FSDP, including minimizing graph breaks and leveraging activation checkpointing. The speakers share hands-on experience and the trade-offs of applying these techniques to their training setup, giving attendees a practical guide for similar optimizations in their own environments.
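A graph break occurs when torch.compile cannot capture a Python construct, such as a data-dependent `if`. The sketch below (using the `eager` backend so no compiler toolchain is required) contrasts a breaking branch with a `torch.where` rewrite that stays in one graph:

```python
import torch

def branchy(x):
    # Data-dependent Python control flow: Dynamo must stop tracing here.
    if x.sum() > 0:
        return x + 1
    return x - 1

def fused(x):
    # torch.where keeps the decision inside the graph: no break.
    return torch.where(x.sum() > 0, x + 1, x - 1)

x = torch.randn(4)

# fullgraph=True turns any graph break into a hard error,
# which makes breaks easy to find during optimization.
compiled = torch.compile(fused, backend="eager", fullgraph=True)
ok = torch.equal(compiled(x), fused(x))

broke = False
try:
    torch.compile(branchy, backend="eager", fullgraph=True)(x)
except Exception:
    broke = True

print(ok, broke)
```

Hunting down such breaks is exactly the kind of work the speakers describe when combining torch.compile with FSDP.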

Together Goes Brrr: Threading Research & Production with Torch Compile - Pragaash Ponnusamy

The talk discusses the importance of inference in modern applications and the challenges involved, such as memory-bound operations and the need for efficient solutions. It then introduces Torch Compile, a tool that aims to address these challenges by performing various types of fusion to improve performance.

TorchInductor CPU Backend Advancements: New Features and Performance Imp... Jiong Gong & Leslie Fang

The talk presents the latest advancements in the TorchInductor CPU backend, including new features such as C++ vectorized code generation, max-autotune for improved GEMM performance, and Windows support. The talk also provides a deep dive into the optimization techniques used, such as analyzing index expressions, aligning vectorization factors, and optimizing loop schedules, to achieve significant performance improvements across various deep learning models.

Hacks to Make LLM Training Faster - Daniel Han, Unsloth AI

The video discusses various techniques to make large language model (LLM) training faster, including bit representation, tensor cores, algorithms, and high-quality data. The speaker also shares hypotheses about future trends, such as the challenges of training with lower-precision representations and the potential limits of hardware improvements.

Lightning Talk: Understanding and Optimizing PyTorch Models with Thunder - Luca Antiga, Lightning AI

The video introduces Thunder, a source-to-source compiler for PyTorch, which facilitates the manual or automatic manipulation of computations to optimize performance. The talk demonstrates how Thunder can be used to replace PyTorch operations with custom Triton kernels, providing developers with fine-grained control over the execution of their models.

Blobs to Clips: Efficient End-to-End Video Data Loading - Andrew Ho & Ahmad Sharif, Meta

This presentation discusses efficient end-to-end video data loading techniques, including improvements to the video decoder, offline decoding, multi-threading, and GPU-accelerated preprocessing, which resulted in up to 20x improvements in training throughput on a video classification task. The speakers share insights and lessons learned, and encourage the audience to explore the open-source tools and upcoming developments in this area.

Security

Lightning Talk: Beyond Zero: Eliminating Vulnerabili... Patrick Smyth, Dan Fernandez & Srishti Hegde

The talk discusses the importance of securing PyTorch container images, which are widely used in AI applications. The presenters explain how Chainguard has created a minimal, zero-vulnerability PyTorch image by building it fresh, patching as needed, and minimizing the package set to reduce the attack surface.