PyTorch Conference

PyTorch Conference 2025 - San Francisco

We look forward to seeing you at PyTorch Conference!

Experience Baseten’s AI inference platform, offering industry-leading performance, security, and scale for organizations building AI products.

  • Visit us at Booth P2 for a demo and to get your "Artificially Intelligent" T-shirt!

  • Join us for lunch on October 23 at our workshop with NVIDIA: Dynamo & Dine. Save your spot here.

  • And don't miss our talk:

    Low-Precision Inference without Quality Loss: Selective Quantization & Microscaling
    Speakers: Pankaj Gupta (Co-founder, Baseten) & Philip Kiely (Head of Developer Relations, Baseten)
    Wednesday, October 22, 2025
    3:50pm - 4:15pm PDT
    Room 2005 - 2007

    Everyone wants faster inference, but no one wants to compromise the quality of their model outputs. FP8 quantization offers 30-50% lower latencies for inference on large models, but must be applied carefully to maintain quality. Recently, NVIDIA Blackwell GPUs introduced new microscaling number formats (MXFP8, MXFP4, NVFP4) and new kernel options for low-precision inference. In this talk, Baseten inference engineers will cover practical applications of quantization to quality-sensitive inference tasks with a focus on selecting which parts of the inference system to quantize (weights, activations, KV cache, attention) and how microscaling number formats help preserve dynamic range.
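    To illustrate the dynamic-range point in the abstract, here is a minimal sketch (not the speakers' implementation) that compares one shared scale for a whole tensor against one scale per 32-element block, the block size used by the MX formats. It simulates 4-bit E2M1 elements in pure Python. The helper names (`round_to_fp4`, `pow2_scale`, `quantize`) are hypothetical, and the scale rule is a simplification that rounds the power-of-two scale up so no value saturates; real kernels and the MX specification differ in detail.

```python
import math
import random

# Magnitudes representable by a 4-bit E2M1 float (the element type in MXFP4).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = FP4_GRID[-1]


def round_to_fp4(x):
    """Round x to the nearest E2M1 magnitude, saturating at +/-6."""
    mag = min(abs(x), FP4_MAX)
    nearest = min(FP4_GRID, key=lambda g: abs(g - mag))
    return math.copysign(nearest, x)


def pow2_scale(values):
    """Smallest power-of-two scale that maps max |value| into the FP4 range.

    Assumption: rounding the exponent up is a simplification chosen so that
    nothing clips; it is not the exact rule from any specification.
    """
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 1.0
    return 2.0 ** math.ceil(math.log2(amax / FP4_MAX))


def quantize(values, block_size):
    """Quantize then dequantize, sharing one scale per block of values."""
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scale = pow2_scale(block)
        out.extend(round_to_fp4(v / scale) * scale for v in block)
    return out


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


random.seed(0)
# 256 small values plus one large outlier that lands in the first block.
data = [random.uniform(-1.0, 1.0) for _ in range(256)]
data[3] = 200.0

per_tensor = quantize(data, block_size=len(data))  # one scale for everything
per_block = quantize(data, block_size=32)          # one scale per 32 values

err_tensor = mean_abs_error(data, per_tensor)
err_block = mean_abs_error(data, per_block)
print(f"per-tensor scale error:       {err_tensor:.4f}")
print(f"32-element block scale error: {err_block:.4f}")
```

    With a single scale, the outlier forces every small value to round to zero. With per-block scales, only the block that contains the outlier pays that cost, so the mean error across the tensor is lower.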

Trusted by top engineering and machine learning teams
Bland AI, OpenEvidence, Latent Health, Praktika AI, toby