Philip Kiely
Lead Developer Advocate, Baseten
- Infrastructure: Using fractional H100 GPUs for efficient model serving (with Matt Howard and 3 others)
- Model performance: Benchmarking fast Mistral 7B inference (with Abu Qader and 3 others)
- Model performance: 33% faster LLM inference with FP8 quantization (with Pankaj Gupta and 1 other)
- Model performance: High performance ML inference with NVIDIA TensorRT (with Justin Yi and 1 other)
- Model performance: FP8: Efficient model inference with 8-bit floating point numbers (with Pankaj Gupta and 1 other)
- Infrastructure: The benefits of globally distributed infrastructure for model serving (with Phil Howes and 1 other)
- Model performance: 40% faster Stable Diffusion XL inference with NVIDIA TensorRT (with Pankaj Gupta and 2 others)
- Model performance: Why GPU utilization matters for model inference (with Marius Killinger and 1 other)
- AI engineering: The best open source large language model (Philip Kiely)