NVIDIA A10 vs A100 GPUs for LLM and Stable Diffusion inference
This article compares two popular GPUs—the NVIDIA A10 and A100—for model inference and discusses the option of using multi-GPU instances for larger models.
SDXL inference in under 2 seconds: the ultimate guide to Stable Diffusion optimization
SDXL 1.0 initially takes 8-10 seconds to generate a 1024x1024px image on an A100 GPU. Learn how to reduce this to just 1.92 seconds on the same hardware.
Build your own open-source ChatGPT with Llama 2 and Chainlit
Llama 2 rivals GPT-3.5, the model behind ChatGPT, in quality. Chainlit makes it easy to build ChatGPT-like interfaces. This guide shows how to create such an interface with Llama 2.
Build a chatbot with Llama 2 and LangChain
Build a ChatGPT-style chatbot with open-source Llama 2 and LangChain in a Python notebook.
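At its core, a ChatGPT-style chatbot keeps a running message history and feeds it back to the model on every turn. A minimal sketch of that loop in plain Python, with a stubbed `generate` function standing in for the actual Llama 2 call (the stub and all names here are illustrative assumptions, not code from the article):

```python
# Minimal chat-memory sketch. `generate` is a placeholder for a real
# Llama 2 call (e.g., via LangChain); here it only echoes the last line.
def generate(prompt: str) -> str:
    # Placeholder: a real implementation would invoke the model here.
    return f"(model reply to: {prompt.splitlines()[-1]})"

def format_history(history: list[dict]) -> str:
    # Flatten the running conversation into a single prompt string.
    return "\n".join(f"{m['role']}: {m['content']}" for m in history)

def chat_turn(history: list[dict], user_message: str) -> str:
    # Append the user turn, generate a reply with full context,
    # then record the reply so the next turn sees it too.
    history.append({"role": "user", "content": user_message})
    reply = generate(format_history(history))
    history.append({"role": "assistant", "content": reply})
    return reply

history: list[dict] = []
chat_turn(history, "Hello!")
chat_turn(history, "What is Llama 2?")
# history now holds both turns, so the model sees full context each time.
```

The key design point is that the model itself is stateless; all "memory" lives in the history list, which frameworks like LangChain manage for you.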
Deploying and using Stable Diffusion XL 1.0
Deploy Stable Diffusion XL 1.0 for free to generate images from text prompts and invoke Stable Diffusion with the Baseten Python client.
Three techniques to adapt LLMs for any use case
Prompt engineering, retrieval with embeddings and vector databases, and fine-tuning are three ways to adapt large language models (LLMs) to your data and use case.
Understanding NVIDIA’s Datacenter GPU line
This guide helps you navigate NVIDIA’s datacenter GPU lineup and map it to your model serving needs.
Comparing GPUs across architectures and tiers
So what are reliable metrics for comparing GPUs across architectures and tiers? We’ll consider core count, FLOPS, VRAM, and TDP.
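One way to combine those raw metrics into a single figure of merit is throughput per watt. A sketch using approximate numbers from NVIDIA's public data sheets (dense FP16 Tensor Core TFLOPS; treat the exact values as illustrative, not authoritative):

```python
# Compare GPUs on TFLOPS per watt. Spec values are approximate figures
# from public NVIDIA data sheets and are for illustration only.
specs = {
    # name: (dense FP16 Tensor Core TFLOPS, VRAM GB, TDP watts)
    "A10": (125, 24, 150),
    "A100 SXM": (312, 80, 400),
}

def tflops_per_watt(name: str) -> float:
    tflops, _vram, tdp = specs[name]
    return tflops / tdp

for name in specs:
    print(f"{name}: {tflops_per_watt(name):.2f} TFLOPS/W")
```

A derived metric like this is useful precisely because raw core counts are not comparable across architectures, while FLOPS, VRAM, and TDP are measured the same way on every card.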