News
google/gemma-4-12B-it API & Inference Endpoint
2+ hour, 32+ min ago (810+ words) GLM-5.2 is live: #1 throughput on OpenRouter, pay-per-token on FriendliAI. Try it today. Run this model inference on a single-tenant GPU with unmatched speed and reliability at scale. Talk with our engineers to get a quote for reserved GPU…
google/gemma-4-12B API & Inference Endpoint
1+ day, 12+ hour ago (808+ words) Hit your SLA, cut costs. Download the Friendli Guide to Inference Performance Optimization. Run this model inference on a single-tenant GPU with unmatched speed and reliability at scale. Talk with our engineers to get a quote for reserved GPU instances…
moonshotai/Kimi-K2.7-Code API & Inference Endpoint
2+ day, 11+ hour ago (350+ words) Run this model inference on a single-tenant GPU with unmatched speed and reliability at scale. Run this model inference with full control and performance in your environment. …
deepseek-ai/DeepSeek-V4-Flash API & Inference Endpoint
3+ week, 2+ day ago (213+ words) Run this model inference on a single-tenant GPU with unmatched speed and reliability at scale. Talk with our engineers to get a quote for reserved GPU instances…
Run NVIDIA's Most Powerful Open Reasoning Model on Day 0: Nemotron 3 Ultra on FriendliAI
3+ week, 1+ day ago (745+ words) NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model built for long-running autonomous agents. As part of the NVIDIA Nemotron family of open models for agentic…
Deploy Kimi K2.6 on Dedicated Endpoints
4+ week, 2+ day ago (908+ words) Kimi K2.6 is built for the next phase of AI applications: not just chat, but autonomous coding, long-running agent workflows, multimodal understanding, and coordinated task execution. Developed by…
MiniMaxAI/MiniMax-M2.5 - Fast, Reliable, and Scalable Inference on FriendliAI
4+ week, 2+ day ago (590+ words) Run this model inference with a simple API call. Run this model inference on a single-tenant GPU with unmatched speed and reliability at scale. Run this model…
FriendliAI San Francisco Office
1+ mon, 2+ week ago (409+ words) Scale on FriendliAI and get up to $50K inference credit! Apply now. "San Francisco is the epicenter of AI innovation, and a deeper presence here lets us partner with the customers and developers shaping what comes next," said Friendli…
Gemma-4-31B-it API on FriendliAI: #1 Output Speed & Response Time
1+ mon, 2+ week ago (674+ words) Gemma-4-31B-it is the largest model in Google DeepMind's Gemma 4 open-weight family. The model is live on FriendliAI, and our Model API delivers…
Customer use case: LG AI Research Powers K-EXAONE Production Deployment with FriendliAI
1+ mon, 3+ week ago (518+ words) Moving K-EXAONE from research into real-world deployment meant finding an inference platform that could hold up under production demands, one that was fast, cost-efficient, and flexible enough…