News

Friendli AI
friendli.ai > models > google > gemma-4-12B-it

google/gemma-4-12B-it API & Inference Endpoint

2+ hour, 32+ min ago (810+ words) GLM-5.2 is live. #1 throughput on OpenRouter, pay-per-token on Friendli AI. Try it today. Run inference for this model on a single-tenant GPU with unmatched speed and reliability at scale. Talk with our engineer to get a quote for reserved GPU…

Friendli AI
friendli.ai > models > google > gemma-4-12B

google/gemma-4-12B API & Inference Endpoint

1+ day, 12+ hour ago (808+ words) Hit your SLA, cut costs. Download the Friendli Guide to Inference Performance Optimization. Run inference for this model on a single-tenant GPU with unmatched speed and reliability at scale. Talk with our engineer to get a quote for reserved GPU instances…

Friendli AI
friendli.ai > models > moonshotai > Kimi-K2.7-Code

moonshotai/Kimi-K2.7-Code API & Inference Endpoint

2+ day, 11+ hour ago (350+ words) Run inference for this model on a single-tenant GPU with unmatched speed and reliability at scale. Run inference for this model with full control and performance in your environment…

Friendli AI
friendli.ai > models > deepseek-ai > DeepSeek-V4-Flash

deepseek-ai/DeepSeek-V4-Flash API & Inference Endpoint

3+ week, 2+ day ago (213+ words) Run inference for this model on a single-tenant GPU with unmatched speed and reliability at scale. Talk with our engineer to get a quote for reserved GPU instances…

Friendli AI
friendli.ai > blog > nvidia-nemotron-3-ultra

Run NVIDIA's Most Powerful Open Reasoning Model on Day 0: Nemotron 3 Ultra on Friendli AI

3+ week, 1+ day ago (745+ words) NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model built for long-running autonomous agents. As part of the NVIDIA Nemotron family of open models for agentic…

Friendli AI
friendli.ai > blog > kimi-k2-6

Deploy Kimi K2.6 on Dedicated Endpoints

4+ week, 2+ day ago (908+ words) Kimi K2.6 is built for the next phase of AI applications: not just chat, but autonomous coding, long-running agent workflows, multimodal understanding, and coordinated task execution. Developed by…

Friendli AI
friendli.ai > models > MiniMaxAI > MiniMax-M2.5

MiniMaxAI/MiniMax-M2.5 - Fast, Reliable, and Scalable Inference on Friendli AI

4+ week, 2+ day ago (590+ words) Run inference for this model with a simple API call. Run inference for this model on a single-tenant GPU with unmatched speed and reliability at scale. Run this model…

Friendli AI
friendli.ai > blog > friendliai-sf-office

Friendli AI San Francisco Office

1+ mon, 2+ week ago (409+ words) Scale on Friendli AI and get up to $50K inference credit! Apply now. "San Francisco is the epicenter of AI innovation, and a deeper presence here lets us partner with the customers and developers shaping what comes next," said Friendli…

Friendli AI
friendli.ai > blog > gemma-4-31b-it

Gemma-4-31B-it API on Friendli AI: #1 Output Speed & Response Time

1+ mon, 2+ week ago (674+ words) Gemma-4-31B-it is the largest model in the Gemma 4 open-weight family by Google DeepMind. The model is live on Friendli AI, and our Model API delivers…

Friendli AI
friendli.ai > customers > lg-ai-research

Customer use case: LG AI Research Powers K-EXAONE Production Deployment with Friendli AI

1+ mon, 3+ week ago (518+ words) Moving K-EXAONE from research into real-world deployment meant finding an inference platform that could hold up under production demands, one that was fast, cost-efficient, and flexible enough…
