LPU-accelerated inference that runs Llama-70B at 500 tokens/sec. The choice when latency matters more than absolute capability.

From Wikipedia

Groq, Inc. is an American artificial intelligence (AI) company that builds an AI accelerator application-specific integrated circuit (ASIC). The architecture was originally introduced as a Tensor Streaming Processor (TSP) but was later rebranded as a Language Processing Unit (LPU) following the widespread adoption of large language models after the breakthrough of ChatGPT. The company also develops related computer hardware and software to accelerate AI inference performance.

