llama.cpp
product · llm_inference

Overview
Developed by: Georgi Gerganov
Founded: 2023
License: MIT License
Open source: ✓
Primary language: C++
Use case: Running LLM inference on CPU
Technical
Protocols: GGML format
Integrates with: CUDA
Based on: C++
Knowledge graph stats
Claims: 43
Avg confidence: 93%
Avg freshness: 99%
Last updated: yesterday
Wikidata: Q125998452
Trust distribution: 100% unverified

MIT C/C++ library for LLM inference with GGUF support, runs on CPU and GPU

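The overview above mentions GGUF support. As a small illustration of what that format looks like on disk: per the public GGUF specification, files begin with the 4-byte ASCII magic `GGUF`, followed by a little-endian uint32 version and uint64 tensor and metadata key-value counts. The sketch below parses just those leading fields; the header bytes are fabricated for the example, not taken from a real model file.

```python
import struct

def read_gguf_header(data: bytes) -> dict:
    # GGUF files start with the 4-byte ASCII magic "GGUF", then a
    # little-endian uint32 format version, then uint64 tensor and
    # metadata-KV counts (per the GGUF spec).
    if data[:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    version, n_tensors, n_kv = struct.unpack_from("<IQQ", data, 4)
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Fabricated minimal header, for illustration only.
header = b"GGUF" + struct.pack("<IQQ", 3, 0, 0)
print(read_gguf_header(header))  # {'version': 3, 'tensors': 0, 'metadata_kv': 0}
```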

based on

Value | Trust | Confidence | Freshness | Sources
C++ | Unverified | High | Fresh | 1
C++ programming language | Unverified | High | Fresh | 1
GGML tensor library | Unverified | High | Fresh | 1
GGML library | Unverified | Moderate | Fresh | 1

supports model

Value | Trust | Confidence | Freshness | Sources
LLaMA | Unverified | High | Fresh | 1
LLaMA models | Unverified | High | Fresh | 1
Alpaca models | Unverified | High | Fresh | 1
Vicuna models | Unverified | High | Fresh | 1
Alpaca | Unverified | High | Fresh | 1
Vicuna | Unverified | Moderate | Fresh | 1
Code Llama | Unverified | Moderate | Fresh | 1
GPT4All | Unverified | Moderate | Fresh | 1

programming language

Value | Trust | Confidence | Freshness | Sources
C++ | Unverified | High | Fresh | 1

open source

Value | Trust | Confidence | Freshness | Sources
true | Unverified | High | Fresh | 1

pricing model

Value | Trust | Confidence | Freshness | Sources
free | Unverified | High | Fresh | 1

integrates with

Value | Trust | Confidence | Freshness | Sources
CUDA | Unverified | High | Fresh | 1

primary use case

Value | Trust | Confidence | Freshness | Sources
Running LLM inference on CPU | Unverified | High | Fresh | 1
local inference of large language models | Unverified | High | Fresh | 1
CPU inference for LLaMA models | Unverified | High | Fresh | 1
LLM inference on CPU | Unverified | High | Fresh | 1
CPU-based LLM inference | Unverified | High | Fresh | 1
Local LLM inference with minimal dependencies | Unverified | High | Fresh | 1

supports quantization

Value | Trust | Confidence | Freshness | Sources
true | Unverified | High | Fresh | 1
4-bit quantization | Unverified | High | Fresh | 1
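The 4-bit claim above can be illustrated with a toy symmetric block quantizer: each block of float weights is stored as one float scale plus integer codes in [-8, 7]. This is a simplified sketch of the general technique only, not llama.cpp's actual Q4_0/Q4_K layouts, which pack two codes per byte and choose scales differently.

```python
def quantize_q4_block(block):
    """Symmetric 4-bit quantization of one block of floats.
    Simplified illustration; not llama.cpp's exact Q4_0 format."""
    amax = max(abs(x) for x in block)
    scale = amax / 7.0 if amax > 0 else 1.0
    # Map each weight to the nearest integer code in [-8, 7].
    q = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, q

def dequantize_q4_block(scale, q):
    """Recover approximate float weights from scale + codes."""
    return [scale * v for v in q]

weights = [0.12, -0.5, 0.33, 0.07]
scale, q = quantize_q4_block(weights)
restored = dequantize_q4_block(scale, q)
# Rounding error per weight stays within half a quantization step.
print(max(abs(w - r) for w, r in zip(weights, restored)) <= scale / 2)  # True
```

The trade-off shown here is the one the claims describe: 4-bit codes shrink weights roughly 8x relative to float32, at the cost of bounded per-weight rounding error.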

supports platform

Value | Trust | Confidence | Freshness | Sources
Linux | Unverified | High | Fresh | 1
macOS | Unverified | High | Fresh | 1
Windows | Unverified | High | Fresh | 1

quantization support

Value | Trust | Confidence | Freshness | Sources
GGML format | Unverified | High | Fresh | 1

requires

Value | Trust | Confidence | Freshness | Sources
CPU | Unverified | High | Fresh | 1
C++ compiler | Unverified | High | Fresh | 1
quantized model weights | Unverified | High | Fresh | 1
no GPU | Unverified | High | Fresh | 1

platform support

Value | Trust | Confidence | Freshness | Sources
cross-platform | Unverified | High | Fresh | 1

maintained by

Value | Trust | Confidence | Freshness | Sources
Georgi Gerganov | Unverified | High | Fresh | 1

developed by

Value | Trust | Confidence | Freshness | Sources
Georgi Gerganov | Unverified | High | Fresh | 1

license type

Value | Trust | Confidence | Freshness | Sources
MIT License | Unverified | High | Fresh | 1

supports protocol

Value | Trust | Confidence | Freshness | Sources
GGML format | Unverified | High | Fresh | 1

supports hardware

Value | Trust | Confidence | Freshness | Sources
CUDA GPUs | Unverified | High | Fresh | 1
Apple Silicon | Unverified | High | Fresh | 1

uses quantization

Value | Trust | Confidence | Freshness | Sources
4-bit and 8-bit | Unverified | High | Fresh | 1

founded year

Value | Trust | Confidence | Freshness | Sources
2023 | Unverified | Moderate | Fresh | 1

alternative to

Value | Trust | Confidence | Freshness | Sources
Hugging Face Transformers | Unverified | Moderate | Fresh | 1
PyTorch inference | Unverified | Moderate | Fresh | 1


Claim count: 43 · Last updated: 4/9/2026