| Model | Source · Family | Quant / Format | Status |
|---|---|---|---|
| INTELLECT-3-MoE | Prime Intellect · 100B+ MoE | FP4 | Loaded |
| Hermes-3-70B | Nous Research · Llama 3.1 | FP8 | Cached |
| NousCoder-14B | Nous Research · Qwen3-14B RL | FP4 | Cached |
| Llama-3.3-70B-Instruct | Meta · Llama 3.3 | Q4_K_M | On disk |
| GPT-OSS-120B | OpenAI · MoE | FP4 | On disk |
| DeepSeek-R1-70B | DeepSeek · Reasoning | FP8 | On disk |
| Hermes-Agent-7B | Nous Research · Agentic | BF16 | Cached |
| NuminaMath-QwQ-CoT-5M | Prime Intellect · Reasoning Traces | Parquet | On disk |
| Atropos-RL-24K | Nous Research · Competitive Programming | JSONL | On disk |
## INTELLECT-3-MoE (Prime Intellect)

A 100B+ parameter Mixture-of-Experts reasoning model, state-of-the-art on math, code, science, and reasoning. Trained via decentralized RL on PRIME-RL.
### Architecture

- Parameters: 103B total
- Active parameters: 14B per token
- Experts: 64 total, 8 active per token
- Context length: 128K tokens
- Vocabulary size: 152K
- Layers: 64
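The 64/8 expert split means a learned gate routes each token to 8 of 64 expert FFNs, which is why only ~14B of the 103B parameters are touched per token. Below is a minimal sketch of top-k gating in Python/NumPy; the dimensions and names are illustrative, not INTELLECT-3's actual implementation.

```python
import numpy as np

def top_k_gate(hidden, gate_weights, k=8):
    """Route one token to k of n_experts via a learned linear gate.

    hidden:       (d_model,) token hidden state
    gate_weights: (d_model, n_experts) gating matrix
    Returns the chosen expert indices and their normalized mixing weights.
    """
    logits = hidden @ gate_weights            # (n_experts,) gate scores
    top_idx = np.argsort(logits)[-k:]         # indices of the k largest scores
    top_logits = logits[top_idx]
    # Softmax over the selected experts only (standard top-k MoE gating)
    weights = np.exp(top_logits - top_logits.max())
    weights /= weights.sum()
    return top_idx, weights

# Illustrative dimensions: 64 experts, 8 active, as in the spec list above
rng = np.random.default_rng(0)
d_model, n_experts = 4096, 64
token = rng.standard_normal(d_model)
gate = rng.standard_normal((d_model, n_experts)) * 0.02
experts, weights = top_k_gate(token, gate, k=8)
print(experts, weights.round(3))  # 8 expert ids, mixing weights summing to 1
```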
### Memory Fit · GB10

- Model + KV cache (8K ctx): 52 / 128 GB (40.6%)
- Fine-tune (LoRA r=16, bs=4): 89 / 128 GB (69.5%)
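The inference figure is dominated by the FP4 weights: 103B parameters at 4 bits is about 51.5 GB, and an 8K-token KV cache adds only around a gigabyte at this scale. A back-of-the-envelope check in Python; the KV head count, head dimension, and KV precision are assumptions, not published specs:

```python
# Rough memory estimate behind the "52 / 128 GB" figure above.
# Known from the spec list: 103B params, FP4 weights, 64 layers, 8K context.
# Assumed for illustration: 8 KV heads, head_dim 128, FP8 (1-byte) KV cache.

params = 103e9
weight_gb = params * 4 / 8 / 1e9          # FP4 = 4 bits/param -> ~51.5 GB

layers, ctx = 64, 8192
kv_heads, head_dim, kv_bytes = 8, 128, 1  # assumptions
kv_gb = 2 * layers * ctx * kv_heads * head_dim * kv_bytes / 1e9  # K + V -> ~1.1 GB

print(f"weights ~{weight_gb:.1f} GB, KV cache ~{kv_gb:.1f} GB, "
      f"total ~{weight_gb + kv_gb:.1f} of 128 GB")
```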
### Spark Compatibility

- FP4 Tensor Cores: ✓ native
- Memory capacity: ✓ 52 / 128 GB
- Bandwidth bound: ⚠ 273 GB/s
- 2× Spark cluster: ✓ FP8 capable
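The ⚠ on bandwidth is the practical ceiling: single-stream decode must stream the active weights from memory for every generated token, so throughput is roughly memory bandwidth divided by active bytes per token. With 14B active parameters at FP4 (~7 GB) against 273 GB/s, that caps decode near 40 tokens/s before any overhead; a rough estimate, ignoring KV-cache reads and kernel efficiency:

```python
# Rough upper bound on single-stream decode speed for a bandwidth-bound MoE.
# Each generated token must read the active expert weights from memory.

bandwidth_gbps = 273   # GB10 memory bandwidth, from the compatibility list
active_params = 14e9   # active parameters per token, from the spec list
bytes_per_param = 0.5  # FP4 = 4 bits

active_gb_per_token = active_params * bytes_per_param / 1e9  # ~7 GB/token
tokens_per_s = bandwidth_gbps / active_gb_per_token          # ~39 tok/s ceiling
print(f"~{tokens_per_s:.0f} tokens/s upper bound (ignores KV reads and overhead)")
```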
### Benchmarks

| Benchmark | Score |
|---|---|
| MATH-500 | 91.2 |
| LiveCode v6 | 74.1 |
| GPQA | 68.4 |
| MMLU-Pro | 82.7 |
| ARC-C | 95.3 |
### Quick Serve Config

```yaml
# spark-control auto-generated
backend: sglang
model: /models/intellect-3-moe-fp4
quantization: fp4
tensor_parallel: 1
max_model_len: 8192
gpu_memory_utilization: 0.85
api_compat: openai
port: 8000
```
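Since the config declares `api_compat: openai` on port 8000, any OpenAI-compatible client should be able to talk to the served model. A minimal sketch using the `openai` Python package; the endpoint path and model id assume the usual SGLang-style OpenAI-compatible server, so adjust them to whatever your deployment actually exposes:

```python
from openai import OpenAI

# Point the client at the local server instead of api.openai.com.
# The api_key is a placeholder; local OpenAI-compatible servers usually ignore it.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="intellect-3-moe-fp4",  # assumed: served model id mirrors the config path
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    max_tokens=512,
)
print(resp.choices[0].message.content)
```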