spark-a1b2.local · GB10
GPU Compute
87% · Blackwell SM
Unified Memory
94.2 GB / 128 GB
Mem Bandwidth
218 GB/s / 273 GB/s
GPU Temperature
67°C (throttle: 95°C)
Power Draw
89 W (TDP: 120 W)
Tensor Core · FP4
812 TOPS / 1000 TOPS
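The gauges above all express a used/capacity ratio. A minimal sketch of that headroom math, using the values shown on the dashboard (static numbers here, not a live query):

```python
def utilization(used: float, capacity: float) -> float:
    """Percent of capacity in use, rounded to one decimal."""
    return round(used / capacity * 100, 1)

# Values copied from the gauges above.
mem_pct = utilization(94.2, 128)    # unified memory
bw_pct = utilization(218, 273)      # LPDDR5x bandwidth
tops_pct = utilization(812, 1000)   # FP4 tensor throughput
```

So memory sits at 73.6%, bandwidth at 79.9% of the 273 GB/s ceiling, and tensor throughput at 81.2% of peak FP4.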

System Timeline

GPU Compute Utilization: 87%
Unified Memory Usage: 94.2 GB / 128 GB (event: model load)
Memory Bandwidth (LPDDR5x): 218 GB/s (ceiling: 273 GB/s)
Inference Throughput: 308 tok/s (event: server start)

GPU Processes

5 active
Process            PID    Details                 GPU %   Memory    Tok/s
sglang.server      4821   INTELLECT-3-MoE         72%     52.1 GB   308
python: train.py   5102   llama-finetune run-4    12%     38.4 GB   —
cudf.pandas        5340   metric query            3%      2.8 GB    —
jupyter-lab        3201   idle                    0%      0.4 GB    —
DGX Dashboard      1820   system                  0%      0.1 GB    —
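A per-process list like this can be read from `nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader`. A sketch of a parser for that CSV output; the sample string is illustrative, not captured from this host:

```python
def parse_compute_apps(csv_text: str) -> list[dict]:
    """Parse nvidia-smi --query-compute-apps CSV output into rows."""
    rows = []
    for line in csv_text.strip().splitlines():
        pid, name, mem = [field.strip() for field in line.split(",")]
        # used_memory is reported as e.g. "53350 MiB"; keep the number only.
        rows.append({"pid": int(pid), "name": name, "mem_mib": int(mem.split()[0])})
    return rows

# Hypothetical sample output, two of the five processes above.
sample = "4821, sglang.server, 53350 MiB\n5102, python, 39322 MiB"
procs = parse_compute_apps(sample)
```

Note that per-process tok/s is not something nvidia-smi reports; the dashboard would have to get that from the inference server itself.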

CUDA Kernel Timeline

sglang · INTELLECT-3
SM Compute:   mha_fwd · gemm · silu · mha_fwd · gemm · rmsnorm · mha_fwd · gemm
Tensor Core:  fp4_mma ×6
Memory:       D2D / KV$ ×4
Copy Engine:  H2D · D2H
Decode:       sample ×5
Legend: Compute · Attention · MemCopy · NCCL · Idle
spark-a1b2.local · GB10 Grace Blackwell · DGX OS 7.4.0
Uptime: 14d 3h 22m · SSD: 1.8 / 4.0 TB (SMART: OK)
ConnectX-7: 200 Gbps (QSFP not connected)
Polling: 500 ms · nvidia-smi over SSH
Spark Pulse v0.1.0
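The footer says metrics come from polling nvidia-smi over SSH every 500 ms. A minimal sketch of such a poller; the host name and interval come from the dashboard, while the query fields chosen here are assumptions:

```python
import subprocess
import time

HOST = "spark-a1b2.local"
# Fields assumed for illustration; nvidia-smi supports many more.
QUERY = "utilization.gpu,memory.used,temperature.gpu,power.draw"

def build_poll_cmd(host: str = HOST) -> list[str]:
    """Assemble the ssh + nvidia-smi command the poller would run."""
    return ["ssh", host, "nvidia-smi",
            f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"]

def poll_forever(interval_s: float = 0.5) -> None:
    """Run the query on the remote host every interval_s seconds."""
    while True:
        result = subprocess.run(build_poll_cmd(), capture_output=True, text=True)
        print(result.stdout.strip())
        time.sleep(interval_s)
```

On a unified-memory part like the GB10, `memory.used` semantics may differ from discrete GPUs, so a real dashboard would likely cross-check against OS-level memory stats as well.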