Comprehensive AI model performance analysis across hardware platforms

- Average Performance Gain: 65.9% (across all models)
- Average Cost Savings: 34.9% (hardware costs)
- Average Power Efficiency: 34.9% (power saving)
- Models Tested: 62 configurations
- Peak Throughput: 21,015.55 tokens/sec

| Model | Model Type | Category | Hardware | Size | Precision | Without EdgeMatrix (tokens/sec) | With EdgeMatrix (tokens/sec) | Improvement | Cost Savings | Power Efficiency |
|---|---|---|---|---|---|---|---|---|---|---|
Shakti-1B Shakti Family | Multimodal | GPU | A100 (80GB) | 1.88GB | FP16 | 16,250 | 21,015.55 | 29.3% | 22.7% | 22.7% |
Qwen2-VL-2B Qwen Family | Multimodal | GPU | A100 (80GB) | 4.419GB | FP16 | 12,690.3 | 19,963.4 | 57.3% | 36.4% | 36.4% |
DeepSeek-R1-Distill-Qwen-1.5B DeepSeek Family | Dense | GPU | A100 (80GB) | 3.55GB | FP16 | 12,931.42 | 18,104.78 | 40.0% | 28.6% | 28.6% |
Qwen2.5-VL-3B Qwen Family | Multimodal | GPU | A100 (80GB) | 7.51GB | FP16 | 8,234.11 | 13,903.65 | 68.9% | 40.8% | 40.8% |
DeepSeek-R1-Distill-Qwen-1.5B DeepSeek Family | Dense | GPU | L40s (48GB) | 3.55GB | FP16 | 7,998 | 13,040.77 | 63.1% | 38.7% | 38.7% |
Shakti-2.5B Shakti Family | Dense | GPU | H100 (80GB) | 6.43GB | FP8 | 6,672.57 | 12,403.72 | 85.9% | 46.2% | 46.2% |
Shakti-1B Shakti Family | Multimodal | GPU | L40s (48GB) | 1.88GB | FP16 | 7,570 | 12,064.4 | 59.4% | 37.3% | 37.3% |
DeepSeek-R1-Distill-Qwen-1.5B DeepSeek Family | Dense | GPU | A100 (80GB) | 1.12GB | Q4 | 9,004.67 | 11,412.41 | 26.7% | 21.1% | 21.1% |
Shakti-2.5B Shakti Family | Dense | GPU | H100 (80GB) | 6.43GB | FP16 | 6,615.88 | 11,110.48 | 67.9% | 40.5% | 40.5% |
Qwen2-VL-2B Qwen Family | Multimodal | GPU | L40s (48GB) | 4.419GB | FP16 | 6,509.44 | 10,659.19 | 63.7% | 38.9% | 38.9% |
Shakti-4B Shakti Family | Multimodal | GPU | A100 (80GB) | 7.42GB | FP16 | 7,564 | 10,099.3 | 33.5% | 25.1% | 25.1% |
Gemma-2-2B-IT Gemma Family | Dense | GPU | A100 (80GB) | 5.14GB | FP16 | 7,256.9 | 9,954.13 | 37.2% | 27.1% | 27.1% |
Gemma-2-2B-IT Gemma Family | Dense | GPU | A100 (80GB) | 1.71GB | Q4 | 8,904.67 | 9,929.41 | 11.5% | 10.3% | 10.3% |
InternVL2.5-4B InternVL Family | Multimodal | GPU | A100 (80GB) | 7.42GB | FP16 | 4,576.06 | 9,542.87 | 108.5% | 52.0% | 52.0% |
Llama-3-8B Llama Family | Dense | GPU | H100 (80GB) | 16GB | FP8 | 6,669.55 | 9,490.71 | 42.3% | 29.7% | 29.7% |
Gemma-3-4B-IT Gemma Family | Multimodal | GPU | A100 (80GB) | 8.6GB | FP16 | 5,798.23 | 9,462.91 | 63.2% | 38.7% | 38.7% |
Qwen2.5-VL-3B Qwen Family | Multimodal | GPU | L40s (48GB) | 7.51GB | FP16 | 1,738.95 | 9,188.68 | 428.4% | 81.1% | 81.1% |
DeepSeek-R1-Distill-Qwen-1.5B DeepSeek Family | Dense | GPU | L40s (48GB) | 1.12GB | Q4 | 7,011.01 | 8,527.61 | 21.6% | 17.8% | 17.8% |
Llama-3.1-8B Llama Family | Dense | GPU | H100 (80GB) | 16GB | FP16 | 6,473.07 | 7,956.74 | 22.9% | 18.6% | 18.6% |
Shakti-2.5B Shakti Family | Dense | GPU | A100 (80GB) | 6.43GB | FP16 | 6,120.03 | 7,854.42 | 28.3% | 22.1% | 22.1% |
Phi-4-mini-Instruct Phi Family | Dense | GPU | A100 (80GB) | 7.67GB | FP16 | 5,285.01 | 6,870.52 | 30.0% | 23.1% | 23.1% |
Llama-3.2-3B Llama Family | Dense | GPU | A100 (80GB) | 2.02GB | Q4 | 4,014.23 | 6,852.23 | 70.7% | 41.4% | 41.4% |
Phi-4-mini-reasoning Phi Family | Dense | GPU | A100 (80GB) | 7.67GB | FP16 | 3,105.49 | 6,607.25 | 112.8% | 53.0% | 53.0% |
Llama-3.2-3B Llama Family | Dense | GPU | A100 (80GB) | 6.6GB | FP16 | 3,581.4 | 6,362.89 | 77.7% | 43.7% | 43.7% |
Gemma-3-4B-IT Gemma Family | Multimodal | GPU | A100 (80GB) | 8.6GB | FP16 | 4,998.02 | 6,063.09 | 21.3% | 17.6% | 17.6% |
Shakti-4B Shakti Family | Multimodal | GPU | L40s (48GB) | 7.42GB | FP16 | 2,339 | 5,789 | 147.5% | 59.6% | 59.6% |
Gemma-2-2B-IT Gemma Family | Dense | GPU | L40s (48GB) | 1.71GB | Q4 | 4,501 | 5,699.2 | 26.6% | 21.0% | 21.0% |
Shakti-2.5B Shakti Family | Dense | GPU | L40s (48GB) | 6.43GB | FP16 | 3,122.92 | 5,612.24 | 79.7% | 44.4% | 44.4% |
InternVL2.5-4B InternVL Family | Multimodal | GPU | L40s (48GB) | 7.42GB | FP16 | 2,951.75 | 5,569.34 | 88.7% | 47.0% | 47.0% |
Llama-3.2-3B Llama Family | Dense | GPU | L40s (48GB) | 2.02GB | Q4 | 3,163.2 | 5,183.45 | 63.9% | 39.0% | 39.0% |
InternVL2-2B InternVL Family | Multimodal | GPU | A100 (80GB) | 4.41GB | FP16 | 2,074 | 4,970.63 | 139.7% | 58.3% | 58.3% |
Llama-3.1-8B Llama Family | Dense | GPU | A100 (80GB) | 16GB | FP16 | 3,796.37 | 4,875.86 | 28.4% | 22.1% | 22.1% |
Gemma-2-2B-IT Gemma Family | Dense | GPU | L40s (48GB) | 5.14GB | FP16 | 2,862.5 | 4,698.01 | 64.1% | 39.1% | 39.1% |
InternVL2-2B InternVL Family | Multimodal | GPU | L40s (48GB) | 4.41GB | FP16 | 1,959 | 4,684.38 | 139.1% | 58.2% | 58.2% |
Llama-Guard-3-8B Llama Family | Dense | GPU | A100 (80GB) | 16.07GB | FP16 | 3,528.9 | 4,517 | 28.0% | 21.9% | 21.9% |
Llama-3.2-3B Llama Family | Dense | GPU | L40s (48GB) | 6.6GB | FP16 | 1,676.3 | 4,467.32 | 166.5% | 62.5% | 62.5% |
DeepSeek-R1-Distill-Llama-8B DeepSeek Family | Dense | GPU | A100 (80GB) | 16.06GB | FP16 | 3,424.03 | 4,417.09 | 29.0% | 22.5% | 22.5% |
Qwen1.5-MoE-A2.7B-Chat Qwen Family | MoE | GPU | A100 (80GB) | 28.63GB | FP16 | 4,283.78 | 4,378.79 | 2.2% | 2.2% | 2.2% |
Janus-Pro-7B DeepSeek Family | Multimodal | GPU | A100 (80GB) | 14.84GB | FP16 | 2,654 | 4,235 | 59.6% | 37.3% | 37.3% |
Gemma-2-9B-IT Gemma Family | Dense | GPU | A100 (80GB) | 18.48GB | FP16 | 3,269.69 | 4,184.52 | 28.0% | 21.9% | 21.9% |
Phi-4-mini-Instruct Phi Family | Dense | GPU | A100 (80GB) | 2.49GB | Q4 | 3,121.23 | 4,117.11 | 31.9% | 24.2% | 24.2% |
Phi-4-mini-reasoning Phi Family | Dense | GPU | A100 (80GB) | 2.49GB | Q4 | 3,281.02 | 4,109.32 | 25.2% | 20.2% | 20.2% |
LLaVA-OneVision-Qwen2-7B LLaVA Family | Multimodal | GPU | A100 (80GB) | 16.06GB | FP16 | 1,850 | 4,101.7 | 121.7% | 54.9% | 54.9% |
Gemma-3-4B-IT Gemma Family | Multimodal | GPU | A100 (80GB) | 2.49GB | Q4 | 3,039.12 | 4,007.55 | 31.9% | 24.2% | 24.2% |
Qwen3-4B Qwen Family | Dense | GPU | A100 (80GB) | 2.2GB | Q4 | 3,068.12 | 3,917.55 | 27.7% | 21.7% | 21.7% |
Gemma-3-4B-IT Gemma Family | Multimodal | GPU | L40s (48GB) | 8.6GB | FP16 | 2,220.35 | 3,903 | 75.8% | 43.1% | 43.1% |
DeepSeek-R1-Distill-Llama-8B DeepSeek Family | Dense | GPU | A100 (80GB) | 4.92GB | Q4 | 3,098.77 | 3,884 | 25.3% | 20.2% | 20.2% |
Qwen3-4B Qwen Family | Dense | GPU | A100 (80GB) | 8.1GB | FP16 | 3,105.59 | 3,837.05 | 23.6% | 19.1% | 19.1% |
Llama-3.1-8B Llama Family | Dense | GPU | A100 (80GB) | 4.92GB | Q4 | 3,081.02 | 3,822.92 | 24.1% | 19.4% | 19.4% |
Gemma-3-12B-IT Gemma Family | Multimodal | GPU | A100 (80GB) | 24.32GB | FP16 | 3,061.5 | 3,801.01 | 24.2% | 19.5% | 19.5% |
Phi-4-multimodal Phi Family | Multimodal | GPU | A100 (80GB) | 11.12GB | FP16 | 1,816 | 3,769.45 | 107.6% | 51.8% | 51.8% |
Qwen3-8B Qwen Family | Dense | GPU | A100 (80GB) | 5.03GB | Q4 | 3,208.87 | 3,709.09 | 15.6% | 13.5% | 13.5% |
Phi-mini-MoE-instruct Phi Family | MoE | GPU | A100 (80GB) | 15.3GB | FP16 | 3,555.56 | 3,704.66 | 4.2% | 4.0% | 4.0% |
Phi-4-multimodal Phi Family | Multimodal | GPU | L40s (48GB) | 11.12GB | FP16 | 1,564.63 | 3,641 | 132.7% | 57.0% | 57.0% |
Llama-Guard-3-8B Llama Family | Dense | GPU | A100 (80GB) | 4.92GB | Q4 | 2,967.97 | 3,618.06 | 21.9% | 18.0% | 18.0% |
Qwen3-4B Qwen Family | Dense | GPU | L40s (48GB) | 2.2GB | Q4 | 2,991.43 | 3,587.3 | 19.9% | 16.6% | 16.6% |
deepseek-moe-16b-chat DeepSeek Family | MoE | GPU | A100 (80GB) | 32.77GB | FP16 | 3,250.71 | 3,556.08 | 9.4% | 8.6% | 8.6% |
Qwen2.5-7B Qwen Family | Dense | GPU | L40s (48GB) | 15.2GB | FP16 | 3,112.53 | 3,554.41 | 14.2% | 12.4% | 12.4% |
Phi-4-mini-Instruct Phi Family | Dense | GPU | L40s (48GB) | 7.67GB | FP16 | 2,116.71 | 3,493 | 65.0% | 39.4% | 39.4% |
Phi-4-mini-reasoning Phi Family | Dense | GPU | L40s (48GB) | 7.67GB | FP16 | 2,013.6 | 3,434.8 | 70.6% | 41.4% | 41.4% |
Llama-3.2-1B Llama Family | Dense | Device | Tesla T4 (16GB) | 808MB | FP16 | 2,793.42 | 3,256.68 | 16.6% | 14.2% | 14.2% |
Gemma-2-9B-IT Gemma Family | Dense | GPU | A100 (80GB) | 5.76GB | Q4 | 2,360.33 | 3,209 | 36.0% | 26.4% | 26.4% |
Qwen3-8B Qwen Family | Dense | GPU | A100 (80GB) | 16.5GB | FP16 | 2,845.2 | 3,159.02 | 11.0% | 9.9% | 9.9% |
Qwen3-8B Qwen Family | Dense | GPU | L40s (48GB) | 5.03GB | Q4 | 2,829.54 | 3,118.23 | 10.2% | 9.3% | 9.3% |
Qwen3-4B Qwen Family | Dense | GPU | L40s (48GB) | 8.1GB | FP16 | 2,696.34 | 3,110.34 | 15.4% | 13.3% | 13.3% |
Llama-3.1-8B Llama Family | Dense | GPU | L40s (48GB) | 4.92GB | Q4 | 2,639.54 | 3,089.13 | 17.0% | 14.6% | 14.6% |
Tiiuae/falcon-7b Falcon Family | Dense | GPU | L40s (48GB) | 14.43GB | FP16 | 2,758.43 | 2,987.55 | 8.3% | 7.7% | 7.7% |
Mistral-7B-v0.1 Mistral Family | Dense | GPU | L40s (48GB) | 14.48GB | FP16 | 2,705.88 | 2,987.04 | 10.4% | 9.4% | 9.4% |
Gemma-3-4B-IT Gemma Family | Multimodal | GPU | L40s (48GB) | 8.6GB | FP16 | 1,749.45 | 2,974.85 | 70.0% | 41.2% | 41.2% |
Openchat-3.6-8b-20240522 OpenChat Family | Dense | GPU | L40s (48GB) | 16.1GB | FP16 | 2,514.41 | 2,920.44 | 16.1% | 13.9% | 13.9% |
Gemma-3-12B-IT Gemma Family | Multimodal | GPU | A100 (80GB) | 7.3GB | Q4 | 2,241.31 | 2,904.86 | 29.6% | 22.8% | 22.8% |
Qwen3-8B Qwen Family | Dense | GPU | L40s (48GB) | 16.5GB | FP16 | 2,428.23 | 2,874.12 | 18.4% | 15.5% | 15.5% |
Ministral-3-8B-Base-2512 Ministral Family | Multimodal | GPU | L40s (48GB) | 17.84GB | FP16 | 2,554.04 | 2,845.69 | 11.4% | 10.2% | 10.2% |
Llama-3.1-8B Llama Family | Dense | GPU | L40s (48GB) | 16GB | FP16 | 1,578.77 | 2,748.51 | 74.1% | 42.6% | 42.6% |
Meta-Llama-3-8B Llama Family | Dense | GPU | L40s (48GB) | 16.07GB | FP16 | 2,548.07 | 2,747.81 | 7.8% | 7.3% | 7.3% |
Janus-Pro-7B DeepSeek Family | Multimodal | GPU | L40s (48GB) | 14.84GB | FP16 | 1,380 | 2,746.76 | 99.0% | 49.8% | 49.8% |
LLaVA-OneVision-Qwen2-7B LLaVA Family | Multimodal | GPU | L40s (48GB) | 16.06GB | FP16 | 1,154.6 | 2,704.22 | 134.2% | 57.3% | 57.3% |
Phi-3-mini-4k-instruct Phi Family | Dense | GPU | L40s (48GB) | 7.64GB | FP16 | 2,551.35 | 2,678.52 | 5.0% | 4.7% | 4.7% |
Command-r7b-12-2024 Command R Family | Dense | GPU | L40s (48GB) | 16.06GB | FP16 | 2,310.67 | 2,628.13 | 13.7% | 12.1% | 12.1% |
Yi-9B Yi Family | Dense | GPU | L40s (48GB) | 17.7GB | FP16 | 2,228.23 | 2,604.29 | 16.9% | 14.4% | 14.4% |
Granite-3.0-8b-instruct IBM Granite Family | Dense | GPU | L40s (48GB) | 16.34GB | FP16 | 2,367.43 | 2,596.86 | 9.7% | 8.8% | 8.8% |
Llama-Guard-3-8B Llama Family | Dense | GPU | L40s (48GB) | 16.07GB | FP16 | 1,528.9 | 2,575 | 68.4% | 40.6% | 40.6% |
InternVL2-8B InternVL Family | Multimodal | GPU | A100 (80GB) | 16.16GB | FP16 | 1,553.84 | 2,423.82 | 56.0% | 35.9% | 35.9% |
Deepseek-llm-7b-base DeepSeek Family | Dense | GPU | L40s (48GB) | 13.8GB | FP16 | 1,972.13 | 2,291.52 | 16.2% | 13.9% | 13.9% |
DeepSeek-R1-Distill-Llama-8B DeepSeek Family | Dense | GPU | L40s (48GB) | 16.06GB | FP16 | 1,552.96 | 2,281.91 | 46.9% | 31.9% | 31.9% |
Phi-4-mini-Instruct Phi Family | Dense | GPU | L40s (48GB) | 2.49GB | Q4 | 1,752.2 | 2,238.41 | 27.7% | 21.7% | 21.7% |
Phi-4-mini-reasoning Phi Family | Dense | GPU | L40s (48GB) | 2.49GB | Q4 | 1,760.3 | 2,149.98 | 22.1% | 18.1% | 18.1% |
Gemma-7b Gemma Family | Dense | GPU | L40s (48GB) | 17.07GB | FP16 | 1,734.53 | 2,061.96 | 18.9% | 15.9% | 15.9% |
Tiiuae/falcon-11b Falcon Family | Dense | GPU | L40s (48GB) | 22.2GB | FP16 | 1,715.25 | 2,001.2 | 16.7% | 14.3% | 14.3% |
Gemma-3-4B-IT Gemma Family | Multimodal | GPU | L40s (48GB) | 2.49GB | Q4 | 1,682 | 1,890.36 | 12.4% | 11.0% | 11.0% |
Qwen2.5-14B Qwen Family | Dense | GPU | L40s (48GB) | 21.47GB | FP16 | 1,664.25 | 1,834.04 | 10.2% | 9.3% | 9.3% |
SOLAR-10.7B-v1.0 SOLAR Family | Dense | GPU | L40s (48GB) | 21.47GB | FP16 | 1,664.25 | 1,834.04 | 10.2% | 9.3% | 9.3% |
DeepSeek-R1-Distill-Qwen-14B DeepSeek Family | Dense | GPU | L40s (48GB) | 29.5GB | FP16 | 1,422.78 | 1,796.35 | 26.3% | 20.8% | 20.8% |
Gemma-2-9B-IT Gemma Family | Dense | GPU | L40s (48GB) | 18.48GB | FP16 | 1,138.09 | 1,723.46 | 51.4% | 34.0% | 34.0% |
InternVL2-8B InternVL Family | Multimodal | GPU | L40s (48GB) | 16.16GB | FP16 | 1,020.33 | 1,700.65 | 66.7% | 40.0% | 40.0% |
Phi-3-medium-4k-instruct Phi Family | Dense | GPU | L40s (48GB) | 27.92GB | FP16 | 1,388.35 | 1,667.44 | 20.1% | 16.7% | 16.7% |
DeepSeek-R1-Distill-Llama-8B DeepSeek Family | Dense | GPU | L40s (48GB) | 4.92GB | Q4 | 1,298.2 | 1,569.88 | 20.9% | 17.3% | 17.3% |
Llama-3.2-3B Llama Family | Dense | GPU | T4 (16GB) | 6.6GB | FP16 | 1,132.43 | 1,518.19 | 34.1% | 25.4% | 25.4% |
Gemma-2-9B-IT Gemma Family | Dense | GPU | L40s (48GB) | 5.76GB | Q4 | 1,156.5 | 1,428.07 | 23.5% | 19.0% | 19.0% |
Gemma-3-12B-IT Gemma Family | Multimodal | GPU | L40s (48GB) | 24.32GB | FP16 | 941.33 | 1,412.09 | 50.0% | 33.3% | 33.3% |
Gemma-3-12B-IT Gemma Family | Multimodal | GPU | L40s (48GB) | 7.3GB | Q4 | 831.9 | 1,049.86 | 26.2% | 20.8% | 20.8% |
Nous-Hermes-13b Nous Hermes Family | Dense | GPU | L40s (48GB) | 26GB | FP16 | 749.43 | 1,023.61 | 36.6% | 26.8% | 26.8% |
Baichuan-13B-Chat Baichuan Family | Dense | GPU | L40s (48GB) | 26.5GB | FP16 | 688.44 | 993.17 | 44.3% | 30.7% | 30.7% |
Llama-Guard-3-8B Llama Family | Dense | GPU | L40s (48GB) | 4.92GB | Q4 | 769.15 | 978.74 | 27.2% | 21.4% | 21.4% |
Llama-2-13b Llama Family | Dense | GPU | L40s (48GB) | 26GB | FP16 | 663.91 | 935.71 | 40.9% | 29.0% | 29.0% |
Llama-3.1-8B Llama Family | Dense | Device | Tesla T4 (16GB) | 16GB | INT4 | 380.59 | 502.43 | 32.0% | 24.3% | 24.3% |
Shakti-250M Shakti Family | Dense | Device | MacBook Pro M3 (36GB) | 148MB | Q4 | 295 | 385 | 30.5% | 23.4% | 23.4% |
Shakti-100M Shakti Family | Dense | Device | MacBook Pro M3 (36GB) | 126MB | Q4 | 280 | 365 | 30.4% | 23.3% | 23.3% |
Shakti-500M Shakti Family | Dense | Device | MacBook Pro M3 (36GB) | 303MB | Q4 | 215 | 281.43 | 30.9% | 23.6% | 23.6% |
SmolLM2-135M SmolLM Family | Dense | Device | MacBook Pro M3 (36GB) | 105MB | Q4 | 175 | 227.21 | 29.8% | 23.0% | 23.0% |
SmolLM2-360M SmolLM Family | Dense | Device | MacBook Pro M3 (36GB) | 271MB | Q4 | 140 | 182.81 | 30.6% | 23.4% | 23.4% |
Qwen2.5-500M Qwen Family | Dense | Device | MacBook Pro M3 (36GB) | 398MB | Q4 | 135 | 173.82 | 28.8% | 22.3% | 22.3% |
Shakti-100M Shakti Family | Dense | Device | iPhone 14 (6GB) | 126MB | Q4 | 120 | 153.7 | 28.1% | 21.9% | 21.9% |
Qwen3-0.6B Qwen Family | Dense | CPU | AMD EPYC 9554 (60 cores, 201GB) | 456.11MB | Q4 | 54.03 | 143.27 | 165.2% | 62.3% | 62.3% |
Shakti-2.5B Shakti Family | Dense | Device | MacBook Pro M3 (36GB) | 1.5GB | Q4 | 95 | 128 | 34.7% | 25.8% | 25.8% |
Qwen3-0.6B Qwen Family | Dense | CPU | AMD EPYC 9554 (32 cores, 117GB) | 456.11MB | Q4 | 45.11 | 115.6 | 156.3% | 61.0% | 61.0% |
Qwen3-1.7B Qwen Family | Dense | CPU | AMD EPYC 9554 (60 cores, 201GB) | 1.28GB | Q4 | 49.22 | 98.73 | 100.6% | 50.1% | 50.1% |
DeepSeek-R1-Distill-Qwen-1.5B DeepSeek Family | Dense | CPU | AMD EPYC 9554 (60 cores, 201GB) | 1.12GB | Q4 | 42.01 | 97.01 | 130.9% | 56.7% | 56.7% |
Qwen3-1.7B Qwen Family | Dense | CPU | AMD EPYC 9554 (32 cores, 117GB) | 1.28GB | Q4 | 36.44 | 94.34 | 158.9% | 61.4% | 61.4% |
Shakti-250M Shakti Family | Dense | Device | iPhone 14 (6GB) | 148MB | Q4 | 65 | 88.11 | 35.6% | 26.2% | 26.2% |
Llama-3.3-70B Llama Family | Dense | GPU | A100 (80GB) | 42.5GB | Q4 | 48.87 | 84.24 | 72.4% | 42.0% | 42.0% |
Llama-3.2-3B Llama Family | Dense | CPU | AMD EPYC 9554 (32 cores, 117GB) | 2.02GB | Q4 | 32.9 | 82.34 | 150.3% | 60.0% | 60.0% |
Qwen3-0.6B Qwen Family | Dense | CPU | Intel Core i7-14700K (28 cores, 94GB) | 456.11MB | Q4 | 37.9 | 82.2 | 116.9% | 53.9% | 53.9% |
Qwen3-1.7B Qwen Family | Dense | CPU | Intel Core i7-14700K (28 cores, 94GB) | 1.28GB | Q4 | 32.77 | 74.23 | 126.5% | 55.9% | 55.9% |
Llama-3.2-3B Llama Family | Dense | CPU | AMD EPYC 9554 (60 cores, 201GB) | 2.02GB | Q4 | 46.4 | 69.84 | 50.5% | 33.6% | 33.6% |
Phi-4-mini-reasoning Phi Family | Dense | CPU | AMD EPYC 9554 (60 cores, 201GB) | 2.49GB | Q4 | 25 | 65.37 | 161.5% | 61.8% | 61.8% |
Llama-3.2-3B Llama Family | Dense | CPU | Intel Core i7-14700K (28 cores, 94GB) | 2.02GB | Q4 | 28.39 | 64.23 | 126.2% | 55.8% | 55.8% |
Phi-4-mini-Instruct Phi Family | Dense | CPU | AMD EPYC 9554 (60 cores, 201GB) | 2.49GB | Q4 | 26.6 | 63.28 | 137.9% | 58.0% | 58.0% |
Shakti-500M Shakti Family | Dense | Device | iPhone 14 (6GB) | 303MB | Q4 | 45 | 62.4 | 38.7% | 27.9% | 27.9% |
Shakti-100M Shakti Family | Dense | Device | Raspberry Pi 5 (8GB) | 126MB | Q4 | 45 | 60.74 | 35.0% | 25.9% | 25.9% |
Gemma-2-2B-IT Gemma Family | Dense | CPU | AMD EPYC 9554 (60 cores, 201GB) | 1.71GB | Q4 | 41.31 | 56.62 | 37.1% | 27.0% | 27.0% |
Qwen3-0.6B Qwen Family | Dense | CPU | AMD EPYC 9554 (16 cores, 105GB) | 456.11MB | Q4 | 29.34 | 55.63 | 89.6% | 47.3% | 47.3% |
Qwen3-4B Qwen Family | Dense | CPU | AMD EPYC 9554 (32 cores, 117GB) | 2.2GB | Q4 | 22.6 | 53.99 | 138.9% | 58.1% | 58.1% |
DeepSeek-R1-Distill-Qwen-1.5B DeepSeek Family | Dense | CPU | Intel Core i7-14700K (28 cores, 94GB) | 1.12GB | Q4 | 30.78 | 53.74 | 74.6% | 42.7% | 42.7% |
Qwen3-4B Qwen Family | Dense | CPU | AMD EPYC 9554 (60 cores, 201GB) | 2.2GB | Q4 | 38.04 | 53.11 | 39.6% | 28.4% | 28.4% |
DeepSeek-R1-Distill-Qwen-1.5B DeepSeek Family | Dense | CPU | AMD EPYC 9554 (16 cores, 105GB) | 1.12GB | Q4 | 29.47 | 50.18 | 70.3% | 41.3% | 41.3% |
Shakti-250M Shakti Family | Dense | Device | Raspberry Pi 5 (8GB) | 148MB | Q4 | 35 | 48.911 | 39.7% | 28.4% | 28.4% |
Gemma-3-4B-IT Gemma Family | Multimodal | CPU | AMD EPYC 9554 (60 cores, 201GB) | 2.49GB | Q4 | 21.99 | 47.4 | 115.6% | 53.6% | 53.6% |
Qwen3-4B Qwen Family | Dense | CPU | Intel Core i7-14700K (28 cores, 94GB) | 2.2GB | Q4 | 19.7 | 44.01 | 123.4% | 55.2% | 55.2% |
Gemma-2-2B-IT Gemma Family | Dense | CPU | AMD EPYC 9554 (16 cores, 105GB) | 1.71GB | Q4 | 22.17 | 43.52 | 96.3% | 49.1% | 49.1% |
Qwen3-1.7B Qwen Family | Dense | CPU | AMD EPYC 9554 (16 cores, 105GB) | 1.28GB | Q4 | 22.87 | 41.76 | 82.6% | 45.2% | 45.2% |
Qwen3-8B Qwen Family | Dense | CPU | AMD EPYC 9554 (32 cores, 117GB) | 5.03GB | Q4 | 16.02 | 39.98 | 149.6% | 59.9% | 59.9% |
Llama-3.1-8B Llama Family | Dense | CPU | AMD EPYC 9554 (60 cores, 201GB) | 4.92GB | Q4 | 22.67 | 36.81 | 62.4% | 38.4% | 38.4% |
Shakti-100M Shakti Family | Dense | CPU | Intel Xeon Silver 4110 (197GB) | 126MB | Q4 | 21.35 | 36.64 | 71.6% | 41.7% | 41.7% |
Llama-Guard-3-8B Llama Family | Dense | CPU | AMD EPYC 9554 (60 cores, 201GB) | 4.92GB | Q4 | 13.01 | 35.89 | 175.9% | 63.8% | 63.8% |
Llama-3.1-8B Llama Family | Dense | CPU | AMD EPYC 9554 (32 cores, 117GB) | 4.92GB | Q4 | 19.43 | 34.42 | 77.1% | 43.6% | 43.6% |
Qwen3-8B Qwen Family | Dense | CPU | AMD EPYC 9554 (60 cores, 201GB) | 5.03GB | Q4 | 28.64 | 34.21 | 19.4% | 16.3% | 16.3% |
DeepSeek-R1-Distill-Llama-8B DeepSeek Family | Dense | CPU | AMD EPYC 9554 (60 cores, 201GB) | 4.92GB | Q4 | 15.4 | 33.91 | 120.2% | 54.6% | 54.6% |
Llama-3.3-70B Llama Family | Dense | GPU | L40s (48GB) | 42.5GB | Q4 | 19.78 | 33.48 | 69.3% | 40.9% | 40.9% |
SmolLM2-135M SmolLM Family | Dense | Device | Raspberry Pi 5 (8GB) | 105MB | Q4 | 25 | 32.355 | 29.4% | 22.7% | 22.7% |
Gemma-2-2B-IT Gemma Family | Dense | CPU | Intel Core i7-14700K (28 cores, 94GB) | 1.71GB | Q4 | 14.67 | 31.8 | 116.8% | 53.9% | 53.9% |
Gemma-3-4B-IT Gemma Family | Multimodal | CPU | AMD EPYC 9554 (16 cores, 105GB) | 2.49GB | Q4 | 17.07 | 31.75 | 86.0% | 46.2% | 46.2% |
Llama-3.2-3B Llama Family | Dense | CPU | AMD EPYC 9554 (16 cores, 105GB) | 2.02GB | Q4 | 16.6 | 31.69 | 90.9% | 47.6% | 47.6% |
Phi-4-mini-reasoning Phi Family | Dense | CPU | AMD EPYC 9554 (16 cores, 105GB) | 2.49GB | Q4 | 17.17 | 30.28 | 76.4% | 43.3% | 43.3% |
Phi-4-mini-Instruct Phi Family | Dense | CPU | AMD EPYC 9554 (16 cores, 105GB) | 2.49GB | Q4 | 16.33 | 30.13 | 84.5% | 45.8% | 45.8% |
Shakti-500M Shakti Family | Dense | Device | Raspberry Pi 5 (8GB) | 303MB | Q4 | 22 | 29.54 | 34.3% | 25.5% | 25.5% |
Qwen3-8B Qwen Family | Dense | CPU | Intel Core i7-14700K (28 cores, 94GB) | 5.03GB | Q4 | 12.65 | 29.11 | 130.1% | 56.5% | 56.5% |
SmolLM2-360M SmolLM Family | Dense | Device | Raspberry Pi 5 (8GB) | 271MB | Q4 | 22 | 28.99 | 31.8% | 24.1% | 24.1% |
Qwen3-4B Qwen Family | Dense | CPU | AMD EPYC 9554 (16 cores, 105GB) | 2.2GB | Q4 | 13.7 | 28.18 | 105.7% | 51.4% | 51.4% |
Shakti-2.5B Shakti Family | Dense | Device | iPhone 14 (6GB) | 1.5GB | Q4 | 18 | 27.32 | 51.8% | 34.1% | 34.1% |
Llama-3.1-8B Llama Family | Dense | CPU | Intel Core i7-14700K (28 cores, 94GB) | 4.92GB | Q4 | 17.8 | 26.56 | 49.2% | 33.0% | 33.0% |
Shakti-250M Shakti Family | Dense | CPU | Intel Xeon Silver 4110 (197GB) | 148MB | Q4 | 14.75 | 25.67 | 74.0% | 42.5% | 42.5% |
Llama-3.2-1B Llama Family | Dense | CPU | Intel Xeon Silver 4110 (197GB) | 808MB | Q4 | 10.35 | 24.78 | 139.4% | 58.2% | 58.2% |
Phi-4-mini-Instruct Phi Family | Dense | CPU | Intel Core i7-14700K (28 cores, 94GB) | 2.49GB | Q4 | 11 | 23.94 | 117.6% | 54.1% | 54.1% |
Gemma-3-4B-IT Gemma Family | Multimodal | CPU | Intel Core i7-14700K (28 cores, 94GB) | 2.49GB | Q4 | 11.63 | 22.48 | 93.3% | 48.3% | 48.3% |
Gemma-2-9B-IT Gemma Family | Dense | CPU | AMD EPYC 9554 (60 cores, 201GB) | 5.76GB | Q4 | 11.09 | 20.73 | 86.9% | 46.5% | 46.5% |
Llama-3.1-8B Llama Family | Dense | CPU | AMD EPYC 9554 (16 cores, 105GB) | 4.92GB | Q4 | 13.4 | 19.15 | 42.9% | 30.0% | 30.0% |
Phi-4-mini-reasoning Phi Family | Dense | CPU | Intel Core i7-14700K (28 cores, 94GB) | 2.49GB | Q4 | 8.55 | 18.8 | 119.9% | 54.5% | 54.5% |
Gemma-3-12B-IT Gemma Family | Multimodal | CPU | AMD EPYC 9554 (60 cores, 201GB) | 7.3GB | Q4 | 10.6 | 18.25 | 72.2% | 41.9% | 41.9% |
Qwen2.5-500M Qwen Family | Dense | Device | Raspberry Pi 5 (8GB) | 398MB | Q4 | 14 | 18.24 | 30.3% | 23.2% | 23.2% |
DeepSeek-R1-Distill-Llama-8B DeepSeek Family | Dense | CPU | AMD EPYC 9554 (16 cores, 105GB) | 4.92GB | Q4 | 8.79 | 15.83 | 80.1% | 44.5% | 44.5% |
Llama-Guard-3-8B Llama Family | Dense | CPU | AMD EPYC 9554 (16 cores, 105GB) | 4.92GB | Q4 | 7.98 | 15.26 | 91.2% | 47.7% | 47.7% |
Qwen3-8B Qwen Family | Dense | CPU | AMD EPYC 9554 (16 cores, 105GB) | 5.03GB | Q4 | 8.34 | 14.29 | 71.3% | 41.6% | 41.6% |
Shakti-500M Shakti Family | Dense | CPU | Intel Xeon Silver 4110 (197GB) | 303MB | Q4 | 4.56 | 14.26 | 212.7% | 68.0% | 68.0% |
Gemma-2-9B-IT Gemma Family | Dense | CPU | AMD EPYC 9554 (16 cores, 105GB) | 5.76GB | Q4 | 7.92 | 13.97 | 76.4% | 43.3% | 43.3% |
DeepSeek-R1-Distill-Qwen-1.5B DeepSeek Family | Dense | CPU | Intel Xeon Silver 4110 | 1.12GB | Q4 | 6.83 | 13.14 | 92.4% | 48.0% | 48.0% |
Llama-Guard-3-8B Llama Family | Dense | CPU | Intel Core i7-14700K (28 cores, 94GB) | 4.92GB | Q4 | 6.96 | 12.76 | 83.3% | 45.5% | 45.5% |
Gemma-3-12B-IT Gemma Family | Multimodal | CPU | AMD EPYC 9554 (16 cores, 105GB) | 7.3GB | Q4 | 6.75 | 11.06 | 63.9% | 39.0% | 39.0% |
Gemma-2-9B-IT Gemma Family | Dense | CPU | Intel Core i7-14700K (28 cores, 94GB) | 5.76GB | Q4 | 5.11 | 10.08 | 97.3% | 49.3% | 49.3% |
DeepSeek-R1-Distill-Llama-8B DeepSeek Family | Dense | CPU | Intel Core i7-14700K (28 cores, 94GB) | 4.92GB | Q4 | 4.88 | 9.58 | 96.3% | 49.1% | 49.1% |
Shakti-2.5B Shakti Family | Dense | CPU | Intel Xeon Silver 4110 (197GB) | 1.5GB | Q4 | 5.12 | 9.35 | 82.6% | 45.2% | 45.2% |
Llama-3.2-3B-Instruct Llama Family | Dense | CPU | Intel Xeon Silver 4110 | 2.02GB | Q4 | 4.11 | 8.76 | 113.1% | 53.1% | 53.1% |
Gemma-3-4B-IT Gemma Family | Multimodal | CPU | Intel Xeon Silver 4110 | 2.49GB | Q4 | 2.92 | 8.52 | 191.8% | 65.7% | 65.7% |
Gemma-3-12B-IT Gemma Family | Multimodal | CPU | Intel Core i7-14700K (28 cores, 94GB) | 7.3GB | Q4 | 4.52 | 7.89 | 74.6% | 42.7% | 42.7% |
Qwen3-4B Qwen Family | Dense | CPU | Intel Xeon Silver 4110 | 2.2GB | Q4 | 3.26 | 6.67 | 104.6% | 51.1% | 51.1% |
DeepSeek-R1-Distill-Llama-8B DeepSeek Family | Dense | CPU | Intel Xeon Silver 4110 | 4.92GB | Q4 | 2.55 | 6.27 | 145.9% | 59.3% | 59.3% |
Llama-3.3-70B Llama Family | Dense | CPU | AMD EPYC 9554 (32 cores, 117GB) | 42.5GB | Q4 | 2.01 | 5.6 | 178.6% | 64.1% | 64.1% |
Llama-3.1-8B Llama Family | Dense | CPU | Intel Xeon Silver 4110 | 4.92GB | Q4 | 2.23 | 5.34 | 139.5% | 58.2% | 58.2% |
Llama-3.3-70B Llama Family | Dense | CPU | AMD EPYC 9554 (60 cores, 201GB) | 42.5GB | Q4 | 2.7 | 4.52 | 67.4% | 40.3% | 40.3% |
Shakti-2.5B Shakti Family | Dense | Device | Raspberry Pi 5 (8GB) | 1.5GB | Q4 | 3.2 | 4.45 | 39.1% | 28.1% | 28.1% |
Llama-3.3-70B Llama Family | Dense | CPU | Intel Core i7-14700K (28 cores, 94GB) | 42.5GB | Q4 | 2.1 | 4.34 | 106.7% | 51.6% | 51.6% |
Llama-3.3-70B Llama Family | Dense | CPU | AMD EPYC 9554 (16 cores, 105GB) | 42.5GB | Q4 | 1.43 | 2.19 | 53.1% | 34.7% | 34.7% |
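
The percentage columns in the table can be recomputed from the two throughput columns. The sketch below assumes that Improvement is the relative throughput gain, and that Cost Savings and Power Efficiency follow from it as gain / (1 + gain), i.e. the share of time saved on a fixed workload. That second relation is inferred from the published numbers and is not stated by the source. The function names are hypothetical, and the sample rows are copied from the table.

```python
def improvement_pct(baseline_tps: float, optimized_tps: float) -> float:
    """Relative throughput gain in percent, from two tokens/sec figures."""
    return (optimized_tps / baseline_tps - 1.0) * 100.0


def derived_savings_pct(improvement: float) -> float:
    """Assumed relation: fraction of time saved for a fixed workload.

    If throughput rises by a factor (1 + g), the same job takes 1 / (1 + g)
    of the time, so the saving is g / (1 + g).
    """
    gain = improvement / 100.0
    return gain / (1.0 + gain) * 100.0


# (model, hardware, tokens/sec without, tokens/sec with), copied from the table
rows = [
    ("Shakti-1B", "A100 (80GB)", 16250.0, 21015.55),
    ("Qwen2.5-VL-3B", "L40s (48GB)", 1738.95, 9188.68),
    ("Llama-3.3-70B", "AMD EPYC 9554 (16 cores, 105GB)", 1.43, 2.19),
]

for model, hardware, base, opt in rows:
    imp = improvement_pct(base, opt)
    sav = derived_savings_pct(imp)
    print(f"{model} on {hardware}: +{imp:.1f}% throughput, {sav:.1f}% savings")
```

For these three sample rows the computed values agree with the table's Improvement and Cost Savings columns to one decimal place. Under this assumed relation, savings approach but never reach 100%, which explains why a 428.4% throughput gain corresponds to 81.1% savings.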