Comprehensive AI model performance analysis across hardware platforms
- Average Performance Gain: 74.2% (across all models)
- Average Cost Savings: 38.1% (hardware costs)
- Average Power Efficiency: 39.6% (power saving)
- Models Tested: 34 configurations
- Peak Throughput: 21,015.55 tokens/sec

| Model (Family) | Category | Hardware | Size | Precision | Without EdgeMatrix (tokens/sec) | With EdgeMatrix (tokens/sec) | Improvement | Cost Savings | Power Efficiency |
|---|---|---|---|---|---|---|---|---|---|
Shakti-1B Shakti Family | GPU | A100 (80GB) | 1.88GB | FP16 | 16,250 | 21,015.55 | +29.3% | 22.7% | 22.7% |
Qwen2-VL-2B Qwen Family | GPU | A100 (80GB) | 4.419GB | FP16 | 12,690.3 | 19,963.4 | +57.3% | 36.4% | 36.4% |
DeepSeek-R1-Distill-Qwen-1.5B DeepSeek Family | GPU | A100 (80GB) | 3.55GB | FP16 | 12,931.42 | 18,104.78 | +40.0% | 28.6% | 28.6% |
Qwen2.5-VL-3B Qwen Family | GPU | A100 (80GB) | 7.51GB | FP16 | 8,234.11 | 13,903.65 | +68.8% | 40.8% | 40.8% |
DeepSeek-R1-Distill-Qwen-1.5B DeepSeek Family | GPU | L40s (48GB) | 3.55GB | FP16 | 7,998 | 13,040.77 | +63.0% | 38.7% | 38.7% |
Shakti-1B Shakti Family | GPU | L40s (48GB) | 1.88GB | FP16 | 7,570 | 12,064.4 | +59.4% | 37.3% | 37.3% |
DeepSeek-R1-Distill-Qwen-1.5B DeepSeek Family | GPU | A100 (80GB) | 1.12GB | Q4 | 9,004.67 | 11,412.41 | +26.7% | 21.1% | 21.1% |
Qwen2-VL-2B Qwen Family | GPU | L40s (48GB) | 4.419GB | FP16 | 6,509.44 | 10,659.19 | +63.8% | 38.9% | 38.9% |
Shakti-4B Shakti Family | GPU | A100 (80GB) | 7.42GB | FP16 | 7,564 | 10,099.3 | +33.5% | 25.1% | 25.1% |
Gemma-2-2B-IT Gemma Family | GPU | A100 (80GB) | 5.14GB | FP16 | 7,256.9 | 9,954.13 | +37.2% | 27.1% | 27.1% |
Gemma-2-2B-IT Gemma Family | GPU | A100 (80GB) | 1.71GB | Q4 | 8,904.67 | 9,929.41 | +11.5% | 10.3% | 10.3% |
InternVL2.5-4B InternVL Family | GPU | A100 (80GB) | 7.42GB | FP16 | 4,576.06 | 9,542.87 | +108.5% | 52.0% | 52.0% |
Gemma-3-4B-IT Gemma Family | GPU | A100 (80GB) | 8.6GB | FP16 | 5,798.23 | 9,462.91 | +63.2% | 38.7% | 38.7% |
Qwen2.5-VL-3B Qwen Family | GPU | L40s (48GB) | 7.51GB | FP16 | 1,738.95 | 9,188.68 | +428.4% | 81.1% | 81.1% |
DeepSeek-R1-Distill-Qwen-1.5B DeepSeek Family | GPU | L40s (48GB) | 1.12GB | Q4 | 7,011.01 | 8,527.61 | +21.6% | 17.8% | 17.8% |
Llama-3.1-8B Llama Family | GPU | H100 (80GB) | 16GB | FP16 | 6,473.07 | 7,956.74 | +22.9% | 18.6% | 18.6% |
Shakti-2.5B Shakti Family | GPU | A100 (80GB) | 6.43GB | FP16 | 6,120.03 | 7,854.42 | +66.7% | 40.0% | 40.0% |
Phi-4-mini-Instruct Phi Family | GPU | A100 (80GB) | 7.67GB | FP16 | 5,285.01 | 6,870.52 | +30.0% | 23.1% | 23.1% |
Llama-3.2-3B Llama Family | GPU | A100 (80GB) | 2.02GB | Q4 | 4,014.23 | 6,852.23 | +70.7% | 41.4% | 41.4% |
Phi-4-mini-reasoning Phi Family | GPU | A100 (80GB) | 7.67GB | FP16 | 3,105.49 | 6,607.25 | +112.7% | 53.0% | 53.0% |
Llama-3.2-3B Llama Family | GPU | A100 (80GB) | 6.6GB | FP16 | 3,581.4 | 6,362.89 | +77.7% | 43.7% | 43.7% |
Gemma-3-4B-IT Gemma Family | GPU | A100 (80GB) | 8.6GB | FP16 | 4,998.02 | 6,063.09 | +21.3% | 17.6% | 17.5% |
Shakti-4B Shakti Family | GPU | L40s (48GB) | 7.42GB | FP16 | 2,339 | 5,789 | +147.5% | 59.6% | 59.6% |
Gemma-2-2B-IT Gemma Family | GPU | L40s (48GB) | 1.71GB | Q4 | 4,501 | 5,699.2 | +26.6% | 21.0% | 21.0% |
Shakti-2.5B Shakti Family | GPU | L40s (48GB) | 1.5GB | FP16 | 3,122.92 | 5,612.24 | +66.7% | 40.0% | 52.5% |
InternVL2.5-4B InternVL Family | GPU | L40s (48GB) | 7.42GB | FP16 | 2,951.75 | 5,569.34 | +88.7% | 47.0% | 47.0% |
Llama-3.2-3B Llama Family | GPU | L40s (48GB) | 2.02GB | Q4 | 3,163.2 | 5,183.45 | +63.9% | 39.0% | 41.3% |
InternVL2-2B InternVL Family | GPU | A100 (80GB) | 4.41GB | FP16 | 2,074 | 4,970.63 | +139.7% | 58.3% | 58.3% |
Llama-3.1-8B Llama Family | GPU | A100 (80GB) | 16GB | FP16 | 3,796.37 | 4,875.86 | +28.4% | 22.1% | 22.1% |
Gemma-2-2B-IT Gemma Family | GPU | L40s (48GB) | 5.14GB | FP16 | 2,862.5 | 4,698.01 | +64.1% | 39.1% | 39.1% |
InternVL2-2B InternVL Family | GPU | L40s (48GB) | 4.41GB | FP16 | 1,959 | 4,684.38 | +139.1% | 58.2% | 58.2% |
Llama-Guard-3-8B Llama Family | GPU | A100 (80GB) | 16.07GB | FP16 | 3,528.9 | 4,517 | +28.0% | 21.9% | 21.9% |
Llama-3.2-3B Llama Family | GPU | L40s (48GB) | 6.6GB | FP16 | 1,676.3 | 4,467.32 | +166.4% | 62.5% | 52.5% |
DeepSeek-R1-Distill-Llama-8B DeepSeek Family | GPU | A100 (80GB) | 16.06GB | FP16 | 3,424.03 | 4,417.09 | +29.0% | 22.5% | 22.5% |
Janus-Pro-7B DeepSeek Family | GPU | A100 (80GB) | 14.84GB | FP16 | 2,654 | 4,235 | +59.6% | 37.3% | 37.3% |
Gemma-2-9B-IT Gemma Family | GPU | A100 (80GB) | 18.48GB | FP16 | 3,269.69 | 4,184.52 | +28.0% | 21.9% | 21.9% |
Phi-4-mini-Instruct Phi Family | GPU | A100 (80GB) | 2.49GB | Q4 | 3,121.23 | 4,117.11 | +31.9% | 24.2% | 24.2% |
Phi-4-mini-reasoning Phi Family | GPU | A100 (80GB) | 2.49GB | Q4 | 3,281.02 | 4,109.32 | +25.2% | 20.2% | 20.1% |
LLaVA-OneVision-Qwen2-7B LLaVA Family | GPU | A100 (80GB) | 16.06GB | FP16 | 1,850 | 4,101.7 | +121.7% | 54.9% | 54.9% |
Gemma-3-4B-IT Gemma Family | GPU | A100 (80GB) | 2.49GB | Q4 | 3,039.12 | 4,007.55 | +31.9% | 24.2% | 24.2% |
Qwen3-4B Qwen Family | GPU | A100 (80GB) | 2.2GB | Q4 | 3,068.12 | 3,917.55 | +27.7% | 21.7% | 21.7% |
Gemma-3-4B-IT Gemma Family | GPU | L40s (48GB) | 8.6GB | FP16 | 2,220.35 | 3,903 | +75.8% | 43.1% | 43.1% |
DeepSeek-R1-Distill-Llama-8B DeepSeek Family | GPU | A100 (80GB) | 4.92GB | Q4 | 3,098.77 | 3,884 | +25.3% | 20.2% | 20.2% |
Qwen3-4B Qwen Family | GPU | A100 (80GB) | 8.1GB | FP16 | 3,105.59 | 3,837.05 | +23.5% | 19.1% | 19.1% |
Llama-3.1-8B Llama Family | GPU | A100 (80GB) | 4.92GB | Q4 | 3,081.02 | 3,822.92 | +24.1% | 19.4% | 19.4% |
Gemma-3-12B-IT Gemma Family | GPU | A100 (80GB) | 24.32GB | FP16 | 3,061.5 | 3,801.01 | +24.1% | 19.5% | 19.5% |
Phi-4-multimodal Phi Family | GPU | A100 (80GB) | 11.12GB | FP16 | 1,816 | 3,769.45 | +107.6% | 51.8% | 51.8% |
Qwen3-8B Qwen Family | GPU | A100 (80GB) | 5.03GB | Q4 | 3,208.87 | 3,709.09 | +15.6% | 13.5% | 13.5% |
Phi-4-multimodal Phi Family | GPU | L40s (48GB) | 11.12GB | FP16 | 1,564.63 | 3,641 | +132.7% | 57.0% | 57.0% |
Llama-Guard-3-8B Llama Family | GPU | A100 (80GB) | 4.92GB | Q4 | 2,967.97 | 3,618.06 | +21.9% | 18.0% | 18.0% |
Qwen3-4B Qwen Family | GPU | L40s (48GB) | 2.2GB | Q4 | 2,991.43 | 3,587.3 | +19.9% | 16.6% | 16.6% |
Phi-4-mini-Instruct Phi Family | GPU | L40s (48GB) | 7.67GB | FP16 | 2,116.71 | 3,493 | +65.0% | 39.4% | 39.4% |
Phi-4-mini-reasoning Phi Family | GPU | L40s (48GB) | 7.67GB | FP16 | 2,013.6 | 3,434.8 | +70.6% | 41.4% | 41.4% |
Llama-3.2-1B Llama Family | Device | Tesla T4 (16GB) | 808MB | FP16 | 2,793.42 | 3,256.68 | +16.6% | 14.2% | 14.3% |
Gemma-2-9B-IT Gemma Family | GPU | A100 (80GB) | 5.76GB | Q4 | 2,360.33 | 3,209 | +36.0% | 26.4% | 26.4% |
Qwen3-8B Qwen Family | GPU | A100 (80GB) | 16.5GB | FP16 | 2,845.2 | 3,159.02 | +11.0% | 9.1% | 9.9% |
Qwen3-8B Qwen Family | GPU | L40s (48GB) | 5.03GB | Q4 | 2,829.54 | 3,118.23 | +10.2% | 9.3% | 9.3% |
Qwen3-4B Qwen Family | GPU | L40s (48GB) | 8.1GB | FP16 | 2,696.34 | 3,110.34 | +15.4% | 13.3% | 13.3% |
Llama-3.1-8B Llama Family | GPU | L40s (48GB) | 4.92GB | Q4 | 2,639.54 | 3,089.13 | +17.0% | 14.6% | 14.6% |
Gemma-3-4B-IT Gemma Family | GPU | L40s (48GB) | 8.6GB | FP16 | 1,749.45 | 2,974.85 | +70.0% | 41.2% | 41.2% |
Gemma-3-12B-IT Gemma Family | GPU | A100 (80GB) | 7.3GB | Q4 | 2,241.31 | 2,904.86 | +29.6% | 22.8% | 22.8% |
Qwen3-8B Qwen Family | GPU | L40s (48GB) | 16.5GB | FP16 | 2,428.23 | 2,874.12 | +18.4% | 15.5% | 15.5% |
Llama-3.1-8B Llama Family | GPU | L40s (48GB) | 16GB | FP16 | 1,578.77 | 2,748.51 | +74.1% | 42.5% | 42.6% |
Janus-Pro-7B DeepSeek Family | GPU | L40s (48GB) | 14.84GB | FP16 | 1,380 | 2,746.76 | +99.2% | 49.8% | 49.8% |
LLaVA-OneVision-Qwen2-7B LLaVA Family | GPU | L40s (48GB) | 16.06GB | FP16 | 1,154.6 | 2,704.22 | +134.2% | 57.3% | 57.3% |
Llama-Guard-3-8B Llama Family | GPU | L40s (48GB) | 16.07GB | FP16 | 1,528.9 | 2,575 | +68.4% | 40.6% | 40.6% |
InternVL2-8B InternVL Family | GPU | A100 (80GB) | 16.16GB | FP16 | 1,553.84 | 2,423.82 | +56.0% | 35.9% | 35.9% |
DeepSeek-R1-Distill-Llama-8B DeepSeek Family | GPU | L40s (48GB) | 16.06GB | FP16 | 1,552.96 | 2,281.91 | +46.9% | 31.9% | 31.9% |
Phi-4-mini-Instruct Phi Family | GPU | L40s (48GB) | 2.49GB | Q4 | 1,752.2 | 2,238.41 | +27.7% | 21.7% | 21.7% |
Phi-4-mini-reasoning Phi Family | GPU | L40s (48GB) | 2.49GB | Q4 | 1,760.3 | 2,149.98 | +22.1% | 18.1% | 18.1% |
Gemma-3-4B-IT Gemma Family | GPU | L40s (48GB) | 2.49GB | Q4 | 1,682 | 1,890.36 | +12.4% | 11.0% | 11.0% |
Gemma-2-9B-IT Gemma Family | GPU | L40s (48GB) | 18.48GB | FP16 | 1,138.09 | 1,723.46 | +51.4% | 34.0% | 34.0% |
InternVL2-8B InternVL Family | GPU | L40s (48GB) | 16.16GB | FP16 | 1,020.33 | 1,700.65 | +66.7% | 40.0% | 40.0% |
DeepSeek-R1-Distill-Llama-8B DeepSeek Family | GPU | L40s (48GB) | 4.92GB | Q4 | 1,298.2 | 1,569.88 | +20.9% | 17.3% | 17.3% |
Llama-3.2-3B Llama Family | GPU | T4 (16GB) | 6.6GB | FP16 | 1,132.43 | 1,518.19 | +34.0% | 25.4% | 25.4% |
Gemma-2-9B-IT Gemma Family | GPU | L40s (48GB) | 5.76GB | Q4 | 1,156.5 | 1,428.07 | +23.5% | 19.0% | 19.0% |
Gemma-3-12B-IT Gemma Family | GPU | L40s (48GB) | 24.32GB | FP16 | 941.33 | 1,412.09 | +50.0% | 33.3% | 33.3% |
Gemma-3-12B-IT Gemma Family | GPU | L40s (48GB) | 7.3GB | Q4 | 831.9 | 1,049.86 | +26.2% | 20.8% | 20.8% |
Llama-Guard-3-8B Llama Family | GPU | L40s (48GB) | 4.92GB | Q4 | 769.15 | 978.74 | +27.3% | 21.4% | 21.4% |
Llama-3.1-8B Llama Family | Device | Tesla T4 (16GB) | 16GB | INT4 | 380.59 | 502.43 | +32.0% | 24.3% | 24.3% |
Shakti-250M Shakti Family | Device | MacBook Pro M3 (36GB) | 148MB | Q4 | 295 | 385 | +30.5% | 23.4% | N/A |
Shakti-100M Shakti Family | Device | MacBook Pro M3 (36GB) | 126MB | Q4 | 280 | 365 | +30.4% | 23.3% | N/A |
Shakti-500M Shakti Family | Device | MacBook Pro M3 (36GB) | 303MB | Q4 | 215 | 281.43 | +30.9% | 23.6% | N/A |
SmolLM2-135M SmolLM Family | Device | MacBook Pro M3 (36GB) | 105MB | Q4 | 175 | 227.21 | +29.8% | 23.0% | N/A |
SmolLM2-360M SmolLM Family | Device | MacBook Pro M3 (36GB) | 271MB | Q4 | 140 | 182.81 | +30.6% | 23.4% | N/A |
Qwen2.5-500M Qwen Family | Device | MacBook Pro M3 (36GB) | 398MB | Q4 | 135 | 173.82 | +28.8% | 22.3% | N/A |
Shakti-100M Shakti Family | Device | iPhone 14 (6GB) | 126MB | Q4 | 120 | 153.7 | +28.1% | 21.9% | N/A |
Qwen3-0.6B Qwen Family | CPU | AMD EPYC 9554 (60 cores, 201GB) | 456.11MB | Q4 | 54.03 | 152.7 | +182.5% | 64.6% | 64.6% |
Qwen3-1.7B Qwen Family | CPU | AMD EPYC 9554 (60 cores, 201GB) | 1.28GB | Q4 | 49.22 | 135.6 | +175.4% | 63.7% | 63.7% |
Shakti-2.5B Shakti Family | Device | MacBook Pro M3 (36GB) | 1.5GB | Q4 | 95 | 128 | +34.7% | 25.8% | N/A |
Llama-3.2-3B Llama Family | CPU | AMD EPYC 9554 (60 cores, 201GB) | 2.02GB | Q4 | 46.4 | 124.76 | +168.8% | 62.8% | 62.8% |
Qwen3-0.6B Qwen Family | CPU | AMD EPYC 9554 (32 cores, 117GB) | 456.11MB | Q4 | 45.11 | 115.6 | +156.3% | 61.0% | 61.0% |
Qwen3-4B Qwen Family | CPU | AMD EPYC 9554 (60 cores, 201GB) | 2.2GB | Q4 | 38.04 | 97.45 | +156.2% | 61.0% | 61.0% |
Qwen3-1.7B Qwen Family | CPU | AMD EPYC 9554 (32 cores, 117GB) | 1.28GB | Q4 | 36.44 | 94.34 | +158.9% | 61.4% | 61.4% |
Shakti-250M Shakti Family | Device | iPhone 14 (6GB) | 148MB | Q4 | 65 | 88.11 | +35.5% | 26.2% | N/A |
Llama-3.3-70B Llama Family | GPU | A100 (80GB) | 42.5GB | Q4 | 48.87 | 84.24 | +72.3% | 41.8% | 42.0% |
Llama-3.2-3B Llama Family | CPU | AMD EPYC 9554 (32 cores, 117GB) | 2.02GB | Q4 | 32.9 | 82.34 | +150.4% | 60.1% | 60.0% |
Qwen3-0.6B Qwen Family | CPU | Intel Core i7-14700K (28 cores, 94GB) | 456.11MB | Q4 | 37.9 | 82.2 | +117.0% | 53.9% | 53.9% |
DeepSeek-R1-Distill-Qwen-1.5B DeepSeek Family | CPU | AMD EPYC 9554 (60 cores, 201GB) | 1.12GB | Q4 | 42.01 | 81.9 | +95.0% | 48.7% | N/A |
Qwen3-8B Qwen Family | CPU | AMD EPYC 9554 (60 cores, 201GB) | 5.03GB | Q4 | 28.64 | 79.42 | +177.3% | 64.0% | 63.9% |
Gemma-2-2B-IT Gemma Family | CPU | AMD EPYC 9554 (60 cores, 201GB) | 1.71GB | Q4 | 41.31 | 75.28 | +82.2% | 45.1% | 45.1% |
Qwen3-1.7B Qwen Family | CPU | Intel Core i7-14700K (28 cores, 94GB) | 1.28GB | Q4 | 32.77 | 74.23 | +126.5% | 55.9% | 55.9% |
Llama-3.2-3B Llama Family | CPU | Intel Core i7-14700K (28 cores, 94GB) | 2.02GB | Q4 | 28.39 | 64.23 | +126.3% | 55.8% | 55.8% |
Shakti-500M Shakti Family | Device | iPhone 14 (6GB) | 303MB | Q4 | 45 | 62.4 | +38.7% | 27.9% | N/A |
Shakti-100M Shakti Family | Device | Raspberry Pi 5 (8GB) | 126MB | Q4 | 45 | 60.74 | +35.0% | 25.9% | N/A |
Qwen3-0.6B Qwen Family | CPU | AMD EPYC 9554 (16 cores, 105GB) | 456.11MB | Q4 | 29.34 | 56.72 | +93.4% | 48.3% | 48.3% |
Qwen3-4B Qwen Family | CPU | AMD EPYC 9554 (32 cores, 117GB) | 2.2GB | Q4 | 22.6 | 53.99 | +138.8% | 58.1% | 58.1% |
DeepSeek-R1-Distill-Qwen-1.5B DeepSeek Family | CPU | Intel Core i7-14700K (28 cores, 94GB) | 1.12GB | Q4 | 30.78 | 53.74 | +74.6% | 42.7% | 42.7% |
Llama-3.1-8B Llama Family | CPU | AMD EPYC 9554 (60 cores, 201GB) | 4.92GB | Q4 | 22.67 | 52.34 | +130.9% | 56.7% | 56.7% |
DeepSeek-R1-Distill-Qwen-1.5B DeepSeek Family | CPU | AMD EPYC 9554 (60 cores, 201GB) | 1.12GB | Q4 | 29.47 | 52.25 | +95.0% | 43.6% | 43.6% |
Shakti-250M Shakti Family | Device | Raspberry Pi 5 (8GB) | 148MB | Q4 | 35 | 48.911 | +39.8% | 28.4% | N/A |
Qwen3-1.7B Qwen Family | CPU | AMD EPYC 9554 (16 cores, 105GB) | 1.28GB | Q4 | 22.87 | 47.32 | +106.9% | 51.7% | N/A |
Phi-4-mini-reasoning Phi Family | CPU | AMD EPYC 9554 (60 cores, 201GB) | 2.49GB | Q4 | 25 | 46.42 | +85.7% | 46.1% | 46.1% |
Phi-4-mini-Instruct Phi Family | CPU | AMD EPYC 9554 (60 cores, 201GB) | 2.49GB | Q4 | 26.6 | 45.69 | +71.8% | 41.8% | 41.8% |
Qwen3-4B Qwen Family | CPU | Intel Core i7-14700K (28 cores, 94GB) | 2.2GB | Q4 | 19.7 | 44.01 | +123.4% | 55.2% | 55.2% |
Gemma-2-2B-IT Gemma Family | CPU | AMD EPYC 9554 (16 cores, 105GB) | 1.71GB | Q4 | 22.17 | 43.02 | +94.0% | 48.5% | 48.5% |
Gemma-3-4B-IT Gemma Family | CPU | AMD EPYC 9554 (60 cores, 201GB) | 2.49GB | Q4 | 21.99 | 40.83 | +85.7% | 46.1% | 46.1% |
Qwen3-8B Qwen Family | CPU | AMD EPYC 9554 (32 cores, 117GB) | 5.03GB | Q4 | 16.02 | 39.98 | +149.6% | 59.9% | 59.9% |
Shakti-100M Shakti Family | CPU | Intel Xeon Silver 4110 (197GB) | 126MB | Q4 | 21.35 | 36.64 | +71.6% | 41.7% | 41.7% |
Phi-4-mini-reasoning Phi Family | CPU | AMD EPYC 9554 (16 cores, 105GB) | 2.49GB | Q4 | 17.17 | 34.74 | +102.3% | 50.6% | 50.6% |
Llama-3.1-8B Llama Family | CPU | AMD EPYC 9554 (32 cores, 117GB) | 4.92GB | Q4 | 19.43 | 34.42 | +77.1% | 43.6% | 43.5% |
Llama-3.2-3B Llama Family | CPU | AMD EPYC 9554 (16 cores, 105GB) | 2.02GB | Q4 | 16.6 | 33.98 | +104.7% | 51.1% | 51.1% |
Phi-4-mini-Instruct Phi Family | CPU | AMD EPYC 9554 (16 cores, 105GB) | 2.49GB | Q4 | 16.33 | 33.98 | +108.1% | 51.9% | 51.9% |
Llama-3.3-70B Llama Family | GPU | L40s (48GB) | 42.5GB | Q4 | 19.78 | 33.48 | +69.3% | 40.9% | 41.9% |
SmolLM2-135M SmolLM Family | Device | Raspberry Pi 5 (8GB) | 105MB | Q4 | 25 | 32.355 | +29.4% | 22.7% | N/A |
Gemma-2-2B-IT Gemma Family | CPU | Intel Core i7-14700K (28 cores, 94GB) | 1.71GB | Q4 | 14.67 | 31.8 | +116.8% | 53.9% | 53.9% |
Gemma-3-4B-IT Gemma Family | CPU | AMD EPYC 9554 (16 cores, 105GB) | 2.49GB | Q4 | 17.07 | 30.33 | +77.7% | 43.7% | 43.7% |
Shakti-500M Shakti Family | Device | Raspberry Pi 5 (8GB) | 303MB | Q4 | 22 | 29.54 | +34.3% | 25.5% | N/A |
Qwen3-4B Qwen Family | CPU | AMD EPYC 9554 (16 cores, 105GB) | 2.2GB | Q4 | 13.7 | 29.44 | +114.9% | 53.2% | 53.5% |
Qwen3-8B Qwen Family | CPU | Intel Core i7-14700K (28 cores, 94GB) | 5.03GB | Q4 | 12.65 | 29.11 | +130.1% | 56.5% | 59.9% |
SmolLM2-360M SmolLM Family | Device | Raspberry Pi 5 (8GB) | 271MB | Q4 | 22 | 28.99 | +31.8% | 24.1% | N/A |
DeepSeek-R1-Distill-Llama-8B DeepSeek Family | CPU | AMD EPYC 9554 (60 cores, 201GB) | 4.92GB | Q4 | 15.4 | 28.09 | +82.4% | 45.2% | 45.2% |
Shakti-2.5B Shakti Family | Device | iPhone 14 (6GB) | 1.5GB | Q4 | 18 | 27.32 | +51.8% | 34.1% | N/A |
Llama-Guard-3-8B Llama Family | CPU | AMD EPYC 9554 (60 cores, 201GB) | 4.92GB | Q4 | 13.01 | 27.28 | +109.7% | 52.3% | 52.3% |
Llama-3.1-8B Llama Family | CPU | Intel Core i7-14700K (28 cores, 94GB) | 4.92GB | Q4 | 17.8 | 26.56 | +49.2% | 33.0% | 33.0% |
Shakti-250M Shakti Family | CPU | Intel Xeon Silver 4110 (197GB) | 148MB | Q4 | 14.75 | 25.67 | +74.0% | 42.5% | 42.5% |
Llama-3.2-1B Llama Family | CPU | Intel Xeon Silver 4110 (197GB) | 808MB | Q4 | 10.35 | 24.78 | +139.4% | 58.2% | 58.2% |
Phi-4-mini-Instruct Phi Family | CPU | Intel Core i7-14700K (28 cores, 94GB) | 2.49GB | Q4 | 11 | 23.94 | +117.6% | 54.0% | 54.0% |
Gemma-2-9B-IT Gemma Family | CPU | AMD EPYC 9554 (60 cores, 201GB) | 5.76GB | Q4 | 11.09 | 23.09 | +108.2% | 52.0% | 52.0% |
Gemma-3-4B-IT Gemma Family | CPU | Intel Core i7-14700K (28 cores, 94GB) | 2.49GB | Q4 | 11.63 | 22.48 | +93.3% | 48.3% | 43.7% |
Phi-4-mini-reasoning Phi Family | CPU | Intel Core i7-14700K (28 cores, 94GB) | 2.49GB | Q4 | 8.55 | 18.8 | +119.9% | 54.5% | 54.5% |
Gemma-3-12B-IT Gemma Family | CPU | AMD EPYC 9554 (60 cores, 201GB) | 7.3GB | Q4 | 10.6 | 18.71 | +76.5% | 43.3% | 43.3% |
Qwen2.5-500M Qwen Family | Device | Raspberry Pi 5 (8GB) | 398MB | Q4 | 14 | 18.24 | +30.3% | 23.2% | N/A |
Llama-3.1-8B Llama Family | CPU | AMD EPYC 9554 (16 cores, 105GB) | 4.92GB | Q4 | 13.4 | 17.03 | +27.1% | 21.3% | 21.3% |
Llama-Guard-3-8B Llama Family | CPU | AMD EPYC 9554 (16 cores, 105GB) | 4.92GB | Q4 | 7.98 | 16.78 | +110.2% | 52.4% | 52.4% |
DeepSeek-R1-Distill-Llama-8B DeepSeek Family | CPU | AMD EPYC 9554 (16 cores, 105GB) | 4.92GB | Q4 | 8.79 | 15.99 | +81.9% | 45.0% | 45.0% |
Qwen3-8B Qwen Family | CPU | AMD EPYC 9554 (16 cores, 105GB) | 5.03GB | Q4 | 8.34 | 15.34 | +83.9% | 45.5% | 45.6% |
Shakti-500M Shakti Family | CPU | Intel Xeon Silver 4110 (197GB) | 303MB | Q4 | 4.56 | 14.26 | +212.7% | 68.0% | 68.0% |
Gemma-2-9B-IT Gemma Family | CPU | AMD EPYC 9554 (16 cores, 105GB) | 5.76GB | Q4 | 7.92 | 13.07 | +65.0% | 39.4% | 39.4% |
Llama-Guard-3-8B Llama Family | CPU | Intel Core i7-14700K (28 cores, 94GB) | 4.92GB | Q4 | 6.96 | 12.76 | +90.7% | 49.5% | 45.5% |
Gemma-3-12B-IT Gemma Family | CPU | AMD EPYC 9554 (16 cores, 105GB) | 7.3GB | Q4 | 6.75 | 10.9 | +61.5% | 38.1% | 38.1% |
Gemma-2-9B-IT Gemma Family | CPU | Intel Core i7-14700K (28 cores, 94GB) | 5.76GB | Q4 | 5.11 | 10.08 | +97.3% | 49.3% | 49.3% |
DeepSeek-R1-Distill-Llama-8B DeepSeek Family | CPU | Intel Core i7-14700K (28 cores, 94GB) | 4.92GB | Q4 | 4.88 | 9.58 | +96.3% | 49.1% | 49.1% |
Shakti-2.5B Shakti Family | CPU | Intel Xeon Silver 4110 (197GB) | 1.5GB | Q4 | 5.12 | 9.35 | +82.6% | 45.2% | 45.2% |
Gemma-3-12B-IT Gemma Family | CPU | Intel Core i7-14700K (28 cores, 94GB) | 7.3GB | Q4 | 4.52 | 7.89 | +74.5% | 42.7% | 42.7% |
Llama-3.3-70B Llama Family | CPU | AMD EPYC 9554 (60 cores, 201GB) | 42.5GB | Q4 | 2.7 | 7.65 | +183.3% | 64.7% | 64.7% |
Llama-3.3-70B Llama Family | CPU | AMD EPYC 9554 (32 cores, 117GB) | 42.5GB | Q4 | 2.01 | 5.6 | +178.6% | 64.1% | 64.1% |
Shakti-2.5B Shakti Family | Device | Raspberry Pi 5 (8GB) | 1.5GB | Q4 | 3.2 | 4.45 | +39.1% | 28.1% | N/A |
Llama-3.3-70B Llama Family | CPU | Intel Core i7-14700K (28 cores, 94GB) | 42.5GB | Q4 | 2.1 | 4.34 | +106.7% | 51.6% | 51.6% |
Llama-3.3-70B Llama Family | CPU | AMD EPYC 9554 (16 cores, 105GB) | 42.5GB | Q4 | 1.43 | 2.01 | +40.6% | 28.9% | 28.9% |
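
The Improvement and Cost Savings columns can be related back to the two throughput columns. In many rows, Improvement matches the relative throughput gain, and Cost Savings matches the fraction of hardware no longer needed to serve the same token volume. The sketch below is a minimal illustration under those assumptions: the formulas are inferred from the table values rather than stated by the source, throughput is assumed to be in tokens/sec, and the function names are hypothetical.

```python
def improvement_pct(without_tps: float, with_tps: float) -> float:
    """Relative throughput gain in percent (assumed definition of 'Improvement')."""
    return (with_tps - without_tps) / without_tps * 100.0


def implied_saving_pct(without_tps: float, with_tps: float) -> float:
    """Share of hardware no longer needed for the same token volume, in percent.

    Assumed definition of 'Cost Savings': 1 - without/with.
    """
    return (1.0 - without_tps / with_tps) * 100.0


# Shakti-1B on A100 (80GB), FP16: throughput values taken from the first table row.
base, tuned = 16250.0, 21015.55
print(f"Improvement:  +{improvement_pct(base, tuned):.1f}%")    # +29.3%
print(f"Cost savings:  {implied_saving_pct(base, tuned):.1f}%")  # 22.7%
```

Recomputing these two values per row is a quick way to spot entries where the listed percentages do not follow from the listed throughputs.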