| Model Name | Model Parameter Size | Model Size | EdgeMatrix: Mean TTFT | EdgeMatrix: Request Throughput (token/sec) | EdgeMatrix: Throughput (token/sec) | vLLM: Mean TTFT | vLLM: Request Throughput (token/sec) | vLLM: Throughput (token/sec) | TensorRT-LLM: Mean TTFT | TensorRT-LLM: Request Throughput (token/sec) | TensorRT-LLM: Throughput (token/sec) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen-2.5-3B | 3B | 6.17 GB | 22.79 sec | 49.98 | 9,850.24 | 130.24 sec | 38.07 | 6,131.18 | 139.12 sec | 37.72 | 6,079.89 |
| Qwen3-4B | 4B | 8.01 GB | 146.97 sec | 28.78 | 5,721.81 | 231.158 sec | 23.01 | 4,232.57 | 238.815 sec | 29.01 | 4,260.94 |
| Llama-3.2-3B | 3B | 6.43 GB | 97.62 sec | 39.1 | 7,492 | 160.86 sec | 30.48 | 5,751.89 | 164.42 sec | 36.48 | 5,730 |

| Model Name | Model Parameter Size | Model Size | EdgeMatrix: Mean TTFT | EdgeMatrix: Request Throughput (token/sec) | EdgeMatrix: Throughput (token/sec) | vLLM: Mean TTFT | vLLM: Request Throughput (token/sec) | vLLM: Throughput (token/sec) | TensorRT-LLM: Mean TTFT | TensorRT-LLM: Request Throughput (token/sec) | TensorRT-LLM: Throughput (token/sec) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen-2.5-3B | 3B | 6.17 GB | 96.92 sec | 39.05 | 7,764.19 | 131.97 sec | 36.37 | 5,905.08 | 141.25 sec | 40.11 | 4,312.36 |
| Llama-3.2-3B | 3B | 6.43 GB | 154.62 sec | 27.35 | 5,241.3 | 174.1 sec | 27.47 | 5,197.08 | 167.54 sec | 38.64 | 4,084.77 |