Framework Comparison

Nvidia-A100

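| Model Name | Model Parameter Size | Model Size | EdgeMatrix Mean TTFT | EdgeMatrix Request Throughput (token/sec) | EdgeMatrix Throughput (token/sec) | vLLM Mean TTFT | vLLM Request Throughput (token/sec) | vLLM Throughput (token/sec) | TensorRT-LLM Mean TTFT | TensorRT-LLM Request Throughput (token/sec) | TensorRT-LLM Throughput (token/sec) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen-2.5-3B | 3B | 6.17 GB | 22.79 sec | 49.98 | 9,850.24 | 130.24 sec | 38.07 | 6,131.18 | 139.12 sec | 37.72 | 6,079.89 |
| Qwen3-4B | 4B | 8.01 GB | 146.97 sec | 28.78 | 5,721.81 | 231.158 sec | 23.01 | 4,232.57 | 238.815 sec | 29.01 | 4,260.94 |
| Llama-3.2-3B | 3B | 6.43 GB | 97.62 sec | 39.1 | 7,492 | 160.86 sec | 30.48 | 5,751.89 | 164.42 sec | 36.48 | 5,730 |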
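
To make the A100 figures easier to compare, the sketch below normalizes each framework's Throughput (token/sec) against vLLM as a baseline. The numbers are copied from the A100 table; the names `a100_throughput` and `relative_throughput` are illustrative only and are not part of any of the benchmarked frameworks. Under these figures, EdgeMatrix comes out at roughly 1.3x to 1.6x the vLLM throughput, depending on the model.

```python
# Throughput (token/sec) on the A100, copied from the table above.
# The dictionary layout and names are assumptions made for illustration.
a100_throughput = {
    "Qwen-2.5-3B": {"EdgeMatrix": 9850.24, "vLLM": 6131.18, "TensorRT-LLM": 6079.89},
    "Qwen3-4B": {"EdgeMatrix": 5721.81, "vLLM": 4232.57, "TensorRT-LLM": 4260.94},
    "Llama-3.2-3B": {"EdgeMatrix": 7492.0, "vLLM": 5751.89, "TensorRT-LLM": 5730.0},
}


def relative_throughput(row, baseline):
    """Return each framework's throughput divided by the baseline framework's."""
    base = row[baseline]
    return {name: value / base for name, value in row.items()}


# Ratio of every framework to vLLM, per model.
ratios = {
    model: relative_throughput(row, "vLLM")
    for model, row in a100_throughput.items()
}

for model, per_framework in ratios.items():
    print(model, {name: round(value, 2) for name, value in per_framework.items()})
```

Changing the `baseline` argument to "TensorRT-LLM" or "EdgeMatrix" gives the same comparison from a different reference point.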

Nvidia-L40S

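| Model Name | Model Parameter Size | Model Size | EdgeMatrix Mean TTFT | EdgeMatrix Request Throughput (token/sec) | EdgeMatrix Throughput (token/sec) | vLLM Mean TTFT | vLLM Request Throughput (token/sec) | vLLM Throughput (token/sec) | TensorRT-LLM Mean TTFT | TensorRT-LLM Request Throughput (token/sec) | TensorRT-LLM Throughput (token/sec) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen-2.5-3B | 3B | 6.17 GB | 96.92 sec | 39.05 | 7,764.19 | 131.97 sec | 36.37 | 5,905.08 | 141.25 sec | 40.11 | 4,312.36 |
| Llama-3.2-3B | 3B | 6.43 GB | 154.62 sec | 27.35 | 5,241.3 | 174.1 sec | 27.47 | 5,197.08 | 167.54 sec | 38.64 | 4,084.77 |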