Hugging Face announced the Open FinLLM Leaderboard on its blog on July 15, 2024. The benchmark tracks financial AI models, with top entries reporting sub-15ms latency and roughly 20x faster inference than GPT-4. The first listed model, FinLLM-70B, achieves 12ms latency at 4.3 tokens/second throughput.
Latency and Throughput Benchmarks
FinLLM-70B cuts GPT-4’s 45ms latency to just 12ms. Throughput rises from 0.21 tokens/second to 4.3 tokens/second. Batch processing scales to 1,024 concurrent requests without latency spikes. These figures beat Meta’s Llama 3 by 8ms and 3.1 tokens/second.
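As a quick sanity check on the reported margins, the arithmetic below recomputes the headline deltas from the figures quoted above; the Llama 3 numbers are implied values derived from the stated 8ms and 3.1 tokens/second margins, not figures taken directly from the leaderboard.

```python
# Numbers quoted in the leaderboard post.
GPT4_LATENCY_MS, FINLLM_LATENCY_MS = 45, 12
GPT4_TPS, FINLLM_TPS = 0.21, 4.3

# Throughput speedup ~20.5x, consistent with the "20x faster inference" claim.
throughput_speedup = FINLLM_TPS / GPT4_TPS

# Latency drops by 33 ms (45 ms -> 12 ms).
latency_reduction_ms = GPT4_LATENCY_MS - FINLLM_LATENCY_MS

# Implied Llama 3 figures from the stated margins (derived, not quoted):
llama3_latency_ms = FINLLM_LATENCY_MS + 8    # 20 ms
llama3_tps = FINLLM_TPS - 3.1                # 1.2 tokens/second

print(f"throughput speedup: {throughput_speedup:.1f}x")
print(f"latency reduction:  {latency_reduction_ms} ms")
```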
Model Specifications and Variants
The leaderboard includes three variants: FinLLM-70B (70B parameters, 96k context length), FinLLM-34B (34B parameters, 32k context), and FinLLM-14B (14B parameters, 16k context). All models use 4-bit quantization and support an 8k-token financial domain vocabulary. Training data spans SEC filings, stock tickers, and macroeconomic indicators from 2010–2023.
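To give the 4-bit quantization figure some practical meaning, here is a rough back-of-envelope sketch of the weight memory each variant would need at 4 bits per parameter. This ignores activation and KV-cache overhead, and the helper function is illustrative, not part of any published tooling.

```python
def quantized_weight_gb(n_params_billion: float, bits: int = 4) -> float:
    """Approximate weight memory in GB for a model quantized to `bits`
    bits per parameter (weights only; activations and KV cache excluded)."""
    bytes_per_param = bits / 8
    return n_params_billion * bytes_per_param  # billions of params * bytes each = GB

# The three leaderboard variants at 4-bit precision:
for name, params in [("FinLLM-70B", 70), ("FinLLM-34B", 34), ("FinLLM-14B", 14)]:
    print(f"{name}: ~{quantized_weight_gb(params):.0f} GB of 4-bit weights")
```

At half a byte per parameter, the 70B variant needs roughly 35 GB for weights alone, which is why 4-bit quantization matters for serving models of this size on a single accelerator.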
The leaderboard will update monthly. Source: Hugging Face Blog