Will a medium OSS LLM surpass gpt-oss-120B (high reasoning) in intelligence before summer?

For reference, here is the current intelligence ranking of medium models: https://artificialanalysis.ai/models/open-source/medium

Resolution criteria

The market resolves YES if any medium open-source model (40B-150B parameters) surpasses gpt-oss-120B (high) in intelligence score on the Artificial Analysis Intelligence Index before June 21, 2026. gpt-oss-120B (high) is currently listed with a score of 33 on the Artificial Analysis Intelligence Index, though more recent benchmarks show an Intelligence Index score of 58. The score displayed on the leaderboard at resolution time is the one used for comparison.

Resolution will be determined by checking the Artificial Analysis medium open-source models leaderboard on or after June 21, 2026. The market resolves YES if any model in the medium category (40B-150B parameters) displays a higher Intelligence Index score than gpt-oss-120B (high). If no model surpasses it by that date, the market resolves NO.
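
As an illustration only, the resolution rule above can be sketched in Python. The `Model` type, the `resolves_yes` function, and the example entries other than the baseline's score of 33 are hypothetical. The sketch assumes the 40B-150B range is inclusive, that it refers to total parameters, and that a tie with gpt-oss-120B (high) does not count as surpassing it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    total_params_b: float      # total parameters, in billions
    intelligence_index: float  # score shown on the leaderboard


BASELINE_NAME = "gpt-oss-120B (high)"


def resolves_yes(leaderboard, baseline_name=BASELINE_NAME,
                 min_params_b=40.0, max_params_b=150.0):
    """Return True if any other medium model scores strictly above the baseline."""
    baseline = next(m for m in leaderboard if m.name == baseline_name)
    return any(
        m.name != baseline_name
        and min_params_b <= m.total_params_b <= max_params_b  # assumed inclusive
        and m.intelligence_index > baseline.intelligence_index  # ties do not count
        for m in leaderboard
    )


# Hypothetical leaderboard snapshot; only the baseline's 117B / 33 come from the text.
snapshot = [
    Model("gpt-oss-120B (high)", 117.0, 33.0),
    Model("hypothetical-model-a", 70.0, 33.0),   # tie: does not surpass
    Model("hypothetical-model-b", 200.0, 40.0),  # outside the medium range
]
print(resolves_yes(snapshot))

# Adding a medium model with a strictly higher score flips the outcome.
print(resolves_yes(snapshot + [Model("hypothetical-model-c", 100.0, 34.0)]))
```

The first call covers a tie and an out-of-range model, neither of which would trigger a YES under these assumptions. The second adds an in-range model with a strictly higher score.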

Background

gpt-oss-120B (high) is a Mixture of Experts (MoE) model with 117 billion total parameters, of which only 5.1 billion are active during inference. It achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running efficiently on a single 80 GB GPU.

gpt-oss-120B (high) and Qwen3 Coder Next are currently the highest-scoring medium open-source models on the Intelligence Index. The medium category covers models with 40B-150B parameters, a range that includes several actively developed open-source projects.

Considerations

The Artificial Analysis Intelligence Index methodology has evolved. The current Artificial Analysis Intelligence Index v4.0 incorporates 10 evaluations including GDPval-AA, τ²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, and CritPt. Benchmark scores can shift as evaluation methodologies are refined or as models are re-evaluated under updated protocols. Additionally, the definition of "medium" models (40B-150B parameters) is specific to Artificial Analysis' categorization and may not align with other benchmarking systems.

This description was generated by AI.
