A recent report from Meta AI researchers, led by Jacob Kahn, warns of potential flaws in SWE-bench Verified, a widely used benchmark for evaluating artificial intelligence models. The researchers found multiple discrepancies in the benchmark’s evaluation process. FAIR, the Meta group behind the report, suggests that the benchmark’s scores may not accurately reflect a model’s true capabilities and that reported results may be artificially inflated.
Credits: News – South China Morning Post