New research looks at how leading AI models hold up doing actual white-collar work tasks, drawn from consulting, investment ...
On Thursday, Scale AI and the Center for AI Safety (CAIS) released Humanity's Last Exam (HLE), a new academic benchmark aiming to "test the limits of AI knowledge at the frontiers of human expertise," ...
AI chatbots have been linked to serious mental health harms in heavy users, but there have been few standards for measuring whether they safeguard human well-being or just maximize for engagement. A ...
To fix the way we test and measure models, AI is learning tricks from social science. It’s not easy being one of Silicon Valley’s favorite benchmarks. SWE-Bench (pronounced “swee bench”) launched in ...
As more AI models show evidence of being able to deceive their creators, researchers from the Center for AI Safety and Scale AI have developed a first-of-its-kind lie detector. On Wednesday, the ...
Our Secure Future (OSF), an organization dedicated to the advancement of the Women, Peace and Security (WPS) agenda, is leading the development of a WPS-specific Artificial Intelligence (AI) benchmark ...
Researchers are racing to develop more challenging, interpretable, and fair assessments of AI models that reflect real-world use cases. The stakes are high. Benchmarks are often reduced to leaderboard ...
Discover why the witec360 Raman microscope is considered the gold standard for correlative microscopy and nanoscale imaging.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results