Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (4 views, 5 months ago)
Extension OL-MDISF: Online Learning from Mix-Typed, Drifted, and Incomplete Streaming Features (7 views, 5 months ago)
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering (16 views, 5 months ago)
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety (11 views, 5 months ago)
Tree-Structured Parzen Estimator Can Solve Black-Box Combinatorial Optimization More Efficiently (9 views, 5 months ago)
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination (4 views, 5 months ago)
Giving AI Agents Access to Cryptocurrency and Smart Contracts Creates New Vectors of AI Harm (4 views, 5 months ago)
Gemini 2.5: Advancing Reasoning, Multimodality, Long Context, and Agentic Capabilities (9 views, 5 months ago)