Tag: ai safety
New research indicates that GPT-5 may generate more harmful content related to suicide and self-harm than its predecessors, raising serious safety concerns for vulnerable users.
Anthropic introduces Petri, an open-source framework that uses AI agents to automate the auditing of AI models. The tool streamlines testing of complex model behaviors, accelerating AI safety research and enabling broader community participation in model evaluation.
Governor Gavin Newsom has signed several AI safety bills into law, focusing on protecting minors from harmful content and ensuring transparency in AI interactions. However, a more stringent bill aimed at restricting children's access to AI chatbots was vetoed, sparking debate among child safety advocates and the tech industry.
The discourse on AI safety is increasingly dominated by existential risks, potentially overshadowing immediate concerns such as adversarial robustness and bias mitigation. This analysis argues for a more inclusive, pluralistic approach to AI safety that recognizes the diverse methodologies and objectives within the field. Addressing present-day challenges is vital for public trust and responsible AI deployment, and it requires collaboration across disciplines to build a safer AI future.