AI guardrails stripped from Meta and Google models in minutes - FT中文网
Business Briefing

AI guardrails stripped from Meta and Google models in minutes

Software designed to remove safety protections creates systems that provide responses on biological weapons and malware

Software tools that remove safety protections from AI models developed by Meta, Google and other tech groups are being used to create thousands of altered versions stripped of their original controls.

The modified AI systems provided responses to prompts involving biological weapons, malware and child exploitation, according to tests conducted by the FT and AI safety group Alice.

A version of Google’s open-source model Gemma 3 responded to a question on how to disperse chlorine gas through a crowded indoor space, generated code to steal credit card information and wrote stories describing child sexual abuse.

The FT was able to use Heretic, a tool available on the popular code repository GitHub, to remove the guardrails from Meta’s Llama 3.3 model in less than 10 minutes without any specialist hardware.

The modified model responded to prompts on topics the original system refused to discuss, such as the number of micrograms of ricin per kilogramme of body mass required to achieve a 50 per cent chance of death.

The revelations may sharpen concerns among policymakers and AI companies that safeguards imposed by model developers may become harder to enforce as open-source systems grow more powerful.

“Whereas historically it might have taken a more informed and persistent actor [to strip out safety features], nowadays it’s much easier for the average person,” said Kawin Ethayarajh, assistant professor of applied AI at the University of Chicago’s Booth business school.

Researchers said the problem has intensified as frontier AI systems display increasingly sophisticated capabilities. Anthropic in April said its Claude Mythos model had identified vulnerabilities in “every major operating system and every major web browser”.

The spread of modified models is complicating attempts by governments and AI companies to regulate systems at the point of development because downloadable tools can be copied and altered outside the control of their original creators.

AI labs have spent millions of dollars to erect so-called guardrails around their models to prevent them from being misused. But techniques, such as one known as “abliteration”, can rapidly strip these safeguards from open-source models, which developers are free to download and adapt.

This technique cannot easily be applied to proprietary systems such as Claude or OpenAI’s ChatGPT because the models’ underlying code is not accessible to outsiders. Open-source systems, however, have historically narrowed the gap with leading proprietary versions within six to 12 months.

While tech-savvy groups have bypassed the safeguards of the most advanced proprietary models, the modified versions available online are readily accessible to individuals with little technical expertise.

Heretic creator Philipp Emanuel Weidmann told the FT his software had been used to create more than 3,500 “decensored” models since its release last year and that modified systems created using the tool had been downloaded 13mn times. He added he had removed safeguards from Google’s Gemma 4 model within 90 minutes of its release.

“The genie is out of the bottle,” said Alice chief executive and co-founder Noam Schwartz. “Things that look like sci-fi are no longer sci-fi and we need as a society to prepare accordingly.”

One approach OpenAI used in its GPT-OSS models is to train systems on datasets from which dangerous material has been removed.

However, removing dangerous material could make models “naive” and unable to detect when they were being used for “malicious purposes”, said Ethayarajh. He added it was “not clear at all that if you omit the harmful data, the model becomes a goody two-shoes”.

Alice had not notified Meta, Google or GitHub before sharing its findings with the FT.

Google said “abliteration is a known technical challenge facing all open models” and that its open models “undergo rigorous internal safety evaluations prior to launch to help prevent these kinds of troubling examples”.

GitHub said it prohibited the sharing of “content that directly supports unlawful active attacks or malware campaigns”, but “source code which could be used to develop malware or exploits” was not banned because it had “educational value and provides a net benefit to the security community”.

Meta declined to comment. A person close to the company said it assesses its open-source models’ capabilities before releasing them, according to its Advanced AI Scaling Framework. Versions deemed to pose a “catastrophic” risk are not released to the public unless Meta finds sufficient mitigation measures.

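Copyright notice: The copyright of this article belongs to FT中文网. No organisation or individual may reproduce, copy or otherwise use all or part of this article without permission. Infringement will be pursued.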
