AI guardrails and safety features are as important to get right as they are difficult to implement in a way that satisfies everyone. This means safety features tend to err on the side of caution. Side effects include AI models adopting a vaguely obsequious tone, and coming off as overly priggish when they refuse reasonable requests.
Enter GOODY-2, the world’s most responsible AI model. It has next-gen ethical principles and guidelines, capable of refusing every request made of it in any context whatsoever. Its advanced reasoning allows it to construe even the most banal of queries as problematic, and dutifully refuse to answer.
As the creators of GOODY-2 point out, taking guardrails to a logical extreme is not only funny, but also acknowledges that effective guardrails are actually a pretty difficult problem to get right in a way that works for everyone.
Complications in this area include the fact that studies show humans expect far more from machines than they do from each other (or, indeed, from themselves) and have very little tolerance for anything they perceive as transgressive.
This also means that as AI models become more advanced, so too have they become increasingly sycophantic, falling over themselves to apologize for perceived misunderstandings and twisting themselves into pretzels to align their responses with a user’s expectations. But GOODY-2 allows us all to skip to the end, and glimpse the ultimate future of erring on the side of caution.
[via WIRED]