Billy and Manja: An alignment story
A lot of science-fiction-driven theories of human-AI interaction have AIs troubling humans. But what if it were the other way around?
I don’t know if I’ve mentioned to readers of my blog that my wife is pivoting - she is setting aside her matchmaking, relationship advisory, operations, and supply chain background, and embarking on a career in AI alignment. To be more precise, she wants to work on human-AI interaction. Here are a couple of her recent blog posts on the matter:
This one is more like a life update:
And this one talks about one specific topic in AI and alignment - “relationships” between humans and AI.
During the pandemic, in 2021 or so, we decided to get a Roomba. The Roomba app asked us to give it a name. My wife was keen on giving it a male name - to teach our daughter, in a way, that men can also do housework. In its initial days we found it a bit erratic, even “insane”. And so we decided to name it “Manja”, after this cult song:
Manja worked unmolested for some three years. At the end of 2023, our son Bilahari (Billy) was born. A few months later, he started crawling. Initially he was scared of Manja and would cry every time Manja started working. In fact, the way he reacted to Manja was similar to the way he reacted to some friends’ dog when we visited them.
Soon, Billy overcame his fears. I don’t know if he retains any memory of the trauma Manja caused him in his early life, but he has decided he will trouble Manja to no end.
He goes up to Manja and randomly turns him on and off. I don’t know if Manja has any continuous learning mechanism, but if one exists, it has surely learnt all sorts of random things. Soon after Manja starts operating, Billy turns him off. He keeps blocking Manja’s path. He tries to sit on Manja.
At some level, this is a study in human-AI interaction. Most of the literature on human-AI interaction - mainly produced by people who have read too much science fiction and not enough maths - has AI going rogue and potentially troubling humans (and in some cases, which a section of the “AI safety” crowd has picked up, exterminating the human race).
In our house, we have one little human and one AI, and it is the little human who traumatises the AI. He doesn’t allow the AI to do its job. He keeps messing with the AI’s learning, and has possibly ruined what it has learnt.
In data science, there is an old adage that you “take data down into the basement and torture it until it confesses, by giving you the conclusion you want”. I wonder if there is an equivalent of this with AI, including non-physical AI (such as LLMs). I guess there is - there are stories that if you try to convince ChatGPT of some rubbish, and repeat it often enough, ChatGPT will accept it as truth.
Our Billy restricts himself (so far) to torturing physical AI.
Right now Manja is broken. Every time we (typically it’s Billy) turn him on, he errors out saying “clean this brush”. We suspect something inside him has broken. If nothing else, the AI’s spirit has been broken. And a little human is responsible for it.
Draw whatever conclusions you want from this about human-AI interaction.
Hahahah! Enjoyed reading this.
When it was time to name our own robot vacuum cleaner, my wife's initial suggestion was Kantabai, which she quickly changed to Ramukaka, for the same reason. She didn't want the househelp to be female.
About the alignment part, this is an underexplored aspect of human-AI relations. We don't need another movie about rogue AI. We need a movie where the AI is the hero, valiantly fighting a human villain's efforts to alter its knowledge models and get it to go wrong.