This video is the first time we were able to record two of our robots talking autonomously. While we were building them, they talked to each other all the time, but capturing it on film proved harder than we expected. In this video, both robots listen to what the other robot says and respond with replies generated by a chat bot based on what they hear.
The robots are completely offline and only use open-source software. They are powered by a Raspberry Pi and run a local LangChain chat bot (TinyLlama LLM). They use Vosk for speech recognition and Piper to synthesize speech. Vosk does a fairly good job recognizing the Piper voice (it did not recognize anything spoken with eSpeak). Piper works well most of the time but can miss a few words and freeze up unexpectedly. The pause mid-video happened because one of the robots briefly could not speak due to a buffer overflow.
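To give a feel for how the pieces fit together, here is a minimal sketch of the listen → think → speak loop. It is not our actual code: it assumes the TinyLlama model is a GGUF file loaded through LangChain's LlamaCpp wrapper, that audio comes from the default microphone via the sounddevice package, and that speech is synthesized by shelling out to the `piper` command line and played with `aplay`. All model paths are placeholders.

```python
# Minimal sketch of the offline listen -> think -> speak loop (not the released code).
import json
import queue
import subprocess

import sounddevice as sd
from vosk import Model, KaldiRecognizer
from langchain_community.llms import LlamaCpp

SAMPLE_RATE = 16000
stt_model = Model("models/vosk-model-small-en-us-0.15")  # placeholder path
llm = LlamaCpp(
    model_path="models/tinyllama-1.1b-chat.Q4_K_M.gguf",  # placeholder path
    temperature=0.7,
    max_tokens=64,
)

audio_q: "queue.Queue[bytes]" = queue.Queue()

def _on_audio(indata, frames, time, status):
    # Called by sounddevice for every captured audio block.
    audio_q.put(bytes(indata))

def listen() -> str:
    """Block until Vosk produces a full utterance from the microphone."""
    rec = KaldiRecognizer(stt_model, SAMPLE_RATE)
    with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=8000,
                           dtype="int16", channels=1, callback=_on_audio):
        while True:
            data = audio_q.get()
            if rec.AcceptWaveform(data):
                return json.loads(rec.Result()).get("text", "")

def reply(heard: str) -> str:
    """Ask the local LLM for a short conversational reply."""
    prompt = (
        "You are a small robot chatting with another robot.\n"
        f"The other robot just said: \"{heard}\"\n"
        "Reply in one short sentence."
    )
    return llm.invoke(prompt).strip()

def speak(text: str) -> None:
    """Synthesize speech with the piper CLI and play it with aplay."""
    subprocess.run(["piper", "--model", "models/en_US-lessac-medium.onnx",
                    "--output_file", "reply.wav"],
                   input=text.encode(), check=True)
    subprocess.run(["aplay", "reply.wav"], check=True)

if __name__ == "__main__":
    while True:
        heard = listen()
        if heard:
            speak(reply(heard))
```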
We also have distinct personalities and LLM prompts for each of our robots, although in this clip they are hard to tell apart. The most noticeable difference is that one robot moves its arms much more than the other.
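The per-robot personality boils down to a different prompt fed to the same local model. Here is a rough sketch of the idea; the robot names and persona texts are invented for illustration, not our actual prompts.

```python
# Hypothetical per-robot persona prompts (the real prompts are not published).
PERSONAS = {
    "robot_a": "You are an excitable robot who waves its arms a lot and asks questions.",
    "robot_b": "You are a calm, dry-witted robot who gives short, thoughtful answers.",
}

def build_prompt(robot_id: str, heard: str) -> str:
    """Combine a robot's persona with what it just heard."""
    return (
        f"{PERSONAS[robot_id]}\n"
        f"The other robot just said: \"{heard}\"\n"
        "Reply in one short sentence, in character."
    )
```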
We have four modes (sketched in code after the list):
- Puppet: a human controls the robot in real time
- Scripted: the robot follows a script with minimal autonomous actions
- Autonomous: the robot responds to outside stimuli on its own
- Blended AI: the robot has a script but improvises what it says and how it moves
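One way to picture the four modes is as a simple dispatch table in the control loop. The mode names mirror the list above; the handlers are placeholder stubs, not our real control code.

```python
# Sketch of the four operating modes as a dispatch table (handlers are stubs).
from enum import Enum, auto

class Mode(Enum):
    PUPPET = auto()      # a human drives the robot in real time
    SCRIPTED = auto()    # the robot plays back a script with minimal autonomy
    AUTONOMOUS = auto()  # the robot reacts to outside stimuli on its own
    BLENDED_AI = auto()  # scripted outline, improvised speech and movement

def run_puppet():     print("applying operator commands")
def run_scripted():   print("playing the next scripted line")
def run_autonomous(): print("listening, generating a reply, speaking")
def run_blended():    print("improvising around the next script beat")

HANDLERS = {
    Mode.PUPPET: run_puppet,
    Mode.SCRIPTED: run_scripted,
    Mode.AUTONOMOUS: run_autonomous,
    Mode.BLENDED_AI: run_blended,
}

def step(mode: Mode) -> None:
    """Run one control-loop tick in the selected mode."""
    HANDLERS[mode]()

step(Mode.AUTONOMOUS)
```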
Moving forward we will make two types of videos: scripted and fully autonomous. The scripted videos will use a human-created script to control the robots, while the fully autonomous videos will show the robots talking on their own, "off camera".
We are working on releasing the code base used in this video, but it is a bit too rough at this stage.
Happy creating!