Our labs' long-term research goal is to engineer humanoid robots from scratch, with an emphasis on grasping and manipulation and on learning from human observation and experience. My own research centers on natural interaction and communication, which intersects significantly with natural language processing (NLP).
At HT, we have a range of Arma humanoid robots, the earliest of which were developed over 20 years ago. These robots have evolved in both appearance and functionality through iterations such as Arma3, Arma6, ArmaDE, and the latest model, Arma7, which has 32 degrees of freedom and more than 100 sensors.
Our research covers not only the hardware but also the software, such as the functional cognitive architecture used in Arma7. This architecture is memory-centric: a three-layer system comprising hardware abstraction, high-level planning and reasoning, and a central memory that mediates between the other two layers.
The memory system in our architecture is designed to be distributed to avoid bottlenecks, model various types of memory, represent different data types, and be easily extendable. It includes sensory memory for input from sensors, working memory for information derived from sensory input, and long-term memory for encoding and consolidating content.
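As a rough illustration, this memory layout might be modeled as below. This is a minimal sketch only; names such as `MemorySegment` and `RobotMemory` are hypothetical, not the architecture's actual interfaces.

```python
# Hypothetical sketch of the memory layout described above; MemorySegment and
# RobotMemory are illustrative names, not the architecture's actual API.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class MemorySegment:
    """One memory segment; segments can be distributed across processes to avoid bottlenecks."""
    entries: dict[str, Any] = field(default_factory=dict)

    def update(self, key: str, value: Any) -> None:
        self.entries[key] = value

    def query(self, key: str) -> Any:
        return self.entries.get(key)

@dataclass
class RobotMemory:
    """Central memory mediating between hardware abstraction and planning."""
    sensory: MemorySegment = field(default_factory=MemorySegment)    # raw input from sensors
    working: MemorySegment = field(default_factory=MemorySegment)    # information derived from sensory input
    long_term: MemorySegment = field(default_factory=MemorySegment)  # encoded and consolidated content

# Example: a perception component writes, a planner reads.
memory = RobotMemory()
memory.sensory.update("camera/objects", ["cup", "table"])
memory.working.update("current_goal", "hand over the cup")
print(memory.working.query("current_goal"))  # -> "hand over the cup"
```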
Natural language interaction is complicated by ambiguity and context dependence. Our dialog system employs an interaction manager that uses language models to understand user input and control the robot, with memory serving as the mediator through which the language model queries and updates information.
We have deployed a large language model (LLM) as an agent that controls the robot's behavior through a prompting-based approach: the prompt combines an API specification, examples, and the current user input. The LLM generates Python commands that are executed in a Python console environment to trigger physical actions or invoke perception components.
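A minimal sketch of this control loop is shown below, assuming a generic LLM callable; the `grasp`, `say`, and `locate` functions and the prompt layout are illustrative assumptions, not the actual robot API.

```python
# Sketch of the prompting-based agent loop; the API functions and prompt
# layout are assumed for illustration.
API_SPEC = """You control a humanoid robot by emitting Python code.
Available functions:
  grasp(object_name: str)   # pick up a named object
  say(text: str)            # speak to the user
  locate(object_name: str)  # returns True if the object is visible
"""

EXAMPLE = """User: Please hand me the cup.
Code:
if locate("cup"):
    grasp("cup")
    say("Here is your cup.")
"""

def build_prompt(user_input: str) -> str:
    # API specification + examples + current input, as described above.
    return f"{API_SPEC}\n{EXAMPLE}\nUser: {user_input}\nCode:\n"

def run_agent(user_input: str, llm, api_functions: dict) -> None:
    code = llm(build_prompt(user_input))  # the LLM returns Python source
    exec(code, api_functions)             # executed in a console-like namespace

# Example with a stubbed model that always emits one fixed command:
fake_llm = lambda prompt: 'say("Hello!")'
run_agent("Greet me", fake_llm,
          {"say": print, "grasp": print, "locate": lambda name: True})
```

Executing generated code directly is what makes the console-style environment attractive: new behaviors need no predefined mapping from language to actions.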
We also introduced a mechanism through which the robot learns and improves from interactions: when a human gives feedback, an LLM reflects on the interaction, assesses what should be improved, and revises the action code. The revised code is stored in memory for future use.
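The reflect-and-revise step might look like the following sketch, which reuses the hypothetical memory interface from above; the prompt wording and function name are assumptions.

```python
# Sketch of the reflection step; the prompt wording is an assumption, and
# `memory` reuses the hypothetical RobotMemory interface sketched earlier.
def improve_from_feedback(task: str, old_code: str, feedback: str, llm, memory) -> str:
    reflection_prompt = (
        f"Task: {task}\n"
        f"Previous code:\n{old_code}\n"
        f"Human feedback: {feedback}\n"
        "Explain what went wrong, then output revised Python code."
    )
    revised_code = llm(reflection_prompt)
    # Store the revision in long-term memory so future requests reuse it.
    memory.long_term.update(f"skills/{task}", revised_code)
    return revised_code
```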
We validated our system with various language models and interaction scenarios.
Our research faces multiple challenges, including designing effective APIs for the LLM, managing system latency and data privacy, confirming that a revision is actually an improvement before updating memory, and personalizing interactions for individual users.
We explored two approaches to personalization: explicit attribute storage with getter and setter functions, and user-specific interaction memories.
Both approaches have their advantages and constraints. Combining them might yield better results in future implementations.
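The two approaches can be contrasted in code. This is a minimal sketch; the class names, attribute names, and method signatures are hypothetical.

```python
# Approach 1: explicit attribute storage with getters and setters (names assumed).
class UserProfile:
    def __init__(self) -> None:
        self._attributes: dict[str, str] = {}

    def set_attribute(self, name: str, value: str) -> None:
        self._attributes[name] = value  # e.g. set_attribute("preferred_drink", "tea")

    def get_attribute(self, name: str) -> str | None:
        return self._attributes.get(name)

# Approach 2: user-specific interaction memories retrieved as prompt context.
class InteractionMemory:
    def __init__(self) -> None:
        self._episodes: dict[str, list[str]] = {}

    def record(self, user_id: str, utterance: str) -> None:
        self._episodes.setdefault(user_id, []).append(utterance)

    def recall(self, user_id: str, limit: int = 5) -> list[str]:
        # Recent interactions with this user can be injected into the LLM prompt.
        return self._episodes.get(user_id, [])[-limit:]
```

Explicit attributes give structured, directly queryable facts, while interaction memories preserve richer context at the cost of structure, which is why combining them is appealing.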
Our research combines humanoid robot design with NLP to enable intuitive interaction and incremental learning. Future work aims to improve API design, manage data security, and enhance personalization.
Q1: What are the main goals of your research?
A1: Our main goals include engineering humanoid robots with an emphasis on grasping, manipulation, and learning from human observation and experience, with my own focus on natural interaction and communication.
Q2: How have the Arma humanoid robots evolved over the years?
A2: Our Arma humanoid robots have evolved significantly in both design and functionality, with the latest model, Arma7, featuring 32 degrees of freedom and over 100 sensors.
Q3: What does your functional cognitive architecture entail?
A3: It is a three-layer memory-centric architecture that includes hardware abstraction, high-level planning and reasoning, and a central memory for mediating between these layers.
Q4: How does the LLM agent control the robot's behavior?
A4: The LLM agent uses a prompting-based approach to generate Python commands, which are then executed to perform physical actions or invoke perception components.
Q5: What is incremental learning in your context?
A5: Incremental learning means the robot improves its future actions based on feedback from human interactions, with the updated code stored in memory for future tasks.
Q6: What challenges are you currently facing?
A6: Challenges include designing effective APIs, managing system latency and data privacy, confirming improvements before updating memory, and personalizing user interactions.
Q7: How are you addressing personalization in human-robot interaction?
A7: We use explicit attribute storage with getter and setter functions, as well as user-specific interaction memories, to tailor interactions to individual users.