ToMnet-N

Project overview

Bringing self-awareness and consciousness to machines is one of the biggest challenges in AI. A key element of human consciousness is the ability to recognize that other people may hold different beliefs about the world, known as Theory of Mind (ToM). ToMnet-N stands for Theory of Mind network, with the "N" referring to Nikita. I created the ToMnet-N model in three months as part of my Master's dissertation at Oxford Brookes University.

ToMnet-N is a deep neural network that predicts the trajectory of a player in a GridWorld game. This application may seem simple, but it is a significant step towards achieving ToM in machines. The model shows that a machine can learn to reason about another agent's behavior in a human-like way, and with a Theory of Mind ability, AI could reach a new level of understanding and communication, bringing us closer to explainable AI.

Business Value

ToMnet-N is an essential step towards a general Theory of Mind for machines. This technology could benefit current and future applications such as:

Autonomous vehicles: By understanding other agents' beliefs and desires, autonomous vehicles can make more accurate predictions about their actions, avoiding accidents and saving lives.
Following drones: Drones that follow people require accurate prediction of the person's movement. With ToMnet-N's abilities, drones could predict movement even in complex scenarios, enabling more advanced follow-me features.
Medical chatbots: ChatGPT can hold realistic dialogues with people. However, chatbots designed to support people who feel lonely or are at risk of suicide need a better understanding of the user's mental state, emotions, and intentions. AI with Theory of Mind could infer such intentions, potentially saving lives.

Following drones - https://www.dronerush.com/follow-me-drones-14544/

Technical Details

Before creating the ToMnet-N model, I conducted extensive research on existing models in Game Theory, Bayesian Networks, Inverse Reinforcement Learning, and Neural Networks with meta-learning. One of the most promising attempts was DeepMind's original ToMnet in 2018, which inspired many other works, such as ToMnet+, ToM2C, trait-ToM, ToMnet-G, and more.

ToMnet-N is a multiple-input recursive neural network, consisting of PredNet, CharNet, and connections between them. PredNet performs reasoning on the player's current position in the game, while CharNet processes an observed trajectory, extracting the player's preferences and patterns. The architecture of ToMnet-N is shown below.

Challenges

One of the biggest challenges in creating the ToMnet-N model was the training data. For a proper experimental setting, I needed an environment in which two agents would have different observations of the same world. However, only a few suitable game environments were available, and they were too complex for a dissertation project. To solve this, I developed my own game from scratch with a variable map size, ranging from 10x10 to 60x60 cells, and generated the walls using the Wave Function Collapse algorithm.
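The game's own generator is not shown here. As a hedged illustration of the simple-tiled form of Wave Function Collapse, the sketch below uses a made-up three-tile set (floor `F`, horizontal wall `H`, vertical wall `V`) with adjacency rules chosen so that floor fits next to any tile. Because of that, propagation can never empty a cell and no backtracking is needed. The tile set, rules, weights and the "fewest remaining options" heuristic (a common stand-in for Shannon entropy) are all assumptions, not the dissertation's code.

```python
import random

# Hypothetical tile set (assumption): floor, horizontal wall, vertical wall.
TILES = ("F", "H", "V")
# Allowed (left, right) pairs: vertical walls never sit side by side.
RIGHT_OK = {("F", "F"), ("F", "H"), ("H", "F"), ("H", "H"), ("F", "V"), ("V", "F")}
# Allowed (top, bottom) pairs: horizontal walls never stack.
DOWN_OK = {("F", "F"), ("F", "V"), ("V", "F"), ("V", "V"), ("F", "H"), ("H", "F")}
# Relative frequency of each tile when a cell is collapsed (assumption).
WEIGHTS = {"F": 6, "H": 2, "V": 2}


def compatible(a, b, dr, dc):
    """True if tile b may sit at offset (dr, dc) from tile a."""
    if (dr, dc) == (0, 1):
        return (a, b) in RIGHT_OK
    if (dr, dc) == (0, -1):
        return (b, a) in RIGHT_OK
    if (dr, dc) == (1, 0):
        return (a, b) in DOWN_OK
    return (b, a) in DOWN_OK  # (dr, dc) == (-1, 0)


def wave_function_collapse(height, width, seed=None):
    rng = random.Random(seed)
    # Every cell starts in "superposition": all tiles are still possible.
    grid = [[set(TILES) for _ in range(width)] for _ in range(height)]

    while True:
        # Observe: pick an uncollapsed cell with the fewest options (random tie-break).
        open_cells = [(len(grid[r][c]), rng.random(), r, c)
                      for r in range(height) for c in range(width)
                      if len(grid[r][c]) > 1]
        if not open_cells:
            break
        _, _, r, c = min(open_cells)
        options = sorted(grid[r][c])
        choice = rng.choices(options, weights=[WEIGHTS[t] for t in options])[0]
        grid[r][c] = {choice}

        # Propagate: drop neighbour options that no longer fit, ripple outwards.
        stack = [(r, c)]
        while stack:
            cr, cc = stack.pop()
            for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                nr, nc = cr + dr, cc + dc
                if not (0 <= nr < height and 0 <= nc < width):
                    continue
                allowed = {t for t in grid[nr][nc]
                           if any(compatible(s, t, dr, dc) for s in grid[cr][cc])}
                if allowed != grid[nr][nc]:
                    grid[nr][nc] = allowed
                    stack.append((nr, nc))

    # Every cell now holds exactly one tile.
    return [[next(iter(cell)) for cell in row] for row in grid]
```

Calling `wave_function_collapse(h, w, seed)` with any size in the 10 to 60 range returns a grid of tile letters; treating `H` and `V` cells as walls gives a binary wall mask for the game.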

Another challenge was the lack of publicly available code for the original ToMnet model. To overcome this, I found the ToMnet+ code [https://github.com/yunshiuan/tomnet-project], developed in TensorFlow 1, and migrated it to TensorFlow 2, improving the architecture based on DeepMind's paper. This provided a baseline close enough to the original ToMnet to start from.
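The exact changes made during the port are not listed here. As a hedged sketch of the kind of change a TensorFlow 1 to 2 migration typically involves, the block contrasts the TF1 graph-and-session pattern (kept as comments, since it only runs in graph mode) with the TF2 equivalent, where a Keras layer owns its variables and runs eagerly. The layer sizes and the `forward` function are illustrative only.

```python
import tensorflow as tf

# TF1 style (comments only, it needs graph mode to run):
#   x = tf.compat.v1.placeholder(tf.float32, shape=(None, 4))
#   logits = tf.compat.v1.layers.dense(x, 2)
#   with tf.compat.v1.Session() as sess:
#       sess.run(tf.compat.v1.global_variables_initializer())
#       out = sess.run(logits, feed_dict={x: batch})

# TF2 style: no placeholders or sessions. The layer creates its own
# variables on first call; tf.function optionally traces it into a graph.
dense = tf.keras.layers.Dense(2)


@tf.function
def forward(x):
    return dense(x)


batch = tf.ones((3, 4))
out = forward(batch)  # a tf.Tensor of shape (3, 2)
```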

Code Snippets

To create ToMnet-N, I used Python and the TensorFlow 2 framework. The ToMnet-N model is a multiple-input recursive model implemented with Keras model subclassing.

The architecture of ToMnet-N has two pipelines. Because of each pipeline's size and distinct role, I refer to them as separate networks: CharNet and PredNet. CharNet processes an observed trajectory and extracts the player's preferences and patterns into an embedding vector, e_char. PredNet then reasons about the player's current position in the game, taking into account the positions of goals and walls, as well as the player's personal traits through e_char. Together, these two networks reason about the current game situation and the player's behavior.
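To make the two-network idea concrete, here is a minimal, runnable toy version built with Keras subclassing. It is not ToMnet-N: `ToyCharNet`, `ToyPredNet` and `ToyToMnet` are hypothetical names, the per-step conv encoder plus LSTM, the layer sizes and the 5 output moves are all assumptions, and the residual blocks used by the real model are not reproduced. It only shows the data flow: CharNet compresses a trajectory into e_char, and PredNet broadcasts e_char over the grid and stacks it onto the current state before predicting the next move.

```python
import tensorflow as tf
from tensorflow.keras import Model, layers


class ToyCharNet(layers.Layer):
    """Hypothetical CharNet: trajectory (B, T, H, W, C) -> e_char (B, E)."""

    def __init__(self, e_char_dim=8, filters=16, **kwargs):
        super().__init__(**kwargs)
        # The same small conv encoder is applied to every time step.
        self.frame_encoder = layers.TimeDistributed(
            tf.keras.Sequential([
                layers.Conv2D(filters, 3, padding="same", activation="relu"),
                layers.GlobalAveragePooling2D(),
            ])
        )
        # The LSTM summarises the per-step features into one vector.
        self.lstm = layers.LSTM(e_char_dim)

    def call(self, trajectory):
        return self.lstm(self.frame_encoder(trajectory))


class ToyPredNet(layers.Layer):
    """Hypothetical PredNet: (current state, e_char) -> move logits."""

    def __init__(self, n_actions=5, filters=16, **kwargs):
        super().__init__(**kwargs)
        self.conv = layers.Conv2D(filters, 3, padding="same", activation="relu")
        self.pool = layers.GlobalAveragePooling2D()
        self.head = layers.Dense(n_actions)

    def call(self, inputs):
        current_state, e_char = inputs
        # Broadcast e_char over every grid cell: (B, E) -> (B, H, W, E).
        e_map = e_char[:, None, None, :] * tf.ones_like(current_state[..., :1])
        x = tf.concat([current_state, e_map], axis=-1)
        return self.head(self.pool(self.conv(x)))


class ToyToMnet(Model):
    """Two-input model: CharNet builds e_char, PredNet consumes it."""

    def __init__(self, e_char_dim=8, n_actions=5):
        super().__init__(name="toy_tomnet")
        self.char_net = ToyCharNet(e_char_dim)
        self.pred_net = ToyPredNet(n_actions)

    def call(self, inputs):
        trajectory, current_state = inputs
        e_char = self.char_net(trajectory)
        return self.pred_net((current_state, e_char))
```

Calling `ToyToMnet()((trajectory, current_state))` with a (2, 4, 12, 12, 6) trajectory batch and a (2, 12, 12, 6) state batch returns logits of shape (2, 5). Keeping both sub-networks as attributes of one `Model` subclass lets a single optimizer train them end to end.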

Here is the structure of ToMnet-N expressed in code:

```python
import tensorflow as tf
from tensorflow.keras import Model

# CharNet and PredNet are defined elsewhere in the project (not shown).


class ToMnet(Model):

    LENGTH_E_CHAR = 8
    NUM_RESIDUAL_BLOCKS = 8

    def __init__(self, ts, w, h, d, Ne_char=8, N_res_blocks=8, filters=32):
        super().__init__(name="ToMnet-N")

        self.MAX_TRAJECTORY_SIZE = ts  # maximum trajectory length (time steps)
        self.MAZE_WIDTH = w
        self.MAZE_HEIGHT = h
        ...
        # Create the two sub-networks
        self.char_net = CharNet(input_tensor=self.TRAJECTORY_SHAPE,
                                n=self.NUM_RESIDUAL_BLOCKS,
                                N_echar=self.LENGTH_E_CHAR,
                                filters=filters)
        self.pred_net = PredNet(n=self.NUM_RESIDUAL_BLOCKS, filters=filters)

    def call(self, data):

        # To fix an ERROR with Tensor <-> NumPy compatibility
        # (TF 2.x already runs eagerly by default)
        tf.compat.v1.enable_eager_execution()

        input_trajectory = data[0]     # observed past trajectory -> CharNet
        input_current_state = data[1]  # current game state -> PredNet

        e_char = self.char_net(input_trajectory)

        # --------------------------------------------------------------
        # Paper codes
        # (16, 12, 12, 6) + (16, 8) ->
        # (16, 12, 12, 6) + (16, 8+4zero, 12repeat, 1) ->
        # (16, 12, 12, 7)
        # Spatialise e_char and unite the different data into one tensor;
        # PredNet later decomposes it back into the separate inputs.
        # --------------------------------------------------------------
        e_char_new = tf.concat(values=[e_char, e_char], axis=1)
        ...
        input_current_state = tf.cast(input_current_state, tf.float32)
        mix_data = tf.keras
        ...
```
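The "Paper codes" comment in `call()` describes a spatialisation step: an 8-value e_char per sample is padded with 4 zeros to the grid width of 12, repeated down the 12 rows, and appended to the (16, 12, 12, 6) state as a 7th channel. Since the ToMnet-N code for this step is elided, the following is only a minimal NumPy sketch of that commented shape transformation under those assumptions; in TensorFlow the same steps map onto `tf.pad`, `tf.repeat` and `tf.concat`. The function names are hypothetical.

```python
import numpy as np


def spatialise_e_char(e_char, height, width):
    """Turn a (B, E) e_char batch into a (B, H, W, 1) plane.

    Assumes E <= width: each vector is zero-padded up to `width`
    and that row is then repeated `height` times.
    """
    _, length = e_char.shape
    padded = np.pad(e_char, ((0, 0), (0, width - length)))  # (B, W)
    plane = np.repeat(padded[:, None, :], height, axis=1)    # (B, H, W)
    return plane[..., None]                                   # (B, H, W, 1)


def mix_inputs(current_state, e_char):
    """Append the spatialised e_char to the state as one extra channel."""
    _, h, w, _ = current_state.shape
    plane = spatialise_e_char(e_char, h, w)
    return np.concatenate([current_state, plane], axis=-1)
```

With a (16, 12, 12, 6) state and a (16, 8) e_char, `mix_inputs` returns a (16, 12, 12, 7) array whose last channel holds e_char in the first 8 columns of every row and zeros in the remaining 4.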

Future Works

There are numerous opportunities to take this research further, and I believe that ToMnet has enormous potential. Here are a few ideas for future development:

Real-world applications: The ToMnet-N model is currently trained on a virtual GridWorld environment. However, it would be interesting to explore its performance on real-world data. For instance, it could be used for predictive maintenance in industrial settings or for improving the safety of self-driving cars.
Multi-agent environments: The current version of ToMnet-N can only predict the behavior of a single agent. However, in real-world situations, multiple agents may be interacting at once. Therefore, a new version of ToMnet could be developed that can handle multi-agent environments.
Generalization: ToMnet-N is currently trained and tested on a single environment. However, it would be beneficial to assess its generalization ability. One way to do this would be to train it on a diverse set of environments and test its performance on a new unseen environment.
Theoretical analyses: Despite the experimental success of ToMnet-N, its theoretical properties are still not well understood. Therefore, further theoretical analysis is necessary to better understand how the model works and what its limitations are.
Ethical considerations: As with any AI technology, ethical considerations are crucial. It is important to ensure that the implementation of ToMnet-N does not cause any harm to humans or animals.
Combining with other AI models: ToMnet-N could be combined with other AI models, such as natural language processing models, to create more advanced and human-like AI systems.
Collaboration: I believe that collaboration is key to further progress in AI research. Therefore, I am eager to collaborate with other researchers who are interested in the Theory of Mind problem.

Conclusion

In conclusion, ToMnet-N is a novel Deep Neural Network model that has demonstrated the potential of achieving Theory of Mind in machines. The ability to understand the beliefs and intentions of other agents is a crucial step towards creating more human-like and explainable AI systems. Although there are many challenges and future developments to consider, I believe that the Theory of Mind problem is solvable, and ToMnet-N is a step towards solving it.

Results

The developed ToMnet-N model demonstrates promising results in predicting the trajectory of a player in a GridWorld game. The multiple-input recursive deep neural network model is implemented in Python using the TensorFlow 2 framework.

Throughout the project, I gained valuable experience in research and machine learning engineering. Implementing the model with TensorFlow 2 subclassing improved my Python skills and deepened my knowledge of deep learning.

ToMnet-N is a step towards achieving consciousness and human-like reasoning in AI. I have outlined future work above and am committed to continuing this project, improving ToMnet-N and moving towards the broader goal of a general Theory of Mind for machines.