Posted 6/23/17
This post continues the discussion of modeling social organizations using neural networks.
Over the past week, I have been working with a neural network that represents the following scenario as a DAG:
There are ten agents (people, computers, etc) in an organization
There are five “environment” nodes, representing information in the world
Each agent can either listen to an environment node or listen to messages from another agent
There is a cost to all communication, as described in my previous post
Each agent is calculating the average of the five environment values
The neural network is attempting to minimize communication costs while maximizing the accuracy of each agent’s information. The equation for the loss looks like the following:

for each agent:
    difference += mean(environment) - mean(agent’s estimates of the environment nodes)
    cost += listening_cost_for_agent + speaking_cost_for_agent
total_loss = (difference / num_agents) + cost
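To make that concrete, here is a minimal NumPy sketch of how such a loss could be computed. The array shapes, variable names, and unit costs are assumptions I’m making for illustration, not the actual implementation behind these experiments.

import numpy as np

def total_loss(adjacency, env, estimates, listen_cost=1.0, speak_cost=1.0):
    # adjacency: (num_agents, num_sources) 0/1 matrix; a 1 means the agent
    #            listens to that source (an environment node or another agent)
    # env:       (num_env,) true environment values
    # estimates: (num_agents, num_env) each agent's estimate of the environment
    num_agents = estimates.shape[0]
    # Accuracy term, mirroring the pseudocode's signed difference (a squared or
    # absolute error would avoid cancellation, but this follows the post as written)
    difference = np.sum(env.mean() - estimates.mean(axis=1))
    # Communication term: every edge is charged once to its listener and once
    # to its speaker (an assumed simplification of the cost model)
    edges = adjacency.sum()
    cost = listen_cost * edges + speak_cost * edges
    return difference / num_agents + cost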
To start with, all agents will listen to every agent under them (A9 can listen to everyone, A8 to everyone except A9, and so on) and to all the environment nodes. The neural network will then remove edges until it reaches maximum efficiency. We’ll begin by assuming messages between agents are just as expensive as listening to the environment.
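As a reference point, this fully connected starting state could be encoded as a binary listening matrix along the lines of the sketch below; the column layout and names are my own, chosen only to make the structure explicit.

import numpy as np

NUM_AGENTS, NUM_ENV = 10, 5

# Columns 0-4 are environment nodes, columns 5-14 are agents A0-A9.
# Agent Ai starts out listening to every environment node and to every
# agent below it (A0 listens only to the environment), keeping the graph acyclic.
start = np.zeros((NUM_AGENTS, NUM_ENV + NUM_AGENTS), dtype=int)
start[:, :NUM_ENV] = 1                    # everyone listens to the environment
for i in range(NUM_AGENTS):
    start[i, NUM_ENV:NUM_ENV + i] = 1     # Ai listens to A0 through A(i-1)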
In an ideal world, the solution to the problem is as follows:
This has the lowest total cost because each agent only has to listen to one other node, except for A0, who listens to all the environment nodes and distributes the information. However, the neural network regularly produces solutions like the following:
In this model, A0, A1, and A2 all read from the environment nodes, and everyone else listens to A2. Why?
This is a problem known as a local minimum. As the neural network slowly took optimization steps (“learning”), it realized it could reduce cost if A1 stopped listening to A0. Unfortunately, this means A1 must listen to all the environment nodes to get an accurate estimate. If A1 wanted to switch back to listening to A0, it would first have to add that edge, which would temporarily increase cost and is therefore unlikely to be chosen. This gets the entire network “stuck”.
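A toy calculation shows the barrier. Assuming a unit cost per edge (my simplification, not the cost values from the previous post), A1’s cheapest path back to the ideal configuration runs through a more expensive intermediate state, so a greedy optimizer never takes the first step:

stuck        = 5   # A1 listens to all five environment nodes
ideal        = 1   # A1 listens only to A0
intermediate = 6   # first step toward the ideal: add the edge to A0 before
                   # any environment edges can be dropped
assert intermediate > stuck   # the first step raises the loss, so the
                              # optimizer stays where it is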
The main issue leading to the above local minimum is that we are optimizing for an unnatural situation. With a directed acyclic graph, A0 must always listen to all the environment nodes, since it is forbidden from listening to any other agents. Therefore, we should not be optimizing for the lowest communication cost, but for the fastest path from each agent to A0.
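As a rough sketch of that alternative objective, the breadth-first search below counts how many hops information from A0 needs to reach each agent, given a dictionary mapping every agent to the agents it listens to. The data structure and names are assumptions made for illustration.

from collections import deque

def hops_from_a0(listens_to):
    # Invert the listening relation: if B listens to A, information flows A -> B
    flows_to = {agent: [] for agent in listens_to}
    for listener, sources in listens_to.items():
        for source in sources:
            flows_to.setdefault(source, []).append(listener)
    dist, queue = {"A0": 0}, deque(["A0"])
    while queue:
        node = queue.popleft()
        for nxt in flows_to.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

# Ideal star topology: everyone listens directly to A0, so every agent is one hop away
star = {"A%d" % i: ["A0"] for i in range(1, 10)}
star["A0"] = []
print(hops_from_a0(star))   # {'A0': 0, 'A1': 1, ..., 'A9': 1}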
The model is also inaccurate in that it minimizes only total cost, not per-agent cost. If we grow the environment to, say, a million nodes, it becomes ridiculous to suggest that a single agent should listen to the whole environment. A more accurate representation of the world would minimize the cost for each agent, distributing the work, and possibly coming up with something closer to the following:
This produces slightly higher total message costs, but each agent on the inner ring only needs to listen to two and send two messages, and those on the outer ring only need to listen to a single message.
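One way to express that is to compute a cost vector per agent and penalize its largest entry (or some other aggregate) instead of the sum. The sketch below assumes the same binary listening matrix as earlier and is only one possible formulation.

import numpy as np

def per_agent_cost(adjacency, num_env, listen_cost=1.0, speak_cost=1.0):
    # adjacency: (num_agents, num_env + num_agents) 0/1 listening matrix
    listens = adjacency.sum(axis=1)                 # messages each agent receives
    speaks = adjacency[:, num_env:].sum(axis=0)     # messages each agent sends
    return listen_cost * listens + speak_cost * speaks

# Total-cost objective (what the current model minimizes): per_agent_cost(adj, 5).sum()
# Per-agent objective (one option, spreading the load):    per_agent_cost(adj, 5).max()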
Moving forward, we need a new model for network communication costs, preferably one not based on a DAG. From there we can develop other models representing fragmented information (not every agent needs to know everything about the environment), redundancy problems, and so on.
This post is based on research I am working on at the Santa Fe Institute, led by David Wolpert and Justin Grana. Future posts on this subject are related to the same research, even if they do not explicitly contain this attribution.