Visualizing Organizational Structure From Communications Data
An excellent teaching example that illustrates an interesting intersection of physics, math, and computer science. This explains the concepts behind a force-directed graph, then uses a force-directed graph to generate an illustration of organizational structure based on the communications within the organization. This is an example of how significant amounts of information can be extracted from existing collections of data, sometimes without much additional effort. This can easily segue into a discussion of privacy and ethics.
Email and other communications archiving is increasingly widespread. This presents an underappreciated opportunity for data mining and visualization. Drawing an analogy between communications patterns and organizational structure yields a quick and interesting visualization. To create this visualization, we map the communications patterns to a physical model, then simulate the physical model using SVG and JavaScript.
The Model
Start with the assumption that the more communication there is between two people, the closer they are within an organization, or at least the more they informally influence each other. Model this physically by putting a spring between the two people: the more communications there are, the stronger the spring.
But we don't want the model to collapse, which is what it would do with only springs. The next component, then, is to put an electric charge on each person. Remember that like charges repel, so this spreads out the model. The springs pull people together, and the charges push them apart.
Finally, we add friction so the model will stop eventually.
The Math
The conceptual model is easily translated into a mathematical model, which is in turn translated into a computational algorithm.
The Springs
Springs are governed by Hooke's law, $F = -kx$, where $x$ is the extension of the spring and $k$ is its stiffness. In this model, $k$ is the number of messages exchanged between two parties.
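The total spring force on each party is the sum of the spring forces from all the other parties:

$$\vec{F}^{s}_i = -\sum_{j \neq i} k_{ij} \, (\vec{x}_i - \vec{x}_j)$$

where the location of each party is $\vec{x}_i$ and $k_{ij}$ is the number of messages exchanged between parties $i$ and $j$.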
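The spring sum can be sketched in a few lines of JavaScript. This is a minimal sketch, not the code behind the original animation: the data shapes (`{x, y}` positions, `{a, b, k}` springs) and the function name `springForces` are assumptions made for illustration, and the springs are taken to have zero rest length.

```javascript
// Hypothetical data shapes: positions are {x, y}; springs are {a, b, k},
// where a and b index into the positions array and k is the message count.
function springForces(positions, springs) {
  const forces = positions.map(() => ({ x: 0, y: 0 }));
  for (const { a, b, k } of springs) {
    const dx = positions[a].x - positions[b].x;
    const dy = positions[a].y - positions[b].y;
    // Hooke's law with zero rest length: the force is -k times the separation.
    forces[a].x -= k * dx;
    forces[a].y -= k * dy;
    // Equal and opposite force on the other end of the spring.
    forces[b].x += k * dx;
    forces[b].y += k * dy;
  }
  return forces;
}

// Two parties joined by a spring with k = 2.
const demoPositions = [{ x: 0, y: 0 }, { x: 3, y: 4 }];
const demoSprings = [{ a: 0, b: 1, k: 2 }];
// Party 0 is pulled toward party 1 with force (6, 8); party 1 feels (-6, -8).
console.log(springForces(demoPositions, demoSprings));
```

Accumulating both ends of each spring in one pass means each pair is visited once rather than twice.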
The Charges
The force between two charges $q_1$ and $q_2$ a distance $r$ apart is governed by Coulomb's law, $F = k_e \frac{q_1 q_2}{r^2}$, where $k_e$ is Coulomb's constant.
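The electric force on each party is the sum of the forces due to each of the other parties:

$$\vec{F}^{e}_i = k_e \sum_{j \neq i} q_i q_j \, \frac{\vec{x}_i - \vec{x}_j}{|\vec{x}_i - \vec{x}_j|^3}$$

where $q_i$ and $\vec{x}_i$ are the charge and location of party $i$.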
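A matching sketch for the repulsion is below. It assumes every party carries the same charge, and the function name `chargeForces`, the default values of `charge` and `ke`, and the small minimum distance used to avoid dividing by zero are all illustrative choices rather than anything taken from the original code.

```javascript
// Pairwise Coulomb repulsion between identically charged parties.
// positions: array of {x, y}. charge and ke are assumed, tunable constants.
function chargeForces(positions, charge = 1, ke = 1) {
  const forces = positions.map(() => ({ x: 0, y: 0 }));
  for (let i = 0; i < positions.length; i++) {
    for (let j = i + 1; j < positions.length; j++) {
      const dx = positions[i].x - positions[j].x;
      const dy = positions[i].y - positions[j].y;
      // Guard against division by zero when two parties coincide.
      const r = Math.max(Math.hypot(dx, dy), 1e-6);
      // Magnitude ke*q^2/r^2 along the unit vector (dx, dy)/r.
      const scale = (ke * charge * charge) / (r * r * r);
      forces[i].x += scale * dx;
      forces[i].y += scale * dy;
      forces[j].x -= scale * dx;
      forces[j].y -= scale * dy;
    }
  }
  return forces;
}

// Two parties two units apart push each other away with magnitude 1/4.
console.log(chargeForces([{ x: 0, y: 0 }, { x: 2, y: 0 }]));
```

The double loop is quadratic in the number of parties, which is fine for a dozen people but would need an approximation scheme for a large organization.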
The Total Force
The total force on each party is the sum of these two forces. The plot below shows this schematically, with curves for the electric force, the spring force, and, in red, the sum of the two. It is the fundamentally different shape of these curves that makes force-directed graphs work. At a large distance the spring force dominates and pulls the parties together. At a short distance the electric force dominates and pushes the parties apart. The neutral position, where the red line crosses the x-axis and the parties can come to rest, lies at some intermediate distance.
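As a worked check on that neutral position, consider the simplest case: just two parties, each with charge $q$, joined by a single spring of stiffness $k$ with zero rest length. These simplifications are assumptions made here for illustration; in the full graph every other party also contributes. Setting the spring pull equal to the electric push at separation $r_0$ gives:

```latex
k \, r_0 = k_e \frac{q^2}{r_0^2}
\quad\Longrightarrow\quad
r_0 = \left( \frac{k_e \, q^2}{k} \right)^{1/3}
```

Under these assumptions, a pair that exchanges more messages (larger $k$) comes to rest closer together, which is the behavior the model is meant to produce.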
Friction
We add one more force, friction, which only acts while the parties are in motion. This force is in the opposite direction from the motion, and slows the parties down. Without it, the parties would remain in motion forever and never settle down into a stationary graph.
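One common way to model this is viscous drag, a force proportional to velocity and opposite in direction. The sketch below assumes that form, along with unit mass, a semi-implicit Euler time step, and the illustrative names `stepWithFriction`, `dt`, and `damping`; the original animation may use a different friction model.

```javascript
// One semi-implicit Euler step for a single party of unit mass (assumed).
// Friction is modeled as viscous drag: F = -damping * velocity (an assumption).
function stepWithFriction(body, force, dt = 0.01, damping = 0.5) {
  // Update velocity from the net force, including the drag term.
  const vx = body.vx + (force.x - damping * body.vx) * dt;
  const vy = body.vy + (force.y - damping * body.vy) * dt;
  // Then move using the updated velocity.
  return { x: body.x + vx * dt, y: body.y + vy * dt, vx, vy };
}

// With no springs or charges acting, friction alone brings the party to rest.
let coasting = { x: 0, y: 0, vx: 10, vy: 0 };
for (let i = 0; i < 2000; i++) {
  coasting = stepWithFriction(coasting, { x: 0, y: 0 });
}
// The remaining speed is a tiny fraction of the initial speed of 10.
console.log(Math.hypot(coasting.vx, coasting.vy));
```

Because the drag vanishes when the velocity is zero, it does not shift the rest positions set by the springs and charges; it only determines how quickly the parties settle into them.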
A Live Example
Let's try a live example so we can see all this in action. Start with a group of people, and the count of messages they have sent to each other. Message counts less than 10 have been dropped for clarity.
Person A | Person B | Message Count |
---|---|---|
Mary | Peter | 200 |
Mary | Jane | 200 |
Peter | Alex | 45 |
Peter | John | 50 |
Peter | Andrew | 48 |
Peter | Doug | 40 |
Andrew | Doug | 33 |
Andrew | Paul | 20 |
Peter | Paul | 30 |
Alex | Paul | 15 |
Jane | Peter | 20 |
Jane | Simon | 40 |
Jane | Brutus | 30 |
Jane | Rich | 28 |
Jane | Micah | 14 |
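To feed the table into a simulation, each person needs an index and each row becomes a spring. The following is a minimal sketch of that conversion; the names `messageCounts` and `buildGraph` and the `{a, b, k}` spring shape are assumptions for illustration, not the original page's code.

```javascript
// The table above as [person, person, message count] rows.
const messageCounts = [
  ["Mary", "Peter", 200], ["Mary", "Jane", 200], ["Peter", "Alex", 45],
  ["Peter", "John", 50], ["Peter", "Andrew", 48], ["Peter", "Doug", 40],
  ["Andrew", "Doug", 33], ["Andrew", "Paul", 20], ["Peter", "Paul", 30],
  ["Alex", "Paul", 15], ["Jane", "Peter", 20], ["Jane", "Simon", 40],
  ["Jane", "Brutus", 30], ["Jane", "Rich", 28], ["Jane", "Micah", 14],
];

// Assign each distinct name an index and turn each row into a spring.
function buildGraph(rows) {
  const index = new Map();
  const names = [];
  const idOf = (name) => {
    if (!index.has(name)) {
      index.set(name, names.length);
      names.push(name);
    }
    return index.get(name);
  };
  // k, the spring stiffness, is the message count for that pair.
  const springs = rows.map(([from, to, count]) => ({ a: idOf(from), b: idOf(to), k: count }));
  return { names, springs };
}

const graph = buildGraph(messageCounts);
console.log(graph.names.length, "people,", graph.springs.length, "springs");
```

Indices are assigned in order of first appearance, so Mary is party 0, Peter is party 1, and Jane is party 2.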
Each person is positioned randomly, and we see them move under the influence of the springs and charges, eventually coming to a stop due to friction.
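The whole loop can be sketched without any drawing code. The version below is a hedged illustration rather than the code driving the animation: the function name `simulate`, the option names, the default constants (`repulsion` stands in for $k_e q^2$), the unit masses, and the viscous form of friction are all assumptions.

```javascript
// Runs a fixed number of time steps and returns the final positions.
// start: array of {x, y}; springs: array of {a, b, k} with indices into start.
function simulate(start, springs, { repulsion = 1e6, damping = 2, dt = 0.01, steps = 5000 } = {}) {
  const bodies = start.map((p) => ({ x: p.x, y: p.y, vx: 0, vy: 0 }));
  for (let s = 0; s < steps; s++) {
    const fx = new Array(bodies.length).fill(0);
    const fy = new Array(bodies.length).fill(0);
    // Springs pull connected parties together (zero rest length assumed).
    for (const { a, b, k } of springs) {
      const dx = bodies[a].x - bodies[b].x;
      const dy = bodies[a].y - bodies[b].y;
      fx[a] -= k * dx; fy[a] -= k * dy;
      fx[b] += k * dx; fy[b] += k * dy;
    }
    // Like charges push every pair of parties apart.
    for (let i = 0; i < bodies.length; i++) {
      for (let j = i + 1; j < bodies.length; j++) {
        const dx = bodies[i].x - bodies[j].x;
        const dy = bodies[i].y - bodies[j].y;
        const r = Math.max(Math.hypot(dx, dy), 1e-3); // avoid dividing by zero
        const scale = repulsion / (r * r * r);
        fx[i] += scale * dx; fy[i] += scale * dy;
        fx[j] -= scale * dx; fy[j] -= scale * dy;
      }
    }
    // Friction opposes velocity; update velocity first, then position.
    for (let i = 0; i < bodies.length; i++) {
      const p = bodies[i];
      p.vx += (fx[i] - damping * p.vx) * dt;
      p.vy += (fy[i] - damping * p.vy) * dt;
      p.x += p.vx * dt;
      p.y += p.vy * dt;
    }
  }
  return bodies.map(({ x, y }) => ({ x, y }));
}

// Mary, Peter, and Jane from the table, starting at random positions.
const randomStart = [0, 1, 2].map(() => ({ x: Math.random() * 100, y: Math.random() * 100 }));
const trio = [{ a: 0, b: 1, k: 200 }, { a: 0, b: 2, k: 200 }, { a: 1, b: 2, k: 20 }];
console.log(simulate(randomStart, trio));
```

In a browser, the same loop would run one step per animation frame and copy each position into the matching SVG element. The time step has to stay small relative to the stiffest spring, or the explicit update can become unstable.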
When the animation comes to a stop, the layout of the parties is determined by the level of communications. Closer parties are more closely associated with each other. The presence of the charges ensures that the graph is spread out and easy to view and interpret.
We quickly see one person at the center, Mary, the director of development. If we look at the code we see that she has been given a pair of red shoes. The ends of the barbell distribution are two separate development groups under the director.