Alas, this totally awesome feature of the LinkedIn API will be discontinued sometime in May of 2015. My hope is that LinkedIn either changes its mind or allows a partner to keep providing the data to aspiring analysts (such as myself). Regardless, I wanted to take a moment to share a little about my experience using some extraordinarily powerful tools to explore network data from the soon-to-be-discontinued LinkedIn API.
Before I begin, I need to give the majority of the credit for this experience to the author of this post on Linkurio. The author goes by the pseudonym, Jean, and that’s all I’m able to find out. Thank you, Jean, whoever you are! Also, thanks to Thomas Cabrol for compiling some excellent code as a place to start.
Process
Ok, to start, create an API account by visiting the LinkedIn developer URL: https://www.linkedin.com/secure/developer
Add a new application, and then record the API and user keys because they are needed in the Python code below. Note: if you are unable to retrieve the user tokens, the Linkurio post describes another way to collect them. Here is the code I ran:
[su_heading]Step 1: Connect to LinkedIn API[/su_heading]
#!/usr/bin/env python
# encoding: utf-8
"""
linkedin-2-query.py
Created by Thomas Cabrol on 2012-12-03.
Copyright (c) 2012 dataiku. All rights reserved.
Building the LinkedIn Graph
"""
#Note: run this first, and then run cleaner.py
from __future__ import division, print_function
import os
os.chdir("C://Users//Peter//Documents//code//linkedin")
print(os.getcwd())
import oauth2 as oauth
import urlparse
import simplejson
import codecs
# Define CONSUMER_KEY, CONSUMER_SECRET,
# USER_TOKEN, and USER_SECRET from the credentials
# provided in your LinkedIn application
CONSUMER_KEY = "xxxx"
CONSUMER_SECRET = "xxxx"
USER_TOKEN = "xxxxx"
USER_SECRET = "xxxxx"
OAUTH_TOKEN = USER_TOKEN
OAUTH_TOKEN_SECRET = USER_SECRET
OUTPUT = "linked.csv"
consumer = oauth.Consumer(key=CONSUMER_KEY, secret=CONSUMER_SECRET)
token = oauth.Token(key=OAUTH_TOKEN, secret=OAUTH_TOKEN_SECRET)
client = oauth.Client(consumer, token)
# Fetch first degree connections
resp, content = client.request('http://api.linkedin.com/v1/people/~/connections?format=json')
results = simplejson.loads(content)
# File that will store the results
output = codecs.open(OUTPUT, 'w', 'utf-8')
# Loop thru the 1st degree connection and see how they connect to each other
for result in results["values"]:
    con = "%s %s" % (result["firstName"].replace(",", " "), result["lastName"].replace(",", " "))
    output.write("Peter Eliason," + con + "\n")
    # This is the trick, use the search API to get related connections
    u = "https://api.linkedin.com/v1/people/%s:(relation-to-viewer:(related-connections))?format=json" % result["id"]
    resp, content = client.request(u)
    rels = simplejson.loads(content)
    try:
        for rel in rels['relationToViewer']['relatedConnections']['values']:
            sec = "%s %s" % (rel["firstName"].replace(",", " "), rel["lastName"].replace(",", " "))
            output.write(con + "," + sec + "\n")
    except KeyError:
        # No related connections were returned for this person
        pass
output.close()
This code will connect to my LinkedIn account and DOWNLOAD my ENTIRE 1st and 2nd degree network! Spooky, but awesome.
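Since the API will not be around for long, here is a minimal offline sketch of the same edge-building logic, so the shape of the output is clear without any credentials. The function name `edges_from_connections`, the `related_by_id` dictionary, and the sample people are all made up for illustration; only the field names (`firstName`, `lastName`, `id`, `relationToViewer`, `relatedConnections`, `values`) come from the script above.

```python
def edges_from_connections(owner, connections, related_by_id):
    """Return (from, to) name pairs for the owner's network.

    connections   -- list of dicts shaped like results["values"] in the script
    related_by_id -- hypothetical dict: connection id -> parsed JSON from the
                     related-connections call for that connection
    """
    def full_name(person):
        # Commas are replaced so a name cannot break the two CSV columns
        return "%s %s" % (person["firstName"].replace(",", " "),
                          person["lastName"].replace(",", " "))

    edges = []
    for person in connections:
        con = full_name(person)
        edges.append((owner, con))
        rels = related_by_id.get(person["id"], {})
        # Missing keys are treated as "no shared connections returned"
        shared = (rels.get("relationToViewer", {})
                      .get("relatedConnections", {})
                      .get("values", []))
        for rel in shared:
            edges.append((con, full_name(rel)))
    return edges


# Made-up sample data, shaped like the fields the script reads
connections = [
    {"id": "a1", "firstName": "Ada", "lastName": "Lovelace"},
    {"id": "b2", "firstName": "Alan", "lastName": "Turing, Jr"},
]
related_by_id = {
    "a1": {"relationToViewer": {"relatedConnections": {"values": [
        {"firstName": "Alan", "lastName": "Turing, Jr"}]}}},
    "b2": {},  # no related connections came back for this person
}

edges = edges_from_connections("Peter Eliason", connections, related_by_id)
for from_person, to_person in edges:
    print("%s,%s" % (from_person, to_person))
```

Each printed line is one edge in the same "from,to" layout that the script writes to linked.csv.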
Now that the data has been downloaded, the next step is to clean it up. The script below removes bad characters, sets everything to lowercase, and drops duplicate edges:
[su_heading]Step 2: Cleaner code[/su_heading]
#!/usr/bin/env python
# encoding: utf-8
"""
linkedin-3-cleaner.py
Created by Thomas Cabrol on 2012-12-04.
Copyright (c) 2012 dataiku. All rights reserved.
Clean up and dedup the LinkedIn graph
"""
#note: run network_test.py first
from __future__ import division, print_function
import os
os.chdir("C://Users//Peter//Documents//code//linkedin")
print(os.getcwd())
import codecs
from unidecode import unidecode
INPUT = 'linked.csv'
OUTPUT = 'linkedin_total.csv'
def stringify(chain):
    # Simple utility to build the nodes labels
    allowed = '0123456789abcdefghijklmnopqrstuvwxyz_'
    c = unidecode(chain.strip().lower().replace(' ', '_'))
    return ''.join([letter for letter in c if letter in allowed])
def cleaner():
    output = open(OUTPUT, 'w')
    # Store the edges inside a set for dedup
    edges = set()
    for line in codecs.open(INPUT, 'r', 'utf-8'):
        from_person, to_person = line.strip().split(',')
        _f = stringify(from_person)
        _t = stringify(to_person)
        # Reorder the edge tuple so (a, b) and (b, a) count as the same edge
        _e = tuple(sorted((_f, _t)))
        edges.add(_e)
    for edge in edges:
        output.write(edge[0] + "," + edge[1] + "\n")
    output.close()
if __name__ == '__main__':
    cleaner()
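To see what the cleaner does to a few rows, here is a small in-memory sketch for Python 3. It is not the script above: it uses the standard library's `unicodedata` as a rough stand-in for `unidecode` (which handles far more characters), and the helper name `dedup_edges` and the sample names are made up.

```python
import unicodedata

ALLOWED = set("0123456789abcdefghijklmnopqrstuvwxyz_")


def stringify(name):
    # Stand-in for unidecode: split accented letters apart, then drop the accents
    ascii_name = (unicodedata.normalize("NFKD", name)
                  .encode("ascii", "ignore").decode("ascii"))
    label = ascii_name.strip().lower().replace(" ", "_")
    return "".join(ch for ch in label if ch in ALLOWED)


def dedup_edges(lines):
    edges = set()
    for line in lines:
        from_person, to_person = line.strip().split(",")
        # Sorting the pair makes (a, b) and (b, a) the same edge
        edges.add(tuple(sorted((stringify(from_person), stringify(to_person)))))
    return sorted(edges)


sample = [
    "Peter Eliason,José Núñez\n",
    "José Núñez,Peter Eliason\n",   # same edge, reversed
    "Peter Eliason,Anne-Marie O'Neil\n",
]
cleaned = dedup_edges(sample)
for edge in cleaned:
    print(edge[0] + "," + edge[1])
```

The three input rows collapse to two edges, and the labels come out lowercase with only letters, digits, and underscores.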
The next part of the Python code uses a library called NetworkX to create a file in a format called graphml, which can be imported by a network graphing tool called Gephi. NetworkX is actually capable of far more than simply converting edge lists to graphml, but we’ll hold off on that tangent for another post. For now, we’ll focus on Gephi and graphml.
[su_heading]Step 3: Create graphml file[/su_heading]
# Defining and Visualizing Simple Networks (Python)
# prepare for Python version 3.x features and functions
from __future__ import division, print_function
import os
os.chdir("C://Users//Peter//Documents//code//linkedin")
print(os.getcwd())
# load package into the workspace for this program
import networkx as nx
import matplotlib.pyplot as plt # 2D plotting
# read the cleaned LinkedIn edge list, creating a NetworkX directed graph object g
f = open('linkedin_total.csv', 'rb')
g = nx.read_edgelist(f, delimiter=",", create_using = nx.DiGraph(), nodetype = str)
f.close()
nx.write_graphml(g,'linkedin_total.graphml')
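For anyone curious what ends up inside a graphml file, here is a rough sketch that builds a bare-bones one using only the Python 3 standard library. It is an illustration of the format, not what NetworkX writes byte for byte (NetworkX adds more, such as schema information), and the function name `edges_to_graphml` and the node labels are made up.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def edges_to_graphml(edges, directed=True):
    """Build a minimal GraphML document from (source, target) pairs."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    graph = ET.SubElement(
        root, "graph",
        {"edgedefault": "directed" if directed else "undirected"})
    # One <node> per distinct label, then one <edge> per pair
    for name in sorted({name for edge in edges for name in edge}):
        ET.SubElement(graph, "node", {"id": name})
    for source, target in edges:
        ET.SubElement(graph, "edge", {"source": source, "target": target})
    return ET.tostring(root, encoding="unicode")


# Made-up labels in the same style as the cleaned CSV
edges = [("alice_example", "peter_eliason"), ("bob_example", "peter_eliason")]
xml_text = edges_to_graphml(edges)
print(xml_text)
```

The file is just XML: a list of nodes, a list of edges, and a flag saying whether edges are directed, which is why Gephi can read it without knowing anything about Python.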
Ok, so now we’ve got our graphml file. The next step is to import it into this tool called Gephi. You’ll need to download Gephi as an executable — it’s not a Python library or anything like that. It is a standalone visualization tool.
I’m a Windows user and I had problems getting Gephi to install properly. I was able to work around this by UNINSTALLING Java, and then reinstalling an old version of Java, version 7. After I did this, I was able to install Gephi without problems.
I’m told that Mac users are able to install Gephi with no problems. Figures, ha!
Now, after importing the graphml file into Gephi, I took these steps:
- On the left-hand-side, ran “Force Atlas 2.” It takes a LONG time for the process to complete, so I cancelled it after about 10 minutes because the visualization was close enough for my needs.
- Activated the “Show node labels” to see who each node represented
- Ran the modularity algorithm in the Statistics panel (on the right). I went to the partition window (select Window > Partition) and chose to color the nodes according to their “Modularity class”
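To give a feel for what the modularity step is measuring, here is a small Python 3 sketch that scores a given grouping of nodes with the standard modularity formula for an undirected, unweighted graph. It only scores a partition that you hand it; it does not search for the best one the way Gephi does (as far as I know, Gephi's implementation is based on the Louvain method). The toy graph and the function name are made up.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q = sum over communities of L_c/m - (d_c / 2m)^2,
    where m is the edge count, L_c the edges inside community c,
    and d_c the summed degree of the nodes in c."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge (c-d)
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
print(q_split, q_single)
```

Splitting the toy graph into its two triangles scores higher than lumping everyone together, which is the intuition behind coloring nodes by modularity class.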
I’m left with a stunning graph of my network with me listed as the center node. Each color represents a certain cluster within my list of connections. If you look closely, you can see that some nodes have names next to them (I’ve purposefully left them obscenely small to protect the identities of my connections), but Gephi allows the analyst to zoom in and out in order to explore the network.
After only a couple of minutes, it becomes blatantly clear what each of these clusters and colors represents. They’re a representation of ME and MY LIFE! The incredibly beautiful part of this entire process was that the analysis was entirely unsupervised! I had nothing to do with directing the creation of these clusters…NetworkX prepared the graph, and Gephi’s layout and modularity algorithms did the rest by themselves!
To call attention to each of these clusters, I’ve gone ahead and named each cluster, here. Each cluster represents a key time and network (aka: clique) in my life.
The Undergrad section represents all of my connections from my undergrad school, Luther College in Decorah, IA.
MSPA represents my grad school connections at Northwestern University in Evanston, IL (in another state, and 10 years after undergrad, so there is not much connection between those two networks!).
Also interesting, Best Buy had some hard years back in 2008-2010 and a lot of folks left Best Buy to join SUPERVALU, which explains the many connections between the two.
The fascinating thing about this analysis is that, through LinkedIn, I have a basic map of my Personal AND Professional life.
Conclusion
While this particular map may not be extraordinarily useful for advancing my career, it lets me reflect on the state of my network and, in essence, tells a brief story of my life.
In a business setting, however, I can see how this process might be useful for identifying clusters, tribes, and influencers, using relationship data to understand what influences product, lifestyle, and consumer choices.