Think of the book as offering a form of "premium support" for this open source project. The example code for this data science book is maintained in a public GitHub repository and is designed to be especially accessible through a turn-key virtual machine that facilitates interactive learning with an easy-to-use collection of IPython Notebooks.
It's available in a variety of convenient formats: a free PDF download, an online ebook excerpt, and an IPython Notebook (ipynb) file checked into this repository. Choose one, or choose them all.

Quick Start Guide

The recommended way of getting started with the example code is by taking advantage of the Vagrant-powered virtual machine, as illustrated in this short screencast.
The Mining the Social Web Wiki

This project takes advantage of its GitHub repository's wiki to act as a point of collaboration for consumers of the source code. Among other things, the book shows you how to:

- Employ IPython Notebook, the Natural Language Toolkit, NetworkX, and other scientific computing tools to mine popular social web sites
- Apply advanced text-mining techniques, such as clustering and TF-IDF, to extract meaning from human language data
- Bootstrap interest graphs from GitHub by discovering affinities among people, programming languages, and coding projects
- Build interactive visualizations with D3.js
The detailed feedback that I received from my very capable editorial staff and technical reviewers was also nothing short of amazing. The book you are about to read would not be anywhere near the quality that it is without the thoughtful peer review feedback that I received.
It made a tremendous difference in the quality of this book, and my only regret is that we did not have the opportunity to work together more closely during this process. Although there are far too many of you to name, your feedback has shaped this second edition in immeasurable ways.
Thanks most of all to both of you for loving me in spite of my ambitions to somehow take over the world one day.
It would be impossible to recount all of the other folks who have directly or indirectly shaped my life or the outcome of this book.
Finally, thanks to you for giving this book a chance. In general, each chapter stands alone and tells its own story, but the flow of chapters throughout Part I is designed to also tell a broader story. It gradually crescendos in terms of the complexity of the subject matter before resolving with a light-hearted discussion about some aspects of the semantic web that are relevant to the current social web landscape. Because of this gradual increase in complexity, you are encouraged to read each chapter in turn, but you also should be able to cherry-pick chapters and follow along with the examples should you choose to do so.
The source code for this book is available at GitHub. To address that objective, serious thought has been put into synthesizing the discussion in the book with the code examples into as seamless a learning experience as possible. Take advantage of this powerful environment for interactive learning. Although Chapter 1 is the most logical place to turn next, you should take a moment to familiarize yourself with Appendixes A and C when you are ready to start running the code examples.
Appendix A points to an online document and accompanying screencasts that walk you through a quick and easy setup process for the virtual machine. Appendix C points to an online document that provides some background information on Python idioms that you may find helpful in following along with the code examples.

How would you define Twitter? After all, the purpose of technology is to enhance our human experience. As humans, what are some things that we want that technology might help us to get?
We have a deeply rooted need to share our ideas and experiences, which gives us the ability to connect with other people, to be heard, and to feel a sense of worth and importance. We are curious about the world around us and how to organize and manipulate it, and we use communication to share our observations, ask questions, and engage with other people in meaningful dialogues about our quandaries. In that regard, you could think of Twitter as being akin to a free, high-speed, global text-messaging service.
Whether it be an infatuation with celebrity gossip, an urge to keep up with a favorite sports team, a keen interest in a particular political topic, or a desire to connect with someone new, Twitter provides you with boundless opportunities to satisfy your curiosity. Think of an interest graph as a way of modeling connections between people and their arbitrary interests.
Interest graphs provide a profound number of possibilities in the data mining realm that primarily involve measuring correlations between things for the objective of making intelligent recommendations and other applications in machine learning. For example, you could use an interest graph to measure correlations and make recommendations ranging from whom to follow on Twitter to what to purchase online to whom you should date. For example, the HomerJSimpson account is the official account for Homer Simpson, a popular character from The Simpsons television show.
When you realize that Twitter enables you to create, connect, and explore a community of interest for an arbitrary topic of interest, the power of Twitter and the insights you can gain from mining its data become much more obvious.

Fundamental Twitter Terminology

Twitter might be described as a real-time, highly social microblogging service that allows users to post short status updates, called tweets, that appear on timelines. Tweets may include one or more entities in their 140 characters of content and reference one or more places that map to locations in the real world.
In addition to the textual content of a tweet itself, tweets come bundled with two additional pieces of metadata that are of particular note: entities and places. Tweet entities are essentially the user mentions, hashtags, URLs, and media that may be associated with a tweet, and places are locations in the real world. Note that a place may be the actual location in which a tweet was authored, but it might also be a reference to the place described in a tweet.
Finally, timelines are the chronologically sorted collections of tweets. From the perspective of an arbitrary Twitter user, the home timeline is the view that you see when you log into your account and look at all of the tweets from users that you are following, whereas a particular user timeline is a collection of tweets only from a certain user.
TweetDeck provides a highly customizable user interface that can be helpful for analyzing what is happening on Twitter and demonstrates the kind of data that you have access to through the Twitter API. Whereas timelines are collections of tweets with relatively low velocity, streams are samples of public tweets flowing through Twitter in real time.
The public firehose of all tweets has been known to peak at hundreds of thousands of tweets per minute during events with particularly wide interest, such as presidential debates.
Even so, there are great libraries available to further mitigate the work involved in making API requests, such as the twitter package. Like most other Python packages, you can install it with pip by typing pip install twitter in a terminal. See Appendix C for instructions on how to install pip. Documentation for the Twitter class included with that package is available via pydoc. Typing python -m pydoc twitter.Twitter in a terminal, for example, would provide information on the twitter.Twitter class. If you find yourself reviewing the documentation for certain modules often, you can elect to pass the -w option to pydoc and write out an HTML page that you can save and bookmark in your browser.
The built-in help function accepts a package or class name and is useful for an ordinary Python shell, whereas IPython users can suffix a package or class name with a question mark to view inline help. For example, you could type help(twitter) or help(twitter.Twitter). Recall that Appendix A provides minimal details on getting oriented with recommended developer tools such as IPython. In the present context, you are creating an app that you are going to authorize to access your account data, so this might seem a bit roundabout; why not just plug in your username and password to access the API?
Giving up credentials is never a sound practice. The OAuth protocol is a social web standard at this point. If you remember nothing else from this tangent, just remember that OAuth is a means of allowing users to authorize third-party applications to access their account data without needing to share sensitive information like a password.
In tandem, these four credentials provide everything that an application would ultimately need to authorize itself through a series of redirects involving the user granting authorization, so treat them with the same sensitivity that you would a password. See Appendix B for details on implementing an OAuth 2.0 flow. All sample code in this book presumes version 1.1 of the Twitter API. Follow along with the example below by substituting your own account credentials into the variables at the beginning of the code and executing the call to create an instance of the Twitter API.
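A minimal sketch of that setup follows. The empty credential strings are placeholders; substitute the consumer key/secret and access token/secret generated for your own app on Twitter's developer site.

```python
import twitter

# Placeholder credentials -- substitute the values generated for your own app
CONSUMER_KEY = ''
CONSUMER_SECRET = ''
OAUTH_TOKEN = ''
OAUTH_TOKEN_SECRET = ''

auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                           CONSUMER_KEY, CONSUMER_SECRET)

twitter_api = twitter.Twitter(auth=auth)

# Nothing to see here yet; twitter_api is just a handle for making requests
print(twitter_api)
```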
The next example demonstrates how to ask Twitter for the topics that are currently trending worldwide, but keep in mind that the API can easily be parameterized to constrain the topics to more specific locales if you feel inclined to try out some of the possibilities.
The device for constraining queries is the Yahoo! Where On Earth (WOE) ID system, and the WOE ID for the entire world is 1. The trends query takes a special _id keyword argument; without the underscore, the twitter package appends the ID value to the URL itself as a special case keyword argument.
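A sketch of such a trends query follows, reusing the twitter_api connection created above; WORLD_WOE_ID is simply a mnemonic for the value 1.

```python
WORLD_WOE_ID = 1  # The Yahoo! Where On Earth ID for the entire world

# Note the leading underscore on _id, as discussed above
world_trends = twitter_api.trends.place(_id=WORLD_WOE_ID)

print(world_trends)
```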
For example, the API request that we just issued for trends limits applications to 15 requests per 15-minute window. If you find the raw response hard to read, you may find it handy to use the built-in json package to force a nicer display, as illustrated below. In a nutshell, JSON provides a way to arbitrarily store maps, lists, primitives such as numbers and strings, and combinations thereof.
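A sketch of that pretty-printing step, reusing the world_trends response from the previous sketch:

```python
import json

# indent=1 displays nested maps and lists on separate, indented lines
print(json.dumps(world_trends, indent=1))
```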
In other words, you can theoretically model just about anything with JSON should you desire to do so. In this instance, a set refers to the mathematical notion of a data structure that stores an unordered collection of unique items and can be computed upon with other sets of items and setwise operations.
For example, a setwise intersection computes common items between sets, a setwise union combines all of the items from sets, and the setwise difference among sets acts sort of like a subtraction operation in which items from one set are removed from another. The example below demonstrates how to use a Python list comprehension to parse out the names of the trending topics from the results that were previously queried, cast those lists to sets, and compute the setwise intersection to reveal the common items between them.
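The sketch below illustrates the idea. It assumes a second trends query for a more specific locale so that there are two collections to compare; the United States WOE ID shown (23424977) is the commonly cited value and is included here as an illustrative assumption.

```python
US_WOE_ID = 23424977  # Assumed WOE ID for the United States

us_trends = twitter_api.trends.place(_id=US_WOE_ID)

world_trends_set = set([trend['name'] for trend in world_trends[0]['trends']])
us_trends_set = set([trend['name'] for trend in us_trends[0]['trends']])

# Setwise intersection: trending topics common to both locales
common_trends = world_trends_set.intersection(us_trends_set)
print(common_trends)
```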
Recall that Appendix C provides a reference for some common Python idioms like list comprehensions that you may find useful to review.
Set Theory, Intuition, and Countable Infinity

Computing setwise operations may seem a rather primitive form of analysis, but the ramifications of set theory for general mathematics are considerably more profound, since it provides the foundation for many mathematical principles. To understand how this kind of reasoning works, consider the following question: is the set of positive integers larger in cardinality than the set of both positive and negative integers?
In other words, there is a definite sequence that could be followed deterministically if you simply had enough time to count them. See Appendix C for a brief overview of this idiom.

The next task is collecting search results. Set the search query to a trending topic, or anything else for that matter; the example query used here was a trending topic when this content was being developed and is used throughout the remainder of this chapter. In essence, all the code does is repeatedly make requests to the Search API.
Reviewing the API documentation reveals that this is an intentional decision, and there are some good reasons for taking a cursoring approach instead, given the highly dynamic state of Twitter resources. The best practices for cursoring vary a bit throughout the Twitter developer platform, with the Search API providing a slightly simpler way of navigating search results than other resources such as timelines. In Python parlance, we are unpacking the values in a dictionary into keyword arguments that the function receives.
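A sketch of that search-and-cursor pattern follows. The query value, the batch size, and the number of additional batches are illustrative assumptions; set q to whatever trending topic you like.

```python
q = '#MentionSomeoneImportantForYou'  # hypothetical query; use any topic you like

count = 100
search_results = twitter_api.search.tweets(q=q, count=count)
statuses = search_results['statuses']

# Iterate through a few more batches of results by following the cursor
for _ in range(5):
    try:
        next_results = search_results['search_metadata']['next_results']
    except KeyError:  # no more results when next_results doesn't exist
        break

    # next_results is a querystring such as "?max_id=...&q=...&include_entities=1";
    # turn it into keyword arguments and unpack them into the next call
    kwargs = dict([kv.split('=') for kv in next_results[1:].split('&')])
    search_results = twitter_api.search.tweets(**kwargs)
    statuses += search_results['statuses']

print(len(statuses))
```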
Take a moment to peruse a sample tweet in its entirety. No attempt is made here or elsewhere in the book to regurgitate online documentation, but a few notes are of interest given that you might still be a bit overwhelmed by the 5 KB of information that a tweet comprises. The current discussion assumes the same nomenclature as the online documentation, so values should correspond one-for-one. Keep in mind that sometimes the text of a tweet changes as it is retweeted, as users add reactions or otherwise manipulate the text.
See Section 1. You should tinker around with the sample tweet and consult the documentation to clarify any lingering questions you might have before moving forward. The next example extracts the text, screen names, and hashtags from the tweets that are collected and introduces a Python idiom called a double, or nested, list comprehension.
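A sketch of that extraction step, assuming the statuses collection from the search sketch above:

```python
status_texts = [status['text'] for status in statuses]

screen_names = [user_mention['screen_name']
                for status in statuses
                for user_mention in status['entities']['user_mentions']]

hashtags = [hashtag['text']
            for status in statuses
            for hashtag in status['entities']['hashtags']]

# Compute a collection of all words from all tweets
words = [w for t in status_texts for w in t.split()]
```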
If you understand a single list comprehension, the code formatting should illustrate the double list comprehension as simply a collection of values that are derived from a nested loop as opposed to the results of a single loop. See Appendix C for a more extended description of slicing in Python. The output also provides a few commonly occurring screen names that are worth investigating.
As of Python 2.7, the standard library's collections module includes a Counter class that makes it easy to compute frequency distributions as ranked lists of terms. Among the more compelling reasons for mining Twitter data is to try to answer the question of what people are talking about right now.
One of the simplest techniques you could apply to answer this question is basic frequency analysis, just as we are performing here. You can install a package called prettytable by typing pip install prettytable in a terminal; this package provides a convenient way to emit a fixed-width tabular format that can be easily copied-and-pasted.
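Putting those pieces together, here is a minimal sketch that computes the frequency distributions with Counter and renders the top ten items of each with prettytable; the words, screen_names, and hashtags collections are the ones built in the extraction sketch above.

```python
from collections import Counter
from prettytable import PrettyTable

for label, data in (('Word', words),
                    ('Screen Name', screen_names),
                    ('Hashtag', hashtags)):
    pt = PrettyTable(field_names=[label, 'Count'])
    c = Counter(data)
    for kv in c.most_common()[:10]:
        pt.add_row(kv)
    pt.align[label], pt.align['Count'] = 'l', 'r'  # align column values
    print(pt)
```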
The sketch above shows how to use it to display the same results in an easy-to-read tabular format. Although the entities with a frequency greater than two are interesting, the broader results are also revealing in other ways.

Computing the Lexical Diversity of Tweets

A slightly more advanced measurement that involves calculating simple frequencies and can be applied to unstructured text is a metric called lexical diversity.
Mathematically, this is an expression of the number of unique tokens in the text divided by the total number of tokens in the text, which are both elementary yet important metrics in and of themselves. As applied to tweets or similar online communications, lexical diversity can be worth considering as a primitive statistic for answering a number of questions, such as how broad or narrow the subject matter is that an individual or group discusses.
For example, it would be interesting to measure whether or not there is a significant difference between the lexical diversity of two soft drink companies such as Coca-Cola and Pepsi as an entry point for exploration if you were comparing the effectiveness of their social media marketing campaigns on Twitter.
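A sketch of the lexical diversity and average-words computations, again assuming the words, screen_names, hashtags, and status_texts collections from earlier:

```python
# Lexical diversity: unique tokens divided by total tokens
def lexical_diversity(tokens):
    return 1.0 * len(set(tokens)) / len(tokens)

# Average number of words per tweet
def average_words(statuses):
    total_words = sum([len(s.split()) for s in statuses])
    return 1.0 * total_words / len(statuses)

print(lexical_diversity(words))
print(lexical_diversity(screen_names))
print(lexical_diversity(hashtags))
print(average_words(status_texts))
```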
Given that the average number of words in each tweet is around six, the lexical diversity results translate to about four unique words per tweet. What would be interesting at this point would be to zoom in on some of the data and see if there were any common responses or other insights that could come from a more qualitative analysis.
See Section 9. The example below demonstrates how to capture these values with a list comprehension and sort by the retweet count to display the top few results. Inspection of results further down the list does reveal particular user mentions, but the sample we have drawn from for this query is so small that no trends emerge. Searching for a larger sample of results would likely yield some user mentions with a frequency greater than one, which would be interesting.
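A sketch of that retweet analysis follows, assuming the statuses collected earlier; it pulls out the retweet count, the original author's screen name, and the text for every status that is itself a retweet, then ranks by count.

```python
from prettytable import PrettyTable

retweets = [
    # Store a tuple of these three values...
    (status['retweet_count'],
     status['retweeted_status']['user']['screen_name'],
     status['text'])
    # ...for each status...
    for status in statuses
    # ...so long as the status is itself a retweet
    if 'retweeted_status' in status
]

# Slice off the first five from the sorted results and display each item
pt = PrettyTable(field_names=['Count', 'Screen Name', 'Text'])
[pt.add_row(row) for row in sorted(retweets, reverse=True)[:5]]
pt.max_width['Text'] = 50
pt.align = 'l'
print(pt)
```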
Suggested exercises are at the end of this chapter. Be sure to also check out Chapter 9 as a source of inspiration: it includes more than two dozen recipes presented in a cookbook-style format. For example, the most popular retweet in the sample results originated from a user with a screen name of hassanmusician and was retweeted 23 times.
Neither the original tweet nor any of the other 22 retweets appears in the data set.

Visualizing Frequency Data with Histograms

A nice feature of IPython Notebook is its ability to generate and insert high-quality and customizable plots of data as part of an interactive workflow. The plot below displays the same words data that we previously rendered as a table. The y-axis values on the plot correspond to the number of times a word appeared. Although labels for each word are not provided, x-axis values have been sorted so that the relationship between word frequencies is more apparent.
The plot can be generated directly in IPython Notebook with just a few lines of code, as sketched below.

[Figure: A plot displaying the sorted frequencies for the words computed by the earlier frequency analysis.]
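A sketch of generating such a plot, assuming the words collection from earlier and a notebook with inline matplotlib plotting enabled (for example, via the %matplotlib inline magic):

```python
from collections import Counter
import matplotlib.pyplot as plt

word_counts = sorted(Counter(words).values(), reverse=True)

# A log-log plot makes the shape of the frequency distribution easier to see
plt.loglog(word_counts)
plt.ylabel('Freq')
plt.xlabel('Word Rank')
```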
For example, how many words have a frequency between 1 and 5, between 5 and 10, between 10 and 15, and so forth?
The figures produced below show histograms of the tabular data generated from the word frequencies and retweet counts, respectively. A histogram gives us insight into the underlying frequency distribution, with the x-axis corresponding to a range for words that each have a frequency within that range and the y-axis corresponding to the total frequency of all words that appear within that range.

[Figure: A histogram of retweet frequencies.]

The code for generating these histograms directly in IPython Notebook is sketched below. Taking some time to explore the capabilities of matplotlib and other scientific computing tools is a worthwhile investment.
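Sketches of both histograms follow, reusing the words and retweets collections from earlier; the default binning is used here purely for illustration.

```python
from collections import Counter
import matplotlib.pyplot as plt

# Histogram of word frequencies
word_counts = list(Counter(words).values())
plt.hist(word_counts)
plt.title('Tweet word frequencies')
plt.xlabel('Bins (number of times a word appeared)')
plt.ylabel('Number of words in bin')
plt.show()

# Histogram of retweet counts
retweet_counts = [count for count, _, _ in retweets]
plt.hist(retweet_counts)
plt.title('Retweet frequencies')
plt.xlabel('Bins (number of times retweeted)')
plt.ylabel('Number of tweets in bin')
plt.show()
```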
We started out the chapter by learning how to create an authenticated connection and then progressed through a series of examples that illustrated how to discover trending topics for particular locales, how to search for tweets that might be interesting, and how to analyze those tweets using some elementary but effective techniques based on frequency analysis and simple statistics. Even what seemed like a somewhat arbitrary trending topic turned out to lead us down worthwhile paths with lots of possibilities for additional analysis.
On the contrary, frequency analysis and measures such as lexical diversity should be employed early and often, for precisely the reason that doing so is so obvious and simple. The export of your account data includes files organized by time period in a convenient JSON format. What are the most common terms that appear in your tweets?
Who do you retweet the most often? How many of your tweets are retweeted, and why do you think this is the case? The command-line tool Twurl is another option to consider if you prefer working in a terminal. Explore some of the advanced search features that are available for more precise querying.

Facebook is arguably the heart of the social web and is somewhat of an all-in-one wonder, given that more than half of its 1 billion users are active each day updating statuses, posting photos, exchanging messages, chatting in real time, checking in to physical locales, playing games, shopping, and just about anything else you can imagine.
On the other hand, this great power commands great responsibility, and Facebook has instrumented the most sophisticated set of online privacy controls that the world has ever seen in order to help protect its users from exploitation.
The notion of Facebook as an interest graph will come up again later in this chapter. The remainder of this chapter assumes that you have an active Facebook account, which is required to gain access to the Facebook APIs. For example, you might choose to share a link or photo only with a particular list of friends as opposed to your entire social network. Keep in mind that as a developer mining your own account, you may not have a problem allowing your own application to access all of your account data.
Do not use it for any development that you do with Facebook. Aside from being able to prepopulate and debug your access token, it is an ordinary Facebook app that uses the same developer APIs that any other developer application would use.
There are a few items to note about this particular query:

Access token: The access token that appears in the application is an OAuth token that is provided as a courtesy for the logged-in user; it is the same OAuth token that your application would need to access the data in question.
In short, OAuth is a means of allowing users to authorize third-party applications to access their account data without needing to share sensitive information like a password.
At this point, you could click on any of the blue ID fields in these nodes and initiate a query with that particular node as the basis. In network science terminology, we now have what is called an ego graph, because it has an actor or ego as its focal point or logical center, which is connected to other nodes around it.
An ego graph would resemble a hub and spokes if you were to draw it. For example, some recent investments in the Graph API resulted in a number of powerful new features, such as field expansion and nesting.
You can install this package in a terminal with the predictable pip install requests command. The query is driven by the values in the fields parameter and is the same as what would be built up interactively in the Graph API Explorer. Of particular interest is the nested query on the friends connection, which demonstrates field expansion in action. Click the hyperlink that appears in your notebook output when you execute this code cell to see the raw response for yourself.
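Here is a sketch of such a query with requests. The ACCESS_TOKEN value is a placeholder for a token obtained from the Graph API Explorer, and the particular fields string reflects the Graph API as it existed when this chapter was written (including the friends field expansion), so it may require adjustment against the current API.

```python
import json
import requests  # pip install requests

ACCESS_TOKEN = ''  # placeholder: paste in a token from the Graph API Explorer

base_url = 'https://graph.facebook.com/me'

# Specify which fields to retrieve, including a nested ("field expansion")
# query on the friends connection
fields = 'id,name,likes.limit(10),friends.limit(10).fields(likes.limit(10))'

url = '{0}?fields={1}&access_token={2}'.format(base_url, fields, ACCESS_TOKEN)

# Click this URL in your notebook output to inspect the raw response in a browser
print(url)

content = requests.get(url).json()
print(json.dumps(content, indent=1))
```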
Throughout this section describing the implementation of OGP, the term Social Graph is generically used to refer to both the Social Graph and Open Graph, unless explicitly emphasized otherwise.
For example, at the time of this writing, Facebook has just started the process of launching its new Graph Search product to a limited audience. For example, users might be able to indicate that they have watched The Rock, since it is a movie. OGP allows for a wide and flexible set of actions between users and objects as part of the Social Graph. The delivery of rich metadata in response to a simple query is the whole idea behind the way OGP is designed to work.
Various kinks in the spec have been worked out along the way, and some are still being worked out. Whether OGP and Graph Search will one day dominate the Web is a highly contentious topic, but the potential is certainly there; the indicators for its success are trending in a positive direction, and many exciting things may happen as the future unfolds and innovation continues to take place.
This package contains a few useful convenience methods that allow you to interact with Facebook in a number of ways, including the ability to make FQL queries and post statuses or photos. However, there are really just a few key methods from the GraphAPI class, defined in the facebook package's source, that we'll need.
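A minimal sketch of getting started with it follows, including the pp helper mentioned next; the ACCESS_TOKEN placeholder is an assumption to be replaced with your own OAuth token, and the package itself is typically installed as facebook-sdk.

```python
import json
import facebook  # pip install facebook-sdk

ACCESS_TOKEN = ''  # placeholder: your OAuth access token

# A helper function for pretty-printing Python objects as JSON
def pp(o):
    print(json.dumps(o, indent=1))

g = facebook.GraphAPI(ACCESS_TOKEN)

pp(g.get_object('me'))                   # your own profile
pp(g.get_connections('me', 'friends'))   # your friends
```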
This example also introduces a helper function called pp that is used throughout the remainder of this chapter for pretty-printing results as nicely formatted JSON to save some typing. The advantage of the Graph API Explorer is the ease with which you can click on ID values and spawn new queries during exploratory efforts.

Analyzing Facebook Pages

Although Facebook started out as more of a pure social networking site without a Social Graph or a good way for businesses and other entities to have a presence, it quickly adapted to take advantage of the market needs.
Fast-forward a few years, and now businesses, clubs, books, and many other kinds of nonperson entities have Facebook pages with a fan base. Refer back to Section 1. Your imagination is the only limitation to what you can ask of the Graph API for a Facebook page when you are mining its content for insights, and these questions should get you headed in the right direction.
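As a sketch of the kind of query involved, the snippet below searches for a page by name and then fetches its like count; the 'search' endpoint and the likes field behave as they did in the Graph API of the era, so treat this as illustrative rather than current.

```python
# Search for a page and look up how many users like it
results = g.request('search', {'q': 'Mining the Social Web', 'type': 'page'})
pp(results)

# Assuming the first hit is the page we want, fetch its name and like count
page_id = results['data'][0]['id']
pp(g.get_object(page_id, fields='name,likes'))
```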
The search query itself is essentially a one-liner, and your exact results may vary somewhat. Given that Mining the Social Web is a fairly niche technical book, this seems like a reasonable fan base. This summary was generated while the chapter was being written. Still, there are a couple of options to consider. It just so happens that each represents the same real-world concept.
Entity resolution is an exciting field of research that will continue to have profound effects on how we use data as the future unfolds. Hopefully, enhancements to the Graph API as part of the new Graph Search product will facilitate more sophisticated queries and lower the barriers to entry for data miners in the future. As you now know, the answer is just a couple of graph queries away. As one possible source of investigation, you might consult stock market information and see if the number of likes correlates at all with the overall market capitalization, which could be an indicator of the overall size of the companies.
It seems reasonable to think that each company probably has similar means available to it and probably sells similar amounts of product. A worthwhile exercise would be to drill down further and try to determine what might be the cause of this disparity. In approaching a question like this one, bear in mind that although there are likely to be indicators in the Facebook data itself, the overall scope is very broad, and there may be a number of dependent variables outside of what you might find in Facebook data.
Digging further into what now seems like a bit of a phenomenon is left as an exercise. After all, glossing over the data at a high level whenever possible is an essential prerequisite to programmatic analysis. In other words, you could just split the text into words by approximating word boundaries with whitespace and feed the words into a Counter to compute the more frequent terms as a starting point.
For example, are posts with links more popular than posts with photos? The differences between feeds, posts, and statuses can initially be a bit confusing.
See the Graph API documentation for a user for more details. The remainder of this section walks through exercises that involve analyzing likes as well as analyzing and visualizing mutual friendships. We'll use a dictionary comprehension to iterate over the friends and build up the likes in an intuitive way, although the new "field expansion" feature could technically do the job in one fell swoop with a single call to g.get_object.
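A sketch of the dictionary-comprehension approach follows, reusing the GraphAPI instance g from earlier. It assumes the era's permissions model, under which a friend's likes were accessible, and for brevity it fetches only the first page of likes per friend rather than following the paging links.

```python
friends = g.get_connections('me', 'friends')['data']

# Map each friend's name to a list of that friend's likes (first page only)
likes = {friend['name']: g.get_connections(friend['id'], 'likes')['data']
         for friend in friends}
```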
If you have a lot of Facebook friends, the previous query may take some time to execute. Consider trying out the option to use field expansion and make a single query, or try limiting results with a list slice such as friends[:100] to restrict the scope of analysis to a subset of your friends while you are initially exploring the data.
With the facebook package, either approach amounts to just a call or two on the GraphAPI instance. A variation of the previous example can shed further light on the situation and be compelling in and of itself: calculating how many likes exist for each friend. For example, do most friends have a similar number of likes, or is the number of likes highly skewed?
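A quick sketch of that calculation, building on the likes dictionary from above:

```python
# How many likes does each friend have?
num_likes_by_friend = {friend: len(friend_likes)
                       for friend, friend_likes in likes.items()}

# Display the ten friends with the most likes
for friend, num in sorted(num_likes_by_friend.items(),
                          key=lambda item: item[1], reverse=True)[:10]:
    print('{0}: {1}'.format(friend, num))
```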
Having additional insight into the underlying distribution helps to inform some of the things that may be happening when the data is aggregated. There are a number of directions that we could go in at this point.
One possibility would be to start to compare smaller samples of friends for some kind of similarity or to further analyze likes. Does Derek account for the most significant majority of liked music? The answers to these questions are well within your grasp at this point. The example below illustrates how to compute the overlapping likes between the ego and friendships in the network as the first step in finding the most similar friends in the network.
Finding common likes between an ego and its friendships in a social network boils down to asking: which of your likes are in common with which friends? Remember, the ego of a social network is its logical center or basis. In this case, the ego of the network is the author of this book—that is, the person whose social network we are examining.
The example below shows how to do this by iterating over the friendships with a double list comprehension and processing the results. Calculating the friends most similar to an ego in a social network amounts to asking: which of your friends like things that you like?
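A sketch of both steps follows, reusing g and the likes dictionary from earlier; it gathers the ego's own likes, intersects them with each friend's likes, and then ranks friends by the size of the overlap.

```python
# The ego's own likes (first page only, for brevity)
my_likes = [like['name'] for like in g.get_connections('me', 'likes')['data']]

# Which of your likes are in common with which friends?
common_likes_by_friend = {
    friend: set(my_likes) & set([like['name'] for like in friend_likes])
    for friend, friend_likes in likes.items()
}

# Rank friends by how many likes they share with the ego
ranked = sorted(common_likes_by_friend.items(),
                key=lambda item: len(item[1]), reverse=True)

for friend, shared in ranked[:10]:
    print('{0}: {1} likes in common: {2}'.format(friend, len(shared), sorted(shared)))
```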
In other words, which of your friends are also friends with one another? From a graph analytics perspective, analysis of an ego graph for mutual friendships can very naturally be formulated as a clique detection problem. If Abe, Bob, Carol, and Dale were all mutual friends, the graph would be fully connected, and the maximum clique would be of size 4.
[Figure: Histograms displaying data from the preceding example.]

In the context of the social web, the maximum clique is interesting because it indicates the largest set of common friendships in the graph. Given two social networks, comparing the sizes of the maximum friendship cliques might provide a good starting point for analysis about various aspects of group dynamics, such as teamwork, trust, and productivity. The figure below illustrates a sample graph with the maximum clique highlighted.
This graph would be said to have a clique number of size 4.

[Figure: An example graph containing a maximum clique of size 4.]

Technically speaking, there is a subtle difference between a maximal clique and a maximum clique. The maximum clique is the largest clique in the graph (or cliques in the graph, if they have the same size).
A maximal clique, on the other hand, is one that is not a subgraph of another clique. The example graph, for instance, illustrates a maximum clique of size 4, but there are several other maximal cliques of size 3 in the graph as well. Just be advised that clique detection might take a long time to run as graphs get beyond a reasonably small size (hence the aforementioned exponential runtime). The sketch below demonstrates how to use Facebook data to construct a graph of mutual friendships and find and analyze its cliques.
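A sketch of the graph construction and clique analysis follows. Because the query for gathering mutual friendship pairs from Facebook has changed considerably over the years, the mutual_friendships list of (name, name) pairs is treated here as an input you have already collected by whatever means the API currently allows.

```python
import networkx as nx  # pip install networkx

# mutual_friendships is assumed: a list of (name1, name2) pairs gathered earlier
nxg = nx.Graph()
nxg.add_edges_from(mutual_friendships)

# Finding cliques is a hard problem, so this could take a while for large graphs
cliques = list(nx.find_cliques(nxg))  # all maximal cliques
max_clique_size = max(len(c) for c in cliques)
max_cliques = [c for c in cliques if len(c) == max_clique_size]
people_in_every_max_clique = set.intersection(*[set(c) for c in max_cliques])

print('Num cliques: {0}'.format(len(cliques)))
print('Avg clique size: {0}'.format(1.0 * sum(len(c) for c in cliques) / len(cliques)))
print('Max clique size: {0}'.format(max_clique_size))
print('Num max cliques: {0}'.format(len(max_cliques)))
print('Friends in all max cliques: {0}'.format(sorted(people_in_every_max_clique)))
print('Max cliques: {0}'.format(max_cliques))
```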
You can install NetworkX with the predictable pip install networkx from a terminal. Optimization of the API calls with a thread pool or similar technique would be possible. Finding cliques is a hard problem, so this could take a while for large graphs. Although the other person in common to all of the cliques is not guaranteed to be the second most highly connected person in the network, this person is likely to be among the most influential because of the relationships in common:

Num cliques: 6
Avg clique size: 3
Max clique size: 4
Num max cliques: 4
Friends in all max cliques: ["me", "Bas"]
Max cliques: [
  ["me", "Bas", "Joshua", "Heather"],
  ["me", "Bas", "Ray", "Patrick"],
  ["me", "Bas", ...]
]
Visualizing directed graphs of mutual friendships

D3.js is a JavaScript toolkit for building rich, interactive visualizations in the browser; you will be impressed. A tutorial of how to use D3 is well outside the scope of this book, and there are numerous tutorials and discussions online about how to use many of its exciting visualizations.
NetworkX can emit a format that is directly consumable by D3, and very little work is necessary to visualize the graph, since IPython Notebook can serve and render local content with an inline frame by prepending 'files' to the path. The sketch below demonstrates how to serialize out the graph for rendering and how to use IPython Notebook to serve up a web page displaying an interactive graph. The HTML that embeds the necessary style and scripts is included with the IPython Notebook for this chapter in a subfolder of its resources called viz.
[Figure: A graph of mutual friendships within a Facebook social network—you can generate graphs like this one by following along with the sample code in IPython Notebook.]
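A sketch of the serialization and display steps follows; the viz subfolder and the force.json/force.html filenames are assumptions based on the resources described above, so adjust the paths to match your checkout.

```python
import json
from networkx.readwrite import json_graph
from IPython.display import IFrame

# Serialize the mutual friendship graph to a JSON format that D3 can consume
nld = json_graph.node_link_data(nxg)
with open('resources/ch02-facebook/viz/force.json', 'w') as f:
    json.dump(nld, f)

# Serve and render the D3 visualization inline. Prepend the path with the
# 'files' prefix so that IPython Notebook serves the local content.
IFrame('files/resources/ch02-facebook/viz/force.html', '100%', '600px')
```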
Unlike data from Twitter and some other sources that are inherently more open in nature, Facebook data can be quite sensitive, especially if you are analyzing your own social network.
What are the most common topics being discussed? For example, what similarities and differences can you identify between fans of Chipotle Mexican Grill and Taco Bell?
Can you find anything surprising? What is the common glue that binds your network together? Can you examine objects such as photos or checkins to discover insights about anyone in your network? For example, who posts the most pictures, and can you tell what they are about based on the comments stream? Where do your friends check in most often? For example, can you plot where your friends live or where they grew up on a map?
Which of your friends still live in their hometowns? The Jaccard Index is a good starting point. See Section 4.

Although LinkedIn may initially seem like any other social network, the nature of its API data is inherently quite different. People who join LinkedIn are principally interested in the business opportunities that it provides as opposed to arbitrary socializing and will necessarily be providing sensitive details about business relationships, job histories, and more.
The absence of such an API method is intentional. The remainder of this chapter gets you set up to access data with the LinkedIn API and introduces some fundamental data mining techniques that can help you cluster colleagues according to a similarity measurement in order to answer several kinds of questions about your professional network.
Overview

This chapter introduces content that is foundational in machine learning and, in general, is a bit more advanced than the two chapters before it. It is recommended that you have a firm grasp on the previous two chapters before working through the material presented here. Although most of the analysis in this chapter is performed against a comma-separated values (CSV) file of your LinkedIn connections that you can download, this section maintains continuity with other chapters in the book by providing an overview of the LinkedIn API.
The sketch below illustrates a sample script that uses your LinkedIn credentials to ultimately create an instance of a LinkedInApplication class that can access your account data.
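A sketch of such a script follows, based on the python-linkedin package; the credential values are placeholders for the keys and tokens from your own LinkedIn application, and note that the REST API this package targets has evolved considerably since this chapter was written.

```python
from linkedin import linkedin  # pip install python-linkedin

# Placeholder credentials -- substitute the values for your own application
CONSUMER_KEY = ''
CONSUMER_SECRET = ''
USER_TOKEN = ''
USER_SECRET = ''
RETURN_URL = ''  # not needed for developer authentication

auth = linkedin.LinkedInDeveloperAuthentication(
    CONSUMER_KEY, CONSUMER_SECRET,
    USER_TOKEN, USER_SECRET,
    RETURN_URL,
    permissions=linkedin.PERMISSIONS.enums.values())

app = linkedin.LinkedInApplication(auth)

# The final line retrieves your basic profile information, including your
# name and headline
print(app.get_profile())
```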
Notice that the final line of the script retrieves your basic profile information, which includes your name and headline. Before going too much further, you should take a moment to read about what LinkedIn API operations are available to you as a developer by browsing its REST documentation, which provides a broad overview of what you can do. Should you need to revoke account access from your application or any other OAuth application, you can do so in your account settings.
This refreshed edition helps you discover who's making connections with social media, what they're talking about, and where they're located. You'll learn how to combine social web data, analysis techniques, and visualization to find what you've been looking for in the social haystack, as well as useful information you didn't know existed.