Social data analysis is the data-driven analysis of how people interact in social contexts, often with data obtained from social networking services. The goal may be to simply understand human behavior or even to propagate a story of interest to the target audience. Techniques may involve understanding how data flows within a network, identifying influential nodes (people, entities etc.), or discovering trending topics.
Social data analysis usually comprises two key steps: 1) gathering data generated from social networking sites (or through social applications), and 2) analyzing that data. The analysis often requires real-time (or near real-time) processing, measurements that capture and appropriately weigh factors such as influence, reach, and relevancy, an understanding of the context of the data being analyzed, and consideration of the time horizon. In short, social data analytics involves analyzing social media to understand and surface the insights that are embedded within the data.[1]
Social data analysis can provide a new slant on business intelligence, where social exploration of data can lead to important insights that the user of analytics did not envisage or explore. The term was introduced by Martin Wattenberg in 2005[2] and has more recently also been discussed as big social data analysis in relation to big data computing.
Systems are available to assist users in analyzing social data. They allow users to store data sets and create corresponding visual representations. The discussion mechanisms often use frameworks such as blogs and wikis to drive this social exploration and collaborative intelligence.
Obtaining social data
Social networking services have become increasingly popular with the development of Web 2.0. Many of these services provide APIs that allow easy access to their data by responding to user queries with the requested data in the form of XML or JSON formatted strings. To protect the privacy of their users, services such as Facebook require that the person requesting data has the necessary data access permissions. Services may also charge users for access to their data. Sources of social data include Twitter, Facebook, news websites, Wikipedia and We Feel Fine.
Some APIs only allow access to data in small quantities, so indexing the data in bulk can become a challenge. Six Apart was the first social media company to provide a (free) firehose of content for all the posts in their network (provided over XMPP). Twitter later provided a firehose as well, as did companies such as Spinn3r, Datasift, and GNIP.
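As a minimal sketch of handling such a response, the following Python example parses a JSON formatted string into simple records. The payload layout, the field names (posts, author, text, created_at) and the function name parse_posts are assumptions made for illustration and do not correspond to any particular service's API.

```python
import json

# Hypothetical JSON payload, shaped like a response a social API might return.
# The structure and field names are illustrative assumptions only.
SAMPLE_RESPONSE = """
{
  "posts": [
    {"id": 1, "author": "alice", "text": "Loving the new phone",
     "created_at": "2024-01-01T10:00:00Z"},
    {"id": 2, "author": "bob", "text": "Traffic is terrible today",
     "created_at": "2024-01-01T10:05:00Z"}
  ]
}
"""


def parse_posts(raw):
    """Decode a JSON string and return a list of (author, text) pairs."""
    payload = json.loads(raw)
    # A missing "posts" key is treated as an empty result set.
    return [(post["author"], post["text"]) for post in payload.get("posts", [])]


if __name__ == "__main__":
    for author, text in parse_posts(SAMPLE_RESPONSE):
        print(f"{author}: {text}")
```

In practice the string would come from an authenticated HTTP request rather than a constant, and the access permissions mentioned above would determine which fields are returned.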
Methods of analysis
In most cases, the aim is to find relationships between social data and another event, or to use the results of social data analysis to predict events. Notable articles in this field include Twitter Mood Predicts The Stock Market[3] and Predicting The Present With Google Trends.[4] These goals require appropriate methods of analysis, usually statistical methods, machine learning, or data mining.
Universities all over the world are opening graduate programs in Social Data Analysis.
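One simple statistical method for relating a social signal to another event is a lagged Pearson correlation, sketched below in Python. This is an illustrative technique chosen for this example, not the method used in the cited articles. The function names pearson and lagged_correlation, and the sample numbers, are assumptions made up for the demonstration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(signal, target, lag):
    """Correlate signal[t] with target[t + lag] for a non-negative lag."""
    if lag < 0 or lag >= len(signal):
        raise ValueError("lag must satisfy 0 <= lag < len(signal)")
    return pearson(signal[: len(signal) - lag], target[lag:])


if __name__ == "__main__":
    # Made-up daily values: an aggregate mood score and some other daily measure.
    mood = [0.2, 0.5, 0.1, 0.7, 0.4, 0.6]
    other = [1.0, 1.1, 1.6, 0.9, 1.8, 1.3]
    for lag in range(3):
        print(lag, round(lagged_correlation(mood, other, lag), 3))
```

A high correlation at a positive lag suggests that the social signal may lead the other series, but it does not establish causation or predictive power on its own.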
Key concepts
Several factors, noted earlier, are important to keep in mind when discussing social data analytics:[1]
- Sophisticated Data Analysis: what distinguishes social data analytics from sentiment analysis is the depth of the analysis. Social data analysis takes into consideration a number of factors (context, content, sentiment) to provide additional insight.
- Time consideration: windows of opportunity are significantly limited in the field of social networking. What's relevant one day (or even one hour) may not be the next. Being able to quickly execute and analyze the data is an imperative.
- Influence Analysis: understanding the potential impact of specific individuals can be key in understanding how messages might be resonating. It's not just about quantity, it's also very much about quality.
- Network Analysis: social data is also interesting in that it migrates, grows (or dies) based on how the data is propagated throughout the network. It's how viral activity starts—and spreads.
See also
References