While formal market research has historically been the source of consumer-centric information, social listening is now considered the most insightful one. Review the model in Figure 1, which shows the schematics of social listening and its application to the social media platform Twitter. List the advantages and disadvantages of social listening for strategic insight, paying particular attention to the error factors identified in this study.
Twitter is a social media sharing site often considered a snapshot of consumer and industry sentiment. Review Figure 1 in this article, then identify how the collective opinion fits into the signal of opportunity. Pay attention to secondary retweets, which measure the interest level of those who have chosen to follow the tweet trails.
We present the details of how to discover a target audience of Twitter users and their collective voice from raw Twitter data. First, to identify candidate users that meet certain criteria, we explore the Twitter resources available for data collection and existing approaches to user profiling. Next, we discuss enriching user profiles with the hashtags in the tweets posted by the target users. Lastly, we describe how to develop topical and social insights from the collective voice of the target users.
Before we go into details, we first present a formal model of the data space that we analyze in this paper. Our Twitter data space can be denoted as $D = (U, T, H)$, where $U$ is a set of users on Twitter, $T$ is a set of tweets created by the users, and $H$ is a set of hashtags used in the tweets by the users. This implies that a user $u \in U$ creates a tweet $t \in T$ using a set of hashtags $H_t \subseteq H$.
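To make the data model concrete, here is a minimal Python sketch in which tweets are the primary records and the user set and hashtag set (U and H) are derived from the tweet set T. The class and method names (`Tweet`, `DataSpace`, `hashtags_of_user`) are hypothetical, and hashtags are assumed to be already extracted from each tweet (for example, from the hashtag entities the API attaches to a tweet).

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Tweet:
    """A tweet t in T: written by one user and tagged with a hashtag set H_t."""
    tweet_id: str
    author_id: str
    text: str
    hashtags: FrozenSet[str] = frozenset()  # assumed pre-extracted H_t


@dataclass
class DataSpace:
    """Hypothetical container for the data space D = (U, T, H)."""
    tweets: List[Tweet] = field(default_factory=list)

    @property
    def users(self) -> Set[str]:
        # U: every user who authored at least one tweet in T
        return {t.author_id for t in self.tweets}

    @property
    def hashtags(self) -> Set[str]:
        # H: the union of H_t over all tweets in T
        result: Set[str] = set()
        for t in self.tweets:
            result |= t.hashtags
        return result

    def hashtags_of_user(self, user_id: str) -> Set[str]:
        # All hashtags that one user u has used across his or her tweets
        result: Set[str] = set()
        for t in self.tweets:
            if t.author_id == user_id:
                result |= t.hashtags
        return result
```

Deriving U and H from T mirrors the collection strategy in this section, where users are discovered through the tweets they author rather than sampled directly.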
User profiling is an essential component of our approach: it defines the user attributes needed for a study and populates the attribute values for each user. We define the profile of a Twitter user $u \in U$ as a set of tuples, each consisting of an attribute and its value, where, for an attribute $a \in A$, the value $p(u, a)$ is computed by a user profiling function $p$, as in Eq. (1):

$$\mathrm{profile}(u) = \{\, (a,\ p(u, a)) \mid a \in A \,\} \tag{1}$$

where $A$ is a set of user attributes. Determining the user profiling function $p$ for each user attribute is the goal of the user profiling phase.
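The profile definition above (a set of attribute-value tuples, one per attribute in A) can be sketched in Python as follows, assuming that each attribute has its own profiling function taking a raw User record parsed from JSON. The three example profilers and their attribute names are hypothetical illustrations, not the study's attribute set.

```python
from typing import Any, Callable, Dict, FrozenSet, Tuple

# One profiling function per attribute a: it maps a raw User record (a dict
# parsed from the API's JSON) to a value, or to None when the value is unknown.
ProfilingFn = Callable[[dict], Any]


def build_profile(user: dict, profilers: Dict[str, ProfilingFn]) -> FrozenSet[Tuple[str, Any]]:
    """Return {(a, p(u, a)) | a in A}, where A is the key set of `profilers`.

    Values must be hashable (str, int, bool, None, ...) to be stored in a set.
    """
    return frozenset((attr, fn(user)) for attr, fn in profilers.items())


# Hypothetical profiling functions for three attributes in A.
EXAMPLE_PROFILERS: Dict[str, ProfilingFn] = {
    "location": lambda u: u.get("location") or None,  # empty string -> unknown
    "verified": lambda u: bool(u.get("verified", False)),
    "followers": lambda u: u.get("followers_count", 0),
}
```

Attributes that cannot be read directly from a field would need a more involved profiling function, such as a trained model, but they plug into the same mapping.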
Fig. 1 The flow map of our unified scheme for developing social insights from the collective voice of target users
Figure 1 illustrates the flow of our unified scheme for developing social insights from the collective voice of target users. First, in the user profiling stage, attributes of Twitter users are identified, such as demographic attributes and other personal attributes. When some user attributes are missing because the data are not available, researchers can consider developing their own customized solution to the specific user profiling task; for example, a supervised machine learning model can be built that uses hashtags as features for prediction. Second, once the user profiling phase is completed, researchers select only the users of interest based on the identified user attributes. Finally, researchers develop topical and social insights from the collective voice of these target users.
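The text states only that a supervised model can use hashtags as features to fill in a missing attribute, after which the users of interest are selected. As one illustrative choice (not the study's model), the sketch below swaps in a hand-rolled multinomial Naive Bayes over each user's bag of hashtags, then filters users by the predicted value. The attribute `is_parent`, the training hashtags, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


class HashtagNaiveBayes:
    """Multinomial Naive Bayes over a user's bag of hashtags, with add-one smoothing."""

    def fit(self, examples: Iterable[Tuple[List[str], str]]) -> "HashtagNaiveBayes":
        # examples: (hashtags used by a labeled user, that user's known attribute value)
        self.class_counts: Counter = Counter()
        self.tag_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for hashtags, label in examples:
            self.class_counts[label] += 1
            for tag in hashtags:
                tag = tag.lower()  # treat #Gaming and #gaming as one feature
                self.tag_counts[label][tag] += 1
                self.vocab.add(tag)
        self.n_users = sum(self.class_counts.values())
        return self

    def predict(self, hashtags: List[str]) -> Optional[str]:
        best_label, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for label, n in self.class_counts.items():
            # log P(label) + sum of log P(tag | label) over the user's hashtags
            score = math.log(n / self.n_users)
            denom = sum(self.tag_counts[label].values()) + vocab_size
            for tag in hashtags:
                tag = tag.lower()
                if tag not in self.vocab:
                    continue  # hashtags never seen in training carry no evidence
                score += math.log((self.tag_counts[label][tag] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def select_target_users(profiles: Dict[str, Dict[str, str]], attr: str, value: str) -> List[str]:
    """Keep only the users whose (known or predicted) attribute equals the target value."""
    return sorted(uid for uid, prof in profiles.items() if prof.get(attr) == value)


# Hypothetical training data: users whose "is_parent" attribute is already known.
TRAIN = [
    (["#momlife", "#parenting"], "parent"),
    (["#parenting", "#toddler"], "parent"),
    (["#gaming", "#esports"], "non-parent"),
    (["#gaming"], "non-parent"),
]
```

For example, `HashtagNaiveBayes().fit(TRAIN).predict(["#toddler"])` returns `"parent"`. Any classifier that accepts a bag of hashtags could replace Naive Bayes here; it is used only because it fits in a few self-contained lines.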
In general, sampling of Twitter users is less common than sampling of tweets because the Twitter API offers limited functionality for collecting users. For this reason, we begin with a large pool of random tweets, which are much easier to collect via the Twitter API, as mentioned in the "Introduction" section. Each collected tweet contains author information describing the user who created it. Some attributes of the users in the pool are already known or can be easily acquired, while others must be inferred or are difficult or impossible to identify. It is worth noting that raw user data collected via the Twitter API provide surprisingly useful information about users. Table 2 lists native Twitter objects and their fields, along with the user attributes that can be derived from those fields. The Twitter API provides several types of objects encoded in JavaScript Object Notation (JSON), of which the User and Tweet objects are the most useful for user profiling.
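Because each collected tweet carries its author, a user pool can be built as a by-product of tweet collection. The sketch below assumes tweets stored one JSON object per line in the v1.1 style, where the author is embedded as a nested User object under the `"user"` key and identified by `"id_str"`; the function name `user_pool_from_tweets` is hypothetical.

```python
import json
from typing import Dict, Iterable


def user_pool_from_tweets(lines: Iterable[str]) -> Dict[str, dict]:
    """Collect the distinct authors of a stream of tweets (one JSON object per line).

    Assumes v1.1-style Tweet objects that embed the author under the "user" key.
    """
    pool: Dict[str, dict] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        user = tweet.get("user")
        if not user:
            continue  # e.g. deletion notices or other non-tweet records
        # Keyed by user ID, so the last-seen snapshot of each profile is kept
        pool[user["id_str"]] = user
    return pool
```

Deduplicating by ID matters because active users appear many times in a random tweet sample; the resulting pool is biased toward frequent posters, which is worth keeping in mind when interpreting the collective voice.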
Table 2 Summary of the user attributes derivable from native Twitter objects
| Object | Field | Description | Derivable user attributes |
|---|---|---|---|
| User | name | Name of the user | Name, gender, age, race/ethnicity |
| | location | User-defined location for the account's profile | Location |
| | url | URL provided by the user in association with their profile | Web site, blog, or other social media accounts |
| | description | User-defined description of their account | Demographics, expertise, hobbies, interests, personality traits, political orientation |
| | verified | Whether Twitter has verified that the account of public interest is authentic | Popularity |
| | followers_count | Number of users following the account | Popularity |
| | friends_count | Number of users the account is following | Sociability |
| | listed_count | Number of public lists that the user is a member of | Popularity |
| | favourites_count | Number of tweets the user has liked in the account's lifetime | Posting activeness |
| | statuses_count | Number of tweets (including retweets) issued by the user | Posting activeness |
| | created_at | UTC datetime when the user account was created on Twitter | Account age |
| | profile_image_url_https | HTTPS-based URL pointing to the user's profile image | Gender, age, race/ethnicity |
| | followers* | List of users following the account | Network |
| | friends* | List of users the account is following | Network |
| Tweet | created_at | UTC time when the tweet was created | Behavior |
| | text | Actual text of the status update | Demographics, expertise, interests, personality traits, political orientation |
| | coordinates | Geographic location of the tweet as longitude and latitude coordinates | Location, behavior |
| | place | Known place as city, state, or country | Location, behavior |
| | reply_count | Number of times the tweet has been replied to | Popularity |
| | retweet_count | Number of times the tweet has been retweeted by other users | Popularity |
| | favorite_count | Number of times the tweet has been liked by other users | Popularity |
| | lang | Machine-detected language of the tweet | Language |
| | retweeted_status | Original tweet object if the tweet is a retweet | Typical tweet or retweet |
A User object, which describes an individual user on Twitter, has several fields that can be used directly as user attributes, such as name, location, and url, while other fields can be analyzed to infer new attributes. For example, from the description field, which contains a user-defined description or bio of the account, one can infer many different types of user attributes, such as demographic attributes (e.g., age, education, gender, location, marital status, language, occupation, and race/ethnicity) and other personal attributes (e.g., expertise, hobbies, interests, personality traits, and political orientation), depending on the information included in the field's text. A wide range of natural language processing (NLP) and text mining techniques can be applied to this field. Other fields in a User object can be good indicators of the account's popularity, sociability, or activeness. For instance, the followers_count and listed_count fields indicate how popular the account is, while the friends_count field indicates how sociable it is. One may also want to compare followers_count with friends_count to see whether there is a large or small gap between the two. Celebrities, for example, tend to have a very large number of followers but far fewer friends, whereas spam or bot accounts tend to have many friends but few followers.
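The followers-versus-friends comparison described above can be sketched as a simple ratio on the User object's `followers_count` and `friends_count` fields. The function names and, in particular, the cut-off values `high=10.0` and `low=0.1` are hypothetical placeholders, not thresholds reported by the study; the labels are coarse hints, not verdicts about an account.

```python
def follower_friend_ratio(user: dict) -> float:
    """followers_count / friends_count, with +1 smoothing to avoid division by zero."""
    return (user.get("followers_count", 0) + 1) / (user.get("friends_count", 0) + 1)


def gap_category(user: dict, high: float = 10.0, low: float = 0.1) -> str:
    """Coarse label from the follower/friend gap.

    The default thresholds are illustrative placeholders, not values from the study.
    """
    ratio = follower_friend_ratio(user)
    if ratio >= high:
        return "celebrity-like"  # many followers, comparatively few friends
    if ratio <= low:
        return "bot/spam-like"   # many friends, few followers
    return "typical"
```

A ratio is used rather than a raw difference so that the label does not depend on the overall size of the account; in practice the thresholds would need to be calibrated on labeled accounts.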
The favourites_count and the statuses_count fields can be used to measure how active the account is in terms of posting tweets. The created_at field can be used to calculate the account age in days, months, or years, which can be combined with other fields for normalization. For example, users who have been using Twitter for ten years would probably have more followers or have posted more tweets than those who just began to use Twitter. In this case, one may need to divide the number of followers or number of statuses by the account age, so that the indicators can be normalized for each user.
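Normalizing by account age, as suggested above, can be sketched as follows. This assumes the v1.1 `created_at` string format (for example, `"Wed Oct 10 20:19:24 +0000 2018"`) and an English/C locale for the day and month names; the function names and the one-day floor (which keeps brand-new accounts from producing huge rates) are hypothetical choices.

```python
from datetime import datetime, timezone
from typing import Optional

# v1.1-style created_at, e.g. "Wed Oct 10 20:19:24 +0000 2018".
# %a and %b assume English (C locale) day and month abbreviations.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def account_age_days(user: dict, now: Optional[datetime] = None) -> float:
    """Account age in days, computed from the User object's created_at field."""
    created = datetime.strptime(user["created_at"], CREATED_AT_FORMAT)
    now = now or datetime.now(timezone.utc)
    return (now - created).total_seconds() / 86400


def per_day_rates(user: dict, now: Optional[datetime] = None) -> dict:
    """Divide cumulative counts by account age, floored at one day."""
    age = max(account_age_days(user, now), 1.0)
    return {
        "followers_per_day": user.get("followers_count", 0) / age,
        "statuses_per_day": user.get("statuses_count", 0) / age,
    }
```

For an account created on 1 January 2018 with 3,650 followers, evaluated ten days later, `per_day_rates` yields 365 followers per day, so it can be compared directly with an account that has been active for a decade.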