AppScience Analytics

AppScience Analytics in a Nutshell:

AppScience, LLC supports a number of applications (including ChatterStock, TweetVote, and Moodisphere) driven by the internally developed and maintained AppScience Analytics™ methodology: a statistical approach used to derive the relative change in sentiment or public perception across locations, individuals, or concepts from real-time social media content. This whitepaper outlines the business logic and statistical approach used within the model.

Overview

There is emerging interest in both the academic community and the private sector in practical applications of social-media content, attributable in large part to its volume, general availability, and semantic value. One application gaining prominence is “Sentiment Analysis,” or “Opinion Mining,” which aims to derive the public opinions and sentiments conveyed within written text. AppScience Analytics™ is a statistical methodology designed to abstract, from social-media sources, the relative change in sentiment or public opinion for an entity of interest (e.g., a person, location, or concept).

Within the methodology, a composite sentiment score is calculated at the message level (e.g., a “tweet” or “post”) and subsequently aggregated by entity into two time-based groupings: a rolling hourly value and a rolling historical baseline. These two groupings provide the basis of the hypothesis test used to determine the relative fluctuation of an entity’s sentiment value at a point in time.
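The two-grouping split described above can be sketched as follows. The record shape (timestamp, entity, score) and the function name are hypothetical; the whitepaper does not specify how scored messages are stored.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def split_hourly_baseline(scored_messages, now):
    """Partition (timestamp, entity, score) records into the rolling
    hourly window and the rolling historical baseline, per entity.

    Illustrative sketch only; the actual storage and windowing
    mechanics are not published.
    """
    hourly = defaultdict(list)
    baseline = defaultdict(list)
    cutoff = now - timedelta(hours=1)
    for timestamp, entity, score in scored_messages:
        # Messages newer than one hour feed the rolling hourly value;
        # everything older contributes to the historical baseline.
        target = hourly if timestamp > cutoff else baseline
        target[entity].append(score)
    return hourly, baseline
```

Each entity then carries two score populations, which the later hypothesis test compares.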

Sampling and Message-Level Computation

To obtain statistically valid samples of social messages within the geography or entity of interest, APIs from social-networking sites are polled at routine intervals. Collected messages are analyzed against a subjectivity lexicon tailored for social-media content to produce a message-level composite score, derived from the polarity and severity scaling factors assigned to matched lexicon terms.

The use of a message-level composite score, rather than an average of scored words across messages, proves helpful within the model by appropriately weighting each grouped message. Without composite scores, messages making intense use of either positive or negative lexicon words would not be represented fairly within the population.
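A minimal sketch of message-level composite scoring follows. The lexicon entries here are invented for illustration; the production subjectivity lexicon and its scaling factors are proprietary.

```python
import re

# Hypothetical miniature subjectivity lexicon: each term carries a
# polarity (+1 positive, -1 negative) and a severity scaling factor.
LEXICON = {
    "love":     (+1, 2.0),
    "great":    (+1, 1.5),
    "bad":      (-1, 1.0),
    "terrible": (-1, 2.0),
}

def composite_score(message):
    """Sum polarity * severity over every lexicon match in the message.

    Summing, rather than averaging, lets a message dense with strongly
    positive or negative terms carry proportionally more weight when
    scores are later grouped.
    """
    score = 0.0
    for token in re.findall(r"[a-z']+", message.lower()):
        if token in LEXICON:
            polarity, severity = LEXICON[token]
            score += polarity * severity
    return score
```

Because hits are summed, a message repeating “love” twice scores twice as high as one mentioning it once, preserving intensity in the grouped population.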

Aggregation and Statistical Significance

A geometric mean is used to calculate the hourly and group averages, so as to limit the undue influence of statistical outliers potentially introduced through the use of composite scores. In addition, outliers exceeding a ninety-ninth-percentile threshold are eliminated.
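The trimmed geometric mean can be sketched as below. A geometric mean requires strictly positive inputs, and the whitepaper does not state how scores are mapped onto such a scale, so this assumes scores have already been shifted positive; the percentile cut is likewise a simple illustrative choice.

```python
import math

def trimmed_geometric_mean(scores):
    """Geometric mean of scores after dropping values above the
    ninety-ninth percentile.

    Assumes scores lie on a strictly positive scale (e.g. composite
    scores shifted by a constant offset); illustrative sketch only.
    """
    ordered = sorted(scores)
    cutoff = ordered[int(0.99 * (len(ordered) - 1))]  # 99th-percentile cap
    kept = [s for s in scores if s <= cutoff]
    # Geometric mean: exp of the arithmetic mean of the logs.
    return math.exp(sum(math.log(s) for s in kept) / len(kept))
```

The log-domain averaging is what dampens the pull of a single extreme composite score relative to an arithmetic mean.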

The rolling hourly and baseline means within a grouped population are compared using a two-tailed z-test to determine statistically significant fluctuations in sentiment within the category grouping.
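A two-tailed z-test of this form can be sketched as follows. The exact test inputs are not published, so the standard-error construction here (baseline standard deviation over the hourly sample size) is an assumption.

```python
import math

def z_score(hourly_mean, baseline_mean, baseline_std, n_hourly):
    """z-statistic comparing the rolling hourly mean against the
    rolling historical baseline (hypothetical input convention)."""
    standard_error = baseline_std / math.sqrt(n_hourly)
    return (hourly_mean - baseline_mean) / standard_error

def two_tailed_p(z):
    """Two-tailed p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

A p-value below the chosen significance level flags the hour's sentiment as a statistically significant departure from the baseline.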

The themed image (as seen in Moodisphere) is driven by the probability value from the standard normal distribution and has confidence boundaries at each decile. The probability, or “mood score,” therefore ranges from 0% to 100%.
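One plausible reading of that mapping is sketched below: the standard normal CDF of the z-score supplies the probability, which is then snapped to decile boundaries. The published methodology does not spell out this mapping, so both the CDF choice and the snapping rule are assumptions.

```python
import math

def mood_score(z):
    """Map a z-score to a 0-100% "mood score" with decile boundaries.

    Illustrative sketch: the standard normal CDF gives the probability
    value, which is snapped down to the nearest decile, mirroring the
    decile confidence boundaries of the themed image.
    """
    phi = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF
    return min(int(phi * 10) * 10, 100)       # snap to a decile, cap at 100
```

Under this reading, an hourly mean equal to the baseline (z = 0) lands at the 50% decile, with strongly positive or negative departures pushed toward 100% or 0%.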

In addition to accounting for the natural variation within the two aggregate groups, the inferential test within the model accounts for the inherent bias of the subjectivity lexicon: a greater frequency of positive or negative words would introduce bias into a raw mean. As seen in the user interface, decile-based confidence intervals are derived from the historical mean and standard error, against which the moving hourly data point is measured (via the z-score). Positive or negative bias within the subjectivity lexicon is therefore absorbed by the adjusted baseline. In other words, the probability value is always interpreted relative to the historical baseline to determine significant changes in sentiment or public perception.

A further consideration was that the relative baseline would cause the probability value to take on varying meaning across aggregate groups. For example, a positive sentiment score showing relative improvement in sentiment could in fact have a lower raw hourly mean than that of another location with less relative improvement. Analysis of the data across locations showed that the upper and lower bounds became increasingly similar as greater volumes of data were collected (beyond 4 million messages), making the meaning of the probability score more consistent across locations and aggregate groups.

The AppScience Analytics methodology was created and is routinely maintained by AppScience, LLC.  Questions and comments can be directed to info@appscience.org.