

Why is sentiment analysis hard?

 openlog 2016-04-26

Sentiment analysis is the process of identifying people’s attitudes and emotional states from language. In Natural Language Processing, sentiment analysis is an automated task where machine learning is used to rapidly determine the sentiment of large amounts of text or speech. Applications include tasks like determining how excited someone is about an upcoming movie; correlating statements about a political party with people’s likeliness to vote for that party; or converting written restaurant reviews into 5-star scales across categories like ‘quality of food’, ‘ambience’, and ‘value for money’.

With the amount of information that is shared on social media, forums, blogs, and elsewhere, it is easy to see why we need to automate sentiment analysis: there is simply too much information to process manually. The problem is that machine learning approaches are typically not that accurate. For a simple task separating ‘positive’ from ‘negative’ sentiment on social media, many automated solutions achieve only around 80% accuracy. This can still be useful for tracking broad trends over time, but it limits fine-grained analysis.

Some natural language processing tasks are much easier to automate. For example, imagine if you were separating articles about basketball from articles about football. There are many features that an automated approach can use to correctly guess which sport an article is about: the terminology specific to each sport, names of famous players, etc. We expect automated approaches to be almost equal to human performance for this kind of topic identification, even if they look at nothing more than the collection of words being used.
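The bag-of-words approach to topic identification can be sketched in a few lines of Python. The mini vocabularies below are hypothetical placeholders; a real system would learn term weights from labelled articles, but the principle is the same:

```python
# Bag-of-words topic identification sketch: count sport-specific terms
# and pick the topic with more hits. The vocabularies here are tiny,
# hand-picked examples, not learned features.
BASKETBALL = {"dunk", "rebound", "three-pointer", "nba", "backcourt"}
FOOTBALL = {"touchdown", "quarterback", "nfl", "punt", "kickoff"}

def topic(article: str) -> str:
    words = set(article.lower().split())
    b = len(words & BASKETBALL)
    f = len(words & FOOTBALL)
    if b > f:
        return "basketball"
    return "football" if f > b else "unknown"

print(topic("The nba game ended with a buzzer-beating dunk"))  # basketball
```

Because sport-specific terminology is so distinctive, even this crude word-matching gets most articles right; sentiment offers no such giveaway vocabulary.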

Mixed Sentiment

The subtleties of sentiment

The way that we express sentiment is a complex mix of the linguistic structures of our utterances and the assumed knowledge of the people we are addressing. Unlike topic identification, pulling out a few key phrases like “good”, “bad” and “I love/hate” will only get you to around 70% accuracy. You need the machine learning algorithms to have a deeper understanding of the context. Compare these examples:

  1. It was a great restaurant.
  2. It should have been a great restaurant.
  3. The restaurant was great in that it will make all future meals seem more delicious.
  4. Despite a pleasant experience I can’t support the many reviews that it was a great restaurant.

The first sentence is positive, but the rest all express negative sentiment, despite having the first sentence embedded within them. In the second sentence, “should have been” indicates the desired outcome, leaving the actual sentiment implied (this sentence could be followed by “… and it was!”, but the implication is that it was not). The third is more complicated, bordering on the kinds of sarcasm that are very hard for machines to identify: even a person might misread it as positive when skimming quickly. The fourth statement is the most complicated because the overall sentiment is negative, but it begins with the (weakly) positive “pleasant experience” and also finishes by reporting that many other people expressed positive sentiment.
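To see why key-phrase matching falls short on sentences like these, here is a minimal lexicon-based baseline (a sketch with hypothetical word lists, not any production system). Because it only counts polarity words, it labels every one of the four example sentences positive, even though only the first truly is:

```python
import re

# Hypothetical mini sentiment lexicons; real lexicons are far larger.
POSITIVE = {"great", "good", "love", "pleasant", "delicious"}
NEGATIVE = {"bad", "terrible", "hate", "awful"}

def keyword_sentiment(sentence: str) -> str:
    # Tokenise to lowercase words, then net the polarity-word counts.
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    return "negative" if score < 0 else "neutral"

examples = [
    "It was a great restaurant.",
    "It should have been a great restaurant.",
    "Despite a pleasant experience I can't support the many reviews "
    "that it was a great restaurant.",
]
for s in examples:
    print(keyword_sentiment(s))  # "positive" every time; only the first is right
```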

Abstracting beyond words and phrases

The key to understanding more complicated expressions is to allow the machine learning algorithms to understand language at a deeper level. For the fourth example above, this is possible by understanding the syntactic and semantic structures of the sentence itself. A recent paper by Richard Socher and colleagues at Stanford University describes a system that does exactly this:

Demonstration of negative sentiment taking scope over positive sentiment, from http://nlp.stanford.edu/sentiment/index.html (reproduced with permission)

The image is taken with permission from http://nlp.stanford.edu/sentiment/index.html, which contains a live demo of the work of Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher Manning, Andrew Ng and Christopher Potts, to be presented in the paper “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank” at this year’s Conference on Empirical Methods in Natural Language Processing (EMNLP 2013).* The demo is fun to play around with: I recommend that you try it out!

The tree structure in the image shows the recursive structure of the sentence itself, and how a correct parse means that the negation in the main verb of the sentence (“can’t support”) has scope over the positive phrases embedded within it. The resulting orange color at the root of the tree indicates that the sentiment was correctly identified as ‘negative’, meaning that Socher et al.’s system gets this one right. Their results improve on previous systems by 5%, reaching about 85% accuracy. This might not sound like a lot, but research projects in Natural Language Processing typically improve accuracy by only around 1% a year (at best), so this is a significant leap.
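The scope behaviour in the tree can be illustrated with a toy rule-based sketch: sentiment is composed bottom-up over a parse, and a negation node flips the polarity of everything beneath it. This illustrates the compositional idea only; it is not Socher et al.’s actual neural model, and the parse and polarity values below are invented for the example:

```python
# Toy bottom-up sentiment composition over a parse tree.
# Node shapes: ("NEG", child) flips the polarity of its scope;
# ("JOIN", left, right) sums the two subtrees;
# a leaf is (word, polarity) with polarity in {-1, 0, +1}.
def sentiment(tree):
    op = tree[0]
    if op == "NEG":
        return -sentiment(tree[1])
    if op == "JOIN":
        return sentiment(tree[1]) + sentiment(tree[2])
    return tree[1]  # leaf: return the word's polarity

# "can't support [... it was a great restaurant]": the negation
# outscopes the embedded positive phrase, so the root is negative.
parse = ("NEG", ("JOIN", ("support", 0),
                 ("JOIN", ("pleasant", 1), ("great", 1))))
print(sentiment(parse))  # -2: negative overall, despite two positive words
```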

Why so serious?

These examples are typical of the ways in which people express negative sentiment. In some cases, people choose to pad their criticisms with qualifiers to be polite (even on the internet), and more broadly, people just tend to be more creative in how they describe the things that they don’t like. If Tolstoy had worked on the problem, he would no doubt have said:

Happy sentiments are all alike; every unhappy sentiment is uniquely expressed.

This makes negative sentiment particularly tough for machine learning, as machine learning typically works by learning from examples. The greater the variety of the potential examples, the more likely that an ambiguous sentence appears, and the lower the accuracy as a result. Abstractions like the example above get us part of the way, but we are not yet at the point of reliably detecting sarcasm, irony, and other forms of expression where the literal meaning is opposite to the intended.

These socially determined negations get particularly complicated when we look across different languages, as every language does this a little differently. For example, in English we might call something ‘little’ when it is actually large (think “Little John” in Robin Hood), but in Spanish you might use the diminutive suffix ‘-ita’. Even within English, we get cultural differences, especially with the famously understated British English:

| What the British say | What the British mean | What foreigners understand |
| --- | --- | --- |
| I hear what you say | I disagree and do not want to discuss it further | He accepts my point of view |
| With the greatest respect | You are an idiot | He is listening to me |
| That’s not bad | That’s good | That’s poor |
| That is a very brave proposal | You are insane | He thinks I have courage |
| Quite good | A bit disappointing | Quite good |
| I would suggest | Do it or be prepared to justify yourself | Think about the idea, but do what you like |
| Oh, incidentally / by the way | The primary purpose of our discussion is | That is not very important |
| I was a bit disappointed that | I am annoyed that | It doesn’t really matter |
| Very interesting | That is clearly nonsense | They are impressed |
| I’ll bear it in mind | I’ve forgotten it already | They will probably do it |
| I’m sure it’s my fault | It’s your fault | Why do they think it was their fault? |
| You must come for dinner | It’s not an invitation, I’m just being polite | I will get an invitation soon |
| I almost agree | I don’t agree at all | He’s not far from agreement |
| I only have a few minor comments | Please rewrite completely | He has found a few typos |
| Could we consider some other options | I don’t like your idea | They have not yet decided |

With thanks to the (unknown) author of this table, first posted by Duncan Green of Oxfam.

Despite the differences from US and other varieties of English, it is relatively easy for machines to learn that “a bit disappointed” means “annoyed” once the algorithms have seen a few examples. The crucial point is knowing to apply ‘British English’ criteria in the first place. This is the most important next step in sentiment analysis: automatically knowing what kind of analysis to apply, depending on the genre, language, or source of the utterance. By knowing something about the social and cultural context of an utterance, we can make smarter assumptions about the assumed knowledge of the speaker and more accurately tailor sentiment predictions to specific types of communication. As for how we should do this in an automated system? We’ll leave that for a future post.
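One way to picture this step is a variety-aware preprocessing pass that rewrites understated phrases to their intended meaning before any sentiment scoring. This is a hypothetical sketch: the phrase mappings come from the table above, and a real system would learn them from data rather than hard-code them:

```python
# Hypothetical variety-aware rewriting applied before sentiment scoring.
# Mappings are drawn from the British-English table above.
BRITISH_REWRITES = {
    "a bit disappointed": "annoyed",
    "not bad": "good",
    "very interesting": "clearly nonsense",
    "i'll bear it in mind": "i've forgotten it already",
}

def normalise(text: str, variety: str) -> str:
    """Rewrite understated phrases to their intended meaning for en-GB text."""
    out = text.lower()
    if variety == "en-GB":
        for phrase, meaning in BRITISH_REWRITES.items():
            out = out.replace(phrase, meaning)
    return out

print(normalise("I was a bit disappointed that the demo crashed", "en-GB"))
# -> "i was annoyed that the demo crashed"
```

A downstream sentiment model then sees “annoyed” and scores the sentence correctly, where the literal “a bit disappointed” might have read as only mildly negative.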

– Rob Munro

* Disclosure: two of the authors, Christopher Manning and Christopher Potts, are advisors to Idibon

Rob Munro

Rob drives the company’s vision and strategic direction. He is a leader in applying big data analytics to human communications, having worked in diverse environments, from Sierra Leone, Haiti and the Amazon to London, Sydney and San Francisco. He completed a PhD in Computational Linguistics as a Graduate Fellow at Stanford University. Outside of work, he learned about the world’s diversity by cycling more than 20,000 kilometers across 20 countries.


