Things to consider before using sentiment analysis
- How do you handle irony?
- What age group are you surveying? (Different datasets are targeted at different age groups.)
- What slang words will the dataset not understand? On the flip side, what technical language might be misinterpreted?
- Do certain slang terms mean different things in different countries? (e.g. "fag" in Britain is slang for a cigarette, but in America it can be used as a derogatory term for gay people)
- What are the cultural differences between the people being surveyed? (e.g. certain colours may have different connotations in different regions)
- How will you deal with compound sentences? (e.g. "This problem was making me so angry, thanks for fixing it")
- How will your system cope with emojis?
  - How will you cope with cultural differences? 🙏 can mean prayer in Western, Christian societies but simply a high-five in others.
  - Might some emojis be misread depending on your age group? 💀 can be used ironically by younger people but quite literally by an older generation.
  - Do you trust the emoji labels given by companies? 😥 is labelled "DISAPPOINTED BUT RELIEVED FACE" by quackit.com, but it could easily be read as "disappointed and crying".
- If you are making a training dataset, who are you creating it for? What kind of people do you want to target in your survey?
- Is it ethical to make decisions based on "mined" sentiment (e.g. from Twitter or other social media platforms)?
- Which affects sentiment polarity more: tone or content?
- How might your system cope with idioms?
- How might you deal with spelling errors? A sentiment system is unlikely to notice that "hapy" means "happy".
- How might it deal with phrases whose meaning comes down purely to tone of voice? "It's not amazing" can mean that something is okay or quite good, or it can come across as incredibly rude.
- What functions will you apply to your polarity scores? Section 3.5.1 of [1] shows some techniques you might use.
- At what cut-off point does polarity become positive or negative?
- Which sentences are difficult for simple sentiment analysers to classify correctly?
  - "He is using you": negative (marked as neutral; the connotation is not understood)
  - "He is using the computer": neutral
  - "Sample mean.": neutral (marked as negative because of "mean")
- Would analysing a Likert scale make more sense?
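The questions about polarity functions, cut-off points and difficult sentences can be made concrete with a minimal lexicon-based scorer. This is a sketch under stated assumptions, not the method of [1]: `LEXICON` is a hypothetical hand-written word list with invented values, word scores are simply summed, the sum is squashed into (-1, 1) with x / sqrt(x² + alpha) (the same form of normalisation the VADER analyser uses, with alpha = 15), and a cut-off of ±0.05 is assumed for the neutral band.

```python
import math

# Hypothetical toy lexicon: word -> polarity (values invented for illustration).
LEXICON = {
    "happy": 3.0,
    "amazing": 3.0,
    "angry": -3.0,
    "mean": -2.0,  # "unkind"; a lexicon cannot tell this apart from the statistical sense
}

def raw_score(sentence: str) -> float:
    """Sum the polarity of every known word; unknown words count as 0 (neutral)."""
    words = [w.strip(".,!?\"'").lower() for w in sentence.split()]
    return sum(LEXICON.get(w, 0.0) for w in words)

def normalise(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded sum into (-1, 1); larger alpha saturates more slowly."""
    return score / math.sqrt(score * score + alpha)

def label(sentence: str, cutoff: float = 0.05) -> str:
    """Assumed cut-off: scores inside (-cutoff, cutoff) are treated as neutral."""
    s = normalise(raw_score(sentence))
    if s >= cutoff:
        return "positive"
    if s <= -cutoff:
        return "negative"
    return "neutral"

for text in ["He is using you", "He is using the computer", "Sample mean."]:
    print(f"{text!r}: {label(text)}")
```

Run as written, the scorer reproduces both failure cases from the list: "He is using you" contains no lexicon words and comes out neutral, while "Sample mean." is pushed to negative by "mean". Moving the cut-off or changing alpha alters which borderline sentences fall into the neutral band.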
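For the spelling-error question, one possible mitigation (an assumption, not something the checklist prescribes) is to fall back to the closest lexicon entry when a word is not found. The sketch below uses Python's standard `difflib.get_close_matches` with a hypothetical toy `LEXICON`; the similarity cut-off of 0.8 is an illustrative choice.

```python
import difflib

# Hypothetical toy lexicon (values invented for illustration).
LEXICON = {"happy": 3.0, "sad": -2.0, "angry": -3.0, "great": 3.0}

def lookup(word: str, cutoff: float = 0.8) -> float:
    """Return the polarity of `word`, falling back to the closest lexicon entry.

    difflib.get_close_matches returns at most `n` candidates whose similarity
    ratio is at least `cutoff`; words with no close match score 0 (neutral).
    """
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    matches = difflib.get_close_matches(word, LEXICON.keys(), n=1, cutoff=cutoff)
    return LEXICON[matches[0]] if matches else 0.0

print(lookup("hapy"))    # falls back to "happy"
print(lookup("qwerty"))  # no close match, so neutral
```

The trade-off is false corrections: with this cut-off, `lookup("sand")` is matched to "sad" and scored as negative, so fuzzy matching needs a vocabulary check or a stricter threshold in practice.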
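The compound-sentence question can be illustrated by comparing a bag-of-words sum with clause-level scoring. In this sketch the lexicon is hypothetical, clauses are split on commas, semicolons and "but", and the rule that the final clause gets double weight is an assumed heuristic (the final clause often states the writer's conclusion), loosely similar in spirit to VADER's extra weighting of text after "but".

```python
import re

# Hypothetical toy lexicon (values invented for illustration).
LEXICON = {"angry": -3.0, "thanks": 2.0, "great": 3.0, "broken": -2.0}

def clause_scores(sentence: str) -> list[float]:
    """Split on commas, semicolons and 'but', then sum lexicon scores per clause."""
    clauses = [c for c in re.split(r"[,;]|\bbut\b", sentence.lower()) if c.strip()]
    return [sum(LEXICON.get(w.strip(".!?"), 0.0) for w in c.split()) for c in clauses]

def naive_score(sentence: str) -> float:
    """Treat the whole sentence as one bag of words."""
    return sum(clause_scores(sentence))

def last_clause_weighted(sentence: str, final_weight: float = 2.0) -> float:
    """Assumed heuristic: weight the final clause more heavily than the others."""
    scores = clause_scores(sentence)
    return sum(scores[:-1]) + final_weight * scores[-1] if scores else 0.0

text = "This problem was making me so angry, thanks for fixing it"
print(naive_score(text))           # -1.0: reads as negative
print(last_clause_weighted(text))  # 1.0: reads as positive
```

The naive sum lets "angry" outweigh "thanks", whereas the clause-weighted score matches the grateful reading. The same heuristic would misfire on sentences whose conclusion comes first, so it is only one option to evaluate against labelled data.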
Further Reading
- On the Subjectivity of Emotions in Software Projects: How Reliable are Pre-Labeled Data Sets for Sentiment Analysis?