Things to consider before using sentiment analysis

How do you handle irony?
What is the age group you are surveying? (Different datasets are targeted at different age groups)
What slang words will the dataset not understand? Conversely, what technical language might it misread as sentiment?
- Do certain slang terms mean different things in different countries? (e.g. "fag" in Britain means cigarette, but in America can be used as a derogatory term for homosexuals)
What are the differences in culture between those who are being surveyed? (e.g. certain colours may have different connotations in different areas)
How will you deal with compound sentences? (e.g. "This problem was making me so angry, thanks for fixing it")
How will your system cope with emojis?
- How will you cope with differences in culture? 🙏 can mean prayer in Western, Christian societies but simply a high-five in others.
- Might some emojis be read differently depending on your age group? 💀 can be used ironically by younger people but taken quite literally by an older generation.
- Do you trust the emoji labels given by companies? 😥 is labelled as "DISAPPOINTED BUT RELIEVED FACE" by quackit.com but it could easily be read as "disappointed and crying".
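One way to check a label is to compare it with the character name that the Unicode standard assigns, which Python's standard `unicodedata` module exposes. The sketch below is only illustrative: `official_name` is a hypothetical helper, and an official name describes the glyph, not how any particular group of people actually uses it.

```python
import unicodedata

# The emojis mentioned in the questions above.
emojis = ["🙏", "💀", "😥"]

def official_name(char: str) -> str:
    """Return the Unicode character name, or a placeholder if it has none.

    Hypothetical helper for illustration; it only wraps unicodedata.name.
    """
    return unicodedata.name(char, "<unnamed>")

# Map each emoji to its official name so it can be compared with a vendor label.
names = {emoji: official_name(emoji) for emoji in emojis}

for emoji, name in names.items():
    print(emoji, name)
```

If the official name and the label used in your dataset disagree with how your respondents use the emoji, the respondents' usage is probably the better guide.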
If you are making a training dataset, who are you creating the dataset for? What kind of people do you want to target in your survey?
Is it ethical to make decisions based on "mined" sentiment? (e.g. from Twitter or other social media platforms)
Which affects sentiment polarity more: tone or content?
How might your system cope with idioms?
How might you deal with spelling errors? A sentiment system is unlikely to recognise that "hapy" means "happy".
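One possible mitigation is to map each unknown token to its closest lexicon entry before scoring. The sketch below uses `difflib.get_close_matches` from the Python standard library; the tiny `LEXICON`, the `normalise` helper and the 0.8 similarity cut-off are all assumptions made for illustration, not part of any particular sentiment tool.

```python
import difflib

# Hypothetical toy lexicon: word -> polarity score.
LEXICON = {"happy": 1.0, "sad": -1.0, "angry": -1.0, "great": 1.0}

def normalise(token: str, cutoff: float = 0.8) -> str:
    """Map a token to its closest lexicon word, or return it unchanged.

    The cutoff is an assumed similarity threshold; lowering it catches
    more typos but also produces more false corrections.
    """
    token = token.lower()
    if token in LEXICON:
        return token
    matches = difflib.get_close_matches(token, list(LEXICON), n=1, cutoff=cutoff)
    return matches[0] if matches else token

corrected = normalise("hapy")
unchanged = normalise("computer")
```

Fuzzy matching can also "correct" a legitimate word into a sentiment word, so any cut-off should be checked against real responses.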
How might it deal with phrases that come down purely to tone of voice? "It's not amazing" can mean that something is okay or quite good, or it can come across as incredibly rude.
What functions will you apply to your polarity scores? Section 3.5.1 of [1] describes some techniques you might use.
At what cut-off point does polarity become positive or negative?
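To make the cut-off question concrete, here is a minimal sketch that averages sentence scores and applies a symmetric threshold. It assumes scores lie in [-1, 1]; the `label` and `aggregate` helpers and the 0.05 threshold are hypothetical choices for illustration, not values taken from [1].

```python
from statistics import fmean

def label(score: float, threshold: float = 0.05) -> str:
    """Turn a polarity score into a class using an assumed symmetric cut-off."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

def aggregate(scores: list[float]) -> float:
    """Combine sentence-level scores with a plain mean (one option of many)."""
    return fmean(scores) if scores else 0.0

document_scores = [0.8, -0.2, 0.0]  # made-up sentence scores
document_label = label(aggregate(document_scores))
```

Widening the threshold sends more borderline text to "neutral", so the choice changes the reported proportions and is worth stating alongside any results.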
Which sentences are difficult for simple sentiment analysers to handle?
- "He is using you": should be negative (marked as neutral because the connotation is not understood)
- "He is using the computer": neutral
- "Sample mean.": should be neutral (marked as negative because of "mean")
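The failures above are easy to reproduce with a word-counting scorer. The sketch below is a deliberately naive, hypothetical analyser (the `LEXICON` and `naive_polarity` names are made up for illustration); it is not the tool that produced the results listed, but it shows why context-free lookups go wrong.

```python
import re

# Hypothetical toy lexicon: word -> polarity score.
LEXICON = {"mean": -1.0, "angry": -1.0, "thanks": 1.0, "happy": 1.0}

def naive_polarity(sentence: str) -> float:
    """Sum the lexicon scores of each word, ignoring context entirely."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return sum(LEXICON.get(token, 0.0) for token in tokens)

examples = [
    "He is using you",           # negative to a human reader
    "He is using the computer",  # neutral
    "Sample mean.",              # neutral statistical term
]
scores = {sentence: naive_polarity(sentence) for sentence in examples}

for sentence, score in scores.items():
    print(f"{score:+.1f}  {sentence}")
```

The scorer gives the first two sentences the same score because "using" carries no entry, and it penalises "Sample mean." because "mean" is looked up without regard to its statistical sense.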
Would analysing a Likert scale make more sense?

Further Reading