Developing adjective scales from user-supplied textual metadata

This page provides data and associated documentation for this talk:

Christopher Potts. 2011. Developing adjective scales from user-supplied textual metadata. NSF Workshop on Restructuring Adjectives in WordNet. Arlington, VA, September 30–October 1.

The goal of the talk is to develop and evaluate methods that use naturally occurring metadata (star ratings attached to product and service reviews) to help WordNet annotators construct modifier scales.

Data

File (zipped CSV file): wn-asr-multicorpus.csv.zip

	Column name	Explanation
1	Word	In the format WORD/tag, where tag is a (adjective) or r (adverb)
2	Rating	1..10 for IMDB; 1..5 for the other corpora
3	Category	Rating rescaled to the range -0.5..0.5
4	Count	Token count for Word in reviews with Rating in Corpus
5	Total	Total token count for words in reviews with Rating in Corpus
6	Corpus	IMDB, Goodreads, OpenTable, Amazon/Tripadvisor
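One natural way to use the Count and Total columns is to divide them: Count/Total is a word's relative frequency at each rating, which corrects for the fact that some rating categories contain far more text than others. The sketch below computes that profile for one word in one corpus. It assumes the unzipped file keeps the column names listed above; the function names are hypothetical, and the linear Rating-to-Category rescaling is an assumption consistent with the stated -0.5..0.5 range, not a documented formula.

```python
import csv


def load_rows(path):
    """Read the unzipped CSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def rating_to_category(rating, max_rating):
    """Hypothetical helper: map a 1..max_rating star rating onto -0.5..0.5.

    Assumes a linear rescaling; only the endpoints of the Category range
    are stated in the documentation.
    """
    return (rating - 1) / (max_rating - 1) - 0.5


def relative_frequencies(rows, word, corpus):
    """Return {Category: Count / Total} for one word in one corpus, ordered by Category."""
    freqs = {}
    for row in rows:
        if row["Word"] == word and row["Corpus"] == corpus:
            freqs[float(row["Category"])] = float(row["Count"]) / float(row["Total"])
    return dict(sorted(freqs.items()))
```

For example, `relative_frequencies(load_rows("wn-asr-multicorpus.csv"), "good/a", "IMDB")` would give the IMDB profile of one adjective; the unzipped filename and the exact spelling of the Word value are assumptions.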

Single-word assessment values

File (zipped CSV file): wn-asr-multilevel-assess.csv.zip

	Column name	Explanation
1	Word	In the format WORD/tag, where tag is a (adjective) or r (adverb)
2-5	fit1.coef1, fit1.coef1.p, fit1.coef2, fit1.coef2.p	The linear model coefficients with associated p-values; the fitted values can be obtained with `invlogit(fit1.coef1 + fit1.coef2*x)`
6-8	fit1.aic, fit1.bic, fit1.loglik	Values for assessing the goodness of fit for the linear model (Akaike Information Criterion, Bayesian Information Criterion, Log-Likelihood)
9-14	fit2.coef1, fit2.coef1.p, fit2.coef2, fit2.coef2.p, fit2.coef3, fit2.coef3.p	The quadratic model coefficients with associated p-values; the fitted values can be obtained with `invlogit(fit2.coef1 + fit2.coef2*x + fit2.coef3*x^2)`
15-17	fit2.aic, fit2.bic, fit2.loglik	Values for assessing the goodness of fit for the quadratic model (Akaike Information Criterion, Bayesian Information Criterion, Log-Likelihood)
18	Inquirer	The Harvard Inquirer classification: Positiv, Negativ, Neutral; NA iff the word is not in the Harvard Inquirer
19	SentiWordNetPositive	The SentiWordNet positive score: [0-1] or NA iff the word is not in SentiWordNet
20	SentiWordNetNegative	The SentiWordNet negative score: [0-1] or NA iff the word is not in SentiWordNet
21	SentiWordNetPolarity	positive if SentiWordNetPositive > SentiWordNetNegative; negative if SentiWordNetPositive < SentiWordNetNegative, else neutral; NA iff the word is not in SentiWordNet
22	MicroWNOpPositive	The MicroWNOp positive score: [0-1] or NA iff the word is not in MicroWNOp
23	MicroWNOpNegative	The MicroWNOp negative score: [0-1] or NA iff the word is not in MicroWNOp
24	MicroWNOpPolarity	positive if MicroWNOpPositive > MicroWNOpNegative; negative if MicroWNOpPositive < MicroWNOpNegative, else neutral; NA iff the word is not in MicroWNOp
25	MqapPolarity	positive, negative, or neutral; NA iff the word is not in the MPQA subjectivity lexicon
26	MqapStrength	1 if the strength is weaksubj; 2 if the strength is strongsubj; NA iff the word is not in the MPQA subjectivity lexicon
27	Predicted	If Model == Linear, then positive if fit1.coef2 (column 4) is ≥ 0, else negative; if Model == Quadratic, then positive if fit2.coef2 (column 11) is ≥ 0, else negative; if Model == None, then neutral
28	Model	Values: Linear, Quadratic, None. The preferred model: if only one model is significant, it is chosen; if both are significant, the one with the greater log-likelihood (columns 8 and 17) is chosen; if neither is significant, the value is None. Throughout, the significance threshold is p < 0.05.
29	RawScore	fit1.coef2 (column 4) if Model == Linear; fit2.coef2 (column 11) if Model == Quadratic; else 0
30	NormedScore	RawScore converted to a z-score relative to the population of significant fit1.coef2 or fit2.coef2 coefficients, depending on which of the two RawScore holds
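The fitted curves for the linear model (columns 2-5) and the quadratic model (columns 9-14) can be evaluated directly from the stored coefficients. This is a minimal sketch assuming x is the Category value (-0.5..0.5) and that invlogit is the standard inverse logit 1/(1 + e^(-z)); the helper names, including `curve`, are hypothetical.

```python
import math


def invlogit(z):
    """Inverse logit (logistic function), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_fit(x, coef1, coef2):
    """Fitted value of the linear model: pass fit1.coef1 and fit1.coef2."""
    return invlogit(coef1 + coef2 * x)


def quadratic_fit(x, coef1, coef2, coef3):
    """Fitted value of the quadratic model: pass fit2.coef1, fit2.coef2, fit2.coef3."""
    return invlogit(coef1 + coef2 * x + coef3 * x ** 2)


def curve(fit, coefs, xs=(-0.5, -0.25, 0.0, 0.25, 0.5)):
    """Evaluate `fit` at each x; the default xs assume an evenly spaced 1..5 scale."""
    return [fit(x, *coefs) for x in xs]
```

A positive coef2 makes the linear curve rise from the low end of the scale to the high end, while a positive coef3 makes the quadratic curve higher at both extremes than in the middle.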
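The decision rules behind Model, Predicted, RawScore, and NormedScore (columns 27-30) can be sketched as below. The documentation does not say which p-value makes a model "significant"; this sketch assumes it is the p-value of the model's second coefficient (fit1.coef2.p or fit2.coef2.p). It also assumes NormedScore uses the sample standard deviation, and it leaves NormedScore undefined when Model is None. All function names are hypothetical.

```python
import statistics

ALPHA = 0.05  # p-value threshold stated in the documentation


def choose_model(p_linear, p_quadratic, loglik_linear, loglik_quadratic, alpha=ALPHA):
    """Model (column 28). Assumes significance is judged on each model's coef2 p-value."""
    lin_sig = p_linear < alpha
    quad_sig = p_quadratic < alpha
    if lin_sig and quad_sig:
        # Both significant: prefer the greater log-likelihood (columns 8 and 17).
        # Breaking an exact tie in favor of Linear is an assumption.
        return "Linear" if loglik_linear >= loglik_quadratic else "Quadratic"
    if lin_sig:
        return "Linear"
    if quad_sig:
        return "Quadratic"
    return "None"


def raw_score(model, fit1_coef2, fit2_coef2):
    """RawScore (column 29): the chosen model's coef2, or 0 when Model == None."""
    if model == "Linear":
        return fit1_coef2
    if model == "Quadratic":
        return fit2_coef2
    return 0.0


def predicted(model, score):
    """Predicted (column 27): sign of the chosen coef2 (>= 0 is positive); neutral for None."""
    if model == "None":
        return "neutral"
    return "positive" if score >= 0 else "negative"


def normed_score(score, population):
    """NormedScore (column 30): z-score of RawScore against `population`.

    `population` should hold the significant coef2 values of the same model type
    (all significant fit1.coef2 or all significant fit2.coef2).
    Using the sample standard deviation here is an assumption.
    """
    return (score - statistics.mean(population)) / statistics.stdev(population)
```

Note that Predicted only needs the sign of RawScore, so it is computed from the same value that RawScore stores.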

Word comparison assessment values

File (zipped CSV file): wn-asr-multilevel-cmp.csv.zip

	Column name	Explanation
1	Word	In the format WORD/tag, where tag is a (adjective) or r (adverb)
2	SimWord	In the same WORD/tag format; this word is related to Word via the WordNet similar_to relation
3	Polarity	The polarity assigned by both the MPQA subjectivity lexicon and the method proposed in the talk. (We limit attention to pairs on which the two agree; the classification experiments assess the agreement level for this problem.)
4	MqapWordStrength	The MPQA strength for Word: 1 == weaksubj; 2 == strongsubj
5	MqapSimStrength	The MPQA strength for SimWord: 1 == weaksubj; 2 == strongsubj
6	WordScore	Our predicted score for Word; same as NormedScore from wn-asr-multilevel-assess.csv
7	SimScore	Our predicted score for SimWord; same as NormedScore from wn-asr-multilevel-assess.csv
8	MqapCmp	Comparison value from MPQA: stronger if MqapWordStrength > MqapSimStrength; weaker if MqapWordStrength < MqapSimStrength; same otherwise
9	PredictedCmpInformal	Comparison value for our informal method: stronger if WordScore > SimScore; weaker if WordScore < SimScore; same otherwise
10-11	category.coef, category.p	Coefficient and p-value for the basic Category predictor in the comparison model
12-13	interaction.coef, interaction.p	Coefficient and p-value for the interaction term Category*Stronger in the comparison model
14	PredictedCmpFormal	if category.p ≥ 0.05 or interaction.p ≥ 0.05, same; else if sign(category.coef) == sign(interaction.coef), stronger; else weaker
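The three comparison columns follow simple rules that can be written out as below. MqapCmp and PredictedCmpInformal share one rule applied to different inputs (the two MPQA strengths, or WordScore and SimScore), and PredictedCmpFormal uses the category and interaction coefficients with their p-values. This is a minimal sketch; the function names are hypothetical, and `sign` mirrors R's `sign`, returning 0 for 0.

```python
ALPHA = 0.05  # p-value threshold used for PredictedCmpFormal


def compare(word_value, sim_value):
    """Shared rule for MqapCmp (column 8) and PredictedCmpInformal (column 9)."""
    if word_value > sim_value:
        return "stronger"
    if word_value < sim_value:
        return "weaker"
    return "same"


def sign(x):
    """Return -1, 0, or 1, like R's sign()."""
    return (x > 0) - (x < 0)


def predicted_cmp_formal(category_coef, category_p, interaction_coef, interaction_p, alpha=ALPHA):
    """PredictedCmpFormal (column 14)."""
    # If either term is not significant, the model gives no ordering.
    if category_p >= alpha or interaction_p >= alpha:
        return "same"
    # Matching signs: the interaction pushes in the same direction as Category.
    if sign(category_coef) == sign(interaction_coef):
        return "stronger"
    return "weaker"
```

For instance, `compare(mqap_word_strength, mqap_sim_strength)` reproduces MqapCmp, and `compare(word_score, sim_score)` reproduces PredictedCmpInformal, where the argument names stand in for the corresponding column values.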