Hi,
I am using the MaxDiff project provided by Qualtrics. Overall, I love the tool and the functionality it offers.
However, in reviewing the results, it seems that the outputs provided by the different forms of MaxDiff scoring disagree with one another. I have a couple of notes and points of confusion about the MaxDiff summary output.
(1) Preference Share is inconsistent with Average Feature Utility, both in rank ordering and in the estimates themselves: the item ranked second under Preference Share becomes the fifth-ranked item in the Average Feature Utility plot, and there are other examples of this rank shifting.
The Qualtrics MaxDiff white paper states that the preference share “is derived by exponentiating the item utility and dividing that by the sum of all of the exponentiated items’ utilities.” I extracted the average item utilities to test this myself and computed exp(X_i) / sum_j exp(X_j) for each item i. The probabilities I estimated with this procedure were in the same ballpark, but they were definitely not the same as the probabilities reported in the Preference Share output.
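For reference, this is essentially the calculation I ran (an R sketch; the utility values here are placeholders, not my actual numbers):

```r
# Average item utilities extracted from the dashboard (placeholder values)
avg_utils <- c(A = 1.8, B = 1.2, C = 0.4, D = -0.6, E = -2.8)

# Exponentiate each utility and divide by the sum of all exponentiated utilities
pref_share <- exp(avg_utils) / sum(exp(avg_utils))
round(100 * pref_share, 1)  # shares as percentages
```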
Why is the rank ordering different between Preference Share and Average Feature Utility? And why does this softmax procedure for converting item utilities into probabilities not reproduce the probabilities reported by Preference Share?
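One hypothesis I would like to confirm or rule out: if Qualtrics computes a share per respondent and then averages those shares, the result will generally not equal the softmax of the averaged utilities (and with enough respondent heterogeneity, the two can even order items differently). A quick simulation in R with invented utilities shows the two quantities diverge:

```r
set.seed(1)
# Invented per-respondent utilities: rows = respondents, columns = items
U <- matrix(rnorm(100 * 5, sd = 2), nrow = 100, ncol = 5)

softmax <- function(x) exp(x) / sum(exp(x))

# Softmax of the average utilities (what I computed by hand)
share_of_mean <- softmax(colMeans(U))

# Average of each respondent's own shares (what the dashboard might be reporting)
mean_of_shares <- colMeans(t(apply(U, 1, softmax)))

round(rbind(share_of_mean, mean_of_shares), 3)  # the two rows differ
```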
(2) Feature Count and Average Feature Utility have different rank orderings of items: in my MaxDiff, the item ranked 2 in Feature Count becomes the item ranked 4 in Average Feature Utility. I also computed estimates and rankings with the bwsTools package in R, which fully agrees with the Feature Count rankings but not with Average Feature Utility. This inconsistency I can perhaps make more sense of than the problem described in (1): I suspect Feature Count is ordered by raw counts of Best picks minus Worst picks, or something of that nature (see my sketch below), whereas the Average Feature Utility plot comes from a Hierarchical Bayesian model that leverages both individual-level and group-level information to regularize the estimates.
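To be concrete, here is the kind of count-based scoring I have in mind (a base-R sketch over a toy dataset; I am guessing at what Feature Count does, and the column names are mine, not Qualtrics's):

```r
# Toy long-format choice data: one row per respondent x task x item shown
choices <- data.frame(
  item  = c("A", "B", "C", "A", "B", "C"),
  best  = c(1, 0, 0, 0, 1, 0),   # 1 if picked as Best in that task
  worst = c(0, 0, 1, 0, 0, 1)    # 1 if picked as Worst in that task
)

# Best-minus-Worst count per item, sorted into a rank ordering
bw <- with(choices, tapply(best, item, sum) - tapply(worst, item, sum))
sort(bw, decreasing = TRUE)
```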
I would love it if someone could confirm or clarify why these discrepancies between Feature Count and Average Feature Utility arise. Does this occur because of the Hierarchical Bayesian estimation, whereby group-level shrinkage pulls item X's rank down from #2 in Feature Count to #4 in Average Feature Utility?
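In case it helps frame the question, here is a toy sketch of the shrinkage mechanism I mean. The weights and numbers are entirely invented (this is not Qualtrics's actual model), but they show how partial pooling toward a group mean can demote an item that ranks highly on raw counts but is backed by less data:

```r
raw_scores <- c(X = 0.9, Y = 0.8, Z = 0.7)  # raw count-based scores; X ranks #1
n_obs      <- c(X = 4,   Y = 20,  Z = 20)   # X was shown in far fewer tasks
group_mean <- 0.5                           # invented group-level mean

# Hypothetical precision weight: more observations -> less shrinkage
w <- n_obs / (n_obs + 10)
shrunk <- w * raw_scores + (1 - w) * group_mean
sort(shrunk, decreasing = TRUE)  # X drops below Y and Z after shrinkage
```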
Thank you for any help in clarifying these discrepancies between the outputs.