Nick Brown's blog: More problematic sexual attraction research, this time with high heels

Another post about some strange issues in the work of Dr. Nicolas Guéguen. Today's article is:

Guéguen, N. (2015). High heels increase women's attractiveness. Archives of Sexual Behavior, 44, 2227–2235. http://dx.doi.org/10.1007/s10508-014-0422-z

There are four studies reported in this article; I want to concentrate on Study 4, although as you will see if you read the whole thing, there are plenty of questions one could ask about the other studies as well.

Brief summary of the study

Participants were male customers in bars. The author's hypothesis was that men would be quicker to approach a woman drinking on her own in a bar if she was wearing shoes with high (versus medium or flat) heels. A female confederate was instructed to sit on her own "at a free table near the bar where single men usually stand" (p. 2231). She was identically dressed in all conditions apart from the size of her heels, and she was told to "cross her legs on one side so that people around could clearly view her shoes" (p. 2231). Meanwhile, two male observers seated nearby timed how long it took before a man approached the female confederate. When this happened, she told the man that her friend was expected to arrive shortly, and one of the observers then "arrived" to meet her, thus ending the interaction with the participant. If no contact was made within 30 minutes, the confederates were instructed to leave the bar.

The results showed that the mean time before a male customer of the bar approached the woman was lower when her heels were higher. This difference was statistically significant for high heels (versus medium or flat heels), but not for medium heels (versus flat heels). Although it was not reported whether contact was made in every case, the degrees of freedom of the reported ANOVA imply that it was, even when the woman was wearing flat shoes.
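As a quick check on the degrees-of-freedom argument, here is a minimal Python sketch, assuming a one-way between-subjects ANOVA with equal group sizes (the function name `anova_df` is my own, hypothetical choice). Three bars visited on twelve nights gives 36 sessions, i.e., 12 per heel condition, and a one-way ANOVA on 36 observations in 3 groups has 2 and 33 degrees of freedom.

```python
def anova_df(k_groups: int, n_per_group: int) -> tuple[int, int]:
    """Degrees of freedom for a one-way between-subjects ANOVA
    with k_groups groups of n_per_group observations each."""
    n_total = k_groups * n_per_group
    df_between = k_groups - 1
    df_within = n_total - k_groups
    return df_between, df_within


# Three bars x twelve nights = 36 sessions, split evenly over 3 heel conditions.
bars, nights, conditions = 3, 12, 3
sessions = bars * nights
print(sessions, anova_df(conditions, sessions // conditions))  # 36 (2, 33)
```

If even one session had ended without contact and been dropped from the analysis, the within-groups degrees of freedom would have been lower than 33.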

There are a few readily apparent problems with this study.

1. The research design is inefficient and implausible

This study seems to be a very inefficient way of gathering data. You need three young volunteers (it's not exactly clear why two male confederates were necessary rather than just one) to give up their Wednesday and Saturday evenings for six straight weeks. They have to visit three bars each time, and no mention is made of funding to pay for the drinks that they would presumably need to buy in order to maintain their credibility as ordinary customers. As soon as contact is made between a participant and the female confederate, data collection ends. The three confederates leave the bar and walk to the next, taking care to spend half an hour on the walk so they don't arrive too early for the next session. (Or maybe they drive and spend 26 minutes chatting in the car. Sounds like fun.) And after all this, you get a maximum of three data points in an entire evening.

Even the choice of "time taken before someone approaches the female confederate" as the dependent variable seems strange. Let's imagine for a moment that you are the kind of man who goes to bars in the hope of meeting attractive single women. Today is your lucky day; one such individual has just come into the bar and sat down on her own, close to where you hang out with your fellow bachelor drinkers. She is wearing "a skirt and an off the shoulder tight fitting top" (p. 2231). You have to decide whether or not to approach her (presumably before anyone else does, if I may be allowed to show off my limited knowledge of what one might call "folk evolutionary psychology" for a moment). The apparent claim of the study is that the degree of sexual availability conveyed only by the height of the woman's heels will affect, not whether you ultimately decide to approach her or not, but how long you will hesitate before doing so. I don't find this very convincing. What else are all of the single males in the bar (the number of whom, incidentally, is not reported anywhere in the article) thinking about during that time? Whether they can get a "better deal" if her identical twin appears at the next table wearing slightly higher heels? See also point 4, below.

2. Repeated use of the same bars

The study took place in each of three different bars on twelve different nights (Wednesdays and Saturdays). The same female confederate thus made twelve visits to each bar, in each case sitting on her own at a table "near the bar where single men usually stand" (p. 2231). You might imagine that the staff or the regular customers of the bar would notice what was going on, as a different man each evening attempted to make contact with the same female confederate, who was always identically dressed (apart from her heels) and sitting in an area of the bar where one might not expect a woman who was waiting for her boyfriend to feel comfortable, only to be told that her friend would be arriving shortly (which, indeed, transpired every time). But the article describes no precautions that might have been taken to deal with this issue, which has obvious implications for the validity of the study. After a few visits, the regular customers might have started taking bets among themselves as to who was going to try his luck this evening (perhaps trying not to giggle as he introduced himself with "Hello, I’ve never seen you here before"), only for the woman's boyfriend to show up immediately afterwards. Even if the staff were aware of the experiment, it would seem hard to take into account the possible range of behaviours of young single men in a bar, especially just before midnight on a Saturday evening.

3. The effect size is huge

Remember that the only difference between the conditions was the height of the woman's heels, which, even with her legs crossed as described in the article, were probably not going to be something that many people (even single men on the lookout for some action) would necessarily notice. Yet Cohen's (1988, pp. 274–277) formula gives an effect size (f) of 0.67 for the numbers in Table 4 of Guéguen's article, which corresponds (for k = 3 groups) to 1.64 in the more familiar terms of Cohen's d. Such effect sizes are very rare in psychological studies, and indeed in real life (James will be covering this in his next post). It seems highly implausible that a manipulation of this kind could have such an effect.
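For readers who want to check the conversion, here is a minimal sketch, assuming Cohen's definitions: f is the standard deviation of the group means divided by the pooled within-group SD (equal group sizes), and, for the "minimum variability" pattern of k means, f = d·√(1/(2k)), so d = f·√(2k). The function names are my own, and I have not reproduced the Table 4 means here, so the example only converts the f = 0.67 figure quoted above.

```python
import math
import statistics


def cohens_f(means, sds):
    """Cohen's f for a one-way design with equal group sizes:
    SD of the group means (population formula, dividing by k)
    divided by the pooled within-group SD."""
    sigma_m = statistics.pstdev(means)
    pooled_sd = math.sqrt(statistics.fmean(sd ** 2 for sd in sds))
    return sigma_m / pooled_sd


def f_to_d_min_variability(f, k):
    """Cohen's minimum-variability pattern (one mean at each end of the
    range, the rest in the middle): f = d * sqrt(1 / (2k)), so d = f * sqrt(2k)."""
    return f * math.sqrt(2 * k)


print(round(f_to_d_min_variability(0.67, 3), 2))  # 1.64
```

With two groups the same formula reduces to the familiar d = 2f.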

4. The pattern of behaviour by the men is very strange

Despite my advanced age, my personal lifetime experience of hanging around in bars waiting to hit on single women is exactly zero. However, it seems to me that for individuals who list that particular activity as one of their hobbies, time is probably of the essence. If you're going to start talking to a girl who has just sat down and crossed her legs so you can see how high her heels are, you probably want to do it fairly quickly, if only to stake your claim before any of your buddies does.

So what would we expect the distribution of the waiting-time-until-contact to look like? I don't think we can apply something like queuing theory here, since the behaviour of the men probably can't be assumed to be random, but I'm guessing it's likely to look like some kind of exponential or negative binomial distribution (the sort of waiting times that a Poisson-like process produces), with a lot of guys trying their luck in the first few minutes, resulting in a big right skew.
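To illustrate the kind of right skew I have in mind, here is a minimal sketch that draws waiting times from an exponential distribution (the waiting-time distribution of a Poisson process), truncated at the 30-minute cut-off used in the study. The mean of 5 minutes is a purely hypothetical value chosen for illustration, not a figure from the article, and the function names are my own.

```python
import random
import statistics


def sample_skewness(xs):
    """Adjusted Fisher-Pearson sample skewness (positive = long right tail)."""
    n = len(xs)
    m = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


def truncated_exponential(mean_minutes, cutoff, size, rng):
    """Exponential waiting times, redrawing any value beyond the cut-off
    (i.e., conditioning on contact happening before the observers leave)."""
    out = []
    while len(out) < size:
        t = rng.expovariate(1 / mean_minutes)
        if t <= cutoff:
            out.append(t)
    return out


rng = random.Random(1)
# Hypothetical mean of 5 minutes; the 30-minute cut-off is from the study.
times = truncated_exponential(mean_minutes=5.0, cutoff=30.0, size=10_000, rng=rng)
print(f"skewness = {sample_skewness(times):.2f}")
```

An untruncated exponential distribution has a skewness of 2, so a strongly positive value here is what "a lot of guys trying their luck in the first few minutes" looks like numerically.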

So I decided to simulate some data. For each condition, I generated 12-item samples from a uniform distribution, with a minimum of 0 minutes and a maximum that I determined with some preliminary testing to be the largest possible time that could give the mean and SD reported in the article, plus or minus 0.05 in each case. I ran this simulation until I had 400 samples for each condition, which required about 250 million iterations per condition. Then I plotted the simulated amounts of time to make contact, to the nearest minute, from those samples:
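For anyone who wants to try something similar, here is a minimal sketch of this kind of rejection sampling, under my own assumptions: the SD is the sample SD (n - 1 denominator), "matching" means within ±tol of both the target mean and the target SD, and the function and parameter names are hypothetical rather than a copy of the code I actually ran. Plain Python is far too slow for hundreds of millions of iterations, so the demo uses deliberately loose, made-up targets and tolerance so that it finishes quickly.

```python
import random
import statistics


def matches(sample, target_mean, target_sd, tol):
    """True if the sample's mean and (n - 1) SD are both within tol of the targets."""
    return (abs(statistics.fmean(sample) - target_mean) <= tol
            and abs(statistics.stdev(sample) - target_sd) <= tol)


def simulate(target_mean, target_sd, upper, n=12, wanted=400, tol=0.05,
             max_iter=1_000_000, seed=None):
    """Draw n-item samples from Uniform(0, upper) and keep those whose
    mean and SD match the reported summary statistics (rejection sampling)."""
    rng = random.Random(seed)
    kept = []
    for _ in range(max_iter):
        sample = [rng.uniform(0, upper) for _ in range(n)]
        if matches(sample, target_mean, target_sd, tol):
            kept.append(sample)
            if len(kept) == wanted:
                break
    return kept


# Loose, purely illustrative targets (not the article's values) so the demo runs in seconds.
samples = simulate(target_mean=15.0, target_sd=8.5, upper=30.0,
                   wanted=20, tol=1.0, max_iter=100_000, seed=42)
rounded = [round(t) for s in samples for t in s]  # times to the nearest minute
print(len(samples), min(rounded), max(rounded))
```

To mimic the procedure described above, one would call `simulate` once per condition with the reported mean and SD (e.g., `target_sd=2.18` for high heels), `tol=0.05`, and the largest `upper` that still produces matches, and then plot a histogram of the rounded values.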

Given that the high heels were meant to be especially irresistible, you might expect a certain number of contacts to have been made within the first minute in that condition. But you can see from the plot that, in the high heels condition (blue bars), no values below 2 minutes were returned by my simulation. In fact, when I forced one of the 12 values in the sample to be 30 seconds, I didn't find a single valid sample in 100 million iterations in the high heels condition. When I set the minimum to 1 minute, I found three valid samples, but they all looked weird: the value of 1 meant that the other values were all very close to 8 minutes (i.e., when the woman was wearing high heels, if one man approached her after a minute, the other 11 would all have had to approach her after 8 minutes, plus or minus a few seconds).

You can also see in the above plot that the aggregate of the simulated values in each condition is nicely normally distributed. The most highly skewed 12-item samples were not in fact very badly skewed at all; for example, here is the most right-skewed sample out of 400 in the high heels condition:
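Picking out "the most right-skewed sample" just means computing a skewness statistic for each accepted 12-item sample and taking the maximum. Here is a minimal sketch, assuming the adjusted Fisher-Pearson coefficient (my choice; for samples of equal size the adjusted and unadjusted versions of this coefficient differ only by a constant factor, so the ranking is the same), applied to two made-up samples rather than to my simulation output.

```python
import statistics


def sample_skewness(xs):
    """Adjusted Fisher-Pearson sample skewness (positive = long right tail)."""
    n = len(xs)
    m = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


def most_right_skewed(samples):
    """Return the sample with the largest (most positive) skewness."""
    return max(samples, key=sample_skewness)


# Hypothetical 12-item samples, in minutes (not data from the article).
symmetric = [4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10]
long_tail = [4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 15]
print(most_right_skewed([symmetric, long_tail]) is long_tail)  # True
```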

So even here, we can see that these single men are taking a certain amount of time before talking to the woman, even though their tongues are apparently all hanging out at the height of her heels. The limiting factor here is that the standard deviations (4.87, 3.67, and 2.18 minutes, for the flat, medium, and high heels conditions, respectively) are too small, relative to the range of values allowed (0 to 30), to allow any of the 12 responses to be very far from the others (or, if one value is a little bit further away, this requires all of the others to bunch up). As we saw in James's post about dead plants and global warming, the subjects in this study all appear to be intensely moderate in their behaviour; the manipulation (increasing the size of the woman's heels) simply reduces the diversity of that moderation somewhat.
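The "bunching" constraint can be made precise with Samuelson's inequality: in a sample of n values with sample SD s (n - 1 denominator), no single value can lie more than s·(n - 1)/√n from the sample mean, and a value sitting exactly at that limit forces all the other n - 1 values to be identical. Here is a minimal sketch using the three reported SDs and n = 12 (the function name is my own; the mean of 10.0 in the second part is arbitrary, not the article's value).

```python
import math
import statistics


def max_deviation(sd, n):
    """Samuelson's inequality for the sample SD: |x_i - mean| <= sd * (n - 1) / sqrt(n)."""
    return sd * (n - 1) / math.sqrt(n)


n = 12
for label, sd in [("flat", 4.87), ("medium", 3.67), ("high", 2.18)]:
    print(f"{label:>6}: no value more than {max_deviation(sd, n):.2f} min from the mean")

# The bound is attained only when the other n - 1 values all coincide:
# one value at mean + d, eleven at mean - d / (n - 1). The mean of 10.0 is arbitrary.
d = max_deviation(2.18, n)
extreme = [10.0 + d] + [10.0 - d / (n - 1)] * (n - 1)
print(round(statistics.stdev(extreme), 2))  # 2.18
```

The bounds come out at roughly 15.5, 11.7, and 6.9 minutes for flat, medium, and high heels. So, whatever the condition mean, no contact time in the high heels condition can differ from it by more than about 6.9 minutes, and any value close to that limit squeezes the other eleven together.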

5. The reported statistics are incorrect

Readers who are familiar with some recent corrections of work from the Cornell Food and Brand Lab may have been anticipating this problem: the reported F statistic (7.18 with 2 and 33 degrees of freedom) is incorrect. With the given means and SDs, the correct F statistic should be between 8.06 and 8.16, depending on rounding. This does not change the statistical significance of the reported result, but it makes one wonder what numbers were run in order to produce the incorrect F statistic, and where those numbers came from. (Just as an aside, the standard deviations appear to be substantially different between the groups, but no indication is given in the article about whether the standard ANOVA checks for homogeneity of variance were made; however, given the context, perhaps asking for this is like criticising Donald Trump for not having his tie straight.)
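Recomputing an F statistic from published summaries only needs the group means, SDs, and sizes. Here is a minimal sketch of the standard one-way ANOVA formulas (the function names are my own). I have not reproduced the article's means here, so the F check uses toy numbers; the second function computes the largest-to-smallest variance ratio (Hartley's F-max) from the three reported SDs, which shows how unequal the group variances are.

```python
def f_from_summary(means, sds, ns):
    """One-way ANOVA F statistic from group means, sample SDs and sizes."""
    k = len(means)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


def variance_ratio(sds):
    """Hartley's F-max: largest group variance divided by the smallest."""
    return max(sds) ** 2 / min(sds) ** 2


# Toy check: means 1, 2, 3 with SD 1 in each of three groups of 12 gives F = 12.
print(f_from_summary([1, 2, 3], [1, 1, 1], [12, 12, 12]))  # 12.0
# Reported SDs for flat, medium and high heels.
print(round(variance_ratio([4.87, 3.67, 2.18]), 2))  # 4.99
```

Plugging in the Table 4 means and the reported SDs with n = 12 per group is all that is needed to check the reported F(2, 33) = 7.18, and the variance in the flat heels condition is about five times that in the high heels condition.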

Conclusion

The report of this study sounds like it is describing a thought experiment for an undergraduate methods class (in a world where nobody is too concerned about crass sexist stereotypes), rather than the results of a field experiment carried out under real-world conditions. The premise is based on a pastiche of evolutionary psychology (skeptics of this subfield can fill in their own joke here), the scenario is a minefield of strange decisions, the effect size is absurdly large, the implied behaviour patterns of the participants are weird, and the statistics haven't been reported correctly. Yet, this article was the subject of uncritical pieces in Huffington Post (under the headline "High Heels Increase A Woman's Attractiveness, And For Once It's Not A Bogus Survey"), the Boston Globe, and Psychology Today (twice). It seems that there is quite a market for sexist junk science out there.

9 comments:

Random Factor Y, 5 December 2017 at 09:18
Please be quick with the other posts. On his website, Dr. Gueguen announces that he will soon publish a book, in French, on 'The psychology of sexuality'. So he claims to be an expert on the subject. Maybe just like Freud, he has a rich dream life and personal experience in the use of cocaine, I don't know.
Klaas van Dijk, 5 December 2017 at 14:18
Do Guéguen's articles of this kind give any details about the backgrounds of these volunteers (e.g., whether they were students, perhaps in the acknowledgements)?

See, e.g., https://www.rug.nl/research/portal/files/49241950/Chapter_2.pdf ("We also thank the animal ecology course students from 2012 to 2014 for gathering data.") and https://www.rug.nl/research/portal/files/47421513/Chapter_3.pdf ("We also thank the animal ecology course students from 2010 to 2012 for helping to gather the data.").
Janet Lafler, 5 December 2017 at 21:40
I can see a couple of other problems, one with the study design and one with data interpretation.

1. The study design doesn't take into account how long the man who approaches the woman has been in the bar. What if she'd been sitting there for 20 minutes when he entered and immediately approached her? Of course it would be nearly impossible to collect this kind of information, because the observers wouldn't know in advance which man would approach the woman, and therefore would have to keep track of all of the men in the bar and when they arrived, or got back from a trip to the bathroom, etc.

2. Clothing has a social meaning. A man might think (or subconsciously assume) that a woman wearing high heels is dressing to attract male attention, and therefore that she might be more open to being approached.
Anonymous, 6 December 2017 at 10:16
The paper is obviously a steaming pile of **** - it hardly seems worth even reading it, let alone carrying out a time-consuming forensic analysis. It clearly stinks. And even if the study had been carried out effectively – what’s the point? Manolo Blahnik wouldn’t be able to sell heels for $1000 if they were having zero effect on women’s attractiveness.

However, #4 is not persuasive. Essentially you have demonstrated that, given the means and SDs reported in the paper, it is extremely unlikely that any men approached the woman less than 2 minutes after she sat down.

But the author never claimed that anyone approached the woman in less than 2 minutes. So, this criticism boils down to the fact that the data don’t fit with your intuitions about how men in a bar will behave.

Other people’s intuitions might be that – given that n = 12 in the high heels condition – it really isn’t that implausible that not one approached in < 2 mins (particularly since “nonverbal behavior such as a fixed gaze or a smile” were not counted as an approach).

Now if the author had reported that some men had approached the woman in less than 2 minutes – then your analysis would obviously show that this was extremely implausible given the reported means and SDs.
Nick Brown, 6 December 2017 at 12:41
You have a fair point, although as the charts show, the conditions in which men were more likely to approach the woman in the first couple of minutes are those in which she does not have high heels. It is sometimes difficult to maintain full analytical rigour when confronted with a study that appears to "offer multiple opportunities for improvement" (had I written what I really think about this study, I would doubtless have been accused of adopting an unprofessional tone).

The broader point is that "mean time before approaching" clearly does not make sense as a model for the effect of high heels on women's attractiveness, because the author has not established any reason why this should be normally distributed.

04 December 2017