28 November 2017

Some problems in a field study of sexual attraction and hitchhiking

This is the first in a series of blog posts by me and James Heathers on the research of Dr. Nicolas Guéguen, of the Université Bretagne-Sud in France. We will be examining one of Dr. Guéguen's studies in each post. Cathleen O'Grady at Ars Technica has written an excellent overview of the situation.

We start with this article:
Guéguen, N. (2012). Color and women hitchhikers’ attractiveness: Gentlemen drivers prefer red. Color Research & Application, 37, 76–78. http://dx.doi.org/10.1002/col.20651

Brief summary of the article
The author's hypothesis was that male (but not female) drivers would be more likely to offer a lift to a female hitchhiker if she was wearing a red (versus any other colour) t-shirt. The independent variables were the colour of the woman's t-shirt and the sex of the driver, and the dependent variable was the driver's decision to stop or not.

Participants were drivers on a road in Brittany (western France). A female confederate wearing one of six colours of t-shirt (black, white, red, blue, green, or yellow) stood by the side of the road posing as a hitchhiker (with a male confederate stationed out of sight nearby for security purposes). She noted whether each driver who stopped to offer her a lift was male or female. In order to establish the number of drivers of each sex who drove along the road (whether or not they stopped), two other confederates were stationed 500 metres away in a car by the side of the road, facing the approaching traffic. As each car passed, they noted whether the driver was a man or a woman. Using the count of male vs female drivers who stopped, and the count of male vs female drivers who passed, it was found that male drivers were considerably more likely to stop when the hitchhiker was wearing a red t-shirt compared to any other colour.

There are some puzzling aspects to this article.

1. Number of volunteers
The article states (p. 77) that the five female confederates were chosen by a group of men who rated the facial attractiveness of “18 female volunteers with the same height, with the same breast size (95 cm of bust measurement and bra with a ‘B’ size cup), and same hair color”.  It is interesting to think about how many women there must be in the volunteer pool at the Université Bretagne-Sud in order for 18 with the same height, bra size, and hair colour to put themselves forward to stand for hours on end (see point 4, below) to stop passing drivers.  Once the attractiveness of the participants had been established, the five who were rated closest to the middle of the scale were chosen, and "precautions were taken to verify that the rates of attractiveness were not statistically different between the confederates", whatever that means.  Oh, and "All of the women stated that they were heterosexuals", presumably to ensure that they gave off the right vibes through the windscreens of approaching cars.

2. Two different sample sizes
There is a curious inconsistency between Table 1 and the main text.  In Table 1, the numbers of male and female drivers are listed as 3,474 and 1,776, respectively.  However, these two numbers sum to 5,250, rather than 4,800 (which was the sample size reported elsewhere in the article, with 3,024 male and 1,776 female drivers).  It is not clear how such an error might creep in by accident, since it requires two very different digits (4 instead of 0 and 7 instead of 2) to be mistyped.

3. The colours of the t-shirts
The article reports the colours of the T-shirts worn by the hitchhikers in very precise terms, even going so far as to give their HSL (Hue, Saturation, Luminance) values.  However, in several cases, those values do not correspond to the reported colours.  As the table here shows, the colour described by the HSL values corresponding to “red” is probably best described as a salmon-pink colour, while “yellow” is a very pale pink, and “blue” is pure white.

[Table with columns: Stated colour | Hex colour]

It would be interesting to learn how these HSL numbers were obtained, since several of them are so badly wrong. Indeed, it is not clear why it was considered necessary to report the colours with such precision; it would surely have been enough to state that bright, unambiguous examples of each colour had been selected. For that matter, given how long it must have taken to test so many drivers (see the next point) and that the author had a clear hypothesis about the effects of the colour red, it is not clear why so many different colours of t-shirt were tested.

4. How long did all this take?
The article states (p. 77) that “Each hitchhiker was instructed to test 960 drivers. After the passage of 240 drivers, the confederate stopped and was replaced by another confederate.” No indication is given of how long it took for 240 drivers to pass. However, the article also tells us (p. 77) that the research was conducted "at the entry of a famous peninsula of Brittany in France". So perhaps we can get a clue from another Guéguen article:

Guéguen, N. (2007b). Bust size and hitchhiking: A field study. Perceptual and Motor Skills, 105, 1294–1298. http://dx.doi.org/10.2466/pms.105.4.1294-1298

Yes, you read that right. Dr. Guéguen did indeed conduct a study to see whether women with larger breasts get more offers of lifts from men. (I bet you can't guess what the result was.) Anyway, that study, which had a very similar procedure to the one we're discussing here, except that the, er, manipulated (!) independent variable was the apparent size of the female hitchhiker's breasts, rather than the colour of her t-shirt, was conducted "at the entry of a famous peninsula ('Presqu'Île de Rhuys') of Brittany in France". Assuming that the “famous peninsula” mentioned in both studies is the same place (which would make sense, if that location is indeed particularly propitious for lone female hitchhikers), and assuming similar traffic flows to those reported in the "bust size" study, in which the passage of 100 cars took “about 40 to 50 minutes” (p. 1296), we assume that it took between 1.5 and 2 hours for 240 cars to pass. In order to test 4,800 drivers, then, a total of 30 to 40 hours of testing would be required. The "t-shirt colour" article also states (p. 77) that the experiment “took place during summer weekends on clear sunny afternoons between 2 and 5 PM.” With three hours being available on Saturday and three more on Sunday, the experiment would thus have taken between five and seven complete weekends, assuming that every hour of testing time was sunny (a contingency that is far from guaranteed in Brittany). Yet none of the confederates who gave up multiple weekends to accomplish this Herculean task on behalf of psychological science are listed as co-authors, or even acknowledged in any way, in the resulting article.

Additionally, the design of the experiment appears to require that only drivers who were alone in their cars should be counted, since the purported effect of the red t-shirt was to increase the sexual attractiveness of the wearer. You might expect that if a male driver's wife is in the car, it could affect any sexually-motivated enthusiasm he might have for offering a lift to the hitchhiker, whatever the colour of her t-shirt; alternatively, a female driver might have been willing to help the hitchhiker had all of the seats in her car not been full of children. We have seen a statement by Dr. Guéguen in which he confirmed that "L’expérience n’incluait effectivement que des personnes seules. Les automobiles avec plusieurs personnes ne sont pas prises en compte dans l’étude" ("The experiment only included people [driving] on their own. Cars containing multiple people were not counted in the study"). So the figure calculated above for the number of hours and days taken to test the required number of drivers needs to be multiplied by some factor to take into account the percentage of cars with multiple occupants. Given that the study was carried out on sunny weekend afternoons in summer in an area with a substantial number of tourists, it seems reasonable that perhaps half of the cars driving along the road on a summer afternoon might have had more than one occupant, which would either double the number of weekends required for collection of the data to between 10 and 14, or in any case more than compensate in our calculations for any growth in local traffic since the "variable bust size" study was conducted.
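For anyone who wants to play with the assumptions, here is a rough sketch of the time estimate in Python. The traffic flow (100 cars in 40 to 50 minutes) is borrowed from the "bust size" study, the 50% share of solo drivers is only my guess from above, and the function and variable names are made up for this illustration. Without rounding, the lower bound comes out at 32 hours rather than 30, which does not change the picture.

```python
TOTAL_DRIVERS = 4800              # sample size reported in the main text
MINUTES_PER_100_CARS = (40, 50)   # "about 40 to 50 minutes" (bust size study, p. 1296)
HOURS_PER_WEEKEND = 6             # 2 to 5 PM on Saturday plus 2 to 5 PM on Sunday


def hours_needed(drivers, minutes_per_100_cars, solo_share=1.0):
    """Hours of roadside testing needed to observe `drivers` eligible drivers.

    solo_share is the assumed fraction of passing cars with a lone driver;
    1.0 means every passing car counts towards the sample.
    """
    cars_needed = drivers / solo_share
    return cars_needed * minutes_per_100_cars / 100 / 60


# Every car counts: 32 to 40 hours, or roughly 5.3 to 6.7 weekends.
all_cars = [hours_needed(TOTAL_DRIVERS, m) for m in MINUTES_PER_100_CARS]

# Only half of the cars have a lone driver (an assumption): 64 to 80 hours.
solo_only = [hours_needed(TOTAL_DRIVERS, m, solo_share=0.5)
             for m in MINUTES_PER_100_CARS]

for label, hours in (("all cars", all_cars), ("solo drivers only", solo_only)):
    weekends = [h / HOURS_PER_WEEKEND for h in hours]
    print(f"{label}: {hours[0]:.0f}-{hours[1]:.0f} hours, "
          f"{weekends[0]:.1f}-{weekends[1]:.1f} weekends")
```

Changing `solo_share` or the traffic flow shows how sensitive the total is to these guesses; none of the plausible values brings the fieldwork down to a weekend or two.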

Another way to think about the time involved is to consider the interactions of the hitchhiker with the drivers who stopped. Even if it took an average of only one minute to catch up to the car where it stopped (probably some distance along the road from her), introduce herself, explain that there was an experiment taking place, "warmly thank" the driver, and return to her starting point, that would require nearly 10 hours (i.e., four afternoons) just for the 579 drivers (450 male, 129 female) who were reported as having stopped, even assuming that in every case a new driver then stopped immediately afterwards. If drivers were only stopping at the rate of one every five minutes overall (12 per hour), it would take 48 solid hours to test 579 drivers.
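The same kind of back-of-the-envelope check works for the interaction time. In this sketch the time spent per stopped driver is left as a parameter, because it is a guess on my part rather than a number from the article; the names are, again, invented for the illustration.

```python
import math

STOPPED_MALE = 450        # male drivers reported as having stopped
STOPPED_FEMALE = 129      # female drivers reported as having stopped
HOURS_PER_AFTERNOON = 3   # testing window of 2 to 5 PM

n_stops = STOPPED_MALE + STOPPED_FEMALE   # 579


def interaction_hours(stops, minutes_per_stop):
    """Total hours spent dealing with drivers who stopped, back to back."""
    return stops * minutes_per_stop / 60


def hours_at_rate(stops, stops_per_hour):
    """Total hours needed if drivers stop at a fixed average rate."""
    return stops / stops_per_hour


back_to_back = interaction_hours(n_stops, minutes_per_stop=1)
afternoons = math.ceil(back_to_back / HOURS_PER_AFTERNOON)
one_every_five_minutes = hours_at_rate(n_stops, stops_per_hour=12)

print(f"{n_stops} stops at 1 minute each: {back_to_back:.2f} hours "
      f"({afternoons} afternoons)")
print(f"{n_stops} stops at 12 per hour: {one_every_five_minutes:.2f} hours")
```

At one minute per stop this gives about 9.65 hours, which fills four three-hour afternoons, and each extra minute per stop adds the same amount again. At one stop every five minutes the total is a little over 48 hours.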

5. Problems with recording the sex of the drivers
As mentioned in the introduction, there were two observers whose job was to observe every passing car and record its driver's sex. (Per the previous point, it is worth thinking about the challenge of determining whether or not a driver is on their own in the vehicle, which requires, for example, determining whether a car driving past at around 20 metres per second does or does not have a small child in the back seat.) The article states (p. 77) that “[t]he convergence between the two observers’ evaluation was high (r = 0.97)”.

There is a major problem here. In order for a correlation coefficient to be calculated, we need more information than the simple total numbers of male and female drivers. Specifically, the two observers would need to independently record both the sex of each driver and the sequence in which those drivers were observed; for example, with ten drivers and disagreement about the sex of the third, the correlation between MFFFMMMFMM and MFMFMMMFMM would be .80. However, the article reports (p. 77) that each of the observers “used two hand-held counters, one to count the female motorists and the other to count the male motorists”. The term “hand-held counters” suggests simple mechanical devices, such as those used to count attendees at sporting events (such as this). But without synchronized timestamps across all four of these counters, or some other form of sequential tracking, it is not possible to establish the order in which the drivers passed each of the observers. More sophisticated methods of collecting and correlating these data can be imagined (for example, using laptop computers), but of course both observers had their hands full with the counters. With just a count of male and female drivers from each observer, stating a correlation coefficient makes no sense. It is therefore entirely unclear how the author could have established the correlation coefficient that he reported.
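To make the ten-driver example concrete, here is a small sketch that codes each driver as 1 (male) or 0 (female) and computes the Pearson correlation between the two observers' sequences. The third sequence is one I made up: it has exactly the same totals as the second observer's (7 men, 3 women) but in a different order, to show that the totals on their own do not determine the coefficient.

```python
from math import sqrt


def encode(sequence):
    """Code a string of 'M'/'F' observations as 1 (male) and 0 (female)."""
    return [1 if driver == "M" else 0 for driver in sequence]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


observer_1 = "MFFFMMMFMM"   # 6 men, 4 women
observer_2 = "MFMFMMMFMM"   # 7 men, 3 women; disagrees on the third driver only
reordered = "MMMMMMMFFF"    # hypothetical: same totals as observer_2, new order

r_in_order = pearson_r(encode(observer_1), encode(observer_2))
r_reordered = pearson_r(encode(observer_1), encode(reordered))

print(f"driver-by-driver records: r = {r_in_order:.2f}")
print(f"same totals, different order: r = {r_reordered:.2f}")
```

The first value is the .80 quoted above; the second is slightly negative, even though both pairs of sequences would leave exactly the same readings on four hand-held counters.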

In view of the above points, it is not clear how the study can actually have taken place as described in the article. As noted above, we have seen a statement by Dr. Guéguen (with whom we have been in indirect contact for almost two years now, via the good offices of the French Psychological Society, about a number of problems in several of his published articles; more on this to come in a subsequent post) concerning the question of whether only drivers who were on their own were tested. That statement did not, however, provide any specific or relevant answers to any of the other issues about this article that I have discussed here.

[[Update 2017-11-29 22:08 UTC: Added link to Cathleen O'Grady's article. ]]

11 November 2017

Don't stiff people who live from tips

I don't normally blog just because a tweet annoyed me (otherwise I'd be writing several dozen blog posts per day), but this tweet and its ensuing thread touched a nerve.
From his profile and recent tweets, the author seems like the kind of person with whom I probably share a very sizeable percentage of my political and social attitudes.  He follows me on Twitter; I would follow him back, except that I ration my follows simply to try and slow down the firehose of information.  In short, I'm sure he's a nice guy with progressive values.  But it seems that he and some of his followers have a rather different attitude to tipping to mine.

Consider the situation.  You're(*) standing outside a restaurant at a US airport, looking at the menu.  The airline has messed up your connection, so maybe a nice meal will help you feel a little better.  You fancy half a dozen oysters (maybe $15) and then perhaps the steak (maybe $28) and a beer ($7), so that'll be $50 in total.  Plus you'll need to add $5 for tax and $10 for a 20% tip.  So that will cost a total of $65.  Can you afford that?  Yes?  OK, let's go.  Those of us who are lucky enough to be able to afford to eat in restaurants might make similar decisions many times per year (give or take the tax and tip calculations, depending on where we spend most of our time.)

Then the meal happens.  I encourage you to read the Twitter thread (it's quite short) to see what happened.  The situation is not entirely black and white, and the details of the author's experience are not especially relevant to my point here, but to sum up, he was not very impressed with the overall level of the service he received. That's OK; disappointment is a normal part of the human condition and experience, especially in consumer and travel situations.

After a while, the check arrives.  It's for $50 plus $5 tax, and it may or may not have "Thank you" written on it by hand, perhaps even with a little smiley face, because scientific research that was totally not underpowered or p-hacked in any way has shown that when female (but not male, as hypothesised in advance by the remarkably prescient authors of that study) servers do this, they get bigger tips.  Remember, you have already budgeted $10 for the tip, but because you were unhappy with the service, you are thinking twice about whether to give it.  For support in that decision, you go on Twitter to ask people how much of that amount they think you should give or withhold.  And, because Twitter solidarity with your friends while they're travelling is a genuinely rather nice thing about the 21st century, within just a few minutes you have several replies:
So, the consensus was that the author should tip 10%, instead of the now-conventional (for the US) 20%.  A couple of people even suggested that he tip 0%, but he settled on 10%.  Under the reasonable (I think) assumptions about his meal that I made earlier, that means he left the waitress about $5 instead of $10.

Had I been watching the proceedings during the 17 minutes between the first tweet and the verdict, my answer would have been: you should withhold nothing.  Tip the 20%.  Give the waitress the full $10 that you presumably budgeted for from the start.  After all, once you decided to walk into the restaurant it was basically a sunk cost anyway.  You can't know all of the reasons why the service was slow, and even if you could somehow establish that it was entirely her personal fault (rather than that of other staff, or the restaurant, none of whom will be affected by your tipping decision), it doesn't matter anyway. Within five minutes of leaving the restaurant the bad service you got will be forgotten forever (not least because you will shortly be waiting in line at the boarding gate, or back on the phone to American Airlines about your connection, dealing with people who do not have the incentive of possibly losing a tip to encourage them to give you better service, ha ha).

There seems to be a pervasive idea in certain parts of the world (mostly North America and the UK) that serving in a restaurant is like being one of those street entertainers who juggle things in front of the cars at red lights.  Indeed, something to reassure you that it's OK to think that way is usually written on the menu in some form: "We do not impose a service charge, as we believe that our customers have the right to reward good service personally".  Well, I've got news for all you people who like to imagine that you are normally skeptical of capitalism: that is pure marketing bullshit.  What it means is, "If our menu prices were 10/15/20% higher, people would be less likely to come inside.  So we make the posted prices lower in order to entice you in, at no risk to us, and we let the staff play a kind of roulette with their income based, essentially, on your mood".  (However, for parties of 6 or more, the restaurant has to add a service charge because waitstaff know that groups are terrible tippers and so would otherwise try to avoid having them seated in their area of the restaurant.)

Think about this: if you were eating at a restaurant in a country where the pricing structure of restaurants is such that tipping is not expected (in some cases, it might even be regarded as slightly offensive), you would probably not go to the manager and request, say, an 8% reduction in the bill because the service was a bit sloppy.  And a big part of the reason why you wouldn't do that is because it would require you to actively do something to justify your claim, versus the far simpler act of not placing the second $5 bill on the plate.  As a result of this (entirely natural) behaviour by customers, waitstaff in countries with a tip-based wage model are essentially incentivised to be both happy-looking and efficient, every minute of their working day.  That is, frankly, an inhuman requirement (try it in your office job for, say, fifteen minutes).

Actually, I can think of a certain kind of person who I would expect to stiff people in these circumstances.  The current archetype of this kind of person has strange blond hair and a fake tan and plays a lot of golf and mouths off a lot about how bad almost everyone else in the world is.  If we were to learn that he tweeted his buddies and told them how much he was going to stiff a waitress, we wouldn't bother spending the energy on rolling our eyes.  There are endless stories about how this individual didn't pay bills sent to himself or his companies, because he decided he didn't like the service he received.

Don't be that person.  Don't, in effect, put working people on piecework rates ("$0.30 per smile") by deciding how much you will tip them based on how perfectly they do their not-particularly-desirable job, simply because the formal rules say that you legally can because the tip is optional.  Be a mensch, as I believe the expression goes.  Eat your oysters, add the going rate for the tip, pay the bill, get on your plane, and don't punish the waitress for working in a messed-up system that pits her against both you and her employer.  If you are the kind of person who can afford to dine on oysters at an airport restaurant prior to getting on a plane, then pretty much by definition $5 means more to the waitress than it does to you.

Here's my personal benchmark (your mileage may vary): I wouldn't withhold a tip in a country where it is expected as part of the staff's remuneration (e.g., the US, Canada, or the UK), unless the situation was sufficiently serious that I would be prepared to complain to the restaurant manager about it.  (For what it's worth, I have never been in a restaurant situation that was so bad that I felt the need to complain to the manager.)  If the server were to, say, cough violently into my food and then carry on in the hope that I didn't notice, then that's not a tipping matter.  But I don't like the idea of micro-managing the ups and downs of other people's workdays through small (to me) sums of money.  It just doesn't feel like something we ought to be doing on the way to building a nicer society.

If you still have a few minutes, please watch this video, where someone a lot more erudite than me makes a far better job of explaining the point I wanted to make here. If you're in a hurry, skip to 09:30.

(*) All references to "you" are intended to be to a generic restaurant customer, although obviously the example from the quoted tweets will be salient. I hope Malcolm von Schantz will forgive me for choosing the occasion of his Twitter thread as a reminder that this issue has bothered me for some time and was in a far corner of my "blog ideas back burner".