21 May 2015

What to do with people who commit scientific fraud?

Another story of apparent scientific fraud has hit the headlines.  I'm sure that most people who are reading this post will have seen that story and formed their own opinions on it.  It certainly doesn't look good.  And the airbrushing of history has already begun, as you can see by comparing the current state of this page on the website of the MidWest Political Science Association with how it looked back in March 2015 (search for "Fett" and look at the next couple of paragraphs).  Meanwhile, Michael LaCour hastily replaced his CV (which was dated 2015-02-09) with an older version (dated 2014-09-01) that omitted his impressive-looking list of funding sources (see here for the main difference between the two versions); at this writing (2015-05-22 10:37 UTC), his CV seems to be missing entirely from his site.

This rapidly- (aka "hastily-") written post is in response to some tweets calling for fraudsters to be banned from academia for life.  I have a few problems with that.

First, I'm not quite sure what banning someone would mean.  Are they to have "Do Not Hire In Any Academic Context" tattooed on their forehead?  In six languages?  Or should we have a central "Do Not Hire" repository, with DNA samples to prevent false identities (and fingerprints to prevent people impersonating their identical twin)?

Second, most fraudsters don't confess, nor are they subjected to any formal legal process (Diederik Stapel is a notable exception, having both confessed in a book [PDF] and been given a community service penalty, as well as what amounts to a 6-figure fine, by a court in the Netherlands).  As far as I can tell, these people tend to deny any involvement, get fired, disappear for a while, and then maybe turn up a few years later teaching mathematics at a private high school or something, once the publicity has died down and they've massaged their CVs sufficiently.  Should that be forbidden too?  How far do we let our dislike of people who have let us down extend to depriving them of any chance of earning a living in future?

After all, we rehabilitate people who kill other people; indeed, in some cases, we rehabilitate them as academics.  And as the case of Frank Abagnale shows, sometimes a fraudster can be very good at detecting fraud in others.  Perhaps we should give the few fraudsters who confess a shot at redemption.  Sure, we should treat their subsequent discoveries with skepticism, and we probably won't allow them to collect data unsupervised, but by simply casting them out, we miss an opportunity to learn, both about what drove (and enabled) them to do what they did, and how to prevent or mitigate future cases.  We study all kinds of unpleasant things, so why impose this blind spot on ourselves?

Let's face it, nobody likes being the victim of wrongdoing.  When I came downstairs a couple of years ago to find that my bicycle had been stolen from my yard overnight, the one time that I didn't lock it because it was raining so hard when I arrived home that I didn't want to stay out in the rain a second longer to do it, I was all in favour of the death penalty, or at the very least lifelong imprisonment with no possibility of parole, for bicycle thieves.  The inner reactionary in me had come out; I had become the conservative that apparently emerges whenever a liberal gets mugged.  Yet, we know from research (that we have to presume wasn't faked --- ha ha, just kidding!) that more severe punishments don't deter crime, and that what really makes a difference [PDF] is the perceived chance of being caught (and/or sentenced).  And here, academia does a really, really terrible job.

First, our publishing system is, to a first approximation, completely broken.  It rewards style over substance in a systematic way (and Open Access publishing, in and of itself, will not fix this).  As outside observers of any given article, we are fundamentally unable to distinguish between reviewers who insist on more rigour because our work needs more rigour, and those who have missed the point completely; anyone who has had an article rejected from a journal that has also recently published some piece of "obvious" garbage will know this feeling (especially if our article was critical of that same garbage, and seems to be being held to a totally different set of standards [PDF]).

Second, we --- society, the media, the general public, but also scientists among ourselves (I include myself in the set of "scientists" here mostly for syntactic convenience) --- lionize "brilliant" scientists when they discover something, even though that something --- if it's a true scientific discovery --- was surely just sitting there waiting to be discovered. (Maybe this confusion between scientists and inventors will get sorted out one day; I think it's a very fundamental problem. Perhaps we would be better off if Einstein hadn't been so photogenic.) And that's assuming that what the scientist has discovered is even, as the saying goes, "a thing", a truth; let's face it, in the social sciences, there are very few truths, only some trends, and very little from which one can make valid predictions about people with any worthwhile degree of reliability. (An otherwise totally irrelevant aside to illustrate this gap: one of the most insanely cool things I know of from "hard" science is that GPS uses both special and general relativity to make corrections to its timing, and those corrections go in opposite directions.) We elevate the people who make these "amazing discoveries" to superstar status. They get to fly business class to conferences and charge substantial fees to deliver a keynote speech in which they present their probably unreplicable findings.  They go on national TV and tell us how their massive effect sizes mean that we can change the world for $29.99.

Thus, we have a system that is almost perfectly set up to reward people who tell the world what it wants to hear.  Given those circumstances, perhaps the surprising thing is that we don't find out about more fraud.  We can't tell with any objectivity how much cheating goes on, but judging by what people are prepared to report about their own and (especially) their colleagues' behaviour, what gets discovered is probably only the tip of a very large and dense iceberg. It turns out that there are an awful lot of very hungry dogs eating a lot of homework.

I'm not going to claim that I have a solution, because I haven't done any research on this (another amusing point about reactions to the LaCour case is how little they have been based on data and how much they have depended on visceral reactions; much of this post also falls into that category, of course).  But I have two ideas.  First, we should work towards 100% publication of datasets, along with the article, first time, every time.  No excuses, and no need to ask the original authors for permission, either to look at the data or to do anything else with them; as the originators of the data, you'll get an acknowledgement in my subsequent article, and that's all.  Second, reviewers and editors should exercise extreme caution when presented with large effect sizes for social or personal phenomena that have not already been predicted by Shakespeare or Plato.  As far as most social science research is concerned, those guys already have the important things pretty well covered.

(Updated 2015-05-22 to incorporate the details of LaCour's CV updates.)

09 May 2015

Real-time emotion tracking by webcam

The European Commission is giving financial backing to a company that claims its technology can read your emotional state by just having you look into a webcam.  There is some sceptical reporting of this story here.

"Realeyes is a London based start-up company that tracks people's facial reactions through webcams and smartphones in order to analyse their emotions. ...
Realeyes has just received a 3,6 million euro funding from the European Commission to further develop emotion measurement technology. ...

The technology is based on six basic emotional states that, according to the research of Dr Paul Ekman, a research psychologist, are universal across cultures, ages and geographic locations. The automated facial coding platform records and then analyses these universal emotions: happiness, surprise, fear, sadness, disgust and confusion. ...
 [T]his technological development could be a very powerful tool not only for advertising agencies, but as well for improving classroom learning, increasing drivers’ safety, or to be used as a type of lie detector test by the police."

Of course, this is utterly stupid.  For one thing, it treats emotions as if they are real tangible things that everyone agrees upon, whereas emotions research is a messy field full of competing theories and models.  I don't know what Ekman's research says, or what predictions it makes, but if it really suggests that one can reduce everything about what a person is feeling at any given moment to one of six (or nine, or twelve) choices on a scale, then I don't think I live in that world (and I certainly don't want to). For another, without some form of baseline record of a person's face, it's going to be close to impossible to tell what distortions are being heaped on top of that by emotions.  Think of people you know whose "neutral" expression is basically a smile, and others who walk round with a permanent scowl on their faces.

Now, I don't really care much if this kind of thing is sold to gullible "brand-led" companies who are told that it will help them sell more upmarket branded crap to people.  If those companies want to waste their marketing and advertising dollars, they're welcome.  (After all, many of them are currently spraying those same dollars more or less uselessly in advertising on Twitter and Facebook.)  But I do care when public money is involved, or public policy is likely to be influenced.

Actually, it seems to me that the major problem here is not, as some seem to think, the "big brother" implications of technology actually telling purveyors of high-end perfumes or watches, or the authorities, how we're really feeling, although of course that would be intensely problematic in its own right.  A far bigger problem is how to deal with all of the false positives, because this stuff just won't work --- whatever "work" might even mean in this context.  At least if a "traditional" (i.e., post-2011 or so) camera wrongly claims to have located you in a given place at a given time, it's plausible that you might be able to produce an alibi (for example, another facial recognition camera placing you in another city at exactly the same time, ha ha).  But when an "Emocam" says that you're looking fearful as you, say, enter the airport terminal, and therefore you must be planning to blow yourself up, there is literally nothing you can do to prove the contrary.  Dr. Ekman's "perfect" research, combined with XYZ defence contractor's "infallible" software, has spoken.
  • You are fearful.  What are you about to do?  Maybe we'd better shoot you before you deploy that suicide vest.
  • The computer says you are disgusted.  I am a member of a different ethnic group.  Are you disgusted at me?  Are you some kind of racist?
  • Welcome to this job interview.  Hmm, the computer says you are confused.  We don't want confused people working for us.
So now we're all going to have to learn another new skill: faking our emotions so as to fool the computer.  Not because we want to be deceptive, but because it will be messing with our lives on the basis of mistakes that, almost by definition, nobody is capable of correcting.  ("Well, Mr. Brown, you may be feeling happy now, but seventeen minutes ago, you were definitely surprised. We've had this computer here for three years now, and I've never seen it make a wrong judgement.")  I suspect that this is going to be possible although moderately difficult, which will just give an advantage to the truly determined (such as the kind of people that the police might be hoping to catch with their new "type of lie detector").

In a previous life, but still on this blog, I was a "computer guy".  In a blog post from that previous life, I recommended the remarkable book, "Digital Woes: Why We Should Not Depend on Software" by Lauren Ruth Wiener.  Everything that is wrong with this "emotion tracking" project is covered in that book, despite its publication date of 1993 and the fact that, as far as I have been able to determine, the word "Internet" doesn't appear anywhere in it.  I strongly recommend it to anyone who is concerned about the degree to which not only politicians, but also other decision-makers including those in private-sector organisations, so readily fall prey to the "Shiny infallible machine" narrative of the peddlers of imperfect technology.

01 May 2015

Violence against women: Another correlate of national happiness?

Introductory disclaimer: This blog post is intended to be about the selective interpretation of statistics. Many of the figures under discussion are about reported rates of violence against women, and any criticisms or suggestions regarding research in this field are solely in reference to research methods. Nothing in this commentary is in any way doubting the very real experiences of women facing violence and abuse, nor placing responsibility for the correct reporting of abuse on the women experiencing it. Violence against women and girls (VAWG) is an extremely serious issue, which is exactly why it deserves the most robust research methods in order to bring it to light.

Back in February 2014, I wrote a post in which I noted the seemingly high correlation between “national happiness” ratings for certain countries and per-capita consumption of antidepressants in those countries. Now I’ve found what I think is an even better example of the limitations of ranking countries based on some simplified metric. I’ve asked my friend Clare Elcombe Webber, a commissioner for VAWG services, to help me here. So from this point on, we’re writing in the plural...

A few months ago, this tweet from Joe Hancock (@jahoseph) appeared in Nick’s feed. It shows, for 28 EU countries, the percentage of women who report having been a victim of (sexual or other) violence since the age of 15. Guess which country tops this list? Yep, Denmark. Followed by Finland, Sweden, and the Netherlands. Remember them? The countries that are up there in the top 5 or 10 of almost every happiness survey ever performed? Down near the bottom: miserable old Portugal, ranked #22 out of 23 in happiness in the post linked to above. (The various lists of countries don’t match exactly between this blog post and the one linked to above because there are different membership criteria, with some reports coming from the OECD, EU, or UN. Portugal was kept off the bottom of the happiness list in the post about antidepressants by South Korea.)

This warranted some more investigating, along the lines of Nick’s previous exploration of the link between happiness and antidepressants. The original survey data page is here; click on “EU map” and use the dropdown list to choose the numbers you want. Joe’s tweet is based on the first drop-down option, “Physical and/or sexual violence by a partner or a non-partner since the age of 15”. While performing the tests that we describe later in this post, we also tried the next option, “Physical and/or sexual violence by a partner [i.e., not a non-partner] since the age of 15”, but this didn’t greatly change the results. In what follows, unless otherwise stated, we have used the numbers for VAWG perpetrated by both partners and non-partners.

First, Nick took his existing dataset with 23 countries for which the OECD supplied the antidepressant consumption numbers, and stripped it down to those 17 which are also EU members. Then, he ran the same Spearman correlations as before, looking for the correlations between UN World Happiness Index ranking and: /a/ antidepressant consumption (Nick did this last time, but the numbers will be slightly different with this new subset of 17 countries); /b/ violence reported by women. Here are the results, which at first sight are rather disturbing:
  • Antidepressant consumption correlated (Spearman’s rho) .572 (p = .016) with national happiness.
  • Violence against women correlated (Spearman’s rho) .831 (p < .0001) with national happiness.
Let’s repeat that: Among the 17 largest economies within the EU, the degree of violence since age 15 reported by women is very strongly correlated with national happiness survey outcomes. When things turn out to be correlated at .831, you generally start looking for reasons why you aren’t in fact measuring the same thing twice without knowing it.
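For anyone who wants to see what is actually being computed here, Spearman’s rho is simple enough to sketch in a few lines without any statistics library. The country figures below are invented placeholders purely for illustration --- they are not the actual OECD/FRA numbers, and the real analysis used the full 17-country dataset.

```python
# A minimal, dependency-free sketch of Spearman's rank correlation.
# The values below are made up for illustration; they are NOT the
# actual happiness or violence figures from the datasets discussed.

def ranks(values):
    """Return 1-based ranks (assumes no ties, as in this toy example)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(xs, ys):
    """Spearman's rho via the classic 1 - 6*sum(d^2)/(n*(n^2-1)) formula."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical scores for eight countries:
happiness = [7.5, 7.4, 7.2, 6.9, 6.5, 6.0, 5.5, 5.0]  # survey points (out of 10)
violence  = [52, 46, 47, 45, 33, 31, 24, 19]          # % of women reporting violence

print(round(spearman_rho(happiness, violence), 3))  # prints 0.976
```

Because Spearman works on ranks rather than raw values, it captures the monotonic association ("happier countries report more violence") without assuming anything about the shape of the relationship.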

Trying to look for some way of mitigating these figures, Nick tried another approach, this time with parametric statistics. He took the percentage of women reporting being the victims of violence in all 28 EU countries, and compared it with the points score (out of 10) from the UN Happiness Survey. Here is the least pessimistic result obtained from the various combinations:
  • Across all 28 EU countries, violence against women correlated (Pearson’s r) .497 (p=.007) with national happiness.
This is still not very good news. If you’re hoping to show that two phenomena in the social sciences are correlated, and you find a correlation of .497, you’re generally pretty pleased.
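The parametric version is just as easy to write down. Again, the numbers below are invented placeholders, not the actual 28-country EU dataset; this is only a sketch of the calculation.

```python
# A dependency-free sketch of Pearson's r: the covariance of the two
# series divided by the product of their standard deviations. The data
# here are made up for illustration only.
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical happiness scores (out of 10) vs. % reporting violence:
happiness = [7.5, 7.3, 7.0, 6.6, 6.1, 5.6]
violence  = [52, 38, 46, 30, 27, 21]

print(round(pearson_r(happiness, violence), 3))
```

Unlike Spearman’s rho, Pearson’s r uses the raw values, so it is sensitive to how linear the relationship is, not just whether it is monotonic.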

Of course, correlation is not the same as causation. Probably nobody would suggest that higher levels of violence against women make for a happier society, or that higher levels of general societal happiness cause people to become more violent towards women.

So what is going on here? Maybe the methods are seriously flawed. We might have difficulty imagining why Austrian women would report rates of interpersonal violence barely half those experienced by Luxembourgers, or that Scandinavians are assaulting women at over twice the rate of Poles, or that the domestic violence problem in the UK is 70% worse than in next-door Ireland.

But perhaps there are some other factors that might help to explain these numbers. Remember, these are answers being given to an interviewer from the EU Fundamental Rights Agency (FRA); they are not extracted from, say, police databases of complaints filed. Thus, while we can perhaps assume that the reports ought not to be affected too much by the perceived level of danger or social shame involved in revealing one’s situation to the authorities (it’s easy to imagine that people in countries with high levels of equality and openness—Denmark, say—might feel more able to file charges about violence than in some other countries that are perceived as being more “macho”), the degree to which these data reflect reality will depend to a large extent on people’s degree of willingness to admit being a victim to a stranger. While one would hope that the FRA had thought about that and done the maximum in terms of study and questionnaire design, training of interviewers, etc., to allow women to be frank about their experiences, this isn’t something we were able to find definitively in their reported methodology (available here).

There are huge issues, which have dogged this type of research for many decades, when it comes to asking women to disclose their experiences of abuse. The conventional wisdom amongst researchers and service providers is that victims of abuse are extremely unlikely to reveal their experiences to anyone, and short of the FRA interviewers spending months building rapport with each respondent (which, obviously, they did not do) there is little to be done to mitigate this. Here are just some possible reasons why experiences of abuse might not have been disclosed to researchers, and how this could impact on the results:

  • The sampling method involved visiting randomly selected addresses. A common tactic used by abusive partners is to isolate their victim, primarily as a way of stopping any disclosure or attempt to seek support; so it is not unlikely that women currently in abusive relationships were “not allowed” to take part in the research at all. (If we wish to make great leaps of logic here, we could theorise that this could lead to a higher apparent incidence of VAWG in countries with better support services, as women in those countries were more likely to have been able to leave an abusive situation, and therefore were more able to take part in the research. But we don’t have data for that…)

  • Many women do not identify their experiences as violent or abusive, even when most external observers would say that they plainly are. This may be a defence mechanism, allowing them to avoid having to face up to the truth about their partner, the fragility of their personal safety, or the frightening nature of the world. Admitting that they are the victims of violence or abuse would also imply that they may have to act to change their situation. Therefore, respondents could simply be lying; and, even if a measure of social desirability might be able to detect this (possibly a tall order for such a serious subject), it’s unlikely that the interviewer would administer such a measure. Alternatively, the degree to which women deny that their experiences are violent or abusive might have a substantial cultural component; perhaps women in more “traditional” countries are more likely to justify some behaviours towards them as “normal”.

  • It is not clear, from the methodological background of the report, how issues of confidentiality were explained to respondents. We can reasonably conjecture that if a respondent disclosed that they were currently at serious risk from someone, the interviewer would have been ethically obliged to do something additional with this information. Many abusers make threats of violence or serious reprisals should their victim make a disclosure (something borne out by the fact that the majority of serious injuries or murders of women by men they know occur at or shortly after the point of separation or disclosure of the abuse to a third party), and this would significantly impact whether or not a woman would answer these questions truthfully. In addition, fear of the authorities may discourage a woman from disclosing; in many countries, the police and social workers often do not have a glowing reputation for providing support, and women may feel that involving them would exacerbate their problems, rather than help to resolve them.

  • Finally, victims who have disclosed their abuse often talk of their feelings of guilt, or that they are to blame for the abuse. This shame could be an additional barrier to giving a truthful answer.

We can make some—admittedly sweeping—inferences from the fact that the data do not tell us what we would intuitively expect. We could speculate that those countries we might expect to be more socially “advanced” in terms of attitudes to violence against women could have higher rates of disclosures of abuse in this research because women in those countries feel more able to recognise and name their experiences, or feel more confidence in the authorities being supportive, or have greater trust in the confidentiality of the survey; and therefore are more prepared to report having been the victims of violence. A further conjecture could be that in these countries, women are socially “trained” that these experiences are neither normal nor acceptable, and that victims of violence are entitled to be heard, without being stigmatised. (However, a skeptic might respond that, while these assumptions enable us to put a positive spin on this slightly unusual dataset, they are still only assumptions for which we have little evidence, and do little to address the initial observation, namely that the countries in the EU deemed to be happiest also reported the highest levels of violence against women.)

We could add all sorts of social variables into the mix here: availability of relationship education, social stigma towards single mothers, the perception of the state as supportive (or not), and so on. Violence against women and girls is a melting pot of individual, social, and cultural factors, and to date researchers have not been able to neatly set out what it is which makes some men decide to be abusive towards women, nor what makes some communities turn a blind eye to such abuse or even place the blame on the women being abused. Respondents potentially have many more reasons to conceal their experiences of violence and abuse than they might in other research areas, and there is no straightforward way of controlling for these.
(Psychologists have devised various ways of controlling for social desirability biases, but it is not clear to us that these take sufficient account of cross-cultural factors; see Saunders, 1991.)

However, let’s assume for a moment that it might be valid to take the numbers in the report as not being directly reflective of the underlying problem, but instead as presenting a combination of the actual prevalence, multiplied by a “willingness to acknowledge” factor. At a certain point, this could mean that you could see higher numbers in the survey for countries where there’s actually less of a problem. For example, let’s say that the true rate of violence against women in Denmark is 60%, and that 87% of Danish women are prepared to discuss their experiences of violence openly; multiply those together, and there’s the 52% reported rate from the EU survey. Meanwhile, perhaps the true rate in Poland is 76% (note: we have no evidence for this; we are choosing Poland here only because it is the country at the bottom end of the FRA’s list), but only 25% of Polish women are prepared to discuss it; again, multiply those numbers together and you get the reported rate of 19%. In fact, this line of reasoning is commonly used by people working on the front line of VAWG support. For example, in one London borough, reports to the police of domestic abuse in 2014 were over 40% higher than in 2013, and this is considered to be a good thing; it’s assumed that the majority of domestic abuse goes unreported, and thus additional reports are just that: additional reports, rather than additional instances. But without more data from other sources and approaches, we just don’t (and can’t) know.
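The arithmetic of this “prevalence times willingness” reading is trivial, but it is worth seeing how easily it can invert a league table. The “true” rates and willingness figures below are the same invented ones as in the text; there is no evidence for any of them.

```python
# A toy illustration of the "prevalence x willingness-to-disclose"
# reading of the survey numbers. All figures are invented for
# illustration, exactly as in the text above.

def reported_rate(true_rate, willingness):
    """Reported prevalence as the product of the actual prevalence and
    the fraction of victims willing to disclose to an interviewer."""
    return true_rate * willingness

denmark = reported_rate(0.60, 0.87)  # hypothetical: 60% true rate, 87% willing
poland  = reported_rate(0.76, 0.25)  # hypothetical: 76% true rate, 25% willing

print(f"Denmark reports {denmark:.0%}, Poland reports {poland:.0%}")
# -> Denmark reports 52%, Poland reports 19%
```

Under these (entirely made-up) assumptions, Denmark’s reported rate is far higher even though its true rate is lower --- which is exactly the pattern the survey shows, and exactly why the survey alone cannot distinguish between the two explanations.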

Here’s the kicker, though: if you choose to take the line that these figures “can’t possibly be right”, and that in fact they may even show the opposite of the real problem, that raises the question of why it’s OK to look for an alternative explanation for the figures on violence (or other social issues, such as, perhaps, antidepressant usage), but not for those on other phenomena, such as (self-reported) happiness. What gives data on happiness the kind of objective quality that legitimises all the column inches, TV airtime of happiness gurus, and government policy initiatives to try and boost their country’s rank from 18 to 10 in the UN World Happiness Index, if you’re simultaneously prepared to try to look very hard for reasons to explain away numbers that appear to show that your favourite “happy” country is a hotbed of violence against women?

And, even more importantly: whatever your position, do you have evidence for it?

You can find the dataset for this post here. (Yes, the filename does give away how long we have been working on this post!) It also includes all the data you need to re-examine the post about antidepressants from February 2014.

Saunders, D. (1991). Procedures for adjusting self-reports of violence for social desirability bias. Journal of Interpersonal Violence, 6(3), 336–344. https://doi.org/10.1177/088626091006003006 (Full text available here.)