I first had this idea about 10 years ago. Don't ask me why I've sat on it for so long. (If you think it's stupid, feel free to mutter "You should have sat on it for a bit longer" under your breath.) This is a serious proposal, so probably I should be doing it with more than just a blog post, but here goes anyway. ¯\_(ツ)_/¯
I think that we† should build a platform to make meta-analyses, systematic reviews, and potentially any other form of synthesis of the literature:
- Replicable (i.e., I can check your working in a short amount of time).
- Transparent (i.e., I can see all of your choices).
- Sustainable (i.e., I can see how your results stand up five years later).
Here's the basic idea: Right now, most meta-analyses are done in Excel (or perhaps Google Sheets). But everyone builds that spreadsheet in their own format, and it's not easy for anyone else to use it. Moreover, everyone is reinventing the wheel every time. Everyone starts from scratch, reads all of the papers, tries to make sense of the results (which are often far from unambiguous), enters them with some degree of error, maybe uses a bit of R code to calculate the I² or draw the forest plot, and then files the spreadsheet away. If we're lucky they share the files on OSF, but they are only ever a snapshot.
I'm actually a fan of Excel for a lot of things, but not when structure is required. What I want to build is a platform that will make doing meta-analyses using a spreadsheet as obsolete as doing your ANOVA with a slide-rule. Something like GitHub for evidence.
I don't have a cute name for the project at this point. The (ironic) title of this post would give MMAGA, but I'm told by people who read the news that there might be some marketing issues there. I played around with various acronyms and they all sound a bit self-indulgent. So for the moment I will use the frankly appalling name "SynthBase" (although everything will in fact be done in a relational database, so perhaps it fits), in the hope that the first agenda item of the all-star committee that will shortly (right?) assemble to properly specify and build the system will be to choose a better name.
Here are the beginnings of the technical specification for SynthBase. It still needs a lot of work, but I think that the principles are solid. There are two key components.
The first component is the DOI. This is an existing almost‡-unique identifier for pretty much every paper that we are likely to want to include in any synthesis of the literature. Yes, there are some very old papers that don't have DOIs, but we can either accept that those won't make the cut, or we can design an ad-hoc pseudo-DOI system to deal with them. Ditto for any recent papers in predatory journals that we might think are worth including anyway (hmmm). Within a paper we would need some structured, semi-structured, and unstructured ways to identify a data point, but this need not be a major obstacle. Humans (or perhaps AI, under human supervision) will be involved in identifying these data points, and it's a problem of finite complexity within any one paper.
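To make the "DOI plus within-paper locator" idea concrete, here is a minimal sketch of how a data point might be addressed. Everything here (the class name, the locator syntax) is hypothetical, invented purely for illustration; it is not a proposed SynthBase schema.

```python
# A data point is keyed by its paper's DOI plus a within-paper locator.
# Both the class and the locator convention are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class DataPointId:
    doi: str      # e.g. "10.1000/xyz123"; pre-DOI papers could use a reserved pseudo-DOI prefix
    locator: str  # within-paper pointer: structured ("table2.row3.odds_ratio") or free text

key = DataPointId(doi="10.1000/xyz123", locator="table2.row3.odds_ratio")
```

Because the identifier is a frozen value type, two users who extract the same number from the same table end up with the same key, which is what lets the platform notice that a DOI has been used before.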
The second component is timestamping. In SynthBase, no data point is ever deleted, nor is it ever definitive. Rather, every record of everything that happens has a timestamp. When user A records that the effect size of result "R" from the study with DOI "D" is an OR of 2.5, the date/time when they did that is noted. When User B points out that according to their calculation the OR is in fact 0.4 (because user A got the ratio upside-down), that more recent date/time is noted. If a user wants to get the corrected version, they just turn the handle and the most recent value will be used. But if they want to see how you arrived at your result a year ago, they tell the report generator to use that date/time, and the old value will be used.
Here is the workflow. As a researcher conducting a meta-analysis, you tell SynthBase to start a new synthesis and it creates a workspace. Next, you enter a DOI. If anybody has ever used that DOI within SynthBase before, you get an alert, and you can see (probably in stages of degree of detail) every other synthesis that has used results from that article. You can choose to trust an existing result (e.g., the effect of the intervention was d = 1.2) or you can propose an update (e.g., "I recalculated the effect size and it looks like d = 0.4"). Either way, you are recorded as having used or proposed a change to that information. Most importantly, if you, or a subsequent user of that result, proposes a different value, anyone who has used it (and anyone who signed up for updates) will get an alert, and any syntheses that are based on the previous value will be flagged as using numbers that not everyone agrees upon. (An important part of the design will be deciding how to deal with disagreements, and making the system at least moderately robust to trolling. But with everything being tracked and transparent, you can see when six serious senior PIs think it's 0.4 and @KrazyKarl420 thinks it's 1.2.)
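The use-and-alert mechanism in that workflow could look something like the following sketch: every use of a value is logged against the user, so when someone proposes a new value, everyone who relied on the old one can be notified. All names and structures here are illustrative assumptions only.

```python
# Track who used each value; notify them when someone proposes a change.
from collections import defaultdict

values = {}                  # (doi, field) -> current value
used_by = defaultdict(set)   # (doi, field) -> users who relied on that value
alerts = defaultdict(list)   # user -> pending alert messages

def use_value(user, doi, field):
    """Record that `user` relied on this value, and return it."""
    used_by[(doi, field)].add(user)
    return values[(doi, field)]

def propose_update(user, doi, field, new_value):
    """Change the value and alert everyone (except the proposer) who used the old one."""
    old = values.get((doi, field))
    values[(doi, field)] = new_value
    for u in used_by[(doi, field)] - {user}:
        alerts[u].append(f"{doi}/{field}: {old} -> {new_value} (by {user})")

values[("10.1000/xyz123", "d")] = 1.2
use_value("userA", "10.1000/xyz123", "d")            # userA trusts d = 1.2
propose_update("userB", "10.1000/xyz123", "d", 0.4)  # userB recalculates
alerts["userA"]  # userA is told that d changed from 1.2 to 0.4
```

In a real system the alert would presumably also flag every synthesis built on the old value, and the disagreement itself (who proposed what, and when) would live in the same append-only record.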
When your synthesis is complete you can press a button and SynthBase will generate a variety of reports, as well as making a snapshot of the data that you used. All that the snapshot needs to contain is the date/time and a list of which records you have included; this will be delivered to you as a URL (although there will presumably also be a facility to export a CSV file). The reports could be interactive: Click a point in the forest plot and it will open the doi.org page of the article, or whatever (UI design is very definitely not one of my strengths).
Here is the really cool bit: When someone clicks on that snapshot URL at a later point, they can ask to see the results either according to the original timestamp, or the current time (or any time in between). If any of the per-result records have been updated since the snapshot URL was generated, they will see what your synthesis looks like now that those records have been updated (due to a correction, retraction, or any other change). If a landmark study is retracted, we can see the state of the evidence from any given meta-analysis before and after that event in just a few seconds.
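Since a snapshot is nothing but a timestamp plus a list of record keys, re-resolving those keys at a later time shows how the same synthesis looks after corrections or retractions. Here is a minimal sketch of that re-resolution, with an invented history in which one study is later retracted; everything here is hypothetical.

```python
# A snapshot stores only a timestamp and a list of keys; values are looked up
# from the per-key history at whatever time the reader asks for.
from datetime import datetime

history = {
    ("10.1000/aaa", "d"): [(datetime(2024, 1, 1), 1.2)],
    ("10.1000/bbb", "d"): [(datetime(2024, 2, 1), 0.8),
                           (datetime(2025, 5, 1), None)],  # retracted in 2025
}

def resolve(key, ts):
    """Value of `key` as of `ts` (None once retracted, or if not yet recorded)."""
    valid = [(t, v) for t, v in history[key] if t <= ts]
    return max(valid)[1] if valid else None

snapshot = {"taken": datetime(2024, 6, 1),
            "keys": [("10.1000/aaa", "d"), ("10.1000/bbb", "d")]}

then = [resolve(k, snapshot["taken"]) for k in snapshot["keys"]]     # [1.2, 0.8]
now = [resolve(k, datetime(2025, 12, 1)) for k in snapshot["keys"]]  # [1.2, None]
```

The "before and after a retraction in a few seconds" claim falls out of this design for free: the same key list, resolved at two different timestamps.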
This approach also makes it very easy to extend a previous synthesis. You can just fork the original and add the more recent studies (or any others that you think are worthy of inclusion, which the previous authors omitted, or vice versa). Credit for the authors of the pre-fork synthesis could be generated automatically.
At the moment I am thinking that SynthBase would be useful mostly for the synthesis part of a meta-analysis. But it could perhaps also be used as part of the selection process for studies to be included. A simple record with the DOI and "Why we chose not to include this" might be all that is needed, at least initially.
Twenty-five years ago (i.e., back when I was "a computer guy", and 10 years before I became a scientist) I would probably have prototyped this by now. But I am old and slow, and I also know that doing this will be as much about IT operations as it is about software and database design. So for now I'm just going to put the idea out there with the most basic concepts, and see if anyone bites. Maybe my legacy will be that in 20 years' time someone will say "We're still making the same mistakes — if only we'd built the platform that Nick suggested".
The initial challenge is to build a platform that is easier to use than a spreadsheet even for the very first user, who is entering results from scratch with no prior syntheses to draw on. Only that way, I feel, will this get off the ground. The data entry has to be as easy as Excel (or perhaps only slightly more onerous, if there are other immediately obvious advantages). The more syntheses that have been performed, the greater the value of SynthBase will be: for the authors of subsequent syntheses, who can choose to trust the results of previous authors (or at least use them to check their own working), but also for the authors and readers of existing work, who can be alerted to possible mistakes or differences of interpretation.
Now I just need a few people who are prepared to help build this. And someone with some money to get it off the ground, although it ought not to be hugely expensive, since it doesn't involve large amounts of data storage or CPU power; longevity and availability would be the main issues. I think the ideal structure would be some kind of non-profit, which I have no interest in being the boss of. In fact I'm not interested in being the leader of the design and development committee either, beyond the first couple of meetings where I get to make sure that you've all understood the point.
Right then. Over to you, I guess.
† Me, you, anyone in science who cares enough. Margaret Mead was right. (Not just me, though.)
‡ Some journals have been known to recycle a DOI for the retraction notice of an article, but we can probably deal with that.