I first had this idea about 10 years ago. Don't ask me why I've sat on it for so long. (If you think it's stupid, feel free to mutter "You should have sat on it for a bit longer" under your breath.) This is a serious proposal, so probably I should be doing it with more than just a blog post, but here goes anyway. ¯\_(ツ)_/¯
I think that we† should build a platform to make meta-analyses, systematic reviews, and potentially any other form of synthesis of the literature:
- Replicable (i.e., I can check your working in a short amount of time).
- Transparent (i.e., I can see all of your choices).
- Sustainable (i.e., I can see how your results stand up five years later).
Here's the basic idea: Right now, most meta-analyses are done in Excel (or perhaps Google Sheets). But everyone builds that spreadsheet in their own format, and it's not easy for anyone else to use it. Moreover, everyone is reinventing the wheel every time. Everyone starts from scratch, reads all of the papers, tries to make sense of the results (which are often far from unambiguous), enters them with some degree of error, maybe uses a bit of R code to calculate the I² or draw the forest plot, and then files the spreadsheet away. If we're lucky they share the files on OSF, but they are only ever a snapshot.
I'm actually a fan of Excel for a lot of things, but not when structure is required. What I want to build is a platform that will make doing meta-analyses using a spreadsheet as obsolete as doing your ANOVA with a slide-rule. Something like GitHub for evidence.
I don't have a cute name for the project at this point. The (ironic) title of this post would give MMAGA, but I'm told by people who read the news that there might be some marketing issues there. I played around with various acronyms and they all sound a bit self-indulgent. So for the moment I will use the frankly appalling name "SynthBase" (although everything will in fact be done in a relational database, so perhaps it fits), in the hope that the first agenda item of the all-star committee that will shortly (right?) assemble to properly specify and build the system will be to choose a better name.
Here are the beginnings of the technical specification for SynthBase. It still needs a lot of work, but I think that the principles are solid. There are two key components.
The first component is the DOI. This is an existing almost‡-unique identifier for pretty much every paper that we are likely to want to include in any synthesis of the literature. Yes, there are some very old papers that don't have DOIs, but we can either accept that those won't make the cut, or we can design an ad-hoc pseudo-DOI system to deal with them. Ditto for any recent papers in predatory journals that we might think are worth including anyway (hmmm). Within a paper we would need some structured, semi-structured, and unstructured ways to identify a data point, but this need not be a major obstacle. Humans (or perhaps AI, under human supervision) will be involved in identifying these data points, and it's a problem of finite complexity within any one paper.
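To make the "DOI plus within-paper locator" idea concrete, here is a minimal sketch of how a data point might be addressed. Everything here (the class name, the locator syntax) is hypothetical, invented purely for illustration; it is not a proposed SynthBase schema.

```python
# A data point is keyed by its paper's DOI plus a within-paper locator.
# Both the class and the locator convention are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class DataPointId:
    doi: str      # e.g. "10.1000/xyz123"; pre-DOI papers could use a reserved pseudo-DOI prefix
    locator: str  # within-paper pointer: structured ("table2.row3.odds_ratio") or free text

key = DataPointId(doi="10.1000/xyz123", locator="table2.row3.odds_ratio")
```

Because the identifier is a frozen value type, two users who extract the same number from the same table end up with the same key, which is what lets the platform notice that a DOI has been used before.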
The second component is timestamping. In SynthBase, no data point is ever deleted, nor is it ever definitive. Rather, every record of everything that happens has a timestamp. When user A records that the effect size of result "R" from the study with DOI "D" is an OR of 2.5, the date/time when they did that is noted. When User B points out that according to their calculation the OR is in fact 0.4 (because user A got the ratio upside-down), that more recent date/time is noted. If a user wants to get the corrected version, they just turn the handle and the most recent value will be used. But if they want to see how you arrived at your result a year ago, they tell the report generator to use that date/time, and the old value will be used.
Here is the workflow. As a researcher conducting a meta-analysis, you tell SynthBase to start a new synthesis and it creates a workspace. Next, you enter a DOI. If anybody has ever used that DOI within SynthBase before, you get an alert, and you can see (probably in stages of degree of detail) every other synthesis that has used results from that article. You can choose to trust an existing result (e.g., the effect of the intervention was d = 1.2) or you can propose an update (e.g., "I recalculated the effect size and it looks like d = 0.4"). Either way, you are recorded as having used or proposed a change to that information. Most importantly, if you, or a subsequent user of that result, proposes a different value, anyone who has used it (and anyone who signed up for updates) will get an alert, and any syntheses that are based on the previous value will be flagged as using numbers that not everyone agrees upon. (An important part of the design will be deciding how to deal with disagreements, and making the system at least moderately robust to trolling. But with everything being tracked and transparent, you can see when six serious senior PIs think it's 0.4 and @KrazyKarl420 thinks it's 1.2.)
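The use-and-alert mechanism in that workflow could look something like the following sketch: every use of a value is logged against the user, so when someone proposes a new value, everyone who relied on the old one can be notified. All names and structures here are illustrative assumptions only.

```python
# Track who used each value; notify them when someone proposes a change.
from collections import defaultdict

values = {}                  # (doi, field) -> current value
used_by = defaultdict(set)   # (doi, field) -> users who relied on that value
alerts = defaultdict(list)   # user -> pending alert messages

def use_value(user, doi, field):
    """Record that `user` relied on this value, and return it."""
    used_by[(doi, field)].add(user)
    return values[(doi, field)]

def propose_update(user, doi, field, new_value):
    """Change the value and alert everyone (except the proposer) who used the old one."""
    old = values.get((doi, field))
    values[(doi, field)] = new_value
    for u in used_by[(doi, field)] - {user}:
        alerts[u].append(f"{doi}/{field}: {old} -> {new_value} (by {user})")

values[("10.1000/xyz123", "d")] = 1.2
use_value("userA", "10.1000/xyz123", "d")            # userA trusts d = 1.2
propose_update("userB", "10.1000/xyz123", "d", 0.4)  # userB recalculates
alerts["userA"]  # userA is told that d changed from 1.2 to 0.4
```

In a real system the alert would presumably also flag every synthesis built on the old value, and the disagreement itself (who proposed what, and when) would live in the same append-only record.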
When your synthesis is complete you can press a button and SynthBase will generate a variety of reports, as well as making a snapshot of the data that you used. All that the snapshot needs to contain is the date/time and a list of which records you have included; this will be delivered to you as a URL (although there will presumably also be a facility to export a CSV file). The reports could be interactive: Click a point in the forest plot and it will open the doi.org page of the article, or whatever (UI design is very definitely not one of my strengths).
Here is the really cool bit: When someone clicks on that snapshot URL at a later point, they can ask to see the results either according to the original timestamp, or the current time (or any time in between). If any of the per-result records have been updated since the snapshot URL was generated, they will see what your synthesis looks like now that those records have been updated (due to a correction, retraction, or any other change). If a landmark study is retracted, we can see the state of the evidence from any given meta-analysis before and after that event in just a few seconds.
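Since a snapshot is nothing but a timestamp plus a list of record keys, re-resolving those keys at a later time shows how the same synthesis looks after corrections or retractions. Here is a minimal sketch of that re-resolution, with an invented history in which one study is later retracted; everything here is hypothetical.

```python
# A snapshot stores only a timestamp and a list of keys; values are looked up
# from the per-key history at whatever time the reader asks for.
from datetime import datetime

history = {
    ("10.1000/aaa", "d"): [(datetime(2024, 1, 1), 1.2)],
    ("10.1000/bbb", "d"): [(datetime(2024, 2, 1), 0.8),
                           (datetime(2025, 5, 1), None)],  # retracted in 2025
}

def resolve(key, ts):
    """Value of `key` as of `ts` (None once retracted, or if not yet recorded)."""
    valid = [(t, v) for t, v in history[key] if t <= ts]
    return max(valid)[1] if valid else None

snapshot = {"taken": datetime(2024, 6, 1),
            "keys": [("10.1000/aaa", "d"), ("10.1000/bbb", "d")]}

then = [resolve(k, snapshot["taken"]) for k in snapshot["keys"]]     # [1.2, 0.8]
now = [resolve(k, datetime(2025, 12, 1)) for k in snapshot["keys"]]  # [1.2, None]
```

The "before and after a retraction in a few seconds" claim falls out of this design for free: the same key list, resolved at two different timestamps.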
This approach also makes it very easy to extend a previous synthesis. You can just fork the original and add the more recent studies (or any others that you think are worthy of inclusion, which the previous authors omitted, or vice versa). Credit for the authors of the pre-fork synthesis could be generated automatically.
At the moment I am thinking that SynthBase would be useful mostly for the synthesis part of a meta-analysis. But it could perhaps also be used as part of the selection process for studies to be included. A simple record with the DOI and "Why we chose not to include this" might be all that is needed, at least initially.
Twenty-five years ago (i.e., back when I was "a computer guy", and 10 years before I became a scientist) I would probably have prototyped this by now. But I am old and slow, and I also know that doing this will be as much about IT operations as it is about software and database design. So for now I'm just going to put the idea out there with the most basic concepts, and see if anyone bites. Maybe my legacy will be that in 20 years' time someone will say "We're still making the same mistakes — if only we'd built the platform that Nick suggested".
The initial challenge is to build a platform that is easier to use than a spreadsheet even for the very first user, who is entering results from scratch with no prior syntheses to draw on. Only that way, I feel, will this get off the ground. The data entry has to be as easy as Excel (or perhaps only slightly more onerous, if there are other immediately obvious advantages). The more syntheses that have been performed, the greater the value of SynthBase will be: for the authors of subsequent syntheses, who can choose to trust the results of previous authors (or at least use them to check their own working), but also for the authors and readers of existing work, who can be alerted to possible mistakes or differences of interpretation.
Now I just need a few people who are prepared to help build this. And someone with some money to get it off the ground, although it ought not to be hugely expensive, since it doesn't involve large amounts of data storage or CPU power; longevity and availability would be the main issues. I think the ideal structure would be some kind of non-profit, which I have no interest in being the boss of. In fact I'm not interested in being the leader of the design and development committee either, beyond the first couple of meetings where I get to make sure that you've all understood the point.
Right then. Over to you, I guess.
† Me, you, anyone in science who cares enough. Margaret Mead was right. (Not just me, though.)
‡ Some journals have been known to recycle a DOI for the retraction notice of an article, but we can probably deal with that.