What would science look like if it were invented today?
The Internet represents an opportunity to change this system, one which has created a 300-year-old, collective long-term memory, into something new and more efficient, perhaps adding in a current, collective short-term working memory at the same time. With new online tools, scientists could begin to share techniques, data and ideas online to the benefit of all parties, and the public at large. (Robert J. Simpson, paraphrasing Michael Nielsen)
Sure, it is hard to imagine you reading this blog post in a world that had not yet engaged in science. But the question "What would email look like if it were invented today?" was recently addressed during the presentation of the Wave protocol, and entertaining similar ideas about reinventing science may be worthwhile: how would a system that creates and structures knowledge have to be designed so that these two complex processes can effectively feed on and adapt to each other, making use of the most appropriate technologies at hand? Both processes are highly interrelated, but to facilitate the discussion, we will first consider them separately (in this and the next issue of the Euroscientist) and then provide a synthesis (to which you can contribute).
Part I: What would knowledge creation look like if it were invented today
The basic components of research
Let us start by considering scientific knowledge creation, or research for short. Within the framework of existing knowledge, this requires, as a first step, the identification (and perhaps further characterization) of a gap to be bridged or closed, although some methodologists prefer, or are even obliged, to construct their bridges before choosing a suitable place to install them.
Once such a gap has been identified (we will leave a detailed consideration of this process for later), three basic components are necessary to close it, usually following each other as stages of a research project:
- Planning: an idea on how to bridge or close the gap
- Realization: the means to put the idea into practice
- Verification: independent assessment of the realization.
A fourth component is crucial to the process — appropriate communication during and across the three basic stages as well as beyond individual research projects. Traditionally, this was (and still is) accomplished separately for each of them:
- Grant proposals after an idea had been prepared for realization,
- Conference and journal papers once the realization had progressed, and
- Further papers (by independent investigators) once replication had been attempted (e.g. as a control experiment in a follow-up study).
The decoupling of this fourth component from the other three, however, is simply a trait our research landscape has inherited from the era of paper-based scientific communication. It is far from a technical necessity today, when basically any kind of information can be shared instantly (with few exceptions, e.g. patient data) within and beyond the scientific community. For our purposes, we will thus reframe the concept of putting ideas or results on paper as putting them on a wiki, a blog, a dedicated online repository or successors of these (e.g. as blips or wavelets within the proposed Wave protocol). In any case, this means a shared research environment from which they can be syndicated and aggregated in various forms and embedded in other digital environments.
Hello to public research environments online
In this kind of framework (best known as Open Science; henceforth public research environment, to emphasize that the concept is applicable across disciplines and that communication in and with the public is different from science as we know it), individual contributions (or comments on them) can be automatically assigned a unique identifier (henceforth contribution ID; this may be a revision number with time stamp in wikis or databases, a DOI for journal articles or an ISBN for books), linked to their originator (henceforth contributor ID; usually the user name) as well as other relevant information (e.g. funding sources), and aggregated in various forms. In a paper-based system, the contributor ID is mainly composed of an author's surname plus some representation of given names that varies across journals. As a result, a single contributor ID may be shared by different individuals whose names are identical or similar, while some individuals (especially those with multiple initials, with non-English characters in their names, or who changed their name after marriage) may have more than one contributor ID. For online platforms, the contributor ID is generally unique within but not across individual platforms, although a number of solutions towards unique identification of contributors have been implemented (e.g. OpenID), including some specifically targeted at scientists (e.g. ResearcherID).
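To make the two identifiers concrete, here is a minimal Python sketch of how a shared environment might assign a contribution ID to every post and link it to a contributor ID. The class names, the `platform:rN` ID format and the `post` method are assumptions made for illustration; they do not describe any existing platform.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count


@dataclass(frozen=True)
class Contribution:
    contribution_id: str   # hypothetical format: "<platform>:r<revision>"
    contributor_id: str    # usually the user name on the platform
    text: str
    posted_at: datetime    # time stamp recorded when the entry is posted
    funding: tuple = ()    # other relevant information, e.g. funding sources


class Environment:
    """Toy public research environment for a single platform."""

    def __init__(self, platform):
        self.platform = platform
        self._revisions = count(1)   # revision numbers start at 1
        self.contributions = {}      # contribution ID -> Contribution

    def post(self, contributor_id, text, funding=()):
        # Each post gets the next revision number, so IDs are unique
        # within this platform (not across platforms).
        contribution_id = f"{self.platform}:r{next(self._revisions)}"
        entry = Contribution(
            contribution_id,
            contributor_id,
            text,
            datetime.now(timezone.utc),
            tuple(funding),
        )
        self.contributions[contribution_id] = entry
        return entry

    def by_contributor(self, contributor_id):
        # Aggregate all contributions linked to one contributor ID.
        return [c for c in self.contributions.values()
                if c.contributor_id == contributor_id]
```

In this sketch, uniqueness holds only within one `Environment`, which mirrors the limitation described above: linking the same person across platforms would need an external scheme such as OpenID.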
Each contribution ID can not only be linked to its contributor but also tagged (similar to the keywords currently accompanying manuscripts or grant proposals), and its quality can be assessed (or rated, for short) by individual contributors (perhaps as a function of the overlap between the tags for their personal expertise and those of the contribution under consideration) according to a pre-defined set of evaluation criteria (e.g. appropriateness to the current stage of a given project, reliability of the information supplied, or presentation with enough context to be understood by specialists and/or the public). Some journals already allow such ratings and further comments. However, none of them currently provides aggregations of ratings or comments by contributor, although technical standards for such purposes are operational (e.g. hReview). Despite possible herding effects and other sources of error, the feasibility in principle (though not the effectiveness) of generating and aggregating such user-defined metrics has been demonstrated on multiple online platforms, especially in non-scholarly environments (tagging: Flickr; rating: eBay) but in some scholarly ones too (tagging: CiteULike).
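One possible reading of rating "as a function of the overlap" between expertise tags and contribution tags is sketched below in Python. The choice of the Jaccard index as the overlap measure and the function names are our assumptions; the text above does not prescribe a particular formula.

```python
def tag_overlap(expertise_tags, contribution_tags):
    """Jaccard index of two tag sets: shared tags / all tags (assumed measure)."""
    expertise, contribution = set(expertise_tags), set(contribution_tags)
    union = expertise | contribution
    return len(expertise & contribution) / len(union) if union else 0.0


def weighted_rating(contribution_tags, ratings):
    """Average the scores, weighting each rater by tag overlap.

    `ratings` is a list of (score, rater_expertise_tags) pairs.
    Returns None when no rater shares any tag with the contribution.
    """
    weights = [tag_overlap(expertise, contribution_tags)
               for _, expertise in ratings]
    total = sum(weights)
    if total == 0:
        return None
    return sum(score * w for (score, _), w in zip(ratings, weights)) / total
```

With this weighting, a rating from someone whose expertise tags do not overlap with the contribution at all has no influence on the aggregate, which is a deliberately strict choice; a softer scheme could give every rater a small minimum weight.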
No working implementation currently exists that would address the lack of incentives for scientists to engage in collaborative research assessment of this sort. However, since both publishers and funding agencies have managed to coerce scientists and their institutions into all sorts of behaviour during research assessment exercises past and present, they should have no problem providing incentives to participate in this one, which has the added benefits of being both transparent and beneficial to the scientific community as a whole (it is of note in this respect that there are very few incentives in the current system to deliver timely, fair and detailed peer reviews for grant proposals or manuscripts). One way to do this would be to require that every reference cited be rated by the citing researchers (some journals already single out a few references in this manner as being "of outstanding interest" or similar, but aggregating such ratings of single references in a global database like Open Library would be more helpful). Another would be to include both the quality and the quantity of a specific researcher's ratings (both active and passive) in the determination of the variable portion of her research funding, perhaps with some sort of normalization by the usage frequency of the tags involved (to balance between large and small fields of inquiry, and to avoid exaggerated claims). The remaining obstacles to a wider adoption of such transparent reputation schemes based on a public research environment with unique contribution and contributor ID schemes are thus not of a technical nature, and we shall assume these features to be available for the system we are about to design.
So far, we have only covered technical aspects of redesigning a research system emancipated from the paper medium but, as Michael Nielsen put it, "[T]here is a second and more radical way of thinking about how the Internet can change science, and that is through a change to the process and scale of creative collaboration itself, enabled by social software such as wikis, online forums and their descendants." In a similar vein, Timothy Gowers started the Polymath project with a blog post discussing the following idea:
It seems to me that, at least in theory, a different model could work: different, that is, from the usual model of people working in isolation or collaborating with one or two others. Suppose one had a forum (in the non-technical sense, but quite possibly in the technical sense as well) for the online discussion of a particular problem. The idea would be that anybody who had anything whatsoever to say about the problem could chip in. And the ethos of the forum — in whatever form it took — would be that comments would mostly be kept short. In other words, what you would not tend to do, at least if you wanted to keep within the spirit of things, is spend a month thinking hard about the problem and then come back and write ten pages about it. Rather, you would contribute ideas even if they were undeveloped and/or likely to be wrong.
This brevity of communication is taken to an extreme in the exchange of text messages over mobile phones and web platforms, particularly Twitter or the social aggregator FriendFeed. Even though scientists clearly form a minority on such platforms, they have begun to incorporate them into their research.
Quick poll: did you check any references in this post so far? How did you do that? And how do you usually do it when you read a paper? Sadly, even though most scientific journals now publish their content on the internet, most of the formatting is still performed with paper as the target: only rarely are hyperlinks incorporated, even in the online versions. Online environments, on the other hand, are built around hyperlinks and allow basically any kind of media to be embedded, for example the Science Commons video below that highlights the value of sharing scientific information.
Research seen in a new light
With the above remarks in mind, let us now reconsider the three stages listed above:
The conception of ideas is a process very specific to the problem at hand and to the individuals (or possibly even machines) dealing with it. Ideas may arise from intensive or superficial occupation with a topic, from experimental or theoretical work on it, from a literature search, from play with methods and concepts, and under multiple other conditions.
If scientists can access all the scientific information relevant to their research, new ways of processing it can be invented: BioText Search Engine, for instance, allows the literature to be searched via figures from Open Access articles, while Pubfeed uses a corpus of user-defined seed papers to provide an automated stream of literature recommendations that can be fed into a feed reader. Upon visiting this platform according to her own schedule, the researcher can then just click on an item in the feed to go to the abstract and, with one more click, to the full text (if she has access to it), and suitable reference managers automatically download the article along with its metadata. Some such platforms even allow users to host their digital library on the web and to share it (including metadata) with colleagues or collaborators. This service tremendously facilitates collaboration but is necessarily of limited use in the realm of toll-access barriers, even if one was lucky enough to receive an eprint for personal use from the authors of a particular study.
In contrast to grants, it is usually hard to tell when a research project began. For simplicity, let us thus assume that it is started by being entered into the public research environment and tagged as an idea with suitable keywords. Similar to the above-described feeds for publication alerts, scientists (and possibly other interested parties, including dedicated robots with their own contributor ID that access the system via its API) subscribed to specific tags or contributors (or combinations thereof) will then automatically be informed of the existence of this new project and may add to it (e.g. comments, references, extensions, limitations, illustrations, links to suitable tools or relevant legal information or related ongoing projects or previous refutations of similar ideas, offers for collaboration or funding, suggestions for a timeline, or simply further tags, or ratings of any of these), to which the original contributor and anyone else interested may respond. All of this would require open standards and suitable licensing as well as provisions for security and against spam.
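The subscription mechanism described above can be sketched in a few lines of Python. The `Subscription` class and its matching rule are assumptions for illustration: a subscriber is notified when a new entry shares at least one subscribed tag and, if contributors are specified as well, comes from one of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    subscriber: str                       # contributor ID of a person or robot
    tags: frozenset = frozenset()         # empty means "any tag"
    contributors: frozenset = frozenset() # empty means "any contributor"

    def matches(self, entry_tags, contributor_id):
        # A combined subscription requires both conditions to hold.
        tag_ok = not self.tags or bool(self.tags & set(entry_tags))
        contributor_ok = (not self.contributors
                          or contributor_id in self.contributors)
        return tag_ok and contributor_ok


def subscribers_to_notify(subscriptions, entry_tags, contributor_id):
    """Return the sorted IDs of everyone whose subscription matches the entry."""
    return sorted({s.subscriber for s in subscriptions
                   if s.matches(entry_tags, contributor_id)})
```

Note that in this sketch a subscription with neither tags nor contributors matches every new entry, which a real system would probably want to restrict.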
As a result of these interactions, the planning of a subset of proposed projects will have taken shape after a while, i.e. the necessary material, financial and human resources integrated with a tentative timeline to acquire some preliminary data. Once these are available, they will be posted in the same way as everything mentioned before — with the public research environment effectively acting as an electronic lab notebook — and immediately visualized and integrated with the relevant information available in the system by then, such that the procedures can be adapted as needed to gather the amount and quality of data necessary to bridge the targeted knowledge gap in its most recent state.
Searchable lists sortable by tags, contributors, ratings, envisioned budget or other metadata can then be compiled automatically. On this basis, science funders (which may include dedicated funding bodies but also other organizations, companies, groups of scientists or others, possibly even including lay people) would be able to browse (potentially even with the aid of automated or semi-automated proposal crawlers) through the available proposals meeting their criteria and to either fund them directly or to signal to other funders that they would be willing to fund a proposal in part (such a practice would particularly benefit transdisciplinary projects, which often fall through the cracks in traditional research funding). No technical difficulties here, just cultural ones associated with the cherished habit of keeping ideas and results private until formal publication.
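As a rough illustration of such lists, the Python sketch below filters proposals by tags and budget, sorts them by rating, and records partial funding pledges. All names (`Proposal`, `shortlist`, `pledge`, `shortfall`) and the sorting rule are hypothetical choices of ours, not features of an existing system.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    title: str
    tags: set
    rating: float    # aggregated rating, however it was computed
    budget: float    # envisioned budget
    pledges: dict = field(default_factory=dict)  # funder -> pledged amount

    def pledge(self, funder, amount):
        # Signal willingness to fund the proposal in part.
        self.pledges[funder] = self.pledges.get(funder, 0.0) + amount

    def shortfall(self):
        # Amount still missing after all pledges, never negative.
        return max(self.budget - sum(self.pledges.values()), 0.0)


def shortlist(proposals, required_tags, max_budget):
    """Proposals carrying all required tags within budget, best rated first."""
    required = set(required_tags)
    eligible = [p for p in proposals
                if required <= p.tags and p.budget <= max_budget]
    # Higher rating first; among equal ratings, the cheaper proposal first.
    return sorted(eligible, key=lambda p: (-p.rating, p.budget))
```

A funder could call `shortlist` with its own criteria, pledge part of a budget, and leave the visible `shortfall` as a signal to other funders.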
It is important to note that such a public research environment would allow for independent verification right from the start in that independent samples could be investigated in parallel by independent scientists (or even robots) following the same public protocol and posting their data in public as they arise — a situation far from being common in contemporary science, although not entirely new after successful completion of large-scale collaborative initiatives like the Human Genome Project.
The transition phase
One of the most frequently raised arguments against public research environments concerns the perceived danger of being scooped on information laid out under the eyes of everyone and their dog. But with a functional attribution system as described above, it will always be possible to point out, in public, who posted what and when, thereby severely limiting the effectiveness of any scooping attempt. Furthermore, it is probably fair to assume that far more scientists would prefer to engage in collaboration than in scooping, and so it is much more likely that the posting of ideas, results or analytical tools will result in constructive feedback early on, which may actually enhance their research. Indeed, once the paper-based separation of the communicative component of knowledge generation has been overcome, the incentives are going to shift towards releasing new information immediately. Until this is achieved (and this will take a while), the paper-based system will remain important, and our new system will have to be set up as a complement to it.
Interestingly, a public research environment would work best if the initiators of a project had a certain amount of baseline funding at their disposal to bring their research through the idea stage until the first preliminary data are available (when it is easier to get potential funders interested in the matter). Such baseline funding is realistic: a recent study on the cost effectiveness of the Natural Sciences and Engineering Research Council of Canada found that the costs of research grant peer review exceeded the costs of providing every eligible researcher with a yearly baseline grant of about CAN$30,000. Furthermore, the possibility of investing in selected projects initiated by others (either in terms of reviewer effort or as an active participant) is perhaps an even better form of research assessment than classical behind-closed-doors peer review.
Given that a rating system implemented in our public research environment would almost certainly be less expensive than classical committee-based peer review of grant proposals (most online platforms can be used at no or low cost, no travel costs are incurred by the process, and all the effort spent on reviewing, which is currently often lost to society, particularly if a manuscript or proposal is rejected, could be used immediately by anyone), the new system would represent an improvement with respect to the current one, even if neither the quality of the research nor the speed of communicating the results were affected. But both are bound to improve in the new system, leaving more money in the research funding system that can actually be spent on research than is currently the case.
A small change in the design of the research system, namely switching from paper-based to web-based communication of ideas, results and verifications, may have profound consequences. Within the scientific community, the permanent communication of progress during the course of a project will shorten the feedback loops, making it possible to improve or update the design of any research project on the fly and to link it to other gap-closing or even maintenance work on our shared corpus of knowledge. Beyond the scientific community, a scientific cycle that is completely open will allow new ways of interaction with society at large, particularly the media: instead of maintaining a stream of "scientists found out" broadcasts as they do today, the media could add in some issues of the "scientists are currently investigating, let's see how they do it" variety, and everybody and their dog could join. Such strong interaction with the public via the internet also sets the frame for the discussion of the second aspect of science, knowledge structuring, to which we will turn next, and you are warmly invited to participate.
This post was written up on the basis of multiple and ongoing discussions in several online environments, particularly the Science 2.0 group at FriendFeed. Specifically, Björn Brembs, Cameron Neylon and Michael Nielsen provided comments on an earlier draft.