Over the past decade, the scale of neuroscience datasets and scientific collaborations has grown dramatically. With the advent of technologies such as Neuropixels probes, widefield microscopy, multi-omics, and advanced video tracking, the amount of data acquired over the course of a project now commonly reaches terabytes. These data need to be processed, distributed to multiple laboratories for analysis, and shared with the broader community. This demand has given rise to a new type of job in neuroscience: the Research Software Engineer (RSE), a professional trained to develop data infrastructure and processing pipelines.
As RSEs ourselves, we lead the International Brain Laboratory, a large-scale scientific collaboration of 22 laboratories, and have seen how tailored engineering support can significantly improve the quality and impact of research. Our flagship project, a whole-brain map of neural activity, would not have been possible without core engineers who built a common infrastructure to collect, process, and redistribute data from thousands of experiments across dozens of geographically distributed laboratories. Pipeline standardization, visualization, and specialized testing were essential for an effort of this scale. These solid foundations also accelerated the progress of over 30 smaller projects, as scientists could rapidly reuse methods and leverage large, well-curated, and well-documented datasets. The shared methodologies were especially important for scientists validating findings across different institutions, such as those studying the neural representation of prior information.
Given this important role, the number of RSEs in neuroscience is growing; for example, Princeton University, Janelia Research Campus, and University College London have launched dedicated software cores. However, wider adoption of this approach requires more dedicated funding for these positions, as well as structural changes that fit RSEs into the broader academic community. We believe that, to be most effective, RSEs should be embedded in institutions rather than in individual laboratories, and that RSE appointments should be long-term rather than tied to the lifetime of a particular project.
RSEs have a wide range of skills and support all stages of scientific projects. They manage datasets and develop and operate processing algorithms such as data compression and spike sorting. They develop new analytical methods, write or review code to support journal publication, and help enforce quality control. On the hardware side, engineers assemble components, implement the control software, and write documentation to help users build and debug the systems. They also act as project managers, researching, prioritizing, and translating scientists’ needs, which ultimately helps to foster collaboration. Finally, they are often the driving force behind open science efforts: disseminating tools, methods, and datasets to the community, documenting and packaging code, developing courses, giving lectures, and engaging with researchers who want to use shared data.
Although RSEs bring clear benefits to the scientific enterprise, several factors have hindered the widespread adoption of this type of support. For one, the role does not have a clearly defined career path within academia. Restricted to existing personnel categories, senior RSEs with years of experience are often appointed as postdoctoral researchers or staff collaborators and do not receive the same salary and benefits as tenured scientists of comparable seniority. It has been difficult to explain to institutional HR departments that valuable RSEs without PhDs should be paid more than the standard salary for staff with master’s degrees, or that engineers with 15 years of experience should be given the same benefits as faculty. We need to create a pathway for RSEs to grow. Such a pathway would both retain talented workers and enable them to apply their knowledge across multiple projects.
In many universities, RSEs must be employed under the PI who is the main recipient of the grant. This rule means that RSEs depend on their home lab’s budget for both the terms and the level of their salary, which makes it difficult for them to transfer their expertise from one project to another. In addition, engineers who are formally supervised by a PI cannot hire or manage junior RSE team members. The situation becomes even more difficult in cross-institution collaborations, where a lead RSE from one institution must manage a remote team over which they have no direct authority. When engineers receive conflicting signals about work priorities from their home PI and the lead RSE (a classic problem of matrix management), it is difficult for them to know which to follow.