Using R and pre-registration to teach reproducible research

Over the last decade, a number of initiatives have aimed to improve the transparency, reproducibility, and replicability of psychological science, including preprint servers, the open sharing of data, materials, and code, registered reports, and pre-registration. In addition to efforts to improve research practices, attention has also turned to how we teach psychology and research methods at a more foundational level (Button et al., 2020; Chopik et al., 2018). As the PsyTeachR team at the School of Psychology and Neuroscience, University of Glasgow, we have successfully transitioned to teaching reproducible research using R across all undergraduate and postgraduate levels. Our curriculum now emphasises essential ‘data science’ graduate skills that have been overlooked in traditional approaches to teaching, including programming skills, data visualisation, data wrangling, and reproducible reports. Our use of R is often assumed to be our main focus. However, whilst R is a useful tool to achieve our aims, our true objective is to teach students how to conduct reproducible research and why it matters.


Students are assessment-focused, not only in what they choose to spend their time on, but also in what they consider important (Brown & Knight, 2012; Rust et al., 2005). As such, reproducibility cannot be taught in a vacuum: lectures on open science are not enough to ensure that students develop the skills necessary to conduct reproducible research or to truly understand why reproducibility and transparency are so important for science in general. Rather, reproducible research, and the knowledge and skills required to produce and understand it, must be embedded throughout our assessments.


In this blog post, we describe one such assessment for our Research Methods course on the MSc (Conversion) in Psychological Studies/Science programme. A frequent, anecdotal concern we encounter when discussing the move to teaching programming and embedding reproducibility is that psychology programmes do not have the time to develop such skills, and that our diverse cohorts would not accept or be capable of learning them. The conversion programme takes place over one academic year, with cohorts that are even more diverse than their undergraduate counterparts: students often have strong applied (clinical) interests but limited statistical or computing competence. The course materials relating to R and the assessment are available as open educational resources with a Creative Commons licence that allows use, reuse, and adaptation with attribution.


What is it?

Working in small groups, students design and pre-register a questionnaire-based study. A group grade is awarded, although students can peer review their team members, and penalties can be applied in cases where individuals did not fully contribute. Following the pre-registration, students analyse the data and individually write up a full quantitative report.


The topic is broadly self-regulated learning and uses the Motivated Strategies for Learning Questionnaire (MSLQ; Pintrich et al., 1993). The MSLQ contains nine sub-scales on learning strategies (e.g., organisation, help-seeking) and six sub-scales on motivation (e.g., self-efficacy, test anxiety). In addition, participants are asked a range of demographic questions (e.g., gender, English as a second language). Students develop research questions and hypotheses that investigate: a) the correlation between two of the MSLQ variables and b) demographic group differences in at least one of the variables.
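As a rough illustration of the two analyses described above, the sketch below scores two sub-scales, correlates them, and compares group means. It is written in Python for brevity, whereas students on the course work in R. The item names (`help_1`, `anx_1`, and so on), the `esl` column, and the six rows of data are hypothetical and are not the course's pilot data. The sketch assumes a sub-scale score is the mean of its items and leaves out reverse-scored items.

```python
from math import sqrt
from statistics import mean

# Hypothetical pilot data: one dict per participant, items rated 1-7.
pilot = [
    {"esl": "yes", "help_1": 5, "help_2": 6, "anx_1": 3, "anx_2": 2},
    {"esl": "no",  "help_1": 2, "help_2": 3, "anx_1": 6, "anx_2": 5},
    {"esl": "yes", "help_1": 4, "help_2": 4, "anx_1": 4, "anx_2": 3},
    {"esl": "no",  "help_1": 6, "help_2": 7, "anx_1": 2, "anx_2": 1},
    {"esl": "yes", "help_1": 3, "help_2": 2, "anx_1": 5, "anx_2": 6},
    {"esl": "no",  "help_1": 5, "help_2": 5, "anx_1": 3, "anx_2": 4},
]

# Hypothetical mapping from sub-scale name to its item columns.
SUBSCALES = {
    "help_seeking": ["help_1", "help_2"],
    "test_anxiety": ["anx_1", "anx_2"],
}


def score_subscale(row, items):
    """Sub-scale score for one participant: the mean of its items."""
    return mean(row[item] for item in items)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# One list of scores per sub-scale, in participant order.
scores = {
    name: [score_subscale(row, items) for row in pilot]
    for name, items in SUBSCALES.items()
}

# (a) Correlation between two MSLQ variables.
r = pearson_r(scores["help_seeking"], scores["test_anxiety"])

# (b) Group difference: mean test anxiety by demographic group.
group_means = {
    group: mean(
        s for row, s in zip(pilot, scores["test_anxiety"]) if row["esl"] == group
    )
    for group in ("yes", "no")
}

print(f"r = {r:.2f}")
print(group_means)
```

A real analysis would add an inferential test for each hypothesis; this only shows how the chosen variables map onto the two required analyses.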


The pre-registration process requires students to answer five questions, which include specifying their hypotheses, detailing which statistical tests they will use, setting out exclusion criteria and outlier removal techniques, and presenting a power analysis to estimate the minimum sample size needed. Importantly, students must justify their decisions with reference to the literature. To scaffold the exercise, students are provided with pilot data to help them write reproducible code to perform their planned analyses. The pre-registration is submitted as a reproducible R Markdown document that contains both their answers to the questions and their analysis code.
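To make the power analysis step concrete, here is a minimal sketch of estimating the minimum sample size for a correlation hypothesis. It uses the standard Fisher z approximation and is written in Python for illustration, whereas students on the course work in R. The function name `required_n_correlation` and the default alpha and power values are assumptions for this example, not part of the assessment materials. Dedicated power software may give an answer that differs by a participant or two because it uses exact methods.

```python
from math import atanh, ceil
from statistics import NormalDist


def required_n_correlation(r, alpha=0.05, power=0.80):
    """Approximate minimum N to detect a correlation of size r.

    Fisher z approximation for a two-sided test:
        n = ((z_(1 - alpha/2) + z_power) / atanh(|r|))^2 + 3
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    return ceil(((z_alpha + z_power) / atanh(abs(r))) ** 2 + 3)


# Smallest effect size of interest, which students would justify
# with reference to the literature.
print(required_n_correlation(0.3))
```

Stating the smallest effect size of interest as an explicit argument mirrors the requirement that each analytic decision be written down and justified before the data are analysed.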

To avoid issues with student projects generating research waste with multiple small samples (Button, 2018), data collection is done through the same survey, with students simply selecting their variables of interest from the larger dataset. Using this structure allows for flexibility in the designs students can choose for their report whilst providing datasets with several thousand participants, helping to reinforce discussions about power that they have encountered during lectures. Whilst we have used the MSLQ, any questionnaire with multiple sub-scales could be adapted to this structure.


The full write-up takes the form of a standard quantitative report. Crucially, students must highlight any deviations from the pre-registration in their results section. Such deviations can occur following feedback from the pre-registration assessment, or simply because students have changed their mind. They are encouraged to explore their data with the ethos “pre-registration is a plan not a prison” and the emphasis is on transparency and clearly delineating exploratory from confirmatory analyses.


Why use it?

Pre-registration is an excellent way to scaffold the complexity of research, particularly if this is the first research project students have ever conducted. The questions force them to think through each stage of the design and analysis, putting emphasis on asking good questions and planning. Using pilot data helps with conceptualising the analysis and makes the link between hypotheses and statistical analysis more concrete, which can be difficult for novice learners.

The requirement to justify their analytic decisions involves engaging with the wider research methods literature. As such, the process of researching their analytic choices helps them understand that there is often not one ‘correct’ answer and that there are multiple possible approaches (Silberzahn et al., 2018). In addition, this has a wider impact as it expands their understanding of research and science as a messy, subjective process and moves away from the idea that statistics and quantitative research are inherently objective. Although students could be required to report their analyses transparently in a traditional report without a pre-registration, diverging from their plan turns the abstract notion of exploratory vs confirmatory into something concrete. That they are allowed and encouraged to change their plan following feedback helps to reinforce the idea that it is okay to make mistakes and that, in the end, what matters is transparency, not the p-value.


More broadly, the pre-registration provides a milestone during what would otherwise be a large and daunting project (such a milestone can also be useful for even larger dissertation projects; Creaven et al., 2021; Pownall, 2020). The course runs over 10 weeks and students progressively unlock the knowledge and skills they need to answer the pre-registration questions over that period. As well as breaking the process into small, manageable steps, this also provides a concrete indication of how much their skills have developed.


Finally, the pre-registration highlights the importance of groupwork and team science. There is increasing recognition that part of the problem psychology has faced as a science can be attributed to the idea that each researcher needs to be an expert in every part of the research process, rather than acknowledging that not everyone is good at everything and it is through working together with a diversity of opinion, experience, and expertise that the best work is achieved.


Student perspective

“Learning about the replication crisis sold the idea of pre-registration to me, and having to write a pre-registration provided me with many skills, which have benefited me throughout the course. For example, being forced to rationalise the use of statistical tests and plan for violation of their assumptions improved my understanding of statistics and made me more comfortable with dipping into the statistics literature when necessary. Furthermore, doing RM1 and the pre-registration assessment led me to make the choice to pre-register my dissertation project. This forced me to think carefully about what sort of questions I could answer with the statistics I'm using, which meant that when I started writing my introduction, I could be more precise about the theory and findings I included. This saved time reading and writing, and hopefully resulted in a more concise piece of writing.” (anonymous student)


References

Brown, S., & Knight, P. (2012). Assessing learners in higher education. Routledge.

Button, K. (2018). Reboot undergraduate courses for reproducibility. Nature, 561(7723), 287-288.

Button, K. S., Chambers, C. D., Lawrence, N., & Munafò, M. R. (2020). Grassroots training for reproducible science: a consortium-based approach to the empirical dissertation. Psychology Learning & Teaching, 19(1), 77-90.

Chopik, W. J., Bremner, R. H., Defever, A. M., & Keller, V. N. (2018). How (and whether) to teach undergraduates about the replication crisis in psychological science. Teaching of Psychology, 45(2), 158-163.

Creaven, A., Button, K. S., Woods, H., & Nordmann, E. (2021). Maximising the educational and research value of the undergraduate dissertation in psychology. Preprint. https://doi.org/10.31234/osf.io/deh93

Pintrich, P. R., Smith, D. A., Garcia, T., & McKeachie, W. J. (1993). Reliability and predictive validity of the Motivated Strategies for Learning Questionnaire (MSLQ). Educational and Psychological Measurement, 53(3), 801-813.

Pownall, M. (2020). Pre-Registration in the Undergraduate Dissertation: A Critical Discussion. Psychology Teaching Review, 26(1), 71-76.

Rust, C., O’Donovan, B., & Price, M. (2005). A social constructivist assessment process model: how the research literature shows us this could be best practice. Assessment & Evaluation in Higher Education, 30(3), 231-240.

Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., ... & Nosek, B. A. (2018). Many analysts, one data set: Making transparent how variations in analytic choices affect results. Advances in Methods and Practices in Psychological Science, 1(3), 337-356.