Written in Livemark
(2022-03-31 09:09)

Methodology

This is a mixed-methods research project that aims to combine qualitative interviews and short response surveys with quantitative data analysis. It is also ‘research on research’ or meta-research, and therefore draws on scientometric and bibliometric analysis throughout.

Rather than begin by speaking to researchers or launching into a scientometric analysis of scholarly research, our approach was to speak first to funders and supporters of research related to data, its value, management, uses and impacts. Given the central role funders play in the research ecosystem, it seemed reasonable to investigate who and what they fund, and for what reasons.

Because of the emerging and varied nature of the field of data-related research, we avoided rigidly or precisely defining what we mean by ‘data-related research’ at the start of the project. Instead, we worked from the ground up – letting the topics, methods and theoretical framings that make up the field emerge from the analysis rather than pre-defining its boundaries. We believed this would enable us to cast our net as widely as possible and remain open to including areas of research in our landscape that we might not have anticipated.

Research questions

The main questions we wanted to answer through this research were:

These are broad questions that we were unlikely to answer within a single, short research project, so we also set ourselves a series of more targeted subquestions, which we have focused on answering through our qualitative and quantitative research.

1. Who is funding research related to data, its value, management, uses and impacts?
   1. In the UK, who are the major funders of data-related research?
   2. How much funding do they direct toward data-related research?
   3. Has the amount of funding directed toward data-related topics increased/decreased over the last decade?
2. Which research organisations are being awarded funding for data-related research?
   1. How much funding are they being awarded?
3. Where is research related to data being published?
   1. Is it possible to connect research publications to their source of funding?
4. What were the top data-related articles published in recent years?

Qualitative research

For the qualitative strand of this project we interviewed 19 people from 13 different funding organisations – six philanthropic funders and seven UK Research and Innovation research councils.

Before each interview we conducted desk research on the funder to better understand its core goals and mission, its governance structure and its current projects and funding priorities.

The interviews were discursive and informal, but in general we asked questions such as:

Quantitative research

Following the interviews, we set out to quantitatively analyse: who is funding data-related research; who is conducting data-related research; where is data-related research published; and what the impact of that research has been. We identified three databases that contain the information necessary to help us answer these questions.

Although we adopted a grounded approach and were keen to let the contours of the data-related landscape emerge naturally from our research, once we began our quantitative research we needed to select an initial list of keywords in order to begin searching our chosen databases for data-related funding and research. We chose to build on the Open Data Institute’s (ODI’s) landscape review ‘Data 2020’, since the 10 topics it discusses represent a useful initial sketch of the data-related landscape over recent years.

We know this is not the full extent of the field and intend to expand this initial list as we conduct further research in order to add adjacent keywords and areas of study. The 10 keywords are:

The list is a mix of: general terms such as ‘data sharing’ and ‘misinformation’; fairly specific terms with relatively high usage such as ‘digital economy’ and ‘data ethics’; and fairly new or emerging terms with potentially lower usage such as ‘data rights’ and ‘digital trade’. As will be seen, having a mixture of terms has given us insights into the types of terms and keywords that are easiest to track and analyse within this space.
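As a rough illustration of how a keyword list like this can be used to pre-filter records before manual review, the sketch below flags which keywords appear in a project or publication description. It is not the procedure used in this project: the function names are hypothetical, and only the six keywords named above are included rather than the full list of 10. A match only marks a candidate; deciding whether an item genuinely engages with data still needs a human reader.

```python
import re

# Assumption: only the six keywords quoted in the text, not the full list of 10.
KEYWORDS = [
    "data sharing",
    "misinformation",
    "digital economy",
    "data ethics",
    "data rights",
    "digital trade",
]


def keyword_pattern(keyword):
    """Build a case-insensitive, whole-phrase pattern for one keyword.

    Words are joined with \\s+ so that extra spaces or line breaks
    inside a phrase do not prevent a match.
    """
    words = [re.escape(word) for word in keyword.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


def matched_keywords(text, keywords=KEYWORDS):
    """Return the sorted list of keywords that occur in `text`."""
    return sorted(k for k in keywords if keyword_pattern(k).search(text))


if __name__ == "__main__":
    description = "A project on Data  Sharing and data ethics in health research"
    print(matched_keywords(description))
```

A description that only mentions ‘data’ in passing, such as "We collected survey data from 200 people", matches none of the phrases, which is the kind of item the keyword search is meant to leave out.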

We focused our analysis on the years 2020–2022 to enable us to explore whether the topics outlined in ‘Data 2020’ were indeed as relevant and important as predicted. (This research is ongoing, however, and will not be discussed as part of this research report.) In addition, given the short timeframe of this project, focusing on two years of funding and research helped us strike the right balance between depth and breadth. The amount of time and labour needed to clean and cull a dataset over a larger timespan was beyond the bounds of this project.

To further limit the amount of time and labour required during this phase, we decided to focus on UK funders. More specifically, we exclusively analysed UK funders listed on 360Giving and UK government research councils on Gateway to Research. On the Lens, we used the ‘Institution Country/Region’ filter to focus our attention on articles written by people working in UK institutions. (Note: the author’s institution and country code do not appear in the downloaded dataset from the Lens.)

Once we had conducted a search along the parameters outlined above, we downloaded, cleaned, culled and analysed the resulting datasets. The typical process was:

These steps were time-consuming and labour-intensive, as they involved reading and examining thousands of project and publication descriptions. The distinction between an item that was data-related research and one that merely mentioned data was often subtle. Even the term ‘data’ makes research in this area difficult: most research will in one way or another collect, analyse or publish ‘data’, so a large percentage of research publications include the word. However, only some of these publications actually engage with data as a concept, examine its value or impacts, or discuss how to manage or use it. Sifting through the mentions of data to find those that are relevant is therefore slow work. As a rough estimate, we labelled about 30 items per hour. With 1,642 items in total, that works out at roughly 55 hours, so we estimate we spent over 50 person-hours labelling these datasets.
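The labelling estimate can be checked with a short calculation. The sketch below uses only the two figures given in the text (1,642 items and about 30 items per hour); the function name is hypothetical, and the rate is an approximation, so the result is indicative rather than exact.

```python
ITEMS_TOTAL = 1642    # items labelled, as reported in the text
ITEMS_PER_HOUR = 30   # approximate labelling rate, as reported in the text


def labelling_hours(items, rate_per_hour):
    """Estimate person-hours needed to label `items` at `rate_per_hour`."""
    if rate_per_hour <= 0:
        raise ValueError("rate_per_hour must be positive")
    return items / rate_per_hour


hours = labelling_hours(ITEMS_TOTAL, ITEMS_PER_HOUR)

if __name__ == "__main__":
    print(f"{hours:.1f} person-hours")  # prints "54.7 person-hours"
```

The same function gives a quick way to see why a longer timespan was out of scope: doubling the number of items at the same rate doubles the labelling time.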

The datasets come with some caveats.

Our findings are outlined below. We have also summarised our findings in a slide deck which you can find here.

A study of the emerging field of data-related research