Written in Livemark
(2022-04-27 13:37)

Discussion and future work

Although these are only initial results, we believe they demonstrate the value of this type of research. There is value in tracking government-funded academic research through systems like 360Giving, Gateway to Research and The Lens Scholarly Works, and it is indeed possible to map parts of the landscape of data-related research. With further work, it should be possible to provide a more detailed overview of funding and research related to data, its value, management, uses and impacts.

In particular, we believe that the tables and datasets produced during this research project can help people interested in data-related research topics to search for and locate relevant materials more efficiently and effectively. The tables and datasets can be filtered to highlight the articles and funding schemes that people are interested in. For instance, someone interested in data ethics would be able to:

This will help:

Gaps in the evidence base

It is important to note, however, that these results are based on an extremely small section of the entire landscape of data-related research, in large part because there are major gaps in the evidence base. So long as these gaps remain unfilled, it will only be possible to map parts of the data-related research landscape; a comprehensive view of the field will remain out of reach.

Over the coming years, the ODI’s Evidence & Foresight programme will continue mapping the emerging field of data-related research and help fill in these large gaps in the evidence base. This will involve working to convene interested people and organisations to understand the challenges better and co-develop strategies to address them.

Non-academic ‘grey literature’

One large gap in the evidence base is related to research conducted outside of academia and published outside peer-reviewed journals or academic conferences.

The systems set up to track funding and research are primarily focused on peer-reviewed academic research and do not tend to capture non-academic ‘grey literature’, for example, project reports, technical evaluations and white papers produced by third sector research institutes, government agencies, industrial firms, consultancies and the media. They also miss huge areas of non-literature research outcomes, like datasets, software tools, conferences, outreach programmes, spin-outs, patents and artistic works.

This is a problem because a lot of thought leadership, innovation and agenda-setting happens outside of academia. Focusing on government-funded academic research can lead to a distorted view of how innovative ideas are developed into impactful services and products.

There are a few notable attempts to fill this gap in the evidence base and connect the academic and non-academic research worlds, such as Altmetric and Overton, and some databases like Dimensions have begun capturing reports from large think-tanks and non-governmental organisations. Expanding the types of documents captured by indexing databases should help to fill some of the gaps in the evidence base around non-academic research.

Research published by non-academic sources, however, is currently much more difficult to track than academic research, and its volume may dwarf the number of articles published within academia each year. Mapping this section of the data-related research landscape is not a problem that a few organisations working in relative isolation can solve.

In the coming years we intend to explore ways of working in collaboration with academic and non-academic organisations to connect these two parts of the landscape of data-related research. For instance, we could host events that connect academic and non-academic researchers working on similar data-related topics, or convene academic and non-academic organisations to explore options for adopting and adapting the schemas, standards and systems already used within academia. This might involve co-creating a metadata standard for reports and other non-academic research outputs, which could in turn lead to search and discovery tools for grey literature that are similar in function to those we have used to track academic research.

Open research and open science

Another gap that needs to be addressed is related to open research and open science.

Many of the largest and best-curated databases for tracking funding and research, such as Web of Science and Scopus, exist behind paywalls. Most universities and large research organisations pay for access to at least one of these databases, but for many people and smaller research organisations, these costs are prohibitive.

Supporting openly accessible databases like the Lens and OpenAlex should help make it possible for more people and organisations to conduct this type of research. OpenAlex, in particular, promises to be a useful resource for researchers. It is an ‘index of hundreds of millions of interconnected entities across the global research system’ that is free and open source. At the time of writing, the API is up and running and the web interface is scheduled for launch in April 2022.
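As an illustration of what working with an open index involves, the sketch below builds a search URL for the OpenAlex works endpoint and fetches one page of results using only the Python standard library. It assumes the public endpoint `https://api.openalex.org/works` and its `search`, `filter` and `per-page` query parameters as we understand them from the OpenAlex documentation; the function names are hypothetical, and the current documentation should be checked before relying on any of this.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Public OpenAlex endpoint for scholarly works (assumed; check the current docs)
OPENALEX_WORKS = "https://api.openalex.org/works"


def build_works_query(term, year=None, per_page=25):
    """Build a URL for a full-text search of OpenAlex works (hypothetical helper)."""
    params = {"search": term, "per-page": per_page}
    if year is not None:
        # OpenAlex filters are written as attribute:value pairs
        params["filter"] = f"publication_year:{year}"
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


def fetch_works(url):
    """Fetch one page of results; list responses carry works in a 'results' list."""
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    return payload["results"]
```

For example, `fetch_works(build_works_query("data stewardship", year=2021))` would return up to 25 matching works as dictionaries, assuming network access and that the response shape has not changed.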

We would also like to see more paid services offering programmes like the one run by Dimensions, which provides free access to its database for non-commercial scientometric research projects. Unfortunately, its requirement that those granted free access must publish their results ‘in a peer-reviewed journal, or share them at a scientific conference’ once again draws an unhelpful distinction between academic and non-academic research, potentially precluding some non-academic researchers and organisations from gaining access to a valuable resource.

Open and standardised funding information

A third major gap in the evidence base is related to funding and grants information.

There is only limited information about funding and grants in the databases that exist to track funding and research, and this is true of both paid and open databases. This can lead to blind spots for funders and researchers, as well as duplicated effort.

The issue is not necessarily with these databases, but with the reporting of funding and grants information by funders and researchers. Government funding agencies are relatively good at publishing data about their funding portfolios, and some philanthropic funders like Wellcome publish funding data in standard formats that can be ingested into research databases like 360Giving and OpenAIRE. But on the whole, reliable funding data is difficult to come by.

More widespread adoption of things like the 360Giving Data Standard by funders would help to fill in some of these gaps, as would greater commitment by researchers and research organisations to include funding details in their publications – not just in academia, but across the rest of the data-related research landscape as well.
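To illustrate what publishing funding data in a standard format means in practice, the sketch below checks a single grant record against what we understand to be the core required fields of the 360Giving Data Standard in its spreadsheet form. The field list is our reading of the standard rather than an authoritative copy, the example record is invented, and the function name is hypothetical; the official 360Giving schema remains the reference.

```python
# Core required columns of the 360Giving spreadsheet format, as we
# understand them; check the official schema before relying on this list.
REQUIRED_FIELDS = (
    "Identifier",
    "Title",
    "Description",
    "Currency",
    "Amount Awarded",
    "Award Date",
    "Recipient Org:Identifier",
    "Recipient Org:Name",
    "Funding Org:Identifier",
    "Funding Org:Name",
)


def missing_fields(grant):
    """Return the required fields that are absent or blank in a grant record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = grant.get(field)
        # Treat absent, None and whitespace-only values as missing
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# An invented example record with an empty description
grant = {
    "Identifier": "360G-example-0001",
    "Title": "Mapping data-related research",
    "Description": "",
    "Currency": "GBP",
    "Amount Awarded": 25000,
    "Award Date": "2021-04-01",
    "Recipient Org:Identifier": "GB-EXAMPLE-1",
    "Recipient Org:Name": "Example Research Institute",
    "Funding Org:Identifier": "GB-EXAMPLE-2",
    "Funding Org:Name": "Example Foundation",
}
print(missing_fields(grant))  # prints ['Description']
```

A check like this is only a first step: the same record would also need consistent identifiers for funders and recipients before it could be reliably linked to the publications it funded.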

Questions for further research

In addition to working to address the gaps in the evidence base outlined above, we will spend the next year working to answer a series of further questions that this research project has surfaced.

The future of data-related research

One question we are interested in exploring is what the future of data-related research looks like. First, we want to talk to more people to see if they agree with our assessment that there is an emerging field of data-related research. Based on our initial research, there are others who agree that there is a field emerging around data-related topics, but we want to confirm this. We will seek to do this via a combination of interviews, surveys and further scientometric and bibliometric analysis. If our research confirms that a field is indeed emerging, that raises an interesting question of whether that field should remain spread across a range of academic disciplines and areas of inquiry, or whether it would benefit from becoming a separate, new field.

As we showed in the quantitative findings section, data-related research is currently being conducted in fields as diverse as mathematics, computer science, geography, biology, medicine, political science, law, media studies, economics and psychology. If the field remains an orthogonal, cross-domain topical thread, then it will be imperative to ensure that research findings in those disparate areas of inquiry can be transmitted easily across the network and that people are able to connect to exchange methods, theory, advice and even peer review.

In particular, it would be important to ensure that people conducting data-related research within their home domain are supported with the knowledge, theory and skills specifically needed to explore questions related to data, its value, management, uses and impacts.

On the other hand, if the field were to become a separate area of inquiry, then it would be important to ensure that those working in the field utilise as many different methods and perspectives as possible when exploring questions related to data. Indeed, one benefit of data-related research remaining as a cross-domain research topic might be that it would help ensure that the field does not pursue a narrow set of data-related topics from a few theoretical and methodological perspectives. Since data is seemingly everywhere, it may make sense to ensure that research about data is conducted everywhere by the people who understand those domains.

Over the coming years our aim is to convene and collaborate with people and organisations across this emerging field in order to understand their view of the future of data-related research.

Search and discovery tools

Another question we are interested in exploring is whether it is possible to create search and discovery tools or services that would be useful to people interested in data-related topics. For instance, we hope to experiment with and trial different tools that can help audiences locate relevant data-related research more easily. During our work for this project, we found that it is time-consuming to sift through research outputs and funding descriptions to find those that are about data, its value, management, uses and impacts, as opposed to those that merely mention or refer to data. Conveniently, identifying data-related research is a task well suited to natural-language processing and machine learning. Given the structured datasets and our hand-labelled datasets, it may be possible to create a machine-learning classifier to identify data-related publications or projects. Because judging whether an output is data-related involves nuance, it is unlikely that any machine-learning system could achieve perfect accuracy, but one could become good enough to produce relevant lists and better search results. Our initial, quick experiment with creating this classifier achieved 70–80% accuracy when compared to our hand-labelled dataset.
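The report does not describe how the initial classifier was built. Purely as an illustration of the general approach, the sketch below trains a multinomial naive Bayes classifier (a simple, swapped-in technique, not necessarily the one used in this project) on hand-labelled titles, then measures accuracy as the share of predictions that agree with held-out hand labels, which is the kind of comparison the 70–80% figure refers to. The class and function names and the toy examples are hypothetical and are not project data.

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenise(text))
        # Vocabulary across all classes, used for smoothing
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the smoothed log likelihood of each known token
            score = math.log(self.doc_counts[c] / n_docs)
            denominator = self.totals[c] + len(self.vocab)
            for token in tokenise(text):
                if token in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[c][token] + 1) / denominator)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def accuracy(predicted, gold):
    """Share of predictions that agree with the hand labels."""
    if not gold:
        raise ValueError("no hand-labelled examples to compare against")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Invented toy examples: True means "about data", False means it is not
train_texts = [
    "governance of personal data sharing",
    "data stewardship and data ethics",
    "measuring rainfall with satellites",
    "protein folding simulation results",
]
train_labels = [True, True, False, False]
model = NaiveBayes().fit(train_texts, train_labels)

held_out = ["ethics of data sharing", "satellite rainfall measurement"]
hand_labels = [True, False]
predictions = [model.predict(t) for t in held_out]
print(accuracy(predictions, hand_labels))  # prints 1.0
```

Naive bayes is shown only because it is short and transparent; a real system would need far more labelled examples and a held-out set large enough for the accuracy figure to mean something.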

Over the next year we plan to build on this tool, with the aim of extending its scope beyond our initial 10 data-related keywords and running tens, or even hundreds, more keywords through this machine-learning system. One design question we intend to investigate is which formats are best suited to presenting these lists and results, depending on the needs of different parties. It could be an automated email newsletter, a site with a searchable archive, an analytics dashboard, or all of these.
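Scaling from 10 keywords to tens or hundreds mostly means running a cheap first pass that flags candidate documents for a machine-learning system or a human to review. A minimal sketch, assuming plain case-insensitive whole-word matching (the keyword list, document IDs and function names are hypothetical), might look like this:

```python
import re


def compile_keywords(keywords):
    """Compile one case-insensitive, whole-word pattern per keyword."""
    # \b anchors stop "data" from matching inside words such as "database"
    return {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }


def match_keywords(text, patterns):
    """Return the keywords that occur in the text, in alphabetical order."""
    return sorted(kw for kw, pattern in patterns.items() if pattern.search(text))


def flag_candidates(documents, patterns):
    """Keep only documents that match at least one keyword, with their matches."""
    flagged = {}
    for doc_id, text in documents.items():
        matches = match_keywords(text, patterns)
        if matches:
            flagged[doc_id] = matches
    return flagged


patterns = compile_keywords(["data", "open data", "data ethics"])  # hypothetical list
docs = {
    "a": "Open data and the digital economy",
    "b": "A database of rainfall records",
}
print(flag_candidates(docs, patterns))  # prints {'a': ['data', 'open data']}
```

Whether "database" should count as a match for "data" is itself a judgement call; whole-word matching is simply the stricter default, and the keyword list would need to decide case by case.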

We intend to test this proposition with interested parties over the coming year, so please contact us if you are interested in taking part.

Different methods and sources of data

At the ODI we’re interested in how we can catalogue, manage and provide access to a body of evidence, knowledge and thought about data and data-related topics, including the value of data, its management, uses and impacts. This report looking into data-related research has been the first step towards understanding this landscape, and over the next few years we will be expanding our approach to how we collect, manage and provide access to evidence about data-related topics, as well as our thinking around what constitutes evidence itself in this space.

This process starts at home, where we will be taking stock of the body of knowledge that the ODI has produced over the last 10 years. Given the gaps in our understanding of how to collate evidence about ‘grey’ or non-academic literature, examining our own outputs will give us a chance to learn how to share our own non-academic work in ways that make it more accessible to others interested in this space.

To continue on from this project, we want to cast our net wider. We focused on three databases in this project, but came across a number of other aggregators of research and grants information in the process. We would like to include other databases, such as arXiv, OpenAlex and Dimensions, to bring in more potentially relevant publications. In parallel with our intention to develop improved search and discovery tools, we hope to expand our set of search terms and bring in more publications to paint a more detailed picture of the landscape, while cutting down the manual cleaning and labelling that accounted for a sizable chunk of this project.
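Adding more sources means the same publication will often arrive more than once, which is one place where manual cleaning could be cut down. The sketch below merges records from several sources by normalised DOI and records which sources each publication came from. It assumes each record is a dictionary with an optional `doi` key; the record layout, source names and function names are hypothetical. DOIs are case-insensitive, and some sources express them as `https://doi.org/...` URLs, so both forms are normalised before comparison.

```python
import re


def normalise_doi(doi):
    """Lower-case a DOI and strip common URL or 'doi:' prefixes."""
    if not doi:
        return None
    doi = doi.strip().lower()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)
    return doi or None


def merge_records(*sources):
    """Merge (source_name, records) pairs, de-duplicating by normalised DOI."""
    merged = {}
    without_doi = []
    for source_name, records in sources:
        for record in records:
            key = normalise_doi(record.get("doi"))
            if key is None:
                # No DOI: keep the record as-is for later manual review
                without_doi.append({**record, "sources": [source_name]})
            elif key in merged:
                merged[key]["sources"].append(source_name)
            else:
                merged[key] = {**record, "doi": key, "sources": [source_name]}
    return list(merged.values()) + without_doi


lens = [{"doi": "10.1234/ABC", "title": "X"}]
openalex = [
    {"doi": "https://doi.org/10.1234/abc", "title": "X"},
    {"doi": None, "title": "Y"},
]
for record in merge_records(("lens", lens), ("openalex", openalex)):
    print(record["doi"], record["sources"])
```

Records without a DOI are passed through unmerged; matching those would need another strategy, such as comparing normalised titles, followed by manual review.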

Similarly, we want to expand our thinking about evidence through other areas of inquiry and research methods. Activities like ecosystem mapping can help to build a picture of the organisations and researchers creating and contributing evidence in this space. Identifying and analysing alternative sources of data, like social media or indexes of podcasts, can help to identify live research and communication on data-related topics, providing a more current view of the landscape. And focusing on some of the knottier questions, like what constitutes evidence and what types of evidence are deemed robust, can help us better understand the full breadth of the landscape, ensuring our observations of the field encompass the full diversity of contributions.

— — —

We are excited to continue working in this area to fill in the gaps identified above and help map and connect the far-flung field of data-related research. We welcome any and all help with these challenges, as well as insight into any challenges we have not yet identified. Please let us know if you are interested in joining us on this journey.

A study of the emerging field of data-related research