Good data management begins at the very start of a research project with developing a Data Management Plan (DMP).
BU and many funders have made this a requirement.
What is a Data Management Plan?
It is a summary, usually one to two pages long, explaining how data will be managed throughout the research project. It will address issues such as:
What are the benefits of writing a Data Management Plan?
Data management needs to be planned early to ensure the efficient production of high-quality data:
Essentially, it is much easier to do things correctly from the beginning, and much more costly to make retrospective changes!
Where can I find guidance?
Informed participant consent is required to enable research data to be deposited in BORDaR (or any other repository). This includes anonymised data.
The Participant Information Sheet should set out whether research data will be deposited in a data repository and explain the benefits of doing so (e.g. available for future research).
The Agreement Form should include specific consent for the data to be archived.
The relevant sections are included in BU’s Participant Information Sheet and Agreement Form templates. The wording can be amended depending on the project.
High-risk data should be noted in the Data Management Plan along with the safeguards that will be in place to mitigate any risks.
The Online Ethics Checklist will identify any high-risk data associated with the research. What constitutes high risk is explained on BU’s Research Ethics website, and there are examples on BU’s Information Classification webpage (along with how data should be handled depending on the level of risk).
If there are high risks associated with processing personal data, a Privacy Impact Assessment (PIA) should be completed. Support and advice on PIAs are provided by BU’s Data Protection Officer via firstname.lastname@example.org.
Sensitive data can still be used and shared for research purposes (without infringing legal and ethical requirements) if appropriate controls and safeguards are applied such as informed consent, secure storage and data handling, anonymisation and controlled access.
Ownership of intellectual property rights (IPR) to research data needs to be documented from the start.
This applies to IPR arising from primary research and/or data derived from third-party sources. Intellectual property issues can then be dealt with more efficiently during the research project.
IPR and primary data
Read BU’s Intellectual Property Policy (2020) for the policies and procedures applied to all intellectual property created during activity carried out by BU staff or students as part of research.
Where research is being carried out in collaboration with other academic institutions or with commercial partners, ownership rights may be shared or apply separately to different datasets. It should be clear from the start who owns what, and how the data can be used and shared. Please consult Legal Services during project planning to ensure appropriate agreements are in place.
IPR and third-party data
The IPR for data derived from third-party sources (e.g. social media sites such as Twitter or an existing dataset from a commercial database) will likely belong to third parties, not to the researcher using the data. To use the data, it is likely that suitable permission will be needed from the data owner. Permission might be given in the form of a licence or in the terms and conditions published by the data owner. Written permission may need to be sought from the owner.
If using data from third-party sources, it is important to gain explicit permission from the data owner as soon as possible. Trying to do this retrospectively could take longer (especially if the data owner is an individual researcher who has moved on and become more difficult to contact). It would also be very costly to find out too late that the data cannot be used.
Some funders require references to relevant legislation in the Data Management Plan. Researchers should be familiar with the relevant legislation and ensure that their research is compliant.
Data Protection Act 2018 (UK implementation of GDPR)
Duty of confidentiality
Freedom of Information
The Freedom of Information Act 2000 gives members of the public the right to request information held by public authorities, including universities. This can include research data.
Mental Capacity Act 2005
Statistics and Registration Services Act 2007
BU’s Information Classification webpage specifies how data should be handled and stored depending on the classification assigned to it.
OneDrive for Business
OneDrive for Business is appropriate for solo projects or those with a limited number of collaborators. Staff are allocated 1TB of storage in OneDrive for Business. OneDrive is not appropriate for projects which will generate large files. In that case, IT should be consulted as early as possible to identify alternative storage solutions and to estimate costs.
SharePoint
SharePoint is more appropriate for projects with many collaborators. SharePoint is not appropriate for projects which will generate large files. In that case, IT should be consulted as early as possible to identify alternative storage solutions and to estimate costs. IT can also advise on how best to set up SharePoint for individual research projects.
H and I-Drive
The H-drive and the I-drive should not be used to store research data because they lack versioning functionality and file recovery is more complicated. Please use OneDrive for Business or SharePoint instead.
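To judge early on whether a project is likely to fit within the 1TB OneDrive for Business allocation mentioned above, a rough estimate can be made from the expected number and size of files. The following is a minimal sketch in Python, not a BU tool. The file counts, file sizes, number of versions kept, safety margin and the function name `estimate_storage_gb` are all hypothetical, and 1TB is taken as 1000GB for simplicity.

```python
# Rough storage estimate for a research project (illustrative figures only).

ONEDRIVE_ALLOCATION_GB = 1000  # 1TB staff allocation, taken here as 1000GB


def estimate_storage_gb(file_groups, versions_kept=1, margin=0.2):
    """Return the estimated storage need in GB.

    file_groups: list of (number_of_files, average_size_in_mb) tuples.
    versions_kept: assumed number of copies retained per file.
    margin: assumed fractional headroom for unplanned growth.
    """
    total_mb = sum(count * size_mb for count, size_mb in file_groups)
    total_gb = total_mb * versions_kept / 1000
    return total_gb * (1 + margin)


# Hypothetical project: 200 interview recordings of about 500MB each
# and 5,000 documents of about 2MB each, with two versions kept.
project_files = [(200, 500), (5000, 2)]
needed_gb = estimate_storage_gb(project_files, versions_kept=2)

print(f"Estimated need: {needed_gb:.0f}GB")
print("Fits in OneDrive allocation:", needed_gb <= ONEDRIVE_ALLOCATION_GB)
```

If the estimate approaches or exceeds the allocation, that is a prompt to contact IT about alternative storage before the project starts.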
Metadata are standardised and structured descriptive labels applied to an object. A library catalogue, for example, is a database of metadata records describing the books in the library. Each record corresponds to a book on the shelves and is made up of individual metadata fields such as 'Title', 'Author' or 'Subject'. Just as a book would be hard to find without a catalogue record, research data would be very hard to find and access without its own metadata record. Metadata are much easier to collect and record if planned from the start of a research project. This is particularly the case for collaborative projects where consistency is required.
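The catalogue analogy can be made concrete with a small sketch in Python. It models a metadata record with the 'Title', 'Author' and 'Subject' fields named above and reports which fields have not yet been filled in, which is one simple way to keep records consistent across a team. The class `MetadataRecord`, the function `missing_fields` and the sample records are hypothetical and do not represent the BORDaR schema or any formal metadata standard.

```python
from dataclasses import dataclass, field, fields


@dataclass
class MetadataRecord:
    """One descriptive record; the fields mirror the catalogue example."""
    title: str = ""
    author: str = ""
    subject: list = field(default_factory=list)


def missing_fields(record):
    """Return the names of fields that are empty in the given record."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Hypothetical records for two datasets in a collaborative project.
records = [
    MetadataRecord(title="Interview transcripts", author="A. Researcher",
                   subject=["Oral history"]),
    MetadataRecord(title="Survey responses"),  # author and subject not yet recorded
]

for record in records:
    gaps = missing_fields(record)
    status = "complete" if not gaps else "missing: " + ", ".join(gaps)
    print(f"{record.title}: {status}")
```

Agreeing on the set of fields before data collection begins means every collaborator records the same information in the same way.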
Metadata requirements for BORDaR
The metadata elements required in BORDaR (BU's research data repository) are described in the 'Sharing your data' section of this guide.
Discipline-specific metadata standards
The Digital Curation Centre maintain a list of disciplinary metadata standards. These are standards which formalise the metadata specifications that academic communities consider important for research data to be findable, accessible, interoperable and reusable (FAIR). When choosing a metadata standard, consider which would be most appropriate for your area of research.
Some repositories specify the metadata standards which should be used to describe the hosted research data. The UK Data Service, for example, uses the Data Documentation Initiative (DDI) standard. If you plan on depositing your data with an external data repository, it is important to check whether adherence to a specific standard is required. The Registry of Research Data Repositories can be used to find suitable disciplinary repositories. Please be aware that some funders specify or provide guidance around which repository to use. The Digital Curation Centre provide a summary of funders' data policies.
Subject terms and keywords
Taxonomies and/or thesauri are structured, controlled vocabularies. They are used to improve the discoverability of research data, and many data repositories require subject headings (controlled) or keywords (uncontrolled) to be provided when the data is deposited. If depositing data in an external repository, check which scheme is being used. This will need to be noted in your Data Management Plan.
BORDaR does not currently use subject headings, though they could still be entered as keywords.
Subject headings can be found by using generic or discipline-specific schemes. A few examples are listed in the table below.
|Subject area|Scheme|Description|
|---|---|---|
|Generic|searchFAST|FAST subject headings are a simplified adaptation of the Library of Congress Subject Headings (LCSH).|
|Arts and Humanities|Art and Architecture Thesaurus (AAT)|Covering art, architecture and visual culture heritage.|
|Arts and Humanities|HASSET|The Humanities and Social Sciences Electronic Thesaurus, provided by the UK Data Service.|
|Health and Medicine|Medical Subject Headings (MeSH)|Produced by the National Library of Medicine. It is used for indexing, cataloguing, and searching of biomedical and health-related information.|
|Science and Technology|Heritage Data|Lists of schemes covering archaeology.|
|Science and Technology|HASSET|The Humanities and Social Sciences Electronic Thesaurus, provided by the UK Data Service.|
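The difference between controlled subject headings and uncontrolled keywords can be illustrated with a short Python sketch. It checks proposed terms against a controlled vocabulary: terms that match are treated as subject headings (using the vocabulary's preferred spelling), and the rest are kept as free keywords. The function `classify_terms` is hypothetical, and the sample vocabulary is made up for illustration rather than taken from any of the schemes listed above.

```python
def classify_terms(terms, controlled_vocabulary):
    """Split terms into controlled subject headings and free keywords.

    Matching ignores case and surrounding whitespace; the preferred
    spelling from the vocabulary is returned for matched terms.
    """
    lookup = {entry.lower(): entry for entry in controlled_vocabulary}
    subject_headings, keywords = [], []
    for term in terms:
        cleaned = term.strip()
        preferred = lookup.get(cleaned.lower())
        if preferred is not None:
            subject_headings.append(preferred)
        else:
            keywords.append(cleaned)
    return subject_headings, keywords


# Made-up sample vocabulary for illustration only.
sample_vocabulary = ["Archaeology", "Public health", "Visual culture"]
proposed_terms = ["archaeology ", "coastal erosion", "Public Health"]

headings, free_keywords = classify_terms(proposed_terms, sample_vocabulary)
print("Subject headings:", headings)
print("Keywords:", free_keywords)
```

In practice the vocabulary would come from whichever scheme the chosen repository requires, which is why that scheme should be noted in the Data Management Plan.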
Not all research data can be preserved in the long term. This can be due to the cost of data storage or the risk that 'unnecessary' data deposits could swamp the scholarly system, making relevant data considerably harder to find. Ethical or commercial considerations could also rule out publication, even on a restricted basis.
Cox and Verbaan (2018) list the kinds of questions that need to be asked to determine the significance of a dataset:
The data type may also have a bearing on what is retained in the long term. Observational data, for example, cannot be reproduced and is therefore unique. It would have a higher priority for preservation than the results of a relatively inexpensive experiment that could be reproduced with the right documentation. However, experimental data produced from a time-consuming and costly study would be a strong contender for preservation. It may also not be necessary to keep the output of a simulation, particularly if the file is very large. Instead, the code and documentation needed to re-run the simulation may be all that needs to be preserved.
Sensitive data can often be shared for research purposes (without infringing ethical and legal requirements) because appropriate controls and safeguards are applied, for example informed consent, anonymisation techniques and/or controlled access to data. However, there will be some cases where it is not appropriate to make research data publicly available in a data repository even though its use within the research project was legal and ethical. It may not be possible to anonymise or otherwise reduce the sensitivity of the final dataset, and the impact of making data publicly available is usually very different from the impact of the same data being used in a controlled way within a specific research project.
Restricted access refers to limits which can be imposed on accessing research data through a data repository, where public access is not appropriate for these reasons. These restrictions can be placed in both the short and long term.
Embargoes are short-term restrictions on access to research data. An embargo might be used, for example, to give researchers time to write up and publish their research before the data is made available. If an embargo is required, this should be stated in the Data Management Plan and a justification provided. The expectation is that research data should be deposited and made available no later than the published output is made available in BURO (BU's publication repository).
Controlled access places longer-term restrictions on who can access research data. Different repositories will have different levels of restrictions, so if you plan on depositing in an external repository, it is important to check what levels of access they offer. BORDaR (BU's data repository) does not currently offer controlled access, but this is being developed. If you plan on depositing data in BORDaR but believe that it would be inappropriate to publish the data without restrictions, please contact the research data management team.
Research data can either be deposited within BORDaR, BU's own research data repository, or with an external repository. It is important that the Data Management Plan clearly states which repository has been chosen. This is because different repositories have different requirements which need to be taken into account.
If depositing externally, a record will be created in BORDaR with a link to the dataset. The preference is for data to be hosted externally in a disciplinary repository. This is because the data is more likely to be found and used by members of the academic community if it is hosted on a repository used by members of that community. However, BORDaR is available, particularly when cost or suitability rules out other options.
The Registry of Research Data Repositories can be used to find suitable disciplinary repositories. Please be aware that some funders specify or provide guidance around which repository to use. The Digital Curation Centre provide a summary of funders' data policies.
Where research data are deposited to be available to others, such access must be subject to licence conditions. These should be consistent with the University’s and the funder’s legal, ethical and contractual requirements and the position with regard to ownership of intellectual property rights. Such licences may restrict use of the data to research or other non-commercial purposes and set requirements as to citation, attribution or acknowledgement.
Creative Commons licences are often applied to research publications and data. Visit copyrightuser.org for more details.
Roles and responsibilities for managing research data need to be assigned to individuals and not just presumed. This is crucial in ensuring that the research will be carried out in line with the Data Management Plan and all applicable requirements (e.g. funder requirements, BU policies, legal requirements). For internal BU research teams, roles and responsibilities can be documented in team plans and protocols. Assigning roles and responsibilities is particularly important for collaborative projects. You should seek advice from Legal Services to ensure that collaboration, data processing or data sharing agreements are put in place with external collaborators and service providers where appropriate.