Good data management begins at the very start of a research project with developing a Data Management Plan (DMP).
BU and many funders have made this a requirement.
What is a Data Management Plan?
It is a summary, usually one to two pages long, explaining how data will be managed throughout the research project. It will address issues such as:
What are the benefits of writing a Data Management Plan?
Data management needs to be planned early to ensure that high-quality data is produced efficiently:
Essentially, it is much easier to do things correctly from the beginning, and much more costly to make retrospective changes!
Where can I find guidance?
Participant consent for data sharing
Informed participant consent is required to enable research data to be deposited in BORDaR (or any other repository). This includes anonymised data.
High-risk data should be noted in the Data Management Plan along with the safeguards that will be in place to mitigate any risks.
Intellectual Property Rights
Ownership of intellectual property rights (IPR) to research data needs to be documented from the start.
IPR and primary data
IPR and third-party data
Data handling and storage
BU’s Information Classification webpage specifies how data should be handled and stored depending on the classification assigned to it.
OneDrive for Business
OneDrive for Business is appropriate for solo projects or those with a limited number of collaborators. Staff are allocated 1TB of storage in OneDrive for Business. OneDrive is not appropriate for projects which will generate large files. In those cases, IT should be consulted as early as possible to identify alternative storage solutions and to estimate costs.
SharePoint
SharePoint is more appropriate for projects with many collaborators. SharePoint is not appropriate for projects which will generate large files. In those cases, IT should be consulted as early as possible to identify alternative storage solutions and to estimate costs. IT can also advise on how best to set up SharePoint for individual research projects.
H-drive and I-drive
The H-drive and the I-drive should not be used to store research data, because they lack versioning functionality and file recovery is more complicated. Please use OneDrive for Business or SharePoint instead.
Metadata are standardised and structured descriptive labels applied to an object. A library catalogue, for example, is a database of metadata records describing the books in the library. Each record corresponds to a book on the shelves and is made up of individual metadata fields such as 'Title', 'Author' or 'Subject'. Just as a book would be hard to find without a catalogue record, research data would be very hard to find and access without its own metadata record. Metadata are much easier to collect and record if planned from the start of a research project. This is particularly the case for collaborative projects, where consistency is required.
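As a minimal sketch of the catalogue analogy above, a metadata record can be modelled as a small structure with a check that every contributor fills in the same fields. The field names below come from the library example rather than from any repository's actual requirements, and the record contents and the `missing_fields` helper are hypothetical.

```python
from dataclasses import dataclass, field, asdict

# Hypothetical required fields, borrowed from the library-catalogue analogy.
# A real repository defines its own required metadata elements.
REQUIRED_FIELDS = ("title", "author", "subject")


@dataclass
class MetadataRecord:
    title: str
    author: str
    subject: str
    keywords: list[str] = field(default_factory=list)


def missing_fields(record: MetadataRecord) -> list[str]:
    """Return the names of required fields that are empty or only whitespace."""
    values = asdict(record)
    return [name for name in REQUIRED_FIELDS if not values[name].strip()]


# Made-up example record with the author left blank.
record = MetadataRecord(
    title="Coastal bird survey 2023",
    author="",
    subject="Ornithology",
)
print(missing_fields(record))
```

Running a check like this whenever a dataset is added is one way a collaborative project could keep its metadata consistent from the start.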
Metadata requirements for BORDaR
The metadata elements required in BORDaR (BU's research data repository) are described in the 'Sharing your data' section of this guide.
Discipline-specific metadata standards
The Digital Curation Centre maintains a list of disciplinary metadata standards. These standards formalise the metadata specifications that academic communities consider important for research data to be findable, accessible, interoperable and reusable (FAIR). When choosing a metadata standard, consider which would be most appropriate for your area of research.
Some repositories specify the metadata standards which should be used to describe the research data they host. The UK Data Service, for example, uses the Data Documentation Initiative (DDI) standard. If you plan on depositing your data with an external data repository, it is important to check whether adherence to a specific standard is required. The Registry of Research Data Repositories can be used to find suitable disciplinary repositories. Please be aware that some funders specify or provide guidance on which repository to use. The Digital Curation Centre provides a summary of funders' data policies.
Subject terms and keywords
Taxonomies and/or thesauri are structured, controlled vocabularies. They are used to improve the discoverability of research data, and many data repositories require subject headings (controlled) or keywords (uncontrolled) to be provided when the data is deposited. If depositing data in an external repository, check which scheme is being used. This will need to be noted in your Data Management Plan.
BORDaR does not currently utilise subject headings, though they could still be entered as keywords.
Subject headings can be found by using generic or discipline-specific schemes. A few examples are listed in the table below.
|searchFAST||FAST subject headings are a simplified adaptation of the Library of Congress Subject Headings (LCSH).|
|Arts and Humanities|
|Art and Architecture Thesaurus (AAT)||Covering art, architecture and visual culture heritage.|
|HASSET||The Humanities and Social Sciences Electronic Thesaurus, provided by the UK Data Service.|
|Health and Medicine|
|Medical Subject Headings (MeSH)||Produced by the National Library of Medicine. It is used for indexing, cataloguing, and searching of biomedical and health-related information.|
|Science and Technology|
|Heritage Data||Lists of schemes covering archaeology.|
|HASSET||The Humanities and Social Sciences Electronic Thesaurus, provided by the UK Data Service.|
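The difference between controlled subject headings and uncontrolled keywords can be illustrated with a short sketch. The vocabulary below is a made-up stand-in, not an extract from any of the schemes listed above, and `split_terms` is a hypothetical helper.

```python
# Hypothetical controlled vocabulary. A real scheme is far larger and is
# published and maintained by its provider.
CONTROLLED_TERMS = {"climate change", "public health", "archaeology"}


def split_terms(terms):
    """Separate terms into controlled subject headings and free keywords."""
    headings, keywords = [], []
    for term in terms:
        normalised = term.strip().lower()
        if normalised in CONTROLLED_TERMS:
            # The term matches the scheme, so record it in its standard form.
            headings.append(normalised)
        else:
            # The term is not in the scheme, so it can only be a free keyword.
            keywords.append(term.strip())
    return headings, keywords


headings, keywords = split_terms(["Archaeology", "drone imagery ", "Public Health"])
print(headings)
print(keywords)
```

In a repository that does not use subject headings, both lists could simply be entered together as keywords.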
Selecting data for long-term preservation
Not all research data can be preserved in the long term. This can be due to the cost of data storage or the risk that 'unnecessary' data deposits could swamp the scholarly system, making relevant data considerably harder to find. Ethical or commercial considerations could also rule out publishing, even on a restricted basis.
Cox and Verbaan (2018) list the kinds of questions that need to be asked to determine the significance of a dataset:
The data type may also have a bearing on what is retained in the long term. Observational data, for example, cannot be reproduced and is therefore unique. It would have a higher priority for preservation than the results of a relatively inexpensive experiment that could be reproduced with the right documentation. However, experimental data produced by a time-consuming and costly study would be a strong contender for preservation. It may also not be necessary to keep the output of a simulation, particularly if the file is very large. Instead, the code and documentation needed to re-run the simulation may be all that is required for preservation.
Restricted access to sensitive data
Sensitive data can often be shared for research purposes without infringing ethical and legal requirements, provided that appropriate controls and safeguards are applied, for example obtaining informed consent, employing anonymisation techniques and/or controlling access to the data. However, there will be some cases where it is not appropriate to make research data publicly available in a data repository, even though its use within the research project was legal and ethical. It may not be possible to anonymise or otherwise reduce the sensitivity of the final dataset, and the impact of making data publicly available is usually very different from the impact of the same data being used in a controlled way within a specific research project.
Restricted access refers to limits which can be imposed on accessing research data through a data repository, where public access is not appropriate for these reasons. These restrictions can be placed in both the short and long term.
Embargoes are short-term restrictions on access to research data. An embargo, for example, might be used to give researchers time to write up and publish their research before the data is made available. If an embargo is required, this should be stated in the Data Management Plan and a justification provided. The expectation is that research data should be deposited and made available no later than the published output in BURO (BU's publication repository).
Controlled access places longer-term restrictions on who can access research data. Different repositories offer different levels of restriction, so if you plan on depositing in an external repository, it is important to check what levels of access it offers. BORDaR (BU's data repository) does not currently offer controlled access, but this is being developed. If you plan on depositing data in BORDaR but believe that it would be inappropriate to publish the data without restrictions, please contact the research data management team.
Where will the data be shared?
Research data can either be deposited within BORDaR, BU's own research data repository, or with an external repository. It is important that the Data Management Plan clearly states which repository has been chosen. This is because different repositories have different requirements which need to be taken into account.
If depositing externally, a record will be created in BORDaR with a link to the dataset. The preference is for data to be hosted externally in a disciplinary repository. This is because the data is more likely to be found and used by members of the academic community if it is hosted on a repository used by members of that community. However, BORDaR is available, particularly when cost or suitability rules out other options.
The Registry of Research Data Repositories can be used to find suitable disciplinary repositories. Please be aware that some funders specify or provide guidance on which repository to use. The Digital Curation Centre provides a summary of funders' data policies.
What licence will be applied to the data?
Where research data are deposited to be available to others, such access must be subject to licence conditions. These should be consistent with the University’s and the funder’s legal, ethical and contractual requirements and the position with regard to ownership of intellectual property rights. Such licences may restrict use of the data to research or other non-commercial purposes and set requirements as to citation, attribution or acknowledgement.
Creative Commons licences are often applied to research publications and data. Visit copyrightuser.org for more details.
Roles and responsibilities for managing research data need to be assigned to individuals and not just presumed. This is crucial in ensuring that the research will be carried out in line with the Data Management Plan and all applicable requirements (e.g. funder requirements, BU policies, legal requirements). For internal BU research teams, roles and responsibilities can be documented in team plans and protocols. Assigning roles and responsibilities is particularly important for collaborative projects. You should seek advice from Legal Services to ensure that collaboration, data processing or data sharing agreements are put in place with external collaborators and service providers where appropriate.