Data Collection
What types of data will you collect, create, link to, acquire and/or record?
This project will produce 2-, 3-, and 4-dimensional microscopy images of bacterial cells in the ND2 image format. Individual images will range from 1-10 MB in size.
The project will create software code in the Python programming language for the analysis of the microscopy image files. Generated analysis data include annotations, extracted image features, and AI-derived image representations.
What file formats will your data be collected in? Will these formats allow for data re-use, sharing and long-term access to the data?
ND2, TIFF, HDF5. ND2 format is preferred for raw image storage to maximize compatibility with acquisition software, and ND2 conversion to OME-TIFF format is readily achieved with open-source libraries as needed.
What conventions and procedures will you use to structure, name and version-control your files to help you and others better understand how your data are organized?
Experimental methods will be documented in electronic lab notebooks detailing the user and the procedures used to prepare and image samples.
Images will be saved in the native file type of the microscope, Nikon ND2. An unmodified, original version of each image file will be maintained on a centralized file server. Metadata will be stored directly in ND2 image files and will include a full specification of hardware components (objective, camera, light source, filters/cube) and settings (light intensity, exposure time, gain) involved in image collection. Each imaging experiment (which may comprise 10,000 or more separate files) will be located in a dedicated folder using a name with the form sampleID_YYYYMMDD_R#.ome, where R is the revision number. Each sample will be assigned a unique sampleID that will connect to information about the sample in the README.txt files included in each folder. Metadata files for each experiment will be generated according to Image Data Resource formatting standards.
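The folder naming convention above lends itself to automated checking during audits. The following is a minimal sketch, not the project's actual tooling: the function name parse_experiment_folder and the ExperimentFolder type are hypothetical, and the character set allowed in a sampleID (letters, digits, hyphens) is an assumption, since the plan does not specify it.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed pattern for sampleID_YYYYMMDD_R#.ome.
# The allowed sampleID characters are a guess; adjust to the real scheme.
FOLDER_PATTERN = re.compile(
    r"^(?P<sample_id>[A-Za-z0-9-]+)_(?P<date>\d{8})_R(?P<revision>\d+)\.ome$"
)


class ExperimentFolder(NamedTuple):
    sample_id: str
    date: datetime
    revision: int


def parse_experiment_folder(name: str) -> Optional[ExperimentFolder]:
    """Return the parsed parts of a folder name, or None if it does not conform."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        return None
    try:
        # Rejects impossible calendar dates such as 20240230.
        date = datetime.strptime(match.group("date"), "%Y%m%d")
    except ValueError:
        return None
    return ExperimentFolder(match.group("sample_id"), date, int(match.group("revision")))
```

A name that returns None would be flagged for correction; the sample ID used when calling the function (for example "CC1234") is illustrative only.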
HDF5 format files will link microscopy files from each experiment, computational analyses, annotations, metadata, and related experimental data.
Documentation and Metadata
What documentation will be needed for the data to be read and interpreted correctly in the future?
Experimental methods will be documented in electronic lab notebooks detailing the user and the procedures used to prepare and image samples. Metadata from primary microscopy experiments will include a full specification of hardware components (objective, camera, light source, filters/cube) and settings (light intensity, exposure time, gain) involved in image collection.
How will you make sure that documentation is created or captured consistently throughout your project?
A Data Manager will be tasked with communicating the Data Management Plan to all Team members, including conventions for data formatting and annotation. The Data Manager will consult with researchers at the onset of primary data collection and analysis, or will ensure consultation with such researchers by a properly trained lead user at each institution and/or lab group. The Data Manager or a designated representative will further perform periodic audits to verify correct data formatting and annotation, first immediately following any initial data collection or processing using new methods or by research groups new to the project, and subsequently on a semi-annual basis to ensure continued compliance.
If you are using a metadata standard and/or tools to document and describe your data, please list here.
Image acquisition metadata will be stored directly with images in ND2 and TIFF files. Experimental metadata will conform to Image Data Resource formatting standards. HDF5 format will link raw microscopy files from each experiment, computational analyses, annotations, metadata, and related experimental data.
Storage and Backup
What are the anticipated storage requirements for your project, in terms of storage space (in megabytes, gigabytes, terabytes, etc.) and the length of time you will be storing it?
Approximately 200 TB of data will be gathered and preserved for at least 3 years past project completion.
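For a sense of scale, the storage figure can be related to the stated image size of 1-10 MB. The sketch below is illustrative arithmetic only: it assumes decimal units and assumes the entire 200 TB holds raw images, which the plan does not state (analysis outputs and HDF5 files share the same storage), so the results are upper bounds. The function name image_count_bounds is hypothetical.

```python
TB = 10**12  # decimal terabyte, in bytes
MB = 10**6   # decimal megabyte, in bytes


def image_count_bounds(total_bytes: int, min_image_bytes: int, max_image_bytes: int):
    """Return (fewest, most) whole images that fit in total_bytes."""
    fewest = total_bytes // max_image_bytes  # every image at the largest size
    most = total_bytes // min_image_bytes    # every image at the smallest size
    return fewest, most


fewest, most = image_count_bounds(200 * TB, 1 * MB, 10 * MB)
print(f"{fewest:,} to {most:,} images")  # 20,000,000 to 200,000,000 images
```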
How and where will your data be stored and backed up during your research project?
A centralized file storage server will be located in a datacenter, with the location to be determined in collaboration with the Digital Research Alliance. An offsite backup will be configured.
How will the research team and other collaborators access, modify, and contribute data throughout the project?
All primary team members will have full access privileges to the centralized data server. External users will be provided read access as requested, coordinated through team leader Yves Brun.
Preservation
Where will you deposit your data for long-term preservation and access at the end of your research project?
Image data from the project will be submitted to be considered for long-term deposition by the Image Data Resource project or a similar public repository.
Indicate how you will ensure your data is preservation ready. Consider preservation-friendly file formats, ensuring file integrity, anonymization and de-identification, inclusion of supporting documentation.
Our use of the Image Data Resource metadata file format will facilitate long-term repository deposition.
Sharing and Reuse
What data will you be sharing and in what form? (e.g. raw, processed, analyzed, final).
Raw image data, computationally extracted features, and AI-processed image representations.
Have you considered what type of end-user license to include with your data?
Data will be distributed under the CC BY license.
What steps will be taken to help the research community know that your data exists?
Data will be published in primary research journals, and data from each publication will be clearly identified and available for download from a publicly accessible website maintained by the research team and hosted within the Réseau d’informations scientifiques du Québec (RISQ) for high-speed data transfer to Canadian researchers and beyond. We will also work to deposit data at the Image Data Resource or a similar public repository following publication of each dataset to provide access to members of the general research community.
Responsibilities and Resources
Identify who will be responsible for managing this project’s data during and after the project and the major data management tasks for which they will be responsible.
A Data Manager is tasked with enforcing the Data Management Plan. The Data Manager is under direct supervision of the PI and general supervision of the Project Management Committee.
How will responsibilities for managing data activities be handled if substantive changes happen in the personnel overseeing the project’s data, including a change of Principal Investigator?
The CFI Project Management Committee will designate a new Data Manager as required.
What resources will you require to implement your data management plan? What do you estimate the overall cost for data management to be?
The Data Manager position will be included among the duties of an HQP hired for infrastructure operation and management through funds requested in the CFI application. Funds for purchase and administration of data storage infrastructure are similarly included within this CFI project.
Ethics and Legal Compliance
If your research project includes sensitive data, how will you ensure that it is securely managed and accessible only to approved members of the project?
Not applicable.
If applicable, what strategies will you undertake to address secondary uses of sensitive data?
Not applicable.
How will you manage legal, ethical, and intellectual property issues?
Intellectual property issues are addressed through established IP agreements with our industrial partners, Valence Discovery and Molecular Forecaster.
Role of the Data Manager
A designated Data Manager will oversee application of the project Data Management Plan. At the outset of each research group’s contribution of data to the project, the Data Manager will consult with the group to ensure review of the Data Management Plan and to provide assistance in its implementation, followed by an initial audit to confirm proper file formatting and organization. The Data Manager will perform subsequent, quarterly audits to verify compliance with the Data Management Plan with respect to the following:
- data and metadata formatting
- file and directory naming conventions
- directory structure
- file integrity
- 3-2-1 backup strategy and integrity of the backups
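The file integrity item in the audit list above could be checked by comparing files against a stored checksum manifest. The following is a minimal sketch under stated assumptions: the plan does not name a hashing method, so the use of SHA-256 and the function names sha256_of, build_manifest, and audit are all hypothetical choices for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large image files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current state of root against a previously stored manifest."""
    current = build_manifest(root)
    shared = manifest.keys() & current.keys()
    return {
        "missing": sorted(manifest.keys() - current.keys()),
        "modified": sorted(k for k in shared if manifest[k] != current[k]),
        "unexpected": sorted(current.keys() - manifest.keys()),
    }
```

Under this approach, a manifest would be built once when the unmodified original files are stored, and each later audit (or backup verification) would report any files that are missing, altered, or not previously recorded.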
The Data Manager will work with each research group to ensure that published project data are freely available through publicly accessible websites, while also coordinating direct file server access (via SFTP, SMB, etc.) for Team Members and other users in collaboration with IT professionals managing file storage resources.
Once the Data Manager has been designated, contact information will be added to the Data Management Plan, which will be updated as needed and will remain publicly accessible at its web address: https://brunlab.com/research/amr-ai-data-management-plan/