How to Structure a Research Product to Facilitate AI Reproducibility
Purpose:
This guidance lists key elements to consider when you are ready to upload your published models, tools, and datasets.
Checklist:
- Repository Title: Does your repository have a meaningful title that clearly states what type of research is being conducted?
- Keywords: Did you provide a list of relevant keywords that would make this particular research artifact (model or dataset) discoverable?
- Formatting: Did you upload a copy of your model in a reproducible format (e.g., a notebook)?
- Model/Tool Description: Does your model or tool have a detailed description that uses the terms others are likely to search for when looking for models like yours online? Repeat these terms across the title, description, and keywords to increase online discoverability.
- README File: Did you create and upload a README that outlines the specific model workflow and parameters, as well as the environment setup needed to run the experiments efficiently?
- Dataset Description: Did you include a description and summary of the dataset used, including the dataset name and source (citation), the methodology by which the dataset was obtained (e.g., open source, simulation), the scope and contents of the dataset, and how it is organized? Did you include preprocessing information if the dataset was specifically collected, generated, or integrated, or “uniquely” preprocessed, for the purposes of the research study or project?
- Sample Datasets (GitHub Link): Did you provide a dataset or sample(s) of the dataset(s)? Here is an iHARP example on GitHub or Zenodo.
- Citations: Have you included a citation, along with a DOI, for the published research paper based on the model and/or dataset?
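The checklist above can also be run as a quick self-check before uploading. The following is a minimal Python sketch, not part of the original guidance: the `RepoMetadata` record, its field names, and the two helper functions are hypothetical names chosen for illustration, and the example values are made up. It only tests whether each item has been filled in and whether each keyword is repeated in the title or description; it does not judge the quality of what was written.

```python
from dataclasses import dataclass, field


@dataclass
class RepoMetadata:
    """Hypothetical record of what a repository upload contains."""
    title: str = ""
    keywords: list = field(default_factory=list)
    model_format: str = ""          # e.g. "notebook"
    model_description: str = ""
    readme_text: str = ""
    dataset_description: str = ""
    sample_dataset_link: str = ""
    citation_doi: str = ""


def missing_items(meta: RepoMetadata) -> list:
    """Return the checklist items that are still empty."""
    checks = {
        "Repository Title": meta.title,
        "Keywords": meta.keywords,
        "Formatting": meta.model_format,
        "Model/Tool Description": meta.model_description,
        "README File": meta.readme_text,
        "Dataset Description": meta.dataset_description,
        "Sample Datasets": meta.sample_dataset_link,
        "Citations": meta.citation_doi,
    }
    # Strings count as filled only if they contain non-whitespace text;
    # the keyword list counts as filled if it is non-empty.
    return [
        name
        for name, value in checks.items()
        if not (value.strip() if isinstance(value, str) else value)
    ]


def keywords_not_repeated(meta: RepoMetadata) -> list:
    """Return keywords that appear in neither the title nor the description."""
    text = f"{meta.title} {meta.model_description}".lower()
    return [kw for kw in meta.keywords if kw.lower() not in text]


# Made-up example values, for illustration only.
draft = RepoMetadata(
    title="Ice sheet thickness prediction model",
    keywords=["ice sheet", "radar"],
    model_format="notebook",
    model_description="A model that predicts ice sheet thickness.",
)
print(missing_items(draft))
# ['README File', 'Dataset Description', 'Sample Datasets', 'Citations']
print(keywords_not_repeated(draft))
# ['radar']
```

An empty result from both functions means every checklist item has some content and every keyword is echoed in the title or description; a human review is still needed to confirm the content is meaningful.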
To Learn More: