GitHub Starter Kit


Overview

On the GitHub Starter Kit page, you will find a checklist and detailed guidance covering the key elements to consider when you are ready to upload your published models, tools, and datasets.


 

Checklist

  • Repository Title: Does your repository have a meaningful title that clearly states what type of research is being conducted?
  • Keywords: Did you provide a list of relevant keywords that would make this particular research artifact (model or dataset) discoverable?
  • Formatting: Did you upload a copy of your model in a reproducible format (e.g., a notebook)?
  • Model/Tool Description: Does your model have a detailed description that uses the words others are likely to use when searching for models like yours online? Repeat these words across the title, descriptions, and keywords to increase online discoverability.
  • README File: Did you create and upload a README that outlines the specific model workflow and parameters, and a suitable environment setup for running the experiments efficiently?
  • Dataset Description: Did you include a description and summary of the dataset used, including the dataset name and source (citation), the methodology by which the dataset was obtained (e.g., open source, simulation), the scope and contents of the dataset, and how it is organized? Did you include preprocessing information if the dataset was specifically collected, generated, or integrated, or “uniquely” preprocessed, for the purposes of the research study or project?
  • Sample Datasets (GitHub Link): Did you provide a dataset or sample(s) of the dataset(s)? Here is an iHARP example on GitHub or Zenodo.
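Some checklist items can be spot-checked automatically before you publish. The following is a minimal sketch, not an official iHARP tool: the function name `check_repository`, the `data` folder name, and the three checks it performs are assumptions chosen for illustration. Items such as the title, keywords, and descriptions still need a human review.

```python
from pathlib import Path


def check_repository(root):
    """Return a dict mapping a few checklist items to True or False.

    The checks are illustrative assumptions: a README at the top level,
    at least one Jupyter notebook anywhere in the repository, and a
    non-empty folder named "data" holding a dataset sample.
    """
    root = Path(root)
    has_readme = any(
        p.is_file() and p.name.lower().startswith("readme")
        for p in root.iterdir()
    )
    has_notebook = any(root.rglob("*.ipynb"))
    data_dir = root / "data"  # hypothetical location for sample data
    has_sample = data_dir.is_dir() and any(data_dir.iterdir())
    return {
        "README file": has_readme,
        "Notebook": has_notebook,
        "Sample dataset": has_sample,
    }


# Report the status of the current directory.
for item, ok in check_repository(".").items():
    print(f"[{'x' if ok else ' '}] {item}")
```

Run the script from the top level of your local repository clone; any unchecked box points to an item to revisit.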

 

GitHub Starter Kit [PDF File]

Click the GitHub Starter Kit link above to access the PDF file.

 


Expanded Guidance

In this section you will find detailed guidance on setting up your GitHub repository.

  • Titles are important for viewers to know exactly what your research entails. Here is a good example of a title for your research artifact: “Image Processing Workflow for Filtering, Segmenting, and Characterizing Digital Porous Media.”
    • This title clearly states what type of research is being conducted.
  • Keywords help to make your research artifact more discoverable to others.
  • Remember that other researchers use keywords to find models/notebooks/software/datasets in a repository catalog or online.
  • When applying keywords, think about how others would search for this particular research artifact.
  • When applicable, use keywords to indicate the type of hazard, research method, technology, problem addressed, and purpose.
  • Repeating words used in the description and titles as keywords increases the chances that the dataset will be discovered.
  • A keyword can be a controlled vocabulary term or a custom keyword. Here is more information about adding keywords to your research artifact.
  • Here is a good example of using keywords:
    • Keywords and Subjects: beam_hardening; image_processing; segmentation; image_filtering; image_correction; porous_media_imaging
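The keyword example above uses lowercase words joined by underscores and separated by semicolons. As a minimal sketch, assuming you want to apply that same style to your own phrases, the helper below normalizes free-text phrases and removes repeats. The function names are hypothetical, and the style itself is inferred from the example rather than stated as a rule.

```python
import re


def normalize_keyword(phrase: str) -> str:
    """Lowercase a phrase and join its words with underscores."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    return "_".join(words)


def keyword_line(phrases: list[str]) -> str:
    """Normalize phrases, drop repeats (keeping first-seen order), join with '; '."""
    keywords = []
    for phrase in phrases:
        keyword = normalize_keyword(phrase)
        if keyword and keyword not in keywords:
            keywords.append(keyword)
    return "; ".join(keywords)


print(keyword_line(["Beam Hardening", "image processing", "Segmentation", "Image Processing"]))
# prints: beam_hardening; image_processing; segmentation
```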


  • Begin with a general statement that provides context to the study by which the model was created (e.g., The system under investigation…).
  • Address the research problem that the model is helping to solve.
  • Do not copy the abstract of the paper, as that often describes the research results and not the model itself.
  • Specify who will benefit from reusing the model and how.
  • Use language that can reach experts as well as broader audiences.
  • Use the words that others are likely to use when searching for models like yours online. Repeat these words across the title, descriptions, and keywords to increase online discoverability.
  • Avoid acronyms where possible. If you must use one, spell it out in full the first time it appears.
  • Below is a simple example:
    • “This workflow is for purposes of filtering, segmenting, and geometrically characterizing porous media image datasets. Presented as a Jupyter Notebook, it contains algorithms to correct beam hardening, denoise, segment, and characterize images. The geometric characterization algorithm demonstrates the shapes of pores in the segmented dataset.”
    • “The workflow was developed for a supervised master’s thesis (C. Turhan, “Towards Scalable Data Model for Curation and Reusable Workflows for Porous Media Image Analysis,” The University of Texas at Austin, 2024) and is being used by the Digital Porous Media Research Group.”
  • Provide a README file that describes the specific model workflow and parameters.
    • Upload all the required files associated with the model.
    • See sample references: Example 1
  • Provide a README file to describe a suitable setup to run the model efficiently. For example:
    • “The workflow can be run on a laptop or a computer cluster, the latter requiring suitable code modifications. Since it is a Jupyter Notebook, users can easily modify the code cells for specific rock types. Usage instructions are provided in the Jupyter Notebook.”
  • Provide a README file that specifies the required libraries and other dependencies.
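One way to document the libraries a model needs is to record the versions installed in the environment where it was last run. The sketch below is an illustration under stated assumptions: the function name `environment_section` is hypothetical, and the package names are placeholders to replace with your model's actual dependencies. It uses only the Python standard library.

```python
import platform
from importlib import metadata


def environment_section(packages):
    """Return README text listing the Python version and package versions."""
    lines = ["Environment", "", f"Python: {platform.python_version()}"]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines) + "\n"


# Placeholder package names; list the libraries your model imports.
print(environment_section(["numpy", "scipy", "scikit-image"]))
```

Paste the printed text into your README, and regenerate it whenever you update the environment.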

If the dataset is considered secondary to the project and therefore reused from external sources

  • Provide an overview of the methodology by which the dataset was obtained. Provide the dataset name and source (citation).
  • Provide a very brief overview of the scope and contents of the dataset and how it is organized.
  • If applicable, describe how the dataset was processed, including preprocessing information if the dataset was specifically collected, generated, or integrated, or “uniquely” preprocessed, for the purposes of the research study or project.
  • Indicate whether the data were quality-controlled; you may go into more detail in the Data Report.
  • Keep descriptions concise and engaging.
  • Use language that can reach experts as well as layperson audiences.
  • Use the words that others are likely to use when searching for datasets like yours online.
  • Here is a simple example of a dataset description.
  • Provide a sample(s) of the dataset(s), especially if the dataset is large. Here is an iHARP example of GitHub.
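For a large tabular dataset, a small random sample is often enough to show others how the data are organized. The following is a minimal sketch, assuming the dataset is a CSV file with a header row; the function name `write_sample` and its parameters are hypothetical. It uses reservoir sampling so the full file never has to fit in memory.

```python
import csv
import random


def write_sample(source_path, sample_path, n_rows=100, seed=0):
    """Write the header plus a random sample of up to n_rows data rows.

    Returns the number of data rows written.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    with open(source_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            raise ValueError("source file is empty")
        sample = []
        for index, row in enumerate(reader):
            if index < n_rows:
                sample.append(row)
            else:
                # Reservoir sampling: later rows replace earlier picks
                # with a probability that keeps every row equally likely.
                slot = rng.randint(0, index)
                if slot < n_rows:
                    sample[slot] = row
    with open(sample_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(sample)
    return len(sample)
```

For example, `write_sample("full_dataset.csv", "sample.csv", n_rows=500)` would write a 500-row sample; both file names here are placeholders.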

If the dataset is considered a primary outcome of the project or research study

  • Datasets should be described as a standalone research output, so they can be understood independently from related research products, such as a published paper or research code. Focus on describing the dataset. You may begin the text with “This dataset…”
  • This is particularly important if you have a dataset that has been specifically collected, generated, or integrated, or “uniquely” preprocessed, for the purposes of the study.
  • Additionally, prepare a detailed data report (additional guidance will be provided for this). Dataset contributions will be listed on Zenodo via iHARP’s Community Collection.
  • Provide a citation for the published work based on the model.
  • Provide the correct DOI associated with the published research paper, as shown in the example below:

Maloy Kumar Devnath, Sudip Chakraborty, and Vandana P. Janeja. 2024. Deep Learning for Antarctic Sea Ice Anomaly Detection and Prediction: A Two-Module Framework. In Proceedings of the 1st ACM SIGSPATIAL International Workshop on Geospatial Anomaly Detection, October 29, 2024. ACM, Atlanta, GA, USA, 90–93. https://doi.org/10.1145/3681765.3698457
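Before publishing, it can help to confirm that the DOI in your citation is well formed. The sketch below is an illustration only: the function name `extract_doi` is hypothetical, and the pattern is a common heuristic for modern DOIs rather than an official validator. It does not check that the DOI actually resolves, so still open the link in a browser.

```python
import re

# Heuristic for modern DOIs: "10.", a 4 to 9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def extract_doi(citation: str):
    """Return the first DOI-like string in the citation, or None."""
    match = DOI_PATTERN.search(citation)
    if match is None:
        return None
    # Drop punctuation that may trail the DOI at the end of a sentence.
    return match.group(0).rstrip(".,;")


citation = "ACM, Atlanta, GA, USA, 90–93. https://doi.org/10.1145/3681765.3698457"
doi = extract_doi(citation)
print(doi)                        # prints: 10.1145/3681765.3698457
print(f"https://doi.org/{doi}")   # prints: https://doi.org/10.1145/3681765.3698457
```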

 

last updated 2025 December 17