MicroBinfie Podcast, 78 StaPH-B state public health bioinformatics

Released on March 31, 2022

StaPH-B and the Cecret Pipeline: Insights from Dr. Erin Young and Dr. Kelsey Florek

Dr. Erin Young and Dr. Kelsey Florek recently joined us to discuss StaPH-B, a U.S. state public health bioinformatics group, and provided insights into the popular SARS-CoV-2 pipeline, Cecret.

About StaPH-B

Kelsey Florek explained that StaPH-B was created to facilitate collaboration between bioinformaticians in state public health laboratories. The group is particularly useful for laboratories that are new to sequencing and to interpreting the data it generates. It provides a communication and expertise network among laboratories and supports projects funded by the NIH, CDC, and other grant agencies.

Erin Young highlighted the diverse membership of StaPH-B, which offers excellent learning opportunities. StaPH-B runs a Slack workspace with nearly 400 members and over 50 bioinformatics-focused channels, where bioinformaticians can ask questions and share ideas.

When asked about membership, Kelsey clarified that while StaPH-B was initially founded for state public health bioinformaticians, it is open to everyone. However, the content is focused on state public health activities. Key achievements discussed include the Slack workspace, collaborations on GitHub, Docker, and the development of collaborative workflows.

StaPH-B's training activities, including the StaPH-B Toolkit, training sessions, and videos, ensure that knowledge and expertise are shared effectively across the community.

The Cecret Pipeline

The discussion moved to the Cecret pipeline, one of Erin’s bioinformatics pipelines for SARS-CoV-2. When it was developed during the pandemic, the original intention was to use the ARTIC Network's protocol for sequencing SARS-CoV-2 on the Nanopore platform. However, Erin needed an Illumina-based pipeline, because her lab preferred to sequence SARS-CoV-2 on the MiSeq rather than on the Nanopore platform. Cecret uses BWA as its default aligner and is intended for viral sequencing where a known, reliable reference is available.
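To make the reference-based alignment step concrete, here is a minimal Python sketch that only assembles the shell commands for indexing a reference with BWA, aligning paired-end Illumina reads, and sorting the result. It is an illustration, not Cecret's actual code: the function name, file names, and thread count are hypothetical, and the use of samtools for sorting is an assumption about a typical setup.

```python
import shlex


def build_alignment_commands(reference, reads_1, reads_2, sample, threads=4):
    """Build (but do not run) shell commands to align one paired-end sample.

    All arguments are hypothetical example values supplied by the caller.
    """
    sorted_bam = f"{sample}.sorted.bam"

    # Index the reference genome so BWA can align against it.
    index_cmd = ["bwa", "index", reference]

    # Align paired-end reads to the reference with BWA-MEM.
    align_cmd = ["bwa", "mem", "-t", str(threads), reference, reads_1, reads_2]

    # Sort the alignments read from standard input into a BAM file.
    sort_cmd = ["samtools", "sort", "-o", sorted_bam, "-"]

    return [
        shlex.join(index_cmd),
        f"{shlex.join(align_cmd)} | {shlex.join(sort_cmd)}",
    ]


if __name__ == "__main__":
    for command in build_alignment_commands(
        "reference.fasta", "sample1_R1.fastq.gz", "sample1_R2.fastq.gz", "sample1"
    ):
        print(command)
```

Printing the commands instead of executing them keeps the sketch runnable on a machine without BWA or samtools installed. A workflow manager such as Nextflow would run equivalent steps once per sample.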

Erin highlighted the Cecret pipeline tutorials and the monthly videos produced by StaPH-B, which outline various state laboratory projects, as useful resources for those entering the field.

Evolution of COVID Genome Analysis Workflows

The conversation also covered how COVID genome analysis workflows have evolved as the amount of data being analyzed has grown. Workflows such as Cecret, nf-core, Monroe, and the Nextflow ARTIC pipeline were mentioned for their distinct features and popularity.

Erin, the creator of Cecret, shared her initial apprehension about making her workflow public and described how carefully she tracks changes in her repository to ensure scientific validity. Over time, the workflow has improved gradually and accumulated fewer bugs, following a consistent trajectory without dramatic shifts. The name "Cecret" was inspired by a meaningful hiking landmark in Northern Utah.

The speakers emphasized the necessity of managing and connecting the increasing amounts of COVID data to public health efforts.

Conclusion

In conclusion, StaPH-B and workflows like Cecret play significant roles in bioinformatics and COVID genome analysis. Collaborations and resources like StaPH-B are essential for sharing knowledge and expertise among laboratories, which is crucial to carrying out projects funded by organizations such as the NIH and CDC.

Extra notes

Microbial Bioinformatics Insights

  • StaPH-B (State Public Health Bioinformatics Workgroup)

    • Founded to foster collaboration among state public health bioinformaticians.
    • Acts as a communication conduit for labs engaged in various projects.
    • Provides a Slack workspace with over 50 channels for diverse bioinformatics activities and communication.
    • Members include state health departments and others with relevant collaborations.
  • Collaboration and Resource Sharing

    • StaPH-B facilitates resource pooling and expertise exchange, crucial for labs new to sequencing technologies.
    • Supports standardization and method-sharing for projects associated with bodies like CDC.
    • Welcomes members regardless of organizational affiliation, though the content is tailored to state public health activities.
  • Workflow Development for SARS-CoV-2 Sequencing

    • In early 2020, with the emergence of COVID-19, new workflows were urgently needed to sequence SARS-CoV-2.
    • Erin developed the Cecret workflow to adapt the ARTIC Network's nanopore-based protocol to Illumina platforms.
    • Cecret uses BWA as the default aligner and emphasizes a species-agnostic approach to amplicon-based sequencing.
  • Challenge of Workflow Adaptations

    • Initial workflows required compatibility with existing protocols which posed challenges.
    • Features evolved through community feedback and rapid environmental changes.
    • Multiple aligners and compatibility options were introduced to meet submission standards (e.g., GenBank, GISAID).
  • Community Feedback and Workflow Evolution

    • The Cecret workflow, initially made public on GitHub, facilitated open feedback loops for refinement.
    • Use of community-developed methodologies (e.g., Docker, Nextflow) helped streamline bioinformatic processes.
    • Highlighted the adaptability of workflows as analysis strategies and public health priorities evolved.
  • Shifts in Sequencing Practice

    • Shift to high-volume sequencing (500-1000 samples weekly) required adapted data management and analysis practices.
    • Evaluation of tools and methodologies for effectively processing and connecting sequencing data with public health outcomes.
    • Examples include swapping tools in and out, such as Pangolin for lineage assignment and various trimming methods.
  • Training and Knowledge Dissemination

    • StaPH-B's efforts in training and tutorials to onboard new bioinformaticians.
    • Publicly available resources and monthly video projects providing insights into ongoing bioinformatic challenges and developments.
  • Innovative Tool Development and Usage

    • Cecret's development involved an iterative process of community engagement and feature refinement.
    • Incorporated tools like Minimap2 and SAMtools, and addressed specific feature requests to ensure broad usability while accommodating shifting standards in bioinformatics.

Challenges Faced

  • Rapid onset and evolving understanding of SARS-CoV-2 necessitated quick development of compatible workflows.
  • Balancing legacy bioinformatics tools against cutting-edge needs often led to feature creep.
  • Continuous adaptation to ensure compliance with public health data submission standards amid an evolving pathogen genomics landscape.

Episode 78 transcript