Presentation skills: speak slow! explain for motivation and background ! less teminology jar or abbreviation (not most people know!) . more transition between each page

Due to the covid-19,SIB Days of 2020 becomes a virtually meeting. Login into to virual platform can be found in my personal email (SIB Days 2020 - login information). Generally speaking , I think this virtual conference is very well organised. They have a compact website for everything. One thing I like most is that they put links to ZOOM/Jitsi link for every presentation and poster. So users can go to specific session directly without using an additional link sent by email or via other ways.

Ron Appel , SIB executive Director , gives welcome speech SIB is both working place and community for bioinformatics communication, training Rons speaking speed is pretty slow, which is good practice !

Science community: Roman Arguello,Julien Roux, Alexsandra Sapala , Rob Waterhouse

Christine Durinx

Topics I’m interested:

#Keynote lecture:

Victoria Nembaware, University of Cape Town The Sickle Cell (a.k.a. Banana Cell) Africa Data Coordinating Center (SADaCC)

Sickle cell disease (SCD) is a life-threatening blood disorder that affects millions of people worldwide, with disproportionate incidence rates and mortality rates in the Sub-Sarahan (SSA) region. The Sickle Africa Data Coordinating Center (SADaCC) was created in 2017 to provide support to develop the largest multi-national SCD patient registry in SSA, data standardization, research capacity development, support improved healthcare and coordinate communications for the SCD in SSA Collaborative Consortium (SickleInAfrica). SADaCC envisions contributing to improved quality of care, quality of life and survival of people affected by SCD in Africa and globally. This presentation will highlight SADaCC’s strategy, lessons to date and key achievements.

resource constrained

SCA: single cell anemia , one protein changes.

Registry: SickleInAfrica Consortium

Database: Sickle cell disease ontology

Datacollection and storage: Redcap app, … HPC

Big data analytics : quality

#10:45 - 11:00 time changed François Bonnardel (Group: Frédérique Lisacek) Prediction of carbohydrate-binding proteins in microbial proteomes check poster !

#10:55 - 11:05 Fabian Arnold (Group: Valérie Barbié) MTPpilot: an interactive software for NGS results analysis for molecular tumor boards

good slow speed. introduce motivation; add index/number for different position in slide for easy illustration HTML javascript backend php-MSSQL

#11:00 - 11:15 Luciano Abriata (Group: Matteo Dal Peraro) State-of-the-art web services for modeling the structures of proteins that lack clear templates in the PDB check this and pay attention can use ideas from structure prediction to (functional) PPI prediction machine learning or deep learning for my project ?

#10:15-10:45 : SIB PhD Training Network Coffee Corner Gregoire Rossier, Stephany Orjuela

#11:05 - 11:10: Iulian Dragan (Group: Mark Ibberson) dsSwissKnife: Enabling federated data analysis for biomedical research bring analysis to data. only return anlysis to central server (without shareing sensitive data ) PCA, Clustering and ML(random forest) with open sourece software :Pol , DATASHIEDLD

#11:10 - 11:15 Maria Famiglietti (Group: Alan Bridge) Linking genome and variation data with knowledge of protein function and disease: Enhanced integration of UniProtKB/Swiss-Prot, ClinVar and PubMed. read poster standardazation of variant clinical features, function impact

#15:30 - 16:30 Poster Session 1

Phylogeny-driven and alignment-free protein family assignment is an accurate and scalable alternative to methods relying merely on closest sequence

paper: https://doi.org/10.1101/2020.04.30.068296 Victor Rossier, Alex Warwick Vesztrocy, View ORCID ProfileMarc Robinson-Rechavi, Christophe Dessimoz

Assigning new sequences to known protein families and subfamilies is a prerequisite for many functional, comparative and evolutionary analyses. Generally, these sequences are assigned to the same families and subfamilies as their closest (most similar) sequence.

However, ignoring the gene family phylogeny is often misleading because a query sequence does not necessarily belong to the same subfamily as its closest sequence. In particular, when the query diverged before the emergence of subfamilies, its closest sequence is likely to belong in an over-specific subfamily. For example, an hemoglobin beta as closest sequence of an hemoglobin query. Moreover, when the query subfamily undergoes accelerated evolution, the query closest sequence can belong to a sister subfamily or merely to the parent family.

To overcome this shortcoming, phylogeny-driven tools have emerged but remain restricted to small-scale phylogenomic databases. Indeed, these tools rely on gene trees, whose inference is computationally expensive.

Here, we first demonstrate that 20 to 70% of assignments by closest sequence go to the wrong, and mostly over-specific, subfamily. Consequently, we introduce OMAmer, a novel alignment-free protein family assignment method, which limits over-specific subfamily assignments and is suited to phylogenomic databases with thousands of genomes. While inspired by metagenomic taxonomic classifiers, OMAmer leverages the novel idea of subfamily-informed k-mers for alignment-free mapping toward ancestral protein subfamilies. While able to reject non-homologous family-level assignments, we show that OMAmer provides better and quicker subfamily-level assignments than the closest sequence baseline, DIAMOND.

OpenGenomeBrowser: A dynamic and scalable web platform for comparative genomics

Driven by advances in sequencing technologies, many organizations and research groups have accumulated hundreds of genome sequences. Fortunately, standardized data formats are widely used to store assemblies and annotations. Nevertheless, few attempts have been made to automate bioinformatic analysis steps to help non-programmers explore the data. Further challenges are emerging as sequencing projects progress: storing data and metadata, tracking and denoting changes in assemblies or annotations, and enabling easy access. OpenGenomeBrowser is an open-source Python/Django-based website that is designed to solve these problems. If it is provided with a certain simple format for storing genome-associated files and metadata, it allows the user to easily filter genomes based on metadata, search for annotations, compare gene loci, interactively browse KEGG-maps, search for orthologous genes, perform BLAST searches and, in addition, gives users access to the raw data. To offload compute-intensive tasks to a cluster, some functions can be over-written by user-defined code. The popular Python programming language was chosen to make it easy for users to add new functions and, ideally, contribute them to the project.

#HOTs Hierarchical Orthologous Transcripts Orthology, the relation between genes of different species derived from a common ancestor, is commonly used to propose function equivalence between species and propagate results obtained in model organisms to other species. Orthology inference has been tackled by dozens of tools and online resources, however, these tools usually perform inference considering a unique protein coding sequence by gene. They rarely account for other isoforms issued from alternative splicing, the eukaryotic mechanism that allow a single gene to be transcribed in multiple transcripts by different combinations of its exons. Integrating alternative splicing in the framework provided by orthology inference resources would allow to better reconstruct the evolutionary history of alternative transcripts and provide important insight into their evolutionary conservation, a relevant marker of functional significance. Our ongoing project is to design Hierarchical Orthologous Transcripts (HOTs), a scalable software dedicated to compute homology relationship between transcripts. It combines OMA Hierarchical Orthologous Groups (HOG), groups of gene descended from a common ancestor at a given speciation event, and spliced alignment, an alignment technique used to align genes exon by exon. Using spliced alignment, we will identify isoforms sharing homologous exon structure. Then, using an implicit gene tree derived from HOGs, we will locate gain and loss of alternative transcripts in the considered tree. Among other applications, this software will give the possibility to identify highly conserved isoforms, more likely to be functional, and allow to determine if relevant isoforms in model species have orthologs in human and other species.

#Enabling structure-guided life science research with the SWISS-MODEL Repository Three-dimensional structures of proteins provide valuable insights into their function on a molecular level and inform a broad spectrum of applications in life science research. The Protein Data Bank (PDB) contains all experimentally determined structures, but they only cover a fraction of all known protein sequences. The SWISS-MODEL Repository (SMR, https://swissmodel.expasy.org/repository) extends the structural coverage and provides an up-to-date collection of annotated 3D protein models generated by automated homology modelling. SMR intends to provide models of sufficiently high quality to support structure-based applications in life science research. Reliable quality estimates are therefore an essential feature of SMR models.

We look at antibiotic resistance in Mycobacterium tuberculosis to showcase SMR. For M. tuberculosis, SMR covers ~68% of the residues in its proteome while only ~13% would be covered by the PDB. A protein which drew attention in the scientific community is the Mycothiol ligase mshC, which is thought to be involved in resistance to the antibiotic Isoniazid and was recently proposed as a new drug target. No experimental structure exists for mshC but SMR provides a high-quality homology model. We show how the annotated model can be visualized to generate hypotheses on resistance variant mechanisms and comment on using the model to assist future experimental structure determination.

Homology models can provide valuable structure information much faster than experimental methods, which is especially useful in time critical situations. For example, several research groups have made use of SWISS-MODEL to study the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) proteins.

#Evolutionary conservation and constraint from multispecies whole genome alignments The ever growing number and quality of available genomes are enabling increasingly powerful comparative genomics analyses. In particular, these resources allow for the identification of genomic sequences conserved across multiple species, which are likely to be under selective constraints and therefore to play key functional roles. In practice, conserved sequences can be identified from multispecies whole genome alignments (MWGA), which have been successfully used to uncover new functional elements (e.g. coding regions, regulatory elements like miRNAs, and other unclassified elements) and to identify signatures of selection in several species, including Human, Drosophila, and Arabidopsis. Here, we present an automated and cross-platform pipeline to compute MWGAs, perform downstream evolutionary analyses, provide resources to annotate coding and non-coding regions, and generate data visualization resources in the form of Genome Browser tracks. We apply this pipeline to compute and flexibly update MWGAs for numerous arthropod clades. We demonstrate the use of these MWGAs to improve the annotation of coding and non-coding regions in arthropod genomes and to investigate conserved functional genomic elements across arthropods.

10:00 - 10:45 COVID-19 Panel Discussion Facing the COVID-19 crisis as bioinformaticians: four perspectives

Session chairs: Julien Roux and Maïa Berman

I like the way Maia keeping sharing realated links in Zoom chat

The COVID-19 pandemic is now slowing down in most European countries, but it will leave profound marks on society. Many SIB members played a significant role in fighting against COVID-19 by developing or maintaining key resources that were necessary for the scientific community: Nextstrain, ViralZone, and many more. In this session we will meet and discuss with some of them, looking back at what went on behind the scenes over the past months. What were the immediate effects of the COVID-19 crisis on their work and teams? What were the main challenges to overcome? Which lessons were learned for the future, and how do they envision the evolution of their resources in the next few months? Our discussion will revolve around the themes of communication, training, biocuration, open science, collaboration and sustainability of resources. Our panel speakers are: Emma Hodcroft, Postdoctoral researcher at the Biozentrum, University of Basel (Nextstrain) Philippe Le Mercier, Project Manager at the Swiss-Prot Group, Geneva (ViralZone) Patricia Palagi, Head of the SIB Training Group, Lausanne Fabio Rinaldi, Group Leader at the University of Zürich and Dalle Molle Institute for Artificial Intelligence Research, Lugano

SIB Resources supporting COVID-19 research https://www.sib.swiss/about-sib/news/10660-sib-resources-supporting-sars-cov-2-research Discover Nextstrain https://nextstrain.org/: communication Emma Hodcroft ,suddenly attention from public. dramatically increasing visitors to web/service Training:Patricia Palagi, List of upcoming SIB Courses https://www.sib.swiss/training/upcoming-training-courses SIB online courses playlist on YouTube https://www.youtube.com/playlist?list=PLoCxWrRWjqB3fIC7JTnhCRFmraL7lXy6t ViralZone: Philippe Le Mercier, data curation and annotation, Discover ViralZone https://viralzone.expasy.org/ Fabio Rinaldi, The Literature Based Discovery resource https://pub.cl.uzh.ch/projects/COVID19/lbd.html collaborations: The early UniProt release: https://covid-19.uniprot.org/ The COVID-19 Integrated Knowledgebase https://covid-19-sparql.expasy.org funding sustainability: Join the Global Biodata Coalition stakeholder meetings: https://www.eventbrite.co.uk/e/gbc-stakeholders-meeting-30-june-1100-1300-bst-registration-105218170380 News on the GBC: https://www.sib.swiss/about-sib/news/10699-global-biodata-coalition-a-champion-for-essential-biological-databases

Question for Emma:

Do you consider using more F.A.I.R. sequence sharing platforms, such as EMBL-EBI’s European Nucleotide Archive (ENA) ? (GISAID has received criticism by various ELIXIR pojects)

#10:45 - 11:15 Keynote Lecture 2 Getting it out there: Science communication for bioinformatician Getting it out there: Science communication for bioinformatician

To make an impact, your research has to reach the experts, policymakers and members of the public it’s designed to help. But how do you get your research noticed? And how do you contribute to the wider conversation – even if you don’t have ‘news’ to share? Science journalist and broadcaster Flora Graham, who writes the influential Nature Briefing newsletter for the journal Nature, will introduce you to science communication for scientists. She will throw back the curtain on how science journalism works, how to reach the people you’re aiming for, how to grapple with social media, and how to measure success. Flora Graham, Science and Technology Journalist for Nature

Even if Flora is a native English Speaker with perfect accent and grammar and so on. But I still speak slow is import while she speak littble bit faster. Maybe its because of my low level of English

f you are about to publish a paper and would like to get SIB’s help in promoting it, feel free to contact the SIB communication team at communications@sib.swiss or Maïa Berman directly at maia.berman@sib.swiss 

#13:20 - 13:25 Victor Rossier (Group: Christophe Dessimoz) Phylogeny-driven and alignment-free protein family assignment is an accurate and scalable alternative to methods relying merely on closest sequence

#13:15 - 13:20 Julija Pecerska (Group: Maria Anisimova) Impact of MDR-TB strain background on transmission fitness loss

#13:25 - 13:30 Mathilde Foglierini (Group: Luciano Cascione) AncesTree: an interactive immunoglobulin lineage tree visualizer

#13:45 - 13:50 Marco Pagni (Group: Mark Ibberson) MetaNetX/MNXref 4.0 - a reconciliation of metabolites and biochemical reactions import for parsing PPI>?

#13:50 - 13:55 Xavier Richard (Group: Christian Mazza) On the use of game engines to perform stochastic simulations of bacterial interaction things seem not difficult. but title “game engine” catch everyone’s eye and get most of questions. So ?

14:35 - 14:45 Andre Kahles (Group: Gunnar Rätsch) Metagraph: Representing sequence data at a petabyte scale.

14:45 - 14:50 Anna Koroleva (Group: Maria Anisimova) owards creating a new knowledge base for literature-based discovery.

14:50 - 14:55 Lionel Breuza (Group: Alan Bridge) Representing the human metabolome in UniProtKB using Rhea 15:05 - 15:15 Maciej Bak (Group: Mihaela Zavolan) novel approach to detect the influence of RNA-binding proteins on pre-mRNA processing.

15:20 - 15:25 Roman Mylonas (Group: Mark Ibberson) Pumba: a web resource to verify antibody based results 15:15 - 15:25 Fabio Rinaldi (Group: Fabio Rinaldi) Fast Efficient Accurate Entity Recognition for biomedical applications.

14:55 - 15:10 min Philippe Le Mercier (Group: Alan Bridge) Bioinformatic analysis of SARS-CoV-2 15:10 - 15:15 Maarten Reijnders (Group: Robert Waterhouse) CrowdGO: a wisdom of the crowd-based Gene Ontology annotation tool

16:00 - 17:00 Poster Session 2

#SwissDrugDesign: a Collection of Web-based Tools to Support Drug Discovery and Development Drug discovery is an interdisciplinary, complex, time-consuming and expensive process. Today, Computer-aided Drug Design (CADD) is considered as a valid strategy to improve efficiency and speed by addressing several questions along the drug discovery and development pipeline.

The SwissDrugDesign initiative aims at providing a comprehensive and freely accessible web-based in silico drug design environment to the scientific community. Its purpose is to offer a large collection of integrated tools covering many aspects of CADD.

This demonstration will exemplify some typical CADD activities that can be achieved through the different SwissDrugDesign modules. These includes the prediction of potentially active compounds, the optimization of the most promising chemical series, or the selection of drug-like molecules to increase success rate in cellular or in vivo assays. This will be performed through molecular docking (www.SwissDock.ch), direct or reverse virtual screening (www.SwissSimilarity.ch and www.SwissTargetPrediction.ch), estimation of physicochemical and pharmacokinetic parameters (www.SwissADME.ch) and design of bioisosteric molecules (www.SwissBioisostere.ch).

appendix

Monday 8 June: Workshops

Building scalable data integration pipelines using container technology: From Theory to Practice Organisers: Kasun Samarasinghe and Pierre-Andre Michel All the data and knowledge resources have to integrate data from various external resources. In order to accommodate large scale data, scalable data integration pipelines need to be developed. The aim of this workshop is to present the steps to build a scalable data integration pipeline using container technology and state-of-the-art workflow management tools.Containerization is a mature technology, which is used in various different use cases, especially to isolate large systems into manageable and maintainable components. Such containerized components can be deployed to achieve scalability and load balancing. We will present a container-based architecture for data pipelines based on Docker and the use of such a containerized workflow in the Apache Airflow open source workflow management framework.A hands-on session will be organized for the attendees to build a simple ETL (extraction, transformation and load) pipeline, ideally from a bioinformatics resource such as Ensembl. We will also share neXtProt’s experience on developing and improving our internal pipelines and the challenges faced during the process.Finally, an open discussion will be conducted to exchange ideas, comments and possible collaborations to work on improvements of the presented system in a bioinformatics context.

Building scalable data integration pipelines using container technology: From Theory to Practic Organisers: Stefan Bienert, Pablo Escobar and Jaroslaw Surkont Description: Going beyond the usual 20 minute Docker tutorial… lots of tutorials on the net leave the user at the point where they can run a simple application out of a single container. But how do you get your own full-blown web service in the cloud thereafter?That requires multiple containers working together in the scope of container orchestration. In a real world setup, this requires the containers to be provided via a container registry. Finally, since no one likes to rebuild containers manually for every code change, the whole deployment should be automated by CI.That is what we want to try to teach people in this workshop: Take a (Django) web app, distribute it into multiple containers and run it via Docker Compose as an entry-level orchestration tool. Of course, in the back everything should live in a Git repository and from there we want to add GitLab’s CI functionality to our project. To not store confidential information like passwords inside the Git repository while using the CI, GitLab secrets will also be a topic. As usual we want the users to do the work.

Snakemake for reproducible analyses Organiser: Romain Feron and Antonin Thiebaut Data analyses usually involves pipelines connecting tools and script to process the data. In recent years, workflow management systems have attempted to standardize these pipeline and provide many quality-of-life features for pipeline developers . Snakemake is a popular workflow manager that works as a superset to Python.It implements a text-based workflow specification language and an execution environment allowing easy execution of workflows on local and remote computers and computational platforms

11:15 - 11:25 João Matias Rodrigues (Group: Christian von Mering) Ecological insights from the analysis of the Microbe Atlas Project dataset, a global reference set of a million microbiome samples 11:30 - 11:35 Marta Perez (Group: Vincent Zoete) Addressing challenges in cancer immunotherapy with structural bioinformatics approaches 11:35 - 11:40 Erblin Asllanaj (Group: Torsten Schwede) Enabling structure-guided life science research with the SWISS-MODEL Repository Interesting

11:35 - 11:40 Xinrui Lyu (Group: Gunnar Rätsch) Mutational Signatures used Supervised Negative Binomial Non-Negative Matrix Factorization 12:15 - 13:30 SIB PhD Training Network GTG

13:30 - 13:45 Abdullah Kahraman (Group: Christian von Mering) Pathogenic impact of transcript isoform switching in 1209 cancer samples covering 27 cancer types using an isoform-specific interaction network 13:45 - 14:00 Luciano Cascione (Group: Luciano Cascione) Transcribed ultra-conserved regions model drug response in the NCI60 cell line panel

10:45 - 11:00: Lucas Paoli (Group: Shinichi Sunagawa) Exploring uncharted phylogenomic diversity reveals novel chemistry in the global ocean microbiome 14:45 - 14:50 Marthe Solleder (Group: David Gfeller) Mass spectrometry based immunopeptidomics leads to robust predictions of phosphorylated HLA class I ligands