17 January, 2024
The vast amount of genomic, transcriptomic and proteomic data from the study of different tissues and, increasingly, individual cells requires the training of professionals who are skilled in many aspects of high-throughput data analysis.
The aim of the programme is to train and prepare professionals with expertise in all aspects of biomolecular data management and interpretation to meet the needs of academia and industry. The programme is run by Pázmány University's Faculty of Information Technology and Bionics (Hungary) and HiDucator Ltd, a Finland-based company specialising in online courses in bioinformatics, biodata analysis and biostatistics.
The training will emphasise analysis in the R environment, but will also cover programming in Python and Java. The programme will provide an overview of biostatistics, sequence analysis, the bioinformatics aspects of high-throughput genomics, proteomics and transcriptomics, phylogenetics and structural bioinformatics.
The purpose of this course is to teach how the R statistical environment can be applied to biological data analysis.
• Introduction to R: installation, package management, basic operations
• Sequences and sequence analysis
• Annotating gene groups: ontologies, pathways, enrichment analysis
• Proteomics: mass spectrometry
• Reconstructing gene regulation networks
• Network analysis: igraph
An overview of the major biological databases and an introduction to basic sequence analysis methods
• Biological databases with the main focus on DNA and protein sequences
• Comparison and alignment of sequences, similarity-based searches in databases
• Discovery of protein sequence motifs and sequence features; metabolic pathway data
• Genome browsers and sources of gene expression data; gene lists and the concept of enrichment
• Micro-RNAs and their targets; protein visualization
Familiarity with Internet sources of genome-wide data; basic skills in using the tools at these websites; understanding of how modern high-throughput methods generate sequence data and gene and protein expression data; practical skills in using genome browsers to access genome data and genome comparison data; understanding of gene prediction and genome annotation pipelines; skills in performing individual gene predictions; understanding of the different levels of variation in human genomes; understanding of the basic workflows of microarray and next-generation sequencing data analysis; basic knowledge of the experimental methods in proteomics and metabolomics needed to understand data analysis in these fields; skills in identifying proteins from mass spectrometric data.
An application-oriented course focusing on how statistical methods can be used to address common problems in the analysis of results from molecular biology experiments.
• Comparing simple groups: hypothesis testing
• Multiple groups: ANOVA and related concepts
• Hypothesis testing in complex experimental settings: Randomized complete block design
• Dose and response: regression models
• Handling small sample sizes with General Linear Models
• Planning optimal sample sizes: how many animals do I need?
Programming for beginners, using the Python language:
• Concepts in programming, fundamentals of algorithms
• Basic variable types & data structures
• Program organization, loops and conditional statements
• Basics of file input/output
• Parsing text files
• Basics of GUI programming
Basics of searching, manipulating and analysing the structures of large biological molecules, especially proteins.
Advanced concepts and tools in biomolecular structure analysis
• Assigning secondary structural elements from 3D coordinates
• The concept of domains: definitions based on structural and sequence features
• Origins and uses of global and local similarity in structures
• Structure classification and functional assignment
• Prediction of structural features from sequences
• Full 3D structure prediction
• Protein-ligand docking
• Ensemble-based structural models to represent protein internal dynamics
• Analysis of the structure of nucleic acids
An introduction to the two most frequently applied approaches for identifying the common features of large gene lists.
• Enrichment analysis: an overview
• Over-representation analysis
• Gene Set Enrichment Analysis
An introduction to the frequently applied approaches for creating heatmaps from gene expression data using online and local software tools.
• Theoretical background of heatmap analysis
• Methodological overview including statistics and software aspects
• Practice with the heatmap tool at Gene Expression Omnibus
• Optional practice: creating heatmaps with R
• Flow cytometry: counting and sorting stained cells
• Next-generation sequencing: introduction and genomic applications
• Quantitative transcriptomics: qRT-PCR
• Advanced transcriptomics: gene expression microarrays
• Next-generation sequencing in transcriptomics: RNA-seq experiments
• Analysis of DNA-protein interactions: chromatin immunoprecipitation
Introduction to the Java programming language:
• Basics of object-oriented programming in Java
• Classes, interfaces, inheritance, method overloading
• Basics of file input/output
• Basics of GUI programming
• Concept of threading in Java
• Use of external APIs to solve bioinformatics-related tasks
Phylogenetics is the study of the evolutionary relationships among organisms, reconstructing how closely they are related from the differences accumulated during evolution; these relationships also form the basis of taxonomic classification. The course will familiarise students with different phylogenetic algorithms and with practical software for applying them to biological problems.
• Introduction to phylogenetics and the essentials of evolution as background
• Data types for phylogenetic analysis and parsimony
• Distance based methods, distance matrices, nucleotide substitution models
• Model based methods: maximum likelihood and Bayesian phylogenetics
• Auxiliary methods: bootstrapping, consensus trees, tree comparison
• Visualization of phylogenetic trees
Biologist and protein structure researcher. His research experience extends to various fields of bioinformatics, and he is experienced in method development and data analysis. He regularly publishes as last author in interdisciplinary, molecular biology and bioinformatics journals. He serves as a regular reviewer of PhD theses and of domestic and international grant applications. He is currently a member of the Presidency of the Hungarian Bioinformatics Society.
Assistant Professor at the University of Tampere,
CEO at HiDucator Ltd
Dr Ortutay is an assistant professor at the University of Tampere and CEO of HiDucator Ltd. He supports the development of eLearning concepts and the implementation of online courses as part of the redesign of higher education. He is the author of dozens of bioinformatics research papers and of a definitive textbook in the field.
Dr Tolvanen has designed award-winning online courses, which have been delivered to researchers worldwide since 2002. Dr Tolvanen is a university lecturer at the University of Turku (Finland) and the author of several bioinformatics research papers with a focus on genomics and structural biology.
With his extensive experience in systems biology, he is an internationally recognized researcher of cell cycle and circadian rhythm regulation. He has more than a decade of teaching and research experience abroad. He regularly publishes in reputable journals in his field. He currently represents the Budapest region in the Hungarian Biochemical Society.
Head of the Pázmány ITK Neural Bioinformatics research group, an expert in molecular data processing. His research area includes the development of large-scale genomic language models and sequence representations, along with algorithms built upon them.
Bioinformatician and Chairman of the Interdepartmental Scientific Committee on Bioinformatics of the Hungarian Academy of Sciences. He has followed and actively researched the field of bioinformatics since its inception.