We will introduce best practices and essential foundations to guide you from data loading to generating scientific insights. By the end of the course, you will be able to develop your data science ideas independently and in a structured way.
Module I, “Tidy Python”, is inspired by the concept of tidy data from the R community and will largely mirror CCMAR’s previous R workshop: you will organise, process, and analyse tabular data of all sorts, this time in Python.
If you are already familiar with the topics in the Module I programme, you may attend Module II only. Module II will provide personalised guidance on your dataset, problem, or project as specified in your registration form.
Prerequisites:
- Audience: Researchers, PhD students, Master's students
- You must bring your own laptop, but there is NO need to install Python.
- Completion of the 2024 edition of the Python for life science course, or an understanding of specific parts of it, which are available as interactive notebooks here:
- For Module II, please share with us your dataset, tools you use and insights you would like to get from the data. Having Python running on your machine is not strictly necessary, but it will facilitate your project progress beyond the course.
- This advanced training course will be held in English.
Module I
2-3 July | 24 places
In Module I, from foundational concepts like conditionals and loops to libraries like NumPy, SciPy and Pandas, you'll learn to harness Python's power for data manipulation, analysis, and visualisation. Along the way, you'll develop a structured way of preparing and executing scientific data analysis workflows efficiently and reproducibly.
Day 1: Tidy principles applied to raw data
11:45 - 12:30, Help with Python setup (optional, for those who wish to run code locally on their machines).
14:00 - 17:30, Data exploration and tidy data.
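To give a flavour of what "tidy data" means in practice, here is a minimal sketch using pandas. It is not taken from the course material: the table, the column names (`sample`, `day_1`, `day_2`, `abundance`) and the values are hypothetical and only illustrate the idea of reshaping a wide table so that each row holds a single observation.

```python
import pandas as pd

# Hypothetical wide-format table: one row per sample, one column per day.
wide = pd.DataFrame(
    {
        "sample": ["A", "B"],
        "day_1": [0.12, 0.30],
        "day_2": [0.18, 0.41],
    }
)

# Reshape to tidy (long) form: one row per observation,
# one column per variable (sample, day, abundance).
tidy = wide.melt(id_vars="sample", var_name="day", value_name="abundance")

# Turn the former column labels ("day_1", "day_2") into integer day numbers.
tidy["day"] = tidy["day"].str.replace("day_", "", regex=False).astype(int)

# Sort for readability and reset the row index.
tidy = tidy.sort_values(["sample", "day"]).reset_index(drop=True)

# With tidy data, grouped summaries become one-liners.
mean_by_sample = tidy.groupby("sample")["abundance"].mean()

print(tidy)
print(mean_by_sample)
```

Once the table is in this long form, filtering, grouping and plotting by any variable follow the same pattern, which is the main reason for tidying the data first.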
Day 2: Data wrangling and visualisation
10:00 - 12:30, Visualisation, pre-processing.
14:00 - 17:30, Analysis, reproducibility.
Module II (Bring your data/project)
4 July | 15 places
Module II will provide personalised guidance on your dataset, problem or project specified in your registration form, which may include data cleaning, exploratory analysis, predictive model development and training.
Day 1: Exploratory analysis, data cleaning, identifying and using existing repositories/packages
10:00 - 12:30, Explore data and modularise code.
14:00 - 17:30, Pipelines, external packages, GitHub.
Instructors:
Paulo Martel (CINTESIS, UAlg): Computational biologist focusing on protein dynamics and lecturer at UAlg.
David Palecek (PBS, CCMAR): Python practitioner with an interest in automation and bioinformatics.
Registration & Costs:
NOTE: Members of CIMAR-LA (CCMAR and CIIMAR) have the same access conditions.