Since 2017, I've been developing and applying data analysis and machine learning tools to answer biological questions across fields such as Evolution, Biochemistry, Molecular Biology, Systems Biology, and Gene Regulation.
Currently, I am the bioinformatician in the Bas van Steensel lab, where we develop new technologies to better understand gene regulation.
In this context, I'm mainly responsible for developing the AI tools, visualization methods, automated pipelines, and statistical frameworks that support these technologies.
Interests: Bioinformatics; Computational Biology; Deep Learning; Systems Biology; Evolution
PARM is a deep learning model that predicts promoter activity directly from the DNA sequence. We trained PARM on a specific type of MPRA (massively parallel reporter assay) data, which allows it to predict cell-specific promoter activity in a lightweight and fast manner.
In addition to developing PARM together with my colleagues, I ran many computational experiments with PARM to test hypotheses about transcription factor biology. I was also responsible for publishing PARM as a Bioconda package and maintaining the codebase.
Primetime is a user-friendly pipeline for analyzing transcription factor (TF) prime reporter data. My colleague developed a robust method to quantitatively detect the activity of TFs, and I developed Primetime to support this new technology.
Primetime is a Snakemake pipeline that automates the processing of sequencing data, including barcode counting, clustering, annotation, and differential TF activity analysis across experimental conditions.
A domainogram is a type of plot that shows statistical differences in nuclear lamina interactions between two conditions using pA-DamID data. I reworked the original code, and my version is built on ggplot2 to make it more flexible and easier to use.
In the paper below, we also studied the mechanisms behind nuclear lamina interactions by rearranging the genome through deletions and inversions. In this context, I adapted the statistics and visualization of the domainograms to each new genome rearrangement.
CURE is a framework to curate UCE data for species-tree reconstruction. When dealing with UCE data, there are two main ways to curate the data. Neither was automated or efficient enough for large datasets.
Guided by my colleagues, I developed CURE to solve this problem. CURE is user-friendly and speeds up the curation process by parallelizing the steps.