Since 2017, I've been developing and applying data analysis and machine learning tools to answer questions across several fields of biology, including Evolution, Biochemistry, Molecular Biology, Systems Biology, and Gene Regulation.
Currently, I am the bioinformatician of the Bas van Steensel lab, where we develop new technologies to understand gene regulation better.
In this context, I'm mainly responsible for developing the AI tools, visualization methods, automated pipelines, and statistical frameworks that accompany these technologies.
Interests: Bioinformatics; Computational Biology; Deep Learning; Systems Biology; Evolution
PARM is a deep learning model that predicts promoter activity directly from the DNA sequence. We trained PARM on a specific type of massively parallel reporter assay (MPRA) data, which allows it to predict cell-specific promoter activity in a lightweight and fast manner.
In addition to developing PARM together with my colleagues, I ran many computational experiments with PARM to test hypotheses about transcription factor biology. I was also responsible for publishing PARM as a Bioconda package and for maintaining the codebase.
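As an illustration of what predicting from the DNA sequence involves, sequence-based models typically receive the sequence as a numeric matrix. The sketch below shows one-hot encoding, a common choice for this. It is an assumption made for illustration, not a description of PARM's actual input code, and the function name `one_hot_encode` is hypothetical.

```python
# Minimal sketch: one-hot encode a DNA sequence (assumed input format,
# not taken from the PARM codebase).
BASES = "ACGT"


def one_hot_encode(sequence):
    """Return one row per base with a 1 in the column of that base.

    Bases outside A/C/G/T (for example N) become all-zero rows.
    """
    column_of = {base: i for i, base in enumerate(BASES)}
    encoded = []
    for base in sequence.upper():
        row = [0] * len(BASES)
        column = column_of.get(base)
        if column is not None:
            row[column] = 1
        encoded.append(row)
    return encoded


matrix = one_hot_encode("ACGTN")
```

Each row of `matrix` corresponds to one position in the sequence, so a model can read the promoter as a fixed-width numeric array.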
Primetime is a user-friendly pipeline for analyzing transcription factor (TF) prime reporter data. My colleague developed a robust method to quantitatively detect TF activity, and I developed Primetime to support this new technology.
Primetime is a Snakemake pipeline that automates the processing of sequencing data, including barcode counting, clustering, annotation, and differential TF activity analysis across experimental conditions.
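To make the first of these steps concrete, here is a minimal sketch of barcode counting. It assumes the barcode sits at a fixed position in each read, which may not match the actual reporter design. The function name `count_barcodes` and the `start` and `length` parameters are hypothetical and are not part of Primetime.

```python
from collections import Counter


def count_barcodes(reads, start=0, length=12):
    """Count how often each barcode occurs in a collection of reads.

    Assumes the barcode occupies read[start:start + length]. Reads that
    are too short, or whose barcode contains an ambiguous base (N),
    are skipped.
    """
    counts = Counter()
    for read in reads:
        barcode = read[start:start + length].upper()
        if len(barcode) == length and "N" not in barcode:
            counts[barcode] += 1
    return counts


reads = ["ACGTTT", "ACGTAA", "TTGACC", "ANGTCC", "AC"]
counts = count_barcodes(reads, start=0, length=4)
```

In a real pipeline the counts per sample would then feed the clustering and differential activity steps.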
Domainogram is an R package developed during my PhD that implements statistical tests to study lamina-associated domains (LADs). The method works by comparing the proportion of domains co-localized with known LADs.
The package uses statistical testing to validate biological findings: it applies a genome rearrangement approach to compute domain statistics and incorporates an analysis of gene expression differences across domain classes, providing a comprehensive approach to chromatin organization analysis.
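The idea of comparing an observed proportion of LAD-overlapping domains against rearranged genomes can be sketched as a simple permutation test. This is written in Python for illustration, whereas Domainogram itself is an R package. The null model here (relocating each domain to a random position on a single chromosome while keeping its width) is a stand-in assumption and may differ from the package's rearrangement approach. All function names are hypothetical.

```python
import random


def overlaps_any(domain, lads):
    """True if a half-open interval (start, end) overlaps any LAD."""
    start, end = domain
    return any(start < lad_end and lad_start < end for lad_start, lad_end in lads)


def proportion_overlapping(domains, lads):
    """Fraction of domains that overlap at least one LAD."""
    return sum(overlaps_any(d, lads) for d in domains) / len(domains)


def relocation_test(domains, lads, chrom_length, n_permutations=1000, seed=0):
    """Empirical one-sided p-value for the observed overlap proportion.

    Each permutation moves every domain to a uniformly random position
    on the chromosome, preserving its width.
    """
    rng = random.Random(seed)
    observed = proportion_overlapping(domains, lads)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        relocated = []
        for start, end in domains:
            width = end - start
            new_start = rng.randrange(0, chrom_length - width + 1)
            relocated.append((new_start, new_start + width))
        if proportion_overlapping(relocated, lads) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


observed, p_value = relocation_test(
    domains=[(100, 200), (300, 400)],
    lads=[(150, 350)],
    chrom_length=10_000,
)
```

A small p-value would suggest that the domains co-localize with LADs more often than expected from random placement under this simplified null.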
CURE is a framework to curate ultraconserved element (UCE) data for species-tree reconstruction. When dealing with UCE data, there are two main ways to curate the data. Neither was automated or efficient enough for large datasets.
Guided by my colleagues, I developed CURE to solve this problem. CURE is user-friendly and speeds up the curation process by parallelizing its steps.