Aside

Contact


Skills

Statistical:
Bayesian inference · Hierarchical models · Random forest · Experimental design

Programming:
R · Python · Julia · Stan · SQL · DuckDB · Bash

Data Engineering:
ETL/ELT · Apache Arrow · Cloud-optimized data (Parquet, COG) · Azure (Blob, Databricks) · HPC

Reproducibility:
Quarto · Markdown · Jupyter · Pandoc · LaTeX · Make

DevOps:
Unix · Git · GitHub · CI/CD pipelines · Docker · Unit testing

Web & Interactive Tools:
HTML/CSS · Shiny · API design basics


Outreach

· Developed and maintained the QCBS R Workshop Series, reaching nearly one thousand graduate students
· Taught 250+ students across biology, engineering, and graduate programs (programming, statistics, ecology)
· 5 published papers & preprints with ~100 citations
· Translated client needs into technical analyses and delivered clear, actionable results


Languages

Portuguese · Native
French · Full Professional
English · Full Professional

Disclaimer

CV source code hosted on

Last updated: 2026-02-06

PDF download available

Main

Willian Vieira, PhD

Data scientist combining machine learning, Bayesian inference, and cloud-oriented data engineering to deliver reliable analytical pipelines at scale.

Data Science Experience

Data Analyst

Habitat, Montreal, Canada

2025 - 2024
(1 yr 9 m)

Built and maintained automated, reproducible pipelines for cloud-based spatial data using R, Python, and Julia | Delivered geospatial ML models for client projects | Designed a lakehouse data architecture for efficiently streaming unstructured datasets from Azure storage | Developed a metadata-driven workflow to improve pipeline scalability and governance | Led the team’s transition to Unix, creating automated setup scripts while modernizing the workflows with Docker, CI/CD, and DevOps practices

PhD Research

Integrative Ecology Lab, Sherbrooke, Canada

2024 - 2017

Built end-to-end analytical pipelines during PhD, from assembling continental forest-climate datasets to implementing mixed-effects, ML, and non-linear Bayesian hierarchical models | Developed open-source R packages and reproducible HPC workflows to run thousands of model fits at scale | This work formed the basis of one peer-reviewed publication in Ecological Modelling, two preprints, and a technical methods book documenting the full modelling and computational pipeline

Biostatistician

Environment and Climate Change Canada - Quebec, Canada

2022 - 2020
(part-time)

Developed a cost-aware probability sampling protocol to improve the spatial representativeness of boreal bird surveys in Quebec | Led R&D on spatial bias correction, designing a method later adopted by other provinces | Produced a fully open-source, reproducible analytical workflow with automated documentation

Education

PhD, Ecology

Université de Sherbrooke - Sherbrooke, Canada

2024 - 2017

How climate, competition, and forest management shape the limits of tree species distributions: from individuals to metapopulations

Masters 2, Agroecology and Resource Management

Bordeaux Sciences Agro, Bordeaux, France

2016 - 2015

Modelling the dispersion of weed species in agricultural landscapes

BSc in Agronomy

Universidade Federal de Santa Catarina, Florianópolis, Brazil

2015 - 2010

Defaunation impact on a threatened species: araucaria in southern Brazil