CV

General Information

Full Name João Augusto Leite
Languages Portuguese, English

Education

  • 2022-2026
    PhD - Computer Science
    University of Sheffield, UK
  • 2021-2024
    MSc - Computer Science
    Universidade Federal de São Carlos, Brazil
    • Specialised in self-training and data augmentation methods for hate speech detection.
    • Supervised by Prof. Dr. Diego Silva.
    • Published a research paper and a thesis: "Noisy Self-Training with Data Augmentations for Offensive and Hate Speech Detection Tasks". The paper was presented at RANLP 2023 in Varna, Bulgaria.
  • 2017-2021
    BSc - Computer Science
    Universidade Federal de São Carlos, Brazil

Experience

  • 2024 - Present
    Research Associate
    University of Sheffield, UK
    • Developing content verification tools for the Chinese language.
  • 2023 - 2024
    Graduate Teaching Assistant
    University of Sheffield, UK
    • Prepared lab demonstrations and graded undergraduate work for the text processing module (COM3110).
  • 2021 - 2022
    Data Scientist
    PicPay, Brazil
    • Designed and deployed information retrieval models for millions of users.
    • Conducted A/B testing for proof-of-concept projects.
  • 2019 - 2021
    Machine Learning Engineer
    Birdie.ai, Brazil
    • Built NLP models for applications like NER, sentiment analysis, and ontology building.
    • Led labeling tasks for supervised learning.

Volunteer Work

  • 2019
    Organiser
    Semana da Computação - UFSCar, Brazil
    • Semana da Computação is an annual event that brings together students and professionals from many fields of Computer Science.
    • Served on the organising committee, with responsibility for defining the event's schedule and contacting speakers.

Awards

  • 2022
    EPSRC Doctoral Training Partnership (DTP) Scholarship
    UK Research and Innovation (UKRI)
    • Covers full international tuition fees and living expenses, and provides a research support grant.
  • 2021
    Best Paper Award
    Department of Computer Science, Universidade Federal de São Carlos
    • My paper "Toxic language detection in social media for Brazilian Portuguese: New dataset and multilingual analysis" was named the best paper published by the Department of Computer Science in 2021.

Publications

For a complete list of publications, please visit my Google Scholar profile or the publications page.

Research Interests

  • Applications
    • AI for social good, content verification, disinformation and hate speech mitigation, assessment of credibility and persuasiveness.
  • Responsible AI
    • Fairness, transparency, alignment, adversarial robustness, interpretability.
  • Natural Language Processing
    • Language modelling, agentic AI, prompting strategies, learning from explanations, surface-form competition, multilingual settings and cross-domain adaptation.
  • Semi-supervised and Unsupervised Learning
    • Self-training, weak supervision, data augmentation, contrastive learning, zero- and few-shot learning.

Stack

  • Programming Languages
    • Python, C
  • Machine Learning Frameworks
    • PyTorch, HuggingFace, Scikit-learn, spaCy, NLTK
  • MLOps
    • Git, PyTorch Lightning, Weights & Biases, DVC, Databricks, AWS
  • Other
    • Unix shell scripting, Web scraping, Slurm