CV

Contact Information

Name Alessio Cocchieri
Email alessiococchieri.ac@gmail.com

Experience

  • March 2026 - May 2026

    Dublin, Ireland

    Research Scientist Internship (PhD Intern)
    IBM Research
    Natural Language Processing — LLM Factuality
    • Created a novel benchmark for multimodal scientific fact-checking, targeting LLM long-form generation
    • Coordinated a multi-team human annotation pipeline, overseeing annotation guidelines, quality control, and inter-annotator agreement (IAA)
    • Conducted systematic analysis exposing critical factuality fragilities in multimodal LLMs across scientific domains
  • March 2023 - May 2023

    Dublin, Ireland

    Research Scientist Internship (Master's Thesis)
    IBM Research
    Natural Language Processing — Named Entity Recognition
    • Leveraged LLM distillation to improve smaller models for zero-shot NER
    • Produced two peer-reviewed publications — OpenBioNER (NAACL 2025) and ZeroNER (ACL 2025)
    • Contributed to the IBM zshot library for zero-shot NER model inference

About Me

  • Final-year NLP PhD candidate (expected completion: Nov 2026) at UniboNLP, with 9+ publications at ACL, EMNLP, NAACL, and EACL.
  • I specialize in LLM evaluation and knowledge distillation for low-resource NLP, with applications to high-stakes domains such as medicine.
  • Active reviewer for ARR and top-tier ML conferences such as NeurIPS.

Education

  • 2023 - present

    Bologna, Italy

    PhD
    University of Bologna
    Natural Language Processing
    • Focus: LLMs, RAG, Information Extraction, Benchmarking, Low-resource NLP
    • Supervisor: Prof. Gianluca Moro — Research Group: UniboNLP
  • 2021 - 2023

    Bologna, Italy

    MSc
    University of Bologna
    Artificial Intelligence
    • Grade: 110/110 with honors
  • 2018 - 2021

    Bologna, Italy

    BSc
    University of Bologna
    Computer Science
    • Grade: 110/110 with honors

Selected Publications

  • ACL 2026
    LLMs (Almost) Never Abstain Under Medical Uncertainty

    We introduce MedQAbstain, a benchmark for medical abstention under uncertainty, revealing that state-of-the-art LLMs systematically overcommit, rarely abstaining even when the question itself is hidden.

  • EACL 2026
    ReMedQA: Are We Done With Medical Multiple-Choice Benchmarks?

    We show that high LLM accuracy in medical MCQA masks severe inconsistency. We propose novel metrics to evaluate true reliability across MCQA formats.

  • ACL 2025
    What do you call a dog that is incontrovertibly true? Dogma: Testing LLM Generalization through Humor

    We introduce Phunny, a novel benchmark using uncontaminated English puns, revealing that LLMs struggle with generalization even on simple tasks, consistently underperforming the human baseline.

  • EMNLP 2025
    Can Large Language Models Win the International Mathematical Games?

    We introduce MathGames, a novel multimodal benchmark of age-graded math problems from an international competition, showing that frontier LLMs underperform humans, including 11-year-olds.

  • NAACL 2025
    OpenBioNER: Lightweight Open-Domain Biomedical NER Through Entity Type Description

    We introduce a 110M-parameter BERT model that leverages entity type descriptions for zero-shot biomedical NER, outperforming GPT-4o, specialized LLMs, and GLiNER by up to 10% F1.

  • ACL 2024
    To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering

    We introduce MedGENIE, the first generate-then-read framework for open-domain medical QA, demonstrating the effectiveness of generated over retrieved contexts and significantly improving LLM RAG performance.