
Mithridatium

Overview

Mithridatium is a set of tools that translates research on detecting and preventing AI poisoning attacks into practical software. The project focuses on identifying hidden backdoors in pretrained image classification models and helping users evaluate model integrity before deployment.

Mithridatium supports multiple backdoor detection methods and provides structured reports with verdicts, metrics, and relevant statistics. The goal is to make advanced AI security research more accessible to developers, researchers, and organizations working with pretrained models.
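
To make the idea of a structured report concrete, here is a minimal sketch of what such a record could look like. The class name, field names, and all values are hypothetical placeholders chosen for illustration; they are not taken from the Mithridatium codebase and do not describe real results.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DetectionReport:
    """Hypothetical report record; field names are illustrative assumptions."""
    model_name: str
    method: str
    verdict: str
    metrics: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Serialize with stable key order so reports are easy to diff.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values only; not output from a real detection run.
report = DetectionReport(
    model_name="example-model",
    method="STRIP",
    verdict="suspicious",
    metrics={"mean_entropy": 0.06, "threshold": 0.2},
)
print(report.to_json())
```

Keeping the verdict separate from the raw metrics lets a reader act on the summary while still being able to inspect the numbers behind it.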

Basic Information

Project Resources

Core Dev Team

  • Client: Dr. Reza Tourani
  • Track: Client-driven Product
  • Current Tech Lead: Pelumi Oluwategbe (GitHub)
  • Developers:
    • Payton Guffey (GitHub)
    • Gustavo Lucca (GitHub)
    • Will Phoenix (GitHub)

Technical Information

Additional Information

  • Start Date: August 2025
  • Technologies Used:
    • Python
    • PyTorch
    • Hugging Face
    • Streamlit / Gradio
    • AI/ML (poisoning attack detection and prevention)
  • License: MIT
  • Code of Conduct: CODE_OF_CONDUCT.md

Supported Detection Methods

  • FreeEagle: A white-box, data-free defense that analyzes internal model behavior for abnormal class bias.

  • STRIP: A defense that mixes inputs with other images and checks whether the model’s predictions remain unusually stable.

  • MMBD: A method that estimates a maximum-margin statistic for each output class and flags classes whose margin is abnormally dominant.

  • AEVA: A black-box-style defense that perturbs input images and observes how the model’s predictions change.
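
As an illustration of the STRIP idea described above, the following is a minimal, dependency-free sketch rather than Mithridatium's implementation. It blends an input with several other images and averages the entropy of the model's predictions: a clean input should give varied, uncertain predictions (higher entropy), while a triggered input keeps producing the same confident prediction (low entropy). The toy model, the planted trigger on pixel 0, the blending weight, and the threshold are all assumptions made up for this example.

```python
import math
import random


def blend(image, overlay, alpha=0.5):
    """Superimpose two equally sized images (flat lists of values in [0, 1])."""
    return [(1 - alpha) * p + alpha * q for p, q in zip(image, overlay)]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def strip_score(model, image, overlays, alpha=0.5):
    """Mean prediction entropy of `image` blended with each overlay."""
    entropies = [entropy(model(blend(image, o, alpha))) for o in overlays]
    return sum(entropies) / len(entropies)


def toy_model(x):
    """Hypothetical 2-class model with a planted backdoor on pixel 0."""
    if x[0] >= 0.4:  # trigger still visible after blending
        return [0.99, 0.01]  # backdoor forces a confident class-0 prediction
    mean = sum(x[1:]) / len(x[1:])
    p = 1 / (1 + math.exp(-8 * (mean - 0.5)))  # ordinary input-dependent output
    return [1 - p, p]


rng = random.Random(0)
# Clean overlay images: pixel 0 (the trigger position) is always 0.
overlays = [[0.0] + [rng.random() for _ in range(3)] for _ in range(20)]

clean_image = [0.0, 0.2, 0.8, 0.5]
triggered_image = [1.0, 0.2, 0.8, 0.5]  # same image with the trigger set

threshold = 0.2  # assumed cutoff; in practice it would be calibrated on clean data
clean_score = strip_score(toy_model, clean_image, overlays)
triggered_score = strip_score(toy_model, triggered_image, overlays)

print("clean input flagged:", clean_score < threshold)
print("triggered input flagged:", triggered_score < threshold)
```

With a real classifier, `toy_model` would be replaced by a function returning softmax probabilities, and the threshold would be chosen from the entropy distribution observed on known-clean inputs.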

Development Priorities

  • Research and implement AI poisoning attack detection techniques
  • Translate academic research into practical, reusable software tools
  • Build testing frameworks for evaluating model robustness against poisoning
  • Document detection and prevention best practices

Get Involved

If you would like to contribute to this project, please visit our GitHub page to open an issue or submit a pull request.