
Enhancing Bioinformatics Research through LLM

Overview

This project aims to develop a tool leveraging large language models (LLMs) to recommend bioinformatics APIs and detect API usage errors in research code. The tool will specifically address the challenges of proper API usage in bioinformatics, where incorrect data handling can lead to faulty conclusions and, in severe cases, affect patient outcomes. The LLM will be trained on bioinformatics codebases, ensuring the tool can understand and guide correct API usage. The end goal is to improve research reliability, accelerate discoveries, and enhance data-driven health solutions.

Information

User Guide

This tool is designed to assist researchers by recommending appropriate APIs, detecting coding errors, and suggesting real-time corrections. It enhances the accuracy and efficiency of bioinformatics research by leveraging advanced language models.
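As a rough illustration of what "detecting API usage errors" could look like, the sketch below uses a fixed static rule rather than an LLM. It is only a stand-in for the planned model-based detector. The rule is an assumption picked for illustration: Biopython's Bio.Entrez asks callers to set Entrez.email before querying NCBI, so the checker flags scripts that call Entrez functions without assigning it. The function name find_missing_entrez_email is hypothetical and not part of the project.

```python
import ast

# Entrez functions that contact NCBI (a subset of Bio.Entrez, chosen for illustration).
ENTREZ_CALLS = {"esearch", "efetch", "esummary", "elink"}


def _is_entrez_attr(node: ast.AST, names: set[str]) -> bool:
    """True if node looks like `Entrez.<name>` for one of the given names."""
    return (
        isinstance(node, ast.Attribute)
        and node.attr in names
        and isinstance(node.value, ast.Name)
        and node.value.id == "Entrez"
    )


def find_missing_entrez_email(source: str) -> list[int]:
    """Return line numbers of Entrez calls when Entrez.email is never assigned.

    Hypothetical rule-based check; it only recognizes the literal name `Entrez`
    and does not follow aliases or imports.
    """
    tree = ast.parse(source)  # parses only; the analyzed script is never executed
    email_assigned = False
    call_lines = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            if any(_is_entrez_attr(t, {"email"}) for t in node.targets):
                email_assigned = True
        elif isinstance(node, ast.Call) and _is_entrez_attr(node.func, ENTREZ_CALLS):
            call_lines.append(node.lineno)
    return [] if email_assigned else sorted(call_lines)
```

A check like this could serve as a simple baseline against which the LLM-based detector is compared, since its findings are deterministic and easy to verify by hand.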

Phase 1: LLM Fine-Tuning and Dataset Preparation
Phase 2: API Integration and Error Detection Prototyping
Phase 3: UI Development and User Testing
Phase 4: Performance Optimization and Scalability Testing
Phase 5: Security Audit and Compliance Check
Phase 6: Final Testing, Documentation, and Deployment

Technical Information

Technical Overview (Subject to Change)

Programming Languages:

Python: Core language for developing the LLM, integrating APIs, implementing error detection algorithms, and building the user interface. Recommended version: 3.11.

Libraries and Frameworks:

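AI/ML Frameworks: PyTorch (2.0.1 or higher) or TensorFlow (2.12.0 or higher): For training and fine-tuning the large language model. Transformers (Hugging Face, 4.30.0 or higher): To load and customize pre-trained models such as BERT or openly released GPT-style models (e.g., GPT-2) for bioinformatics tasks. GPT-3 and GPT-4 are not distributed through Transformers; they are available only through OpenAI's hosted API.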

API Integration: Requests (2.31.0 or higher): For managing HTTP requests to bioinformatics APIs (e.g., NCBI, Ensembl, UCSC Genome Browser).
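A minimal sketch of how Requests might be used against one of these services, assuming the NCBI E-utilities ESearch endpoint with JSON output. The helper names build_esearch_params and search_ids are hypothetical and chosen only for this example. Parameter validation is kept separate from the network call so it can be checked offline.

```python
# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_params(db: str, term: str, retmax: int = 5) -> dict:
    """Validate inputs and build the ESearch query parameters (no network access)."""
    if not db or not term:
        raise ValueError("db and term must be non-empty")
    if retmax < 1:
        raise ValueError("retmax must be at least 1")
    return {"db": db, "term": term, "retmax": retmax, "retmode": "json"}


def search_ids(db: str, term: str, retmax: int = 5, timeout: float = 10.0) -> list[str]:
    """Query ESearch and return the matching record IDs (requires network access)."""
    # Third-party dependency; imported here so the helper above works without it.
    import requests

    response = requests.get(
        ESEARCH_URL,
        params=build_esearch_params(db, term, retmax),
        timeout=timeout,  # avoid hanging indefinitely on a slow service
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return response.json()["esearchresult"]["idlist"]
```

For example, search_ids("pubmed", "BRCA1") would return up to five PubMed IDs. Setting an explicit timeout and calling raise_for_status are the kinds of usage details the tool is meant to check for.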

Web Development and User Interface: React.js (18.2.0 or higher): To build a dynamic, responsive, and interactive front-end for user interaction.

Databases and Storage: MongoDB (6.0.6 or higher): For storing user data, API metadata, and research outputs.
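To make "API metadata" more concrete, here is one possible shape for such a record, sketched as a Python dataclass converted into a MongoDB-style document. The schema is not defined by the project. Every field name, the ApiMetadata class, and the to_document helper are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ApiMetadata:
    """Hypothetical description of one bioinformatics API known to the tool."""
    name: str
    base_url: str
    endpoints: list[str] = field(default_factory=list)
    requires_api_key: bool = False


def to_document(meta: ApiMetadata) -> dict:
    """Convert the dataclass into a plain dict suitable for a document store."""
    doc = asdict(meta)
    # Assumption: use the lowercased API name as the document's primary key.
    doc["_id"] = meta.name.lower()
    return doc
```

With the pymongo driver, a document produced this way could be stored by calling insert_one(doc) on a collection. Using a readable _id is one design choice among several; letting MongoDB generate an ObjectId would work equally well.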

Get Involved

Contributing to the LLM-Based API Recommendation Tool project is a valuable opportunity to make an impact in bioinformatics research.