This is my final thesis for an undergraduate degree in computer science at the Faculty of Electrical Engineering and Computing. In it, I apply the knowledge and skills I acquired during my studies.
Antibiotic resistance is a pressing problem in medicine. Artificial intelligence methods, such as language models and machine learning, offer one way to identify and classify resistance. In this study, I used a publicly available dataset of resistance gene sequences from the CARD database and applied a protein language model to create numerical representations of these genes. I then trained two XGBoost models: one to identify genes that confer resistance and one to classify them. The identification model determines with 97% accuracy whether a gene confers resistance, and for genes that do, the classification model determines with 97% accuracy the antibiotic to which the gene confers resistance. Such models could be used in practice as an aid in selecting treatment.
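The two-model setup described above can be sketched as a cascade in which the classifier runs only on genes that the identifier flags as resistant. This is a minimal sketch under stated assumptions, not the thesis implementation: it assumes each gene is represented by mean-pooling the protein language model's per-residue embeddings into one fixed-length vector, and the names `mean_pool`, `predict_resistance`, the toy models, and the 0.5 threshold are hypothetical. With XGBoost's scikit-learn interface, the two callables could, for example, wrap `XGBClassifier.predict_proba` and `XGBClassifier.predict`.

```python
from typing import Callable, List, Optional, Sequence


# Assumption: a gene is represented by one fixed-length vector obtained by
# averaging the protein language model's per-residue embeddings (mean pooling).
def mean_pool(per_residue: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue embedding vectors into a single protein-level vector."""
    n = len(per_residue)
    dim = len(per_residue[0])
    return [sum(row[i] for row in per_residue) / n for i in range(dim)]


def predict_resistance(
    embedding: Sequence[float],
    resistance_probability: Callable[[Sequence[float]], float],
    antibiotic_class: Callable[[Sequence[float]], str],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> Optional[str]:
    """Two-stage cascade: identify first, then classify only resistant genes.

    Returns None if the gene is not predicted to confer resistance,
    otherwise the predicted antibiotic label.
    """
    if resistance_probability(embedding) < threshold:
        return None
    return antibiotic_class(embedding)


# Toy stand-ins for the two trained models (hypothetical, for illustration only).
def toy_identifier(x: Sequence[float]) -> float:
    return 0.9 if x[0] > 0 else 0.1


def toy_classifier(x: Sequence[float]) -> str:
    return "aminoglycoside" if x[1] > 0 else "tetracycline"


gene = mean_pool([[1.0, -2.0], [3.0, 0.0]])  # [2.0, -1.0]
print(predict_resistance(gene, toy_identifier, toy_classifier))  # tetracycline
```

Keeping identification and classification as separate models lets each be trained and evaluated on its own task; a single multiclass model with an extra "not resistant" label would be an alternative design.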