Master thesis

Semi-automatic text classification at MSF: methods, challenges, and recommendations

SONAR|HES-SO

  • Genève : Haute école de gestion de Genève

45 p.

Master of Science HES-SO en Sciences de l'information: Haute école de gestion de Genève, 2024

This work describes the effort to semi-automatically index the public library of the Humanitarian Representation Team (HRT) at the international office of Médecins Sans Frontières (MSF) using multi-label text classification. Supervised machine learning techniques were employed for this task. Several models, including Decision Tree, Random Forest, k-Nearest Neighbors, Multinomial Naïve Bayes, and Support Vector Machine, were evaluated on this library. Among these, k-Nearest Neighbors achieved the best results, with a macro F1 score of 0.80, macro precision of 0.96, and macro recall of 0.72. Hyperparameter tuning and threshold adjustment were also performed to optimize the model's performance. The challenges associated with multi-label classification, namely the scarcity of labeled datasets and class imbalance, are explored, and practical solutions in the context of MSF's needs are suggested. Recommendations for implementing semi-automatic indexing at MSF are provided, taking into account these challenges and the results obtained.
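The macro-averaged scores and the threshold adjustment mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the label names, the per-label scores, and the helper functions `binarize` and `macro_scores` are hypothetical and chosen only to show how lowering a decision threshold trades precision for recall in a multi-label setting.

```python
from typing import Dict, List, Sequence, Set

def binarize(scores: Sequence[Dict[str, float]], threshold: float) -> List[Set[str]]:
    """Assign to each document every label whose score reaches the threshold."""
    return [{label for label, s in doc.items() if s >= threshold} for doc in scores]

def macro_scores(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]],
                 labels: Sequence[str]) -> Dict[str, float]:
    """Compute precision, recall and F1 per label, then average them unweighted."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {"precision": sum(precisions) / n,
            "recall": sum(recalls) / n,
            "f1": sum(f1s) / n}

# Hypothetical toy data: three labels, four documents.
labels = ["health", "conflict", "migration"]
y_true = [{"health"}, {"health", "conflict"}, {"migration"}, {"conflict", "migration"}]
# Hypothetical per-label scores, e.g. the share of nearest neighbours carrying a label.
scores = [
    {"health": 0.9, "conflict": 0.2, "migration": 0.1},
    {"health": 0.6, "conflict": 0.4, "migration": 0.1},
    {"health": 0.1, "conflict": 0.3, "migration": 0.8},
    {"health": 0.2, "conflict": 0.7, "migration": 0.4},
]

# Evaluate a few candidate thresholds and keep the one with the best macro F1.
results = {t: macro_scores(y_true, binarize(scores, t), labels) for t in (0.3, 0.5, 0.7)}
best_threshold = max(results, key=lambda t: results[t]["f1"])

for t, r in results.items():
    print(f"threshold={t}: P={r['precision']:.2f} R={r['recall']:.2f} F1={r['f1']:.2f}")
print("best threshold:", best_threshold)
```

Because macro F1 is the mean of the per-label F1 values, it need not equal the harmonic mean of macro precision and macro recall. In practice the threshold would be selected on a held-out validation set rather than on the data used for the final evaluation.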
Language
  • English
Classification
Information, communication and media sciences
Notes
  • Haute école de gestion Genève
  • Information documentaire
  • hesso:hegge
Persistent URL
https://sonar.ch/global/documents/330713