How to stop worrying and love multiple citation experimental data

Journal Mendeleev Communications
ISSN: 1364-551X, E-ISSN: 0959-9436
Output data: Year: 2025, Volume: 35, Number: 2, Pages: 224-227, Pages count: 4, DOI: 10.71267/mencom.7710
Authors: Timofeev Yaroslav Vladislavovich 1,2, Mrasov Amir M. 1,2, Panova Maria V. 2, Novikov Fedor Nikolaevich 2, Svitanko Igor 2
Affiliations
1 Department of Chemistry, M. V. Lomonosov Moscow State University, 119991 Moscow, Russian Federation
2 N. D. Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, 119991 Moscow, Russian Federation

Abstract: Numerous public databases now collect and disseminate biological activity data from literature and patents, forming the basis for chemogenomics and novel scoring functions. However, data quality is often compromised due to multiple citations of values across different studies with varying protocols. To address this issue, we used the XGBoost model in combination with a BERT-based NLP approach and a distance-based out-of-distribution (OOD) data detection method to enhance classification accuracy and exclude review articles.
Cite: Timofeev Y.V., Mrasov A.M., Panova M.V., Novikov F.N., Svitanko I.
How to stop worrying and love multiple citation experimental data
Mendeleev Communications. 2025. V.35. N2. P.224-227. DOI: 10.71267/mencom.7710
Identifiers:
Web of science: WOS:001506165900015
Scopus: 2-s2.0-105007460397
OpenAlex: W4407709724
Citing:
DB Citing
OpenAlex No citations
Scopus No citations