Abstract: There are about three thousand known viral species, and several dozen of them pose a significant threat to human health. More than 50 drugs are available for the treatment of diseases caused by well-studied viruses, such as human immunodeficiency virus, hepatitis C virus, and herpes simplex virus. However, the emergence of resistant strains justifies the search for new drugs to replace the existing ones. At the same time, broad-spectrum antivirals that act on a range of rare viral species, such as Ebola virus, Zika virus, and dengue virus, are urgently needed, because developing individual drugs or vaccines against each of them is not economically viable. The exploration of new antiviral compounds is therefore an important task. The volume of antiviral bioactivity information is increasing rapidly, and public resources such as ChEMBL and PubChem BioAssay have made this knowledge easy to access. Nevertheless, these data have not yet been analyzed thoroughly. Careful analysis of the available information is required to clarify structure-activity relationships and to explore activity landscapes in the search for novel scaffolds. We focused on the public bioactivity database ChEMBL, which contains medicinal chemistry data extracted mainly from the scientific literature, including binding, functional and ADMET information for drug-like compounds. As in all large databases, some ChEMBL data are poorly annotated, which complicates information retrieval and thorough analysis. Hence, advanced search techniques and specific annotations are required to cover as many compounds as possible. The MySQL edition of the ChEMBL_20 release was used for data analysis in our study. Four database fields (assays.description, assays.assay_organism, target_dictionary.organism and target_dictionary.pref_name) that contain information about the bioassay organism and the target organism were used to search for antiviral bioactivity information.
Two approaches were used to find relevant values in these fields. The first was based on lists of allowed values, selected manually from the full lists of target_organism and assay_organism values in ChEMBL_20. In the second approach, a substring dictionary containing official and historical viral species names, alternative species names, their abbreviations, and other words related to antiviral activity was applied to the assay_description field. These search techniques yielded a 1.5-fold increase in the number of extracted antiviral bioactivities compared with the ChEMBL web interface. The amount of data on compounds tested for antiviral activity that have erroneous or missing annotations was estimated. To standardize the data annotation, a dedicated procedure was designed, and viral species names were annotated according to the latest official ICTV taxonomy release. The final version of our database contains more than 145,000 compounds and about 346,000 bioactivity values. Visualization of chemical space and analysis of the distribution of physicochemical descriptors revealed specific features of antiviral compounds in comparison with the whole ChEMBL database.
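The two search approaches described above (exact matching against curated organism lists, plus substring matching in free-text assay descriptions) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the procedure used in the study: it builds a toy in-memory SQLite stand-in for the `assays` and `target_dictionary` tables (the study itself used the MySQL edition of ChEMBL_20), and the lists `ALLOWED_ORGANISMS` and `VIRUS_SUBSTRINGS`, the toy rows, and the function names are all hypothetical. `target_dictionary.pref_name` is present in the toy schema but is not used for filtering here.

```python
import sqlite3

# Hypothetical, heavily abbreviated allowed-value list (approach 1).
# In the study such lists were selected manually from all organism values.
ALLOWED_ORGANISMS = {
    "Human immunodeficiency virus 1",
    "Hepatitis C virus",
    "Zika virus",
}

# Hypothetical substring dictionary (approach 2): names, abbreviations, related words.
VIRUS_SUBSTRINGS = ["hiv", "hcv", "herpes", "ebola", "dengue", "antiviral"]


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for two ChEMBL tables (toy rows only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE target_dictionary (
            tid INTEGER PRIMARY KEY, organism TEXT, pref_name TEXT);
        CREATE TABLE assays (
            assay_id INTEGER PRIMARY KEY, description TEXT,
            assay_organism TEXT, tid INTEGER);
        INSERT INTO target_dictionary VALUES
            (1, 'Human immunodeficiency virus 1', 'HIV-1 protease'),
            (2, 'Homo sapiens', 'Toy cell-based target'),
            (3, NULL, 'Toy unknown target');
        INSERT INTO assays VALUES
            (10, 'Inhibition of HIV-1 protease', NULL, 1),
            (11, 'Antiviral activity against dengue virus in Vero cells', NULL, 2),
            (12, 'Cytotoxicity in HepG2 cells', 'Homo sapiens', 2),
            (13, 'Activity against Zika virus', 'Zika virus', 3);
        """
    )
    return conn


def find_antiviral_assays(conn: sqlite3.Connection) -> list[int]:
    """Return sorted assay_ids matched by either search approach."""
    rows = conn.execute(
        """
        SELECT a.assay_id, a.description, a.assay_organism, t.organism
        FROM assays AS a
        LEFT JOIN target_dictionary AS t ON a.tid = t.tid
        """
    ).fetchall()
    hits = set()
    for assay_id, description, assay_organism, target_organism in rows:
        # Approach 1: exact match of either organism field against the allowed list.
        if assay_organism in ALLOWED_ORGANISMS or target_organism in ALLOWED_ORGANISMS:
            hits.add(assay_id)
            continue
        # Approach 2: case-insensitive substring search in the assay description.
        text = (description or "").lower()
        if any(sub in text for sub in VIRUS_SUBSTRINGS):
            hits.add(assay_id)
    return sorted(hits)


if __name__ == "__main__":
    print(find_antiviral_assays(build_demo_db()))  # [10, 11, 13]
```

In this toy data, assay 11 is found only by the substring search, because neither of its organism fields names a virus; this mirrors why combining both approaches can retrieve more records than either alone. Plain substring matching is deliberately simple here: short abbreviations can produce false positives (for example, "hiv" occurs inside "archive"), so a real dictionary would likely need word-boundary matching or manual review of the hits.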