Abstract: Probabilistic topic modeling of text collections is a powerful
tool for statistical text analysis, traditionally based on
graphical models and Bayesian learning. Additive
regularization of topic models (ARTM) is a recent semi-probabilistic
approach that provides simpler inference
for many models previously studied only in Bayesian
settings. ARTM lowers the barrier to entry into the topic modeling
research field and facilitates the combination of topic models.
In this paper, we develop a multimodal extension of
the ARTM approach and implement it in BigARTM, an open-source
project for online parallelized topic modeling. We demonstrate
the ability of non-Bayesian regularization to combine
modalities, languages, and multiple criteria in order to find sparse,
diverse, and interpretable topics.