This article develops data clustering algorithms based on asymmetric similarity measures, which arise in tasks involving directed interactions. Two algorithms are proposed: stepwise cluster formation with fixed centers, and a modified version that iteratively refines the centers. Both were evaluated experimentally, including a comparison with the k-medoids method. The results show that the fixed-center algorithm is efficient for small datasets, while the center-recalculation algorithm produces more accurate clustering. The choice between the two depends on whether speed or clustering quality matters more.
Keywords: clustering, asymmetric similarity measures, clustering algorithms, iterative refinement, k-medoids, directed interactions, adaptive methods
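As an illustration of the general idea only, the following minimal sketch contrasts assignment to fixed centers with iterative center recalculation on an asymmetric similarity matrix. It is not the authors' algorithm: the function names `assign` and `refine_centers`, the toy matrix `sim`, the choice of initial centers, the assignment rule (an object joins the center it is most similar to, using the directed value `sim[i][c]`), and the center update rule (the member that receives the highest total similarity from the other members) are all assumptions made for this example.

```python
def assign(sim, centers):
    """Assign each object to the center it is most similar to.

    sim[i][c] is the directed similarity from object i to object c
    (assumed convention). Centers are assigned to themselves.
    """
    labels = []
    for i in range(len(sim)):
        if i in centers:
            labels.append(i)
        else:
            labels.append(max(centers, key=lambda c: sim[i][c]))
    return labels


def refine_centers(sim, centers, max_iter=100):
    """Alternate assignment and center recalculation until centers stop changing."""
    centers = list(centers)
    for _ in range(max_iter):
        labels = assign(sim, centers)
        new_centers = []
        for c in centers:
            members = [i for i, lab in enumerate(labels) if lab == c]
            # Hypothetical update rule: the new center is the member that
            # receives the largest total similarity from the other members.
            best = max(
                members,
                key=lambda m: sum(sim[i][m] for i in members if i != m),
            )
            new_centers.append(best)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(sim, centers)


# Toy asymmetric matrix: sim[i][j] != sim[j][i] in general.
sim = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.3, 1.0, 0.7, 0.2, 0.1],
    [0.2, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.2, 0.1, 1.0, 0.9],
    [0.2, 0.1, 0.2, 0.4, 1.0],
]

fixed_labels = assign(sim, [0, 3])                    # fixed centers, single pass
refined_centers, refined_labels = refine_centers(sim, [0, 3])
print(fixed_labels, refined_centers, refined_labels)
```

On this toy matrix the grouping of objects is the same in both variants, but the refinement moves the centers from objects 0 and 3 to objects 1 and 4, which the other members point to more strongly. The single-pass variant costs one assignment sweep, while the refined variant repeats the sweep and a per-cluster search on every iteration, which mirrors the speed versus quality trade-off described in the abstract.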
This article presents a method for the comprehensive analysis of Russian-language texts using neural network models based on the Bidirectional Encoder Representations from Transformers (BERT) architecture. The study uses three models specialized for Russian: RuBERT-tiny, RuBERT-tiny2, and RuBERT-base-cased. The proposed methodology covers the morphological, syntactic, and semantic levels of analysis and integrates lemmatization, part-of-speech tagging, morphological feature identification, syntactic dependency parsing, semantic role labeling, and relation extraction. The BERT-family models achieve accuracy rates exceeding 98% for lemmatization, 97% for part-of-speech tagging and morphological feature identification, 96% for syntactic parsing, and 94% for semantic analysis. The method is suitable for tasks that require deep text comprehension and can be optimized for processing large corpora.
Keywords: BERT, Russian-language texts, morphological analysis, syntactic analysis, semantic analysis, lemmatization, RuBERT, natural language processing, NLP