|Original author(s)|Radim Řehůřek|
|Developer(s)|RARE Technologies Ltd.|
|Stable release|3.8.0 / 8 July 2019|
|Operating system|Linux, Windows, macOS|
Gensim is implemented in Python and Cython. It is designed to process large text collections through data streaming and incremental online algorithms. This distinguishes it from most other machine learning software packages, which target only in-memory processing.
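Gensim's streaming design rests on the convention that a corpus is any iterable yielding one document at a time as a sparse bag-of-words vector, that is, a list of `(token_id, count)` pairs. The sketch below illustrates that idea in plain Python rather than with Gensim itself (Gensim provides `corpora.Dictionary` and its `doc2bow` method for this job). The names `tokenize`, `read_lines`, `build_vocabulary` and `StreamedCorpus`, as well as the naive whitespace tokenizer, are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Sparse bag-of-words vector: sorted (token_id, count) pairs.
BowVector = List[Tuple[int, int]]


def tokenize(line: str) -> List[str]:
    # Naive tokenizer assumed for illustration: lowercase, split on whitespace.
    return line.lower().split()


def read_lines(path: str) -> Iterator[str]:
    # Stream a text file one line (one document) at a time.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield line


def build_vocabulary(lines: Iterable[str]) -> Dict[str, int]:
    # First pass: give each distinct token an integer id, in order of first appearance.
    token2id: Dict[str, int] = {}
    for line in lines:
        for token in tokenize(line):
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id


class StreamedCorpus:
    """Re-iterable corpus that re-reads its source on every pass."""

    def __init__(self, open_lines: Callable[[], Iterable[str]], token2id: Dict[str, int]):
        # open_lines must return a fresh iterator each time it is called.
        self.open_lines = open_lines
        self.token2id = token2id

    def __iter__(self) -> Iterator[BowVector]:
        for line in self.open_lines():
            # Count known tokens only; unknown tokens are silently skipped.
            counts = Counter(
                self.token2id[token] for token in tokenize(line) if token in self.token2id
            )
            yield sorted(counts.items())


if __name__ == "__main__":
    docs = ["human computer interaction", "graph of trees", "computer graph"]
    vocab = build_vocabulary(docs)
    corpus = StreamedCorpus(lambda: iter(docs), vocab)
    for bow in corpus:
        print(bow)
```

For a file-backed corpus one might pass `lambda: read_lines("corpus.txt")`, where the path is a placeholder. Because the callable opens a fresh stream on every pass, the corpus can be iterated repeatedly, as multi-pass training requires, while only one document is held in memory at a time.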
Gensim includes streamed, parallelized implementations of the fastText, word2vec and doc2vec algorithms, as well as latent semantic analysis (LSA, LSI, SVD), non-negative matrix factorization (NMF), latent Dirichlet allocation (LDA), tf-idf and random projections.
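As an example of how such a transformation fits the streaming design, the sketch below applies tf-idf to a stream of sparse bag-of-words vectors: one pass counts document frequencies, and a second, lazy pass yields weighted vectors without storing them. Gensim itself offers this as `gensim.models.TfidfModel`; the class `StreamingTfidf` here is hypothetical and plain Python, and its weighting (raw count times log2(N / df), scaled to unit Euclidean length) is one common tf-idf variant, assumed for illustration rather than taken from Gensim's defaults.

```python
import math
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

BowVector = List[Tuple[int, int]]         # (token_id, count) pairs
WeightedVector = List[Tuple[int, float]]  # (token_id, weight) pairs


class StreamingTfidf:
    """Hypothetical tf-idf sketch over a stream of bag-of-words vectors."""

    def __init__(self) -> None:
        self.num_docs = 0
        self.doc_freq: Counter = Counter()

    def fit(self, corpus: Iterable[BowVector]) -> "StreamingTfidf":
        # One pass over the stream: count documents and per-token document frequency.
        for bow in corpus:
            self.num_docs += 1
            self.doc_freq.update(token_id for token_id, _ in bow)
        return self

    def idf(self, token_id: int) -> float:
        # log2(N / df): a token present in every document gets weight 0.
        return math.log2(self.num_docs / self.doc_freq[token_id])

    def transform(self, bow: BowVector) -> WeightedVector:
        # Tokens never seen during fit() are skipped to avoid division by zero.
        weighted = [(t, count * self.idf(t)) for t, count in bow if self.doc_freq[t] > 0]
        norm = math.sqrt(sum(w * w for _, w in weighted))
        if norm == 0.0:
            return []
        # Drop zero weights and scale to unit Euclidean length.
        return [(t, w / norm) for t, w in weighted if w != 0.0]

    def transform_stream(self, corpus: Iterable[BowVector]) -> Iterator[WeightedVector]:
        # Lazy second pass: each vector is produced on demand, never stored.
        for bow in corpus:
            yield self.transform(bow)


if __name__ == "__main__":
    corpus = [[(0, 1), (1, 1), (2, 1)], [(3, 1), (4, 1), (5, 1)], [(1, 1), (3, 1)]]
    model = StreamingTfidf().fit(corpus)
    for vector in model.transform_stream(corpus):
        print(vector)
```

Because `fit` and `transform_stream` each consume the corpus only once, the same code works with a re-iterable corpus that reads documents from disk, which is the property the streaming design is meant to preserve.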
Uses of Gensim
Gensim has been used and cited in over 1400 commercial and academic applications as of 2018, in disciplines ranging from medicine to insurance claim analysis to patent search. The software has been covered in several news articles, podcasts and interviews.
Free and commercial support
- Gensim academic citations
- Commercial adopters of Gensim
- Podcast.__init__ episode #71 on Gensim
- Interview with Radim Řehůřek, creator of Gensim
- Gensim source code on GitHub
- Gensim mailing list on Google Groups
- Gensim chat room on Gitter
- Gensim open source Incubator