Cleora is an open-source model for creating stable, inductive entity embeddings from large-scale heterogeneous relational data and hypergraphs. Written in Rust with a Python API, it is built for speed and scalability in tasks such as recommendation systems and graph analytics.

Added on: 2025-08-12
Price Type: Free
Monthly Traffic: 51.2K


Cleora Overview

Cleora is a general-purpose, open-source model developed by the Synerise.com team for efficient, scalable learning of entity embeddings from complex, heterogeneous relational data. It transforms entities and their interactions (such as products in a shopping cart, users on a social network, or proteins in a biological system) into numerical vectors. These vectors, or embeddings, capture the underlying relationships and similarities, which makes them useful as inputs to downstream machine learning tasks.

Built with a high-performance Rust core and exposed through a Python package (pycleora), Cleora is substantially faster than established methods such as DeepWalk or PyTorch-BigGraph. It works by repeatedly propagating initial embedding vectors through a Markov transition matrix derived from the data, an approach that avoids the noise and inefficiency of negative sampling. This allows it to process extremely large graphs and hypergraphs on a single machine, a significant advantage for real-world applications.
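To make the idea of a Markov transition matrix derived from co-occurrence data more concrete, here is a minimal pure-Python sketch. It is not Cleora's implementation: the function name build_transition, the sample baskets, and the choice of treating every pair inside a hyperedge as one co-occurrence (a clique expansion) are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import permutations


def build_transition(hyperedges):
    """Return a row-normalized co-occurrence matrix as {entity: {neighbor: prob}}.

    Illustrative only: each hyperedge is a space-separated string of entity IDs.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for edge in hyperedges:
        members = sorted(set(edge.split()))
        # Assumed clique expansion: every ordered pair in a hyperedge co-occurs once.
        for a, b in permutations(members, 2):
            counts[a][b] += 1.0

    transition = {}
    for entity, row in counts.items():
        total = sum(row.values())
        # Dividing by the row total turns counts into transition probabilities.
        transition[entity] = {nbr: c / total for nbr, c in row.items()}
    return transition


# Hypothetical baskets: one transaction per line, items separated by spaces.
baskets = ["milk bread eggs", "milk bread", "eggs flour"]
transition = build_transition(baskets)
for entity in sorted(transition):
    print(entity, transition[entity])
```

Each row sums to 1, so a row can be read as "where a random walk goes next" from that entity. A production implementation would store this as a sparse matrix rather than nested dictionaries.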

How to use Cleora

Using Cleora is straightforward for developers and data scientists familiar with Python. The process generally involves these steps:

  1. Installation: Install the Python package directly using pip: pip install pycleora.
  2. Data Preparation: Structure your data as a series of hyperedges. A hyperedge is a group of co-occurring entities. For example, a line in your input file could represent all products bought in a single transaction, separated by spaces. This can be prepared from a pandas DataFrame or any Python iterator.
  3. Matrix Creation: Use the SparseMatrix.from_iterator() function to convert your prepared data into a sparse Markov transition matrix. This matrix represents the relationships within your hypergraph.
  4. Embedding Initialization: You can either let Cleora initialize the embedding vectors deterministically or provide your own initial vectors. Supplying your own vectors lets you incorporate external information, such as embeddings from text (e.g., Sentence-BERT) or images (e.g., ViT), into the graph structure.
  5. Propagation: Perform a few iterations of Markov propagation using mat.left_markov_propagate(embeddings). Typically, 3 to 7 iterations are sufficient. Fewer iterations capture direct co-occurrence, while more iterations capture deeper, contextual similarity.
  6. Normalization: Normalize the resulting embedding vectors, usually with an L2 norm, so that they lie on the unit hypersphere. For unit-length vectors the dot product equals cosine similarity, so either can be used to compare them.
  7. Usage: The final normalized vectors are your entity embeddings, ready to be used for recommendation, classification, clustering, or similarity search tasks.
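The initialization, propagation, and normalization steps above can be sketched without any dependencies. The following is a rough pure-Python illustration, not the pycleora API: the helpers init_vector, propagate, and embed, the SHA-256 based initialization, and the hand-written transition matrix are all assumptions. It normalizes after every propagation step, which is one reasonable choice among several.

```python
import hashlib
import math


def init_vector(entity, dim, seed=0):
    """Deterministic pseudo-random vector derived from a hash of the entity ID."""
    values = []
    for i in range(dim):
        digest = hashlib.sha256(f"{seed}:{entity}:{i}".encode()).digest()
        # Map the first 8 bytes of the digest to a float in roughly [-1, 1).
        values.append(int.from_bytes(digest[:8], "big") / 2**63 - 1.0)
    return values


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def propagate(transition, embeddings):
    """One step: each entity's new vector is the weighted mean of its neighbors'."""
    dim = len(next(iter(embeddings.values())))
    updated = {}
    for entity, row in transition.items():
        acc = [0.0] * dim
        for nbr, weight in row.items():
            for k, x in enumerate(embeddings[nbr]):
                acc[k] += weight * x
        updated[entity] = l2_normalize(acc)
    return updated


def embed(transition, dim=8, iterations=3):
    embeddings = {e: l2_normalize(init_vector(e, dim)) for e in transition}
    for _ in range(iterations):
        embeddings = propagate(transition, embeddings)
    return embeddings


# Hand-written transition matrix for a few hypothetical baskets (rows sum to 1).
transition = {
    "milk": {"bread": 2 / 3, "eggs": 1 / 3},
    "bread": {"milk": 2 / 3, "eggs": 1 / 3},
    "eggs": {"milk": 1 / 3, "bread": 1 / 3, "flour": 1 / 3},
    "flour": {"eggs": 1.0},
}
embeddings = embed(transition)
print({e: [round(x, 3) for x in v] for e, v in embeddings.items()})
```

Because the initial vectors come from a hash rather than a random number generator, running embed twice on the same input yields identical output, which mirrors the determinism described for Cleora.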

Core Features of Cleora

  • Extreme Performance: Written in Rust and optimized for concurrency and cache coherence, making it exceptionally fast.
  • Scalability: Capable of embedding extremely large graphs and hypergraphs with billions of edges on a single commodity machine.
  • Inductive Learning: Can generate embeddings for new, previously unseen entities on the fly without retraining the entire model, which helps address the cold-start problem.
  • Stable & Deterministic: Unlike stochastic methods such as Node2vec, Cleora produces the same embeddings for the same input data across multiple runs, ensuring reproducibility and stability.
  • Hypergraph Support: Natively handles hypergraphs (e.g., products in a basket, users in a group), which is more powerful than simple pairwise graph decomposition.
  • Python Integration: Offers a seamless Python API (pycleora) with deep integration with NumPy for easy use in data science workflows.
  • Custom Initialization: Allows users to initialize embeddings with vectors from other sources (e.g., text, image models), enabling multi-modal analysis.
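One way to picture the inductive property is that a new entity can be placed in the existing embedding space by aggregating the vectors of the entities it co-occurs with. The sketch below assumes simple averaging followed by L2 normalization; the function embed_new_entity and the toy 2-D vectors are hypothetical and are not taken from Cleora itself.

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def embed_new_entity(neighbors, embeddings):
    """Average the vectors of known co-occurring entities, then L2-normalize."""
    known = [embeddings[n] for n in neighbors if n in embeddings]
    if not known:
        raise ValueError("no known neighbors to aggregate")
    dim = len(known[0])
    mean = [sum(v[k] for v in known) / len(known) for k in range(dim)]
    return l2_normalize(mean)


# Hypothetical, already-normalized 2-D item embeddings.
item_embeddings = {"milk": [1.0, 0.0], "bread": [0.0, 1.0], "eggs": [0.6, 0.8]}

# A brand-new user who interacted with milk, bread, and an item not seen before.
new_user = embed_new_entity(["milk", "bread", "unknown-item"], item_embeddings)
print(new_user)
```

No retraining happens here: the existing item vectors are left untouched, and unknown neighbors are simply skipped.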

Use Cases for Cleora

Cleora's versatility makes it suitable for a wide range of applications across various industries:

  • E-commerce: Creating powerful product embeddings for recommendation systems (e.g., 'customers who bought this also bought...'), product similarity, and basket analysis.
  • Social Network Analysis: Embedding users and content to identify communities, predict connections, and recommend content.
  • Bioinformatics: Analyzing interactions between proteins, drugs, and genes by embedding them based on co-occurrence in biological pathways.
  • Financial Services: Detecting fraudulent activity by identifying unusual patterns in transaction graphs.
  • Academic Research: Analyzing co-authorship networks to discover research communities and influential authors.
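Several of these use cases come down to nearest-neighbor lookup over the embeddings. As a minimal sketch of the "also bought" style of query, the following ranks entities by cosine similarity to a query entity. The product names, the 3-D vectors, and the most_similar helper are invented for illustration; a real system would typically use an approximate nearest-neighbor index instead of a full scan.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, embeddings, k=2):
    """Return the k entities closest to `query` by cosine similarity."""
    scores = [
        (other, cosine(embeddings[query], vec))
        for other, vec in embeddings.items()
        if other != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical product embeddings (not produced by Cleora).
product_embeddings = {
    "espresso": [0.9, 0.1, 0.0],
    "latte": [0.8, 0.3, 0.1],
    "teapot": [0.1, 0.9, 0.2],
    "kettle": [0.2, 0.8, 0.3],
}
for name, score in most_similar("espresso", product_embeddings):
    print(f"{name}: {score:.3f}")
```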

Advantages of Cleora

Cleora stands out from other embedding frameworks due to several key advantages:

  • Speed: It is significantly faster than many popular alternatives (for example, over 190x faster than DeepWalk in benchmarks).
  • Production-Ready: Its stability, inductivity, and real-time updatability make it ideal for deployment in live production environments.
  • High-Quality Embeddings: Propagating over the full transition matrix, rather than sampling individual random walks or negative examples, removes a source of sampling noise and is intended to yield higher-quality embeddings.
  • Resource Efficiency: It is designed to run efficiently on a single machine, reducing the need for expensive distributed computing clusters.
  • Simplicity and Flexibility: The model is conceptually simple yet powerful, offering flexibility in data input and embedding initialization.

Pricing and Plans

Cleora is a fully open-source project released under the MIT License. This means it is completely free to use for both academic and commercial purposes. There are no paid plans or hidden costs. The source code is publicly available on GitHub for anyone to use, inspect, or contribute to.


Cleora Alternatives

  • Streamlit (865.5K, Free): Streamlit is an open-source Python framework that enables developers and data scientists to build and share beautiful, custom …
  • Fast.ai (402.8K, Free): Fast.ai is a research institute dedicated to making deep learning accessible to everyone. It offers free courses, an …
  • Gradio (239.4K): Gradio is an open-source Python library that allows you to quickly build and share user-friendly web interfaces for …
  • marimo (173.8K, Free): marimo is an open-source reactive Python notebook for modern data science and AI. It offers a reproducible, Git-friendly, …
  • TensorFlow (737.9K): TensorFlow is an end-to-end open-source platform for machine learning developed by Google. It provides a comprehensive, flexible ecosystem …
  • Rerun (59.8K): Rerun is an open-source data stack for Physical AI, providing powerful logging and visualization tools for multimodal, time-series …
  • MOSTLY AI (59.6K, Free): MOSTLY AI is a Data Intelligence Platform that specializes in generating high-quality, privacy-safe synthetic data. It enables organizations …
  • Metaflow (20.3K, Free): A human-centric Python framework, originally from Netflix, for building and managing real-life data science, ML, and AI projects. …
  • Flower (71.1K): Flower is a friendly, open-source framework for federated learning, analytics, and evaluation. It enables training AI models on …
  • Eventual (8.6K): Eventual is building the future of data infrastructure with Daft, a high-performance, open-source query engine for multimodal data. …
