Trismik
Compare 50+ LLMs on your own data in minutes. Make evidence-based model decisions on quality, cost, and speed without guesswork.
Hot100
Hot100 is a dynamic weekly chart showcasing the most innovative and useful AI-built projects. It provides a merit-based leaderboard, evaluated by an AI judge named Flambo, focusing on genuine utility and groundbreaking ideas rather than marketing hype. Discover new trends, submit your creations, and engage with the vibrant AI builder community.
AIGRADE
AIGRADE offers independent evaluation, scoring, and certification for AI systems, focusing on reliability, transparency, and trust. Aligned with ISO/IEC 23894, it provides a third-party, SOC2-friendly audit process to help businesses build trustworthy and compliant AI.
Scorecard
Scorecard is an end-to-end platform for evaluating, optimizing, and deploying enterprise AI agents. It helps teams replace subjective testing with structured evaluations, providing tools for continuous monitoring, prompt management, and performance metrics to build trustworthy and reliable AI applications with confidence.
Unify
Unify is a developer-centric LLMOps platform designed to simplify building, monitoring, and optimizing AI applications. It provides a universal API and a hackable framework for logging, evaluation, tracing, and managing AI agents, enabling developers to create custom workflows and interfaces with ease.
LastMile AI
LastMile AI is an enterprise-grade developer platform for testing, evaluating, and monitoring generative AI applications. It provides tools like AutoEval for custom evaluator fine-tuning, synthetic data generation, and real-time monitoring to ensure AI systems are reliable and production-ready.
Openlayer
Openlayer is an enterprise-grade platform for AI evaluation and observability. It empowers teams to test, monitor, and govern both traditional machine learning models and large language models (LLMs) throughout their entire lifecycle, from development to production, ensuring reliability and compliance.
Rival
Rival is a unique AI model comparison platform that focuses on "vibe" rather than just benchmarks. It allows users to intuitively compare leading models like GPT, Gemini, and Claude through side-by-side duels, response galleries, and historical evolution tracking. Discover the distinct personalities, creative styles, and reasoning approaches of different AIs to find the perfect model for your specific task, moving beyond quantitative scores to a qualitative, hands-on experience.
Vellum AI
Vellum AI is an end-to-end enterprise platform for building, evaluating, and deploying mission-critical AI agents and applications. It provides a unified environment for orchestration, prompt engineering, RAG, evaluation, and monitoring, enabling teams to build reliable AI solutions 10x faster.
Coxwave Align
Coxwave Align is a powerful analytics engine designed for generative AI products. It enables businesses to monitor, analyze, and evaluate LLM-based conversational applications like chatbots. The platform provides actionable insights to improve performance, reduce hallucinations, and enhance overall user experience and product quality.
FutureAGI
FutureAGI is a comprehensive LLM observability and evaluation platform designed for enterprises and developers. It helps build, evaluate, and improve AI applications to achieve up to 99% accuracy, offering tools for synthetic data generation, no-code experimentation, multimodal evaluation, and real-time production monitoring.
Humanloop
Humanloop is an enterprise-grade LLM evaluation and observability platform. It provides a comprehensive suite of tools for developing, evaluating, and monitoring AI applications, enabling teams to ship and scale reliable AI products with confidence. It fosters collaboration between engineers, product managers, and domain experts through both code-first and UI-first workflows.
LMArena
LMArena is an open, crowdsourced platform from UC Berkeley researchers for evaluating and comparing leading AI models. Users test two anonymized models side by side, vote for the better response, and contribute to a dynamic, public leaderboard. It aims to make AI progress transparent and grounded in real-world human feedback.
Arize
Arize is an AI & Agent Engineering Platform designed for development, observability, and evaluation. It provides a unified solution for teams to build, monitor, debug, and improve LLM and ML models faster. By closing the loop between development and production, Arize helps ensure AI systems are reliable, trustworthy, and high-performing at scale.