By Bobby Jefferson on Wednesday, 06 November 2024
Category: Tech News

The best open-source AI models: All your free-to-use options explained

Generative AI (Gen AI) has advanced significantly since its public launch two years ago. The technology has led to transformative applications that can create text, images, and other media with impressive accuracy and creativity. 

Open-source generative models are valuable for developers, researchers, and organizations wanting to leverage cutting-edge AI technology without incurring high licensing fees or restrictive commercial policies. Let's find out more.

Open-source vs. proprietary models

Open-source AI models offer several advantages, including customization, transparency, and community-driven innovation. Users can tailor these models to specific needs and benefit from ongoing enhancements. Additionally, the models typically come with licenses that permit both commercial and non-commercial use, which makes them more accessible and adaptable across applications.

However, open-source solutions are not always the best choice. In industries that demand strict regulatory compliance, data privacy, and specialized support, proprietary models often perform better. They provide stronger legal frameworks, dedicated customer support, and optimizations tailored to industry requirements. Closed-source solutions may also excel in highly specialized tasks, thanks to exclusive features designed for high performance and reliability.

When organizations require real-time updates, advanced security, or specialized functionalities, proprietary models can offer a more robust and secure solution, giving up some openness in exchange for stricter guarantees of quality and accountability.

The Open Source AI Definition

The Open Source Initiative (OSI) recently introduced the Open Source AI Definition (OSAID) to clarify what qualifies as genuinely open-source AI. To meet OSAID standards, a model must be fully transparent in its design and training data, enabling users to recreate, adapt, and use it freely. 

However, some popular models, including Meta's LLaMA and Stability AI's Stable Diffusion, have licensing restrictions or lack transparency around training data, preventing full compliance with OSAID.

As part of the OSAID validation process, OSI assessed the following:

Compliant models: Pythia (EleutherAI), OLMo (AI2), Amber and CrystalCoder (LLM360), and T5 (Google).

Potentially compliant models: Bloom (BigScience), Starcoder2 (BigCode), and Falcon (TII) could meet OSAID standards with minor adjustments to licensing terms or transparency.

Non-compliant models: LLaMA (Meta), Grok (X/Twitter), Phi (Microsoft), and Mixtral (Mistral) lack the necessary transparency or impose restrictive licensing terms.

LLaMA and other non-compliant architectures

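The Meta LLaMA architecture exemplifies noncompliance with OSAID. The original LLaMA release carried a research-only license, and later Llama versions ship under Meta's custom community license, which still imposes usage restrictions. In both cases, Meta has not fully disclosed its training data, which limits reproducibility. Models built on LLaMA weights, such as Vicuna and the Vicuna-based MiniGPT-4, inherit these restrictions, propagating LLaMA's noncompliance across additional projects. Mistral's Mixtral is not derived from LLaMA, but OSI found it non-compliant as well.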

Beyond LLaMA-based models, other widely used architectures face similar issues. For example, Stable Diffusion by Stability AI employs the Creative ML OpenRAIL-M license, which includes ethical restrictions that deviate from OSAID's requirements for unrestricted use. Similarly, Grok by xAI combines proprietary elements with usage limitations, challenging its alignment with open-source ideals.

These examples underscore the difficulty of meeting OSAID's standards, as many AI developers balance open access with commercial and ethical considerations.

Implications for organizations: OSAID compliance vs. non-compliance

Choosing OSAID-compliant models gives organizations transparency, legal security, and full customizability, all of which are essential for responsible and flexible AI use. These compliant models adhere to ethical practices and benefit from strong community support, promoting collaborative development.

In contrast, non-compliant models may limit adaptability and rely more heavily on proprietary resources. For organizations that prioritize flexibility and alignment with open-source values, OSAID-compliant models are advantageous. However, non-compliant models can still be valuable when proprietary features are required.

Understanding licensing in open-source AI models

Open-source AI models are released under licenses that define usage, modification, and sharing conditions. While some licenses align with traditional open-source standards, others incorporate restrictions or ethical guidelines that prevent full OSAID compliance. Key licenses include:

Apache 2.0: A permissive license that allows free use, modification, and distribution, along with a patent grant. Apache 2.0 is OSI-approved and popular for open-source projects, providing flexibility and legal protection.

MIT: Another permissive license that only requires attribution for reuse. Like Apache 2.0, MIT is OSI-approved, widely adopted, and offers simplicity and minimal restrictions.

Creative ML OpenRAIL-M: A license designed for AI applications, allowing broad use but imposing ethical guidelines to prevent harmful use. OpenRAIL-M is not OSI-approved because it includes usage restrictions that conflict with the OSI's principles of unrestricted freedom. However, it is valued by developers aiming to prioritize ethical use in AI.

CC BY-SA: The Creative Commons Attribution-ShareAlike license permits free use and requires derivative works to be shared under the same terms. While it encourages open collaboration, it is not OSI-approved and is more commonly used for content than for code, as it lacks some flexibility for software applications.

CC BY-NC 4.0: A Creative Commons license that permits free use with attribution but restricts commercial applications. This license, used for certain model weights (like Meta's MusicGen and AudioGen), limits the models' usability in commercial environments and does not align with OSI's open-source standards.

Custom licenses: Many models on our list, such as IBM's Granite and Nvidia's NeMo, operate under proprietary or custom licenses. These licenses often impose specific conditions for use or modify traditional open-source terms to align with commercial goals, making the models non-compliant with open-source principles.

Research-only licenses: Certain releases, such as the original Meta LLaMA, are available only under research-use terms. These licenses restrict use to academic or non-commercial purposes and prevent broad community-driven projects, so they do not meet OSI's open-source criteria. Later Llama and Codellama releases use Meta's custom community license instead.
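The license properties described above can be captured in a small lookup table. The following Python sketch is illustrative only: the License record, its field names, and the permissive_osi_licenses helper are assumptions introduced here, and the flags simply restate this section's descriptions. It is not legal advice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    """Hypothetical record summarizing one license from the list above."""
    name: str
    osi_approved: bool      # whether the license is OSI-approved
    commercial_use: bool    # whether commercial use is permitted
    use_restrictions: bool  # ethical or non-commercial usage restrictions


# Flags restate the descriptions in this section (an illustrative summary).
LICENSES = {
    "Apache 2.0": License("Apache 2.0", True, True, False),
    "MIT": License("MIT", True, True, False),
    "Creative ML OpenRAIL-M": License("Creative ML OpenRAIL-M", False, True, True),
    "CC BY-SA": License("CC BY-SA", False, True, False),
    "CC BY-NC 4.0": License("CC BY-NC 4.0", False, False, True),
}


def permissive_osi_licenses() -> list[str]:
    """Return names of OSI-approved licenses without usage restrictions."""
    return sorted(
        name
        for name, lic in LICENSES.items()
        if lic.osi_approved and not lic.use_restrictions
    )
```

A check like this can serve as a first-pass screen before a legal review, for example to flag any model whose license is not in the permissive set.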

Requirements for running open-source AI models

Running open-source Gen AI models requires specific hardware, software environments, and toolsets for model training, fine-tuning, and deployment tasks. High-performance models with billions of parameters benefit from powerful GPU setups like Nvidia's A100 or H100. 
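As a rough illustration of why parameter count drives hardware choice, the sketch below estimates the memory needed just to hold a model's weights. The function names, the bytes-per-parameter table, and the 80 GB default (the larger A100/H100 memory configuration) are assumptions for illustration. Real deployments also need memory for activations, caches, and, when training, optimizer state.

```python
import math

# Assumed storage cost per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(params_billions: float, dtype: str = "fp16") -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes).

    One billion parameters at one byte each is 1 GB, so the
    result is simply billions of parameters times bytes per parameter.
    """
    return params_billions * BYTES_PER_PARAM[dtype]


def min_gpus_for_weights(
    params_billions: float, dtype: str = "fp16", gpu_memory_gb: float = 80.0
) -> int:
    """Lower bound on GPU count needed to hold the weights."""
    return math.ceil(weight_memory_gb(params_billions, dtype) / gpu_memory_gb)
```

Under these assumptions, a 7B model in fp16 needs about 14 GB for weights and fits on one GPU, while a 176B model in fp16 needs about 352 GB and therefore at least five 80 GB GPUs.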

Essential environments typically include Python and machine learning libraries like PyTorch or TensorFlow. Specialized toolsets, including Hugging Face's Transformers library and Nvidia's NeMo, simplify the processes of fine-tuning and deployment. Docker helps maintain consistent environments across different systems, while Ollama allows for the local execution of large language models on compatible systems. 
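For local execution, a running Ollama server can be queried over HTTP. The sketch below assumes Ollama's default local endpoint (http://localhost:11434/api/generate) and that a model has already been pulled. The model name used in the usage note and the helper names are placeholders, not part of the original article. It uses only the Python standard library.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text.

    Requires a running server; raises urllib.error.URLError otherwise.
    """
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Calling generate("llama3.2", "Explain open-source licensing in one sentence.") would return the model's reply, provided the server is running and that model is available locally.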

The following chart highlights essential toolsets, recommended hardware, and their specific functions for managing open-source AI models:

Toolset | Purpose | Requirements | Use
Python | Primary programming environment | N/A | Essential for scripting and configuring models
PyTorch | Model training and inference | GPU (e.g., Nvidia A100, H100) | Widely used library for deep learning models
TensorFlow | Model training and inference | GPU (e.g., Nvidia A100, H100) | Alternative deep learning library
Hugging Face Transformers | Model deployment and fine-tuning | GPU (preferred) | Library for accessing, fine-tuning, and deploying models
Nvidia NeMo | Multimodal model support and deployment | Nvidia GPUs | Optimized for Nvidia hardware and multimodal tasks
Docker | Environment consistency and deployment | Supports GPUs | Containerizes models for easy deployment
Ollama | Running large language models locally | macOS, Linux, Windows, supports GPUs | Platform to run LLMs locally on compatible systems
LangChain | Building applications with LLMs | Python 3.7+ | Framework for composing and deploying LLM-powered applications
LlamaIndex | Connecting LLMs with external data sources | Python 3.7+ | Framework for integrating LLMs with data sources

This setup establishes a robust framework for efficiently managing Gen AI models, from experimentation to production-ready deployment. Each toolset has its own strengths, enabling developers to tailor their environments to specific project needs.

Choosing the right model

Selecting the right Gen AI model depends on several factors, including licensing requirements, desired performance, and specific functionality. While larger models tend to deliver higher accuracy and flexibility, they require substantial computational resources. Smaller models, on the other hand, are more suitable for resource-constrained applications and devices.
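These trade-offs can be expressed as a simple filter. The following sketch is a hypothetical helper, not a tool from the article: the ModelEntry record and the shortlist function are invented for illustration, and the handful of catalog rows are copied from the language model table in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical catalog row: model name, size, and license."""
    name: str
    params_billions: float
    license: str


# A few rows taken from the language model table in this article.
CATALOG = [
    ModelEntry("TII Falcon 7B", 7, "Apache 2.0"),
    ModelEntry("TII Falcon 40B", 40, "Apache 2.0"),
    ModelEntry("BigScience BLOOM", 176, "OpenRAIL-M"),
    ModelEntry("Mistral AI Mistral 7B", 7, "Apache 2.0"),
    ModelEntry("xAI Grok-1", 314, "Apache 2.0"),
    ModelEntry("IBM Granite 8B", 8, "Custom"),
]


def shortlist(catalog, allowed_licenses, max_params_billions):
    """Keep models with an acceptable license that fit the size budget.

    Results are ordered largest first, on the assumption that larger
    models tend to be more capable when resources allow.
    """
    matches = [
        m
        for m in catalog
        if m.license in allowed_licenses and m.params_billions <= max_params_billions
    ]
    return sorted(matches, key=lambda m: m.params_billions, reverse=True)
```

For example, shortlist(CATALOG, {"Apache 2.0"}, 40) keeps the two Falcon models and Mistral 7B, and excludes Grok-1 (too large) along with BLOOM and Granite (license not in the allowed set).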

It's important to note that most models listed here, even those with traditionally open-source licenses like Apache 2.0 or MIT, do not meet the Open Source AI Definition (OSAID). This gap is primarily due to restrictions around training data transparency and usage limitations, which OSAID emphasizes as essential for true open-source AI. However, certain models, such as Bloom and Falcon, show potential for compliance with minor adjustments to their licenses or transparency protocols and may achieve full compliance over time.

The tables below provide an organized overview of the leading open-source generative AI models, categorized by type, issuer, and functionality, to help you choose the best option for your needs, whether a fully transparent, community-driven model or a high-performance tool with specific features and licensing requirements.

Language models

Language models are crucial in text-based applications such as chatbots, content creation, translation, and summarization. They are fundamental to natural language processing (NLP) and continually improve their understanding of language structure and context. 

Notable models include Meta's LLaMA, EleutherAI's GPT-NeoX, and Nvidia's NVLM 1.0 family, known respectively for their strengths in multilingual, large-scale, and multimodal tasks.

Issuer & Model | Parameter Sizes | License | Highlights
Google T5 | Small to XXL | Apache 2.0 | High-performance language model, OSAID Compliant
EleutherAI Pythia | Various | Apache 2.0 | Interpretability-focused, OSAID Compliant
Allen Institute for AI (AI2) OLMo | Various | Apache 2.0 | Open language research model, OSAID Compliant
BigScience BLOOM | 176B | OpenRAIL-M | Multilingual, responsible AI, OSAID Potential
BigCode Starcoder2 | Various | Apache 2.0 | Code generation, OSAID Potential
TII Falcon | 7B, 40B | Apache 2.0 | Efficient and high-performance, OSAID Potential
AI21 Labs Jamba Series | Mini to Large | Custom | Language and chat generation
AI Singapore Sea-Lion | 7B | Custom | Language and cultural representation
Alibaba Qwen Series | 7B | Custom | Bilingual model (Chinese, English)
Databricks Dolly 2.0 | 12B | CC BY-SA 3.0 | Open dataset, commercial use
EleutherAI GPT-J | 6B | Apache 2.0 | General-purpose language model
EleutherAI GPT-NeoX | 20B | Apache 2.0 | Large-scale text generation
Google Gemma 2 | 2B, 9B, 27B | Custom | Language and code generation
IBM Granite Series | 3B, 8B | Custom | Summarization, classification, RAG
Meta LLaMA 3.2 | 1B to 405B | Custom (Llama Community License) | Advanced NLP, multilingual
Microsoft Phi-3 Series | Mini to Medium | MIT | Reasoning, cost-effective
Mistral AI Mixtral 8x22B | 8x22B | Apache 2.0 | Sparse model, efficient reasoning
Mistral AI Mistral 7B | 7B | Apache 2.0 | Dense, multilingual text generation
Nvidia NVLM 1.0 Family | 72B | Custom | High-performance multimodal LLM
Rakuten RakutenAI Series | 7B | Custom | Multilingual chat, NLP
xAI Grok-1 | 314B | Apache 2.0 | Large-scale language model

Image generation models

Image generation models create high-quality visuals or artwork from text prompts, which makes them invaluable for content creators, designers, and marketers. 

Stability AI's Stable Diffusion is widely adopted due to its flexibility and output quality, while DeepFloyd's IF emphasizes generating realistic visuals with an understanding of language.

Issuer & Model | Parameter Sizes | License | Highlights
Stability AI Stable Diffusion 3.5 | 2.5B to 8B | OpenRAIL-M | High-quality image synthesis
DeepFloyd IF | 400M to 4.3B | Custom | Realistic visuals with language comprehension
OpenAI DALL-E 3 | Not disclosed | Custom | State-of-the-art text-to-image synthesis
Google Imagen | Not disclosed | Custom | High-fidelity image generation from text
Midjourney | Not disclosed | Custom | Artistic and stylized image generation
Adobe Firefly | Not disclosed | Custom | Integrated AI image generation within Adobe products

Vision models

Vision models analyze images and videos, supporting object detection, segmentation, and visual generation from text prompts. 

These technologies benefit several industries, including healthcare, autonomous vehicles, and media.

Issuer & Model | Parameter Sizes | License | Highlights
Meta SAM 2.1 | 38.9M to 224.4M | Apache 2.0 | Video editing, segmentation
Nvidia Consistency | Not disclosed | Custom | Character consistency across video frames
Nvidia VISTA-3D | Not disclosed | Custom | Medical imaging, anatomical segmentation
Nvidia NV-DINOv2 | Not disclosed | Non-commercial | Image embedding generation
Google DeepLab | Not disclosed | Apache 2.0 | High-quality semantic image segmentation
Microsoft Florence | 0.23B, 0.77B | MIT | General-purpose visual model for computer vision
OpenAI CLIP | 400M | MIT | Text and image comprehension

Audio models

Audio models process and generate audio data, enabling speech recognition, text-to-speech synthesis, music composition, and audio enhancement.

Issuer & Model | Parameter Sizes | License | Highlights
Coqui.ai TTS | N/A | MPL 2.0 | Text-to-speech synthesis, multi-language support
ESPnet ESPnet | N/A | Apache 2.0 | End-to-end speech processing toolkit
Facebook AI wav2vec 2.0 | Base (95M), Large (317M) | Apache 2.0 | Self-supervised speech recognition
Hugging Face Transformers (Speech Models) | Various | Apache 2.0 | Collection of ASR and TTS models
Magenta MusicVAE | N/A | Apache 2.0 | Music generation and interpolation
Meta MusicGen | N/A | MIT / CC BY-NC 4.0 | Music generation from text prompts
Meta AudioGen | N/A | MIT / CC BY-NC 4.0 | Sound effect generation from text prompts
Meta EnCodec | N/A | MIT / CC BY-NC 4.0 | High-quality audio compression
Mozilla DeepSpeech | N/A | MPL 2.0 | End-to-end speech-to-text engine
Nvidia NeMo (Speech Models) | Various | Apache 2.0 | ASR and TTS models optimized for Nvidia GPUs
OpenAI Jukebox | N/A | MIT | Neural music generation with genre/artist conditioning
OpenAI Whisper | 39M to 1.6B | MIT | Multilingual speech recognition and transcription
TensorFlow TFLite Speech Models | N/A | Apache 2.0 | Speech recognition models optimized for mobile devices

Multimodal models

Multimodal models combine text, images, audio, and other data types to create content from various inputs. 

These models are effective in applications requiring language, visual, and sensory understanding.

Issuer & Model | Parameter Sizes | License | Highlights
Allen Institute for AI (AI2) Molmo | 1B, 70B | Apache 2.0 | A multimodal AI model that processes text and visual inputs, OSAID-compliant
Meta ImageBind | N/A | Custom | Integrates six data types: text, images, audio, depth, thermal, and IMU
Meta SeamlessM4T | N/A | Custom | Provides multilingual translation and transcription services
Meta Spirit LM | N/A | Custom | Combines text and speech to produce natural-sounding outputs
Microsoft Florence-2 | 0.23B, 0.77B | MIT | Handles computer vision and language tasks proficiently
Nvidia VILA | N/A | Custom | Processes vision-language tasks effectively
OpenAI CLIP | 400M | MIT | Excels in text and image comprehension
Vicuna Team MiniGPT-4 | 13B | Apache 2.0 | Capable of understanding both text and images

Retrieval-augmented generation (RAG)

RAG models merge generative AI with information retrieval, allowing them to incorporate relevant data from extensive datasets into their responses.

Issuer & Model | Parameter Sizes | License | Highlights
BAAI BGE-M3 | N/A | Custom | Dense and sparse retrieval optimization
IBM Granite 3.0 Series | 3B, 8B | Custom | Advanced retrieval, summarization, RAG
Nvidia EmbedQA & ReRankQA | 1B | Custom | Multilingual QA, GPU-accelerated retrieval
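
The retrieve-then-generate pattern behind RAG can be sketched in a few lines. This toy example is an illustration under stated assumptions, not how any model in the table works: it uses bag-of-words cosine similarity in place of a learned embedding model, and it stops at building the augmented prompt rather than calling a generator. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Prepend the retrieved context to the question for a generator."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

In a real system, the similarity step would typically use a dedicated embedding or reranking model, and the resulting prompt would be passed to a language model for the final answer.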

Specialized models

Specialized models are optimized for specific fields, such as programming, scientific research, and healthcare, offering enhanced functionality tailored to their domains.

Issuer & Model | Parameter Sizes | License | Highlights
Meta Codellama Series | 7B, 13B, 34B | Custom | Code generation, multilingual programming
Mistral AI Mamba-Codestral | 7B | Apache 2.0 | Focused on coding and multilingual capabilities
Mistral AI Mathstral | 7B | Apache 2.0 | Specialized in mathematical reasoning

Guardrail models

Guardrail models ensure safe and responsible outputs by detecting and mitigating biases, inappropriate content, and harmful responses.

Issuer & Model | Parameter Sizes | License | Highlights
Nvidia NeMo Guardrails | N/A | Apache 2.0 | Open-source toolkit for adding programmable guardrails
Google ShieldGemma | 2B, 9B, 27B | Custom | Safety classifier models built on Gemma 2
IBM Granite-Guardian | 8B | Custom | Detects unethical or harmful content

Choose open-source models

The landscape of generative AI is evolving rapidly, with open-source models crucial for making advanced technology accessible to all. These models allow for customization and collaboration, breaking down barriers that have limited AI development to large corporations.

By choosing open-source Gen AI, developers can tailor solutions to their needs, contribute to a global community, and accelerate technological progress. The variety of available models -- from language and vision to safety-focused designs -- ensures options for almost any application.

Supporting open-source AI communities will be essential for promoting ethical and innovative AI developments, benefiting individual projects, and advancing technology responsibly.
