
CustomGPT.ai Blog

Predictions 2024: Open Source LLMs Show Potential to Overtake Closed LLMs


In the first of our 2024 AI Predictions Mini-Series, we speculate on how open-source LLMs, which innovated rapidly throughout 2023, could outperform closed LLMs both in capability and in their rate of adoption by developers and organizations.

What we’re expecting in 2024:

– Open-source Large Language Models (LLMs) like Llama 2 from Meta will surpass the capabilities of their closed counterparts.

– Open-source LLMs will gain popularity due to their accessibility, transparency, and the collaborative efforts of the global AI community.

– Developers and organizations will increasingly turn to open-source LLMs to harness the power of natural language processing and generation.

Open Source LLMs vs. Closed LLMs: What’s the Difference?

Two clear approaches to developing AI LLMs exist: open-source LLMs and closed LLMs. Open-source LLMs are publicly available models that can be used, modified, and improved by anyone. Closed LLMs are proprietary models; generally, the code, training methodology, and software are kept secret.


OpenAI’s GPT-4, Google’s Bard, and Anthropic’s Claude are closed LLMs. Meta’s Llama 2 is open source, as are the United Arab Emirates Technology Innovation Institute’s (TII) Falcon 180B, Abacus.AI’s Giraffe, and MosaicML’s MPT-7B, to name a few of the most discussed. It’s worth noting that “open source” can be a grey area: some LLMs are more open than others.

Open-source LLMs foster collaboration, innovation, and transparency. Developers learn from each other, share code, build on existing work, and solve problems together. Open-source projects allow developers to inspect and audit AI, which can help solve trust issues, increase accountability, and clarify ethical standards. 

Closed LLMs provide robust security and privacy for developers. They are less at risk from malicious actors, and potentially, training data is better protected. The level of accuracy, functionality, and quality expected from corporate developers can reduce bugs and inconsistencies and increase the reliability of closed models that follow strict guidelines. 

Open Source LLMs will Surpass the Capabilities of Closed LLM Competitors

When the first open-source LLMs launched, often as endeavors to democratize this keystone technology, they frequently performed poorly and drew extensive criticism. As AI’s breakout year has progressed, so have open-source models, to the point that they could surpass proprietary competitors. AI experts believe Llama 2, created by Meta and released in partnership with Microsoft, is a serious threat to closed LLMs like GPT-4, and it consistently outperforms its open-source competitors.

One study shared by Cobus Greyling found that Llama 2 beats GPT-3.5 Turbo on certain benchmarks and that open-source LLMs used as the basis for LLM agents can surpass GPT-3.5 Turbo after extensive, task-specific pre-training and fine-tuning. In addition, the open-source ToolLLaMA is better at tool usage, and Gorilla outperforms GPT-4 at writing API calls.

Another report, from the Prompt Engineering Institute, says Llama 2’s 40% larger training dataset has led to performance gains, and Meta’s ability to leverage more public data makes Llama 2 more capable. Llama 2-Chat beat ChatGPT and other models on helpfulness, and the largest Llama 2 model, Llama-2-70b, matched GPT-4 at 85% factual accuracy. The report concludes that Llama 2 “stacks up impressively against GPT-4.”

Accessibility, Transparency, and Collaboration will Increase the Popularity of Open-Source LLMs

The cost of developing and training LLMs makes creating proprietary models prohibitive for most companies. Open-source LLMs, which have already undergone training costing millions of dollars, are far more accessible for companies, startups, and even individual developers.

It’s estimated that GPT-3 cost OpenAI $4.6 million to train, and that scalable in-house enterprise AI development could cost upwards of $95,000 per year. Pricing for SMB projects is also often less than transparent from LLM providers.

Llama 2’s reported 85% accuracy comes roughly 30x cheaper than GPT-4, with a smaller model size and less complexity. The Prompt Engineering Institute found Llama 2 substantially less costly than GPT both per paragraph summarized and per 100,000 words processed. Llama 2 also allows AI workflows to be contained internally with zero data exposure, and it can be self-hosted and modified for greater control over data and privacy.
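To make the per-100,000-words comparison concrete, here is a back-of-envelope sketch. The per-token prices and the tokens-per-word ratio below are hypothetical placeholders chosen for illustration, not actual vendor quotes:

```python
# Back-of-envelope cost comparison between a hosted closed model and a
# self-hosted open model. All prices are HYPOTHETICAL, for illustration only.

def cost_per_100k_words(price_per_1k_tokens: float,
                        tokens_per_word: float = 1.3) -> float:
    """Estimate the cost of processing 100,000 words at a given per-token price."""
    total_tokens = 100_000 * tokens_per_word
    return total_tokens / 1_000 * price_per_1k_tokens

# Hypothetical numbers: a premium hosted API vs. amortized self-hosted GPU cost.
hosted_closed = cost_per_100k_words(price_per_1k_tokens=0.03)
self_hosted_open = cost_per_100k_words(price_per_1k_tokens=0.001)

print(f"hosted: ${hosted_closed:.2f}, self-hosted: ${self_hosted_open:.2f}")
print(f"ratio: {hosted_closed / self_hosted_open:.0f}x")
```

With these placeholder prices the ratio works out to about 30x, which is the shape of the savings the report describes; real numbers depend on model, hardware utilization, and traffic volume.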

Llama 2 and its open-source competitors have improved rapidly and will continue to do so, driven by the community development and knowledge sharing inherent in open-source technologies.

Developers and Organizations will Increasingly Turn to Open-Source LLMs

GitHub’s Octoverse report on the state of open source and AI says open source powers “nearly every piece of modern software” and that generative AI projects now rank among the top 10 open-source projects on the platform. GitHub COO Kyle Daigle writes:

“As more developers experiment with these new technologies, we expect them to drive AI innovation in software development and continue to bring the technology’s fast-evolving capabilities into the mainstream.”

Developers are using LLMs to develop APIs, bots, assistants, mobile applications, and plugins, propelling adoption and increasing the AI talent pool. 

The Octoverse report, citing statistics from the Linux Foundation, also reveals that 30% of Fortune 100 companies have Open Source Program Offices (OSPOs). Organizations are increasingly embracing open-source contributions to power their operations, and the trend is likely to be especially profound in AI and LLMs, given the cost of development and the benefit of contributions from multiple organizations, experts, and communities in improving AI and overcoming its flaws.

A recent MIT Sloan article penned by Aron Culotta and Nicholas Mattei suggests open-source LLMs are a solution to building generative AI solutions locally rather than risking sensitive data with a third party. 

Open-source LLMs could also grow in popularity given the apparent internal divides at companies like OpenAI and the less obvious reasons behind those conflicts. Closed LLMs also iterate rapidly, leaving enterprises, which already face a significant learning curve in deploying AI with customers and employees, little time to understand or adapt to changes.

Lastly, open-source LLMs give developers and organizations significant freedom to evolve them in the direction best suited to their intended purpose, budget, and infrastructure. They can be customized for lower-cost operation, for deploying multiple AI applications, and for user benefits like extended context windows.

To Conclude 

The drawbacks of open-source LLMs include greater risk of misuse, intellectual-property challenges, and a potential lack of quality control, which can create inconsistencies or even security risks. We haven’t delved into them in depth for this prediction, but these and other risks are notable, and when it comes to AI, addressing and mitigating risk is vital.

What is clear is that, despite the risks, open-source LLMs have the potential to overtake the performance and popularity of closed LLMs. 

Open-source collaboration is driving rapid innovation, and coupled with transparency right down to the bare code of these LLMs, as well as potentially lower development and deployment costs, it makes them an attractive proposition.

Frequently Asked Questions

Are open-source LLMs better than closed LLMs in 2024?

Not generally. In 2024, open and source-available LLMs became competitive for many workloads, but closed models were usually the safer production choice for reliability, safety guardrails, and lower ops burden.

LMSYS Chatbot Arena and Stanford HELM showed the tradeoff: some open models approached leaders on public evaluations, but benchmark closeness did not erase gaps many teams still saw in long-context consistency, policy controls, and managed uptime with providers like OpenAI and Anthropic. A practical 2024 rule was simple: choose open models if you need on-prem deployment, stricter data control, data residency, or custom fine-tuning on internal workflows; choose closed models if you need the most reliable long-task behavior and mature safety systems. Also, many “open” models were source-available, not OSI-approved; Meta’s Llama 3 license required separate permission above 700 million monthly active users. For buyers, including CustomGPT.ai users, support for self-hosting and models beyond OpenAI often mattered more than leaderboard wins.

What are the biggest disadvantages of open-source LLMs?

The biggest disadvantages of open-source LLMs are higher security and privacy burden, weaker enterprise assurances, and less predictable reliability. Lower cost and flexibility often mean more internal work on red-teaming, access control, data governance, and evals.

Closed-model vendors such as OpenAI and Anthropic often provide stronger centralized safeguards and audited data-handling commitments. By contrast, transparency varies widely across “open” models: some release weights but not training code or datasets, and Stanford’s 2024 Foundation Model Transparency Index found major gaps in data-provenance disclosure. Some popular options, including Llama, also use licenses that are not OSI approved, which can complicate legal review. For teams handling sensitive data or regulated workflows, open-source models are often a poor fit unless you can build and maintain your own guardrails, privacy controls, and reliability testing.

Is GPT-4 an open-source LLM?

No. GPT-4 is proprietary, not an open-source LLM. OpenAI has not released the model weights, full training data, or enough architecture detail for independent reproduction.

OpenAI’s GPT-4 Technical Report confirms it is a closed model, and the Open Source Initiative’s Open Source AI Definition 1.0 makes clear that API access alone does not make a model open source. In practice, that means you cannot self-host GPT-4, inspect its weights, or fully audit how it was trained. If your requirements include private deployment, data residency, or lower vendor lock-in, open-weight models such as Meta’s Llama 3 or Mistral are usually a better fit. Anthropic Claude and Google Gemini are also proprietary. When comparing platforms such as CustomGPT.ai, check which models are supported, whether private hosting is available, and how portable your setup is if you switch providers.

Why were organizations expected to adopt open-source LLMs faster in 2024?

Organizations were expected to adopt open-source LLMs faster in 2024 because security and procurement teams wanted deployment control, not just the top benchmark model. They needed options beyond ChatGPT/OpenAI or Anthropic, private cloud or on-prem deployment, and a self-hosted fallback.

Open-source models made those requirements practical: teams could pilot a 7B model on a single 24 GB GPU, keep sensitive data inside their environment, and preserve a path to production if pricing, compliance, or vendor policy changed. Meta’s Llama 3 8B release in 2024 and Mistral 7B’s Apache 2.0 license showed that open-weight models were becoming viable for real workloads, not just experiments. For institutions with high-governance content, that mattered; Lehigh University uses AI search across 400 million plus words of newspaper archives, the kind of corpus where data location and model choice affect platform selection. Buyers evaluating CustomGPT.ai increasingly asked those exact questions.

Can open-source LLMs be used in production?

Yes. Open-source LLMs can be used in production if they meet your targets for accuracy, p95 latency, security, and total cost. The deciding factor is deployment fit, not whether the model is open or closed.

If you need stricter data control, teams often choose open-source models for VPC or on-prem deployment, but you still need to validate privacy, logging, and model-update policies before launch. A practical rule is that an open-source model is production-ready only after it passes your domain evals, meets latency targets at expected traffic, and clears legal review on its license and acceptable-use terms. For example, Meta’s Llama 3 needs a separate commercial license above 700 million monthly active users, while Mistral 7B is under Apache 2.0. In practice, success depends as much on retrieval quality, fallback logic, and human handoff as on the base model. At GEMA, a CustomGPT.ai assistant handles 248,000+ inquiries with an 88% success rate.
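The launch-gate rule above can be sketched as a simple check. The thresholds, field names, and passing criteria here are illustrative assumptions, not a standard; real gates would also cover retrieval quality, fallback logic, and human handoff:

```python
# A minimal launch-gate sketch for the criteria above: domain evals,
# latency at expected traffic, and legal review. Thresholds are hypothetical.

from dataclasses import dataclass

@dataclass
class LaunchChecks:
    eval_pass_rate: float   # share of domain eval cases passed (0..1)
    p95_latency_ms: float   # p95 latency measured at expected traffic
    license_cleared: bool   # legal sign-off on license / acceptable-use terms

def production_ready(c: LaunchChecks,
                     min_eval_pass: float = 0.90,
                     max_p95_ms: float = 2000.0) -> bool:
    """Gate a model rollout: every criterion must pass, none trades off."""
    return (c.eval_pass_rate >= min_eval_pass
            and c.p95_latency_ms <= max_p95_ms
            and c.license_cleared)

print(production_ready(LaunchChecks(0.93, 1450.0, True)))   # True
print(production_ready(LaunchChecks(0.93, 1450.0, False)))  # False: no legal sign-off
```

The design point is that the gate is conjunctive: a model that aces evals but fails legal review on its license (the Llama 3 700M-MAU clause, for instance) is still not production-ready.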

How do open-source and closed LLMs compare on privacy and transparency?

Open models usually win on transparency, because you can inspect weights, run independent evaluations, and control deployment. Privacy is not determined by open versus closed; it depends on where data is processed, what is logged and retained, who can access it, and whether customer data is used for training.

If your team is asking whether CustomGPT.ai is limited to OpenAI or can support open-source models, the buying question is whether deployment, logging controls, retention, and model choice fit your security requirements. NIST AI RMF 1.0 and the OWASP Top 10 for LLM Applications both put data governance, logging, and access control ahead of model label alone. If sensitive files cannot pass through a vendor-hosted web app, require self-hosted or VPC deployment and confirm prompts, uploads, and logs are excluded from training, with defined retention and admin access. If managed controls are acceptable, compare OpenAI and Anthropic on audit logs, SSO, regional hosting, and data-use policy.
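The data-governance questions above can be captured as a simple vendor checklist. The control names below are illustrative placeholders; a real review would map each one to specific NIST AI RMF or OWASP LLM Top 10 items:

```python
# A sketch of the vendor-evaluation questions above as a checklist.
# Control names are HYPOTHETICAL labels, not a formal control catalog.

REQUIRED_CONTROLS = [
    "self_hosted_or_vpc_available",
    "prompts_excluded_from_training",
    "uploads_excluded_from_training",
    "log_retention_defined",
    "admin_access_defined",
]

def failing_controls(vendor: dict) -> list:
    """Return the required controls a vendor does not satisfy."""
    return [c for c in REQUIRED_CONTROLS if not vendor.get(c, False)]

vendor = {
    "self_hosted_or_vpc_available": True,
    "prompts_excluded_from_training": True,
    "uploads_excluded_from_training": False,  # fails: uploads used for training
    "log_retention_defined": True,
    "admin_access_defined": True,
}
print(failing_controls(vendor))  # ['uploads_excluded_from_training']
```

Note that a missing answer counts as a failure here (`vendor.get(c, False)`), which mirrors the buying posture in the text: if a vendor won't confirm a control, treat it as absent.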
