Feature Story
Llama 2 is about as factually accurate as GPT-4 for summaries and is 30X cheaper | Anyscale
Aug 29, 2023 · anyscale.com
Key takeaways
- The study found that Llama-2-70b, an open-source language model, is almost as accurate as GPT-4 at producing factually consistent summaries and significantly more accurate than GPT-3.5-turbo.
- Two practical issues came up during the experiment: models not following instructions, and ordering bias. Larger models followed instructions more reliably, and ordering bias was tested by swapping the order of the answer options.
- Although Llama 2's tokenizer produces 19% more tokens than ChatGPT's for the same text, Llama-2-70b was still found to be 30 times cheaper than GPT-4 for an equivalent level of factuality in summarization.
- The study recommends Llama-2-70b or GPT-4 to improve the chances of getting a factual summary, and advises against the smaller Llama 2 models or GPT-3.5-turbo.
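Ordering bias means a model's answer depends on which option is listed first rather than on the content of the options. The swap test mentioned in the takeaways can be sketched in Python as follows. This is a minimal sketch under our own assumptions: the task is framed as picking which of two summaries is factually consistent with an article, `judge` is a hypothetical stand-in for a call to whichever model is being evaluated, and counting an item as correct only when the model is right in both orders is our choice, not necessarily the study's scoring rule.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical judge interface: takes (article, option_a, option_b) and returns
# "A" or "B", naming the option it considers factually consistent with the article.
Judge = Callable[[str, str, str], str]


def order_robust_accuracy(
    judge: Judge,
    items: Sequence[Tuple[str, str, str]],
) -> Tuple[float, float]:
    """Score a judge on (article, correct_summary, wrong_summary) triples.

    Each item is asked twice: once with the correct summary as option A and
    once with it as option B. Returns (strict_accuracy, inconsistency_rate):
    - strict_accuracy: share of items answered correctly in BOTH orders
      (an assumed scoring rule, not necessarily the study's);
    - inconsistency_rate: share of items where the judge is right in one order
      but wrong in the other, i.e. it picked the same position both times,
      which is the signature of ordering bias.
    """
    if not items:
        raise ValueError("need at least one item")
    strict_correct = 0
    inconsistent = 0
    for article, correct, wrong in items:
        first = judge(article, correct, wrong)   # correct summary shown as A
        second = judge(article, wrong, correct)  # correct summary shown as B
        right_first = first == "A"
        right_second = second == "B"
        if right_first and right_second:
            strict_correct += 1
        if right_first != right_second:
            inconsistent += 1
    n = len(items)
    return strict_correct / n, inconsistent / n


if __name__ == "__main__":
    items = [("article text", "good summary", "bad summary")]
    always_a = lambda article, a, b: "A"  # toy judge that ignores content
    print(order_robust_accuracy(always_a, items))  # (0.0, 1.0)
```

A judge that always answers "A" gets 0.0 strict accuracy and 1.0 inconsistency on any item set, which is exactly the position-driven behavior the swap is meant to expose; a judge that reads the content gives the same summary as its answer in both orders.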
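The cost takeaway combines two factors: the per-token price gap between the models and Llama 2's 19% token overhead. A minimal Python sketch of that arithmetic follows. The function names are hypothetical, no real prices are assumed, and treating GPT-4's token counts as equal to ChatGPT's is our assumption. Under that assumption, ending up 30 times cheaper overall while using 19% more tokens requires a per-token price roughly 30 × 1.19 ≈ 35.7 times lower.

```python
def effective_cost_ratio(expensive_price: float, cheap_price: float,
                         token_overhead: float) -> float:
    """How many times cheaper the 'cheap' model is for the same text.

    expensive_price, cheap_price: price per token, in the same currency and unit.
    token_overhead: the cheap model's token count divided by the expensive
    model's for identical text (e.g. 1.19 for "19% more tokens").
    """
    if min(expensive_price, cheap_price, token_overhead) <= 0:
        raise ValueError("prices and overhead must be positive")
    # Cost of the same text = price per token * number of tokens.
    return expensive_price / (cheap_price * token_overhead)


def required_price_gap(target_ratio: float, token_overhead: float) -> float:
    """Per-token price ratio needed to be target_ratio times cheaper overall."""
    if target_ratio <= 0 or token_overhead <= 0:
        raise ValueError("ratio and overhead must be positive")
    return target_ratio * token_overhead


if __name__ == "__main__":
    gap = required_price_gap(30, 1.19)
    print(f"per-token price gap needed: {gap:.1f}x")  # per-token price gap needed: 35.7x
    # Sanity check: plug the gap back in with a normalized cheap price of 1.0.
    print(f"check: {effective_cost_ratio(gap, 1.0, 1.19):.1f}x cheaper")  # check: 30.0x cheaper
```

Keeping the token overhead as an explicit parameter makes it easy to see how much of a per-token price advantage a more verbose tokenizer gives back.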