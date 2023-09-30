
Llama 2 Long outperforms other AI models in long queries

Llama 2 Long is an extension of Llama 2, an open-source AI model that Meta released in the summer.

Rizwan Choudhury | Sep 30, 2023 08:09 AM EST

Image: LLaMA logo (generic image). Source: LightRocket via Getty Images

While Meta Platforms unveiled several new AI-powered features for its popular apps like Facebook, Instagram, and WhatsApp at its annual Meta Connect event in California this week, the most impressive innovation from the social media giant may have gone unnoticed by many.

A team of Meta researchers quietly published a paper introducing Llama 2 Long, a new AI model that can generate coherent and relevant responses to long user queries, surpassing some of the best competitors in the field.

Llama 2 Long is an extension of Llama 2, an open-source AI model that Meta released in the summer, which is trained on a variety of data sources and can perform tasks such as coding, math, language understanding, common sense reasoning, and conversation.

However, Llama 2 Long has been trained on more data that contains longer texts and has been modified to handle longer sequences of information.
This allows it to outperform other models such as OpenAI's GPT-3.5 Turbo and Anthropic's Claude 2, which have limits on how much context they can use to generate responses.

How Llama 2 Long works

The Meta researchers started from different versions of Llama 2, ranging from 7 billion to 70 billion parameters, the values that the AI model adjusts as it learns from data. They continued training these models on another 400 billion tokens (units of text) of data that contained longer texts than the original Llama 2 dataset.

They also tweaked the architecture of Llama 2 slightly by changing the way it encodes the position of each token in the sequence. Llama 2 uses a technique called Rotary Positional Embedding (RoPE), which rotates pairs of values in each token's internal representation by an angle that grows with the token's position, so that the attention score between two tokens depends on how far apart they are. This lets the model track token order without storing a separate learned embedding for every position.

For Llama 2 Long, they reduced the rotation angle of the RoPE encoding, which lets the model attend more effectively to tokens that are far apart in a long sequence.

They also used reinforcement learning from human feedback (RLHF), a method in which the judgments of human evaluators are used to reward better answers, together with synthetic data generated by Llama 2 Chat itself, to improve performance on various tasks. The paper's abstract describes this instruction tuning procedure as cost-effective because it does not require human-annotated long instruction data.

The paper reports that the models support effective context windows of up to 32,768 tokens. The paper also shows examples of Llama 2 Long's responses to queries on topics such as history, science, literature, and sports.

The researchers say that Llama 2 Long is a step towards building more general and versatile AI models that can handle complex and diverse user needs.
They also acknowledge the potential ethical and social implications of such models and call for more research and dialogue on how to use them responsibly and beneficially.

Study abstract:

We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens. Our model series are built through continual pretraining from LLAMA 2 with longer training sequences and on a dataset where long texts are upsampled. We perform extensive evaluation on language modeling, synthetic context probing tasks, and a wide range of research benchmarks. On research benchmarks, our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over LLAMA 2. Notably, with a cost-effective instruction tuning procedure that does not require human-annotated long instruction data, the 70B variant can already surpass gpt-3.5-turbo-16k’s overall performance on a suite of long-context tasks. Alongside these results, we provide an in-depth analysis on the individual components of our method. We delve into LLAMA’s position encodings and discuss its limitation in modeling long dependencies. We also examine the impact of various design choices in the pretraining process, including the data mix and the training curriculum of sequence lengths – our ablation experiments suggest that having abundant long texts in the pretrain dataset is not the key to achieving strong performance, and we empirically verify that long context continual pretraining is more efficient and similarly effective compared to pretraining from scratch with long sequences.
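
For readers who want a concrete picture of the rotary position encoding discussed under "How Llama 2 Long works", here is a minimal Python sketch of the general RoPE idea. It is an illustration under stated assumptions, not Meta's implementation: the function names (rotation_angles, apply_rope, dot), the choice to rotate consecutive pairs of values, and the two base values compared in the usage note (10000 and 500000) are all illustrative and do not come from the article.

```python
import math


def rotation_angles(position, dim, base=10000.0):
    """Return one rotation angle (in radians) per pair of values.

    Pair i is rotated by position * base ** (-2 * i / dim), so a larger
    `base` (an assumed, tunable hyperparameter) gives smaller angles for
    every pair except the first.
    """
    return [position * base ** (-2.0 * i / dim) for i in range(dim // 2)]


def apply_rope(vec, position, base=10000.0):
    """Rotate consecutive pairs (vec[2i], vec[2i+1]) of an even-length vector.

    This is one common pairing convention; real implementations may pair
    the values differently.
    """
    rotated = []
    for i, theta in enumerate(rotation_angles(position, len(vec), base)):
        x, y = vec[2 * i], vec[2 * i + 1]
        # Standard 2D rotation of the pair (x, y) by angle theta.
        rotated.append(x * math.cos(theta) - y * math.sin(theta))
        rotated.append(x * math.sin(theta) + y * math.cos(theta))
    return rotated


def dot(a, b):
    """Plain dot product, standing in for an unnormalized attention score."""
    return sum(x * y for x, y in zip(a, b))
```

In this sketch, dot(apply_rope(q, m), apply_rope(k, n)) depends only on the offset m - n, which is why the scheme encodes relative position. Calling rotation_angles(1, 8, base=500000.0) returns smaller angles for every pair after the first than rotation_angles(1, 8, base=10000.0), which is one way to picture what "reducing the rotation angle" could mean.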