What's in the RedPajama-Data-1T LLM training set

RedPajama is “a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens”. It’s a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, …

Meet Skill-it: A Data-Driven Skills Framework for Understanding

Data analysis with SQLite and Python for PyCon 2023

Catching up on the weird world of LLMs

Easily Train a Specialized LLM: PEFT, LoRA, QLoRA, LLaMA-Adapter

Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens

How we built better GenAI with programmatic data development

Artificial Intelligence – Page 3 – Data Machina Newsletter – a

RedPajama-Data-v2: An open dataset with 30 trillion tokens for

LLaMA clone: RedPajama – first open-source decentralized AI with

togethercomputer/RedPajama-Data-V2 · Datasets at Hugging Face