EleutherAI releases massive AI training dataset of licensed and open domain text - TechCrunch
08.06.2025 10:14

EleutherAI, an AI research organization, has released what it claims is one of the largest collections of licensed and open-domain text for training AI models.
The dataset, called the Common Pile v0…