Latam-GPT: The Open-Source, Collaborative, and Free AI for Latin America

Latam-GPT is a large language model being built specifically for Latin America. Led by Chile's nonprofit National Center for Artificial Intelligence (CENIA), the initiative aims to strengthen the region's technological independence by developing an open-source AI model trained on the languages and contexts of Latin America.

“This project cannot be accomplished by a single group or country in Latin America: It is a challenge that demands collective involvement,” states Álvaro Soto, director of CENIA, in an interview with WIRED en Español. “Latam-GPT is designed to foster an open, free, and, most importantly, collaborative AI model. For the past two years, we have been working through a grassroots approach, involving citizens from various countries who are eager to contribute. Recently, we have also witnessed greater engagement from governments, which have begun to take an interest and participate in the project.”

The project is distinguished by its collaborative ethos. “We are not aiming to compete with OpenAI, DeepSeek, or Google. Our goal is to create a model tailored for Latin America and the Caribbean, attuned to the cultural needs and challenges it entails, such as grasping diverse dialects, the region’s historical context, and unique cultural nuances,” elaborates Soto.

Through 33 strategic partnerships with institutions across Latin America and the Caribbean, the project has assembled a data corpus of more than eight terabytes, equivalent to millions of books. That corpus has been used to develop a language model with 50 billion parameters, a scale described as comparable to GPT-3.5, giving it a medium-to-high capability for complex tasks such as reasoning, translation, and association.

Latam-GPT is being trained on a regional dataset that compiles materials from 20 Latin American countries and Spain, for a total of 2,645,500 documents. The data is heavily concentrated in the largest countries: Brazil leads with 685,000 documents, followed by Mexico with 385,000, Spain with 325,000, Colombia with 220,000, and Argentina with 210,000. These figures reflect the size of those markets, their level of digital development, and the availability of structured content.
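To put that concentration in numbers, the short Python sketch below works out each country's share of the corpus from the figures quoted above. It is only an illustration: the variable and function names are hypothetical, and the counts for the remaining countries are not itemized in the reporting, so they appear only as a combined remainder.

```python
# Document counts as quoted in the article. The names used here are
# illustrative assumptions, not part of any Latam-GPT tooling.
TOTAL_DOCUMENTS = 2_645_500
TOP_FIVE = {
    "Brazil": 685_000,
    "Mexico": 385_000,
    "Spain": 325_000,
    "Colombia": 220_000,
    "Argentina": 210_000,
}


def share(count, total=TOTAL_DOCUMENTS):
    """Return count as a percentage of total."""
    return 100 * count / total


# Combined count for the five largest contributors, and what is left
# for all other countries in the dataset.
top_five_total = sum(TOP_FIVE.values())
remaining = TOTAL_DOCUMENTS - top_five_total

for country, count in TOP_FIVE.items():
    print(f"{country:<10} {count:>9,}  {share(count):5.1f}%")
print(f"{'Top five':<10} {top_five_total:>9,}  {share(top_five_total):5.1f}%")
print(f"{'Others':<10} {remaining:>9,}  {share(remaining):5.1f}%")
```

By this calculation, the five largest contributors account for roughly 69 percent of the documents, with Brazil alone supplying about 26 percent.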

“In its initial phase, we will launch a language model. We anticipate its performance in general tasks will be comparable to large commercial models, but it will excel in areas pertinent to Latin America. The objective is that, when querying topics relevant to our region, its knowledge will be significantly deeper,” Soto states.

The first model will serve as a foundation for a suite of more advanced technologies, including image and video processing and larger-scale models. “Given that this is an open project, we want other institutions to leverage it. A group in Colombia might adapt it for the educational system, while one in Brazil could modify it for the healthcare sector. The intention is to pave the way for various organizations to create specialized models aimed at particular fields such as agriculture, culture, and others,” says the CENIA director.
