Latam-GPT: The Free, Open Source, and Collaborative AI of Latin America
Latam-GPT is a new large language model being developed in and for Latin America. The project, led by the nonprofit Chilean National Center for Artificial Intelligence (CENIA), aims to help the region achieve technological independence by developing an open source AI model trained on Latin American languages and contexts.
“This work cannot be undertaken by just one group or one country in Latin America: It is a challenge that requires everyone’s participation,” says Álvaro Soto, director of CENIA, in an interview with WIRED en Español. “Latam-GPT is a project that seeks to create an open, free, and, above all, collaborative AI model. We’ve been working for two years with a very bottom-up process, bringing together citizens from different countries who want to collaborate. Recently, it has also seen some more top-down initiatives, with governments taking an interest and beginning to participate in the project.”
The project stands out for its collaborative spirit. “We’re not looking to compete with OpenAI, DeepSeek, or Google. We want a model specific to Latin America and the Caribbean, aware of the cultural requirements and challenges that this entails, such as understanding different dialects, the region’s history, and unique cultural aspects,” explains Soto.
Thanks to 33 strategic partnerships with institutions in Latin America and the Caribbean, the project has gathered a corpus of data exceeding eight terabytes of text, the equivalent of millions of books. This information base has enabled the development of a language model with 50 billion parameters, a scale that makes it comparable to GPT-3.5 and gives it a medium to high capacity to perform complex tasks such as reasoning, translation, and associations.
Latam-GPT is being trained on a regional database that compiles information from 20 Latin American countries and Spain, totaling 2,645,500 documents. The distribution of data shows a significant concentration in the largest countries in the region: Brazil leads with 685,000 documents, followed by Mexico with 385,000, Spain with 325,000, Colombia with 220,000, and Argentina with 210,000. The numbers reflect the size of these markets, their digital development, and the availability of structured content.
“Initially, we’ll launch a language model. We expect its performance in general tasks to be close to that of large commercial models, but with superior performance in topics specific to Latin America. The idea is that, if we ask it about topics relevant to our region, its knowledge will be much deeper,” Soto explains.
The first model is the starting point for developing a family of more advanced technologies in the future, including ones with image and video, and for scaling up to larger models. “As this is an open project, we want other institutions to be able to use it. A group in Colombia could adapt it for the school education system or one in Brazil could adapt it for the health sector. The idea is to open the door for different organizations to generate specific models for particular areas like agriculture, culture, and others,” explains the CENIA director.
The supercomputing infrastructure at the University of Tarapacá (UTA) in Arica, Chile, is a fundamental pillar for Latam-GPT. With a projected investment of $10 million, the new center has a cluster of 12 nodes, each equipped with eight state-of-the-art NVIDIA H200 GPUs. This capacity, unprecedented in Chile and the region more broadly, not only enables large-scale model training in the country for the first time but also encourages decentralization and energy efficiency.
The first version of Latam-GPT will be launched this year. The model will be refined and expanded as new strategic partners join the effort and more robust data sets are integrated into it.
The interview was edited for length and clarity.
WIRED: Tech giants such as Google, OpenAI, and Anthropic have invested billions in their models. What is the technical and strategic argument for the development of a separate model specifically for Latin America?
Álvaro Soto: Regardless of how powerful these other models may be, they are incapable of encompassing everything relevant to our reality. I feel that today they are too focused on the needs of other parts of the world. Imagine if we wanted to use them to modernize the education system in Latin America. If you ask one of these models for an example, it would probably tell you about George Washington.
We should be concerned about our own needs; we cannot wait for others to find the time to ask us what we need. Given that these are new and very disruptive technologies, there is room and a need for us, in our region, to take advantage of their benefits and understand their risks. Having this experience is essential to guiding the use of technology forward along the best path.
This also opens up possibilities for our researchers. Today, Latin American academics have few opportunities to interact in depth with these models. It is as if we wanted to study magnetic resonance imaging but didn’t have a resonator. Latam-GPT seeks to be that fundamental tool so that the scientific community can experiment and advance.
The key input is data. What is the status of the Latam-GPT corpus, and how are you addressing the challenge of including not only variants of Spanish and Portuguese, but also indigenous languages?
We have put a lot of emphasis on generating high-quality data. It’s not just about volume, but also composition. We analyze regional diversity to ensure that the data does not come disproportionately from just one country, but that there is a balanced representation. If we notice that Nicaragua is underrepresented in the data, for example, we’ll actively seek out collaborators there.
We also analyze the diversity of topics—politics, sports, art, and other areas—to have a balanced corpus. And, of course, there is cultural diversity. In this first version, we have focused on having cultural information about our ancestral peoples, such as the Aztecs and the Incas, rather than on the language itself. In the future, the idea is to also incorporate indigenous languages. At CENIA, we are already working on translators for Mapuche and Rapanui, and other groups in the region are doing the same with Guaraní. It is a clear example of something that we have to do ourselves, because no one else will.
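As a rough illustration of the kind of balance check Soto describes, and not CENIA's actual tooling, the sketch below computes each country's share of the corpus from the document counts cited earlier in this article and flags any country that falls below an arbitrary threshold. The threshold and the idea of filling in the remaining countries are assumptions made for the example.

```python
# Illustrative sketch only: CENIA's real curation pipeline is not public.
# Document counts are the figures cited in the article; everything else
# (the remaining countries, the 5% threshold) is hypothetical.

doc_counts = {
    "Brazil": 685_000,
    "Mexico": 385_000,
    "Spain": 325_000,
    "Colombia": 220_000,
    "Argentina": 210_000,
    # ... remaining countries account for the rest of the 2,645,500 documents
}

TOTAL_DOCS = 2_645_500   # total reported for the regional database
MIN_SHARE = 0.05         # hypothetical "underrepresented" threshold

for country, count in sorted(doc_counts.items(), key=lambda kv: -kv[1]):
    share = count / TOTAL_DOCS
    flag = "  <- seek more collaborators" if share < MIN_SHARE else ""
    print(f"{country:<12} {count:>9,} docs  {share:6.1%}{flag}")
```

A real pipeline would of course weigh topic and cultural diversity alongside country of origin, as Soto notes, but the basic idea of measuring shares against the whole corpus is the same.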
Could you tell us more about CENIA and how this initiative was established in Chile?
Between 2017 and 2018, a group of experts, which included me as a member, developed Chile’s National Artificial Intelligence Policy. One of the conclusions of the group was that there was a need to create an institution that would oversee the development of a synergistic and healthy AI ecosystem that encompassed science, technology transfer to industry, and social responsibility. CENIA was created to be that institution.
Although it started in Chile, we have a regional vision and we believe that together we are stronger. We have promoted initiatives such as the Latin American Artificial Intelligence Index, a collaborative study that measures the progress of AI in countries across the region.
Your specialty is cognitive robotics. How does a regional language model relate to an autonomous agent’s ability to interact in a Latin American context?
In cognitive robotics, the cognitive part is intelligence. My career has focused on developing intelligence for physical machines. Today, language models and foundational models are at the forefront of AI. They are the most powerful tools we have, so my work is dedicated to understanding and contributing to the scientific and applied development of this type of technology.
AI models face issues around geopolitics and power that have been widely covered in the media. What are the specific challenges Latin America faces with these models?
We face many challenges, but we also have many strengths, such as our openness and our capacity for collaboration, which we have seen in the Latam-GPT project. That said, one of the key areas we need to focus on is education. These technologies are going to change the skills required of younger generations. Rote learning will be less critical; what will be important is knowing how to use the knowledge of AI. We must prepare our young people for this, while also promoting the social sciences and critical thinking. If I had to choose where to apply these technologies, it would be in education, because it addresses the root cause of many of our problems.
A project like this requires massive computing power. Is it realistic to think that our region can develop the necessary infrastructure? What implications does this have for the technological sovereignty of Latin America?
It’s essential. If you want to play football, you need a field and a ball. Here, computing power is the field. We need to develop it, whether in the cloud or in our own data centers. It’s a necessary infrastructure for this new technological era, just as telecommunications infrastructure was for the internet.
Looking ahead to 2030, what would be a successful scenario for a model like Latam-GPT? Will we be technology developers and not simply consumers?
Success would mean that Latam-GPT has played an important role in the development of artificial intelligence in this region. That different organizations can take this technology and apply it, for example, to education. That new generations of Latin Americans are better prepared because they had access to tools that spoke to them in their context, with their cultural references, with figures from our history, and not just using examples from other parts of the world. If we manage to give this technology a Latin American stamp and contribute to its development, the project will have been a great success.
This interview was first published by WIRED en Español. It was translated by John Newton.