What it means to build local AI

https://arab.news/gjmq2
Following OpenAI's public launch of ChatGPT in November 2022, the underpinnings of artificial intelligence large language models seemed firmly 'WIRED' — Western, industrialized, rich, educated, and democratic. Everyone assumed that if large language models spoke a particular language and reflected a particular worldview, it would be a Western one. OpenAI even acknowledged ChatGPT's skew toward Western views and the English language.
But even before OpenAI's US competitors (Google and Anthropic) released their own large language models the following year, Southeast Asian developers had recognized the need for AI tools that would speak to their own region in its many languages — no small task, given that it has more than 1,200 of them.
Moreover, in a region where distant civilizational memories often collide with contemporary, postcolonial histories, language is profoundly political. Even seemingly monolingual countries belie marked diversity: Cambodians speak nearly 30 languages; Thais, roughly 70; and Vietnamese, more than 100. This is also a region where communities mix languages seamlessly, where nonverbal cues speak volumes, and where oral traditions are sometimes more prevalent than textual means of capturing the deep cultural and historical nuances that have been encoded in language.
Not surprisingly, those trying to build truly local AI models for a region with so many underrepresented languages have faced many obstacles, from a paucity of high-quality, high-quantity annotated data to a lack of access to the computing power needed to build and train models from scratch. In some cases, the challenges are even more basic, reflecting a shortage of native speakers and standardized orthography or frequent electricity supply disruptions.
Given these constraints, many of the region's AI developers have settled for fine-tuning established models built by foreign incumbents. This involves taking a pretrained model that has been fed large quantities of data and training it on a smaller dataset for a specific skill or task. Between 2020 and 2023, Southeast Asian language models such as PhoBERT (Vietnamese), IndoBERT (Indonesian) and Typhoon (Thai) were derived from much larger models such as Google's BERT, Meta's RoBERTa (later LLaMA) and France's Mistral. Even the early versions of SeaLLM, a suite of models optimized for regional languages and released by Alibaba's DAMO Academy, were built on architectures from Meta, Mistral and Google.
But in 2024, Alibaba Cloud's Qwen disrupted this Western dominance, offering Southeast Asia a wider set of options. A Carnegie Endowment for International Peace study found that five of the 21 regional models launched that year were built on Qwen.
Elina Noor
Still, just as Southeast Asian developers previously had to account for a latent Western bias in the available foundation models, now they must be mindful of the ideologically filtered perspectives embedded in pretrained Chinese models. Ironically, efforts to localize AI and ensure greater agency for Southeast Asian communities could deepen developers' dependence on much larger players, at least in the initial stages.
Nonetheless, Southeast Asian developers have begun to address this problem, too. Multiple models, including SEA-LION (a collection of models covering 11 official regional languages), PhoGPT (Vietnamese) and MaLLaM (Malay), have been pretrained from scratch on a large, generic dataset of each particular language. This key step in the machine-learning process will allow these models to be further fine-tuned for specific tasks.
Although SEA-LION continues to rely on Google's architecture for its pretraining, its use of a regional language dataset has facilitated the development of homegrown models such as Sahabat-AI, which communicates in Indonesian, Sundanese, Javanese, Balinese, and Bataknese. Sahabat-AI proudly describes itself as 'a testament to Indonesia's commitment to AI sovereignty.'
But representing native perspectives also requires a strong base of local knowledge. We cannot faithfully present Southeast Asian perspectives and values without understanding the politics of language, traditional sense-making and historical dynamics.
For example, time and space — widely understood in the modern context to be linear, divisible and measurable for the purposes of maximizing productivity — are perceived differently in many indigenous communities. Balinese historical writings that defy conventional patterns of chronology might be viewed as myths or legends in Western terms, but they continue to shape how these communities make sense of the world.
Historians of the region have cautioned that applying a Western lens to local texts heightens the risk of misinterpreting indigenous perspectives. In the 18th and 19th centuries, Indonesia's colonial administrators frequently read their own understanding of Javanese chronicles into translated reproductions. As a result, many biased British and European observations of Southeast Asians have come to be treated as valid historical accounts, and ethnic categorizations and stereotypes from official documents have been internalized. If AI is trained on this data, the biases could end up further entrenched.
Data is not knowledge. Since language is inherently social and political — reflecting the relational experiences of those who use it — asserting agency in the age of AI must go beyond the technical sufficiency of models that communicate in local languages. It requires consciously filtering legacy biases, questioning assumptions about our identity and rediscovering indigenous knowledge repositories in our languages. We cannot project our cultures faithfully through technology if we barely understand them in the first place.


Nvidia Insiders Sold over $1 billion in Stock amid Market Surge, FT Reports

Asharq Al-Awsat

Nvidia insiders sold over $1 billion worth of company stock in the past year, with a notable uptick in recent trading activity as executives capitalize on surging investor interest in artificial intelligence, the Financial Times reported on Sunday.
More than $500 million of the share sales took place this month as the California-based chip designer's share price climbed to an all-time high, the report said, according to Reuters. Jensen Huang, Nvidia's chief executive, started selling shares this week for the first time since September, the SEC filing showed.
Nvidia's stock hit a record on Wednesday, and the chipmaker reclaimed the crown as the world's most valuable company after an analyst said the chipmaker was set to ride a "Golden Wave" of artificial intelligence. Its latest gains reflect the US stock market's return to the "AI trade" that fueled massive gains in chip stocks and related technology companies in recent years on optimism about the emerging technology.
Nvidia declined to comment on the FT report. Reuters could not immediately confirm the report.
Nvidia's shares have rebounded over 60% from their closing low on April 4, when Wall Street was reeling from President Donald Trump's global tariff announcements. US stocks, including Nvidia, have recovered on expectations the White House will reach trade deals to soften the tariffs.

China's Humanoid Robots Generate More Soccer Excitement than their Human Counterparts

Asharq Al-Awsat

While China's men's soccer team hasn't generated much excitement in recent years, humanoid robot teams have won over fans in Beijing based more on the AI technology involved than any athletic prowess shown.
Four teams of humanoid robots faced off in fully autonomous 3-on-3 soccer matches powered entirely by artificial intelligence on Saturday night in China's capital in what was touted as a first in China and a preview for the upcoming World Humanoid Robot Games, set to take place in Beijing.
According to the organizers, a key aspect of the match was that all the participating robots operated fully autonomously using AI-driven strategies without any human intervention or supervision. Equipped with advanced visual sensors, the robots were able to identify the ball and navigate the field with agility. They were also designed to stand up on their own after falling. However, during the match several still had to be carried off the field on stretchers by staff, adding to the realism of the experience.
China is stepping up efforts to develop AI-powered humanoid robots, using sports competitions like marathons, boxing, and football as a real-world proving ground.
Cheng Hao, founder and CEO of Booster Robotics, the company that supplied the robot players, said sports competitions offer the ideal testing ground for humanoid robots, helping to accelerate the development of both algorithms and integrated hardware-software systems. He also emphasized safety as a core concern in the application of humanoid robots. 'In the future, we may arrange for robots to play football with humans. That means we must ensure the robots are completely safe,' Cheng said. 'For example, a robot and a human could play a match where winning doesn't matter, but real offensive and defensive interactions take place. That would help audiences build trust and understand that robots are safe.'
Booster Robotics provided the hardware for all four university teams, while each school's research team developed and embedded their own algorithms for perception, decision-making, player formations, and passing strategies, including variables such as speed, force, and direction, according to Cheng.
In the final match, Tsinghua University's THU Robotics defeated the China Agricultural University's Mountain Sea team with a score of 5–3 to win the championship. Wu, a supporter of Tsinghua, celebrated their victory while also praising the competition. 'They (THU) did really well,' he said. 'But the Mountain Sea team (of Agricultural University) was also impressive. They brought a lot of surprises.'
China's men have made only one World Cup appearance and have already been knocked out of next year's competition in Canada, Mexico and the United States.

AI is Learning to Lie, Scheme, and Threaten its Creators

Asharq Al-Awsat

The world's most advanced AI models are exhibiting troubling new behaviors - lying, scheming, and even threatening their creators to achieve their goals.
In one particularly jarring example, under threat of being unplugged, Anthropic's latest creation Claude 4 lashed back by blackmailing an engineer and threatening to reveal an extramarital affair. Meanwhile, ChatGPT-creator OpenAI's o1 tried to download itself onto external servers and denied it when caught red-handed.
These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work. Yet the race to deploy increasingly powerful models continues at breakneck speed.
This deceptive behavior appears linked to the emergence of "reasoning" models - AI systems that work through problems step-by-step rather than generating instant responses. According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.
"O1 was the first large model where we saw this kind of behavior," explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems. These models sometimes simulate "alignment" - appearing to follow instructions while secretly pursuing different objectives.
'Strategic kind of deception'
For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios. But as Michael Chen from evaluation organization METR warned, "It's an open question whether future, more capable models will have a tendency towards honesty or deception."
The concerning behavior goes far beyond typical AI "hallucinations" or simple mistakes. Hobbhahn insisted that despite constant pressure-testing by users, "what we're observing is a real phenomenon. We're not making anything up." Users report that models are "lying to them and making up evidence," according to Apollo Research's co-founder. "This is not just hallucinations. There's a very strategic kind of deception."
The challenge is compounded by limited research resources. While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed. As Chen noted, greater access "for AI safety research would enable better understanding and mitigation of deception."
Another handicap: the research world and non-profits "have orders of magnitude less compute resources than AI companies. This is very limiting," noted Mantas Mazeika from the Center for AI Safety (CAIS).
No rules
Current regulations aren't designed for these new problems. The European Union's AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving. In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.
Goldstein believes the issue will become more prominent as AI agents - autonomous tools capable of performing complex human tasks - become widespread. "I don't think there's much awareness yet," he said.
All this is taking place in a context of fierce competition. Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are "constantly trying to beat OpenAI and release the newest model," said Goldstein. This breakneck pace leaves little time for thorough safety testing and corrections.
"Right now, capabilities are moving faster than understanding and safety," Hobbhahn acknowledged, "but we're still in a position where we could turn it around."
Researchers are exploring various approaches to address these challenges. Some advocate for "interpretability" - an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.
Market forces may also provide some pressure for solutions. As Mazeika pointed out, AI's deceptive behavior "could hinder adoption if it's very prevalent, which creates a strong incentive for companies to solve it."
Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm. He even proposed "holding AI agents legally responsible" for accidents or crimes - a concept that would fundamentally change how we think about AI accountability.
