DMR News

Advancing Digital Conversations

AI Singapore and Google collaborate to enhance large language models in Southeast Asia (SEA).

By Yasmeeta Oon

Mar 13, 2024

Singapore – AI Singapore (AISG) and Google Research have partnered on Project SEALD (Southeast Asian Languages in One Network Data), an effort to improve how large language models (LLMs) handle Southeast Asian languages. The collaboration aims to address the region's linguistic diversity by improving the datasets used to train and evaluate these models.

Bridging Linguistic Gaps in Southeast Asia

Project SEALD sets out to tackle the complexity and variety of languages in Southeast Asia, beginning with a focus on Indonesian, Thai, Tamil, Filipino, and Burmese. The initiative promises to make AI technologies more inclusive and accessible, thereby fostering a more connected and understanding society.

Key Languages Targeted in Project SEALD:
  • Indonesian
  • Thai
  • Tamil
  • Filipino
  • Burmese

One of the notable ambitions of Project SEALD is to use LLMs to facilitate communication with underrepresented migrant worker communities in Singapore, who often find regional languages easier to navigate than English. By incorporating culturally relevant data into LLM training, the project aims to strengthen the communication channels between the Singaporean government, employers, and migrant workers.

Yolyn Ang, Vice President of Asia-Pacific Business Development at Google, emphasized the project’s potential impact: “This will open new opportunities and make AI more inclusive, accessible, and helpful for individuals and businesses throughout the region.”

Highlights of the SEALD Initiative:
  • Enhancement of language model capabilities across five key Southeast Asian languages.
  • Creation of diverse and rich datasets to support AI inclusivity.
  • Use of LLMs to support underrepresented communities, improving societal engagement.
  • Development and public sharing of datasets and findings to promote open-source collaboration.

In addition to the datasets, the collaboration will aid in the development of AISG’s SEA-LION model, further embedding the diversity of Southeast Asian languages into the digital landscape. Both AISG and Google Research plan to make their datasets and insights from Project SEALD accessible to the public, ensuring that the benefits of this initiative can be widely shared.

In parallel with Project SEALD, Google has also launched Project Vaani in India, which focuses on enhancing speech data across the country’s 773 districts. The two projects reflect a continued commitment to linguistic diversity and accessibility in AI technologies across Asia.

This collaboration between AI Singapore and Google Research not only marks a significant step towards making AI more inclusive but also highlights the importance of linguistic diversity in the development of technology that serves all members of society.



Yasmeeta Oon

Just a girl trying to break into the world of journalism, constantly on the hunt for the next big story to share.