March 13, 2026

Flash floods are among the deadliest weather disasters globally, causing more than 5,000 deaths each year, yet they remain difficult to predict because of their sudden and localized nature.

Google researchers say they have developed a new method to improve forecasting by analyzing millions of news reports about flooding events around the world.

The company announced Thursday that it used its Gemini artificial intelligence model to build a large dataset of flood events, helping train machine learning models to estimate flash flood risk.

Using News Reports To Fill Weather Data Gaps

Flash floods are difficult to monitor because they occur quickly and often in areas without extensive weather infrastructure.

Unlike temperature or river flow data, which are continuously measured over time, flash flood events often lack detailed records.

That lack of data makes it difficult for deep learning systems, which learn from large numbers of recorded examples, to accurately forecast these events.

To address the problem, Google researchers analyzed approximately 5 million news articles from across the world using Gemini.

The model identified and cataloged reports of about 2.6 million flood incidents.

Researchers converted those reports into a geo-tagged time-series dataset called Groundsource.
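As a rough illustration of what a geo-tagged time series of flood reports might look like, the sketch below bins individual reports into grid cells and counts them per day. It is not Google's pipeline: the `FloodReport` fields, the 0.25-degree grid, and the sample coordinates are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
import math


@dataclass(frozen=True)
class FloodReport:
    """One flood event pulled from a news article (hypothetical schema)."""
    day: date
    lat: float
    lon: float


def grid_cell(lat, lon, cell_deg=0.25):
    """Snap a coordinate to the index of a fixed-size lat/lon grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def to_time_series(reports, cell_deg=0.25):
    """Count reports per (grid cell, day) pair."""
    counts = Counter()
    for report in reports:
        key = (grid_cell(report.lat, report.lon, cell_deg), report.day)
        counts[key] += 1
    return counts


# Made-up sample data: two nearby reports on the same day, one elsewhere.
reports = [
    FloodReport(date(2024, 5, 1), -25.97, 32.57),
    FloodReport(date(2024, 5, 1), -25.95, 32.60),
    FloodReport(date(2024, 5, 2), 19.08, 72.88),
]
series = to_time_series(reports)
```

Counting by cell and day turns scattered articles into something a forecasting model can be trained against: each cell has a sequence of days on which a flood was or was not reported.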

According to Google Research product manager Gila Loike, this is the first time the company has used a large language model to extract structured environmental data from news reports at this scale.

The dataset and related research were released publicly on Thursday.

Training A Flash Flood Prediction Model

Researchers used the Groundsource dataset as the reference record of past floods for training a flood prediction model.

The forecasting system uses a Long Short-Term Memory neural network, a type of machine learning model commonly used to analyze time-series data.

The model processes global weather forecasts and estimates the probability that flash floods could occur in a specific location.
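To give a sense of how an LSTM turns a sequence of weather inputs into a probability, here is a minimal sketch of one LSTM cell with a single hidden unit, written in plain Python. It is an illustration only, not Google's model: the two input features (normalized rainfall and soil moisture), the hand-picked weights in `PARAMS`, and the output layer are all assumptions, and nothing here is trained.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, params):
    """Run one LSTM time step with a scalar hidden state.

    x is a list of input features; h and c are the previous hidden and
    cell states; params maps each gate name to (input weights,
    recurrent weight, bias).
    """
    def gate(name, activation):
        w, u, b = params[name]
        return activation(sum(wi * xi for wi, xi in zip(w, x)) + u * h + b)

    i = gate("input", sigmoid)         # how much new information to admit
    f = gate("forget", sigmoid)        # how much old cell state to keep
    o = gate("output", sigmoid)        # how much cell state to expose
    g = gate("candidate", math.tanh)   # proposed new cell content
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def flood_probability(sequence, params, w_out, b_out):
    """Feed a forecast sequence through the cell and squash to (0, 1)."""
    h = c = 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return sigmoid(w_out * h + b_out)


# Hand-picked, untrained weights (assumption, for illustration only).
PARAMS = {
    "input": ([1.0, 0.5], 0.1, 0.0),
    "forget": ([0.0, 0.0], 0.0, 1.0),
    "output": ([0.0, 0.0], 0.0, 1.0),
    "candidate": ([1.0, 0.5], 0.1, -0.5),
}

# Each step is [normalized rainfall, normalized soil moisture].
dry = [[0.0, 0.0]] * 3
wet = [[1.0, 0.8]] * 3
p_dry = flood_probability(dry, PARAMS, w_out=2.0, b_out=-1.0)
p_wet = flood_probability(wet, PARAMS, w_out=2.0, b_out=-1.0)
```

With these made-up weights the wet sequence scores higher than the dry one. In a real system the weights would be learned from historical events such as those in Groundsource.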

Google has integrated the predictions into its Flood Hub platform.

The system currently highlights flash flood risk for urban areas in 150 countries.

The company is also sharing data from the system with emergency response agencies worldwide.

António José Beleza, an emergency response official with the Southern African Development Community, said the system helped his organization respond to floods more quickly during testing.

Model Designed For Regions With Limited Weather Infrastructure

The forecasting system still has limitations.

The current model identifies flood risk across areas of about 20 square kilometers, a relatively coarse resolution.

It is also less precise than systems such as the U.S. National Weather Service flood alert network.

Those systems incorporate local radar data to track rainfall in real time.

Google’s model instead relies on global weather forecasts combined with historical event data.

The approach is designed to support regions that lack advanced meteorological infrastructure.

“Because we’re aggregating millions of reports, the Groundsource data set actually helps rebalance the map,” said Juliet Rothenberg, a program manager on Google’s Resilience team.

“It enables us to extrapolate to other regions where there isn’t as much information.”

Potential For New AI-Generated Environmental Datasets

Researchers say the approach could also be applied to other natural hazards that lack large structured datasets.

Rothenberg said the team hopes language models could help assemble data for events such as heat waves or mudslides.

Marshall Moutenot, chief executive of Upstream Tech, said the work reflects broader efforts to create better datasets for machine learning weather forecasting.

Upstream Tech develops forecasting systems for customers including hydropower operators.

“Data scarcity is one of the most difficult challenges in geophysics,” Moutenot said.

“Simultaneously, there’s too much Earth data, and then when you want to evaluate against truth, there’s not enough. This was a really creative approach to get that data.”


