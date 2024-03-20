



Posted by Yossi Matias, VP of Engineering and Research, and Grey Nearing, Research Scientist, Google Research

Floods are the most common natural disaster, causing approximately $50 billion in economic losses annually around the world. Partly due to the effects of climate change, the incidence of flood-related disasters has more than doubled since 2000. Nearly 1.5 billion people, representing 19% of the world's population, are at significant risk from severe flooding. Upgrading early warning systems to give these people access to accurate and timely information can save thousands of lives each year.

Driven by the potential impact that reliable flood forecasting can have on people's lives around the world, we launched our flood forecasting initiative in 2017. Through this multi-year effort, we have advanced our research hand in hand with building a real-time operational flood forecasting system that provides alerts through Google Search, Maps, Android notifications, and Flood Hub. However, further research advances were needed to scale globally, especially in locations where accurate local data are not available.

In “Global prediction of extreme floods in ungauged watersheds,” published in Nature, we demonstrate how machine learning (ML) technologies can significantly improve global-scale flood forecasting relative to the current state of the art in countries where flood-related data are scarce. With these AI-based technologies, we extended the reliability of currently available global nowcasts, on average, from zero to five days of lead time, and improved forecasts across regions in Africa and Asia to be similar to those currently available in Europe. The model evaluation was carried out in collaboration with the European Centre for Medium-Range Weather Forecasts (ECMWF).

These technologies enable Flood Hub to provide real-time river forecasts up to seven days in advance, covering river basins in more than 80 countries. This information can be used by people, communities, governments and international organizations to take proactive action to protect vulnerable people.

Flood forecasting at Google

The ML models that power the Flood Hub tool are the result of years of research conducted in collaboration with multiple partners, including academics, governments, international organizations, and NGOs.

In 2018, we launched a pilot early warning system in the Ganges-Brahmaputra river basin in India, based on the hypothesis that ML could help address the difficult problem of reliable flood forecasting at scale. The following year, the pilot was expanded with a combination of inundation models, real-time water level measurements, elevation mapping, and hydrological modeling.

In collaboration with academics, particularly the JKU Machine Learning Institute, we investigated ML-based hydrological models and showed that LSTM-based models can produce more accurate simulations than traditional conceptual and physics-based hydrological models. This research improved flood forecasting and enabled us to expand forecast coverage to all of India and Bangladesh. We also worked with researchers at Yale University to test technological interventions that expand the reach and impact of flood warnings.

Hydrological models predict river floods by processing publicly available weather data, such as precipitation, together with physical watershed information. Such models need to be calibrated to long data records from streamflow gauging stations in individual rivers. Only a small proportion of the world's river basins (watersheds) are equipped with streamflow gauges, which are expensive but necessary to provide the relevant data, and it is difficult for hydrological simulation and forecasting to produce predictions in basins that lack this infrastructure. Lower gross domestic product (GDP) correlates with increased vulnerability to flood risk, and there is an inverse correlation between a country's GDP and the amount of publicly available data within the country. ML helps address this problem by allowing a single model to be trained on all available river data and applied to ungauged watersheds where no data are available. In this way, the model can be trained globally and can make predictions for any river location.

There is an inverse (log-log) correlation between the amount of publicly available streamflow data in a country and national GDP. Streamflow data are from the Global Runoff Data Center.

Our academic collaborations led to ML research that developed methods to estimate uncertainty in river forecasts and showed how ML river forecast models synthesize information from multiple data sources. This research demonstrated that these models can reliably simulate extreme events even when such events are not part of the training data. In an effort to contribute to open science, in 2023 we open-sourced a community-driven dataset for large-sample hydrology in Nature Scientific Data.

The river forecast model

Most hydrological models used by national and international agencies for flood forecasting and river modeling are state-space models, which depend only on daily inputs (precipitation, temperature, etc.) and the current state of the system (soil moisture, snowpack, etc.). LSTMs are a variant of state-space models. They work by defining a neural network that represents a single time step, in which input data (such as current weather conditions) are processed to produce updated state information and output values (streamflow) for that time step. LSTMs are applied sequentially to make time-series predictions, and in this sense behave similarly to the way scientists typically conceptualize hydrological systems. Empirically, we have found that LSTMs perform well on the task of river forecasting.

An illustration of an LSTM, a neural network that operates sequentially in time. An accessible primer can be found here.
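To make the single-time-step idea concrete, here is a minimal sketch of a standard LSTM cell applied sequentially, written in plain Python. It is not the production model: the weights are random and untrained, the hidden size, the input values, and the averaging readout that stands in for streamflow are all illustrative assumptions, and names such as `lstm_step` and `run_sequence` are hypothetical.

```python
import math
import random

GATES = ("input", "forget", "output", "candidate")


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def init_params(n_inputs, n_hidden, seed=0):
    """Random, untrained weights; a real model would learn these from data."""
    rng = random.Random(seed)
    width = n_inputs + n_hidden
    weights = {g: [[rng.uniform(-0.5, 0.5) for _ in range(width)]
                   for _ in range(n_hidden)] for g in GATES}
    biases = {g: [0.0] * n_hidden for g in GATES}
    return weights, biases


def lstm_step(x, h, c, weights, biases):
    """Map (inputs, previous state) to the updated state for one time step."""
    z = list(x) + list(h)  # gates see current inputs and previous hidden state

    def gate(name, squash):
        return [squash(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(weights[name], biases[name])]

    i = gate("input", sigmoid)        # how much new information to store
    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    o = gate("output", sigmoid)       # how much of the cell state to expose
    g = gate("candidate", math.tanh)  # proposed new information
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run_sequence(daily_inputs, n_hidden=4, seed=0):
    """Apply the same cell day after day, carrying the state forward."""
    weights, biases = init_params(len(daily_inputs[0]), n_hidden, seed)
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    outputs = []
    for x in daily_inputs:
        h, c = lstm_step(x, h, c, weights, biases)
        outputs.append(sum(h) / n_hidden)  # toy readout standing in for streamflow
    return outputs, (h, c)


# Three days of hypothetical, already-scaled [precipitation, temperature] inputs.
outputs, (h, c) = run_sequence([[0.2, 0.5], [0.9, 0.4], [0.0, 0.6]])
print(outputs)
```

The cell state `c` plays the role that stores such as soil moisture or snowpack play in a conventional state-space model: it is the only memory carried from one day to the next.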

Our river forecast model uses two LSTMs that are applied sequentially: (1) a “hindcast” LSTM ingests historical weather data (dynamic hindcast features) up to the present (or rather, the issue time of the forecast), and (2) a “forecast” LSTM ingests the states from the hindcast LSTM along with forecasted weather data (dynamic forecast features) to make future predictions. One year of historical weather data is input into the hindcast LSTM, and seven days of forecasted weather data are input into the forecast LSTM. Static features include the geographic and geophysical characteristics of the watershed. These are input into both the hindcast and forecast LSTMs, allowing the model to learn different hydrological behaviors and responses in different types of watersheds.
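The handoff between the two passes can be sketched as follows. To keep the example short, a simple leaky-store update stands in for each LSTM cell, so this illustrates only the data flow (spin-up on one year of history, state handoff, seven forecast steps, static features appended at every step), not the actual network. All names, parameter values, and input values are hypothetical.

```python
def run_cell(state, dynamic_series, static, weights, decay):
    """Apply a toy recurrent cell over a series; returns (final_state, outputs).

    This leaky-store update is a stand-in for an LSTM cell, not the real thing.
    """
    outputs = []
    for dynamic in dynamic_series:
        features = list(dynamic) + list(static)  # static features join every step
        drive = sum(w * f for w, f in zip(weights, features))
        state = decay * state + drive
        outputs.append((1.0 - decay) * state)
    return state, outputs


def forecast(hindcast_weather, forecast_weather, static, hind_params, fore_params):
    # (1) Hindcast pass: spin the state up on historical weather.
    state, _ = run_cell(0.0, hindcast_weather, static, *hind_params)
    # (2) Forecast pass: start from the handed-over state, with its own parameters.
    _, predictions = run_cell(state, forecast_weather, static, *fore_params)
    return predictions


static = [0.3]                    # hypothetical watershed attribute
hind_params = ([1.0, 0.1], 0.9)   # (weights, decay), illustrative values
fore_params = ([0.8, 0.1], 0.85)
wet_year = [[5.0]] * 365          # one year of daily precipitation
dry_year = [[0.0]] * 365
next_week = [[1.0]] * 7           # seven days of forecast precipitation

wet = forecast(wet_year, next_week, static, hind_params, fore_params)
dry = forecast(dry_year, next_week, static, hind_params, fore_params)
print(len(wet), wet[0] > dry[0])
```

Both runs see the same seven-day weather forecast, yet the wet-history run predicts higher values because the state inherited from the hindcast pass differs. That inherited state is the reason for the year-long spin-up.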

The output from the forecast LSTM is fed into a “head” layer that uses a mixture density network to generate probabilistic forecasts (i.e., predicted parameters of a probability distribution over streamflow). Specifically, the model predicts the parameters of a mixture of heavy-tailed probability density functions, called asymmetric Laplacian distributions, at each forecast time step. The result is a mixture density function, called a Countable Mixture of Asymmetric Laplacians (CMAL) distribution, which represents a probabilistic prediction of the volumetric flow rate of a particular river at a particular time.
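As a rough illustration of what such a head's output represents, the sketch below evaluates a mixture of asymmetric Laplacian densities and the probability that flow exceeds a threshold. It assumes one common parameterization (location `mu`, scale, and asymmetry `tau` between 0 and 1, with mixture weights obtained from a softmax); the exact parameterization used in the model may differ, and the component values and the threshold are made up.

```python
import math


def ald_pdf(y, mu, scale, tau):
    """Asymmetric Laplacian density: location mu, scale > 0, asymmetry 0 < tau < 1."""
    u = (y - mu) / scale
    check = u * (tau - (1.0 if u < 0 else 0.0))  # quantile-regression "check" loss
    return tau * (1.0 - tau) / scale * math.exp(-check)


def ald_cdf(y, mu, scale, tau):
    """Closed-form CDF of the density above; equals tau at y == mu."""
    u = (y - mu) / scale
    if u < 0:
        return tau * math.exp((1.0 - tau) * u)
    return 1.0 - (1.0 - tau) * math.exp(-tau * u)


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_pdf(y, components, logits):
    """Weighted sum of component densities; weights come from a softmax."""
    return sum(w * ald_pdf(y, *p) for w, p in zip(softmax(logits), components))


def exceedance_probability(threshold, components, logits):
    """Probability that streamflow exceeds the threshold under the mixture."""
    cdf = sum(w * ald_cdf(threshold, *p)
              for w, p in zip(softmax(logits), components))
    return 1.0 - cdf


# Hypothetical head output for one forecast day: (mu, scale, tau) per component.
components = [(120.0, 15.0, 0.5), (200.0, 40.0, 0.3)]
logits = [1.0, 0.0]
print(exceedance_probability(300.0, components, logits))
```

A small `tau` places most of a component's mass above `mu` and gives it a long right tail, which is one way a mixture like this can represent the possibility of rare high flows.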

LSTM-based river forecast model architecture. Two LSTMs are applied in sequence, one to ingest historical weather data and the other to ingest forecasted weather data. The output of the model is the parameters of a probability distribution over streamflow at each forecast time step.

Input and training data

This model uses three types of publicly available data inputs, primarily from government sources.

(1) Static watershed attributes representing geographic and geophysical variables: from the HydroATLAS project. These include data such as long-term climate indicators (precipitation, temperature, snow cover), land cover, and anthropogenic attributes (e.g., a nighttime lights index as a proxy for human development).

(2) Historical weather time series data: used to spin up the model for one year prior to the time the forecast is issued. The data are derived from NASA IMERG, the NOAA CPC Global Unified Gauge-Based Analysis of Daily Precipitation, and the ECMWF ERA5-Land reanalysis. Variables include total daily precipitation, temperature, solar and thermal radiation, snowfall, and surface pressure.

(3) Forecast weather time series over a seven-day forecast horizon: used as input for the forecast LSTM. These data are the same meteorological variables listed above and are obtained from the ECMWF HRES atmospheric model.

The training data are daily streamflow values from the Global Runoff Data Center over the period 1980 to 2023. To improve accuracy, a single streamflow forecast model is trained using data from 5,680 diverse watershed streamflow gauges (shown below).

Locations of the 5,680 streamflow gauges from the Global Runoff Data Center that provide training data for the river forecast model.

Improving on the current state of the art

We compared our river forecast model to GloFAS version 4, the current state-of-the-art global flood forecasting system. These experiments showed that ML can provide accurate warnings earlier and for larger, more impactful events.

The figure below shows the distribution of F1 scores for predicting events of varying severity at river locations around the world, where a prediction counts as correct if it falls within plus or minus one day of the observed event. The F1 score is the harmonic mean of precision and recall, and the severity of an event is measured by its return period. For example, a two-year return period event is a river flow that is expected to be exceeded on average once every two years. At lead times of up to four or five days, our model achieves reliability scores that are on average comparable to or better than those of GloFAS nowcasts (0-day lead time).

Distributions of F1 scores for 2-year return period events in 2,092 basins worldwide over the period 2014-2023, from GloFAS (blue) and our model (orange) at different lead times. On average, our model is statistically as accurate as GloFAS nowcasts (0-day lead time) up to 5 days in advance, for 2-year events (shown) and for 1-year, 5-year, and 10-year events (not shown).
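The evaluation quantities described above can be sketched in a few lines. This is an assumption-laden illustration rather than the paper's evaluation code: the return-period threshold is estimated here as an empirical quantile of annual maximum flows, events are taken to be upward threshold crossings, and predicted events are matched one-to-one to observed events within the plus or minus one day window. The function names and sample values are hypothetical.

```python
def return_period_threshold(annual_maxima, years):
    """Flow exceeded on average once every `years` years (empirical quantile)."""
    ranked = sorted(annual_maxima)
    quantile = 1.0 - 1.0 / years            # fraction of years below the threshold
    position = quantile * (len(ranked) - 1)
    low = int(position)
    high = min(low + 1, len(ranked) - 1)
    return ranked[low] + (position - low) * (ranked[high] - ranked[low])


def crossing_days(flow, threshold):
    """Days on which flow rises from below the threshold to at or above it."""
    return [t for t in range(1, len(flow)) if flow[t] >= threshold > flow[t - 1]]


def f1_with_tolerance(observed_days, predicted_days, tolerance=1):
    """F1 score where a prediction is a hit if within `tolerance` days of an event."""
    unmatched = list(observed_days)
    hits = 0
    for day in predicted_days:
        match = next((o for o in unmatched if abs(o - day) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)  # each observed event can be matched only once
            hits += 1
    precision = hits / len(predicted_days) if predicted_days else 0.0
    recall = hits / len(observed_days) if observed_days else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


threshold = return_period_threshold([300.0, 100.0, 500.0, 200.0, 400.0], years=2)
print(threshold)  # 300.0, the median of the five annual maxima

# Two of four predictions land within one day of an observed event.
print(f1_with_tolerance([10, 50, 90], [11, 49, 70, 120]))
```

In the second example, precision is 2/4 and recall is 2/3, so the F1 score is their harmonic mean, 4/7 (about 0.57).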

Additionally (not shown), our model remains accurate for larger and rarer extreme events: its precision and recall scores for 5-year return period events are similar to or better than those of GloFAS for 1-year return period events. See the paper for more details.

Looking to the future

This flood forecasting effort is part of our Adaptation and Resilience work and reflects our commitment to combating climate change while building the resilience of global communities. We believe that AI and ML will continue to play a critical role in helping advance science and research to combat climate change.

We actively collaborate with several international aid organizations (such as the Centre for Humanitarian Data and the Red Cross) to provide actionable flood forecasts. Additionally, in an ongoing collaboration with the World Meteorological Organization (WMO) to support early warning systems for climate hazards, we are conducting a study to understand how AI can help address real-world challenges faced by national flood forecasting agencies.

While the research presented here represents a major advance in flood forecasting, future work is required to expand flood forecasting coverage to more locations around the world and to other types of flood-related events and disasters, such as flash floods and urban flooding. We look forward to continuing our collaboration with academic and expert communities, local governments, and industry partners to achieve these goals.

