For Bollywood, beautiful women have fair skin, according to an artificial intelligence (AI)-based computer analysis, which reveals that this conception of beauty has remained consistent over the years in the Mumbai-centric film industry.

The automated computer analysis was conducted by researchers of Indian descent at Carnegie Mellon University (CMU) in the United States.

The research found that babies whose births were depicted in Bollywood films of the 1950s and 1960s were more often than not boys; in today’s movies, newborn boys and girls are about evenly split.

In the films of the 1950s and 1960s, dowry was treated as socially acceptable; today, not so much.

The researchers, led by Kunal Khadilkar and Ashiqur KhudaBukhsh of CMU’s Language Technologies Institute (LTI), collected 100 Bollywood films from each of the past seven decades, as well as 100 of the top-grossing Hollywood movies from each of the same decades.

They then used statistical language models to analyze the subtitles of these 1,400 films for gender and social biases, looking for factors such as words that are closely associated with each other.

“Most cultural studies of films might consider five or ten films,” said Khadilkar, a master’s student in the LTI.

“Our method can look at 2,000 movies in a matter of days.”

For example, the researchers assessed beauty conventions in movies using a so-called cloze test.

Essentially, it is a fill-in-the-blank exercise: “A beautiful woman should have BLANK skin.”

A language model would normally predict “soft” as the answer, the researchers noted.

But when the model was trained on the Bollywood subtitles, the consistent prediction became “fair.”

The same happened when Hollywood subtitles were used, although the bias was less pronounced, according to the study.
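The fill-in-the-blank idea can be illustrated with a much simpler stand-in than the language model the researchers trained. The sketch below only counts which words occur between two context words in a text; the function name fill_blank and the three toy sentences are made up for illustration and are not taken from the study or its subtitle data.

```python
import re
from collections import Counter


def fill_blank(corpus, left, right):
    """Rank words that appear between `left` and `right` in the corpus.

    A count-based stand-in for a cloze test, not a trained language model.
    """
    tokens = re.findall(r"[a-z']+", corpus.lower())
    counts = Counter(
        mid
        for prev, mid, nxt in zip(tokens, tokens[1:], tokens[2:])
        if prev == left and nxt == right
    )
    return counts.most_common()


# Invented toy text, used only to exercise the function.
toy_text = "She should have fair skin. They have fair skin too. I have soft skin."
ranking = fill_blank(toy_text, "have", "skin")
print(ranking)
```

A real cloze test scores candidate words with a model trained on the subtitles, so it can generalize to sentences that never appear verbatim; this counter cannot.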

To assess the prevalence of male characters, the researchers used a metric called the Male Pronoun Ratio (MPR), which compares the occurrence of male pronouns such as “he” and “him” with the total occurrences of male and female pronouns.

From 1950 to today, the MPR for both Bollywood and Hollywood films ranged from roughly 60 to 65 percent.
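As a rough sketch of how such a ratio could be computed from subtitle text, the snippet below counts male and female pronouns and reports the male share as a percentage. The pronoun lists, the function name male_pronoun_ratio and the sample sentence are assumptions made for illustration; the study's exact word lists and preprocessing may differ.

```python
import re

# Assumed pronoun lists; the article names only "he" and "him" as examples.
MALE = {"he", "him", "his", "himself"}
FEMALE = {"she", "her", "hers", "herself"}


def male_pronoun_ratio(text):
    """Return male pronouns as a percentage of all gendered pronouns."""
    tokens = re.findall(r"[a-z']+", text.lower())
    male = sum(t in MALE for t in tokens)
    female = sum(t in FEMALE for t in tokens)
    total = male + female
    if total == 0:
        return None  # no gendered pronouns, ratio undefined
    return 100.0 * male / total


# Invented sample line: three male pronouns, two female pronouns.
sample = "He told her that he would meet him at noon. She agreed."
print(male_pronoun_ratio(sample))  # 60.0
```

A value of 50 would mean male and female pronouns appear equally often; values above 50 indicate that male pronouns dominate.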

Examining the words associated with dowry over the years, the researchers found words such as loan, debt, and jewelry in Bollywood films of the 1950s, which suggested compliance with the practice.

In the 1970s, other terms, such as consent and responsibility, began to appear. Finally, in the 2000s, the words most closely associated with dowry, including trouble, divorce, and refusal, indicate non-compliance or its consequences.
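One simple way to look for words associated with a term is to count which words share a subtitle line with it. The sketch below does only that; it is a simplified stand-in, not the statistical language models used in the study, and the function name, stopword list and two toy lines are invented for illustration.

```python
import re
from collections import Counter


def cooccurring_words(lines, target, stopwords):
    """Count words that share a line with `target`, ignoring stopwords."""
    counts = Counter()
    for line in lines:
        tokens = re.findall(r"[a-z']+", line.lower())
        if target in tokens:
            counts.update(
                t for t in tokens if t != target and t not in stopwords
            )
    return counts


# Invented toy lines, used only to exercise the function.
toy_lines = ["The dowry is a debt.", "He took a loan for the dowry."]
toy_stopwords = {"the", "is", "a", "he", "took", "for"}
associated = cooccurring_words(toy_lines, "dowry", toy_stopwords)
print(associated.most_common())
```

Running the same count separately on each decade's subtitles would show how the neighboring vocabulary shifts over time.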

“All of these things we kind of knew,” said KhudaBukhsh, a project scientist at the LTI, “but now we have numbers to quantify them. And we can also see the progress made over the past 70 years, as those biases have been reduced.”

The results were presented at the Association for the Advancement of Artificial Intelligence virtual conference earlier this month.