No, you are not alone: Google is making this big mistake with AI too

Just last month, an article was shared stating that over 30% of the data Google used for one of its shared machine learning models was mislabeled. Not only was the model error-prone; the very training data it learned from was full of errors too. If the data is riddled with human errors that cannot be corrected, it is hard to see how anyone using the model can trust its results. A 2021 MIT study found that nearly 6% of the images in the industry-standard ImageNet database were mislabeled, and it also found label errors in other widely used computer vision, natural language, and speech datasets. If the data used to train these models is so bad, how can the models be trusted or used?
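Label errors like these can often be surfaced before training. The sketch below is a simplified illustration of the idea behind confident learning, not the exact method of any study mentioned here. It assumes you already have out-of-sample predicted class probabilities for every example (for instance, from cross-validation) and flags examples where the model finds the given label unusually unlikely while being confident about a different class. The function names and the integer-label/probability-list format are hypothetical.

```python
from collections import defaultdict

# Hypothetical helpers for illustration only; not a library API.


def per_class_thresholds(labels, probs):
    """Average predicted probability of class c over examples labeled c."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for y, p in zip(labels, probs):
        sums[y] += p[y]
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in counts}


def flag_suspect_labels(labels, probs):
    """Return indices of examples whose given label looks wrong.

    labels: integer class labels as annotated (possibly noisy).
    probs:  out-of-sample predicted probabilities, probs[i][c] for class c.
    An example is flagged when the probability of its given label falls below
    that class's threshold while another class meets its own threshold.
    """
    thresholds = per_class_thresholds(labels, probs)
    suspects = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        if p[y] >= thresholds[y]:
            continue  # the model broadly agrees with the given label
        if any(p[c] >= t for c, t in thresholds.items() if c != y):
            suspects.append(i)
    return suspects


# Hypothetical example: two classes (0 = "no boat", 1 = "boat").
labels = [0, 0, 0, 1, 1, 1]
probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
         [0.1, 0.9], [0.3, 0.7], [0.1, 0.9]]
print(flag_suspect_labels(labels, probs))  # [2]
```

Flagged examples are best treated as candidates for human review rather than relabeled automatically, since the model itself can be wrong.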

Figure: Percentage of time allocated to ML project tasks (Source: Cognilytica)

The answer is that you can’t trust either the data or the model. When it comes to AI, garbage in is definitely garbage out, and AI projects are plagued by bad data. If Google, ImageNet, and others are making this mistake, you are almost certainly making it too. Cognilytica research shows that more than 80% of AI project time is spent managing data, from collection and aggregation to cleaning and labeling. No matter how much time you put into it, mistakes are bound to happen, and that assumes the data was of good quality to begin with. Bad data equals bad results. This has been true of every kind of data-oriented project for decades, and it is now a critical issue for AI projects too, which are essentially big data projects.

Data quality is about more than just bad data

Data is the heart of AI. AI and ML projects are driven not by program code, but by the data they learn from. All too often, organizations push their AI projects forward too quickly, only to discover later that poor data quality is the cause of their AI system’s failure. If your data quality isn’t great, don’t be surprised when your AI project runs into problems.

Data quality covers more than obviously poor data, such as incorrect data labels, missing or incorrect data points, noisy data, and poor image quality. Major data quality issues also arise when retrieving or merging datasets, and when we capture data and augment it with third-party datasets. Each of these actions, and others, introduces many potential sources of data quality issues.
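One way to keep a merge from silently introducing problems is to have it report what it lost or overwrote. The sketch below joins customer records with a hypothetical third-party dataset on a shared key and reports unmatched records, duplicate keys in the third-party data, and fields whose values disagree. The record layout and the `merge_with_report` function are assumptions for illustration.

```python
from collections import Counter

# Hypothetical helper for illustration only; not a library API.


def merge_with_report(left, right, key):
    """Inner-join two lists of dict records on `key` and report what the join hides.

    Left-side values win when both sides share a field; disagreements are recorded.
    """
    key_counts = Counter(r[key] for r in right)
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)

    index = {}
    for r in right:
        index.setdefault(r[key], r)  # keep the first record for a duplicated key

    merged, unmatched, conflicts = [], [], []
    for rec in left:
        match = index.get(rec[key])
        if match is None:
            unmatched.append(rec[key])  # would silently vanish in a plain inner join
            continue
        for field in sorted(rec.keys() & match.keys()):
            if field != key and rec[field] != match[field]:
                conflicts.append((rec[key], field))
        merged.append({**match, **rec})

    report = {
        "unmatched_left": unmatched,
        "duplicate_right_keys": duplicate_keys,
        "conflicts": conflicts,
    }
    return merged, report


customers = [{"id": "a", "age": 30}, {"id": "b", "age": 41}, {"id": "c", "age": 25}]
vendor = [{"id": "a", "income": 50}, {"id": "a", "income": 55},
          {"id": "b", "income": 70, "age": 40}]
merged, report = merge_with_report(customers, vendor, "id")
print(report)
# {'unmatched_left': ['c'], 'duplicate_right_keys': ['a'], 'conflicts': [('b', 'age')]}
```

Which side should win a conflict is a project decision; the point of the sketch is that the choice becomes visible instead of happening silently.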

So how do you know the quality of your data before you start your AI project? It is important to find out before letting the project go forward. Teams need to understand their data sources, such as streaming data, customer data, and third-party data, and how to properly merge and combine data from these various sources. Unfortunately, most data is not in a good, usable state to begin with. You need to remove superfluous, incomplete, duplicate, or unusable data. This data should also be filtered to minimize bias.
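For simple tabular records, the removal step can be sketched as follows. This hypothetical `clean_records` helper keeps only the fields a project needs, drops records missing required values, and drops exact duplicates, counting what it removed so the losses stay visible. The field names are assumptions. Real pipelines usually also need fuzzy duplicate detection and domain-specific validity rules, which this sketch does not attempt.

```python
# Hypothetical helper for illustration only; not a library API.


def clean_records(records, keep, required):
    """Project, de-duplicate, and drop incomplete records.

    keep:     fields to retain (anything else is treated as superfluous).
    required: fields that must be present and non-empty; should be a subset of keep.
    """
    seen = set()
    cleaned = []
    removed = {"incomplete": 0, "duplicate": 0}
    for rec in records:
        row = {f: rec.get(f) for f in keep}  # drop superfluous fields
        if any(row.get(f) in (None, "") for f in required):
            removed["incomplete"] += 1
            continue
        fingerprint = tuple(row[f] for f in keep)
        if fingerprint in seen:
            removed["duplicate"] += 1
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned, removed


raw = [
    {"id": 1, "caption": "boat at dusk", "label": "boat", "tmp": "x"},
    {"id": 1, "caption": "boat at dusk", "label": "boat", "tmp": "y"},
    {"id": 2, "caption": "", "label": "boat"},
    {"id": 3, "caption": "open sea", "label": None},
    {"id": 4, "caption": "open sea", "label": "no_boat"},
]
cleaned, removed = clean_records(raw, keep=["id", "caption", "label"],
                                 required=["caption", "label"])
print(len(cleaned), removed)  # 2 {'incomplete': 2, 'duplicate': 1}
```

Note that the first two records count as duplicates only after the superfluous `tmp` field is dropped, which is why projection happens before de-duplication.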

But we’re not done yet. You also need to think about how to transform your data to meet your specific requirements. How will you implement data cleansing, data transformation, and data manipulation? Not all data is created equal, and data decay and data drift occur over time.
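Data drift can be quantified by comparing how a feature is distributed in a trusted baseline sample and in newly arriving data. The sketch below computes the Population Stability Index (PSI), one common drift measure, over bins defined by caller-chosen cut points. The bin edges, the small epsilon that avoids taking the log of zero, and the often-quoted rule of thumb that a PSI above roughly 0.25 signals a significant shift are conventions, not fixed standards. The function names are hypothetical.

```python
import math
from bisect import bisect_right

# Hypothetical helpers for illustration only; not a library API.


def bin_fractions(values, edges):
    """Fraction of values falling in each bin; `edges` are sorted interior cut points."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(baseline, current, edges, eps=1e-6):
    """PSI = sum over bins of (current - baseline) * ln(current / baseline)."""
    psi = 0.0
    for b, c in zip(bin_fractions(baseline, edges), bin_fractions(current, edges)):
        b, c = max(b, eps), max(c, eps)  # avoid log(0) for empty bins
        psi += (c - b) * math.log(c / b)
    return psi


baseline = [1, 2, 3, 4, 5, 6, 7, 8]
current = [5, 6, 7, 7, 8, 8, 7, 8]  # newer data concentrated in the upper bins
edges = [2.5, 4.5, 6.5]
print(round(population_stability_index(baseline, baseline, edges), 3))  # 0.0
print(population_stability_index(baseline, current, edges) > 0.25)     # True
```

Running a check like this on a schedule, per feature, turns "data drift occurs over time" into a number that can trigger a review.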

Have you thought about how you will monitor and evaluate this data to ensure that its quality remains at the required level? How are you getting the ? There are also data augmentation steps to consider. If additional data augmentation is required, how will you monitor it? Yes, there are many steps involved in achieving data quality, and all of them need to be considered for a project to succeed.
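Ongoing monitoring can start with simple automated checks that run on each new batch of data and fail loudly when quality drops below an agreed level. This sketch checks the missing-value rate of chosen fields and whether labels fall within an allowed set. The thresholds, field names, and the `check_batch` function are hypothetical and would be set per project.

```python
# Hypothetical helper for illustration only; not a library API.


def check_batch(records, fields, max_missing_rate, allowed_labels, label_field="label"):
    """Return a list of human-readable quality failures for one batch (empty = pass)."""
    if not records:
        return ["batch is empty"]
    failures = []
    for field in fields:
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        rate = missing / len(records)
        if rate > max_missing_rate:
            failures.append(f"{field}: {rate:.0%} missing exceeds {max_missing_rate:.0%}")
    invalid = sum(1 for r in records if r.get(label_field) not in allowed_labels)
    if invalid:
        failures.append(f"{label_field}: {invalid} value(s) outside the allowed set")
    return failures


batch = [
    {"caption": "boat at dusk", "label": "boat"},
    {"caption": "", "label": "boat"},
    {"caption": "open sea", "label": "no_boat"},
    {"caption": "harbor", "label": "boats"},  # typo in the label
]
for problem in check_batch(batch, ["caption"], 0.10, {"boat", "no_boat"}):
    print(problem)
# caption: 25% missing exceeds 10%
# label: 1 value(s) outside the allowed set
```

The same check can be applied to augmented data before it is mixed into the training set.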

Data labeling in particular is a common area where many teams get stuck. For supervised learning approaches to work, we need to provide good, clean, well-labeled data so that the system can learn from examples. If you are trying to identify images of boats in the ocean, you need to feed the system clean, properly labeled images of boats to train the model. That way, when you feed it an image it has never seen before, it can tell with some confidence whether that image contains a boat. If you only train your system on images of boats at sea on cloudless, sunny days, how would you expect it to react when it sees a boat at night or under 50% cloud cover? Problems arise when the data used to train and test a model does not match real-world data and real-world scenarios.
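You can catch the sunny-day-only problem described above before training by counting labeled examples for every combination of label and capture condition. This sketch assumes each example carries hypothetical `label` and `condition` metadata (for example, lighting or cloud cover). It reports combinations with fewer examples than a chosen minimum, including combinations that never appear at all. The `coverage_gaps` function is illustrative.

```python
from collections import Counter
from itertools import product

# Hypothetical helper for illustration only; not a library API.


def coverage_gaps(examples, labels, conditions, min_count):
    """List (label, condition, count) combinations with fewer than min_count examples."""
    counts = Counter((e["label"], e["condition"]) for e in examples)
    return [
        (label, cond, counts[(label, cond)])
        for label, cond in product(labels, conditions)
        if counts[(label, cond)] < min_count
    ]


examples = (
    [{"label": "boat", "condition": "sunny"}] * 3
    + [{"label": "no_boat", "condition": "sunny"}] * 2
    + [{"label": "boat", "condition": "cloudy"}]
)
for gap in coverage_gaps(examples, ["boat", "no_boat"], ["sunny", "cloudy", "night"], 2):
    print(gap)
```

Counting only reveals that a condition is under-represented; collecting and labeling more night-time or overcast images is what actually closes the gap.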

Even if the team spends a lot of time making sure the test data is perfect, the training data often does not reflect real-world data. In one public example, AI industry leader Andrew Ng said that on his project with Stanford Health, the quality of the data in his test environment did not match the quality of real-world medical images. He explained that, as a result, the models were of little use outside the test environment. This effectively brought the entire project to a standstill and caused it to fail, putting millions of dollars and years of investment in jeopardy.

Planning for a successful project

All this data quality-centric activity can seem overwhelming, and as a result these steps are often skipped. But, as mentioned earlier, bad data ruins AI projects, so inattention to these steps is a major cause of AI project failure overall. This is why organizations are increasingly adopting CRISP-DM, Agile, CPMAI, and other best-practice approaches: they help teams avoid missing or skipping the critical data quality steps that prevent AI project failures.

The problem of teams moving forward without planning for project success is all too common. In fact, the second and third phases of both the CRISP-DM methodology and CPMAI are data understanding and data preparation. These steps precede model building and are considered best practices by successful AI organizations.

Had Stanford’s medical project adopted CPMAI or a similar approach, the team would have realized that data quality issues would ruin the project well before millions of dollars and years had been spent. While it may be comforting to know that luminaries like Andrew Ng and even companies like Google make serious data quality mistakes, you still don’t want to join that club. You don’t want data quality issues to plague your AI projects.

Sources

1/ https://Google.com/

2/ https://www.forbes.com/sites/cognitiveworld/2022/08/06/no-youre-not-alone-google-is-also-making-this-big-mistake-on-ai/

The sources mentioned can contact us to have this article removed or changed.
