Multiclass earthquake detection based on velocity and displacement data filtering using machine learning algorithms

Study area and data collection

The region of this study is Indonesia, focusing on the island of Java. Java is the fourth largest island in Indonesia in terms of population density. It is part of the complex convergence zone between the Eurasian plate and the Indo-Australian plate. As a result, the Java region has experienced frequent seismic and volcanic activity. Between 2006 and 2020, earthquakes and other geographic hazards on the volcano-dotted island of Java killed about 7,000 people and injured, displaced, or left homeless 1.8 million more.

Seismic wave acceleration data were collected from the ESM database at 3 different stations, namely CISI, SMRI and UGM located on the island of Java as can be seen in Figure 1.

Figure 1

Map of Java island with station locations: (a) CISI, (b) SMRI, (c) UGM [Imagery © 2021 TerraMetrics, Map data © 2022 Google].

These stations recorded earthquake events that occurred around Java between 2006 and 2009. There are 33 records from CISI, 8 from SMRI, and 17 from UGM, for a total of 58 earthquake events. Each record contains three channels (HLE, HLN, and HLZ), each carrying acceleration seismic wave data, as shown in Fig. 2. The acceleration seismic wave is integrated to obtain velocity and displacement seismic waves, which can be used as features to improve the models' performance. The relation between acceleration, velocity, and displacement is described by the following equations (ref. 19):

Acceleration:

$$Acceleration = a\left( t \right)$$

(1)

Velocity:

$$Velocity \left( {v\left( t \right)} \right) = v_{0} + \int_{0}^{t} a \, dt$$

(2)

Displacement:

$$Displacement \left( {r\left( t \right)} \right) = r_{0} + \int_{0}^{t} v \, dt$$

(3)

where \({v}_{0}\) is the initial velocity and \({r}_{0}\) is the initial position, with time \(t\) measured from the start time \({t}_{0}\). The result of integrating the acceleration seismic wave can be seen in Fig. 3.
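As an illustration of Eqs. (2) and (3), the integration of a sampled acceleration trace can be sketched with a cumulative trapezoidal rule. This is a minimal sketch, not the authors' implementation: the helper name `cumulative_trapezoid`, the zero initial values, and the constant test input are assumptions made for the example.

```python
def cumulative_trapezoid(samples, dt, initial=0.0):
    """Integrate evenly spaced samples with the trapezoidal rule.

    Returns a list with one value per input sample; the first value is `initial`.
    """
    if not samples:
        return []
    out = [initial]
    for prev, curr in zip(samples, samples[1:]):
        out.append(out[-1] + 0.5 * (prev + curr) * dt)
    return out


dt = 0.005                   # sampling interval in seconds
acceleration = [2.0] * 201   # synthetic input: constant 2 m/s^2 over 1 s

velocity = cumulative_trapezoid(acceleration, dt, initial=0.0)   # Eq. (2) with v0 = 0
displacement = cumulative_trapezoid(velocity, dt, initial=0.0)   # Eq. (3) with r0 = 0

# For constant acceleration, v = a*t and r = a*t^2 / 2.
print(round(velocity[-1], 6), round(displacement[-1], 6))
```

For real records, the choice of \({v}_{0}\) and \({r}_{0}\) and any baseline correction applied before integration would affect the result.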

Figure 2

Unprocessed dataset for each channel for 1 event.

Figure 3

Dataset processing

Data from the ESM database come as ASCII files containing detailed event information together with the acceleration seismic wave data for the event. All data go through an FFT to obtain the frequency content of the seismic wave, which is then used in filtering with a Butterworth band-pass filter (filter order = 2, minimum frequency = 0.1 Hz, maximum frequency = 30 Hz) to reduce noise. Figure 4 shows the result of filtering the data in Fig. 2.
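The filtering step could be sketched as follows. This sketch assumes NumPy and SciPy are available and applies the Butterworth band-pass filter directly in the time domain with zero-phase filtering, which may differ from the FFT-based workflow described above. The 200 Hz sampling rate is inferred from the 0.005 s sampling interval, and the synthetic trace and the helper name `bandpass` are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 200.0  # sampling rate in Hz, assuming a 0.005 s sampling interval


def bandpass(trace, low_hz=0.1, high_hz=30.0, order=2, fs=FS):
    """Apply a zero-phase Butterworth band-pass filter to a 1-D trace."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, trace)


# Synthetic trace: a 5 Hz in-band signal plus a constant offset and 80 Hz noise.
t = np.arange(2000) / FS
raw = 1.0 + np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 80 * t)
filtered = bandpass(raw)

# Estimate how much of the 80 Hz component remains after filtering.
remaining_80hz = abs(2 * np.mean(filtered * np.sin(2 * np.pi * 80 * t)))
print(remaining_80hz)
```

The 5 Hz component lies inside the pass band and is largely preserved, while the 80 Hz component lies above the 30 Hz cutoff and is strongly attenuated.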

Figure 4

Processed dataset for each channel for 1 event.

After filtering, the data are sampled. The acceleration seismic wave data are split into earthquake and non-earthquake data. Each contains 200 samples per seismic event (equivalent to 1 s, because the sampling interval is 0.005 s). The earthquake samples are taken starting from the onset of the P-wave, and the non-earthquake samples are taken from the beginning of the record, before the P-wave arrival. With 58 seismic events in total, the earthquake and non-earthquake datasets each have 3 columns (HLE, HLN, and HLZ) with 11,600 rows per column [11600, 3]. The 3 columns are then merged into 1 [11600, 1] using the resultant formula, so that only the acceleration amplitude is used as a feature. The resulting amplitudes over all seismic events are shown in Fig. 5 for both earthquake and non-earthquake data. Label information is then added to the datasets: 0 represents non-earthquake data and 1 represents earthquake data.
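The sampling and merging steps can be sketched as below. The text does not spell out the merging formula, so the sketch assumes the usual vector resultant, \(\sqrt{E^{2}+N^{2}+Z^{2}}\). The function names, the P-wave index, and the synthetic record are hypothetical.

```python
import math

WINDOW = 200  # samples per window: 1 s at a 0.005 s sampling interval


def resultant(hle, hln, hlz):
    """Merge three component traces into one amplitude trace (assumed formula)."""
    return [math.sqrt(e * e + n * n + z * z) for e, n, z in zip(hle, hln, hlz)]


def split_event(amplitude, p_index, window=WINDOW):
    """Return (non_earthquake, earthquake) windows for one event.

    The non-earthquake window starts at the beginning of the record and the
    earthquake window starts at the P-wave arrival index.
    """
    if p_index < window or p_index + window > len(amplitude):
        raise ValueError("record too short for the requested windows")
    return amplitude[:window], amplitude[p_index:p_index + window]


# Synthetic record: quiet for 300 samples, then shaking on two channels.
hle = [0.0] * 300 + [3.0] * 300
hln = [0.0] * 300 + [4.0] * 300
hlz = [0.0] * 600

amplitude = resultant(hle, hln, hlz)
noise, quake = split_event(amplitude, p_index=300)

# Label 0 = non-earthquake, label 1 = earthquake.
rows = [(a, 0) for a in noise] + [(a, 1) for a in quake]
print(len(rows), rows[0], rows[-1])
```

Repeating this for 58 events gives 58 × 200 = 11,600 rows per class, matching the dataset size stated above.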

Figure 5

For sabotage vibration, two sets of sabotage data were recorded with an acceleration sensor. The two sabotage datasets are treated in the same way as the earthquake and non-earthquake datasets. The first sabotage dataset contains 11,600 samples labeled 2, captured by shaking a table with the sensor on top of it (a simulated earthquake). The second contains 750 samples labeled 3, recorded while a heavy vehicle passed by the sensor. In total there are 4 datasets with 35,550 samples. The statistical analysis of each dataset is presented in Table 1. Finally, the integration formulas are applied to all acceleration amplitude datasets to obtain velocity and displacement amplitudes, which can be used as additional features.

Table 1: Statistical analysis of the data sets.

Model selection

The supervised machine learning algorithms used in this study are Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), and Artificial Neural Network (ANN). SVM is a supervised learning algorithm that can find patterns in complex datasets. It is a powerful and versatile machine learning model capable of linear and nonlinear classification, regression, and outlier detection. When SVM theory was introduced by Vapnik and Cortes in 1995, it was designed to classify two groups (binary classification). The idea had previously been implemented for the restricted case in which the training data can be separated without errors. In practice, SVM has been applied to pattern and digit recognition, and this experience shows that it can compete with other classification methods such as decision trees and neural networks. The binary SVM approach can also be extended to multiclass scenarios by decomposing the multiclass problem into a series of binary problems, each solved with a binary SVM, following either a one-versus-one or a one-versus-all strategy.
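To make the two decomposition strategies concrete, the sketch below lists the binary problems each one would create for the four class labels used in this dataset (0 to 3). It only enumerates the sub-problems and does not train any classifier; the variable names are illustrative.

```python
from itertools import combinations

labels = [0, 1, 2, 3]  # non-earthquake, earthquake, and the two sabotage classes

# One-versus-one: one binary classifier per pair of classes, k*(k-1)/2 in total.
one_vs_one = list(combinations(labels, 2))

# One-versus-all: one binary classifier per class against all remaining classes.
one_vs_all = [(c, [other for other in labels if other != c]) for c in labels]

print(len(one_vs_one), one_vs_one)
print(len(one_vs_all), one_vs_all)
```

With four classes, one-versus-one needs six binary classifiers and one-versus-all needs four.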

In SVM,

Input: \({x}_{i}\in {\mathbb{R}}^{D}\), with D = feature dimension,

Output: \(w\) (weights), one for each feature, whose linear combination with the input gives the prediction y (the final output of the SVM model is a decision based on the input data).

$$ y = w ^{T} x_{i} + b $$

(4)

with b being the bias.

To maximize the margin, which is the distance between the hyperplane and the nearest data points, \(\left\| w \right\|\) is minimized. When the hyperplane cannot completely separate the two classes, a slack variable (\({\xi }_{i}\)) and a hyperparameter C are added. The hyperparameter regulates the use of the slack variables: if C is too small, the model may underfit, and if C is too large, the model may overfit.

$$\mathop{{\text{min}}}\limits_{{w,b}}\frac{1}{2}\left\| w \right\|^{2}+C\sum \limits _{{i=1}}^{m}{\xi _{i}}$$

(5)
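The role of C in Eq. (5) can be illustrated numerically. The sketch below evaluates the decision value of Eq. (4) and the soft-margin objective for fixed weights. It assumes class targets coded as +1 and -1 and the standard slack definition \({\xi }_{i} = \max(0, 1 - t_{i} y_{i})\), neither of which is stated explicitly in the text. The data points are made up.

```python
def decision(w, b, x):
    """Eq. (4): linear combination of the features plus the bias."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def soft_margin_objective(w, b, points, targets, C):
    """Eq. (5) for fixed w and b, with slack xi_i = max(0, 1 - t_i * y_i)."""
    slacks = [max(0.0, 1.0 - t * decision(w, b, x)) for x, t in zip(points, targets)]
    margin_term = 0.5 * sum(wj * wj for wj in w)
    return margin_term + C * sum(slacks), slacks


w, b = [1.0, 0.0], 0.0
points = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]  # the last point lies inside the margin
targets = [1, -1, 1]

low_c, slacks = soft_margin_objective(w, b, points, targets, C=1.0)
high_c, _ = soft_margin_objective(w, b, points, targets, C=10.0)
print(slacks, low_c, high_c)
```

Only the point inside the margin has a nonzero slack, and raising C makes that single violation dominate the objective.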

When the input data cannot be separated linearly, the data must be mapped to a higher-dimensional space. If the new dimension is very large, the mapping takes a long time to compute. The kernel trick solves this problem: it gives the same result as adding the extra features without actually having to add them. In this research, the Gaussian RBF (Radial Basis Function) kernel is used.

$$K\left( {x,l} \right) = \exp \left( { - \gamma \left\| {x - l} \right\|^{2} } \right)$$

(6)

where: x = feature vector.

l = landmark

$$\gamma = \frac{1}{{2\sigma ^{2}}}$$
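A direct evaluation of the Gaussian RBF kernel in Eq. (6) might look like the following sketch. The function name and the example vectors are illustrative.

```python
import math


def rbf_kernel(x, landmark, sigma=1.0):
    """Gaussian RBF kernel of Eq. (6), with gamma = 1 / (2 * sigma^2)."""
    gamma = 1.0 / (2.0 * sigma ** 2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-gamma * squared_distance)


same_point = rbf_kernel([1.0, 2.0], [1.0, 2.0])           # distance 0
far_point = rbf_kernel([0.0, 0.0], [3.0, 4.0], sigma=5.0)  # distance 5, gamma = 1/50
print(same_point, far_point)
```

The kernel equals 1 when the feature vector coincides with the landmark and decays toward 0 as the distance grows, at a rate set by \(\gamma\).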

The DT algorithm can be used for both classification and regression, including data with multiple outputs. It classifies the data by forming a tree that runs from the root node to the terminal nodes. Each node holds the feature condition that determines the direction of the data flow, the Gini impurity, the number of samples that reach the node, the class prediction value, and the class of data in that node.

When defining a branch in a DT, the Gini impurity of the data is needed. Gini impurity is a score between 0 and 1, where 0 means that all observations belong to one class and values approaching 1 mean that the observations are distributed randomly across the classes. The feature with the lowest impurity is selected for the next branch (ref. 22). The lower the Gini impurity, the better the split and the lower the likelihood of misclassification. Equation 7 is the Gini impurity equation, with \({p}_{i,k}\) being the probability of class k at node i and n being the number of classes.

$${\text{Gini Impurity}}\left( {G_{i} } \right) = 1 - \mathop \sum \limits_{k=1}^{n} p_{i,k}^{2}$$

(7)
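Equation (7) can be checked on small label lists, as in the sketch below. The `split_impurity` helper, which weights the impurity of two child nodes by their sizes as is common for CART-style trees, is an assumption rather than something stated in the text.

```python
from collections import Counter


def gini_impurity(labels):
    """Eq. (7): one minus the sum of squared class proportions at a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of a candidate split (assumed criterion)."""
    n = len(left) + len(right)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n


pure = gini_impurity([1, 1, 1, 1])    # a single class at the node
mixed = gini_impurity([0, 1, 2, 3])   # four equally likely classes
split = split_impurity([0, 0, 0], [1, 1, 2])
print(pure, mixed, round(split, 4))
```

With n equally likely classes the impurity is 1 - 1/n, so for the four classes here it peaks at 0.75.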

RF is an ensemble learning technique that combines a large number of decision trees. It uses voting to determine the classification result: the prediction of each DT counts as a vote toward the final class. RF samples both rows and columns of the data for each tree, so each tree is trained on different data. This can reduce the variance without increasing the bias. In addition, the accuracy of the model can be improved by increasing the number of CART trees (ntree).
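The row sampling, column sampling, and voting steps can be sketched without training any trees. The helper names, the random seed, and the example rows and votes below are hypothetical, and the number of columns drawn per tree is an arbitrary choice.

```python
import random
from collections import Counter


def bootstrap_rows(rows, rng):
    """Draw len(rows) rows with replacement (row sampling for one tree)."""
    return [rng.choice(rows) for _ in rows]


def feature_subset(n_features, k, rng):
    """Pick k distinct column indices (column sampling for one tree)."""
    return sorted(rng.sample(range(n_features), k))


def majority_vote(tree_predictions):
    """Return the class predicted by the most trees (ties go to the class seen first)."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(42)
rows = [(0.1, 0), (0.9, 1), (0.4, 2), (0.7, 3)]        # hypothetical (amplitude, label) rows
sample = bootstrap_rows(rows, rng)                      # training rows for one tree
columns = feature_subset(n_features=3, k=2, rng=rng)    # e.g. 2 of 3 features
final_class = majority_vote([1, 1, 0, 2, 1])            # votes from five trees
print(final_class)  # prints 1
```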

An artificial neural network (ANN) is an information processing system that shares certain performance characteristics with biological neural networks. ANNs are used as statistical models for predicting complex systems in engineering. Their massively parallel structure, with a large number of simple, interconnected processing units called neurons, allows ANNs to be used for complex linear and nonlinear input-output mappings.

The most common ANN training method is the backpropagation algorithm, which adjusts the weights between neurons to reduce the error. This model is very effective at identifying patterns. It can converge slowly and risks getting stuck in a local optimum, but it can adapt quickly to new data values. The main challenge is deciding how many layers to use, how many neurons to place in each hidden layer, and how those neurons are connected. The performance of an ANN depends heavily on these choices, and any of them can significantly alter the results. For different problems, different ANN architectures will produce different solutions (ref. 27).
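The weight adjustment performed by backpropagation can be illustrated on the smallest possible network: one input, one hidden neuron, and one output neuron. The sigmoid activation, the squared-error loss, the learning rate, and the initial weights are all assumptions chosen for the example, not details taken from the study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(params, x, target, lr):
    """One backpropagation step for a 1-1-1 network with squared-error loss."""
    w1, b1, w2, b2 = params
    # Forward pass.
    h = sigmoid(w1 * x + b1)
    y = sigmoid(w2 * h + b2)
    loss = 0.5 * (y - target) ** 2
    # Backward pass: propagate the error from the output to the hidden neuron.
    delta_out = (y - target) * y * (1.0 - y)
    delta_hidden = delta_out * w2 * h * (1.0 - h)
    # Gradient-descent update of every weight and bias.
    new_params = (
        w1 - lr * delta_hidden * x,
        b1 - lr * delta_hidden,
        w2 - lr * delta_out * h,
        b2 - lr * delta_out,
    )
    return new_params, loss


params = (0.5, 0.0, 0.5, 0.0)
losses = []
for _ in range(100):
    params, loss = train_step(params, x=1.0, target=1.0, lr=0.5)
    losses.append(loss)

print(losses[0], losses[-1])
```

The loss shrinks over the iterations as the weights are adjusted. Real networks repeat the same forward and backward passes over many neurons, layers, and training samples.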

All models are used to classify earthquake, non-earthquake, and sabotage vibrations, with the data split into training and testing sets at a ratio of 70:30. The performance of the models is then evaluated by confusion matrix analysis, one of the common methods for assessing classifiers. Table 2 shows the structure of the confusion matrix. Several measures can be derived from the confusion matrix (refs. 28, 29).
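A 70:30 split and a multiclass confusion matrix can be sketched as follows. The layout with rows as true classes and columns as predicted classes is an assumption, since Table 2 is not reproduced here, and accuracy is shown only as one example of a derived measure. The labels and predictions are made up.

```python
import random


def train_test_split(rows, test_ratio=0.3, seed=0):
    """Shuffle the rows and split them into (train, test) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(y_true, y_pred, n_classes):
    """Count samples per (true class, predicted class) pair."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


train, test = train_test_split(list(range(10)))

y_true = [0, 0, 1, 1, 2, 3]   # made-up true labels
y_pred = [0, 1, 1, 1, 2, 2]   # made-up model predictions
matrix = confusion_matrix(y_true, y_pred, n_classes=4)
print(len(train), len(test), matrix, accuracy(matrix))
```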

Sources

1/ https://Google.com/

2/ https://www.nature.com/articles/s41598-022-25098-1
