Date: 2024-02-13

Participants

The thirty professional table tennis players (17 men and 13 women) who volunteered for the data collection were all students of the China Table Tennis College of Shanghai University of Sport. All were right-handed and ranked as national division I athletes, national division II athletes, or national master athletes. None of the participants had had a lower limb injury or deformity for at least three months before the experiment. In addition, they were required to refrain from consuming coffee or food containing related additives, such as caffeine, in the 6 hours before the experiment, and they signed a written informed consent form. The experiment was approved by the Ethics Committee of Shanghai University of Sport (ethics approval reference: 102772019RT030). All methods were performed in accordance with the ethical approval.

Design

The experiment was conducted on a standard-size table. Balls were fed to the participants by a ball projection machine (Y&T V-989H, Zhongshan, China) placed on the other side of the table. The serve frequency was set at 30 Hz and the ball speed was 6.5 m/s. All participants used the same paddle throughout the experiment. All movements were captured by both a full HD camera and a MOCAP system. The full HD camera was located behind the ball projection machine, 2.30 m above the ground, and recorded a frontal view of the participant at a sampling rate of 25 Hz. The MOCAP system (CMTractor 2.0; Shanghai Qingtong Vision Co., Ltd., China) consisted of 20 infrared cameras mounted on the laboratory ceiling with a recording frequency of 120 Hz. This camera array tracked the 3D positions of 38 reflective markers on the participants' bodies.

Each participant performed six skill movements. The experiment was divided into two phases: 1) a pre-experiment adaptation phase and 2) the motion capture phase. Before the test, participants were told that the ball would be launched from the ball projection machine on the opposite side and that they had to return it onto the table. After the adaptation phase (5 serves), participants completed the 55-shot data collection without further instructions.

The 14-joint skeleton is similar to S. Litvak's 15-joint human body model [24]. The only difference is that the former lacks the shoulder center, which lies between the left shoulder and the right shoulder; it can be calculated from those two joints. We collected the time series of the fourteen joints and of the centroid of the table tennis bat to construct the TTMD6 dataset. The 14 joints used by TTMD6 are shown in Figure 1.
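As a minimal sketch of how the missing shoulder center could be recovered, the snippet below takes it to be the per-frame midpoint of the two shoulder joints. The midpoint rule, the function name `shoulder_center`, and the sample coordinates are assumptions made for illustration; the text only states that the point lies between the shoulders and can be calculated.

```python
import numpy as np

def shoulder_center(left_shoulder, right_shoulder):
    """Return the per-frame midpoint of two (n_frames, 3) joint trajectories.

    Assumption: the shoulder center is the midpoint of the two shoulders.
    """
    left = np.asarray(left_shoulder, dtype=float)
    right = np.asarray(right_shoulder, dtype=float)
    return (left + right) / 2.0

# Hypothetical two-frame example; the coordinates are made up.
left = np.array([[0.20, 1.40, 0.00], [0.21, 1.41, 0.01]])
right = np.array([[-0.20, 1.40, 0.00], [-0.19, 1.39, 0.01]])
center = shoulder_center(left, right)
```

Adding this point to the 14 joints would give a 15-joint layout comparable to the model cited above.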

Figure 1 Skeleton diagram of 14 joints.

Data processing

A complete table tennis movement involves four phases: backswing, stroke, follow-through and recovery. The motion capture system recorded the 55 shots of a participant in one session and saved them to a single file. Each file stores almost 20,000 frames of data, from which the individual movements must be extracted. First, the analysis software of the motion capture system was used to manually mark the starting frame of each movement; then a motion extraction tool written in C# was used to extract each movement. A motion visualization tool written in MATLAB serves as an alternative to the analysis software of the motion capture system and displays movements as skeleton diagrams.
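The extraction step can be sketched as follows. This is not the C# tool described above but a hypothetical Python illustration. It assumes that each movement runs from its marked starting frame up to the next starting frame, and that the last movement runs to the end of the file. The function name `split_movements` and the sample frame indices are invented for the example.

```python
import numpy as np

def split_movements(frames, start_frames):
    """Cut one recording into movements, one per marked starting frame.

    frames: array of shape (n_frames, n_channels)
    start_frames: sorted frame indices at which each movement begins
    Assumption: a movement ends where the next one starts; the last
    movement ends at the final frame of the recording.
    """
    frames = np.asarray(frames)
    bounds = list(start_frames) + [len(frames)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(len(start_frames))]

# Hypothetical recording: 1000 frames, 15 tracked points x 3 coordinates
# (14 joints plus the bat centroid).
recording = np.zeros((1000, 45))
movements = split_movements(recording, [0, 180, 390, 600])
lengths = [len(m) for m in movements]
```

The resulting segments have different lengths, which is the situation the time series transformation described later has to handle.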

Recognition of table tennis movements by athletes

Table tennis is one of the most popular racket sports in the world. Over more than 100 years of development, many different types of skill movements, such as attack, drive, long push, short push, loop, push-and-block, drop shot, off-table chop, smash and lift, have been developed [25]. Each of these strokes hits a different position on the incoming ball. Accurate recognition of these movement patterns is of great importance for the development of table tennis robots, the daily training of athletes and somatosensory games.

Acquiring table tennis skills depends on the guidance of coaches. Experienced coaches can determine the type of movement by watching how the athlete executes it. An athlete recognition test was introduced to measure the recognition accuracy of professional athletes. Our research goal was to identify an algorithm that surpasses the recognition accuracy of traditional methods, in preparation for the next step of building an auxiliary training system for athletes.

Unlike the subjects of other gesture recognition tasks, table tennis players strike with a paddle. The motion of the paddle embodies the coordination of all joints, and an athlete controls the incoming ball through the movement of the paddle. The paddle lies on the extension line of the distal end of the upper limb. Compared with the shoulder, elbow and wrist, the paddle has a greater range of motion and moves faster, so it better reflects the movement characteristics of table tennis. The paddle trajectories of the six skill movements are shown in Figure 2. Experienced coaches and athletes can predict the type of movement, the trajectory of the ball, and its landing point from the movements of the opponent's body and the trajectory of the paddle. The purpose of this experiment was to demonstrate the feasibility of motion recognition from the paddle trajectory.

Figure 2 Bat trajectories of table tennis movements: (a) forehand attack, (b) forehand drive, (c) forehand push, (d) backhand attack, (e) backhand drive, (f) backhand push.

We selected 30 movements (5 for each skill movement) and used MATLAB (version R2018a; MathWorks, Inc., USA) to generate animations, with an interval of 2 seconds between movements. The keys Q, W, E, I, O and P correspond to forehand attack, forehand drive, forehand push, backhand attack, backhand drive and backhand push, respectively. After recognizing a movement, the athlete pressed the corresponding key as quickly as possible to enter the recognition result. The program recorded the athlete's answer and the recognition time (from the appearance of the first frame of the movement until the key press). We recruited 40 professional table tennis athletes from Shanghai University of Sport to participate in this study. Figure 3 shows an athlete recognizing the paddle trajectory on the computer.
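The scoring of such a test can be illustrated with a short sketch. The key-to-movement mapping follows the text; the function name `score_trials`, the trial tuple layout, and the sample responses are hypothetical and do not reproduce the authors' program or data.

```python
# Key mapping as described in the text.
KEY_TO_MOVEMENT = {
    "Q": "forehand attack", "W": "forehand drive", "E": "forehand push",
    "I": "backhand attack", "O": "backhand drive", "P": "backhand push",
}

def score_trials(trials):
    """Return (accuracy, mean recognition time in seconds).

    trials: list of (true_movement, pressed_key, recognition_time_s)
    """
    correct = sum(
        1 for true_movement, key, _ in trials
        if KEY_TO_MOVEMENT.get(key.upper()) == true_movement
    )
    accuracy = correct / len(trials)
    mean_time = sum(t for _, _, t in trials) / len(trials)
    return accuracy, mean_time

# Made-up responses for illustration only.
trials = [
    ("forehand attack", "Q", 1.0),
    ("forehand drive", "W", 1.5),
    ("backhand push", "O", 2.0),   # wrong key: O means backhand drive
    ("backhand attack", "i", 1.5),
]
accuracy, mean_time = score_trials(trials)
```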

Figure 3 An athlete recognizing the bat trajectory on a PC.

Convolutional neural network with variable-length input

The basic structure of a CNN consists of an input layer, a convolutional layer, a pooling layer, a fully connected layer and an output layer. In general, multiple convolutional and pooling layers are used and arranged alternately, so that each pooling layer is followed by another convolutional layer [26].

The convolutional layers of a CNN do not require fixed-size inputs, but the fully connected layers do. The requirement that CNN inputs have a consistent size is therefore due to the fully connected layers [17]. In image recognition, the input image must be cropped or scaled to obtain a fixed-size input; however, such a transformation distorts the aspect ratio and damages the integrity of the image information, which affects the recognition accuracy. He et al. [17] proposed the SPP-net model, which adds a spatial pyramid pooling (SPP) layer between the last convolutional layer and the first fully connected layer of the CNN. The SPP layer ensures that CNN inputs of different sizes produce outputs of the same size, which removes the fixed-input-size constraint of earlier CNN models, and the improved CNN model also trains faster.
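To make the fixed-output-size property concrete, here is a simplified one-dimensional sketch of pyramid pooling in NumPy. It is not the SPP-net implementation of He et al., which pools two-dimensional feature maps inside a trained network. The function name `pyramid_pool_1d`, the pyramid levels (1, 2, 4), and the random feature maps are assumptions chosen for illustration.

```python
import math
import numpy as np

def pyramid_pool_1d(features, levels=(1, 2, 4)):
    """Max-pool a (channels, length) feature map into a fixed-size vector.

    For each pyramid level the time axis is split into `bins` roughly
    equal bins and each bin is max-pooled, so the output length is
    channels * sum(levels) regardless of the input length.
    """
    features = np.asarray(features, dtype=float)
    channels, length = features.shape
    pooled = []
    for bins in levels:
        for i in range(bins):
            start = (i * length) // bins               # floor of bin start
            end = math.ceil((i + 1) * length / bins)   # ceiling of bin end
            pooled.append(features[:, start:end].max(axis=1))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
short_out = pyramid_pool_1d(rng.random((8, 169)))  # 169-frame feature map
long_out = pyramid_pool_1d(rng.random((8, 212)))   # 212-frame feature map
```

Both outputs have 8 × (1 + 2 + 4) = 56 elements, so a fully connected layer placed after the pooling step would see the same input size for either sequence.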

The duration of a table tennis movement varies with the athlete's ability and the purpose of the task. When training a model, DTW [10, 27] and shapelets [9, 28] are used to shorten and adjust time series to a fixed length, but such a transformation damages the integrity of the time series information. To solve this problem, we designed a time series transformation layer between the input layer and the convolutional layer, and integrated the transformation layer and the CNN into one framework, as shown in Figure 4.

Figure 4 Architecture for 3D time series classification.

Interpolation is a common data preprocessing method. Widely used interpolation methods include nearest neighbor interpolation, linear interpolation, and spline interpolation. Spline interpolation is typically used to obtain smooth curves, and the cubic spline is one of its most common forms. The basic idea of cubic spline interpolation is to divide [a, b] into n intervals and fit a cubic polynomial on each interval so that the pieces join into a smooth curve.

In the time series transformation, l denotes the actual length of the time series (the file name contains the actual length information) and k denotes the length of the transformed time series. For any time series, three cases arise: k>l, k=l and k<l.

When l>k, we downsample the raw data to a fixed length. Assume the time series $T=\{t_1,\dots,t_N\}$ and a downsampling ratio of $l/k$. The $i$-th element of the transformed time series is $t_i=x[\lceil i\cdot l/k\rceil]$, $0\le i<k$, where $\lceil\cdot\rceil$ denotes the ceiling operation.
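A minimal sketch of this downsampling rule follows, assuming zero-based indexing and an index range of 0 ≤ i < k (the upper bound is inferred from k being the transformed length). The function name `downsample` is hypothetical. The ceiling is computed with integer arithmetic to avoid floating-point rounding.

```python
import numpy as np

def downsample(series, k):
    """Pick k samples from a longer series using index ceil(i * l / k)."""
    series = np.asarray(series)
    l = len(series)
    if l <= k:
        raise ValueError("downsampling requires l > k")
    # (i * l + k - 1) // k equals ceil(i * l / k) for non-negative integers.
    indices = [(i * l + k - 1) // k for i in range(k)]
    return series[indices]

# Toy example: l = 10 frames reduced to k = 4 frames.
picked = downsample(np.arange(10), 4)

# A 212-frame movement reduced to 169 frames; works on (frames, channels) too.
reduced = downsample(np.zeros((212, 45)), 169)
```

Because l > k, the largest index produced is at most l - 1, so the rule never reads past the end of the series.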

When l=k there is no change in the time series.

When l<k, we use cubic spline interpolation [29] to interpolate the time series. Formula 1 describes cubic spline interpolation mathematically. In this article, cubic spline interpolation is implemented with Python's SciPy toolkit.

$$\left\{\begin{array}{l}S(x)\in C^{2}[a,b]\\ S(x_{i})=y_{i}\\ S(x)=a_{3}x^{3}+a_{2}x^{2}+a_{1}x+a_{0}\end{array}\right.$$ (1)
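The interpolation case can be sketched with `scipy.interpolate.CubicSpline`. The text names SciPy but not the specific function, so the choice of `CubicSpline`, its default boundary conditions, the normalized time axis, and the function name `upsample` are all assumptions.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def upsample(series, k):
    """Stretch a (l, channels) series to k frames with a cubic spline."""
    series = np.asarray(series, dtype=float)
    l = series.shape[0]
    if l >= k:
        raise ValueError("interpolation requires l < k")
    old_t = np.linspace(0.0, 1.0, l)   # original frame positions
    new_t = np.linspace(0.0, 1.0, k)   # positions of the k output frames
    spline = CubicSpline(old_t, series, axis=0)
    return spline(new_t)

# Hypothetical short series: 5 frames, 3 coordinates, stretched to 8 frames.
rng = np.random.default_rng(0)
short_series = rng.random((5, 3))
stretched = upsample(short_series, 8)
```

Since the spline passes through every original sample, the first and last frames of the stretched series equal those of the input.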

In the extreme case where k is less than the length of the shortest time series, every time series is downsampled. The time series lengths of a group of movements span a range. For example, an athlete needs at most 212 frames and at least 169 frames to complete a forehand attack. When k<169, only the case k<l applies and the other two need not be considered. However, if k is too small, the recognition accuracy suffers. Once all time series have the same length, the neural network can be trained.
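The case analysis can be checked with a few lines of code. The frame counts 169 and 212 come from the example above; the intermediate length 180 and the target lengths 128 and 190 are hypothetical values chosen only to show which branch each series would take.

```python
def transformation_case(l, k):
    """Name the branch of the transformation for actual length l and target k."""
    if l > k:
        return "downsample"
    if l < k:
        return "interpolate"
    return "unchanged"

lengths = [169, 180, 212]  # 169 and 212 are from the text; 180 is made up

# Target below the shortest series: every series is downsampled.
cases_k128 = [transformation_case(l, 128) for l in lengths]

# Target inside the range: shorter series are interpolated, longer ones downsampled.
cases_k190 = [transformation_case(l, 190) for l in lengths]
```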