Preprocessing Effects in Road Traffic Scene Analysis Using Machine Learning Models

Image-based real-time object detection is a key issue in integrated automatic traffic monitoring and control systems and is one of the main research topics in Intelligent Transportation Systems (ITS). Despite the growing interest in the application of image processing techniques for traffic scene analysis, vehicle detection algorithms are still unreliable in complex real-world backgrounds. Machine learning models are increasingly being applied to automatic video-based object detection and traffic scene analysis due to their excellent performance in real-time pattern recognition. However, the performance of machine learning models is highly dependent on the properties of the input vectors. Studies have shown that the preprocessing of raw image data obtained from digital video can improve the predictive performance of machine learning models. The purpose of this study is to investigate the predictive performance of machine learning models using preprocessing methods such as image size reduction and image filtering. The experiment was performed with the Backpropagation model, which is one of the most popular machine learning models, and the predictive performance was compared to see how it could be affected by different preprocessing techniques.


Introduction
Interest in the topic of Intelligent Transportation Systems (ITS) has increased over the past 30 years with the aim of improving safety, maximizing transport capacity, increasing productivity, improving operational efficiency, ensuring sustainable environment and saving energy.In addition, image-based real-time object detection is one of the main research topics of Intelligent Transportation Systems (ITS) as a core task of the integrated traffic control system.In addition, image-based real-time object detection is a core task of the integrated traffic control system and has become a major research field for Intelligent Transportation Systems (ITS).
For several decades, significant research advances have been made in the field of image processing for traffic scene analysis, such as vehicle counting, traffic congestion measurement, vehicle location tracking and incident detection, but the detection accuracy is still low in complex background images (Hoose and Willumsen, 1987;Siyal and Fathy, 1999;Betke et al., 2000;Pang et al., 2007;Kamijo and Fujimura, 2010;Song et al., 2019).In addition, the application of some learning models such as artificial neural networks (ANNs) and support vector machines (SVMs) has increased tremendously and has been successfully applied to traffic scene analysis due to their remarkable ability to achieve high precision.
In machine learning, the properties of the input vector in training data have a great impact on the reliability and predictive performance of the model.Therefore, it is important to choose a feature input vector to achieve high performance in model training.In addition, preprocessing techniques such as image size reduction and image filtering generate new input vector properties for learning and reduce the computer memory and computing time required for training.
The purpose of this study is to investigate the effect of preprocessing methods on the prediction accuracy of a learning machine.In this study, various preprocessing techniques were experimentally verified using a machine learning model of backpropagation, which is widely used as a learning algorithm among neural network models.

Machine Learning for Traffic Scene Analysis
Currently, the most widely used neural network model is the backpropagation (Rumelhart, et. al., 1986a) model, which is a basic model of deep learning models that are receiving increasing interest in recent years and is widely used in various problems.Some advanced backpropagation models, such as the backpropagation with momentum (Rumelhart, et. al., 1986b), Quickprop (Fahlman, 1988) and Backpropagation with Momentum and Prime-offset (Kim, 2002), have been proposed and widely used, mainly to solve some problems of standard backpropagation, such as computation time for training and the possibility of getting stuck in local minima during training.
In order to increase the speed of learning, Rumelhart et al. (1986b) proposed an enhanced backpropagation model, which is a backpropagation with momentum, by adding a momentum term to the delta rule.This additional term, called momentum, tends to maintain the same direction for the weight changes.The momentum parameter provides a type of momentum in weight space that effectively filters out high-frequency variations of the error-surface in the weight space.Fahlman (1988) has developed an algorithm called Quickprop to speed the learning of the Backpropagation model.Quickprop has a number of parameters such as Maximum growth factor, Weight decay, and Prime-offset that are more fine-tuned than the standard Backpropagation.More importantly, Kim (2002) showed that backpropagation using momentum and prime offset (BPMP) was superior to other backpropagation models.
The learning algorithm of the backpropagation can be summarized as follows (Kim, 2002;Kim, 2018;Kim, 2021): Step 1: Randomize initial weights and set up the input and output vectors Step 2: Calculate output value for each unit of the network by using In Equation (1), In are inputs to the k th unit in the layer l, w is the number of synaptic weights in the network,  k l is a bias term, Kl is the number of l layer units, p is a training pattern and ) ( f is an activation function.
Step 3: For the output layer, l = L, calculate the values of weight changes by using In Equation (2), pk y is the desired output.
Step 4: For the hidden layers, l = 1, ..., L-1, calculate the values of weight changes using (3) Step (5) Step 7: Repeat previous steps, until the average squared error computed over the entire training data set is at an acceptably small value.The error for the output units is calculated by

Preprocessing Methods
In the backpropagation neural network mode, data is frequently preprocessed in order to reduce the number of input units and to reduce network complexity and computing time.While preprocessing makes the network architecture simpler, it may lead to loss of some important information for pattern recognition, such as the geometric relationship of pixel gray values.Nonetheless, preprocessing has improved the predictive performance of neural network models (Kim, 2010).In particular, preprocessing may play a role in noise reduction by removing or smearing some noise in images.
In this study, two common types of preprocessing filters, the arithmetic mean filter and the median filter, were applied to mitigate the effects of data preprocessing on the prediction accuracy of a backpropagation model in machine learning.

Arithmetic mean filter
The arithmetic mean filter can be defined by follows (Gonzalez and Woods, 2008).Let   represent the set of coordinates in a rectangular subimage window (neighborhood) of size a × b, centered at point (x, y).The arithmetic mean filter computes the average value of the corrupted image f(x, y) in the area defined by   .
The value of the restored image ĥ at point (x, y) is simply the arithmetic mean computed using the pixels in the region defined by   .In other words, This operation can be implemented using a spatial filter of size a × b in which all coefficients have value 1/ab.A mean filter smooths local variations in an image, and noise is reduced as a result of blurring.

Median filter
Order-statistic filters are nonlinear spatial filters whose response is based on ordering (ranking) the pixels contained in the image area encompassed by the filter, and then replacing the value of the center pixel with the value determined by the ranking result.The median filter is the best order statistic filter and replaces the value of a pixel by the median of the intensity values in the neighborhood of that pixel (the original value of the pixel is included in the computation of the median).Median filters are popular because, for certain types of random noise, they provide excellent noise-reduction capabilities, with considerably less blurring than linear smoothing filters of similar size.Median filters are particularly effective in the presence of both bipolar and unipolar impulse noise (Gonzalez and Woods, 2008).The median filter can be defined mathematically as, ĥ(, ) = median The value of the pixel at (x, y) is included in the computation of the median.In order to perform median filtering at a point in an image, we first sort the values of the pixel in the neighborhood, determine their median, and assign that value to the corresponding pixel in the filtered image.For example, in a 3× 3 neighborhood the median is the 5th largest value, in a 5× 5 neighborhood it is the 13th largest value, and so on.

Image geometric transformation
Image geometric transformations modify spatial relationships between pixels in an image.Geometric transformation in digital image processing consists of two basic operations: spatial transformation of coordinates and intensity interpolation, which assigns intensity values to spatially transformed pixels (Gonzalez and Woods, 2008).The transformation of coordinates may be expressed as (, ) = (, ) , where (, ) are pixel coordinates in the original image and (, ) are the corresponding pixel coordinates in the transformed image.For example, the transformation (, ) = f(, ) =(x/2, y/2) shrinks the original image to half its size in both spatial directions.Image shrinking is achieved by row-column deletion (eg, to reduce an image by half, delete rows and columns one by one).
In machine learning models, image geometric transformation can serve to reduce the number of input units and remove some noise.In particular, it can improve learning performance by simplifying the network architecture and making training more efficient.In this study, we compare the performance of machine learning using the original image size and the reduced image and examine how the machine learning models can vary depending on preprocessing.

Experimental data sets
For the experiments in this study, all data sets are the same as those of the previous research (Kim, 2010).The task is to classify three different patterns in traffic scenes: 1 st pattern, 2 nd pattern, and 3 rd pattern.Figure 1 shows the preprocessed image data where the 1 st pattern corresponds to the front part of a vehicle, 2 nd pattern corresponds to the top of a vehicle, and 3 rd pattern is an image of the road with no vehicles.The number of data sets for training and test are 230 and 700, respectively.
As shown in Figure 1 , where () is the predicted value of output unit i.
A system is defined as unpredictable if its output occurs 1 in more than one pattern.For the experiments, the network topology are 450(input units)-225(hidden neurons)-3(output units) for 1 st preprocessing (Mean filter with 30 by 15 pixels) and 2 nd preprocessing (Mean filter and geometric transformation (15*8)) and 120(input units)-60(hidden neurons)-3(output units) for Median filter and geometric transformation (30*15) and Median filter and geometric transformation (15*8), respectively.
In the experiments of this study, the network topology utilized 450(input units)-225(hidden neurons)-3(output units) for the first-order preprocessing (mean filter and geometric transformation 30 x 15 pixels) and the second-order preprocessing (mean filter and geometric transformation 15*8 pixels), and the third-order preprocessing (median filter and geometric transformation 30*15 pixels), 4th order preprocessing (and median filter and geometric transformation 15*8 pixels) were applied 120(input units)-60(hidden neurons)-3(output units).The learning model used in this study is BPMP (Backpropagation with momentum and Prime-offset), and the hyperparameter variable values are learning rate of 0.1, momentum of 0.95, and prime offset of 0.1, and the sequential mode was used as the learning method.Ten trials were performed with different initial weights for each data to resolve the effect of initial weights in the neural network models.Table 1, Table2, Table3, Table 4 and Figure 2 show the general performance of the learning machine using the backpropagation model and the effect of preprocessing on image data.
Table 1 shows the prediction error rates over 10 experiments for the mean filter and geometric transformation with image size of 30×15.For pattern 1, the worst-case prediction error was 8.3% for pattern 1, 33.0% for pattern 2, and 0.0% for pattern 3. On the other hand, in the best case, the prediction error was 0.3% for pattern 1, 30.0% for pattern 2, and 22.0% for pattern 3.This means that as the prediction accuracy for pattern1 increases, the prediction accuracy for pattern3 decreases.Also, for pattern 2, the worst-case prediction error was 2.3% for pattern 1, 34.0% for pattern 2, and 0.0% for pattern 3.In the best case, the prediction error was 6.0% for pattern 1, 15.3% for pattern 2, and 2.0% for pattern 3. Also, for pattern 3, the worst-case prediction error was 0.3% for pattern 1, 30.0% for pattern 2, and 22.0% for pattern 3.In the best case, the prediction error was 3.7% for pattern 1, 24.3% for pattern 2, and 0.0% for pattern 3.In this type of preprocessing experiment, the overall performance of the network showed the best performance in terms of prediction rate and prediction accuracy, with patterns 1, 2, and 3 having a total error rate of 9.4% and unpredictability of 6.3%.Table 2 shows the prediction error rates over 10 experiments for the mean filter and geometric transformation with image size of 15×8.The overall performance of the network in this preprocessing experiment showed the best performance in terms of prediction rate and prediction accuracy, with patterns 1, 2, and 3 having a total error rate of 11.7% -the prediction error was 6.0% for pattern 1, 21.3% for pattern 2, and 0.0% for pattern 3 -and unpredictability of 1.6%.The results of this study show that image shrinking may provide somewhat better performance in terms of prediction accuracy and somewhat poor results in terms of predictive ability.
Table 3 shows the prediction error rates over 10 experiments for the median filter and geometric transformation with image size of 30×15.The overall performance of the network in this preprocessing experiment showed the best performance in terms of prediction rate and prediction accuracy, with patterns 1, 2, and 3 having a total error rate of 9.1% -the prediction error was 4.0% for pattern 1, 17.3% for pattern 2, and 0.0% for pattern 3 -and unpredictability of 11.6%.Table 4 shows the prediction error rates over 10 experiments for the median filter and geometric transformation with image size of 15×8.The overall performance of the network in this preprocessing experiment showed the best performance in terms of prediction rate and prediction accuracy, with patterns 1, 2, and 3 having a total error rate of 11.0% -the prediction error was 5.0% for pattern 1, 20.0% for pattern 2, and 2.0% for pattern 3 -and unpredictability of 4.4%.

Conclusion
Preprocessing plays an important role in learning machine models with complex input images.Some preprocessing methods, such as image size reduction and image filtering, can improve predictive performance in a machine learning with backpropagation learning algorithm.
In this study, four preprocessing methods were used to develop image processing technology for traffic scene analysis using a robust backpropagation machine learning model.According to the results of the study, there was a difference in the recognition rate for recognizing each pattern according to the preprocessing method.Therefore, it is necessary to develop various preprocessing techniques to improve the predictive performance of machine learning.In particular, in order to increase the accuracy of the prediction system, it is important to analyze the characteristics of the original data as well as the preprocessed data.This means that preprocessing makes the network simpler and reduces or eliminates image noise, but some important information about pattern recognition, such as geometric relationships of object and pixel grayscale, may be lost.
The experimental results of this study show that various preprocessing techniques have different prediction accuracy depending on the learning model, so the results of this study can be used to increase the prediction rate and reduce the prediction error rate in various machine learning models.

Figure 1 .
Figure 1.Preprocessed images of three patterns

Figure 2 .
Figure 2. Prediction error rate distribution for three patterns of preprocessed images