top of page

A comparison of machine learning methods for predicting final vehicle destinations

Actualizado: 27 oct 2021


ree

1. Introduction

This paper revisits the Taxi Trip Prediction problem which was hosted by Kaggle.com as an online challenge [1]. The original objective of the challenge was to build a predictive framework that predicts the final destination and the total travel duration of a taxi ride in Porto, Portugal. In contrast, the project in this report aims to investigate the problem of predicting the region of a vehicle ride destination. All vehicle ride samples collected in the project are from the same urban area. The area is partitioned into a fixed number of areas, where each area is labeled as a class. If the destination of a ride falls in one of these predefined areas, the ride is labeled with the class assigned to that region. In this way, the destination prediction problem is formulated as a classification problem. Moreover, a ride is primarily represented by a time sequence of locations measured by GPS sensor. Given an initial portion of the time sequence as input, it is intended to develop a classifier that predicts the destination as one of the predefined classes. This proposed project has the potential utility of analyzing the traffic flow in a road network. In particular, this type of predictive model could be used to forecast traffic jams, as well as avoid or mitigate upcoming traffic congestions.

  1. Related work

A taxi fleet of reasonable size has a significant impact on the quality of life, traffic congestion, better management of infrastructure and traffic flow [2]. Hence, from a strategic view point, it becomes important to not only understand the density of traffic in a region at any given time, but also the duration and the final destination of the taxi. Additionally, proper taxi fleet size management strategy also balances operational efficiency, mitigate supply-demand mismatch and increases transportation satisfaction of residents [3].

Predictive frameworks help optimize the efficiency of their electronic dispatch system. Though there are other factors which contribute to an ill-optimized dispatch system [4], in this paper, we focus on solving the final regional destination prediction problem. Though it would be ideal to know the exact final destination of a taxi ride given the G.P.S. coordinates, previous works [5,6] have not been successful in developing a predictive framework which utilizes only G.P.S. coordinates without the aid of prior cluster/region information nor metadata corresponding to location, caller type, time .etc [7]. However, such metadata may or may not be available depending on the taxi company and privacy laws. After 2015, the number of riders using rideshare apps to taking a taxi has equalled [8]. In addition to connecting drivers with ride requestors, if the ride share apps are able to predict the final destination regions of ride requesters who take a particular trajectory which is either repetitive because of daily routines or commonly visited places, the apps can communicate with its drivers to better recommend places to earn quickly and shortly or to stick to rides in a closed region to avoid longer trips [8,9].

Previous works [5,6] have explored this problem using memory networks, sequential networks and classic MultiLayer Perceptrons (MLPs) to predict the final destination of a taxi ride given a portion of the trajectory either from varying time stamps or sequentially occuring coordinates along the trajectory, metadata and prior clustering information. In this work, the authors propose several models which aim to predict the regions of the city being the final destination (which is given as prior information to previous work) given a portion of the trajectory. This regional prediction problem could then be extended further to predict the final destination coordinates by training a within region/cluster regressor. The prior information provided to the networks in the previous works is an important piece of information which helps in the coordinate prediction problem by performing an interpolation within the convex hull of the cluster regions. In this manner, the final destination prediction problem can be reduced into two simpler sub-problems without requiring any additional input data. However, in this paper, the authors only explore the region prediction problem

  1. Methodology

Over 7000 taxi trajectories are used to create the training data. Each trajectory consists of a series of GPS longitude and latitude coordinates. A subset of location measurements in the series are collected to create the input features for the machine learning models. The target labels are created using a k-means clustering algorithm.

Three machine learning models are tested and compared: Multi-Layer Perceptron (MLP), Support Vector Machine (SVM) and Recurrent Neural Network (RNN) that has a sequence-to-sequence structure with attention mechanism.

  1. Data preprocessing for SVN

The data obtained from the Kaggle competition, was a CSV (comma separated values) file containing seven columns with metadata such as:


  • Trip ID.

  • Indicator of how the service was requested.

  • Identifier for each phone number used to request a service.

  • Indicator of which taxi stand responded.

  • Identifier for each taxi.

  • Timestamp for each taxi ride.

  • Day type to differentiate holidays, weekends, and workdays.

  • Indicator for missing data in the GPS data stream.

  • List of GPS coordinates for each taxi ride.


However, to train the classifier models, only the final column is considered, which contains a series of longitude and latitude coordinates in the form of a string. Therefore, the first step for data preprocessing the data is to convert this string into a numpy array.


Because anomalies were detected within various trajectories, data smoothing technique was required. An example of an anomaly is shown in Figure 1, where one of the data points shows a spike assumed to be a localization error during the data collection from the GPS.


Figure 1 : Graph of a series of longitudes in one trajectory with two consecutive anomalies.


To remove this noise in the data, the longitude and latitude were analyzed separately for each trajectory. The mean and standard deviation were calculated for each trajectory’s latitudes. Any point in a trajectory’s longitude outside of three (3) times its standard deviation would be removed along with its corresponding latitude. Figure 2 depicts the same series of longitudes as Figure 1 after the smoothing has been completed. As seen the anomaly was detected and removed from the trajectory.

Figure 2 : Graph of a series of longitudes in a trajectory after anomalies are detected and removed.


The same process was applied with respect to the latitudes, however the upper and lower limits were defined by 1.5 times the standard deviation. Given there was a larger variation in the latitudes it was necessary to reduce the limits in order to detect all anomalies.



Figure 3 : Graphs of a series of latitudes in a trajectory before (left) and after (right) anomalies are detected and removed.


Additionally, given the data models requirement of a minimum of 20 coordinates, trajectories with a smaller quantity of coordinates were removed. Finally the coordinates of each trajectory were standardized by obtaining the absolute value of the latitude and longitude and mapping the coordinates to values in a range from 0 to 1.


  1. Featurization (MLP, SVM):

From the standardized trajectories, for each trajectory, we get the first 5 points and the last 5 points until the last before 3 as features. We partition all the trajectories taken by the taxis in Porto into multiple regions using a k-means clustering with k being 15. While working on the project we found that having larger clusters (small k value) helped improve the classification accuracy during training.


Figure 4: Featurization for SVM and MLP models.

  1. Data preprocessing for RNN :

From the trajectory dataset, sample a trajectory whose length is N (where N>=15). Divide trajectory into two parts. The first part, the input consists of all x-y coordinate values except the values at the last 5-time steps. The other part, named labels, consists of the coordinate values at the last 5-time steps.


Figure 5: Input points (red) and target (purple) from the trajectory.

  1. Featurization (RNN):

Train a k-mean model to partition the map into 30 regions. Use the trained k-mean model to label every x-y coordinates in input and labels, such that the x-y coordinates in traj_initial and traj_final are transformed into integer values ranging from 1 to 30. Perform such transformation to all 7120 trajectories in the dataset.


Figure 6: Clusters and trajectories in Porto

  1. Labeling

The goal of trajectory prediction: Given an initial part of the 2D trajectory, predict in which region of the city the destination of the trajectory is. The label of trajectory data: Categorical values which represent the regions of the city

To create such labels, we need to partition the city into multiple regions. K-mean algorithm is chosen to perform partitioning, where input is defined as the data points of trajectory destinations.


Figure 7: K-means clustering

  1. Models

The trajectory data is sequential in nature and so are its features. Though a number of time-series, sequential data prediction algorithms and models exist, time series models are not suited to handle the sequential trajectory data. Unlike time-series data where the samples move from left to right as a function of time, the GPS coordinates do not necessarily follow the same order and at times could be random. Trajectory coordinates vary in 2 dimensions and since they represent road networks, they do not follow an order. Hence, we try to capture the dynamics of the trajectory by treating the features as being, 1. Independent of time, therefore using Support Vector Machines and Multilayer Perceptron models, and 2. Dependent on time, therefore training an RNN model.

  1. Multi-class Support Vector Machines:

Support Vector Machines (SVM) are classically two-class classifiers. However, in this paper, the authors are solving a multi-class classification problem. To this end, an extension of inherent binary classifiers for multi-class classification is achieved via i) One versus All approach or ii) One versus One approach. In this paper, we use the One versus One approach training a total of C(C-1)/2 classifiers, one for each different pair of classes. Though only C classifiers need to be trained via the other approach, the approach chosen is less prone to imbalance in dataset (dominance of particular classes) as observed in Figure 7.


The classes/regions from which taxis depart from and arrive in is primarily located close to the center of the city, encapsulating a majority of the classes and trajectory paths. Additionally, as the value of k changes, the number of classifiers which need to be trained decreases which leads to each classifier being trained with a higher subset of training samples and hence a higher classification accuracy. This trade-off between number of regions and the density of trajectory points in a region leads to the class imbalance problem which is better handled via the one versus one approach. An ensemble consisting of C(C-1)/2 classifiers will discriminate every pair of classes given a data sample, assigning C(C-1)/2 labels per data sample. Finally, majority voting is used to decide the most probable class to which the data sample belongs.


Following the featurization step, we initialize a soft-margin, multi-class SVM with a linear kernel. The dataset is split into train and test set with 80% of the data used as training samples. Using the fitted SVM model, we then try to predict the final destination given a trajectory sample from the test set. During training, multiple kernels such as the polynomial and radial basis function (RBF) kernels were tested to identify the most suitable kernel which is able to separate the classes, and the linear SVM achieved the highest accuracy. The final testing accuracy of the SVM on the test set is 57%.


  1. Multilayer Perceptron:



Figure 8: MLP Model


Multilayer Perceptron (MLP) is a classical type of Artificial Neural Network (ANN) comprising of one or more layers of neurons. Such an ANN is suitable for multi-classification prediction problems where inputs are assigned to a class and trained in a supervised or semi-supervised manner. In this paper, the authors construct and train an MLP in a supervised manner. While the input to the network is a flattened vector with the elements being the standardized latitude and longitude coordinates of part of the trajectory, the output of the network gives a class prediction score for each input using a softmax activation function in the final hidden layer. Softmax function outputs a vector that represents the probability distributions of a list of potential possibilities.


The objective of the MLP is to minimize the cross-entropy loss, hence maximizing the likelihood of observing the input data given the class label. Our network hyper-parameters and architecture is as described: batch size = 25, epochs = 25, learning rate = 0.01, number of hidden layers = 3, input layer = 24 neurons, hidden layer 1 = 22 neurons, hidden layer 2 = 20 neurons and hidden layer 3 = 15 neurons. The final testing accuracy of MLP on the test set is 49%.

  1. Recurrent Neural Networks Model

The RNN model takes the region labels of the points{p1, p2,..., pt+5}as input, and predicts the labels of the last five points {pL-4, pL-3,..., pL}where L is the total length of the trajectory. Specifically, let piR2 denote a point on a trajectory, we use a pre-trained k-mean model to assign a label ci{1, 2,..., 30} to pi. The RNN model is designed as a sequence-to-sequence RNN model with attention mechanism [10], as shown in Fig. 9. This model consists of an encoder network and a decoder network. The encoder network is a Gated Recurrent Unit [11] (GRU) RNN that reads a sequence of labels {c1, c2,..., c10} and generates a sequence of outputs {h1, h2,..., h10}. The decoder takes {h1, h2,..., h10} as input, and computes a prediction {y1, y2,..., y5} where yi (i{1, 2,..., 5}) is a probability mass function that represents the predicted region label of the ith point in {pL-4, pL-3,..., pL}. To implement the attention mechanism, the decoder uses its own hidden layers {h1, h2,..., h5} to compute a sequence of attention weights, then uses the weights to compute a weighted combination of the encoder’s output {h1, h2,..., h10}. The weighted combination is passed to a fully-connected layer with ReLU activation function before being sent to the GRU blocks and log-softmax function for label prediction.


Figure 9: RNN Model with attention


The motivation for choosing the sequence-to-sequence RNN model is that the sampled points from a taxi trajectory can be considered as a time-sequence, and the RNN model helps to impose such sequential pattern in featurization and prediction. The attention mechanism is included in the network design because we assume that different parts of the initial trajectory have a different level of influence on the final trajectory {pL-4, pL-3,..., pL}, and the attention mechanism enables the decoder to learn such difference in neural network training.


Since the neural network is designed to predict the categorical labels of the final trajectory {pL-4, pL-3,..., pL, negative log-likelihood is chosen as the loss function for network training. Stochastic gradient descent algorithm is used as the optimization solver. The learning rate is set as 0.01. The batch size is 1. After training for 10 epochs, the prediction accuracy of the model is evaluated to be 82%.

  1. Discussion


Amongst the models investigated, GRU-RNN yields a significantly higher classification accuracy compared with SVM and MLP. The authors assume that such a difference is due to the sequential nature of the dataset. In general, recurrent network models follow Markovian paradigm which assumes that the next output is dependent on the previous inputs and hence capture useful information from sequential data, as opposed to SVM and MLP with assumes that the features are independent of one another.


In addition, the attention mechanism in the RNN model also helps to extract the features of the trajectories, as it assigns particular weights to important features from a particular trajectory in predicting the destination.

  1. Conclusion

After implementing the three prediction models, RNN is found to yield the highest accuracy in predicting the destination of a taxi. This proposed project has the potential utility in analyzing the traffic flow in a road network. It could help to avoid or mitigate upcoming traffic jams.

References

  1. A. Alpher, , and J. P. N. Fotheringham-Smythe. Frobnication revisited. Journal of Foo, 13(1):234–778, 2003.

  2. Fu, X., Sun, M. P., & Sun, H. (2017). Taxi Commute Recognition and Temporal-spatial Characteristics Analysis Based on GPS Data. China J. Highw. Transp, 30, 134-143.

  3. Yang, Y., He, Z., Song, Z., Fu, X., & Wang, J. (2018). Investigation on structural and spatial characteristics of taxi trip trajectory network in Xi’an, China. Physica A: Statistical Mechanics and its Applications, 506, 755-766.

  4. Castro, P. S., Zhang, D., Chen, C., Li, S., & Pan, G. (2013). From taxi GPS traces to social and community dynamics: A survey. ACM Computing Surveys (CSUR), 46(2), 17.

  5. De Brébisson, A., Simon, É., Auvolat, A., Vincent, P., & Bengio, Y. (2015). Artificial neural networks applied to taxi destination prediction. arXiv preprint arXiv:1508.00021.

  6. Lam, H. T., Diaz-Aviles, E., Pascale, A., Gkoufas, Y., & Chen, B. (2015). (Blue) taxi destination and trip time prediction from partial trajectories. arXiv preprint arXiv:1509.05257.

  7. Challenge, D. (2015). On Learning from Taxi GPS Traces: ECML–PKDD.

  8. Scheiber, N. (2017). How Uber uses psychological tricks to push its drivers’ buttons. The New York Times, 2.

  9. Froehlich, J., & Krumm, J. (2008). Route prediction from trip observations (No. 2008-01-0201). SAE Technical Paper.

  10. Bahdanau, D., Cho, K. and Bengio, Y., 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

  11. Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H. and Bengio, Y., 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.


Comentarios


bottom of page