

In the post-COVID era, e-commerce companies have seen a notable increase in retail sales, rising from 5.6% to approximately 11%. This surge is largely attributed to the adoption of predictive analysis techniques. Leading retailers such as Amazon have used these methods to improve customer satisfaction and boost sales. Machine learning algorithms have been instrumental in building product recommendation engines, which suggest products based on a customer’s past behavior, browsing history, and purchase history. For Amazon, this approach has contributed to a sales increase of about 35%. Another application of predictive analytics in retail is setting prices for goods and services based on customers’ past interactions.

This work introduces and develops a machine learning-based tool that leverages customer behavior (predictive analysis) in the e-commerce sector. The tool aims to predict products that will appeal to customers (product recommendation), and it can also drive tailored product advertisements. This AI service is user-friendly and enables e-commerce companies to make effective product recommendations. The effectiveness of our solution is demonstrated through our retail shopping solutions, highlighting the significant role of AI in predictive analytics for retail businesses.

Our proposed algorithm utilizes user-profile details, visited pages, and click data to infer user interests. Additionally, we analyze click patterns for purchased products to enhance prediction accuracy.

Literature Review

This section presents the latest developments in the recommendation system paradigm, with a focus on predictive analysis. The literature identifies user behavior in predictive analysis, primarily encompassing click information, user ratings on purchased items, comments, sharing functionalities, and cart data. Various methods have been extensively employed in the development of product recommendation systems. These systems, based on predictive analysis, account for changes in user behavior, especially when users don’t provide explicit feedback like product ratings and comments. For instance, one method extracts ratings from a user’s purchase history concerning specific products. The data relates to items frequently purchased by a single user, and through the user-item matrix collaborative filtering technique, the system recommends items aligned with the users’ interests.
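The purchase-history approach described above can be illustrated with a minimal sketch. It assumes purchase counts serve as implicit ratings and uses a simple user-based neighborhood variant of collaborative filtering with cosine similarity; the function names and sample data are hypothetical and are not taken from the cited work.

```python
from collections import defaultdict
from math import sqrt


def build_user_item_matrix(purchases):
    """Turn (user, item) purchase events into implicit ratings (purchase counts)."""
    matrix = defaultdict(lambda: defaultdict(int))
    for user, item in purchases:
        matrix[user][item] += 1
    return {user: dict(items) for user, items in matrix.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(matrix, user, top_n=3):
    """Rank items the user has not bought by similarity-weighted counts of other users."""
    own = matrix.get(user, {})
    scores = defaultdict(float)
    for other, ratings in matrix.items():
        if other == user:
            continue
        sim = cosine(own, ratings)
        if sim <= 0:
            continue  # users with no overlap contribute nothing
        for item, rating in ratings.items():
            if item not in own:
                scores[item] += sim * rating
    # Sort by score (descending), then by name for a deterministic order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical purchase log: (user, item) pairs.
purchases = [
    ("u1", "laptop"), ("u1", "mouse"),
    ("u2", "laptop"), ("u2", "mouse"), ("u2", "keyboard"),
    ("u3", "novel"), ("u3", "lamp"),
]
matrix = build_user_item_matrix(purchases)
print(recommend(matrix, "u1"))  # ['keyboard']
```

Here u1 shares two purchases with u2 and none with u3, so only u2's remaining item is recommended.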


  • Data Collection: Data collection is a crucial step in our methodology. To identify products that align closely with user preferences, we first establish the types of data needed. We utilized a publicly available dataset from the internet, comprising user profile information and click history. This dataset contains 294,864 records for 51,386 unique users, focusing primarily on user click history.
  • Preprocessing: We applied data mining techniques to cleanse and prepare the dataset for our model. The process involved data cleaning, extraction of relevant information, and feature discovery. Subsequently, we identified variables crucial for our model, such as user IP, click information, access dates and times, accessed pages, and product details (name, type, and ID). We also eliminated duplicate records and removed entries lacking specific purchased products.
  • Data Visualization: To gain a comprehensive understanding of the dataset, we employed data visualization. This approach allowed us to analyze online product shopping trends based on various criteria, including time series analysis, user interest-based shopping, and user access page analysis.
Figure 1: time series analysis - daily online product shopping
Figure 2: time series analysis - monthly online product shopping
Figure 3: user interest analysis
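The preprocessing and daily time series steps described above could look roughly like the following sketch. The record layout (`user_ip`, `timestamp`, `page`, `product_id`) and the ISO timestamp format are assumptions made for illustration; the actual dataset schema is not given in the text.

```python
from collections import Counter
from datetime import datetime


def preprocess(records):
    """Drop exact duplicates and records without a product ID, keeping order."""
    seen = set()
    cleaned = []
    for rec in records:
        if not rec.get("product_id"):
            continue  # entry is not tied to a specific product
        key = (rec.get("user_ip"), rec.get("timestamp"),
               rec.get("page"), rec.get("product_id"))
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def daily_counts(records):
    """Count records per calendar day, i.e. the series behind a daily trend plot."""
    days = Counter(
        datetime.fromisoformat(rec["timestamp"]).date().isoformat()
        for rec in records
    )
    return dict(sorted(days.items()))


# Hypothetical click log with one duplicate and one record lacking a product.
raw = [
    {"user_ip": "10.0.0.1", "timestamp": "2021-03-01T10:15:00", "page": "/p/42", "product_id": "42"},
    {"user_ip": "10.0.0.1", "timestamp": "2021-03-01T10:15:00", "page": "/p/42", "product_id": "42"},
    {"user_ip": "10.0.0.2", "timestamp": "2021-03-01T11:00:00", "page": "/home", "product_id": None},
    {"user_ip": "10.0.0.2", "timestamp": "2021-03-02T09:30:00", "page": "/p/7", "product_id": "7"},
]
clean = preprocess(raw)
print(len(clean))           # 2
print(daily_counts(clean))  # {'2021-03-01': 1, '2021-03-02': 1}
```

The same grouping by month instead of by day would give the series for a monthly view.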

Proposed Model

Our AI model uses deep neural networks (DNN) to implement a collaborative filtering-based algorithm for predictive product recommendation. The approach combines a DNN with the word2vec mechanism to analyze user click-based features. For comparison, we also explore alternative classifiers such as XGBoost, Random Forest, and Support Vector Machine (SVM). Each of these takes a different approach to classification and has its own strengths, which helps the model adapt to different predictive analysis scenarios in e-commerce.

Specifically, we employed the AutoRec architecture [4], a denoising autoencoder network combined with a collaborative filtering model. It can be used to learn lower-dimensional feature representations at the bottleneck layer, or to fill in the blanks of the interaction matrix directly at the reconstruction layer. It supports both item-based and user-based variants. The network takes the partially observed interaction vector of each item or user, maps it to the hidden layers, and then attempts to reconstruct it at the output layer, thereby predicting the ranking of products recommended to the user.
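To make the reconstruction idea concrete, here is a minimal sketch of an AutoRec-style autoencoder in plain Python. It is an illustration under stated assumptions, not the model used in this work: it has a single sigmoid hidden layer, injects no noise (so it is not a denoising variant), scales interactions to [0, 1], and trains with plain stochastic gradient descent. The class name `TinyAutoRec`, the hyperparameters, and the data are all hypothetical. The key point it shows is that the loss and the gradient updates only use observed entries, while the output layer still produces a value for every entry, including the missing ones.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyAutoRec:
    """One-hidden-layer autoencoder trained only on observed entries."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Encoder weights V and biases mu; decoder weights W and biases b.
        self.V = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.mu = [0.0] * n_hidden
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_inputs)]
        self.b = [0.0] * n_inputs

    def forward(self, r):
        """Encode a partially observed vector (None = missing) and reconstruct it."""
        h = [sigmoid(sum(v * x for v, x in zip(row, r) if x is not None) + m)
             for row, m in zip(self.V, self.mu)]
        out = [sum(w * hj for w, hj in zip(row, h)) + bias
               for row, bias in zip(self.W, self.b)]
        return h, out

    def train_step(self, r, lr=0.1, reg=0.001):
        """One SGD step on the squared error over observed entries; returns the loss."""
        h, out = self.forward(r)
        err = [o - x if x is not None else 0.0 for o, x in zip(out, r)]
        # Gradient w.r.t. hidden activations, computed before W is updated.
        grad_h = [sum(err[i] * self.W[i][j] for i in range(len(r)))
                  for j in range(len(h))]
        for i, e in enumerate(err):
            if r[i] is None:
                continue  # missing entries do not update the decoder
            for j, hj in enumerate(h):
                self.W[i][j] -= lr * (e * hj + reg * self.W[i][j])
            self.b[i] -= lr * e
        for j, hj in enumerate(h):
            g = grad_h[j] * hj * (1.0 - hj)  # back through the sigmoid
            for k, x in enumerate(r):
                if x is not None:
                    self.V[j][k] -= lr * (g * x + reg * self.V[j][k])
            self.mu[j] -= lr * g
        return 0.5 * sum(e * e for e in err)


# Hypothetical interaction matrix: one row per item, one column per user.
rows = [
    [1.0, 0.8, None, 0.0],
    [0.9, None, 0.1, 0.0],
    [0.0, 0.1, 1.0, None],
    [None, 0.0, 0.9, 1.0],
]
model = TinyAutoRec(n_inputs=4, n_hidden=2)
losses = [sum(model.train_step(r) for r in rows) for _ in range(300)]
_, filled = model.forward(rows[0])
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print("predicted value for the missing entry in row 0:", round(filled[2], 3))
```

Sorting the reconstructed values for the entries a user has not interacted with gives a ranking of candidate products for that user.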

The word2vec technique, an integral part of our model, is a shallow two-layer neural network that maps items to vector-space representations. We used it to build a ‘nearest neighbor’ model for products, based on a combination of users’ purchase records and click sequences. As a result, the recommendations are both relevant and personalized, reflecting each user’s browsing and purchasing patterns.
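The sketch below illustrates how click sequences can feed a product nearest-neighbor lookup. It generates (target, context) pairs from each session in the way skip-gram training samples them, but it then uses raw co-occurrence counts as product vectors instead of trained word2vec embeddings. That substitution is a deliberate simplification to keep the example dependency-free; the session data and function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def skipgram_pairs(sequence, window=2):
    """Yield (target, context) pairs within a sliding window, as in skip-gram sampling."""
    for i, target in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield target, sequence[j]


def cooccurrence_vectors(sessions, window=2):
    """Stand-in for learned embeddings: count how often products share a window."""
    vectors = defaultdict(Counter)
    for sequence in sessions:
        for target, context in skipgram_pairs(sequence, window):
            vectors[target][context] += 1
    return dict(vectors)


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbors(vectors, product, top_n=2):
    """Return the products whose context vectors are most similar to `product`."""
    base = vectors.get(product, Counter())
    sims = {other: cosine(base, vec)
            for other, vec in vectors.items() if other != product}
    ranked = sorted((p for p in sims if sims[p] > 0),
                    key=lambda p: (-sims[p], p))
    return ranked[:top_n]


# Hypothetical click/purchase sessions, one list of product names per session.
sessions = [
    ["phone", "case", "charger"],
    ["phone", "case", "earbuds"],
    ["novel", "bookmark", "lamp"],
]
vectors = cooccurrence_vectors(sessions)
print(nearest_neighbors(vectors, "charger", top_n=1))  # ['earbuds']
```

"charger" and "earbuds" never appear in the same session, yet they come out as neighbors because they occur in the same contexts, which is the property that embedding-based neighbor models rely on.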


We evaluated our experimental results using Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). The model was also evaluated on real data from several retail sectors: goods and services, travel, automotive, and general retail. This breadth indicates that the approach applies across different retail contexts and supports the robustness of our predictive analysis approach.
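For reference, the three error metrics named above can be computed as in this short sketch. The helper name and the sample values are hypothetical and are not results from our experiments.

```python
from math import sqrt


def regression_errors(actual, predicted):
    """Return MAE, MSE, and RMSE for paired actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = sum(d * d for d in diffs) / len(diffs)
    return {"mae": mae, "mse": mse, "rmse": sqrt(mse)}


# Illustrative values only.
actual = [5, 3, 4, 1]
predicted = [4, 3, 5, 3]
errors = regression_errors(actual, predicted)
print(errors["mae"], errors["mse"])  # 1.0 1.5
print(round(errors["rmse"], 4))      # 1.2247
```

MAE weights all errors equally, while MSE and RMSE penalize large errors more heavily; RMSE is in the same units as the predicted values.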

The model outputs a ranking of products closely aligned with the user’s purchase history and click behavior. We rigorously compared the model’s predicted values against actual user choices to assess its accuracy. The strong correlation between these values attests to the model’s effectiveness in leveraging user behavior for product recommendation. Notably, the model’s utility extends beyond product recommendations, as it is adaptable to various tasks that analyze user behavior.


  1. Zhang, Z.P., Kudo, Y., Murai, T., Ren, Y.G. (2019). “Enhancing Recommendation Accuracy of Item-Based Collaborative Filtering via Item-Variance Weighting.” Appl. Sci., vol. 9, pp. 1928.
  2. Lu, J., Wu, D., Mao, M., Wang, W., Zhang, G. (2015). “Recommender System Application Developments: A Survey.” Decis. Support Syst., vol. 74, pp. 12-32.
  3. Zheng, L., Lu, C.T., et al. (2019). “MARS: Memory Attention-Aware Recommender System.” In Proceedings of the 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Washington, DC, USA, pp. 11-20.
  4. Sedhain, S., Menon, A.K., Sanner, S., Xie, L. (2015). “AutoRec: Autoencoders Meet Collaborative Filtering.” In WWW, pp. 111-112.