Serengeti’s team began by gathering, processing, and analysing the data required for machine learning. Once the data was prepared, we worked together with our partner on the development and evaluation of models and the corresponding training procedures.
The architecture of the RedAI application consists of PostgreSQL, ASP MVC Core API & Pages, AI Workers, a web browser client, and a mobile app. Serengeti’s task in this project focused on the AI Workers, i.e. the part that applies machine learning technologies, with an emphasis on deep learning. This part is key to project delivery and data processing, as it enables the detection and classification of SKUs, the main components of business process automation (BPA).
The RedAI application is based on image processing. It uses computer vision for:
- Object detection
- Image classification
- Image segmentation
The input image is then processed by a deep learning pipeline.
After input, the image passes through two phases: reconstruction and inference. Reconstruction comes first and consists of four steps: feature extraction, matching, reconstruction, and dense reconstruction. Together, these steps produce a 3D reconstruction of the shelf. Once the 3D reconstruction is complete, the inference phase follows.
Inference consists of three parts – detection, classification, and segmentation – and represents the most important part of the application.
For inference detection, we used Faster R-CNN, one of the main models for object detection. Faster R-CNN belongs to a family of models that use region proposals: a pre-trained CNN generates a convolutional feature map, which is then fed to a Region Proposal Network (RPN) that proposes candidate object regions. We configured and trained this model for the number of product classes that need to be detected on the shelves.
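Detection quality, and the post-processing steps later in the pipeline, depend on comparing predicted bounding boxes. As a minimal illustration (not RedAI’s actual code), intersection-over-union (IoU) for two axis-aligned boxes in a hypothetical `(x1, y1, x2, y2)` format can be computed in plain Python:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Overlap rectangle corners.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

An IoU of 1.0 means identical boxes, 0.0 means no overlap; this metric is what greedy post-processing such as NMS uses to decide whether two detections refer to the same object.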
For inference classification, we used a pre-trained convolutional neural network with several convolutional layers and one fully connected layer.
For the third step, inference segmentation, we used a deep learning model called a fully convolutional network (FCN). This model is trained and optimized for a single segmentation class; in this case, the shelf.
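A single-class FCN ultimately outputs a per-pixel probability map for that one class; thresholding it yields a binary shelf mask. A minimal sketch of that last post-processing step (the 0.5 threshold is an assumption, not a documented RedAI parameter):

```python
def to_mask(prob_map, threshold=0.5):
    """Turn a per-pixel probability map (rows of floats in [0, 1])
    into a binary mask (1 = shelf, 0 = background)."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]

def mask_coverage(mask):
    """Fraction of pixels labelled as shelf."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0
```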
The final part of image processing is the output image result, which gives a reconstructed shelf. During this phase, outliers are removed and the image is downsampled. The shelf plane is found, the back projection is detected and contoured, and the reconstruction scale is estimated. Using non-maximum suppression (NMS), the RedAI application then merges shelf plane detections and creates its outputs through extended discovery (dots, contours, and endpoints) and coordinate normalization.
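Non-maximum suppression, used here to merge overlapping shelf plane detections, keeps the highest-scoring detection and discards any detection that overlaps it beyond a threshold. A standard greedy sketch in plain Python (the 0.5 IoU threshold is an assumption, and boxes are hypothetical `(x1, y1, x2, y2)` tuples):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def nms(detections, iou_threshold=0.5):
    """Greedy NMS: detections are (score, box) pairs; returns the kept ones.

    Detections are visited in descending score order; each one is kept
    only if it does not heavily overlap an already-kept detection."""
    kept = []
    for score, box in sorted(detections, reverse=True):
        if all(iou(box, kept_box) < iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept
```

Given two heavily overlapping shelf plane boxes and one distant box, this keeps only the stronger of the overlapping pair plus the distant one.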
After merging shelf plane detections, the application merges shelf plane contours into shelves and creates planogram data: it assigns each product to a shelf and calculates shelf shares.
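Once products are assigned to shelves, shelf share can be computed from the planogram data. A minimal sketch that computes each SKU’s share of facings per shelf (by simple facing count; a production system might weight by product width, and the SKU names in the example are hypothetical):

```python
from collections import Counter

def shelf_shares(assignments):
    """assignments: list of (shelf_id, sku) pairs from the planogram.
    Returns {shelf_id: {sku: share_of_facings_on_that_shelf}}."""
    per_shelf = {}
    for shelf, sku in assignments:
        per_shelf.setdefault(shelf, Counter())[sku] += 1
    return {
        shelf: {sku: n / sum(counts.values()) for sku, n in counts.items()}
        for shelf, counts in per_shelf.items()
    }
```

For example, `shelf_shares([(1, "cola"), (1, "cola"), (1, "water"), (2, "juice")])` gives cola two thirds of shelf 1, water one third, and juice all of shelf 2.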
The next features that Serengeti’s team is working on are price recognition and 3D product mapping, both of which require further application of advanced, modern technologies.