Quality Detection of Stored Potatoes via Transfer Learning: A CNN and Vision Transformer Approach
Image-based deep learning provides a non-invasive, scalable solution for monitoring potato quality during storage, addressing key challenges such as sprout detection, weight-loss estimation, and shelf-life prediction. In this study, images and corresponding weight data were collected over a 200-day period under controlled temperature and humidity conditions. Using the pre-trained architectures ResNet, VGG, DenseNet, and Vision Transformer (ViT) via transfer learning, we designed two specialized models: (1) a binary classifier for sprout detection and (2) a multi-class predictor that estimates weight loss and forecasts remaining shelf life. DenseNet achieved the strongest performance, with 98.03% accuracy in sprout detection. Shelf-life prediction models performed best with coarse class divisions (2–5 classes), achieving over 89.83% accuracy, while accuracy declined for finer divisions (6–8 classes) due to subtle visual differences and limited data per class. These findings demonstrate the feasibility of integrating image-based models into automated sorting and inventory systems, enabling early identification of sprouted potatoes and dynamic categorization by storage stage. Practical implications include improved inventory management, differential pricing strategies, and reduced food waste across supply chains. While predicting exact shelf-life intervals remains challenging, focusing on broader class divisions ensures robust performance. Future research should develop generalized models trained on diverse potato varieties and storage conditions to enhance adaptability and scalability. Overall, this approach offers a cost-effective, non-destructive method for quality assessment, supporting efficiency and sustainability in potato storage and distribution.
💡 Research Summary
This paper presents a comprehensive deep‑learning framework for non‑destructive quality assessment of stored potatoes, focusing on two key tasks: (1) binary detection of sprouting and (2) multiclass prediction of weight loss, which is used to estimate remaining shelf‑life. The authors collected a longitudinal dataset over 200 days under controlled temperature (21 ± 2 °C) and relative humidity (70 ± 10 %). Eighteen tubers of the FC3 cultivar were stored in trays, and images of each tray were captured every 1–5 days with a Nikon Coolpix P100 camera. Individual potatoes were manually cropped to 250 × 250 pixel resolution, and the weight of each tray was recorded using a precision balance. Weight loss percentage was calculated relative to the initial weight, and a shelf‑life threshold of 10 % loss was defined; the remaining shelf‑life was derived as the number of days until this threshold would be reached.
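The weight-loss and shelf-life definitions above can be sketched in a few lines of Python. The loss percentage relative to initial weight and the 10% threshold come from the paper; the linear extrapolation used to estimate the remaining days is our assumption, since the summary does not specify how the remaining shelf life was derived.

```python
def weight_loss_pct(initial_g: float, current_g: float) -> float:
    """Cumulative weight loss as a percentage of the initial weight."""
    return (initial_g - current_g) / initial_g * 100.0


def remaining_shelf_life_days(day: int, loss_pct: float,
                              threshold: float = 10.0) -> float:
    """Days until the loss threshold is reached, assuming the average
    daily loss rate observed so far stays constant (linear extrapolation
    is our assumption, not necessarily the paper's method)."""
    if loss_pct >= threshold:
        return 0.0
    rate = loss_pct / day  # average % lost per storage day so far
    return (threshold - loss_pct) / rate


# Example: a tray that started at 500 g and weighs 475 g on day 100
loss = weight_loss_pct(500.0, 475.0)          # 5.0 % lost
days_left = remaining_shelf_life_days(100, loss)  # 100 days at this rate
```

Under this reading, a tuber losing weight at a steady 0.05% per day would cross the 10% threshold after 200 days, matching the study's observation window.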
Four state‑of‑the‑art architectures—VGG‑16, ResNet‑50, DenseNet‑121, and Vision Transformer (ViT)—were employed via transfer learning. All models were initialized with ImageNet‑pretrained weights and fine‑tuned on the potato dataset. The top classification layer was replaced to match the required number of output classes. Hyper‑parameter optimization was performed using GridSearchCV combined with 5‑fold cross‑validation; 80 % of the images were used for training and 20 % for testing (255 training, 51 test images).
For sprout detection, the binary classifier achieved a best accuracy of 98.03 % with DenseNet‑121, outperforming ResNet‑50 (≈95 %), VGG‑16 (≈94 %), and ViT (≈93 %). DenseNet’s dense connectivity, which reuses features from all preceding layers, contributed to its superior performance while keeping the parameter count relatively low (~8 M).
The multiclass shelf‑life prediction was framed as classification of cumulative weight‑loss intervals. The authors experimented with 2 to 8 class configurations, each dividing the 0–10 % loss range equally and adding a final “>10 %” class. Accuracy remained high (≈90–92 %) for coarse granularity (2–5 classes) but dropped sharply (below 78 %) for finer granularity (6–8 classes). The decline is attributed to subtle visual differences in early storage stages and reduced sample size per class, leading to class imbalance.
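The interval scheme above can be made concrete with a small helper that maps a cumulative loss percentage to a class label. The equal division of the 0–10% range plus a final ">10%" class follows the summary; the exact placement of interval edges is our reading of that setup.

```python
def loss_class(loss_pct: float, k: int) -> int:
    """Map a cumulative weight-loss percentage to one of k equal
    intervals over 0-10%, plus a final class for >10% losses.
    Returns a label in 0..k (k+1 classes in total)."""
    if loss_pct > 10.0:
        return k  # the ">10%" class
    width = 10.0 / k
    # Clamp so that exactly 10.0% falls in the last regular interval.
    return min(int(loss_pct // width), k - 1)


# With k=5 the intervals are 0-2, 2-4, 4-6, 6-8, 8-10, and >10 percent.
label = loss_class(3.5, 5)  # falls in the 2-4% interval, class 1
```

As k grows, each interval narrows and receives fewer of the 306 images, which is consistent with the reported accuracy drop and class imbalance for 6–8 classes.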
Model size and computational cost were also compared: VGG‑16 (~14.9 M parameters), ResNet‑50 (~23.5 M), DenseNet‑121 (~8 M), and ViT (~86 M). Despite ViT’s larger footprint, its performance lagged behind CNNs in this limited‑data scenario, suggesting that transformer‑based vision models may require substantially larger and more diverse datasets to realize their potential.
The study highlights several practical implications. A high‑accuracy sprout detector can be integrated into automated sorting lines, enabling early removal of sprouted tubers and reducing waste. Coarse‑grained shelf‑life classification provides reliable inputs for inventory management, dynamic pricing, and logistics planning. Deploying the models as cloud‑based APIs linked to on‑site cameras would allow real‑time monitoring across the supply chain.
Limitations include reliance on a single cultivar and a single controlled environment, modest image resolution, and insufficient data for fine‑grained classes. Future work should expand the dataset to multiple varieties, storage conditions, and higher‑resolution or multispectral imaging, as well as explore data‑augmentation, class‑weighting, and model compression for edge deployment.
In conclusion, the paper demonstrates that transfer‑learning‑based convolutional networks—particularly DenseNet‑121—can accurately detect sprouting and estimate remaining shelf‑life from simple RGB images of stored potatoes. The findings support the feasibility of cost‑effective, non‑destructive quality assessment tools that can improve efficiency, reduce waste, and enhance sustainability in the potato supply chain.