Title: Evaluating Contextual Intelligence in Recyclability: A Comprehensive Study of Image-Based Reasoning Systems
ArXiv ID: 2601.00905
Date: 2025-12-31
Authors: Eliot Park, Abhi Kumar, Pranav Rajpurkar
Evaluating Contextual Intelligence in Recyclability:
A Comprehensive Study of Image-Based Reasoning
Systems
Eliot Park
Harvard College
Cambridge, MA 02138
eliot_park@college.harvard.edu
Abhi Kumar
Stanford University
Stanford, CA 94305
abhi1@stanford.edu
Pranav Rajpurkar
Department of Biomedical Informatics
Harvard Medical School
Boston, MA 02115
pranav_rajpurkar@hms.harvard.edu
Abstract
While the importance of efficient recycling is widely acknowledged, accurately
determining the recyclability of items and their proper disposal remains a complex
task for the general public. In this study, we explore the application of cutting-edge
vision-language models (GPT-4o, GPT-4o-mini, and Claude 3.5) for predicting the
recyclability of commonly disposed items. Utilizing a curated dataset of images,
we evaluated the models’ ability to match objects to appropriate recycling bins,
including assessing whether the items could physically fit into the available bins.
Additionally, we investigated the models’ performance across several challenging
scenarios: (i) adjusting predictions based on location-specific recycling guide-
lines; (ii) accounting for contamination or structural damage; and (iii) handling
objects composed of multiple materials. Our findings highlight the significant
advancements in contextual understanding offered by these models compared to
previous iterations, while also identifying areas where they still fall short. The
continued refinement of context-aware models is crucial for enhancing public
recycling practices and advancing environmental sustainability.
1 Introduction
Effective waste management, particularly through recycling, is essential in promoting environmental
sustainability. In 2018, the United States generated approximately 292.4 million tons of municipal
solid waste, equating to 4.9 pounds per person per day [6]. Of this, 32.1 percent was either recycled
or composted, a notable achievement but one that highlights the significant proportion of waste still
destined for landfills. Within this waste stream, certain materials, such as paper and paperboard,
achieved recycling rates as high as 68.2 percent, while others, such as plastics, lagged far behind at just
8.7 percent [6]. These disparities underscore the need for innovative approaches to improve recycling
rates across all categories, including helping the general public better distinguish which items
can be recycled.
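As a quick sanity check, the cited total tonnage and per-person rate are mutually consistent (assuming short tons and a 2018 US population of roughly 327 million; both assumptions are ours, not stated in [6]):

```python
# Sanity-check the cited MSW figures: total tonnage vs. per-capita rate.
# Assumes short tons (2,000 lb) and a 2018 US population of ~327 million.
TOTAL_TONS = 292.4e6   # total municipal solid waste, 2018 [6]
LBS_PER_TON = 2000
POPULATION = 327e6     # approximate 2018 US population (our assumption)
DAYS = 365

per_capita_lbs_per_day = TOTAL_TONS * LBS_PER_TON / POPULATION / DAYS
print(round(per_capita_lbs_per_day, 1))  # ≈ 4.9, matching the cited rate
```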
Figure 1: Overview of our study. Four contextual predictions are tested for three models (GPT-4o, GPT-4o-mini, and Claude 3.5):
1. Matching with multiple types of disposal bins. Task: analyze the available bins and the size of their openings; analyze the object's material, size, and cleanliness; place the object in the correct bin. Common problems: the model often fails to recognize all the openings and attempts to put large items (e.g., a box) into small bins.
2. Location-specific guidelines. Task: identify not only the object but also its condition, and use the provided guidelines (e.g., for Boston, London, or San Francisco) to predict the object's recyclability. Common problems: city- or country-specific guidelines vary (e.g., in how to recycle glass and soiled items) and are often insufficient to determine truth labels.
3. Contamination or damage in the object. Task: analyze transformations in cleanliness and structure, and predict recyclability based on location-specific guidelines. Common problems: it is difficult to find or generate images showing the same object before and after contamination or damage, and guidelines are often insufficient to determine truth labels.
4. Multi-material objects. Task: analyze the materials present and their method of adhesion, and predict recyclability based on materials and cleanliness. Common problems: some objects must be separated before recycling (e.g., a glass jar with a metal cap) or they are not recyclable, and separability is difficult to assess.
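The location-conditioned setup in Figure 1 amounts to prepending local guidelines to the model's prompt. The sketch below shows one way such a prompt could be assembled; the function name, wording, and fields are illustrative assumptions, not the prompts actually used in the study.

```python
from typing import Optional

def build_recyclability_prompt(item_description: str,
                               location: Optional[str] = None,
                               guidelines: Optional[str] = None) -> str:
    # Assemble the text prompt sent alongside the item's image.
    # Location and guidelines are optional, mirroring the study's
    # with-location and without-location conditions.
    parts = ["Decide whether the pictured item is recyclable. Answer 'yes' or 'no'."]
    if location:
        parts.append(f"The item will be disposed of in {location}.")
    if guidelines:
        parts.append(f"Local recycling guidelines:\n{guidelines}")
    parts.append(f"Item: {item_description}")
    return "\n".join(parts)
```

The same image can then be scored under both conditions to see whether adding guidelines flips the prediction, as in the Boston example of Figure 1.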
2 Related Works
In a recent work [2], the potential of general vision-language models, specifically Contrastive
Language-Image Pretraining (CLIP) [5], for automating the classification of waste materials for
recycling was explored. The results were substantially better than those of previous approaches using
simple convolutional neural networks [7, 3, 4], with the model achieving 89% accuracy in zero-shot
classification into a dozen different disposal methods. However, the approach had notable limitations.
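The zero-shot scheme reduces to a nearest-neighbor search in CLIP's shared image-text embedding space. A minimal sketch with toy two-dimensional embeddings follows; in the actual system, CLIP's image and text encoders produce these vectors, and the labels are the dozen disposal methods.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity, the score CLIP uses to compare image and text embeddings.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_classify(image_emb: np.ndarray,
                       label_embs: list,
                       labels: list) -> str:
    # Pick the label whose text embedding is closest to the image embedding.
    # The prediction is confined to the predefined label list, which is the
    # source of the limitation discussed in the text.
    sims = [cosine(image_emb, e) for e in label_embs]
    return labels[int(np.argmax(sims))]
```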
CLIP’s reliance on a predefined list of potential items meant it struggled with items outside of this
list, reducing its effectiveness in real-world applications where waste items are highly varied. In
particular, the model was not designed to handle common but challenging cases such as greasy, dirty,
or broken items, w