Robotic systems for household object rearrangement often rely on latent preference models inferred from human demonstrations. While effective at prediction, these models offer limited insight into the interpretable factors that guide human decisions. We introduce an explicit formulation of object arrangement preferences along four interpretable constructs: spatial practicality (putting items where they naturally fit best in the space), habitual convenience (making frequently used items easy to reach), semantic coherence (placing items together if they are used for the same task or are contextually related), and commonsense appropriateness (putting things where people would usually expect to find them). To capture these constructs, we designed and validated a self-report questionnaire through a 63-participant online study. Results confirm the psychological distinctiveness of these constructs and their explanatory power across two scenarios (kitchen and living room). We demonstrate the utility of these constructs by integrating them into a Monte Carlo Tree Search (MCTS) planner and show that when guided by participant-derived preferences, our planner can generate reasonable arrangements that closely align with those generated by participants. This work contributes a compact, interpretable formulation of object arrangement preferences and a demonstration of how it can be operationalized for robot planning.
Object rearrangement, the problem of organizing items within a space to achieve a desired configuration [4], is a central challenge for service robots operating in everyday environments. Here, a robot must be capable not only of manipulating objects but also of deciding where each object should go in a way that aligns with a user's organizational preferences. Human organizational preferences are diverse (e.g., one person may want mugs by the kettle, while another may prefer them in a cabinet), and one-size-fits-all definitions [25,33] of an acceptable arrangement may fail to account for these differences. For robots to be useful in this context, they must be equipped with object rearrangement models that capture the salient criteria behind these preferences and that can adapt to differences across users and scenes, especially in shared environments.
Prior work on the personalization of object rearrangement has aimed to tailor placements to reflect an individual user’s subjective spatial preferences rather than a universal notion of tidiness [24]. Abdo et al. [1] predicted user-specific groupings via collaborative filtering, while [24] introduced a framework for learning latent embeddings of tidying style from demonstrations. More recent systems approximate user preferences with zero-shot visual prompting of vision-language models [33], infer them from prior and current scene context [39,40], or actively query users when demonstrations are ambiguous [53]. While these methods move beyond a ‘one-size-fits-all’ approach, they do so by implicitly using latent representations that capture an overall preference signal without revealing the underlying factors that shape it. This makes it difficult both to understand why objects are placed where they are and to tune arrangements toward specific priorities (e.g., convenience over aesthetics) or different scenarios without intensive retraining.
To address these limitations, we propose grounding personalized object rearrangement in interpretable constructs that reflect how people organize their environments, while remaining adaptable to variation across users and contexts. Specifically, we formulate a compact representation of human organizational preferences in terms of four constructs: spatial practicality, habitual convenience, semantic coherence, and commonsense appropriateness, and investigate whether these human-aligned constructs are sufficient to explain how people reason about object arrangements in common household spaces. Our work makes three contributions:
• Interpretable formulation of arrangement preferences: We show that the four explicit arrangement constructs (spatial, habitual, semantic, commonsense) capture variation across individuals and scenarios (i.e., kitchen and living room).
• A measurement tool for the proposed constructs: We design and validate a self-report questionnaire that quantifies how strongly each construct influences participants’ judgments, and establish that the constructs form a reliable and psychologically meaningful basis.
• Preference-aligned arrangement generation: We formulate cost functions for the constructs and integrate them into a Monte Carlo Tree Search (MCTS) planner for arrangement. When using participant-derived weights, this approach produces arrangements that align with human preferences.
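To make the planning contribution concrete, the following is a minimal sketch of how per-construct costs could be combined and searched with MCTS. It is not the authors' implementation: the scene, the data tables, the cost functions (`construct_costs`, `total_cost`), and the weight values are all hypothetical stand-ins. Each construct is scored by a toy penalty (capacity overflow for spatial practicality, use frequency times reach effort for habitual convenience, same-task objects stored apart for semantic coherence, and one minus a placement-plausibility score for commonsense appropriateness), and the four costs are combined as a weighted sum whose weights play the role of participant-derived preferences. The search is standard UCT (UCB1 selection, uniform random rollouts, negated cost as reward) that places one object per tree level; `brute_force` is included only as a reference check for this tiny scene.

```python
import math
import random
from itertools import combinations, product

# Hypothetical toy kitchen scene; every value below is an illustrative assumption.
OBJECTS = ["mug", "kettle", "spices"]
LOCATIONS = ["counter", "upper_cabinet", "drawer"]
CAPACITY = {"counter": 2, "upper_cabinet": 3, "drawer": 1}         # objects each spot holds
REACH = {"counter": 0.0, "upper_cabinet": 1.0, "drawer": 0.5}      # effort to reach
FREQUENCY = {"mug": 1.0, "kettle": 0.8, "spices": 0.3}             # relative use frequency
TASK = {"mug": "drinks", "kettle": "drinks", "spices": "cooking"}  # task groups
PLAUSIBLE = {  # how expected each placement is (0 = surprising, 1 = typical)
    ("mug", "counter"): 0.6, ("mug", "upper_cabinet"): 0.9, ("mug", "drawer"): 0.2,
    ("kettle", "counter"): 1.0, ("kettle", "upper_cabinet"): 0.3, ("kettle", "drawer"): 0.1,
    ("spices", "counter"): 0.5, ("spices", "upper_cabinet"): 0.8, ("spices", "drawer"): 0.7,
}


def construct_costs(arr):
    """Hypothetical per-construct costs for {object: location}; lower is better."""
    load = {loc: 0 for loc in LOCATIONS}
    for loc in arr.values():
        load[loc] += 1
    return {
        # Spatial practicality: objects exceeding a location's capacity.
        "spatial": sum(max(0, load[loc] - CAPACITY[loc]) for loc in LOCATIONS),
        # Habitual convenience: frequently used objects placed far from reach.
        "habitual": sum(FREQUENCY[o] * REACH[loc] for o, loc in arr.items()),
        # Semantic coherence: pairs from the same task group stored apart.
        "semantic": sum(1 for a, b in combinations(OBJECTS, 2)
                        if TASK[a] == TASK[b] and arr[a] != arr[b]),
        # Commonsense appropriateness: how unexpected each placement is.
        "commonsense": sum(1.0 - PLAUSIBLE[(o, loc)] for o, loc in arr.items()),
    }


def total_cost(arr, weights):
    """Weighted sum of construct costs; weights stand in for participant-derived scores."""
    costs = construct_costs(arr)
    return sum(weights[k] * costs[k] for k in costs)


class Node:
    def __init__(self, placed, parent=None):
        self.placed = placed  # locations chosen for OBJECTS[:len(placed)]
        self.parent = parent
        self.children = {}    # location -> child Node
        self.visits = 0
        self.value = 0.0      # sum of rewards (negated costs) backed up through this node


def mcts(weights, iterations=2000, c=1.4, seed=0):
    """UCT search placing one object per tree level; returns the best arrangement seen."""
    rng = random.Random(seed)
    root = Node(())
    best_arr, best_cost = None, math.inf
    for _ in range(iterations):
        node = root
        # Selection: follow UCB1 while the node is non-terminal and fully expanded.
        while len(node.placed) < len(OBJECTS) and len(node.children) == len(LOCATIONS):
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(parent.visits) / ch.visits))
        # Expansion: add one untried location for the next unplaced object.
        if len(node.placed) < len(OBJECTS):
            loc = rng.choice([x for x in LOCATIONS if x not in node.children])
            node.children[loc] = Node(node.placed + (loc,), node)
            node = node.children[loc]
        # Rollout: place any remaining objects uniformly at random.
        rest = [rng.choice(LOCATIONS) for _ in range(len(OBJECTS) - len(node.placed))]
        arr = dict(zip(OBJECTS, list(node.placed) + rest))
        cost = total_cost(arr, weights)
        if cost < best_cost:
            best_arr, best_cost = arr, cost
        # Backpropagation: reward is the negated weighted cost.
        while node is not None:
            node.visits += 1
            node.value -= cost
            node = node.parent
    return best_arr, best_cost


def brute_force(weights):
    """Exhaustive reference search; only feasible for tiny scenes like this one."""
    candidates = (dict(zip(OBJECTS, locs)) for locs in product(LOCATIONS, repeat=len(OBJECTS)))
    return min(candidates, key=lambda a: total_cost(a, weights))


if __name__ == "__main__":
    weights = {"spatial": 1.0, "habitual": 0.5, "semantic": 0.8, "commonsense": 1.0}
    print(mcts(weights))
```

Because the weights enter only through the weighted sum, raising the hypothetical `habitual` weight relative to `commonsense` would tend to push frequently used objects toward low-reach locations even where that placement is less conventional, which is the kind of explicit priority tuning an interpretable formulation allows. Returning the best complete arrangement seen, rather than the most-visited path, is a simplification chosen here because leaf costs are deterministic.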
Most robotic object rearrangement systems optimize for a single, universal definition of what constitutes a “good” organization. In indoor household environments such as kitchens and living rooms, organization is primarily defined at the object and room levels, which spatial cognition often describes as figural and vista spaces [18,31]. These methods use visuo-semantic priors and commonsense reasoning to move objects to plausible locations [23,44], minimize spatial flow fields [15], learn arrangement cost functions [25], or leverage 3D mapping and semantic search [48]. While effective at achieving tidy configurations, these methods cannot account for diverse user-specific organizational styles. In contrast, our work formulates users’ object (re)arrangement preferences as a combination of four interpretable constructs, a formulation that is flexible enough to accommodate diverse user preferences.
For personalized rearrangement, Abdo et al. [1] used collaborative filtering to model co-occurrence patterns of object groupings, but this approach assumes that a fixed organizational schema is given a priori and thus captures statistical regularities without explaining their underlying rationale. Other approaches to personalized rearrangement extract latent “tidying styles” from user-arranged scenes [24], use large language models to summarize examples into rules [55], infer preferred placements from partial arrangements [40], employ zero-shot vision-language models [33], or actively query users to resolve ambiguities [53]. These advances enable personalization and achieve good predictive performance, but rely on implicit representations that hide the principles guiding the