Improving Data Reusability in Interactive Information Retrieval: Insights from the Community


In this study, we conducted semi-structured interviews with 21 IIR researchers to investigate their data reuse practices, aiming to expand on current findings by exploring the information-obtaining behaviors that underpin data reuse. We identified the information about shared data characteristics that IIR researchers need when evaluating data reusability, as well as the sources they typically consult to obtain this information. We consider this work an initial step toward revealing IIR researchers’ data reuse practices and identifying what the community needs to do to promote data reuse. We hope that this study, and the research that follows it, will inspire more individuals to contribute to ongoing efforts to design standards, infrastructures, and policies, and to foster a sustainable culture of data sharing and reuse in this field.


💡 Research Summary

This paper investigates how researchers in the Interactive Information Retrieval (IIR) community assess the reusability of shared datasets and where they obtain the information needed for that assessment. Building on a prior interview study, the authors conducted semi‑structured interviews with 21 IIR scholars (8 professors, 1 post‑doc, and 12 PhD students) from institutions across Asia, Australia, Europe, and North America. Participants had at least three years of experience in IIR, ensuring a depth of domain knowledge. Interviews were carried out in English or Chinese, transcribed, and subjected to iterative thematic coding, yielding a set of coherent categories that directly address two research questions: (RQ1) What characteristics of research data do IIR scholars consider when judging its reusability? (RQ2) How do they gather the necessary information about those characteristics?

The analysis reveals that data reusability is not perceived as an intrinsic property of a dataset but as a judgment constructed by the researcher based on a combination of contextual, methodological, documentary, credibility, and legal/ethical cues. Four major data characteristics emerged:

  1. Context and Methods of Data Production – Researchers need to know why, how, and under what conditions the data were originally collected. Details about experimental design, pre‑ and post‑interaction procedures, task instructions, and any potential biases are essential for judging whether a dataset can support a new research question.

  2. Documentation Quality – Comprehensive documentation (README files, metadata schemas, variable dictionaries, usage guides) is a prerequisite. Even publicly available datasets are often unusable because the documentation does not clearly describe the data’s content, structure, or intended use (a minimal data-dictionary sketch follows this list).

  3. Creator Credibility and Community Validation – The reputation of the data creators, their institutional affiliation, and evidence of community vetting (e.g., prior citations, inclusion in well‑known repositories such as TREC) serve as social trust signals. Datasets from recognized labs or large‑scale initiatives are deemed more reliable than those from unknown individuals.

  4. Legal and Ethical Constraints – Explicit statements about licensing, privacy protection, and ethical approvals are required. In the absence of clear legal terms, researchers prefer to collect their own data rather than risk non‑compliance.
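
To make the documentation point concrete, the following is a minimal, hypothetical sketch (in Python) of a machine-readable data dictionary for a shared interaction-log dataset. The dataset name, field names, license, and values are illustrative assumptions, not taken from the paper or any real dataset.

```python
# Hypothetical data dictionary for a shared IIR interaction-log dataset.
# Every identifier and value below is illustrative, not from the paper.
DATA_DICTIONARY = {
    "dataset": "exploratory_search_logs_v1",
    "description": "Query and click logs from a lab-based exploratory search study.",
    "intended_use": "Secondary analyses of query reformulation and click behaviour.",
    "license": "CC BY 4.0",
    "ethics": "Approved by the authors' IRB; logs anonymized before release.",
    "fields": {
        "session_id": {"type": "str", "description": "Unique identifier per participant session"},
        "timestamp":  {"type": "float", "unit": "seconds since session start"},
        "event_type": {"type": "str", "values": ["query_issued", "result_clicked", "page_dwell"]},
        "query":      {"type": "str", "description": "Raw query text; empty for non-query events"},
        "doc_id":     {"type": "str", "description": "Clicked document identifier, if applicable"},
    },
}

def undocumented_fields(record: dict) -> list:
    """Return the keys of a log record that the data dictionary does not describe."""
    return [key for key in record if key not in DATA_DICTIONARY["fields"]]

# Usage sketch: flag anything in a shared record that a reuser could not interpret.
print(undocumented_fields({"session_id": "p01", "timestamp": 12.4, "scroll_depth": 0.6}))
# -> ['scroll_depth']
```

A reuser can read such a dictionary without contacting the creators, and a simple check like undocumented_fields makes documentation gaps visible before the data are released.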

Regarding information sources, participants reported using four main channels:

  • Academic literature (papers that originally used the dataset) to infer context and methodological details.
  • Official data repositories (e.g., institutional or community‑maintained archives) that provide the dataset and its metadata; these were rated highest in trustworthiness.
  • Colleagues and professional networks for informal insights, unpublished documentation, or clarifications.
  • Project reports, code repositories, and supplementary materials that often contain detailed preprocessing pipelines and experimental protocols.

The authors argue that these findings highlight a gap between the existing data‑sharing practices in traditional IR (where large benchmark collections are well‑documented and widely reused) and the more nuanced needs of IIR, which deals with user behavior data, complex interaction logs, and context‑dependent experimental setups.

Based on the insights, the paper proposes concrete recommendations to foster a more reusable data ecosystem in IIR:

  1. Capture Contextual Metadata at Source – Develop tooling that automatically logs experimental context (task description, participant demographics, interface configuration) alongside raw interaction logs (a minimal logging sketch follows this list).
  2. Standardized Documentation Templates – Adopt minimal yet comprehensive templates (e.g., README, data dictionary, usage examples) to ensure that essential information is always present.
  3. Data Credibility Mechanisms – Introduce peer‑reviewed data “certifications” and community‑driven rating systems to signal trustworthiness.
  4. Clear Legal/Ethical Guidelines – Provide standardized licensing options and anonymization protocols to reduce uncertainty about reuse permissions.
  5. Policy and Funding Support – Encourage institutions and funding agencies to mandate data‑sharing plans that incorporate the above standards, and to allocate resources for maintaining reusable data infrastructures.
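
To illustrate the first recommendation, here is a minimal, hypothetical sketch (in Python) of tooling that records contextual metadata alongside raw interaction logs. The SessionContext fields, class names, and file layout are assumptions made for illustration; the paper does not prescribe a specific implementation.

```python
import json
import time
from dataclasses import dataclass, asdict, field
from pathlib import Path

@dataclass
class SessionContext:
    """Hypothetical experimental context captured once per study session."""
    study_id: str
    task_description: str
    participant_demographics: dict   # e.g. {"age_range": "25-34", "search_expertise": "novice"}
    interface_configuration: dict    # e.g. {"ranker": "BM25", "query_suggestions": True}
    collected_at: float = field(default_factory=time.time)

class InteractionLogger:
    """Writes a JSON-lines event log plus a sidecar context file for one session."""

    def __init__(self, out_dir: str, context: SessionContext):
        self.out_dir = Path(out_dir)
        self.out_dir.mkdir(parents=True, exist_ok=True)
        self.log_path = self.out_dir / f"{context.study_id}_events.jsonl"
        # Persist the context next to the raw events so the shared dataset
        # is self-describing without extra effort from the researcher.
        context_path = self.out_dir / f"{context.study_id}_context.json"
        context_path.write_text(json.dumps(asdict(context), indent=2))

    def log_event(self, event_type: str, payload: dict) -> None:
        record = {"ts": time.time(), "type": event_type, **payload}
        with self.log_path.open("a") as f:
            f.write(json.dumps(record) + "\n")

# Usage sketch with made-up study details.
ctx = SessionContext(
    study_id="exploratory_search_2024",
    task_description="Find three credible sources on renewable energy policy.",
    participant_demographics={"age_range": "25-34", "search_expertise": "intermediate"},
    interface_configuration={"ranker": "BM25", "query_suggestions": True},
)
logger = InteractionLogger("data/sessions/p01", ctx)
logger.log_event("query_issued", {"query": "renewable energy policy"})
logger.log_event("result_clicked", {"rank": 2, "doc_id": "doc_1423"})
```

Capturing this metadata at collection time, rather than reconstructing it at publication time, addresses the context and documentation gaps that participants described.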

In sum, the study offers a qualitative map of what IIR researchers need in order to decide whether to reuse a dataset and how they obtain that information. By articulating these needs and proposing concrete infrastructural and policy interventions, the paper contributes toward building a sustainable culture of data sharing and reuse in the interactive information retrieval community.

