This project examines how migrants, immigration systems, and related policies are represented in large-scale information retrieval (IR) datasets, with a particular focus on the immigration-related queries and passages in MS MARCO. Building on critical data studies and work on algorithmic bias, it asks how seemingly “neutral” benchmarks encode specific framings of migrants (e.g., as risks, workers, students, or burdens).

Goals

Methods

Status and Outputs

If you are interested in this line of work or would like to collaborate on responsible IR and migration-focused datasets, feel free to reach out.