Schedule for: 24w5283 - Bridging Prediction and Intervention Problems in Social Systems

Beginning on Sunday, June 2 and ending on Friday, June 7, 2024

All times in Banff, Alberta time, MDT (UTC-6).

Sunday, June 2
16:00 - 17:30 Check-in begins at 16:00 on Sunday and is open 24 hours (Front Desk - Professional Development Centre)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building.
(Vistas Dining Room)
20:00 - 22:00 Informal gathering
Meet and Greet in PDC (BIRS Lounge)
(Other (See Description))
Monday, June 3
07:00 - 08:45 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
08:45 - 09:00 Introduction and Welcome by BIRS Staff
A brief introduction to BIRS with important logistical information, technology instruction, and opportunity for participants to ask questions.
(TCPL 201)
09:00 - 09:30 Lydia Liu: Opening session by organizers
Introductory session by Lydia Liu, Deborah Raji and Angela Zhou.
(TCPL 201)
09:30 - 10:00 Amanda Coston: Validity, problem formulation, and data-driven decision making
Data-driven decision making is common in societally high-stakes settings, from child welfare and criminal justice to healthcare and consumer lending. In this talk we explore often overlooked issues in problem formulation and validity that threaten the suitability of these data-driven systems for real-world use. First, we draw on validity theory from the social sciences in order to develop a taxonomy of challenges that threaten validity in the algorithmic decision-making context. Next we turn to the question of how to identify and address these challenges in practice. We propose a guidebook that structures deliberation on early-stage problem formulation that we co-designed in collaboration with public sector agency leaders and AI developers, frontline workers, and community advocates. Finally, we consider validity challenges in evaluation of these systems, focusing on a common challenge — selectively missing data. We demonstrate how to address the missing data problem using techniques from causal inference.
(TCPL 201)
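A minimal sketch of one standard causal-inference remedy for selectively missing outcomes, inverse probability weighting, on entirely synthetic data (the talk's own methods may differ):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 50_000
    x = rng.normal(size=n)
    y = (rng.random(n) < 1 / (1 + np.exp(-x))).astype(int)
    # Outcomes are observed only for some cases, and observation depends on x
    # (e.g., repayment is seen only for approved loan applicants).
    observed = rng.random(n) < 1 / (1 + np.exp(-2 * x))

    naive = y[observed].mean()  # biased: conditions on being observed
    # Model the observation probability, then reweight observed cases by its inverse.
    pi = LogisticRegression().fit(x.reshape(-1, 1), observed)
    pi = pi.predict_proba(x.reshape(-1, 1))[:, 1]
    ipw = (y[observed] / pi[observed]).sum() / (1 / pi[observed]).sum()
    print(f"true mean {y.mean():.3f}  naive {naive:.3f}  IPW {ipw:.3f}")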
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:00 Manish Raghavan: Reconciling human expertise and algorithmic predictions
In many contexts, algorithmic predictions perform comparably to human expert judgement. However, there are plenty of good reasons to want humans to remain involved in decision-making. Here, we explore one such reason: humans can access information that algorithms cannot. For example, in medical settings, algorithms may be used to assess pathologies based on fixed data, but doctors may directly examine patients. We build a framework to incorporate expert judgements to distinguish between instances that are algorithmically indistinguishable, with the goal of producing predictions that outperform both humans and algorithms in isolation. We evaluate our methods on clinical risk prediction contexts, finding that while algorithms outperform humans on average, humans add valuable information in identifiable cases.
(Online)
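An illustrative sketch of the core idea, on synthetic data and under one possible reading of the framework (not the speaker's code): within groups of cases that the algorithm scores identically, check whether human assessments still predict outcomes.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 50_000
    s_alg = rng.normal(size=n)              # information available to the algorithm
    s_hum = rng.normal(size=n)              # information only the human can observe
    y = s_alg + s_hum + rng.normal(size=n)  # outcome depends on both signals
    human_score = s_alg + s_hum + rng.normal(size=n)  # noisy human judgement

    df = pd.DataFrame({"bin": pd.qcut(s_alg, 10, labels=False),
                       "human": human_score, "y": y})
    # Within each bin the algorithm's score is nearly constant, so any remaining
    # human-outcome correlation reflects information the model lacks.
    for b, d in df.groupby("bin"):
        print(f"algorithm decile {b}: corr(human, outcome) = {d['human'].corr(d['y']):.2f}")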
11:00 - 11:30 Eli Ben-Michael: Does AI help humans make better decisions? A methodological framework for experimental evaluation
The use of Artificial Intelligence (AI) based on data-driven algorithms has become ubiquitous in today's society. Yet, in many cases and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions as compared to a human alone or AI alone. We introduce a new methodological framework that can be used to answer this question experimentally with no additional assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded experimental design, in which the provision of AI-generated recommendations is randomized across cases with a human making final decisions. Under this experimental design, we show how to compare the performance of three alternative decision-making systems: human-alone, human-with-AI, and AI-alone. We apply the proposed methodology to the data from our own randomized controlled trial of a pretrial risk assessment instrument. We find that AI recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Our analysis also shows that AI-alone decisions generally perform worse than human decisions with or without AI assistance. Finally, AI recommendations tend to impose cash bail on non-white arrestees more often than necessary when compared to white arrestees.
(TCPL 201)
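A stylized sketch of the single-blinded design, on synthetic data with an invented judge-behavior model; the binary "correct decision" here stands in for the baseline potential outcome, and the sketch ignores the selective-observation issues the framework handles formally:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 10_000
    df = pd.DataFrame({
        "z": rng.integers(0, 2, n),  # randomized: did the judge see the AI advice?
        "a": rng.integers(0, 2, n),  # AI recommendation (computable for every case)
        "y": rng.integers(0, 2, n),  # correct decision (ground truth)
    })
    # Toy behavior: judges match y 70% of the time; when shown the AI advice,
    # they sometimes defer to it.
    u = rng.random(n)
    df["d"] = np.where(u < 0.7, df["y"], 1 - df["y"])
    df.loc[(df["z"] == 1) & (u > 0.9), "d"] = df["a"]

    # Because z is randomized, the two arms are directly comparable.
    for name, mask in [("human alone", df["z"] == 0), ("human with AI", df["z"] == 1)]:
        print(name, (df.loc[mask, "d"] == df.loc[mask, "y"]).mean().round(3))
    print("AI alone", (df["a"] == df["y"]).mean().round(3))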
11:30 - 12:00 Talia Gillis: Regulatory challenges for algorithmic fair lending
Consumer credit has become central to many debates on algorithmic fairness, discrimination and the regulation of AI, highlighting the critical need for robust regulatory frameworks. I will discuss a number of legal regulatory challenges, including challenges related to bridging the gap between traditional fair lending law and current practices, such as traditional discrimination law’s focus on input scrutiny. I will also discuss wedges between emerging regulatory frameworks and the realities of algorithmic decision-making, such as regulating for humans-in-the-loop and the generalizability of disparity metrics. Finally, I will touch upon current opportunities to operationalize fair lending compliance in ways that were limited in the past.
(TCPL 201)
12:00 - 13:30 Lunch
Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
14:00 - 14:20 Group Photo
Meet in foyer of TCPL to participate in the BIRS group photo. The photograph will be taken outdoors, so dress appropriately for the weather. Please don't be late, or you might not be in the official group photo!
(TCPL Foyer)
14:30 - 14:37 Luke Guerdan: Lightning talk (TCPL 201)
14:37 - 14:43 Angelina Wang: Lightning talk (TCPL 201)
14:43 - 14:50 Ezinne Nwankwo: Lightning talk (TCPL 201)
14:50 - 14:57 Michael Zanger-Tishler: Lightning talk (TCPL 201)
15:00 - 15:30 Poster Session and Coffee Break (TCPL Foyer)
15:30 - 15:45 Jessica Hullman: Provocation (TCPL 201)
15:45 - 16:00 Ashia Wilson: Provocation (TCPL 201)
16:00 - 16:15 Benjamin Recht: Provocation (TCPL 201)
16:15 - 17:00 Lydia Liu: Panel: Clinical vs. Statistical Judgment
Provocations (short perspective talks) followed by a panel discussion. Panelists: Ben Recht, Jessica Hullman, Ziad Obermeyer, Ashia Wilson
(TCPL 201)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building.
(Vistas Dining Room)
Tuesday, June 4
07:00 - 08:45 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
09:00 - 09:30 Daniel Malinsky: Risk prediction vs individualized treatment rules for cardiovascular care decisions: a health equity perspective
Treatment decisions in clinical settings are often informed by predictive algorithms, i.e., "risk scores" that quantify the probability that a patient with certain characteristics will experience some negative medical event. In cardiovascular care guidelines, for example, risk scores are recommended to inform decisions about antihypertensive (and lipid-lowering) medical treatments. From a causal inference perspective, a better (but more challenging) approach would be to use "optimal" individualized treatment rules, which take into account treatment effect heterogeneity and recommend treatment to those patients thought to benefit most from an intervention. In this talk I will report some results that compare these two "paradigms" for making treatment decisions using data from a large multi-ethnic observational cohort study, with special attention to the consequences for health equity.
(TCPL 201)
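A toy contrast between the two paradigms, entirely synthetic (not the study's data or analysis), in which high risk and high benefit deliberately diverge:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    x = rng.normal(size=n)              # a covariate (say, baseline blood pressure)
    risk = 1 / (1 + np.exp(-(x - 1)))   # P(event | no treatment)
    benefit = 0.05 * np.exp(-x ** 2)    # risk reduction from treatment (the CATE);
                                        # benefit peaks at moderate x, so the
                                        # highest-risk patients are not the ones
                                        # who benefit most
    risk_rule = risk > 0.5              # paradigm 1: treat above a risk cutoff
    itr_rule = benefit > 0.02           # paradigm 2: treat where benefit is large

    for name, rule in [("risk-score rule", risk_rule), ("ITR", itr_rule)]:
        print(f"{name}: treats {rule.mean():.1%} of patients, "
              f"averts {benefit[rule].sum():.0f} expected events")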
09:30 - 10:00 Razieh Nabi: Mitigating Unfair Biases in Statistical Learning with a Focus on Causal Constraints
Ensuring algorithmic fairness presents a unique challenge due to its dynamic and context-specific nature. This complexity necessitates (i) the selection of fairness criteria that are both theoretically robust and practically applicable, aligning with societal norms, as well as (ii) methods for mitigating violations of the pre-specified fairness criterion. We provide a framework for these mitigations that embraces a wide range of fairness definitions. Due to time constraints, this talk concentrates on mitigating bias through causal and counterfactual path-specific effects within predictive models. By leveraging methods from causal inference, constrained optimization, and semiparametric statistics, we develop an optimal predictive model that neutralizes undesirable path-specific effects. Our approach demonstrates how fairness considerations can be systematically integrated into predictive modeling by modifying the unconstrained optimal risk minimizer. This method not only elucidates the complexities of embedding fairness into algorithmic processes but also showcases practical applications, offering a robust strategy for promoting equitable outcomes across various sectors.
(TCPL 201)
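A toy version of "modifying the unconstrained optimal risk minimizer", using a much simpler associational mean-difference constraint in place of the talk's causal path-specific effects, with synthetic data:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    a = rng.integers(0, 2, n)                     # sensitive attribute
    X = np.column_stack([rng.normal(size=n) + a,  # feature correlated with a
                         rng.normal(size=n)])
    y = X @ np.array([1.0, 1.0]) + 0.5 * a + rng.normal(size=n)

    M = X.T @ X
    w_unc = np.linalg.solve(M, X.T @ y)           # unconstrained risk minimizer
    # Constraint: equal average prediction across groups, i.e., c @ w = 0.
    c = X[a == 1].mean(axis=0) - X[a == 0].mean(axis=0)
    Mc = np.linalg.solve(M, c)
    w_con = w_unc - (c @ w_unc) / (c @ Mc) * Mc   # Lagrangian correction

    for name, w in [("unconstrained", w_unc), ("constrained", w_con)]:
        gap = (X[a == 1] @ w).mean() - (X[a == 0] @ w).mean()
        print(f"{name}: prediction gap between groups = {gap:+.3f}")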
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:00 Joshua Loftus: Model-agnostic explanation tools and their limitations
Tools for interpretable machine learning or explainable artificial intelligence can be used to audit algorithms for fairness or other desired properties. In a "black-box" setting--one without access to the algorithm's internal structure--the methods available to an auditor may be model-agnostic. These methods are based on varying inputs while observing differences in outputs, and include some of the most popular interpretability tools like Shapley values and Partial Dependence Plots. Such explanation methods have important limitations. Moreover, their limitations can impact audits with consequences for outcomes such as fairness. This talk will highlight key lessons that regulators, auditors, or other users of model-agnostic explanation tools must keep in mind when interpreting their output. Although we focus on a selection of tools for interpretation and on fairness as an example auditing goal, our lessons generalize to many other applications of model-agnostic explanations. These tools are increasing in popularity, which makes understanding their limitations an important research direction. That popularity is driven largely by their ease of use and portability. In high-stakes settings like an audit, however, it may be worth the extra work to use tools based on causal modeling that can incorporate background information and be tailored to each specific application.
(TCPL 201)
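A minimal partial dependence computation (one of the model-agnostic tools the abstract discusses) on a hypothetical model and synthetic data, illustrating one such limitation: a pure interaction yields a near-flat PDP.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2_000, 2))
    y = X[:, 0] * X[:, 1]                  # outcome is a pure interaction
    model = GradientBoostingRegressor().fit(X, y)

    def partial_dependence(model, X, feature, grid):
        """Average prediction with `feature` forced to each grid value."""
        values = []
        for v in grid:
            Xv = X.copy()
            Xv[:, feature] = v
            values.append(model.predict(Xv).mean())
        return np.array(values)

    grid = np.linspace(-2, 2, 9)
    print(partial_dependence(model, X, 0, grid).round(2))
    # The PDP is nearly flat: averaging over the other feature hides the
    # strong interaction the model has in fact learned.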
11:00 - 11:30 Kosuke Imai: Estimating Racial Disparities When Race is Not Observed
The estimation of racial disparities in various fields is often hampered by the lack of individual-level racial information. In many cases, the law prohibits the collection of such information to prevent direct racial discrimination. As a result, analysts have frequently adopted Bayesian Improved Surname Geocoding (BISG) and its variants, which combine individual names and addresses with Census data to predict race. Unfortunately, the residuals of BISG are often correlated with the outcomes of interest, generally attenuating estimates of racial disparities. To correct this bias, we propose an alternative identification strategy under the assumption that surname is conditionally independent of the outcome given (unobserved) race, residence location, and other observed characteristics. We introduce a new class of models, Bayesian Instrumental Regression for Disparity Estimation (BIRDiE), that take BISG probabilities as inputs and produce racial disparity estimates by using surnames as an instrumental variable for race. Our estimation method is scalable, making it possible to analyze large-scale administrative data. We also show how to address potential violations of the key identification assumptions. A validation study based on the North Carolina voter file shows that BISG+BIRDiE reduces error by up to 84% when estimating racial differences in party registration. Finally, we apply the proposed methodology to estimate racial differences in who benefits from the home mortgage interest deduction using individual-level tax data from the U.S. Internal Revenue Service. Open-source software is available which implements the proposed methodology.
(TCPL 201)
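A minimal sketch of the standard BISG update mentioned above; the probabilities are invented for illustration, and the BIRDiE model itself goes well beyond this step (see the authors' open-source software).

    import numpy as np

    races = ["white", "black", "hispanic", "asian", "other"]
    # P(race | surname), e.g., from the Census surname list (numbers invented):
    p_race_given_surname = np.array([0.05, 0.87, 0.03, 0.00, 0.05])
    # P(block | race): each group's share living in this Census block (invented):
    p_block_given_race = np.array([0.002, 0.010, 0.004, 0.001, 0.003])

    # BISG applies Bayes' rule under the assumption that surname and location
    # are independent given race:
    posterior = p_race_given_surname * p_block_given_race
    posterior /= posterior.sum()
    for race, p in zip(races, posterior):
        print(f"{race:9s} {p:.3f}")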
11:30 - 13:00 Lunch
Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
13:00 - 15:00 Working Session (TCPL 201)
15:00 - 15:30 Coffee Break (TCPL Foyer)
15:30 - 15:45 Lily Hu: Provocation (TCPL 201)
15:45 - 16:00 Alexander Tolbert: Provocation (TCPL 201)
16:00 - 16:15 Solon Barocas: Provocation (Online)
16:15 - 17:00 Lydia Liu: Panel: Normative analysis in causal inference
Panelists: Lily Hu, Alexander Tolbert, Dan Malinsky, Joshua Loftus
(TCPL 201)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building.
(Vistas Dining Room)
Wednesday, June 5
07:00 - 08:45 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
09:00 - 09:30 Berk Ustun: When personalization harms performance
Machine learning models often include group attributes like sex, age, and HIV status for the sake of personalization – i.e., to assign more accurate predictions to heterogeneous subpopulations. In this talk, I will describe how such practices inadvertently lead to worsenalization, by assigning unnecessarily inaccurate predictions to minority groups. I will discuss how these effects violate our basic expectations from personalization in applications like clinical decision support, and describe how they arise due to standard practices in algorithm development. I will end by highlighting work on how to address these issues in practice – first, by setting "personalization budgets" to test for worsenalization; second, by developing "participatory prediction systems" where individuals can consent to personalization at prediction time.
(TCPL 201)
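A basic version of the per-group check implied above, on a synthetic population with an invented data-generating process (the talk's formal tests are more refined): compare each group's loss under a generic model and under a model given the group attribute.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import log_loss

    rng = np.random.default_rng(0)
    n = 20_000
    g = (rng.random(n) < 0.1).astype(int)  # minority group (10% of the data)
    X = rng.normal(size=(n, 3))
    # The feature-outcome relationship flips sign by group, so a single pooled
    # fit is dominated by the majority group.
    logits = (X @ np.array([1.0, -1.0, 0.5])) * np.where(g == 1, -1.0, 1.0)
    y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

    generic = LogisticRegression().fit(X, y)                          # no group attribute
    personal = LogisticRegression().fit(np.column_stack([X, g]), y)   # with group attribute

    for grp in (0, 1):
        m = g == grp
        Xg = np.column_stack([X[m], g[m]])
        ll_gen = log_loss(y[m], generic.predict_proba(X[m])[:, 1], labels=[0, 1])
        ll_per = log_loss(y[m], personal.predict_proba(Xg)[:, 1], labels=[0, 1])
        worse = "  <- worse under personalization" if ll_per > ll_gen else ""
        print(f"group {grp}: generic {ll_gen:.3f} vs personalized {ll_per:.3f}{worse}")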
09:30 - 10:00 Matthew Salganik: The origins of unpredictability in life trajectory prediction tasks
Why are some life outcomes difficult to predict? We investigated this question through in-depth qualitative interviews with 40 families sampled from a multi-decade longitudinal study. Our sampling and interviewing process was informed by the earlier efforts of hundreds of researchers to predict life outcomes for participants in this study. The qualitative evidence we uncovered in these interviews, combined with a mathematical decomposition of prediction error, led us to create a new conceptual framework. Our specific evidence and our more general framework suggest that unpredictability should be expected in many life outcome prediction tasks, even in the presence of complex algorithms and large datasets. Our work provides a foundation for future empirical and theoretical work on unpredictability in human lives.
(TCPL 201)
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:00 Juan Carlos Perdomo: The Relative Value of Prediction in Algorithmic Decision Making
Algorithmic predictions are increasingly used to inform the allocations of goods and services in the public sphere. In these domains, predictions serve as a means to an end. They provide stakeholders with insights into the likelihood of future events in order to improve decision-making quality and enhance social welfare. However, if maximizing welfare is the question, to what extent is improving prediction the best answer? In this talk, I will present a formal framework that analyzes how the impacts of prediction on welfare compare to those of other structural policy levers, like expanding access or investing in intervention quality.
(TCPL 201)
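A toy comparison in the spirit of the question above (a simplification on synthetic data, not the talk's model): welfare from a sharper predictor versus welfare from simply serving more people.

    import numpy as np

    rng = np.random.default_rng(0)
    n, budget = 100_000, 5_000
    need = rng.exponential(size=n)   # each person's true benefit from the service

    def welfare(noise_sd, k):
        score = need + rng.normal(0, noise_sd, n)  # predictor = need + noise
        served = np.argsort(-score)[:k]            # allocate to the top-k scores
        return need[served].sum()

    print(f"baseline:             {welfare(noise_sd=2.0, k=budget):,.0f}")
    print(f"invest in prediction: {welfare(noise_sd=1.0, k=budget):,.0f}")
    print(f"invest in access:     {welfare(noise_sd=2.0, k=int(1.2 * budget)):,.0f}")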
11:00 - 11:30 Shion Guha: Deconstructing Risk in Predictive Risk Models
Predictive Risk Models (PRM) have become commonplace in many government agencies to provide optimal data-driven decision-making outcomes in high-risk contexts such as criminal justice, child welfare, homelessness, immigration, etc. While such technology continues to be acquired and implemented rapidly throughout the government because of the perceived benefits of cost reductions and better decision-making outcomes, recent research has pointed out several issues in how PRMs are developed. Notably, existing risk assessment approaches underlie much of the training data for these PRMs. But what exactly are these PRMs predicting? In this talk, I use empirical studies in the context of child welfare to deconstruct and interrogate what “risk” in PRMs actually means and provide provocative directions for the community to discuss how we can move beyond our existing PRM development approaches.
(TCPL 201)
11:30 - 11:37 Ben Laufer: Lightning talk (TCPL 201)
11:37 - 11:43 Hammaad Adam: Lightning talk (TCPL 201)
11:43 - 11:50 Sayash Kapoor: Lightning talk (TCPL 201)
11:50 - 11:57 Roshni Sahoo: Lightning talk (TCPL 201)
12:00 - 12:30 Poster Session (TCPL Foyer)
12:30 - 13:30 Lunch
Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
13:30 - 17:30 Free Afternoon (Banff National Park)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building.
(Vistas Dining Room)
Thursday, June 6
07:00 - 08:45 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
09:00 - 09:30 Daniel Ho: Evaluating AI Systems in Government
This talk will describe our efforts at the RegLab to pilot and evaluate AI interventions in governmental settings.
(TCPL 201)
09:30 - 10:00 Simone Zhang: Social Mechanisms of Performative Prediction
Predictions of social outcomes raise concerns about performativity: the potential for predictions to influence the world, including the very outcomes they aim to forecast. Existing technical work has formalized these feedback dynamics and proposed optimization approaches in such settings. In this talk, I bridge these efforts with social science scholarship pointing to the diverse social mechanisms through which predictions can have performative effects. Focusing on prediction in policy settings and at evaluation interfaces, I present a framework that disaggregates ways that decision-makers, decision subjects, and societal third parties react to predictions. Within this framework, I distinguish between responses to specific predictions and responses to the broader characteristics of a prediction system. Elucidating these social mechanisms can broaden our understanding of how predictions intervene in social systems and inform technical and non-technical strategies for the responsible deployment of prediction models with performative potential.
(TCPL 201)
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:00 Bryan Wilder: Learning treatment effects while treating those in need
Policymakers often face a dilemma in allocating a limited resource: should they give the resource to people who are believed to be in greater need now, or randomize in order to learn more for the future? Currently, most programs operate at the extreme points of this tradeoff. At one extreme, individuals are scored according to some measure of need and resources are given exclusively to those with highest need. At the other, a randomized controlled trial is conducted, which assigns treatment independently of need but enables credible estimation of causal effects. We propose a framework for optimal policy design that spans the entire spectrum between these extremes. In our framework, the policymaker specifies a utility function that encodes their preexisting preferences for which individuals receive treatment, based on individuals' observed covariates. Then, we find a treatment assignment rule that minimizes the error with which we can estimate the average treatment effect of the intervention, subject to a constraint on the policy's expected utility. By varying the utility constraint, we can generate the entire Pareto frontier of the learning-vs-targeting tradeoff. We give strong sample complexity guarantees for the policy learning problem and provide a computationally efficient strategy to implement it. We then apply our framework to data from two human service settings in Allegheny County, Pennsylvania. Our results show that optimized policies can substantially mitigate the tradeoff between learning and targeting. For example, it is often possible to obtain 90% of the optimal utility while ensuring that the average treatment effect can be estimated with less than 2x the samples that an RCT would require.
(TCPL 201)
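A stylized version of the learning-vs-targeting spectrum on synthetic data (not the paper's optimal design): blend need-based targeting with randomization and trace expected utility against a variance proxy for the ATE estimator.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5_000
    u = rng.exponential(size=n)    # utility of treating each individual
    top = (u >= np.quantile(u, 0.5)).astype(float)   # "highest need" half

    for lam in [0.0, 0.5, 0.9, 0.99]:
        # lam=0 is a pure RCT; lam -> 1 approaches pure need-based targeting.
        p = np.clip((1 - lam) * 0.5 + lam * top, 0.01, 0.99)
        utility = (u * p).sum()
        # Variance proxy for a Horvitz-Thompson ATE estimator with unit outcome
        # variance: extreme propensities make the effect hard to estimate.
        var_proxy = (1 / p + 1 / (1 - p)).sum() / n ** 2
        print(f"lam={lam:5}: utility={utility:8.1f}, ATE variance proxy={var_proxy:.5f}")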
11:30 - 13:00 Lunch
Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
13:30 - 15:00 Working session (TCPL 201)
15:00 - 15:30 Coffee Break (TCPL Foyer)
15:30 - 15:45 Mark Sendak: Provocation (TCPL 201)
15:45 - 16:00 Suresh Venkatasubramanian: Provocation (TCPL 201)
16:00 - 16:15 Marissa Gerchick: Provocation (TCPL 201)
16:15 - 16:45 Lydia Liu: Panel: Understanding and assessing models in deployment (TCPL 201)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building.
(Vistas Dining Room)
Friday, June 7
07:00 - 08:45 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
09:00 - 09:30 Closing Session (TCPL 201)
09:30 - 10:00 Working session / long break
Next steps on the white paper for interested participants.
(TCPL 201)
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:00 Checkout by 11AM
5-day workshop participants are welcome to use BIRS facilities (TCPL) until 3 pm on Friday, although participants are still required to check out of the guest rooms by 11AM.
(Front Desk - Professional Development Centre)
12:00 - 13:30 Lunch from 11:30 to 13:30 (Vistas Dining Room)