
    Steering Stories: Confronting Narratives of Driving Automation through Contestational Artifacts

    Full text link
    In this paper, we problematize popular narratives of driving automation. Whether positive or negative, these narratives propagate simplistic assumptions about human abilities and reinforce technocratic approaches to mobility innovation. We build on narrative approaches to participatory research and on adversarial design to explore how design-led confrontation can create opportunities for reflection on the implicit assumptions and narratives that stakeholders may refer to when discussing and making decisions about automated driving technologies. Specifically, we discuss the results of four focus groups in which we used contestational artifacts to promote critical discussions and confront taken-for-granted beliefs among stakeholders. We reflect on the results to distill methodological insights and design recommendations for conducting adversarial participatory design research as a way of confronting dominant narratives. In addition to the methodological approach, which is the main contribution of this work, we provide a set of narrative tensions that can be used to question common beliefs surrounding automated driving futures.

    MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning

    No full text
    Inferring reward functions from demonstrations or from pairwise preferences is a promising approach for aligning Reinforcement Learning (RL) agents with human intentions. However, state-of-the-art methods typically focus on learning a single reward model, which makes it difficult to trade off different reward functions from multiple experts. We propose Multi-Objective Reinforced Active Learning (MORAL), a novel method for combining diverse demonstrations of social norms into a Pareto-optimal policy. By maintaining a distribution over scalarization weights, our approach is able to interactively tune a deep RL agent towards a variety of preferences, while eliminating the need to compute multiple policies. We empirically demonstrate the effectiveness of MORAL in two scenarios, which model a delivery task and an emergency task that require an agent to act in the presence of normative conflicts. Overall, we consider our research a step towards multi-objective RL with learned rewards, bridging the gap between current reward learning and machine ethics literature. Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public. Human-Robot Interaction; Interactive Intelligence
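The abstract describes tuning an agent by maintaining a distribution over scalarization weights. The sketch below illustrates that idea under our own assumptions and is not MORAL's implementation: the distribution is approximated by a fixed set of hypothetical weight "particles", each trajectory is summarized by a known vector of per-objective returns, and each pairwise preference reweights the particles with a Bradley-Terry likelihood. MORAL's active query selection and deep RL policy training are not shown.

```python
import numpy as np


def scalarize(returns, w):
    """Linear scalarization: weighted sum of per-objective returns."""
    return np.asarray(returns, dtype=float) @ np.asarray(w, dtype=float)


def update_weight_posterior(particles, log_probs, ret_a, ret_b, prefer_a, beta=1.0):
    """Bayesian update of a particle approximation over scalarization weights.

    particles: (n, k) candidate weight vectors (assumed, e.g. Dirichlet samples)
    log_probs: (n,) normalized log posterior probabilities of the particles
    ret_a, ret_b: (k,) per-objective returns of the two trajectories shown to the expert
    prefer_a: True if the expert preferred trajectory a
    """
    # Scalarized return difference w·R_a - w·R_b for every particle.
    diff = particles @ (np.asarray(ret_a, dtype=float) - np.asarray(ret_b, dtype=float))
    sign = 1.0 if prefer_a else -1.0
    # Bradley-Terry likelihood: log sigmoid(sign * beta * diff), computed stably.
    log_lik = -np.logaddexp(0.0, -sign * beta * diff)
    updated = log_probs + log_lik
    return updated - np.logaddexp.reduce(updated)  # renormalize in log space


def posterior_mean(particles, log_probs):
    """Expected scalarization weights under the particle posterior."""
    return np.exp(log_probs) @ particles


# Usage with a hypothetical two-objective prior sampled from a Dirichlet distribution.
rng = np.random.default_rng(0)
particles = rng.dirichlet([1.0, 1.0], size=500)
log_probs = np.full(len(particles), -np.log(len(particles)))  # uniform over particles
for _ in range(5):  # the expert keeps preferring the trajectory that is better on objective 0
    log_probs = update_weight_posterior(particles, log_probs, [1.0, 0.0], [0.0, 1.0], prefer_a=True)
w = posterior_mean(particles, log_probs)
print(w)  # weight on objective 0 now exceeds 0.5
```

Because a single weight distribution is refined, one policy conditioned on the current weights can in principle be tuned, rather than training a separate policy per preference.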

    MARL-iDR: Multi-Agent Reinforcement Learning for Incentive-based Residential Demand Response

    No full text
    Distribution System Operators (DSOs) are responsible for preventing grid congestion while accounting for growing demand and the intermittent nature of renewable energy resources. Incentive-based demand response programs promise real-time flexibility to relieve grid congestion. To include residential consumers in these programs, aggregators can financially incentivize participants to reduce their energy demand and make the aggregated energy reduction available to DSOs. A key challenge for aggregators is to coordinate the heterogeneous preferences of multiple participants while preserving their privacy. This thesis proposes MARL-iDR: a decentralized Multi-Agent Reinforcement Learning approach to an incentive-based demand response program. The approach respects participants' privacy and preferences and makes decisions in real time when deployed. The aggregator and each participant are controlled by Deep Reinforcement Learning agents that learn to maximize their reward. The aggregator agent learns a policy that dispatches suitable incentives to participants based on total energy demand and a target reduction, while minimizing financial costs. Each participant agent learns to respond to these incentives by reducing its consumption to a fraction of the original demand. The participant agents curtail or shift requested household appliances based on the selected consumption reduction, using a novel Disjunctively Constrained Knapsack Problem optimization, while minimizing residents' dissatisfaction. A case study with real-world electricity data from 25 households demonstrates the approach's ability to induce demand-side flexibility. The approach is compared to the case without demand response and to a centralized myopic baseline approach. A 9% reduction of the Peak-to-Average Ratio (PAR) was achieved compared to the original PAR (no demand response).
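The participant agents' appliance decisions are framed above as a Disjunctively Constrained Knapsack Problem. A minimal sketch under hypothetical assumptions: each item is an appliance run (for example, running an appliance now versus shifting it later, which are mutually exclusive and therefore form a conflict pair), its weight is energy use, its value is resident satisfaction, and the capacity is the reduced consumption budget. Exhaustive search is used only because the toy instance is tiny; the problem is NP-hard, and this is not the thesis's solver. A helper for the Peak-to-Average Ratio (peak load divided by mean load) is included for the reported metric.

```python
from itertools import combinations


def solve_dckp(values, weights, capacity, conflicts):
    """Exact Disjunctively Constrained Knapsack by exhaustive search (small n only).

    conflicts: pairs of item indices that may not both be selected.
    Returns (best_value, chosen_item_indices).
    """
    conflict_set = {frozenset(pair) for pair in conflicts}
    best_value, best_subset = 0, ()
    n = len(values)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(weights[i] for i in subset) > capacity:
                continue  # exceeds the energy budget
            if any(frozenset(p) in conflict_set for p in combinations(subset, 2)):
                continue  # violates a disjunctive (mutual-exclusion) constraint
            value = sum(values[i] for i in subset)
            if value > best_value:
                best_value, best_subset = value, subset
    return best_value, best_subset


def peak_to_average_ratio(load):
    """PAR of a load profile: peak demand divided by mean demand."""
    return max(load) / (sum(load) / len(load))


# Hypothetical household: satisfaction values, energy use (kWh), and a 4 kWh budget
# (e.g. a reduction fraction applied to the original demand). Items 1 and 2 are
# the same appliance run now vs. shifted, so they conflict.
values, weights = [6, 5, 4], [3, 2, 2]
print(solve_dckp(values, weights, capacity=4, conflicts=[(1, 2)]))  # (6, (0,))
print(peak_to_average_ratio([1.0, 2.0, 3.0, 2.0]))                   # 1.5
```

Without the conflict pair, items 1 and 2 would together fit the budget and score 9; the disjunctive constraint is what forces the choice between running an appliance now or later.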

    Preference-Based Reinforcement Learning in Demand Response Programs

    No full text
    Incentive-based demand response (iDR) programs serve as important tools for distribution system operators (DSOs) to reduce electricity demand during periods of grid overload. In these programs, participants can decide to curtail their consumption in exchange for financial incentives. How much a participant curtails often depends on individual preferences. Reinforcement Learning (RL) methods have been employed to automate participants' decision-making in these programs, often relying on predefined reward designs based on observed behavioral patterns. This thesis introduces PbRL-iDR: a reinforcement learning approach that learns a reward function unique to each participant by querying them for preference labels on a set of trajectories. PbRL-iDR trains the reward model and the policy in an alternating cycle. First, queries are sent to the simulated participant to update the current reward model. Then, the updated reward model is used to improve the policy. Two variations of the PbRL-iDR algorithm are proposed to improve query efficiency: active query selection (AQS) and parameter transfer from model ensemble (PTME). In experiments, PbRL-iDR achieved performance comparable to a DQN-based method, albeit with slower convergence. An ablation study was performed to test the efficacy of AQS and PTME in reducing the number of queries necessary to learn a reward function. Results suggest that AQS can help the policy converge sooner and after fewer queries compared to PbRL-iDR without AQS. The same experiment showed that PTME failed to yield similar improvements. Computer Science
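The reward-model half of the alternating cycle can be illustrated with the Bradley-Terry preference model commonly used in preference-based RL. This is a sketch under our own assumptions, not PbRL-iDR itself: the reward is linear in hypothetical per-step features, the simulated participant labels each pair of trajectory segments according to a hidden reward, and the model is fitted by plain gradient descent on the cross-entropy loss. The policy-improvement step, AQS and PTME are omitted.

```python
import numpy as np


def preference_prob(theta, seg_a, seg_b):
    """Bradley-Terry probability that segment a is preferred over segment b.

    Segments are (T, d) arrays of per-step features; the learned reward is
    assumed linear, r(s) = theta · phi(s), so a segment's return is theta · sum(phi).
    """
    diff = (seg_a.sum(axis=0) - seg_b.sum(axis=0)) @ theta
    return 1.0 / (1.0 + np.exp(-diff))


def fit_reward(pairs, labels, dim, lr=0.1, epochs=200):
    """Fit theta by gradient descent on the preference cross-entropy loss."""
    theta = np.zeros(dim)
    for _ in range(epochs):
        grad = np.zeros(dim)
        for (seg_a, seg_b), y in zip(pairs, labels):
            p = preference_prob(theta, seg_a, seg_b)
            # Gradient of -[y log p + (1 - y) log(1 - p)] with respect to theta.
            grad += (p - y) * (seg_a.sum(axis=0) - seg_b.sum(axis=0))
        theta -= lr * grad / len(pairs)
    return theta


# Simulated participant with a hidden reward (hypothetical): likes feature 0, dislikes feature 1.
rng = np.random.default_rng(0)
true_theta = np.array([1.0, -1.0])
pairs = [(rng.normal(size=(5, 2)), rng.normal(size=(5, 2))) for _ in range(100)]
labels = [float(a.sum(axis=0) @ true_theta > b.sum(axis=0) @ true_theta) for a, b in pairs]
theta = fit_reward(pairs, labels, dim=2)
print(theta)  # same sign pattern as true_theta
```

In a full cycle, the fitted `theta` would then define the reward used to improve the policy before the next batch of queries.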

    Elucidating a ‘black-box’ transcends explaining the algorithm: Exploring Explainable AI (XAI) as a way to address AI implementation challenges in the Dutch public sector

    No full text
    Given the increasing use of artificial intelligence (AI), we need to ensure that AI applications are designed, implemented, used and evaluated carefully. Explainable AI (XAI) is one of the ways to responsibly design and implement AI systems. Here, XAI means providing a given audience with the details and reasons behind both the technical processes of the algorithm-supported system and the system's reasoning, so that its functioning is clear or easy to understand. This research examines AI-supported public decision-making processes in the Netherlands and the role and possible contribution of XAI in that context. To this end, I conducted a mixed-method qualitative study: I interviewed sixteen respondents from three key-actor groups within two Dutch national public-sector executive bodies, and additionally performed three observations and a document analysis. Distinguishing between the phases of an AI system's implementation life cycle, the study shows how the respective actors (managers, data scientists, and domain experts/(potential) AI users) encounter various challenges in bringing an AI system from idea to production. The empirical findings show that many AI systems, although technically developed, are not deployed or adopted by the wider organisation. The study identifies the challenges hindering the AI implementation process from organisational, human and technical points of view. Moreover, the study highlights the need to approach XAI from a multi-purpose, multi-actor perspective: acknowledging that different actors need different kinds of explanations, and bridging their respective professional worldviews so that they can understand one another. XAI is often seen as a one-size-fits-all solution for various implementation challenges; however, the study shows that certain challenges need to be addressed at least beyond traditional computer-science approaches to XAI, and perhaps beyond XAI altogether.
As such, the insights of this thesis contribute to a more realistic picture of the opportunities and limitations of XAI within real-world AI implementation processes in the public sector. AI Oversight Lab: TNO; Metropolitan Analysis, Design and Engineering (MADE)

    Participatory AI in Marginalized Communities: Exploring Strategies for Inclusive Stakeholder Engagement in Algorithmic Development

    No full text
    In today's society, rapid digitization has led to the automation of many facets of human life. This transformation has been facilitated by algorithms, which are instrumental in driving efficient and effective automated processes. Algorithms have also been widely adopted in the public sector, where they are used to streamline and optimize various tasks and operations. The integration of algorithms in the public sector has brought about significant advancements in areas such as predictive policing, social welfare allocation, and healthcare. However, the development and use of these automated processes have raised public concerns about privacy, bias, accountability, and transparency. Since these concerns come mainly from citizens, involving citizens in the development of algorithmic systems may help address them. We explore the potential of participatory AI in marginalized communities as a means of obtaining valuable input from citizens on the development of the algorithmic systems used by the public sector. Part of our approach involves hosting discussions in local community centers in marginalized neighborhoods. We focus on dilemmas relevant to algorithm design and evaluation decisions, and we frame these dilemmas in various ways, including forms that may not directly relate to societal impact but are understandable to laypeople. Our key findings suggest that involving marginalized citizens can bring valuable perspectives and insights that are otherwise ignored. By incorporating public perspectives into algorithm development, we can promote inclusive decision-making processes and ensure that algorithms align with community values. Computer Science | Data Science and Technology

    Investigating Inverse Reinforcement Learning from Human Behavior: Effect of Demonstrations with Temporal Biases on Learning Rewards using Inverse Reinforcement Learning

    No full text
    Inverse Reinforcement Learning (IRL) is a machine learning technique for learning rewards from the behavior of an expert agent. With complex agents such as humans, the maximized reward may not be easily recoverable, because humans are prone to cognitive biases. Cognitive biases are deviations from rationality that affect everyday human decision-making. Time-inconsistent decision-making is a temporal cognitive bias in which plans for future actions may change depending on the point in time at which they are made. Existing research applies IRL algorithms to numerous real-life situations. However, few works examine the effects of temporal biases on the recovered reward function. Hence, in this research, we propose a methodology to generate synthetic demonstrations that emulate human data with this bias. An existing method, the Maximum Entropy IRL (MEIRL) algorithm, is used to recover reward functions from expert models containing these biases, and the results are compared with the performance of unbiased models. The demonstrations are in the form of a Markov Decision Process (MDP), implemented in a GridWorld environment. Temporal biases are implemented within the expert demonstrations as different types of agents that exhibit specific behaviors. Our findings show that all biases affect reward learning to a considerable extent, with the magnitude of the effect depending on the comparison. CSE3000 Research Project; Computer Science and Engineering
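Time-inconsistent planning is often modeled with hyperbolic discounting. As an illustration (an assumption on our part; the biased agents in this work may be defined differently), the sketch below compares a smaller-sooner and a larger-later reward. Under exponential discounting the preference never changes as both rewards move further into the future, whereas under hyperbolic discounting it reverses; that reversal is the kind of behavior biased demonstrations would pass on to an IRL algorithm such as MEIRL. The reward sizes and discount parameters are hypothetical.

```python
def exponential_discount(delay, gamma=0.9):
    """Time-consistent discounting: gamma ** delay."""
    return gamma ** delay


def hyperbolic_discount(delay, k=1.0):
    """Time-inconsistent discounting: 1 / (1 + k * delay)."""
    return 1.0 / (1.0 + k * delay)


def prefers_larger_later(discount, wait, small=5.0, large=10.0, extra_delay=3):
    """Prefer `large` after wait + extra_delay steps over `small` after `wait` steps?"""
    return large * discount(wait + extra_delay) > small * discount(wait)


for wait in (0, 10):
    print(wait,
          prefers_larger_later(exponential_discount, wait),
          prefers_larger_later(hyperbolic_discount, wait))
# 0 True False   -> up close, the hyperbolic agent takes the smaller-sooner reward
# 10 True True   -> from far away, it plans to wait for the larger reward
```

An agent that plans at step 0 to wait but switches when the reward is near produces demonstrations that no single time-consistent reward explains well.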

    Augment it Maybe?: Improving Deep Vision Models with Adversarial Scene Text Augmentation

    No full text
    Image data augmentation has been regarded as a reliable and effective way to increase the data available for training. With the rise of generative AI, generative data augmentation has been shown to yield even larger performance gains on downstream tasks. However, these gains are often caused by "extra information" seeping into the generated examples via pre-trained model weights, heuristic inclusions, etc. In this paper, we showcase the impact of text-in-image augmentation on the performance of an underlying downstream task (classification or recognition). This study specifically examines the difference in performance when training a classifier under three settings (no augmentation, transform-based augmentation, and generative augmentation) and investigates whether and where augmentation can be employed to achieve performance gains without letting any "extra information" seep in. We observe this difference in performance under varying amounts of training samples, and for samples with varying similarity to the original training data. We also present a new GAN structure, the conditional Classification Deep Convolutional GAN (CcGAN), as an improved baseline over the conditional Deep Convolutional GAN (cDCGAN) for our experiments; it gave a 4% performance gain over unaugmented data with no "extra information". We find that in certain settings and examples, there is a performance advantage to training vision models in text-in-image settings using real and generated data. We also confirm that the amount of original training data available affects the test accuracy achieved by generative augmentation: a large fall-off can be seen in extremely low- and high-data regimes, whereas performance appears to peak at a "sweet spot" where the robustness and variability added by the generated samples help realize performance gains.
It was also observed that the 1x and 5x augmentations performed better than other configurations. Lastly, we find that the similarity of generations does not affect model performance and does not vary consistently with it for most settings. Link to code for this project: http://tinyurl.com/AugmentItMaybe-codebase Computer Science
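The transform-based setting and the "1x"/"5x" configurations can be sketched as follows, assuming (our interpretation, not stated in the abstract) that "Nx" means N augmented copies are added per original image. The transforms (a small random circular shift plus Gaussian pixel noise) are hypothetical stand-ins; the generative setting would replace `transform_augment` with samples from a conditional GAN such as the CcGAN, which is not reproduced here.

```python
import numpy as np


def transform_augment(image, rng, max_shift=2, noise_std=0.05):
    """Hypothetical transform-based augmentation: random circular shift plus pixel noise."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(image, shift=(int(dy), int(dx)), axis=(0, 1))
    noisy = shifted + rng.normal(0.0, noise_std, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)  # keep pixel intensities in [0, 1]


def build_training_set(images, labels, multiplier, rng):
    """Original data plus `multiplier` augmented copies of every image (labels preserved)."""
    all_images, all_labels = [images], [labels]
    for _ in range(multiplier):
        all_images.append(np.stack([transform_augment(img, rng) for img in images]))
        all_labels.append(labels)
    return np.concatenate(all_images), np.concatenate(all_labels)


rng = np.random.default_rng(0)
images = rng.random((4, 8, 8))  # four toy 8x8 grayscale "text-in-image" samples
labels = np.array([0, 1, 0, 1])
x5, y5 = build_training_set(images, labels, multiplier=5, rng=rng)
print(x5.shape, y5.shape)  # (24, 8, 8) (24,)
```

Keeping the original images first in the concatenated set makes it easy to compare the unaugmented, 1x and 5x configurations on the same real data.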

    Counterfactual Explanations of Learned Reward Functions

    No full text
    Learning rewards from humans is a promising approach to aligning AI with human values. However, current methods cannot consistently extract the correct reward functions from demonstrations or feedback. To help humans understand the limitations and misalignments of a learned reward function, we adopt the technique of counterfactual explanations from the field of eXplainable AI (XAI). Concretely, we propose Counterfactual Trajectory Explanations (CTEs), which contrast an original partial trajectory with a counterfactual one and the rewards each receives. We devise and test two methods for generating CTEs, of which a method based on Monte Carlo Tree Search proves the most effective. The CTEs are optimised for six quality criteria that were derived from the literature and tested experimentally. We found that most quality criteria are beneficial for creating more informative CTEs, with Validity contributing especially strongly to making explanations informative. Finally, we measure how informative the generated explanations are to a proxy-human model. While the model is not able to capture all aspects of the reward function, it does learn from the CTEs a substantial amount of knowledge that generalises to different trajectory distributions. These results suggest that applying counterfactuals, and XAI methods more generally, to learned reward functions is a promising avenue for further inquiry. Computer Science
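To make the idea of contrasting an original and a counterfactual trajectory concrete, here is a toy sketch. It deliberately swaps in exhaustive search for the paper's Monte Carlo Tree Search, uses a hypothetical one-dimensional environment with a learned reward `reward_fn`, and scores candidates with two simplified criteria of our own choosing: a validity-like term (how much the learned return changes) and a proximity penalty (how many actions differ). The paper's six quality criteria are not reproduced.

```python
from itertools import product


def rollout(start, actions):
    """Deterministic 1-D environment: each action moves the agent by -1 or +1."""
    states = [start]
    for a in actions:
        states.append(states[-1] + a)
    return states


def learned_return(states, reward_fn):
    """Return assigned by the learned reward function to the visited states."""
    return sum(reward_fn(s) for s in states[1:])


def best_counterfactual(start, original_actions, reward_fn, actions=(-1, 1), lam=0.5):
    """Search all alternative action sequences of the same length for the one that
    best trades off reward contrast (validity-like) against number of changed actions."""
    orig_ret = learned_return(rollout(start, original_actions), reward_fn)
    best, best_score = None, float("-inf")
    for cf_actions in product(actions, repeat=len(original_actions)):
        changes = sum(a != b for a, b in zip(cf_actions, original_actions))
        if changes == 0:
            continue  # identical to the original: not a counterfactual
        cf_ret = learned_return(rollout(start, cf_actions), reward_fn)
        score = abs(cf_ret - orig_ret) - lam * changes
        if score > best_score:
            best, best_score = (list(cf_actions), orig_ret, cf_ret), score
    return best


# Hypothetical learned reward: proportional to position.
cf_actions, orig_ret, cf_ret = best_counterfactual(0, [1, 1, 1], reward_fn=lambda s: float(s))
print(cf_actions, orig_ret, cf_ret)  # [-1, -1, -1] 6.0 -6.0
```

Raising `lam` (for example to 5.0) makes the search favor a single changed action, `[-1, 1, 1]`, which shows how a proximity criterion trades contrast for a simpler explanation.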

    Improving Confidence in the Estimation of Values and Norms

    No full text
    Autonomous agents (AAs) will increasingly interact with us in our daily lives. While we want the benefits that AAs bring, it is essential that their behavior is aligned with our values and norms. Hence, an AA will need to estimate the values and norms of the humans it interacts with, which is not straightforward when it can only observe an agent's behavior. This paper analyses to what extent an AA is able to estimate the values and norms of a simulated human agent (SHA) based on its actions in the ultimatum game. We present two methods to reduce ambiguity in profiling the SHAs: one based on search space exploration and another based on counterfactual analysis. We found that both methods increase the confidence in estimating human values and norms, but differ in their applicability, the latter being more efficient when the number of interactions with the agent is to be minimized. These insights are useful to improve the alignment of AAs with human values and norms. Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public. Interactive Intelligence; Information and Communication Technology; Ethics & Philosophy of Technology
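One way to picture ambiguity reduction is as hypothesis elimination. In this simplified sketch (our own assumption; the paper's value and norm profiles are richer), each hypothetical profile of the simulated human agent is reduced to a minimum acceptable share in the ultimatum game. Observed accept/reject decisions rule out inconsistent profiles, and each next offer is the one whose predicted outcome splits the remaining profiles most evenly, which keeps the number of interactions low. This is not the paper's search-space or counterfactual-analysis method.

```python
def consistent_profiles(profiles, observations):
    """Keep profiles (minimum acceptable share, in %) consistent with every observed decision."""
    return [p for p in profiles
            if all((offer >= p) == accepted for offer, accepted in observations)]


def most_informative_offer(profiles, candidate_offers):
    """Offer whose predicted accept/reject outcome splits the remaining profiles most evenly."""
    def imbalance(offer):
        accepts = sum(offer >= p for p in profiles)
        return abs(2 * accepts - len(profiles))
    return min(candidate_offers, key=imbalance)


def estimate_profile(true_threshold, profiles, offers):
    """Interact with a simulated responder until only one profile remains."""
    observations, remaining = [], list(profiles)
    while len(remaining) > 1:
        offer = most_informative_offer(remaining, offers)
        observations.append((offer, offer >= true_threshold))  # simulated decision
        remaining = consistent_profiles(profiles, observations)
    return remaining, observations


shares = list(range(0, 100, 10))  # hypothetical profiles and offers: 0%, 10%, ..., 90%
remaining, observations = estimate_profile(30, shares, shares)
print(remaining, len(observations))  # [30] 4
```

Choosing offers by how evenly they split the hypotheses behaves like a binary search, so ten candidate profiles are resolved in at most four interactions here.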