{"id":92,"date":"2015-06-25T16:25:54","date_gmt":"2015-06-25T20:25:54","guid":{"rendered":"http:\/\/avid.cs.umass.edu\/wordpress\/?page_id=92"},"modified":"2020-04-11T06:31:18","modified_gmt":"2020-04-11T10:31:18","slug":"publications","status":"publish","type":"page","link":"https:\/\/dream.cs.umass.edu\/?page_id=92","title":{"rendered":"Publications"},"content":{"rendered":"<div class=\"teachpress_pub_list\"><form name=\"tppublistform\" method=\"get\"><a name=\"tppubs\" id=\"tppubs\"><\/a><\/form><div class=\"tablenav\"><div class=\"tablenav-pages\"><span class=\"displaying-num\">249 entries<\/span> <a class=\"page-numbers button disabled\">&laquo;<\/a> <a class=\"page-numbers button disabled\">&lsaquo;<\/a>  1 of 5 <a href=\"https:\/\/dream.cs.umass.edu\/?page_id=92&amp;limit=2&amp;tgid=&amp;yr=&amp;type=&amp;usr=&amp;auth=&amp;tsr=#tppubs\" title=\"next page\" class=\"page-numbers button\">&rsaquo;<\/a> <a href=\"https:\/\/dream.cs.umass.edu\/?page_id=92&amp;limit=5&amp;tgid=&amp;yr=&amp;type=&amp;usr=&amp;auth=&amp;tsr=#tppubs\" title=\"last page\" class=\"page-numbers button\">&raquo;<\/a> <\/div><\/div><table class=\"teachpress_publication_list\"><tr>\r\n                    <td>\r\n                        <h3 class=\"tp_h3\" id=\"tp_h3_2025\">2025<\/h3>\r\n                    <\/td>\r\n                <\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Soumyadip Ghosh, Peter J Haas, L Jeff Hong, Jonathan Ozik, Benjamin Thengvall<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('343','tp_abstract')\" style=\"cursor:pointer;\">Simulation Optimization 2050 and Beyond<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">2025 Winter Simulation Conference (WSC), <\/span><span class=\"tp_pub_additional_year\">2025<\/span>.<\/p><p class=\"tp_pub_tags\"><span 
class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_343\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('343','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_343\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('343','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_343\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{Ghosh2025,<br \/>\r\ntitle = {Simulation Optimization 2050 and Beyond},<br \/>\r\nauthor = {Soumyadip Ghosh and Peter J Haas and L Jeff Hong and Jonathan Ozik and Benjamin Thengvall},<br \/>\r\nyear  = {2025},<br \/>\r\ndate = {2025-12-07},<br \/>\r\njournal = {2025 Winter Simulation Conference (WSC)},<br \/>\r\nabstract = {The goal of this panel was to envision the future of simulation optimization research and practice over the next 25 years. The panel was composed of five simulation researchers from academia, industry, and research laboratories who shared their perspectives on the challenges and opportunities facing the field in light of contemporary advances in artificial intelligence, machine learning and computing hardware. The panelists also discussed the role simulation optimization can, should, and will play in supporting future decision-making under uncertainty. This paper serves as a collection of the panelists' prepared statements.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('343','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_343\" style=\"display:none;\"><div class=\"tp_abstract_entry\">The goal of this panel was to envision the future of simulation optimization research and practice over the next 25 years. 
The panel was composed of five simulation researchers from academia, industry, and research laboratories who shared their perspectives on the challenges and opportunities facing the field in light of contemporary advances in artificial intelligence, machine learning and computing hardware. The panelists also discussed the role simulation optimization can, should, and will play in supporting future decision-making under uncertainty. This paper serves as a collection of the panelists' prepared statements.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('343','tp_abstract')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Sneha Gathani, Kevin Li, Raghav Thind, Sirui Zeng, Matthew Xu, Peter J Haas, Cagatay Demiralp, Zhicheng Liu<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('342','tp_abstract')\" style=\"cursor:pointer;\">PRAXA: A Framework for What-If Analysis<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">arXiv preprint arXiv:2510.09791, <\/span><span class=\"tp_pub_additional_year\">2025<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_342\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('342','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_342\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('342','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_342\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('342','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_342\" 
style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{Gathani2025b,<br \/>\r\ntitle = {PRAXA: A Framework for What-If Analysis},<br \/>\r\nauthor = {Sneha Gathani and Kevin Li and Raghav Thind and Sirui Zeng and Matthew Xu and Peter J Haas and Cagatay Demiralp and Zhicheng Liu},<br \/>\r\nurl = {https:\/\/arxiv.org\/abs\/2510.09791},<br \/>\r\nyear  = {2025},<br \/>\r\ndate = {2025-10-10},<br \/>\r\njournal = {arXiv preprint arXiv:2510.09791},<br \/>\r\nabstract = {Various analytical techniques-such as scenario modeling, sensitivity analysis, perturbation-based analysis, counterfactual analysis, and parameter space analysis-are used across domains to explore hypothetical scenarios, examine input-output relationships, and identify pathways to desired results. Although termed differently, these methods share common concepts and methods, suggesting unification under what-if analysis. Yet a unified framework to define motivations, core components, and its distinct types is lacking. To address this gap, we reviewed 141 publications from leading visual analytics and HCI venues (2014-2024). Our analysis (1) outlines the motivations for what-if analysis, (2) introduces Praxa, a structured framework that identifies its fundamental components and characterizes its distinct types, and (3) highlights challenges associated with the application and implementation. Together, our findings establish a standardized vocabulary and structural understanding, enabling more consistent use across domains and communicate with greater conceptual clarity. 
Finally, we identify open research problems and future directions to advance what-if analysis.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('342','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_342\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Various analytical techniques-such as scenario modeling, sensitivity analysis, perturbation-based analysis, counterfactual analysis, and parameter space analysis-are used across domains to explore hypothetical scenarios, examine input-output relationships, and identify pathways to desired results. Although termed differently, these methods share common concepts and methods, suggesting unification under what-if analysis. Yet a unified framework to define motivations, core components, and its distinct types is lacking. To address this gap, we reviewed 141 publications from leading visual analytics and HCI venues (2014-2024). Our analysis (1) outlines the motivations for what-if analysis, (2) introduces Praxa, a structured framework that identifies its fundamental components and characterizes its distinct types, and (3) highlights challenges associated with the application and implementation. Together, our findings establish a standardized vocabulary and structural understanding, enabling more consistent use across domains and communicate with greater conceptual clarity. 
Finally, we identify open research problems and future directions to advance what-if analysis.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('342','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_342\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"ai ai-arxiv\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/arxiv.org\/abs\/2510.09791\" title=\"https:\/\/arxiv.org\/abs\/2510.09791\" target=\"_blank\">https:\/\/arxiv.org\/abs\/2510.09791<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('342','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_book\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Mohammad Hadi Nezhad, Francisco Enrique Vicente Castro, Eugene Mak, Peter J Haas, Danielle Allessio, Leon Osterweil, Injila Rasul, Heather Conboy, Ivon Arroyo<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('341','tp_abstract')\" style=\"cursor:pointer;\">Embedding Ethical Awareness in Computer Science and AI Education: The PEaRCE Approach to Responsible Computing<\/a> <span class=\"tp_pub_type book\">Book<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_publisher\">Springer Nature Switzerland, <\/span><span class=\"tp_pub_additional_year\">2025<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_341\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('341','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_341\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('341','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_341\" class=\"tp_show\" 
onclick=\"teachpress_pub_showhide('341','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_341\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@book{Nezhad2025,<br \/>\r\ntitle = {Embedding Ethical Awareness in Computer Science and AI Education: The PEaRCE Approach to Responsible Computing},<br \/>\r\nauthor = {Mohammad Hadi Nezhad and Francisco Enrique Vicente Castro and Eugene Mak and Peter J Haas and Danielle Allessio and Leon Osterweil and Injila Rasul and Heather Conboy and Ivon Arroyo},<br \/>\r\nurl = {https:\/\/link.springer.com\/chapter\/10.1007\/978-3-031-98414-3_10},<br \/>\r\nyear  = {2025},<br \/>\r\ndate = {2025-07-15},<br \/>\r\njournal = {International Conference on Artificial Intelligence in Education},<br \/>\r\npages = {135-149},<br \/>\r\npublisher = {Springer Nature Switzerland},<br \/>\r\nabstract = {Despite the widespread use of computer systems, their societal impacts are often poorly understood, highlighting the need for the AIED community to acknowledge and contribute to incorporating Ethical and Responsible Computing (ERC) into Computer Science (CS) education. We introduce the Platform for Ethical and Responsible Computing Education (PEaRCE), an interactive tool created to integrate ERC into post-secondary education through realistic workplace simulations. PEaRCE scenarios guide students through ERC quandaries\u2014awareness, decision-making, feedback\u2014during the processes of developing advanced AIED and other technologies that may have beneficial and harmful impacts. Moreover, we integrate PEaRCE into CS courses via a sequence of structured learning modules and trained \u201cERC Teaching Assistants (TAs)\u201d to support the integration process. 
We present insights from our deployment experiences, show PEaRCE\u2019s potential to enhance ERC awareness and reasoning, and discuss the possibility of embedding PEaRCE into AI in Education courses.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {book}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('341','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_341\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Despite the widespread use of computer systems, their societal impacts are often poorly understood, highlighting the need for the AIED community to acknowledge and contribute to incorporating Ethical and Responsible Computing (ERC) into Computer Science (CS) education. We introduce the Platform for Ethical and Responsible Computing Education (PEaRCE), an interactive tool created to integrate ERC into post-secondary education through realistic workplace simulations. PEaRCE scenarios guide students through ERC quandaries\u2014awareness, decision-making, feedback\u2014during the processes of developing advanced AIED and other technologies that may have beneficial and harmful impacts. Moreover, we integrate PEaRCE into CS courses via a sequence of structured learning modules and trained \u201cERC Teaching Assistants (TAs)\u201d to support the integration process. 
We present insights from our deployment experiences, show PEaRCE\u2019s potential to enhance ERC awareness and reasoning, and discuss the possibility of embedding PEaRCE into AI in Education courses.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('341','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_341\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/link.springer.com\/chapter\/10.1007\/978-3-031-98414-3_10\" title=\"https:\/\/link.springer.com\/chapter\/10.1007\/978-3-031-98414-3_10\" target=\"_blank\">https:\/\/link.springer.com\/chapter\/10.1007\/978-3-031-98414-3_10<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('341','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Sneha Gathani, Zhicheng Liu, Peter J Haas, \u00c7a\u011fatay Demiralp<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('340','tp_abstract')\" style=\"cursor:pointer;\">What-if analysis for business professionals: Current practices and future opportunities<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, <\/span><span class=\"tp_pub_additional_pages\">pp. 
1-17, <\/span><span class=\"tp_pub_additional_year\">2025<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_340\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('340','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_340\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('340','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_340\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('340','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_340\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{Gathani2025,<br \/>\r\ntitle = {What-if analysis for business professionals: Current practices and future opportunities},<br \/>\r\nauthor = {Sneha Gathani and Zhicheng Liu and Peter J Haas and \u00c7a\u011fatay Demiralp},<br \/>\r\nurl = {https:\/\/dl.acm.org\/doi\/full\/10.1145\/3706598.3713672},<br \/>\r\nyear  = {2025},<br \/>\r\ndate = {2025-04-26},<br \/>\r\njournal = {Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems},<br \/>\r\npages = {1-17},<br \/>\r\nabstract = {What-if analysis (WIA) is essential for data-driven decision-making, allowing users to assess how changes in variables impact outcomes and explore alternative scenarios. Existing WIA research primarily supports the workflows of data scientists and analysts, and largely overlooks business professionals who engage in WIA through non-technical means. To bridge this gap, we conduct a two-part user study with 22 business professionals across marketing, sales, product, and operations roles. The first study examines their existing WIA practices, tools, and challenges. 
Findings reveal that business professionals perform many WIA techniques independently using rudimentary tools due to various constraints. We then implement representative WIA techniques in a visual analytics prototype and use it as a probe to conduct a follow-up study evaluating business professionals\u2019 practical use of the techniques. Results show that these techniques improve decision-making efficiency and confidence while underscoring the need for better support in data preparation, risk assessment, and domain knowledge integration. Finally, we offer design recommendations to enhance future business analytics systems.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('340','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_340\" style=\"display:none;\"><div class=\"tp_abstract_entry\">What-if analysis (WIA) is essential for data-driven decision-making, allowing users to assess how changes in variables impact outcomes and explore alternative scenarios. Existing WIA research primarily supports the workflows of data scientists and analysts, and largely overlooks business professionals who engage in WIA through non-technical means. To bridge this gap, we conduct a two-part user study with 22 business professionals across marketing, sales, product, and operations roles. The first study examines their existing WIA practices, tools, and challenges. Findings reveal that business professionals perform many WIA techniques independently using rudimentary tools due to various constraints. We then implement representative WIA techniques in a visual analytics prototype and use it as a probe to conduct a follow-up study evaluating business professionals\u2019 practical use of the techniques. 
Results show that these techniques improve decision-making efficiency and confidence while underscoring the need for better support in data preparation, risk assessment, and domain knowledge integration. Finally, we offer design recommendations to enhance future business analytics systems.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('340','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_340\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dl.acm.org\/doi\/full\/10.1145\/3706598.3713672\" title=\"https:\/\/dl.acm.org\/doi\/full\/10.1145\/3706598.3713672\" target=\"_blank\">https:\/\/dl.acm.org\/doi\/full\/10.1145\/3706598.3713672<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('340','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Amir Khosheghbal, Peter J Haas, Chaitra Gopalappa<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('339','tp_abstract')\" style=\"cursor:pointer;\">Mechanistic modeling of social conditions in disease-prediction simulations via copulas and probabilistic graphical models: HIV case study<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">Health Care Management Science, <\/span><span class=\"tp_pub_additional_volume\">28  <\/span>, <span class=\"tp_pub_additional_pages\">pp. 
28-49, <\/span><span class=\"tp_pub_additional_year\">2025<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_339\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('339','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_339\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('339','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_339\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('339','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_339\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{Khosheghbal2025,<br \/>\r\ntitle = {Mechanistic modeling of social conditions in disease-prediction simulations via copulas and probabilistic graphical models: HIV case study},<br \/>\r\nauthor = {Amir Khosheghbal and Peter J Haas and Chaitra Gopalappa},<br \/>\r\nurl = {https:\/\/link.springer.com\/article\/10.1007\/s10729-024-09694-3},<br \/>\r\nyear  = {2025},<br \/>\r\ndate = {2025-03-05},<br \/>\r\njournal = {Health Care Management Science},<br \/>\r\nvolume = {28 },<br \/>\r\npages = {28-49},<br \/>\r\nabstract = {As social and economic conditions are key determinants of HIV, the United States \u2018National HIV\/AIDS Strategy (NHAS)\u2019, in addition to care and treatment, aims to address mental health, unemployment, food insecurity, and housing instability, as part of its strategic plan for the \u2018Ending the HIV Epidemic\u2019 initiative. Although mechanistic models of HIV play a key role in evaluating intervention strategies, social conditions are typically not part of the modeling framework. Challenges include the unavailability of coherent statistical data for social conditions and behaviors. 
We developed a method, combining undirected graphical modeling with copula methods, to integrate disparate data sources, to estimate joint probability distributions for social conditions and behaviors. We incorporated these in a national-level network model, Progression and Transmission of HIV (PATH 4.0), to simulate behaviors as functions of social conditions and HIV transmissions as a function of behaviors. As a demonstration for the potential applications of such a model, we conducted two hypothetical what-if intervention analyses to estimate the impact of an ideal 100% efficacious intervention strategy. The first analysis modeled care behavior (using viral suppression as proxy) as a function of depression, neighborhood, housing, poverty, education, insurance, and employment status. The second modeled sexual behaviors (number of partners and condom-use) as functions of employment, housing, poverty, and education status, among persons who exchange sex. HIV transmissions and disease progression were then simulated as functions of behaviors to estimate incidence reductions. Social determinants are key drivers of many infectious and non-infectious diseases. 
Our work enables the development of decision support tools to holistically evaluate the syndemics of health and social inequity.<br \/>\r\n},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('339','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_339\" style=\"display:none;\"><div class=\"tp_abstract_entry\">As social and economic conditions are key determinants of HIV, the United States \u2018National HIV\/AIDS Strategy (NHAS)\u2019, in addition to care and treatment, aims to address mental health, unemployment, food insecurity, and housing instability, as part of its strategic plan for the \u2018Ending the HIV Epidemic\u2019 initiative. Although mechanistic models of HIV play a key role in evaluating intervention strategies, social conditions are typically not part of the modeling framework. Challenges include the unavailability of coherent statistical data for social conditions and behaviors. We developed a method, combining undirected graphical modeling with copula methods, to integrate disparate data sources, to estimate joint probability distributions for social conditions and behaviors. We incorporated these in a national-level network model, Progression and Transmission of HIV (PATH 4.0), to simulate behaviors as functions of social conditions and HIV transmissions as a function of behaviors. As a demonstration for the potential applications of such a model, we conducted two hypothetical what-if intervention analyses to estimate the impact of an ideal 100% efficacious intervention strategy. The first analysis modeled care behavior (using viral suppression as proxy) as a function of depression, neighborhood, housing, poverty, education, insurance, and employment status. 
The second modeled sexual behaviors (number of partners and condom-use) as functions of employment, housing, poverty, and education status, among persons who exchange sex. HIV transmissions and disease progression were then simulated as functions of behaviors to estimate incidence reductions. Social determinants are key drivers of many infectious and non-infectious diseases. Our work enables the development of decision support tools to holistically evaluate the syndemics of health and social inequity.<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('339','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_339\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/link.springer.com\/article\/10.1007\/s10729-024-09694-3\" title=\"https:\/\/link.springer.com\/article\/10.1007\/s10729-024-09694-3\" target=\"_blank\">https:\/\/link.springer.com\/article\/10.1007\/s10729-024-09694-3<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('339','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_presentation\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Alexandra Meliou, Azza Abouzied, Peter J Haas, Riddho R Haque, Anh Mai, Vasileios Vittis<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('338','tp_abstract')\" style=\"cursor:pointer;\">Data management perspectives on prescriptive analytics (invited talk)<\/a> <span class=\"tp_pub_type presentation\">Presentation<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_date\">14.05.2025<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_338\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('338','tp_abstract')\" title=\"Show abstract\" 
style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_338\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('338','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_338\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('338','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_338\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@misc{Meliou2025,<br \/>\r\ntitle = {Data management perspectives on prescriptive analytics (invited talk)},<br \/>\r\nauthor = {Alexandra Meliou and Azza Abouzied and Peter J Haas and Riddho R Haque and Anh Mai and Vasileios Vittis},<br \/>\r\nurl = {https:\/\/par.nsf.gov\/biblio\/10627357},<br \/>\r\nyear  = {2025},<br \/>\r\ndate = {2025-05-14},<br \/>\r\nabstract = {Decision makers in a broad range of domains, such as finance, transportation, manufacturing, and healthcare, often need to derive optimal decisions given a set of constraints and objectives. Traditional solutions to such constrained optimization problems are typically application-specific, complex, and do not generalize. Further, the usual workflow requires slow, cumbersome, and error-prone data movement between a database, and predictive-modeling and optimization packages. All of these problems are exacerbated by the unprecedented size of modern data-intensive optimization problems. The emerging research area of in-database prescriptive analytics aims to provide seamless domain-independent, declarative, and scalable approaches powered by the system where the data typically resides: the database. Integrating optimization with database technology opens up prescriptive analytics to a much broader community, amplifying its benefits. 
We discuss how deep integration between the DBMS, predictive models, and optimization software creates opportunities for rich prescriptive-query functionality with good scalability and performance. Summarizing some of our main results and ongoing work in this area, we highlight challenges related to usability, scalability, data uncertainty, and dynamic environments, and argue that perspectives from data management research can drive novel strategies and solutions.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {presentation}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('338','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_338\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Decision makers in a broad range of domains, such as finance, transportation, manufacturing, and healthcare, often need to derive optimal decisions given a set of constraints and objectives. Traditional solutions to such constrained optimization problems are typically application-specific, complex, and do not generalize. Further, the usual workflow requires slow, cumbersome, and error-prone data movement between a database, and predictive-modeling and optimization packages. All of these problems are exacerbated by the unprecedented size of modern data-intensive optimization problems. The emerging research area of in-database prescriptive analytics aims to provide seamless domain-independent, declarative, and scalable approaches powered by the system where the data typically resides: the database. Integrating optimization with database technology opens up prescriptive analytics to a much broader community, amplifying its benefits. We discuss how deep integration between the DBMS, predictive models, and optimization software creates opportunities for rich prescriptive-query functionality with good scalability and performance. 
Summarizing some of our main results and ongoing work in this area, we highlight challenges related to usability, scalability, data uncertainty, and dynamic environments, and argue that perspectives from data management research can drive novel strategies and solutions.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('338','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_338\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/par.nsf.gov\/biblio\/10627357\" title=\"https:\/\/par.nsf.gov\/biblio\/10627357\" target=\"_blank\">https:\/\/par.nsf.gov\/biblio\/10627357<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('338','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr>\r\n                    <td>\r\n                        <h3 class=\"tp_h3\" id=\"tp_h3_2024\">2024<\/h3>\r\n                    <\/td>\r\n                <\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Peter J Haas<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('337','tp_abstract')\" style=\"cursor:pointer;\">Tutorial: Artificial Neural Networks for Discrete-Event Simulation<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">2024 Winter Simulation Conference (WSC), <\/span><span class=\"tp_pub_additional_year\">2024<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_337\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('337','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_337\" class=\"tp_show\" 
onclick=\"teachpress_pub_showhide('337','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_337\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('337','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_337\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{Haas2024,<br \/>\r\ntitle = {Tutorial: Artificial Neural Networks for Discrete-Event Simulation},<br \/>\r\nauthor = {Peter J Haas},<br \/>\r\nurl = {https:\/\/ieeexplore.ieee.org\/abstract\/document\/10838940},<br \/>\r\ndoi = {10.1109\/WSC63780.2024.10838940},<br \/>\r\nyear  = {2024},<br \/>\r\ndate = {2024-12-15},<br \/>\r\njournal = {2024 Winter Simulation Conference (WSC)},<br \/>\r\nabstract = {This advanced tutorial explores some recent applications of artificial neural networks (ANNs) to stochastic discrete-event simulation (DES). We first review some basic concepts and then give examples of how ANNs are being used in the context of DES to facilitate simulation input modeling, random variate generation, simulation metamodeling, optimization via simulation, and more. Combining ANNs and DES allows exploitation of the deep domain knowledge embodied in simulation models while simultaneously leveraging the ability of modern ML techniques to capture complex patterns and relationships in data.<br \/>\r\n},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('337','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_337\" style=\"display:none;\"><div class=\"tp_abstract_entry\">This advanced tutorial explores some recent applications of artificial neural networks (ANNs) to stochastic discrete-event simulation (DES). 
We first review some basic concepts and then give examples of how ANNs are being used in the context of DES to facilitate simulation input modeling, random variate generation, simulation metamodeling, optimization via simulation, and more. Combining ANNs and DES allows exploitation of the deep domain knowledge embodied in simulation models while simultaneously leveraging the ability of modern ML techniques to capture complex patterns and relationships in data.<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('337','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_337\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/ieeexplore.ieee.org\/abstract\/document\/10838940\" title=\"https:\/\/ieeexplore.ieee.org\/abstract\/document\/10838940\" target=\"_blank\">https:\/\/ieeexplore.ieee.org\/abstract\/document\/10838940<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1109\/WSC63780.2024.10838940\" title=\"Follow DOI:10.1109\/WSC63780.2024.10838940\" target=\"_blank\">doi:10.1109\/WSC63780.2024.10838940<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('337','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Riddho R Haque, Anh L Mai, Matteo Brucato, Azza Abouzied, Peter J Haas, Alexandra Meliou<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('336','tp_abstract')\" style=\"cursor:pointer;\">Stochastic sketchrefine: Scaling in-database decision-making under uncertainty to millions of tuples<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">arXiv 
preprint arXiv:2411.17915, <\/span><span class=\"tp_pub_additional_year\">2024<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_336\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('336','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_336\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('336','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_336\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('336','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_336\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{Haque2024,<br \/>\r\ntitle = {Stochastic sketchrefine: Scaling in-database decision-making under uncertainty to millions of tuples},<br \/>\r\nauthor = {Riddho R Haque and Anh L Mai and Matteo Brucato and Azza Abouzied and Peter J Haas and Alexandra Meliou},<br \/>\r\nurl = {https:\/\/arxiv.org\/abs\/2411.17915},<br \/>\r\nyear  = {2024},<br \/>\r\ndate = {2024-11-24},<br \/>\r\njournal = {arXiv preprint arXiv:2411.17915},<br \/>\r\nabstract = {Decision making under uncertainty often requires choosing packages, or bags of tuples, that collectively optimize expected outcomes while limiting risks. Processing Stochastic Package Queries (SPQs) involves solving very large optimization problems on uncertain data. Monte Carlo methods create numerous scenarios, or sample realizations of the stochastic attributes of all the tuples, and generate packages with optimal objective values across these scenarios. 
The number of scenarios needed for accurate approximation - and hence the size of the optimization problem when using prior methods - increases with variance in the data, and the search space of the optimization problem increases exponentially with the number of tuples in the relation. Existing solvers take hours to process SPQs on large relations containing stochastic attributes with high variance. Besides enriching the SPaQL language to capture a broader class of risk specifications, we make two fundamental contributions towards scalable SPQ processing. First, to handle high variance, we propose risk-constraint linearization (RCL), which converts SPQs into Integer Linear Programs (ILPs) whose size is independent of the number of scenarios used. Solving these ILPs gives us feasible and near-optimal packages. Second, we propose Stochastic SketchRefine, a divide and conquer framework that breaks down a large stochastic optimization problem into subproblems involving smaller subsets of tuples. Our experiments show that, together, RCL and Stochastic SketchRefine produce high-quality packages in orders of magnitude lower runtime than the state of the art.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('336','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_336\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Decision making under uncertainty often requires choosing packages, or bags of tuples, that collectively optimize expected outcomes while limiting risks. Processing Stochastic Package Queries (SPQs) involves solving very large optimization problems on uncertain data. Monte Carlo methods create numerous scenarios, or sample realizations of the stochastic attributes of all the tuples, and generate packages with optimal objective values across these scenarios. 
The number of scenarios needed for accurate approximation, and hence the size of the optimization problem when using prior methods, increases with variance in the data, and the search space of the optimization problem increases exponentially with the number of tuples in the relation. Existing solvers take hours to process SPQs on large relations containing stochastic attributes with high variance. Besides enriching the SPaQL language to capture a broader class of risk specifications, we make two fundamental contributions towards scalable SPQ processing. First, to handle high variance, we propose risk-constraint linearization (RCL), which converts SPQs into Integer Linear Programs (ILPs) whose size is independent of the number of scenarios used. Solving these ILPs gives us feasible and near-optimal packages. Second, we propose Stochastic SketchRefine, a divide-and-conquer framework that breaks down a large stochastic optimization problem into subproblems involving smaller subsets of tuples. 
Our experiments show that, together, RCL and Stochastic SketchRefine produce high-quality packages in orders of magnitude lower runtime than the state of the art.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('336','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_336\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"ai ai-arxiv\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/arxiv.org\/abs\/2411.17915\" title=\"https:\/\/arxiv.org\/abs\/2411.17915\" target=\"_blank\">https:\/\/arxiv.org\/abs\/2411.17915<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('336','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr>\r\n                    <td>\r\n                        <h3 class=\"tp_h3\" id=\"tp_h3_2023\">2023<\/h3>\r\n                    <\/td>\r\n                <\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Wang Cen, Peter J Haas<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('335','tp_abstract')\" style=\"cursor:pointer;\">Efficient Hybrid Simulation Optimization via Graph Neural Network Metamodeling<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">2023 Winter Simulation Conference (WSC), <\/span><span class=\"tp_pub_additional_year\">2023<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_335\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('335','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_335\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('335','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | 
<span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_335\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('335','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_335\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{Cen2023,<br \/>\r\ntitle = {Efficient Hybrid Simulation Optimization via Graph Neural Network Metamodeling},<br \/>\r\nauthor = {Wang Cen and Peter J Haas},<br \/>\r\ndoi = {10.1109\/WSC60868.2023.10408474},<br \/>\r\nyear  = {2023},<br \/>\r\ndate = {2023-12-10},<br \/>\r\njournal = {2023 Winter Simulation Conference (WSC)},<br \/>\r\nabstract = {Simulation metamodeling is essential for speeding up optimization via simulation to support rapid decision making. During optimization, the metamodel, rather than expensive simulation, is used to compute objective values. We recently developed graphical neural metamodels (GMMs) that use graph neural networks to allow the graphical structure of a simulation model to be treated as a metamodel input parameter that can be varied along with scalar inputs. In this paper we provide novel methods for using GMMs to solve hybrid optimization problems where both real-valued input parameters and graphical structure are jointly optimized. The key ideas are to modify Monte Carlo tree search to incorporate both discrete and continuous optimization and to leverage the automatic differentiation infrastructure used for neural network training to quickly compute gradients of the objective function during stochastic gradient descent. 
Experiments on stochastic activity network and warehouse models demonstrate the potential of our method.<br \/>\r\n},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('335','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_335\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Simulation metamodeling is essential for speeding up optimization via simulation to support rapid decision making. During optimization, the metamodel, rather than expensive simulation, is used to compute objective values. We recently developed graphical neural metamodels (GMMs) that use graph neural networks to allow the graphical structure of a simulation model to be treated as a metamodel input parameter that can be varied along with scalar inputs. In this paper we provide novel methods for using GMMs to solve hybrid optimization problems where both real-valued input parameters and graphical structure are jointly optimized. The key ideas are to modify Monte Carlo tree search to incorporate both discrete and continuous optimization and to leverage the automatic differentiation infrastructure used for neural network training to quickly compute gradients of the objective function during stochastic gradient descent. 
Experiments on stochastic activity network and warehouse models demonstrate the potential of our method.<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('335','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_335\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1109\/WSC60868.2023.10408474\" title=\"Follow DOI:10.1109\/WSC60868.2023.10408474\" target=\"_blank\">doi:10.1109\/WSC60868.2023.10408474<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('335','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Pracheta Amaranath, Peter J. Haas, David Jensen, Sam Witty<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('334','tp_abstract')\" style=\"cursor:pointer;\">Causal Dynamic Bayesian Networks for Simulation Metamodeling<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">2023 Winter Simulation Conference (WSC), <\/span><span class=\"tp_pub_additional_pages\">pp. 
746-757, <\/span><span class=\"tp_pub_additional_year\">2023<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_334\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('334','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_334\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('334','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_334\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('334','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_334\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{test,<br \/>\r\ntitle = {Causal Dynamic Bayesian Networks for Simulation Metamodeling},<br \/>\r\nauthor = {Pracheta Amaranath and Peter J. Haas and David Jensen and Sam Witty},<br \/>\r\nurl = {https:\/\/ieeexplore.ieee.org\/abstract\/document\/10407746},<br \/>\r\ndoi = {10.1109\/WSC60868.2023.10407746},<br \/>\r\nyear  = {2023},<br \/>\r\ndate = {2023-12-10},<br \/>\r\njournal = {2023 Winter Simulation Conference (WSC)},<br \/>\r\npages = {746-757},<br \/>\r\nabstract = {A traditional metamodel for a discrete-event simulation approximates a real-valued performance measure as a function of the input-parameter values. We introduce a novel class of metamodels based on modular dynamic Bayesian networks (MDBNs), a subclass of probabilistic graphical models which can be used to efficiently answer a rich class of probabilistic and causal queries (PCQs). Such queries represent the joint probability distribution of the system state at multiple time points, given observations of, and interventions on, other state variables and input parameters. 
This paper is a first demonstration of how the extensive theory and technology of causal graphical models can be used to enhance simulation metamodeling. We demonstrate this potential by showing how a single MDBN for an M\/M\/1 queue can be learned from simulation data and then be used to quickly and accurately answer a variety of PCQs, most of which are out-of-scope for existing metamodels.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('334','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_334\" style=\"display:none;\"><div class=\"tp_abstract_entry\">A traditional metamodel for a discrete-event simulation approximates a real-valued performance measure as a function of the input-parameter values. We introduce a novel class of metamodels based on modular dynamic Bayesian networks (MDBNs), a subclass of probabilistic graphical models which can be used to efficiently answer a rich class of probabilistic and causal queries (PCQs). Such queries represent the joint probability distribution of the system state at multiple time points, given observations of, and interventions on, other state variables and input parameters. This paper is a first demonstration of how the extensive theory and technology of causal graphical models can be used to enhance simulation metamodeling. 
We demonstrate this potential by showing how a single MDBN for an M\/M\/1 queue can be learned from simulation data and then be used to quickly and accurately answer a variety of PCQs, most of which are out-of-scope for existing metamodels.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('334','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_334\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/ieeexplore.ieee.org\/abstract\/document\/10407746\" title=\"https:\/\/ieeexplore.ieee.org\/abstract\/document\/10407746\" target=\"_blank\">https:\/\/ieeexplore.ieee.org\/abstract\/document\/10407746<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1109\/WSC60868.2023.10407746\" title=\"Follow DOI:10.1109\/WSC60868.2023.10407746\" target=\"_blank\">doi:10.1109\/WSC60868.2023.10407746<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('334','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Juelin Liu, Sandeep Polisetty, Hui Guan, Marco Serafini<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('333','tp_abstract')\" style=\"cursor:pointer;\">GraphMini: Accelerating Graph Pattern Matching Using Auxiliary Graphs<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">International Conference on Parallel Architectures and Compilation Techniques, <\/span><span class=\"tp_pub_additional_year\">2023<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_333\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('333','tp_abstract')\" 
title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_333\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('333','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_333\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('333','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_333\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{liu2023graphmini,<br \/>\r\ntitle = {GraphMini: Accelerating Graph Pattern Matching Using Auxiliary Graphs},<br \/>\r\nauthor = {Juelin Liu and Sandeep Polisetty and Hui Guan and Marco Serafini},<br \/>\r\nurl = {https:\/\/marcoserafini.github.io\/assets\/pdf\/GraphMini.pdf},<br \/>\r\nyear  = {2023},<br \/>\r\ndate = {2023-11-01},<br \/>\r\njournal = {International Conference on Parallel Architectures and Compilation Techniques},<br \/>\r\nabstract = {Graph pattern matching is a fundamental problem encountered by many common graph mining tasks and the basic building block of several graph mining systems. This paper explores for the first time how to proactively prune graphs to speed up graph pattern matching by leveraging the structure of the query pattern and the input graph. We propose building auxiliary graphs, which are different pruned versions of the graph, during query execution. This requires careful balancing between the upfront cost of building and managing auxiliary graphs and the gains of faster set operations. To this end, we propose GraphMini, a new system that uses query compilation and a new cost model to minimize the cost of building and maintaining auxiliary graphs and maximize gains. 
Our evaluation shows that using GraphMini can achieve one order of magnitude speedup compared to state-of-the-art subgraph enumeration systems on commonly used benchmarks.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('333','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_333\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Graph pattern matching is a fundamental problem encountered by many common graph mining tasks and the basic building block of several graph mining systems. This paper explores for the first time how to proactively prune graphs to speed up graph pattern matching by leveraging the structure of the query pattern and the input graph. We propose building auxiliary graphs, which are different pruned versions of the graph, during query execution. This requires careful balancing between the upfront cost of building and managing auxiliary graphs and the gains of faster set operations. To this end, we propose GraphMini, a new system that uses query compilation and a new cost model to minimize the cost of building and maintaining auxiliary graphs and maximize gains. 
Our evaluation shows that using GraphMini can achieve one order of magnitude speedup compared to state-of-the-art subgraph enumeration systems on commonly used benchmarks.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('333','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_333\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/marcoserafini.github.io\/assets\/pdf\/GraphMini.pdf\" title=\"https:\/\/marcoserafini.github.io\/assets\/pdf\/GraphMini.pdf\" target=\"_blank\">https:\/\/marcoserafini.github.io\/assets\/pdf\/GraphMini.pdf<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('333','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Francisco Castro, Sahitya Raipura, Heather Conboy, Peter J Haas, Leon Osterweil, Ivon Arroyo<\/p><p class=\"tp_pub_title\">Piloting an Interactive Ethics and Responsible Computing Learning Environment in Undergraduate CS Courses <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">Proceedings of the 54th ACM Technical Symposium on Computer Science Education, <\/span><span class=\"tp_pub_additional_year\">2023<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_332\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('332','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_332\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{castro2023piloting,<br \/>\r\ntitle = {Piloting an Interactive Ethics and Responsible Computing Learning Environment in Undergraduate CS Courses},<br \/>\r\nauthor 
= {Francisco Castro and Sahitya Raipura and Heather Conboy and Peter J Haas and Leon Osterweil and Ivon Arroyo},<br \/>\r\nyear  = {2023},<br \/>\r\ndate = {2023-11-01},<br \/>\r\njournal = {Proceedings of the 54th ACM Technical Symposium on Computer Science Education},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('332','tp_bibtex')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Wang Cen, Peter J Haas<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('331','tp_abstract')\" style=\"cursor:pointer;\">NIM: Generative Neural Networks for Automated Modeling and Generation of Simulation Inputs<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">ACM Transactions on Modeling and Computer Simulation, <\/span><span class=\"tp_pub_additional_volume\">33 <\/span><span class=\"tp_pub_additional_number\">(3), <\/span><span class=\"tp_pub_additional_pages\">pp. 
1\u201326, <\/span><span class=\"tp_pub_additional_year\">2023<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_331\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('331','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_331\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('331','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_331\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{cen2023nim,<br \/>\r\ntitle = {NIM: Generative Neural Networks for Automated Modeling and Generation of Simulation Inputs},<br \/>\r\nauthor = {Wang Cen and Peter J Haas},<br \/>\r\nyear  = {2023},<br \/>\r\ndate = {2023-11-01},<br \/>\r\njournal = {ACM Transactions on Modeling and Computer Simulation},<br \/>\r\nvolume = {33},<br \/>\r\nnumber = {3},<br \/>\r\npages = {1--26},<br \/>\r\nabstract = {Fitting stochastic input-process models to data and then sampling from them are key steps in a simulation study, but highly challenging to non-experts. We present Neural Input Modeling (NIM), a generative-neural-network (GNN) framework that exploits modern data-rich environments to automatically capture simulation input processes and then generate samples from them. The basic GNN that we develop, called NIM-VL, comprises (i) a variational-autoencoder (VAE) architecture that learns the probability distribution of the input data while avoiding overfitting and (ii) Long Short-Term Memory (LSTM) components that concisely capture statistical dependencies across time. We show how the basic GNN architecture can be modified to exploit known distributional properties\u2014such as i.i.d. 
structure, nonnegativity, and multimodality\u2014in order to increase accuracy and speed, as well as to handle multivariate processes, categorical-valued processes, and extrapolation beyond the training data for certain nonstationary processes. We also introduce an extension to NIM called \u201cconditional\u201d NIM (CNIM), which can learn from training data obtained under various realizations of a (possibly time-series-valued) stochastic \u201ccondition\u201d, such as temperature or inflation rate, and then generate sample paths given a value of the condition not seen in the training data. This enables users to simulate a system under a specific working condition by customizing a pre-trained model; CNIM also facilitates what-if analysis. Extensive experiments show the efficacy of our approach. NIM can thus help overcome one of the key barriers to simulation for non-experts.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('331','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_331\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Fitting stochastic input-process models to data and then sampling from them are key steps in a simulation study, but highly challenging to non-experts. We present Neural Input Modeling (NIM), a generative-neural-network (GNN) framework that exploits modern data-rich environments to automatically capture simulation input processes and then generate samples from them. The basic GNN that we develop, called NIM-VL, comprises (i) a variational-autoencoder (VAE) architecture that learns the probability distribution of the input data while avoiding overfitting and (ii) Long Short-Term Memory (LSTM) components that concisely capture statistical dependencies across time. 
We show how the basic GNN architecture can be modified to exploit known distributional properties\u2014such as i.i.d. structure, nonnegativity, and multimodality\u2014in order to increase accuracy and speed, as well as to handle multivariate processes, categorical-valued processes, and extrapolation beyond the training data for certain nonstationary processes. We also introduce an extension to NIM called \u201cconditional\u201d NIM (CNIM), which can learn from training data obtained under various realizations of a (possibly time-series-valued) stochastic \u201ccondition\u201d, such as temperature or inflation rate, and then generate sample paths given a value of the condition not seen in the training data. This enables users to simulate a system under a specific working condition by customizing a pre-trained model; CNIM also facilitates what-if analysis. Extensive experiments show the efficacy of our approach. NIM can thus help overcome one of the key barriers to simulation for non-experts.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('331','tp_abstract')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Brian Hentschel, Peter J Haas, Yuanyuan Tian<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('316','tp_abstract')\" style=\"cursor:pointer;\">Exact PPS Sampling with Bounded Sample Size<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">Information Processing Letters, <\/span><span class=\"tp_pub_additional_volume\">182 <\/span>, <span class=\"tp_pub_additional_year\">2023<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_316\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('316','tp_abstract')\" title=\"Show abstract\" 
style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_316\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('316','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_316\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('316','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_316\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{hentschel2023exact,<br \/>\r\ntitle = {Exact PPS Sampling with Bounded Sample Size},<br \/>\r\nauthor = {Brian Hentschel and Peter J Haas and Yuanyuan Tian},<br \/>\r\nurl = {https:\/\/www.sciencedirect.com\/science\/article\/pii\/S002001902300025X?casa_token=znQuScsK51AAAAAA:0EEwp2QPaW7PetEN0NGy8wN0dsOC6Thx9voXduNx6ivnyMIg0WVuMI83TVVt2yiXqu0n4Atnwg},<br \/>\r\nyear  = {2023},<br \/>\r\ndate = {2023-08-01},<br \/>\r\njournal = {Information Processing Letters},<br \/>\r\nvolume = {182},<br \/>\r\nabstract = {Probability proportional to size (PPS) sampling schemes with a target sample size aim to produce a sample comprising a specified number n of items while ensuring that each item in the population appears in the sample with a probability proportional to its specified \"weight\" (also called its \"size\"). These two objectives, however, cannot always be achieved simultaneously. Existing PPS schemes prioritize control of the sample size, violating the PPS property if necessary. We provide a new PPS scheme that allows a different trade-off: our method enforces the PPS property at all times while ensuring that the sample size never exceeds the target value n. The sample size is exactly equal to n if possible, and otherwise has maximal expected value and minimal variance. 
Thus we bound the sample size, thereby avoiding storage overflows and helping to control the time required for analytics over the sample, while allowing the user complete control over the sample contents. The method is both simple to implement and efficient, being a one-pass streaming algorithm with an amortized processing time of O(1) per item.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('316','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_316\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Probability proportional to size (PPS) sampling schemes with a target sample size aim to produce a sample comprising a specified number n of items while ensuring that each item in the population appears in the sample with a probability proportional to its specified &quot;weight&quot; (also called its &quot;size&quot;). These two objectives, however, cannot always be achieved simultaneously. Existing PPS schemes prioritize control of the sample size, violating the PPS property if necessary. We provide a new PPS scheme that allows a different trade-off: our method enforces the PPS property at all times while ensuring that the sample size never exceeds the target value n. The sample size is exactly equal to n if possible, and otherwise has maximal expected value and minimal variance. Thus we bound the sample size, thereby avoiding storage overflows and helping to control the time required for analytics over the sample, while allowing the user complete control over the sample contents. 
The method is both simple to implement and efficient, being a one-pass streaming algorithm with an amortized processing time of O(1) per item.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('316','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_316\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S002001902300025X?casa_token=znQuScsK51AAAAAA:0EEwp2QPaW7PetEN0NGy8wN0dsOC6Thx9voXduNx6ivnyMIg0WVuMI83TVVt2yiXqu0n4Atnwg\" title=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S002001902300025X?casa_token=z[...]\" target=\"_blank\">https:\/\/www.sciencedirect.com\/science\/article\/pii\/S002001902300025X?casa_token=z[...]<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('316','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Chaitra Gopalappa, Hari Balasubramanian, Peter J Haas<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('317','tp_abstract')\" style=\"cursor:pointer;\">A new mixed agent-based network and compartmental simulation framework for joint modeling of related infectious diseases- application to sexually transmitted infections<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">Infectious Disease Modelling, <\/span><span class=\"tp_pub_additional_volume\">8 <\/span><span class=\"tp_pub_additional_number\">(1), <\/span><span class=\"tp_pub_additional_pages\">pp. 
84-100, <\/span><span class=\"tp_pub_additional_year\">2023<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_317\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('317','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_317\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('317','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_317\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('317','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_317\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{gopalappa2023new,<br \/>\r\ntitle = {A new mixed agent-based network and compartmental simulation framework for joint modeling of related infectious diseases- application to sexually transmitted infections},<br \/>\r\nauthor = {Chaitra Gopalappa and Hari Balasubramanian and Peter J Haas},<br \/>\r\ndoi = {10.1016\/j.idm.2022.12.003},<br \/>\r\nyear  = {2023},<br \/>\r\ndate = {2023-03-01},<br \/>\r\njournal = {Infectious Disease Modelling},<br \/>\r\nvolume = {8},<br \/>\r\nnumber = {1},<br \/>\r\npages = {84-100},<br \/>\r\nabstract = {Background<br \/>\r\nA model that jointly simulates infectious diseases with common modes of transmission can serve as a decision-analytic tool to identify optimal intervention combinations for overall disease prevention. In the United States, sexually transmitted infections (STIs) are a huge economic burden, with a large fraction of the burden attributed to HIV. Data also show interactions between HIV and other sexually transmitted infections (STIs), such as higher risk of acquisition and progression of co-infections among persons with HIV compared to persons without. 
However, given the wide range in prevalence and incidence burdens of STIs, current compartmental or agent-based network simulation methods alone are insufficient or computationally burdensome for joint disease modeling. Further, causal factors for higher risk of coinfection could be both behavioral (i.e., compounding effects of individual behaviors, network structures, and care behaviors) and biological (i.e., presence of one disease can biologically increase the risk of another). However, the data on the fraction attributed to each are limited.<br \/>\r\n<br \/>\r\nMethods<br \/>\r\nWe present a new mixed agent-based compartmental (MAC) framework for jointly modeling STIs. It uses a combination of a new agent-based evolving network modeling (ABENM) technique for lower-prevalence diseases and compartmental modeling for higher-prevalence diseases. As a demonstration, we applied MAC to simulate lower-prevalence HIV in the United States and a higher-prevalence hypothetical Disease 2, using a range of transmission and progression rates to generate burdens replicative of the wide range of STIs. We simulated sexual transmissions among heterosexual males, heterosexual females, and men who have sex with men (men only and men and women). Setting the biological risk of co-infection to zero, we conducted numerical analyses to evaluate the influence of behavioral factors alone on disease dynamics.<br \/>\r\n<br \/>\r\nResults<br \/>\r\nThe contribution of behavioral factors to risk of coinfection was sensitive to disease burden, care access, and population heterogeneity and mixing. 
The contribution of behavioral factors was generally lower than observed risk of coinfections for the range of hypothetical prevalence studied here, suggesting potential role of biological factors, that should be investigated further specific to an STI.<br \/>\r\n<br \/>\r\nConclusions<br \/>\r\nThe purpose of this study is to present a new simulation technique for jointly modeling infectious diseases that have common modes of transmission but varying epidemiological features. The numerical analysis serves as proof-of-concept for the application to STIs. Interactions between diseases are influenced by behavioral factors, are sensitive to care access and population features, and are likely exacerbated by biological factors. Social and economic conditions are among key drivers of behaviors that increase STI transmission, and thus, structural interventions are a key part of behavioral interventions. Joint modeling of diseases helps comprehensively simulate behavioral and biological factors of disease interactions to evaluate the true impact of common structural interventions on overall disease prevention. The new simulation framework is especially suited to simulate behavior as a function of social determinants, and further, to identify optimal combinations of common structural and disease-specific interventions.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('317','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_317\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Background<br \/>\r\nA model that jointly simulates infectious diseases with common modes of transmission can serve as a decision-analytic tool to identify optimal intervention combinations for overall disease prevention. 
In the United States, sexually transmitted infections (STIs) are a huge economic burden, with a large fraction of the burden attributed to HIV. Data also show interactions between HIV and other sexually transmitted infections (STIs), such as higher risk of acquisition and progression of co-infections among persons with HIV compared to persons without. However, given the wide range in prevalence and incidence burdens of STIs, current compartmental or agent-based network simulation methods alone are insufficient or computationally burdensome for joint disease modeling. Further, causal factors for higher risk of coinfection could be both behavioral (i.e., compounding effects of individual behaviors, network structures, and care behaviors) and biological (i.e., presence of one disease can biologically increase the risk of another). However, the data on the fraction attributed to each are limited.<br \/>\r\n<br \/>\r\nMethods<br \/>\r\nWe present a new mixed agent-based compartmental (MAC) framework for jointly modeling STIs. It uses a combination of a new agent-based evolving network modeling (ABENM) technique for lower-prevalence diseases and compartmental modeling for higher-prevalence diseases. As a demonstration, we applied MAC to simulate lower-prevalence HIV in the United States and a higher-prevalence hypothetical Disease 2, using a range of transmission and progression rates to generate burdens replicative of the wide range of STIs. We simulated sexual transmissions among heterosexual males, heterosexual females, and men who have sex with men (men only and men and women). Setting the biological risk of co-infection to zero, we conducted numerical analyses to evaluate the influence of behavioral factors alone on disease dynamics.<br \/>\r\n<br \/>\r\nResults<br \/>\r\nThe contribution of behavioral factors to risk of coinfection was sensitive to disease burden, care access, and population heterogeneity and mixing. 
The contribution of behavioral factors was generally lower than observed risk of coinfections for the range of hypothetical prevalence studied here, suggesting potential role of biological factors, that should be investigated further specific to an STI.<br \/>\r\n<br \/>\r\nConclusions<br \/>\r\nThe purpose of this study is to present a new simulation technique for jointly modeling infectious diseases that have common modes of transmission but varying epidemiological features. The numerical analysis serves as proof-of-concept for the application to STIs. Interactions between diseases are influenced by behavioral factors, are sensitive to care access and population features, and are likely exacerbated by biological factors. Social and economic conditions are among key drivers of behaviors that increase STI transmission, and thus, structural interventions are a key part of behavioral interventions. Joint modeling of diseases helps comprehensively simulate behavioral and biological factors of disease interactions to evaluate the true impact of common structural interventions on overall disease prevention. 
The new simulation framework is especially suited to simulate behavior as a function of social determinants, and further, to identify optimal combinations of common structural and disease-specific interventions.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('317','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_317\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1016\/j.idm.2022.12.003\" title=\"Follow DOI:10.1016\/j.idm.2022.12.003\" target=\"_blank\">doi:10.1016\/j.idm.2022.12.003<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('317','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr>\r\n                    <td>\r\n                        <h3 class=\"tp_h3\" id=\"tp_h3_2022\">2022<\/h3>\r\n                    <\/td>\r\n                <\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Ryan McKenna, Brett Mullins, Daniel Sheldon, Gerome Miklau<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('328','tp_abstract')\" style=\"cursor:pointer;\">AIM: an adaptive and iterative mechanism for differentially private synthetic data<\/a> <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">Proceedings of the VLDB Endowment, <\/span><span class=\"tp_pub_additional_pages\">pp. 
2599\u20132612, <\/span><span class=\"tp_pub_additional_year\">2022<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_328\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('328','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_328\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('328','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_328\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('328','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_328\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{McKenna2022AIM,<br \/>\r\ntitle = {AIM: an adaptive and iterative mechanism for differentially private synthetic data},<br \/>\r\nauthor = {Ryan McKenna and Brett Mullins and Daniel Sheldon and Gerome Miklau},<br \/>\r\ndoi = {10.14778\/3551793.3551817},<br \/>\r\nyear  = {2022},<br \/>\r\ndate = {2022-06-01},<br \/>\r\nbooktitle = {Proceedings of the VLDB Endowment},<br \/>\r\njournal = {Proceedings of the VLDB Endowment},<br \/>\r\nvolume = {15},<br \/>\r\nnumber = {11},<br \/>\r\npages = {2599\u20132612},<br \/>\r\nabstract = {We propose AIM, a new algorithm for differentially private synthetic data generation. AIM is a workload-adaptive algorithm within the paradigm of algorithms that first selects a set of queries, then privately measures those queries, and finally generates synthetic data from the noisy measurements. It uses a set of innovative features to iteratively select the most useful measurements, reflecting both their relevance to the workload and their value in approximating the input data. 
We also provide analytic expressions to bound per-query error with high probability which can be used to construct confidence intervals and inform users about the accuracy of generated data. We show empirically that AIM consistently outperforms a wide variety of existing mechanisms across a variety of experimental settings.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('328','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_328\" style=\"display:none;\"><div class=\"tp_abstract_entry\">We propose AIM, a new algorithm for differentially private synthetic data generation. AIM is a workload-adaptive algorithm within the paradigm of algorithms that first selects a set of queries, then privately measures those queries, and finally generates synthetic data from the noisy measurements. It uses a set of innovative features to iteratively select the most useful measurements, reflecting both their relevance to the workload and their value in approximating the input data. We also provide analytic expressions to bound per-query error with high probability which can be used to construct confidence intervals and inform users about the accuracy of generated data. 
We show empirically that AIM consistently outperforms a wide variety of existing mechanisms across a variety of experimental settings.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('328','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_328\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.14778\/3551793.3551817\" title=\"Follow DOI:10.14778\/3551793.3551817\" target=\"_blank\">doi:10.14778\/3551793.3551817<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('328','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Dave Archer, Michael A August, Georgios Bouloukakis, Christopher Davison, Mamadou H Diallo, Dhrubajyoti Ghosh, Christopher T Graves, Michael Hay, Xi He, Peeter Laud, Steve Lu, Ashwin Machanavajjhala, Sharad Mehrotra, Gerome Miklau, Alisa Pankova, Shantanu Sharma, Nalini Venkatasubramanian, Guoxi Wang, Roberto Yus<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('327','tp_abstract')\" style=\"cursor:pointer;\">Transitioning from testbeds to ships: an experience study in deploying the TIPPERS Internet of Things platform to the US Navy<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">The Journal of Defense Modeling and Simulation, <\/span><span class=\"tp_pub_additional_volume\">19 <\/span><span class=\"tp_pub_additional_number\">(3), <\/span><span class=\"tp_pub_additional_pages\">pp. 
501-517, <\/span><span class=\"tp_pub_additional_year\">2022<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_327\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('327','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_327\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('327','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_327\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('327','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_327\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{Archer2022transition,<br \/>\r\ntitle = {Transitioning from testbeds to ships: an experience study in deploying the TIPPERS Internet of Things platform to the US Navy},<br \/>\r\nauthor = {Dave Archer and Michael A August and Georgios Bouloukakis and Christopher Davison and Mamadou H Diallo and Dhrubajyoti Ghosh and Christopher T Graves and Michael Hay and Xi He and Peeter Laud and Steve Lu and Ashwin Machanavajjhala and Sharad Mehrotra and Gerome Miklau and Alisa Pankova and Shantanu Sharma and Nalini Venkatasubramanian and Guoxi Wang and Roberto Yus},<br \/>\r\ndoi = {10.1177\/154851292095638},<br \/>\r\nyear  = {2022},<br \/>\r\ndate = {2022-07-01},<br \/>\r\njournal = {The Journal of Defense Modeling and Simulation},<br \/>\r\nvolume = {19},<br \/>\r\nnumber = {3},<br \/>\r\npages = {501-517},<br \/>\r\nabstract = {This paper describes the collaborative effort between privacy and security researchers at nine different institutions along with researchers at the Naval Information Warfare Center to deploy, test, and demonstrate privacy-preserving technologies in creating sensor-based awareness using the Internet of Things 
(IoT) aboard naval vessels in the context of the US Navy\u2019s Trident Warrior 2019 exercise. Funded by DARPA through the Brandeis program, the team built an integrated IoT data management middleware, entitled TIPPERS, that supports privacy by design and integrates a variety of Privacy Enhancing Technologies (PETs), including differential privacy, computation on encrypted data, and fine-grained policies. We describe the architecture of TIPPERS and its use in creating a smart ship that offers IoT-enabled services such as occupancy analysis, fall detection, detection of unauthorized access to spaces, and other situational awareness scenarios. We describe the privacy implications of creating IoT spaces that collect data that might include individuals\u2019 data (e.g., location) and analyze the tradeoff between privacy and utility of the supported PETs in this context.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('327','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_327\" style=\"display:none;\"><div class=\"tp_abstract_entry\">This paper describes the collaborative effort between privacy and security researchers at nine different institutions along with researchers at the Naval Information Warfare Center to deploy, test, and demonstrate privacy-preserving technologies in creating sensor-based awareness using the Internet of Things (IoT) aboard naval vessels in the context of the US Navy\u2019s Trident Warrior 2019 exercise. Funded by DARPA through the Brandeis program, the team built an integrated IoT data management middleware, entitled TIPPERS, that supports privacy by design and integrates a variety of Privacy Enhancing Technologies (PETs), including differential privacy, computation on encrypted data, and fine-grained policies. 
We describe the architecture of TIPPERS and its use in creating a smart ship that offers IoT-enabled services such as occupancy analysis, fall detection, detection of unauthorized access to spaces, and other situational awareness scenarios. We describe the privacy implications of creating IoT spaces that collect data that might include individuals\u2019 data (e.g., location) and analyze the tradeoff between privacy and utility of the supported PETs in this context.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('327','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_327\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1177\/154851292095638\" title=\"Follow DOI:10.1177\/154851292095638\" target=\"_blank\">doi:10.1177\/154851292095638<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('327','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Sainyam Galhotra, Anna Fariha, Raoni Louren\u00e7o, Juliana Freire, Alexandra Meliou, Divesh Srivastava<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('323','tp_abstract')\" style=\"cursor:pointer;\">DataPrism: Exposing Disconnect between Data and Systems<\/a> <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), <\/span><span class=\"tp_pub_additional_pages\">pp. 
217-231, <\/span><span class=\"tp_pub_additional_year\">2022<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_323\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('323','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_323\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('323','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_323\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('323','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_323\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{galhotra2022dataprism,<br \/>\r\ntitle = {DataPrism: Exposing Disconnect between Data and Systems},<br \/>\r\nauthor = {Sainyam Galhotra and Anna Fariha and Raoni Louren\u00e7o and Juliana Freire and Alexandra Meliou and Divesh Srivastava},<br \/>\r\ndoi = {10.1145\/3514221.3517864},<br \/>\r\nyear  = {2022},<br \/>\r\ndate = {2022-06-01},<br \/>\r\nbooktitle = {Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD)},<br \/>\r\npages = {217-231},<br \/>\r\nabstract = {As data is a central component of many modern systems, the cause of a system malfunction may reside in the data, and, specifically, particular properties of data. E.g., a health-monitoring system that is designed under the assumption that weight is reported in lbs will malfunction when encountering weight reported in kilograms. Like software debugging, which aims to find bugs in the source code or runtime conditions, our goal is to debug data to identify potential sources of disconnect between the assumptions about some data and systems that operate on that data. 
We propose DataPrism, a framework to identify data properties (profiles) that are the root causes of performance degradation or failure of a data-driven system. Such identification is necessary to repair data and resolve the disconnect between data and systems. Our technique is based on causal reasoning through interventions: when a system malfunctions for a dataset, DataPrism alters the data profiles and observes changes in the system's behavior due to the alteration. Unlike statistical observational analysis that reports mere correlations, DataPrism reports causally verified root causes -- in terms of data profiles -- of the system malfunction. We empirically evaluate DataPrism on seven real-world and several synthetic data-driven systems that fail on certain datasets due to a diverse set of reasons. In all cases, DataPrism identifies the root causes precisely while requiring orders of magnitude fewer interventions than prior techniques.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('323','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_323\" style=\"display:none;\"><div class=\"tp_abstract_entry\">As data is a central component of many modern systems, the cause of a system malfunction may reside in the data, and, specifically, particular properties of data. E.g., a health-monitoring system that is designed under the assumption that weight is reported in lbs will malfunction when encountering weight reported in kilograms. Like software debugging, which aims to find bugs in the source code or runtime conditions, our goal is to debug data to identify potential sources of disconnect between the assumptions about some data and systems that operate on that data. 
We propose DataPrism, a framework to identify data properties (profiles) that are the root causes of performance degradation or failure of a data-driven system. Such identification is necessary to repair data and resolve the disconnect between data and systems. Our technique is based on causal reasoning through interventions: when a system malfunctions for a dataset, DataPrism alters the data profiles and observes changes in the system's behavior due to the alteration. Unlike statistical observational analysis that reports mere correlations, DataPrism reports causally verified root causes -- in terms of data profiles -- of the system malfunction. We empirically evaluate DataPrism on seven real-world and several synthetic data-driven systems that fail on certain datasets due to a diverse set of reasons. In all cases, DataPrism identifies the root causes precisely while requiring orders of magnitude fewer interventions than prior techniques.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('323','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_323\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1145\/3514221.3517864\" title=\"Follow DOI:10.1145\/3514221.3517864\" target=\"_blank\">doi:10.1145\/3514221.3517864<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('323','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Maliha Tashfia Islam, Anna Fariha, Alexandra Meliou, Babak Salimi<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('322','tp_abstract')\" style=\"cursor:pointer;\">Through the data management lens: Experimental 
analysis and evaluation of fair classification<\/a> <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), 2022., <\/span><span class=\"tp_pub_additional_pages\">pp. 232-246, <\/span><span class=\"tp_pub_additional_year\">2022<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_322\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('322','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_322\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('322','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_322\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('322','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_322\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{islam2022through,<br \/>\r\ntitle = {Through the data management lens: Experimental analysis and evaluation of fair classification},<br \/>\r\nauthor = {Maliha Tashfia Islam and Anna Fariha and Alexandra Meliou and Babak Salimi},<br \/>\r\ndoi = {https:\/\/doi.org\/10.1145\/3514221.3517841},<br \/>\r\nyear  = {2022},<br \/>\r\ndate = {2022-06-01},<br \/>\r\nbooktitle = {Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), 2022.},<br \/>\r\npages = {232-246},<br \/>\r\nabstract = {Classification, a heavily-studied data-driven machine learning task, drives an increasing number of prediction systems involving critical human decisions such as loan approval and criminal risk assessment. 
However, classifiers often demonstrate discriminatory behavior, especially when presented with biased data. Consequently, fairness in classification has emerged as a high-priority research area. Data management research is showing an increasing presence and interest in topics related to data and algorithmic fairness, including the topic of fair classification. The interdisciplinary efforts in fair classification, with machine learning research having the largest presence, have resulted in a large number of fairness notions and a wide range of approaches that have not been systematically evaluated and compared. In this paper, we contribute a broad analysis of 13 fair classification approaches and additional variants, over their correctness, fairness, efficiency, scalability, robustness to data errors, sensitivity to underlying ML model, data efficiency, and stability using a variety of metrics and real-world datasets. Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance. We also discuss general principles for choosing approaches suitable for different practical settings, and identify areas where data-management-centric solutions are likely to have the most impact.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('322','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_322\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Classification, a heavily-studied data-driven machine learning task, drives an increasing number of prediction systems involving critical human decisions such as loan approval and criminal risk assessment. However, classifiers often demonstrate discriminatory behavior, especially when presented with biased data. 
Consequently, fairness in classification has emerged as a high-priority research area. Data management research is showing an increasing presence and interest in topics related to data and algorithmic fairness, including the topic of fair classification. The interdisciplinary efforts in fair classification, with machine learning research having the largest presence, have resulted in a large number of fairness notions and a wide range of approaches that have not been systematically evaluated and compared. In this paper, we contribute a broad analysis of 13 fair classification approaches and additional variants, over their correctness, fairness, efficiency, scalability, robustness to data errors, sensitivity to underlying ML model, data efficiency, and stability using a variety of metrics and real-world datasets. Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance. We also discuss general principles for choosing approaches suitable for different practical settings, and identify areas where data-management-centric solutions are likely to have the most impact.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('322','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_322\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1145\/3514221.3517841\" title=\"Follow DOI:10.1145\/3514221.3517841\" target=\"_blank\">doi:10.1145\/3514221.3517841<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('322','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Raghavendra Addanki, Andrew 
McGregor, Alexandra Meliou, Zafeiria Moumoulidou<\/p><p class=\"tp_pub_title\">Improved approximation and scalability for fair max-min diversification <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">ICDT 2022, <\/span><span class=\"tp_pub_additional_year\">2022<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_resource_link\"><a id=\"tp_links_sh_321\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('321','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_321\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('321','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_321\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{addanki2022improved,<br \/>\r\ntitle = {Improved approximation and scalability for fair max-min diversification},<br \/>\r\nauthor = {Raghavendra Addanki and Andrew McGregor and Alexandra Meliou and Zafeiria Moumoulidou},<br \/>\r\nurl = {https:\/\/arxiv.org\/abs\/2201.06678},<br \/>\r\nyear  = {2022},<br \/>\r\ndate = {2022-01-01},<br \/>\r\nbooktitle = {ICDT 2022},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('321','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_321\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"ai ai-arxiv\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/arxiv.org\/abs\/2201.06678\" title=\"https:\/\/arxiv.org\/abs\/2201.06678\" target=\"_blank\">https:\/\/arxiv.org\/abs\/2201.06678<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" 
onclick=\"teachpress_pub_showhide('321','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Wang Cen, Peter J. Haas<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('320','tp_abstract')\" style=\"cursor:pointer;\">Enhanced Simulation Metamodeling via Graph and Generative Neural Networks<\/a> <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">Winter Simulation Conference, WSC 2022, Singapore, December 11-14, 2022, <\/span><span class=\"tp_pub_additional_pages\">pp. 2748-2759, <\/span><span class=\"tp_pub_additional_year\">2022<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_320\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('320','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_320\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('320','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_320\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('320','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_320\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{cen2022enhanced,<br \/>\r\ntitle = {Enhanced Simulation Metamodeling via Graph and Generative Neural Networks},<br \/>\r\nauthor = {Wang Cen and Peter J. 
Haas},<br \/>\r\ndoi = {https:\/\/doi.org\/10.1109\/WSC57314.2022.10015361},<br \/>\r\nyear  = {2022},<br \/>\r\ndate = {2022-01-01},<br \/>\r\nbooktitle = {Winter Simulation Conference, WSC 2022, Singapore, December 11-14, 2022},<br \/>\r\npages = {2748-2759},<br \/>\r\nabstract = {For large, complex simulation models, simulation metamodeling is crucial for enabling simulation-based-optimization under uncertainty in operational settings where results are needed quickly. We enhance simulation metamodeling in two important ways. First, we use graph neural networks (GrNN) to allow the graphical structure of a simulation model to be treated as a metamodel input parameter that can be varied along with real-valued and integer-ordered inputs. Second, we combine GrNNs with generative neural networks so that a metamodel can rapidly produce not only a summary statistic like E[Y] , but also a sequence of i.i.d. samples of Y or even a stochastic process that mimics dynamic simulation outputs. Thus a single metamodel can be used to estimate multiple statistics for multiple performance measures. Our metamodels can potentially serve as surrogate models in digital-twin settings. Preliminary experiments indicate the promise of our approach.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('320','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_320\" style=\"display:none;\"><div class=\"tp_abstract_entry\">For large, complex simulation models, simulation metamodeling is crucial for enabling simulation-based-optimization under uncertainty in operational settings where results are needed quickly. We enhance simulation metamodeling in two important ways. 
First, we use graph neural networks (GrNN) to allow the graphical structure of a simulation model to be treated as a metamodel input parameter that can be varied along with real-valued and integer-ordered inputs. Second, we combine GrNNs with generative neural networks so that a metamodel can rapidly produce not only a summary statistic like E[Y] , but also a sequence of i.i.d. samples of Y or even a stochastic process that mimics dynamic simulation outputs. Thus a single metamodel can be used to estimate multiple statistics for multiple performance measures. Our metamodels can potentially serve as surrogate models in digital-twin settings. Preliminary experiments indicate the promise of our approach.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('320','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_320\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1109\/WSC57314.2022.10015361\" title=\"Follow DOI:10.1109\/WSC57314.2022.10015361\" target=\"_blank\">doi:10.1109\/WSC57314.2022.10015361<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('320','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Sneha Gathani, Madelon Hulsebos, James Gale, Peter J. 
Haas, \u00c7a\u011fatay Demiralp\r\n<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('319','tp_abstract')\" style=\"cursor:pointer;\">Augmenting Decision Making via Interactive What-If Analysis<\/a> <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">12th Conference on Innovative Data Systems Research, CIDR 2022, Chaminade, CA, USA, January 9-12, 2022, <\/span><span class=\"tp_pub_additional_year\">2022<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_319\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('319','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_319\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('319','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_319\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('319','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_319\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{gathani2021augmenting,<br \/>\r\ntitle = {Augmenting Decision Making via Interactive What-If Analysis},<br \/>\r\nauthor = {Sneha Gathani and Madelon Hulsebos and James Gale and Peter J. Haas and \u00c7a\u011fatay Demiralp<br \/>\r\n},<br \/>\r\nurl = {https:\/\/www.cidrdb.org\/cidr2022\/papers\/p49-gathani.pdf},<br \/>\r\nyear  = {2022},<br \/>\r\ndate = {2022-01-09},<br \/>\r\nbooktitle = {12th Conference on Innovative Data Systems Research, CIDR 2022, Chaminade, CA, USA, January 9-12, 2022},<br \/>\r\nabstract = {The fundamental goal of business data analysis is to improve business decisions using data. 
Business users often make decisions to achieve key performance indicators (KPIs) such as increasing customer retention or sales, or decreasing costs. To discover the relationship between data attributes hypothesized to be drivers and those corresponding to KPIs of interest, business users currently need to perform lengthy exploratory analyses. This involves considering multitudes of combinations and scenarios and performing slicing, dicing, and transformations on the data accordingly, e.g., analyzing customer retention across quarters of the year or suggesting optimal media channels across strata of customers. However, the increasing complexity of datasets combined with the cognitive limitations of humans makes it challenging to carry over multiple hypotheses, even for simple datasets. Therefore mentally performing such analyses is hard. Existing commercial tools either provide partial solutions or fail to cater to business users altogether. Here we argue for four functionalities to enable business users to interactively learn and reason about the relationships between sets of data attributes thereby facilitating data-driven decision making. We implement these functionalities in SystemD, an interactive visual data analysis system enabling business users to experiment with the data by asking what-if questions. We evaluate the system through three business use cases: marketing mix modeling, customer retention analysis, and deal closing analysis, and report on feedback from multiple business users. Users find the SystemD functionalities highly useful for quick testing and validation of their hypotheses around their KPIs of interest, addressing their unmet analysis needs. 
The feedback also suggests that the UX design can be enhanced to further improve the understandability of these functionalities.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('319','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_319\" style=\"display:none;\"><div class=\"tp_abstract_entry\">The fundamental goal of business data analysis is to improve business decisions using data. Business users often make decisions to achieve key performance indicators (KPIs) such as increasing customer retention or sales, or decreasing costs. To discover the relationship between data attributes hypothesized to be drivers and those corresponding to KPIs of interest, business users currently need to perform lengthy exploratory analyses. This involves considering multitudes of combinations and scenarios and performing slicing, dicing, and transformations on the data accordingly, e.g., analyzing customer retention across quarters of the year or suggesting optimal media channels across strata of customers. However, the increasing complexity of datasets combined with the cognitive limitations of humans makes it challenging to carry over multiple hypotheses, even for simple datasets. Therefore mentally performing such analyses is hard. Existing commercial tools either provide partial solutions or fail to cater to business users altogether. Here we argue for four functionalities to enable business users to interactively learn and reason about the relationships between sets of data attributes thereby facilitating data-driven decision making. We implement these functionalities in SystemD, an interactive visual data analysis system enabling business users to experiment with the data by asking what-if questions. 
We evaluate the system through three business use cases: marketing mix modeling, customer retention analysis, and deal closing analysis, and report on feedback from multiple business users. Users find the SystemD functionalities highly useful for quick testing and validation of their hypotheses around their KPIs of interest, addressing their unmet analysis needs. The feedback also suggests that the UX design can be enhanced to further improve the understandability of these functionalities.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('319','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_319\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/www.cidrdb.org\/cidr2022\/papers\/p49-gathani.pdf\" title=\"https:\/\/www.cidrdb.org\/cidr2022\/papers\/p49-gathani.pdf\" target=\"_blank\">https:\/\/www.cidrdb.org\/cidr2022\/papers\/p49-gathani.pdf<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('319','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Azza Abouzied, Peter J Haas, Alexandra Meliou<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('318','tp_abstract')\" style=\"cursor:pointer;\">In-Database Decision Support: Opportunities and Challenges<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">IEEE Data Engineering Bulletin, <\/span><span class=\"tp_pub_additional_volume\">45 <\/span><span class=\"tp_pub_additional_number\">(3), <\/span><span class=\"tp_pub_additional_pages\">pp. 
102-115, <\/span><span class=\"tp_pub_additional_year\">2022<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_318\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('318','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_318\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('318','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_318\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('318','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_318\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{abouzied2022database,<br \/>\r\ntitle = {In-Database Decision Support: Opportunities and Challenges},<br \/>\r\nauthor = {Azza Abouzied and Peter J Haas and Alexandra Meliou},<br \/>\r\nurl = {https:\/\/people.cs.umass.edu\/~phaas\/files\/DEBulletin2022.pdf},<br \/>\r\nyear  = {2022},<br \/>\r\ndate = {2022-09-01},<br \/>\r\njournal = {IEEE Data Engineering Bulletin},<br \/>\r\nvolume = {45},<br \/>\r\nnumber = {3},<br \/>\r\npages = {102-115},<br \/>\r\nabstract = {Decision makers in a broad range of domains, such as finance, transportation, manufacturing, and healthcare, often need to derive optimal decisions given a set of constraints and objectives. Traditional solutions to such constrained optimization problems are typically application-specific, complex, and do not generalize. Further, the usual workflow requires slow, cumbersome, and error-prone data movement between a database and predictive-modeling and optimization packages. All of these problems are exacerbated by the unprecedented size of modern data-intensive optimization problems. 
The emerging research area of in-database prescriptive analytics aims to provide seamless domain-independent, declarative, and scalable approaches powered by the system where the data typically resides: the database. Integrating optimization with database technology opens up prescriptive analytics to a much broader community, amplifying its benefits. In the context of our prior and ongoing work in this area, we discuss some strategies for addressing key challenges related to usability, scalability, data uncertainty, dynamic environments with changing data and models, and the need to support decision-making agents. We indicate how deep integration between the DBMS, predictive models, and optimization software creates opportunities for rich prescriptive-query functionality with good scalability and performance.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('318','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_318\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Decision makers in a broad range of domains, such as finance, transportation, manufacturing, and healthcare, often need to derive optimal decisions given a set of constraints and objectives. Traditional solutions to such constrained optimization problems are typically application-specific, complex, and do not generalize. Further, the usual workflow requires slow, cumbersome, and error-prone data movement between a database and predictive-modeling and optimization packages. All of these problems are exacerbated by the unprecedented size of modern data-intensive optimization problems. The emerging research area of in-database prescriptive analytics aims to provide seamless domain-independent, declarative, and scalable approaches powered by the system where the data typically resides: the database. 
Integrating optimization with database technology opens up prescriptive analytics to a much broader community, amplifying its benefits. In the context of our prior and ongoing work in this area, we discuss some strategies for addressing key challenges related to usability, scalability, data uncertainty, dynamic environments with changing data and models, and the need to support decision-making agents. We indicate how deep integration between the DBMS, predictive models, and optimization software creates opportunities for rich prescriptive-query functionality with good scalability and performance.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('318','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_318\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/people.cs.umass.edu\/~phaas\/files\/DEBulletin2022.pdf\" title=\"https:\/\/people.cs.umass.edu\/~phaas\/files\/DEBulletin2022.pdf\" target=\"_blank\">https:\/\/people.cs.umass.edu\/~phaas\/files\/DEBulletin2022.pdf<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('318','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr>\r\n                    <td>\r\n                        <h3 class=\"tp_h3\" id=\"tp_h3_2021\">2021<\/h3>\r\n                    <\/td>\r\n                <\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Fei Song, Khaled Zaouk, Chenghao Lyu, Arnab Sinha, Qi Fan, Yanlei Diao, Prashant J Shenoy<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('315','tp_abstract')\" style=\"cursor:pointer;\">Spark-based Cloud Data Analytics using Multi-Objective Optimization<\/a> <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p 
class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">37th IEEE International Conference on Data Engineering, ICDE 2021, \r\n Chania, Greece, April 19-22, 2021, <\/span><span class=\"tp_pub_additional_pages\">pp. 396\u2013407, <\/span><span class=\"tp_pub_additional_publisher\">IEEE, <\/span><span class=\"tp_pub_additional_year\">2021<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_315\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('315','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_315\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('315','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_315\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('315','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_315\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{DBLP:conf\/icde\/SongZLSFDS21,<br \/>\r\ntitle = {Spark-based Cloud Data Analytics using Multi-Objective Optimization},<br \/>\r\nauthor = {Fei Song and Khaled Zaouk and Chenghao Lyu and Arnab Sinha and Qi Fan and Yanlei Diao and Prashant J Shenoy},<br \/>\r\nurl = {https:\/\/doi.org\/10.1109\/ICDE51399.2021.00041<br \/>\r\nhttp:\/\/scalla.cs.umass.edu\/papers\/udao2020.pdf},<br \/>\r\ndoi = {10.1109\/ICDE51399.2021.00041},<br \/>\r\nyear  = {2021},<br \/>\r\ndate = {2021-01-01},<br \/>\r\nbooktitle = {37th IEEE International Conference on Data Engineering, ICDE 2021, <br \/>\r\n Chania, Greece, April 19-22, 2021},<br \/>\r\npages = {396--407},<br \/>\r\npublisher = {IEEE},<br \/>\r\nabstract = {Data analytics in the cloud has become an integral part of enterprise businesses. 
Big data analytics systems, however, still lack the ability to take task objectives such as user performance goals and budgetary constraints and automatically configure an analytic job to achieve these objectives. This paper presents UDAO, a Spark-based Unified Data Analytics Optimizer that can automatically determine a cluster configuration with a suitable number of cores as well as other system parameters that best meet the task objectives. At a core of our work is a principled multi-objective optimization (MOO) approach that computes a Pareto optimal set of configurations to reveal tradeoffs between different objectives, recommends a new Spark configuration that best explores such tradeoffs, and employs novel optimizations to enable such recommendations within a few seconds. Detailed experiments using benchmark workloads show that our MOO techniques provide a 2-50x speedup over existing MOO methods, while offering good coverage of the Pareto frontier. Compared to Ottertune, a state-of-the-art performance tuning system, UDAO recommends Spark configurations that yield 26%-49% reduction of running time of the TPCx-BB benchmark while adapting to different user preferences on multiple objectives.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('315','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_315\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Data analytics in the cloud has become an integral part of enterprise businesses. Big data analytics systems, however, still lack the ability to take task objectives such as user performance goals and budgetary constraints and automatically configure an analytic job to achieve these objectives. 
This paper presents UDAO, a Spark-based Unified Data Analytics Optimizer that can automatically determine a cluster configuration with a suitable number of cores as well as other system parameters that best meet the task objectives. At the core of our work is a principled multi-objective optimization (MOO) approach that computes a Pareto optimal set of configurations to reveal tradeoffs between different objectives, recommends a new Spark configuration that best explores such tradeoffs, and employs novel optimizations to enable such recommendations within a few seconds. Detailed experiments using benchmark workloads show that our MOO techniques provide a 2-50x speedup over existing MOO methods, while offering good coverage of the Pareto frontier. Compared to OtterTune, a state-of-the-art performance tuning system, UDAO recommends Spark configurations that yield a 26%-49% reduction in the running time of the TPCx-BB benchmark while adapting to different user preferences on multiple objectives.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('315','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_315\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/doi.org\/10.1109\/ICDE51399.2021.00041\" title=\"https:\/\/doi.org\/10.1109\/ICDE51399.2021.00041\" target=\"_blank\">https:\/\/doi.org\/10.1109\/ICDE51399.2021.00041<\/a><\/li><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"http:\/\/scalla.cs.umass.edu\/papers\/udao2020.pdf\" title=\"http:\/\/scalla.cs.umass.edu\/papers\/udao2020.pdf\" target=\"_blank\">http:\/\/scalla.cs.umass.edu\/papers\/udao2020.pdf<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1109\/ICDE51399.2021.00041\" title=\"Follow DOI:10.1109\/ICDE51399.2021.00041\" 
target=\"_blank\">doi:10.1109\/ICDE51399.2021.00041<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('315','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Anna Fariha, Ashish Tiwari, Alexandra Meliou, Arjun Radhakrishna, Sumit Gulwani<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('311','tp_abstract')\" style=\"cursor:pointer;\">CoCo: Interactive Exploration of Conformance Constraints for Data Understanding and Data Cleaning<\/a> <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), <\/span><span class=\"tp_pub_additional_year\">2021<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_311\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('311','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_311\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('311','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_311\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('311','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_311\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{FarihaTMRG21demo,<br \/>\r\ntitle = {CoCo: Interactive Exploration of Conformance Constraints for Data Understanding and Data Cleaning},<br \/>\r\nauthor = {Anna Fariha and Ashish Tiwari and Alexandra Meliou and Arjun Radhakrishna and Sumit Gulwani},<br \/>\r\nurl = 
{https:\/\/afariha.github.io\/papers\/CoCo_SIGMOD_2021_Demo.pdf},<br \/>\r\ndoi = {10.1145\/3448016.3452750},<br \/>\r\nyear  = {2021},<br \/>\r\ndate = {2021-06-18},<br \/>\r\nbooktitle = {Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD)},<br \/>\r\nabstract = {Data profiling refers to the task of extracting technical metadata or profiles and has numerous applications such as data understanding, validation, integration, and cleaning. While a number of data profiling primitives exist in the literature, most of them are limited to categorical attributes. A few techniques consider numerical attributes; but, they either focus on simple relationships involving a pair of attributes (e.g., correlations) or convert the continuous semantics of numerical attributes to a discrete semantics, which results in information loss. To capture more complex relationships involving the numerical attributes, we developed a new data-profiling primitive called conformance constraints, which can model linear arithmetic relationships involving multiple numerical attributes.<br \/>\r\nWe present CoCo, a system that allows interactive discovery and exploration of Conformance Constraints for understanding trends involving the numerical attributes of a dataset, with a particular focus on the application of data cleaning. Through a simple interface, CoCo enables the user to guide conformance constraint discovery according to their preferences. The user can examine to what extent a new, possibly dirty, dataset satisfies or violates the discovered conformance constraints. Further, CoCo provides useful suggestions for cleaning dirty data tuples, where the user can interactively alter cell values, and verify by checking change in conformance constraint violation due to the alteration. 
We demonstrate how CoCo can help in understanding trends in the data and assist the users in interactive data cleaning, using conformance constraints.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('311','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_311\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Data profiling refers to the task of extracting technical metadata or profiles and has numerous applications such as data understanding, validation, integration, and cleaning. While a number of data profiling primitives exist in the literature, most of them are limited to categorical attributes. A few techniques consider numerical attributes; but, they either focus on simple relationships involving a pair of attributes (e.g., correlations) or convert the continuous semantics of numerical attributes to a discrete semantics, which results in information loss. To capture more complex relationships involving the numerical attributes, we developed a new data-profiling primitive called conformance constraints, which can model linear arithmetic relationships involving multiple numerical attributes.<br \/>\r\nWe present CoCo, a system that allows interactive discovery and exploration of Conformance Constraints for understanding trends involving the numerical attributes of a dataset, with a particular focus on the application of data cleaning. Through a simple interface, CoCo enables the user to guide conformance constraint discovery according to their preferences. The user can examine to what extent a new, possibly dirty, dataset satisfies or violates the discovered conformance constraints. 
Further, CoCo provides useful suggestions for cleaning dirty data tuples, where the user can interactively alter cell values, and verify by checking change in conformance constraint violation due to the alteration. We demonstrate how CoCo can help in understanding trends in the data and assist the users in interactive data cleaning, using conformance constraints.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('311','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_311\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/afariha.github.io\/papers\/CoCo_SIGMOD_2021_Demo.pdf\" title=\"https:\/\/afariha.github.io\/papers\/CoCo_SIGMOD_2021_Demo.pdf\" target=\"_blank\">https:\/\/afariha.github.io\/papers\/CoCo_SIGMOD_2021_Demo.pdf<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1145\/3448016.3452750\" title=\"Follow DOI:10.1145\/3448016.3452750\" target=\"_blank\">doi:10.1145\/3448016.3452750<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('311','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Anna Fariha, Ashish Tiwari, Arjun Radhakrishna, Sumit Gulwani, Alexandra Meliou<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('312','tp_abstract')\" style=\"cursor:pointer;\">Conformance Constraint Discovery: Measuring Trust in Data-Driven Systems<\/a> <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), <\/span><span class=\"tp_pub_additional_year\">2021<\/span>.<\/p><p 
class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_312\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('312','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_312\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('312','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_312\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('312','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_312\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{FarihaTMRG21b,<br \/>\r\ntitle = {Conformance Constraint Discovery: Measuring Trust in Data-Driven Systems},<br \/>\r\nauthor = {Anna Fariha and Ashish Tiwari and Arjun Radhakrishna and Sumit Gulwani and Alexandra Meliou},<br \/>\r\nurl = {https:\/\/afariha.github.io\/papers\/Conformance_Constraints_SIGMOD_2021.pdf},<br \/>\r\ndoi = {10.1145\/3448016.3452795},<br \/>\r\nyear  = {2021},<br \/>\r\ndate = {2021-06-18},<br \/>\r\nbooktitle = {Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD)},<br \/>\r\nabstract = {The reliability of inferences made by data-driven systems hinges on the data\u2019s continued conformance to the systems\u2019 initial settings and assumptions. When serving data (on which we want to apply inference) deviates from the profile of the initial training data, the outcome of inference becomes unreliable. We introduce conformance constraints, a new data profiling primitive tailored towards quantifying the degree of non-conformance, which can effectively characterize if inference over that tuple is untrustworthy. 
Conformance constraints are constraints over certain arithmetic expressions (called projections) involving the numerical attributes of a dataset, which existing data profiling primitives such as functional dependencies and denial constraints cannot model. Our key finding is that projections that incur low variance on a dataset construct effective conformance constraints. This principle yields the surprising result that low-variance components of a principal component analysis, which are usually discarded for dimensionality reduction, generate stronger conformance constraints than the high-variance components. Based on this result, we provide a highly scalable and efficient technique\u2014linear in data size and cubic in the number of attributes\u2014for discovering conformance constraints for a dataset. To measure the degree of a tuple\u2019s non-conformance with respect to a dataset, we propose a quantitative semantics that captures how much a tuple violates the conformance constraints of that dataset. We demonstrate the value of conformance constraints on two applications: trusted machine learning and data drift. We empirically show that conformance constraints offer mechanisms to (1) reliably detect tuples on which the inference of a machine-learned model should not be trusted, and (2) quantify data drift more accurately than the state of the art.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('312','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_312\" style=\"display:none;\"><div class=\"tp_abstract_entry\">The reliability of inferences made by data-driven systems hinges on the data\u2019s continued conformance to the systems\u2019 initial settings and assumptions. 
When serving data (on which we want to apply inference) deviates from the profile of the initial training data, the outcome of inference becomes unreliable. We introduce conformance constraints, a new data profiling primitive tailored towards quantifying the degree of non-conformance, which can effectively characterize if inference over that tuple is untrustworthy. Conformance constraints are constraints over certain arithmetic expressions (called projections) involving the numerical attributes of a dataset, which existing data profiling primitives such as functional dependencies and denial constraints cannot model. Our key finding is that projections that incur low variance on a dataset construct effective conformance constraints. This principle yields the surprising result that low-variance components of a principal component analysis, which are usually discarded for dimensionality reduction, generate stronger conformance constraints than the high-variance components. Based on this result, we provide a highly scalable and efficient technique\u2014linear in data size and cubic in the number of attributes\u2014for discovering conformance constraints for a dataset. To measure the degree of a tuple\u2019s non-conformance with respect to a dataset, we propose a quantitative semantics that captures how much a tuple violates the conformance constraints of that dataset. We demonstrate the value of conformance constraints on two applications: trusted machine learning and data drift. 
We empirically show that conformance constraints offer mechanisms to (1) reliably detect tuples on which the inference of a machine-learned model should not be trusted, and (2) quantify data drift more accurately than the state of the art.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('312','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_312\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/afariha.github.io\/papers\/Conformance_Constraints_SIGMOD_2021.pdf\" title=\"https:\/\/afariha.github.io\/papers\/Conformance_Constraints_SIGMOD_2021.pdf\" target=\"_blank\">https:\/\/afariha.github.io\/papers\/Conformance_Constraints_SIGMOD_2021.pdf<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1145\/3448016.3452795\" title=\"Follow DOI:10.1145\/3448016.3452795\" target=\"_blank\">doi:10.1145\/3448016.3452795<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('312','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Zafeiria Moumoulidou, Andrew McGregor, Alexandra Meliou<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('313','tp_abstract')\" style=\"cursor:pointer;\">Diverse Data Selection under Fairness Constraints<\/a> <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">International Conference on Database Theory, (ICDT), <\/span><span class=\"tp_pub_additional_pages\">pp. 
11:1\u201311:25, <\/span><span class=\"tp_pub_additional_year\">2021<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_313\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('313','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_313\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('313','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_313\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('313','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_313\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{MoumoulidouMM21,<br \/>\r\ntitle = {Diverse Data Selection under Fairness Constraints},<br \/>\r\nauthor = {Zafeiria Moumoulidou and Andrew McGregor and Alexandra Meliou},<br \/>\r\nurl = {https:\/\/arxiv.org\/pdf\/2010.09141.pdf},<br \/>\r\nyear  = {2021},<br \/>\r\ndate = {2021-03-23},<br \/>\r\nbooktitle = {International Conference on Database Theory, (ICDT)},<br \/>\r\npages = {11:1--11:25},<br \/>\r\nabstract = {Diversity is an important principle in data selection and summarization, facility location, and recommendation systems. Our work focuses on maximizing diversity in data selection, while offering fairness guarantees. In particular, we offer the first study that augments the Max-Min diversification objective with fairness constraints. More specifically, given a universe U of n elements that can be partitioned into m disjoint groups, we aim to retrieve a k-sized subset that maximizes the pairwise minimum distance within the set (diversity) and contains a pre-specified ki number of elements from each group i (fairness). 
We show that this problem is NP-complete even in metric spaces, and we propose three novel algorithms, linear in n, that provide strong theoretical approximation guarantees for different values of m and k. Finally, we extend our algorithms and analysis to the case where groups can be overlapping.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('313','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_313\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Diversity is an important principle in data selection and summarization, facility location, and recommendation systems. Our work focuses on maximizing diversity in data selection, while offering fairness guarantees. In particular, we offer the first study that augments the Max-Min diversification objective with fairness constraints. More specifically, given a universe U of n elements that can be partitioned into m disjoint groups, we aim to retrieve a k-sized subset that maximizes the pairwise minimum distance within the set (diversity) and contains a pre-specified ki number of elements from each group i (fairness). We show that this problem is NP-complete even in metric spaces, and we propose three novel algorithms, linear in n, that provide strong theoretical approximation guarantees for different values of m and k. 
Finally, we extend our algorithms and analysis to the case where groups can be overlapping.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('313','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_313\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/arxiv.org\/pdf\/2010.09141.pdf\" title=\"https:\/\/arxiv.org\/pdf\/2010.09141.pdf\" target=\"_blank\">https:\/\/arxiv.org\/pdf\/2010.09141.pdf<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('313','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Abhinav Jangda, Sandeep Polisetty, Arjun Guha, Marco Serafini<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('325','tp_abstract')\" style=\"cursor:pointer;\">Accelerating graph sampling for graph machine learning using GPUs<\/a> <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">EuroSys '21: Proceedings of the Sixteenth European Conference on Computer Systems, <\/span><span class=\"tp_pub_additional_pages\">pp. 
311\u2013326, <\/span><span class=\"tp_pub_additional_year\">2021<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_325\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('325','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_325\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('325','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_325\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('325','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_325\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{Jangda2021accelerating,<br \/>\r\ntitle = {Accelerating graph sampling for graph machine learning using GPUs},<br \/>\r\nauthor = {Abhinav Jangda and Sandeep Polisetty and Arjun Guha and Marco Serafini},<br \/>\r\ndoi = {10.1145\/3447786.3456244},<br \/>\r\nyear  = {2021},<br \/>\r\ndate = {2021-04-01},<br \/>\r\nbooktitle = {EuroSys '21: Proceedings of the Sixteenth European Conference on Computer Systems},<br \/>\r\npages = {311\u2013326},<br \/>\r\nabstract = {Representation learning algorithms automatically learn the features of data. Several representation learning algorithms for graph data, such as DeepWalk, node2vec, and Graph-SAGE, sample the graph to produce mini-batches that are suitable for training a DNN. However, sampling time can be a significant fraction of training time, and existing systems do not efficiently parallelize sampling.<br \/>\r\nSampling is an \"embarrassingly parallel\" problem and may appear to lend itself to GPU acceleration, but the irregularity of graphs makes it hard to use GPU resources effectively. 
This paper presents NextDoor, a system designed to effectively perform graph sampling on GPUs. NextDoor employs a new approach to graph sampling that we call transit-parallelism, which allows load balancing and caching of edges. NextDoor provides end-users with a high-level abstraction for writing a variety of graph sampling algorithms. We implement several graph sampling applications, and show that NextDoor runs them orders of magnitude faster than existing systems.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('325','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_325\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Representation learning algorithms automatically learn the features of data. Several representation learning algorithms for graph data, such as DeepWalk, node2vec, and Graph-SAGE, sample the graph to produce mini-batches that are suitable for training a DNN. However, sampling time can be a significant fraction of training time, and existing systems do not efficiently parallelize sampling.<br \/>\r\nSampling is an &quot;embarrassingly parallel&quot; problem and may appear to lend itself to GPU acceleration, but the irregularity of graphs makes it hard to use GPU resources effectively. This paper presents NextDoor, a system designed to effectively perform graph sampling on GPUs. NextDoor employs a new approach to graph sampling that we call transit-parallelism, which allows load balancing and caching of edges. NextDoor provides end-users with a high-level abstraction for writing a variety of graph sampling algorithms. 
We implement several graph sampling applications, and show that NextDoor runs them orders of magnitude faster than existing systems.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('325','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_325\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1145\/3447786.3456244\" title=\"Follow DOI:10.1145\/3447786.3456244\" target=\"_blank\">doi:10.1145\/3447786.3456244<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('325','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Yifei Yang, Matt Youill, Matthew Woicik, Yizhou Liu, Xiangyao Yu, Marco Serafini, Ashraf Aboulnaga, Michael Stonebraker<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('326','tp_abstract')\" style=\"cursor:pointer;\">FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS<\/a> <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">Proceedings of the VLDB Endowment, <\/span><span class=\"tp_pub_additional_pages\">pp.  
2101\u20132113, <\/span><span class=\"tp_pub_additional_year\">2021<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_326\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('326','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_326\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('326','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_326\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('326','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_326\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{yang2021flexpushdowndb,<br \/>\r\ntitle = {FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS},<br \/>\r\nauthor = {Yifei Yang and Matt Youill and Matthew Woicik and Yizhou Liu and Xiangyao Yu and Marco Serafini and Ashraf Aboulnaga and Michael Stonebraker},<br \/>\r\ndoi = {10.14778\/3476249.3476265},<br \/>\r\nyear  = {2021},<br \/>\r\ndate = {2021-06-01},<br \/>\r\nbooktitle = {Proceedings of the VLDB Endowment},<br \/>\r\nvolume = {14},<br \/>\r\nnumber = {11},<br \/>\r\npages = {2101\u20132113},<br \/>\r\nabstract = {Modern cloud databases adopt a storage-disaggregation architecture that separates the management of computation and storage. A major bottleneck in such an architecture is the network connecting the computation and storage layers. Two solutions have been explored to mitigate the bottleneck: caching and computation pushdown. While both techniques can significantly reduce network traffic, existing DBMSs consider them as orthogonal techniques and support only one or the other, leaving potential performance benefits unexploited. 
In this paper we present FlexPushdownDB (FPDB), an OLAP cloud DBMS prototype that supports fine-grained hybrid query execution to combine the benefits of caching and computation pushdown in a storage-disaggregation architecture. We build a hybrid query executor based on a new concept called separable operators to combine the data from the cache and results from the pushdown processing. We also propose a novel Weighted-LFU cache replacement policy that takes into account the cost of pushdown computation. Our experimental evaluation on the Star Schema Benchmark shows that the hybrid execution outperforms both the conventional caching-only architecture and pushdown-only architecture by 2.2X. In the hybrid architecture, our experiments show that Weighted-LFU can outperform the baseline LFU by 37%.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('326','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_326\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Modern cloud databases adopt a storage-disaggregation architecture that separates the management of computation and storage. A major bottleneck in such an architecture is the network connecting the computation and storage layers. Two solutions have been explored to mitigate the bottleneck: caching and computation pushdown. While both techniques can significantly reduce network traffic, existing DBMSs consider them as orthogonal techniques and support only one or the other, leaving potential performance benefits unexploited. In this paper we present FlexPushdownDB (FPDB), an OLAP cloud DBMS prototype that supports fine-grained hybrid query execution to combine the benefits of caching and computation pushdown in a storage-disaggregation architecture. 
We build a hybrid query executor based on a new concept called separable operators to combine the data from the cache and results from the pushdown processing. We also propose a novel Weighted-LFU cache replacement policy that takes into account the cost of pushdown computation. Our experimental evaluation on the Star Schema Benchmark shows that the hybrid execution outperforms both the conventional caching-only architecture and pushdown-only architecture by 2.2X. In the hybrid architecture, our experiments show that Weighted-LFU can outperform the baseline LFU by 37%.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('326','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_326\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.14778\/3476249.3476265\" title=\"Follow DOI:10.14778\/3476249.3476265\" target=\"_blank\">doi:10.14778\/3476249.3476265<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('326','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Ryan McKenna, Siddhant Pradhan, Daniel Sheldon, Gerome Miklau<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('329','tp_abstract')\" style=\"cursor:pointer;\">Relaxed marginal consistency for differentially private query answering<\/a> <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">Advances in Neural Information Processing Systems, <\/span><span class=\"tp_pub_additional_pages\">pp. 
20696-20707, <\/span><span class=\"tp_pub_additional_year\">2021<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_329\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('329','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_329\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('329','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_329\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('329','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_329\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{McKenna2021relaxed,<br \/>\r\ntitle = {Relaxed marginal consistency for differentially private query answering},<br \/>\r\nauthor = {Ryan McKenna and Siddhant Pradhan and Daniel Sheldon and Gerome Miklau},<br \/>\r\nurl = {https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/acb55f9af76808c5fd5522dcdb519fde-Abstract.html},<br \/>\r\nyear  = {2021},<br \/>\r\ndate = {2021-12-01},<br \/>\r\nbooktitle = {Advances in Neural Information Processing Systems},<br \/>\r\nvolume = {34},<br \/>\r\npages = {20696-20707},<br \/>\r\nabstract = {Many differentially private algorithms for answering database queries involve a step that reconstructs a discrete data distribution from noisy measurements. This provides consistent query answers and reduces error, but often requires space that grows exponentially with dimension. PRIVATE-PGM is a recent approach that uses graphical models to represent the data distribution, with complexity proportional to that of exact marginal inference in a graphical model with structure determined by the co-occurrence of variables in the noisy measurements. 
PRIVATE-PGM is highly scalable for sparse measurements, but may fail to run in high dimensions with dense measurements. We overcome the main scalability limitation of PRIVATE-PGM through a principled approach that relaxes consistency constraints in the estimation objective. Our new approach works with many existing private query answering algorithms and improves scalability or accuracy with no privacy cost.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('329','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_329\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Many differentially private algorithms for answering database queries involve a step that reconstructs a discrete data distribution from noisy measurements. This provides consistent query answers and reduces error, but often requires space that grows exponentially with dimension. PRIVATE-PGM is a recent approach that uses graphical models to represent the data distribution, with complexity proportional to that of exact marginal inference in a graphical model with structure determined by the co-occurrence of variables in the noisy measurements. PRIVATE-PGM is highly scalable for sparse measurements, but may fail to run in high dimensions with dense measurements. We overcome the main scalability limitation of PRIVATE-PGM through a principled approach that relaxes consistency constraints in the estimation objective. 
Our new approach works with many existing private query answering algorithms and improves scalability or accuracy with no privacy cost.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('329','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_329\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/acb55f9af76808c5fd5522dcdb519fde-Abstract.html\" title=\"https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/acb55f9af76808c5fd5522dcdb519fde-[...]\" target=\"_blank\">https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/acb55f9af76808c5fd5522dcdb519fde-[...]<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('329','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Ryan McKenna, Gerome Miklau, Daniel Sheldon<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('330','tp_abstract')\" style=\"cursor:pointer;\">Winning the NIST Contest: A scalable and general approach to differentially private synthetic data<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">Journal of Privacy and Confidentiality, <\/span><span class=\"tp_pub_additional_volume\">11 <\/span><span class=\"tp_pub_additional_number\">(3), <\/span><span class=\"tp_pub_additional_year\">2021<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_330\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('330','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_330\" class=\"tp_show\" 
onclick=\"teachpress_pub_showhide('330','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_330\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('330','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_330\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{McKenna2021winning,<br \/>\r\ntitle = {Winning the NIST Contest: A scalable and general approach to differentially private synthetic data},<br \/>\r\nauthor = {Ryan McKenna and Gerome Miklau and Daniel Sheldon},<br \/>\r\ndoi = {https:\/\/doi.org\/10.29012\/jpc.778},<br \/>\r\nyear  = {2021},<br \/>\r\ndate = {2021-12-01},<br \/>\r\njournal = {Journal of Privacy and Confidentiality},<br \/>\r\nvolume = {11},<br \/>\r\nnumber = {3},<br \/>\r\nabstract = {We propose a general approach for differentially private synthetic data generation, that consists of three steps: (1) select a collection of low-dimensional marginals, (2) measure those marginals with a noise addition mechanism, and (3) generate synthetic data that preserves the measured marginals well. Central to this approach is Private-PGM, a post-processing method that is used to estimate a high-dimensional data distribution from noisy measurements of its marginals. We present two mechanisms, NIST-MST and MST, that are instances of this general approach. NIST-MST was the winning mechanism in the 2018 NIST differential privacy synthetic data competition, and MST is a new mechanism that can work in more general settings, while still performing comparably to NIST-MST. 
We believe our general approach should be of broad interest, and can be adopted in future mechanisms for synthetic data generation.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('330','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_330\" style=\"display:none;\"><div class=\"tp_abstract_entry\">We propose a general approach for differentially private synthetic data generation, that consists of three steps: (1) select a collection of low-dimensional marginals, (2) measure those marginals with a noise addition mechanism, and (3) generate synthetic data that preserves the measured marginals well. Central to this approach is Private-PGM, a post-processing method that is used to estimate a high-dimensional data distribution from noisy measurements of its marginals. We present two mechanisms, NIST-MST and MST, that are instances of this general approach. NIST-MST was the winning mechanism in the 2018 NIST differential privacy synthetic data competition, and MST is a new mechanism that can work in more general settings, while still performing comparably to NIST-MST. 
We believe our general approach should be of broad interest, and can be adopted in future mechanisms for synthetic data generation.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('330','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_330\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.29012\/jpc.778\" title=\"Follow DOI:10.29012\/jpc.778\" target=\"_blank\">doi:10.29012\/jpc.778<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('330','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Marco Serafini, Hui Guan<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('324','tp_abstract')\" style=\"cursor:pointer;\">Scalable graph neural network training: The case for sampling<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">ACM SIGOPS Operating Systems Review, <\/span><span class=\"tp_pub_additional_volume\">55 <\/span><span class=\"tp_pub_additional_number\">(1), <\/span><span class=\"tp_pub_additional_pages\">pp. 
68-76, <\/span><span class=\"tp_pub_additional_year\">2021<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_324\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('324','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_324\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('324','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_324\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('324','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_324\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{serafini2021scalable,<br \/>\r\ntitle = {Scalable graph neural network training: The case for sampling},<br \/>\r\nauthor = {Marco Serafini and Hui Guan},<br \/>\r\ndoi = {https:\/\/doi.org\/10.1145\/3469379.3469387},<br \/>\r\nyear  = {2021},<br \/>\r\ndate = {2021-01-01},<br \/>\r\njournal = {ACM SIGOPS Operating Systems Review},<br \/>\r\nvolume = {55},<br \/>\r\nnumber = {1},<br \/>\r\npages = {68-76},<br \/>\r\nabstract = {Graph Neural Networks (GNNs) are a new and increasingly popular family of deep neural network architectures to perform learning on graphs. Training them efficiently is challenging due to the irregular nature of graph data. The problem becomes even more challenging when scaling to large graphs that exceed the capacity of single devices. Standard approaches to distributed DNN training, like data and model parallelism, do not directly apply to GNNs. Instead, two different approaches have emerged in the literature: whole-graph and sample-based training. In this paper, we review and compare the two approaches. 
Scalability is challenging with both approaches, but we make a case that research should focus on sample-based training since it is a more promising approach. Finally, we review recent systems supporting sample-based training.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('324','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_324\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Graph Neural Networks (GNNs) are a new and increasingly popular family of deep neural network architectures to perform learning on graphs. Training them efficiently is challenging due to the irregular nature of graph data. The problem becomes even more challenging when scaling to large graphs that exceed the capacity of single devices. Standard approaches to distributed DNN training, like data and model parallelism, do not directly apply to GNNs. Instead, two different approaches have emerged in the literature: whole-graph and sample-based training. In this paper, we review and compare the two approaches. Scalability is challenging with both approaches, but we make a case that research should focus on sample-based training since it is a more promising approach. 
Finally, we review recent systems supporting sample-based training.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('324','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_324\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1145\/3469379.3469387\" title=\"Follow DOI:10.1145\/3469379.3469387\" target=\"_blank\">doi:10.1145\/3469379.3469387<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('324','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr>\r\n                    <td>\r\n                        <h3 class=\"tp_h3\" id=\"tp_h3_2020\">2020<\/h3>\r\n                    <\/td>\r\n                <\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Matteo Brucato, Nishant Yadav, Azza Abouzied, Peter J Haas, Alexandra Meliou<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('301','tp_abstract')\" style=\"cursor:pointer;\">Stochastic Package Queries in Probabilistic Databases<\/a> <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">Proceedings of the 2020 International Conference on Management of \r\n Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], \r\n June 14-19, 2020, <\/span><span class=\"tp_pub_additional_pages\">pp. 
269\u2013283, <\/span><span class=\"tp_pub_additional_year\">2020<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_301\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('301','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_301\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('301','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_301\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('301','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_301\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{DBLP:conf\/sigmod\/BrucatoYAHM20,<br \/>\r\ntitle = {Stochastic Package Queries in Probabilistic Databases},<br \/>\r\nauthor = {Matteo Brucato and Nishant Yadav and Azza Abouzied and Peter J Haas and Alexandra Meliou},<br \/>\r\nurl = {https:\/\/doi.org\/10.1145\/3318464.3389765<br \/>\r\nhttps:\/\/people.cs.umass.edu\/~matteo\/files\/3318464.3389765.pdf},<br \/>\r\ndoi = {10.1145\/3318464.3389765},<br \/>\r\nyear  = {2020},<br \/>\r\ndate = {2020-06-15},<br \/>\r\nbooktitle = {Proceedings of the 2020 International Conference on Management of <br \/>\r\n Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], <br \/>\r\n June 14-19, 2020},<br \/>\r\npages = {269--283},<br \/>\r\ncrossref = {DBLP:conf\/sigmod\/2020},<br \/>\r\nabstract = {We provide methods for in-database support of decision making under uncertainty. Many important decision problems correspond to selecting a \"package\" (bag of tuples in a relational database) that jointly satisfy a set of constraints while minimizing some overall \"cost\" function; in most real-world problems, the data is uncertain. 
We provide methods for specifying---via a SQL extension---and processing stochastic package queries (SPQs), in order to solve optimization problems over uncertain data, right where the data resides. Prior work in stochastic programming uses Monte Carlo methods where the original stochastic optimization problem is approximated by a large deterministic optimization problem that incorporates many \"scenarios\", i.e., sample realizations of the uncertain data values. For large database tables, however, a huge number of scenarios is required, leading to poor performance and, often, failure of the solver software. We therefore provide a novel SummarySearch algorithm that, instead of trying to solve a large deterministic problem, seamlessly approximates it via a sequence of smaller problems defined over carefully crafted \"summaries\" of the scenarios that accelerate convergence to a feasible and near-optimal solution. Experimental results on our prototype system show that SummarySearch can be orders of magnitude faster than prior methods at finding feasible and high-quality packages.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('301','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_301\" style=\"display:none;\"><div class=\"tp_abstract_entry\">We provide methods for in-database support of decision making under uncertainty. Many important decision problems correspond to selecting a &quot;package&quot; (bag of tuples in a relational database) that jointly satisfy a set of constraints while minimizing some overall &quot;cost&quot; function; in most real-world problems, the data is uncertain. We provide methods for specifying---via a SQL extension---and processing stochastic package queries (SPQs), in order to solve optimization problems over uncertain data, right where the data resides. 
Prior work in stochastic programming uses Monte Carlo methods where the original stochastic optimization problem is approximated by a large deterministic optimization problem that incorporates many &quot;scenarios&quot;, i.e., sample realizations of the uncertain data values. For large database tables, however, a huge number of scenarios is required, leading to poor performance and, often, failure of the solver software. We therefore provide a novel SummarySearch algorithm that, instead of trying to solve a large deterministic problem, seamlessly approximates it via a sequence of smaller problems defined over carefully crafted &quot;summaries&quot; of the scenarios that accelerate convergence to a feasible and near-optimal solution. Experimental results on our prototype system show that SummarySearch can be orders of magnitude faster than prior methods at finding feasible and high-quality packages.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('301','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_301\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/doi.org\/10.1145\/3318464.3389765\" title=\"https:\/\/doi.org\/10.1145\/3318464.3389765\" target=\"_blank\">https:\/\/doi.org\/10.1145\/3318464.3389765<\/a><\/li><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/people.cs.umass.edu\/~matteo\/files\/3318464.3389765.pdf\" title=\"https:\/\/people.cs.umass.edu\/~matteo\/files\/3318464.3389765.pdf\" target=\"_blank\">https:\/\/people.cs.umass.edu\/~matteo\/files\/3318464.3389765.pdf<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1145\/3318464.3389765\" title=\"Follow DOI:10.1145\/3318464.3389765\" target=\"_blank\">doi:10.1145\/3318464.3389765<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" 
onclick=\"teachpress_pub_showhide('301','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Dan Zhang, Ryan McKenna, Ios Kotsogiannis, George Bissias, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau<\/p><p class=\"tp_pub_title\">\u03f5KTELO: A Framework for Defining Differentially Private  Computations <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">ACM Trans. Database Syst., <\/span><span class=\"tp_pub_additional_volume\">45 <\/span><span class=\"tp_pub_additional_number\">(1), <\/span><span class=\"tp_pub_additional_pages\">pp. 2:1\u20132:44, <\/span><span class=\"tp_pub_additional_year\">2020<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_resource_link\"><a id=\"tp_links_sh_299\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('299','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_299\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('299','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_299\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{DBLP:journals\/tods\/ZhangMKBHMM20,<br \/>\r\ntitle = {\u03f5KTELO: A Framework for Defining Differentially Private  Computations},<br \/>\r\nauthor = {Dan Zhang and Ryan McKenna and Ios Kotsogiannis and George Bissias and Michael Hay and Ashwin Machanavajjhala and Gerome Miklau},<br \/>\r\nurl = {https:\/\/doi.org\/10.1145\/3362032},<br \/>\r\ndoi = {10.1145\/3362032},<br \/>\r\nyear  = {2020},<br \/>\r\ndate = {2020-01-01},<br \/>\r\njournal = {ACM Trans. 
Database Syst.},<br \/>\r\nvolume = {45},<br \/>\r\nnumber = {1},<br \/>\r\npages = {2:1--2:44},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('299','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_299\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/doi.org\/10.1145\/3362032\" title=\"https:\/\/doi.org\/10.1145\/3362032\" target=\"_blank\">https:\/\/doi.org\/10.1145\/3362032<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1145\/3362032\" title=\"Follow DOI:10.1145\/3362032\" target=\"_blank\">doi:10.1145\/3362032<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('299','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">David Pujol, Ryan McKenna, Satya Kuppam, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau<\/p><p class=\"tp_pub_title\">Fair decision making using privacy-protected data <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">FAT* '20: Conference on Fairness, Accountability, and Transparency, \r\n Barcelona, Spain, January 27-30, 2020, <\/span><span class=\"tp_pub_additional_pages\">pp. 
189\u2013199, <\/span><span class=\"tp_pub_additional_year\">2020<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_resource_link\"><a id=\"tp_links_sh_300\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('300','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_300\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('300','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_300\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{DBLP:conf\/fat\/PujolMKHMM20,<br \/>\r\ntitle = {Fair decision making using privacy-protected data},<br \/>\r\nauthor = {David Pujol and Ryan McKenna and Satya Kuppam and Michael Hay and Ashwin Machanavajjhala and Gerome Miklau},<br \/>\r\nurl = {https:\/\/doi.org\/10.1145\/3351095.3372872},<br \/>\r\ndoi = {10.1145\/3351095.3372872},<br \/>\r\nyear  = {2020},<br \/>\r\ndate = {2020-01-01},<br \/>\r\nbooktitle = {FAT* '20: Conference on Fairness, Accountability, and Transparency, <br \/>\r\n Barcelona, Spain, January 27-30, 2020},<br \/>\r\npages = {189--199},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('300','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_300\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/doi.org\/10.1145\/3351095.3372872\" title=\"https:\/\/doi.org\/10.1145\/3351095.3372872\" target=\"_blank\">https:\/\/doi.org\/10.1145\/3351095.3372872<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1145\/3351095.3372872\" title=\"Follow DOI:10.1145\/3351095.3372872\" 
target=\"_blank\">doi:10.1145\/3351095.3372872<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('300','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Anna Fariha, Suman Nath, Alexandra Meliou<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('302','tp_abstract')\" style=\"cursor:pointer;\">Causality-Guided Adaptive Interventional Debugging<\/a> <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">Proceedings of the 2020 International Conference on Management of \r\n Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], \r\n June 14-19, 2020, <\/span><span class=\"tp_pub_additional_pages\">pp. 431\u2013446, <\/span><span class=\"tp_pub_additional_year\">2020<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_302\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('302','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_302\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('302','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_302\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('302','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_302\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{DBLP:conf\/sigmod\/FarihaNM20,<br \/>\r\ntitle = {Causality-Guided Adaptive Interventional Debugging},<br \/>\r\nauthor = {Anna Fariha and Suman Nath and Alexandra Meliou},<br \/>\r\nurl = 
{https:\/\/doi.org\/10.1145\/3318464.3389694<br \/>\r\nhttps:\/\/people.cs.umass.edu\/~afariha\/papers\/aid.pdf},<br \/>\r\ndoi = {10.1145\/3318464.3389694},<br \/>\r\nyear  = {2020},<br \/>\r\ndate = {2020-06-15},<br \/>\r\nbooktitle = {Proceedings of the 2020 International Conference on Management of <br \/>\r\n Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], <br \/>\r\n June 14-19, 2020},<br \/>\r\npages = {431--446},<br \/>\r\ncrossref = {DBLP:conf\/sigmod\/2020},<br \/>\r\nabstract = {Runtime nondeterminism is a fact of life in modern database applications. Previous research has shown that nondeterminism can cause applications to intermittently crash, become unresponsive, or experience data corruption. We propose Adaptive Interventional Debugging (AID) for debugging such intermittent failures. AID combines existing statistical debugging, causal analysis, fault injection, and group testing techniques in a novel way to (1) pinpoint the root cause of an application's intermittent failure and (2) generate an explanation of how the root cause triggers the failure. AID works by first identifying a set of runtime behaviors (called predicates) that are strongly correlated to the failure. It then utilizes temporal properties of the predicates to (over)-approximate their causal relationships. Finally, it uses fault injection to execute a sequence of interventions on the predicates and discover their true causal relationships. This enables AID to identify the true root cause and its causal relationship to the failure. We theoretically analyze how fast AID can converge to the identification. We evaluate AID with six real-world applications that intermittently fail under specific inputs. In each case, AID was able to identify the root cause and explain how the root cause triggered the failure, much faster than group testing and more precisely than statistical debugging. 
We also evaluate AID with many synthetically generated applications with known root causes and confirm that the benefits also hold for them.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('302','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_302\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Runtime nondeterminism is a fact of life in modern database applications. Previous research has shown that nondeterminism can cause applications to intermittently crash, become unresponsive, or experience data corruption. We propose Adaptive Interventional Debugging (AID) for debugging such intermittent failures. AID combines existing statistical debugging, causal analysis, fault injection, and group testing techniques in a novel way to (1) pinpoint the root cause of an application's intermittent failure and (2) generate an explanation of how the root cause triggers the failure. AID works by first identifying a set of runtime behaviors (called predicates) that are strongly correlated to the failure. It then utilizes temporal properties of the predicates to (over)-approximate their causal relationships. Finally, it uses fault injection to execute a sequence of interventions on the predicates and discover their true causal relationships. This enables AID to identify the true root cause and its causal relationship to the failure. We theoretically analyze how fast AID can converge to the identification. We evaluate AID with six real-world applications that intermittently fail under specific inputs. In each case, AID was able to identify the root cause and explain how the root cause triggered the failure, much faster than group testing and more precisely than statistical debugging. 
We also evaluate AID with many synthetically generated applications with known root causes and confirm that the benefits also hold for them.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('302','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_302\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/doi.org\/10.1145\/3318464.3389694\" title=\"https:\/\/doi.org\/10.1145\/3318464.3389694\" target=\"_blank\">https:\/\/doi.org\/10.1145\/3318464.3389694<\/a><\/li><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/people.cs.umass.edu\/~afariha\/papers\/aid.pdf\" title=\"https:\/\/people.cs.umass.edu\/~afariha\/papers\/aid.pdf\" target=\"_blank\">https:\/\/people.cs.umass.edu\/~afariha\/papers\/aid.pdf<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1145\/3318464.3389694\" title=\"Follow DOI:10.1145\/3318464.3389694\" target=\"_blank\">doi:10.1145\/3318464.3389694<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('302','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Anna Fariha, Ashish Tiwari, Arjun Radhakrishna, Sumit Gulwani<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('303','tp_abstract')\" style=\"cursor:pointer;\">ExTuNe: Explaining Tuple Non-conformance<\/a> <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">Proceedings of the 2020 International Conference on Management of \r\n Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], \r\n June 14-19, 2020, <\/span><span 
class=\"tp_pub_additional_pages\">pp. 2741\u20132744, <\/span><span class=\"tp_pub_additional_year\">2020<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_303\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('303','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_303\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('303','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_303\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('303','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_303\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{DBLP:conf\/sigmod\/Fariha0RG20,<br \/>\r\ntitle = {ExTuNe: Explaining Tuple Non-conformance},<br \/>\r\nauthor = {Anna Fariha and Ashish Tiwari and Arjun Radhakrishna and Sumit Gulwani},<br \/>\r\nurl = {https:\/\/doi.org\/10.1145\/3318464.3384694<br \/>\r\nhttps:\/\/people.cs.umass.edu\/~afariha\/papers\/ExTuNe.pdf},<br \/>\r\ndoi = {10.1145\/3318464.3384694},<br \/>\r\nyear  = {2020},<br \/>\r\ndate = {2020-06-16},<br \/>\r\nbooktitle = {Proceedings of the 2020 International Conference on Management of <br \/>\r\n Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], <br \/>\r\n June 14-19, 2020},<br \/>\r\npages = {2741--2744},<br \/>\r\ncrossref = {DBLP:conf\/sigmod\/2020},<br \/>\r\nabstract = {In data-driven systems, we often encounter tuples on which the predictions of a machine-learned model are untrustworthy. A key cause of such untrustworthiness is non-conformance of a new tuple with respect to the training dataset. 
To check conformance, we introduce a novel concept of data invariant, which captures a set of implicit constraints that all tuples of a dataset satisfy: a test tuple is non-conforming if it violates the data invariants. Data invariants model complex relationships among multiple attributes; but do not provide interpretable explanations of non-conformance. We present ExTuNe, a system for Explaining causes of Tuple Non-conformance. Based on the principles of causality, ExTuNe assigns responsibility to the attributes for causing non-conformance. The key idea is to observe change in invariant violation under intervention on attribute-values. Through a simple interface, ExTuNe produces a ranked list of the test tuples based on their degree of non-conformance and visualizes tuple-level attribute responsibility for non-conformance through heat maps. ExTuNe further visualizes attribute responsibility, aggregated over the test tuples. We demonstrate how ExTuNe can detect and explain tuple non-conformance and assist the users to make careful decisions towards achieving trusted machine learning.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('303','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_303\" style=\"display:none;\"><div class=\"tp_abstract_entry\">In data-driven systems, we often encounter tuples on which the predictions of a machine-learned model are untrustworthy. A key cause of such untrustworthiness is non-conformance of a new tuple with respect to the training dataset. To check conformance, we introduce a novel concept of data invariant, which captures a set of implicit constraints that all tuples of a dataset satisfy: a test tuple is non-conforming if it violates the data invariants. 
Data invariants model complex relationships among multiple attributes; but do not provide interpretable explanations of non-conformance. We present ExTuNe, a system for Explaining causes of Tuple Non-conformance. Based on the principles of causality, ExTuNe assigns responsibility to the attributes for causing non-conformance. The key idea is to observe change in invariant violation under intervention on attribute-values. Through a simple interface, ExTuNe produces a ranked list of the test tuples based on their degree of non-conformance and visualizes tuple-level attribute responsibility for non-conformance through heat maps. ExTuNe further visualizes attribute responsibility, aggregated over the test tuples. We demonstrate how ExTuNe can detect and explain tuple non-conformance and assist the users to make careful decisions towards achieving trusted machine learning.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('303','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_303\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/doi.org\/10.1145\/3318464.3384694\" title=\"https:\/\/doi.org\/10.1145\/3318464.3384694\" target=\"_blank\">https:\/\/doi.org\/10.1145\/3318464.3384694<\/a><\/li><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/people.cs.umass.edu\/~afariha\/papers\/ExTuNe.pdf\" title=\"https:\/\/people.cs.umass.edu\/~afariha\/papers\/ExTuNe.pdf\" target=\"_blank\">https:\/\/people.cs.umass.edu\/~afariha\/papers\/ExTuNe.pdf<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1145\/3318464.3384694\" title=\"Follow DOI:10.1145\/3318464.3384694\" target=\"_blank\">doi:10.1145\/3318464.3384694<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" 
onclick=\"teachpress_pub_showhide('303','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Cibele Freire, Wolfgang Gatterbauer, Neil Immerman, Alexandra Meliou<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('304','tp_abstract')\" style=\"cursor:pointer;\">New Results for the Complexity of Resilience for Binary Conjunctive  Queries with Self-Joins<\/a> <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_booktitle\">Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles \r\n of Database Systems, PODS 2020, Portland, OR, USA, June 14-19, 2020, <\/span><span class=\"tp_pub_additional_pages\">pp. 271\u2013284, <\/span><span class=\"tp_pub_additional_year\">2020<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_304\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('304','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_304\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('304','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_304\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('304','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_304\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{DBLP:conf\/pods\/FreireGIM20,<br \/>\r\ntitle = {New Results for the Complexity of Resilience for Binary Conjunctive  Queries with Self-Joins},<br \/>\r\nauthor = {Cibele Freire and Wolfgang Gatterbauer and Neil Immerman and Alexandra Meliou},<br \/>\r\nurl = 
{https:\/\/doi.org\/10.1145\/3375395.3387647<br \/>\r\nhttps:\/\/people.cs.umass.edu\/~ameli\/projects\/causality\/papers\/pods2020-Resilience.pdf},<br \/>\r\ndoi = {10.1145\/3375395.3387647},<br \/>\r\nyear  = {2020},<br \/>\r\ndate = {2020-06-16},<br \/>\r\nbooktitle = {Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles <br \/>\r\n of Database Systems, PODS 2020, Portland, OR, USA, June 14-19, 2020},<br \/>\r\npages = {271--284},<br \/>\r\ncrossref = {DBLP:conf\/pods\/2020},<br \/>\r\nabstract = {The resilience of a Boolean query on a database is the minimum number of tuples that need to be deleted from the input tables in order to make the query false. A solution to this problem immediately translates into a solution for the more widely known problem of deletion propagation with source-side effects. In this paper, we give several novel results on the hardness of the resilience problem for conjunctive queries with self-joins, and, more specifically, we present a dichotomy result for the class of single-self-join binary queries with exactly two repeated relations occurring in the query. Unlike in the self-join free case, the concept of triad is not enough to fully characterize the complexity of resilience. We identify new structural properties, namely chains, confluences and permutations, which lead to various NP-hardness results. We also give novel involved reductions to network flow to show certain cases are in P. 
Although restricted, our results provide important insights into the problem of self-joins that we hope can help solve the general case of all conjunctive queries with self-joins in the future.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('304','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_304\" style=\"display:none;\"><div class=\"tp_abstract_entry\">The resilience of a Boolean query on a database is the minimum number of tuples that need to be deleted from the input tables in order to make the query false. A solution to this problem immediately translates into a solution for the more widely known problem of deletion propagation with source-side effects. In this paper, we give several novel results on the hardness of the resilience problem for conjunctive queries with self-joins, and, more specifically, we present a dichotomy result for the class of single-self-join binary queries with exactly two repeated relations occurring in the query. Unlike in the self-join free case, the concept of triad is not enough to fully characterize the complexity of resilience. We identify new structural properties, namely chains, confluences and permutations, which lead to various NP-hardness results. We also give novel involved reductions to network flow to show certain cases are in P. 
Although restricted, our results provide important insights into the problem of self-joins that we hope can help solve the general case of all conjunctive queries with self-joins in the future.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('304','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_304\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/doi.org\/10.1145\/3375395.3387647\" title=\"https:\/\/doi.org\/10.1145\/3375395.3387647\" target=\"_blank\">https:\/\/doi.org\/10.1145\/3375395.3387647<\/a><\/li><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/people.cs.umass.edu\/~ameli\/projects\/causality\/papers\/pods2020-Resilience.pdf\" title=\"https:\/\/people.cs.umass.edu\/~ameli\/projects\/causality\/papers\/pods2020-Resilience[...]\" target=\"_blank\">https:\/\/people.cs.umass.edu\/~ameli\/projects\/causality\/papers\/pods2020-Resilience[...]<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1145\/3375395.3387647\" title=\"Follow DOI:10.1145\/3375395.3387647\" target=\"_blank\">doi:10.1145\/3375395.3387647<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('304','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_inproceedings\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Xiangyao Yu, Matt Youill, Matthew E Woicik, Abdurrahman Ghanem, Marco Serafini, Ashraf Aboulnaga, Michael Stonebraker<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('307','tp_abstract')\" style=\"cursor:pointer;\">PushdownDB: Accelerating a DBMS Using S3 Computation<\/a> <span class=\"tp_pub_type inproceedings\">Inproceedings<\/span> <\/p><p class=\"tp_pub_additional\"><span 
class=\"tp_pub_additional_booktitle\">36th IEEE International Conference on Data Engineering, ICDE 2020, \r\n Dallas, TX, USA, April 20-24, 2020, <\/span><span class=\"tp_pub_additional_pages\">pp. 1802\u20131805, <\/span><span class=\"tp_pub_additional_year\">2020<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_307\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('307','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_307\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('307','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_307\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('307','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_307\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@inproceedings{DBLP:conf\/icde\/YuYWGSAS20,<br \/>\r\ntitle = {PushdownDB: Accelerating a DBMS Using S3 Computation},<br \/>\r\nauthor = {Xiangyao Yu and Matt Youill and Matthew E Woicik and Abdurrahman Ghanem and Marco Serafini and Ashraf Aboulnaga and Michael Stonebraker},<br \/>\r\nurl = {https:\/\/doi.org\/10.1109\/ICDE48307.2020.00174<br \/>\r\nhttps:\/\/marcoserafini.github.io\/papers\/pushdown.pdf},<br \/>\r\ndoi = {10.1109\/ICDE48307.2020.00174},<br \/>\r\nyear  = {2020},<br \/>\r\ndate = {2020-01-01},<br \/>\r\nbooktitle = {36th IEEE International Conference on Data Engineering, ICDE 2020, <br \/>\r\n Dallas, TX, USA, April 20-24, 2020},<br \/>\r\npages = {1802--1805},<br \/>\r\ncrossref = {DBLP:conf\/icde\/2020},<br \/>\r\nabstract = {This paper studies the effectiveness of pushing parts of DBMS analytics queries into the Simple Storage Service (S3) of Amazon Web Services (AWS), using a recently released capability called S3 Select. 
We show that some DBMS primitives (filter, projection, and aggregation) can always be cost-effectively moved into S3. Other more complex operations (join, top-K, and groupby) require reimplementation to take advantage of S3 Select and are often candidates for pushdown. We demonstrate these capabilities through experimentation using a new DBMS that we developed, PushdownDB. Experimentation with a collection of queries including TPC-H queries shows that PushdownDB is on average 30% cheaper and 6.7\u00d7 faster than a baseline that does not use S3 Select.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {inproceedings}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('307','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_307\" style=\"display:none;\"><div class=\"tp_abstract_entry\">This paper studies the effectiveness of pushing parts of DBMS analytics queries into the Simple Storage Service (S3) of Amazon Web Services (AWS), using a recently released capability called S3 Select. We show that some DBMS primitives (filter, projection, and aggregation) can always be cost-effectively moved into S3. Other more complex operations (join, top-K, and groupby) require reimplementation to take advantage of S3 Select and are often candidates for pushdown. We demonstrate these capabilities through experimentation using a new DBMS that we developed, PushdownDB. 
Experimentation with a collection of queries including TPC-H queries shows that PushdownDB is on average 30% cheaper and 6.7\u00d7 faster than a baseline that does not use S3 Select.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('307','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_307\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/doi.org\/10.1109\/ICDE48307.2020.00174\" title=\"https:\/\/doi.org\/10.1109\/ICDE48307.2020.00174\" target=\"_blank\">https:\/\/doi.org\/10.1109\/ICDE48307.2020.00174<\/a><\/li><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/marcoserafini.github.io\/papers\/pushdown.pdf\" title=\"https:\/\/marcoserafini.github.io\/papers\/pushdown.pdf\" target=\"_blank\">https:\/\/marcoserafini.github.io\/papers\/pushdown.pdf<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1109\/ICDE48307.2020.00174\" title=\"Follow DOI:10.1109\/ICDE48307.2020.00174\" target=\"_blank\">doi:10.1109\/ICDE48307.2020.00174<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('307','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Xiaowei Zhu, Marco Serafini, Xiaosong Ma, Ashraf Aboulnaga, Wenguang Chen, Guanyu Feng<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('306','tp_abstract')\" style=\"cursor:pointer;\">LiveGraph: A Transactional Graph Storage System with Purely Sequential  Adjacency List Scans<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">Proc. 
VLDB Endow., <\/span><span class=\"tp_pub_additional_volume\">13 <\/span><span class=\"tp_pub_additional_number\">(7), <\/span><span class=\"tp_pub_additional_pages\">pp. 1020\u20131034, <\/span><span class=\"tp_pub_additional_year\">2020<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_306\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('306','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_306\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('306','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_306\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('306','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_306\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{DBLP:journals\/pvldb\/ZhuSMACF20,<br \/>\r\ntitle = {LiveGraph: A Transactional Graph Storage System with Purely Sequential  Adjacency List Scans},<br \/>\r\nauthor = {Xiaowei Zhu and Marco Serafini and Xiaosong Ma and Ashraf Aboulnaga and Wenguang Chen and Guanyu Feng},<br \/>\r\nurl = {http:\/\/www.vldb.org\/pvldb\/vol13\/p1020-zhu.pdf},<br \/>\r\nyear  = {2020},<br \/>\r\ndate = {2020-01-01},<br \/>\r\njournal = {Proc. VLDB Endow.},<br \/>\r\nvolume = {13},<br \/>\r\nnumber = {7},<br \/>\r\npages = {1020--1034},<br \/>\r\nabstract = {The specific characteristics of graph workloads make it hard to design a one-size-fits-all graph storage system. Systems that support transactional updates use data structures with poor data locality, which limits the efficiency of analytical workloads or even simple edge scans. Other systems run graph analytics workloads efficiently, but cannot properly support transactions. 
This paper presents LiveGraph, a graph storage system that outperforms both the best graph transactional systems and the best solutions for real-time graph analytics on fresh data. LiveGraph achieves this by ensuring that adjacency list scans, a key operation in graph workloads, are purely sequential: they never require random accesses even in presence of concurrent transactions. Such pure-sequential operations are enabled by combining a novel graph-aware data structure, the Transactional Edge Log (TEL), with a concurrency control mechanism that leverages TEL\u2019s data layout. Our evaluation shows that LiveGraph significantly outperforms state-of-the-art (graph) database solutions on both transactional and real-time analytical workloads.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('306','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_306\" style=\"display:none;\"><div class=\"tp_abstract_entry\">The specific characteristics of graph workloads make it hard to design a one-size-fits-all graph storage system. Systems that support transactional updates use data structures with poor data locality, which limits the efficiency of analytical workloads or even simple edge scans. Other systems run graph analytics workloads efficiently, but cannot properly support transactions. This paper presents LiveGraph, a graph storage system that outperforms both the best graph transactional systems and the best solutions for real-time graph analytics on fresh data. LiveGraph achieves this by ensuring that adjacency list scans, a key operation in graph workloads, are purely sequential: they never require random accesses even in presence of concurrent transactions. 
Such pure-sequential operations are enabled by combining a novel graph-aware data structure, the Transactional Edge Log (TEL), with a concurrency control mechanism that leverages TEL\u2019s data layout. Our evaluation shows that LiveGraph significantly outperforms state-of-the-art (graph) database solutions on both transactional and real-time analytical workloads.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('306','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_306\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"http:\/\/www.vldb.org\/pvldb\/vol13\/p1020-zhu.pdf\" title=\"http:\/\/www.vldb.org\/pvldb\/vol13\/p1020-zhu.pdf\" target=\"_blank\">http:\/\/www.vldb.org\/pvldb\/vol13\/p1020-zhu.pdf<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('306','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Muhammad Bilal, Marco Serafini, Marco Canini, Rodrigo Rodrigues<\/p><p class=\"tp_pub_title\">Do the Best Cloud Configurations Grow on Trees? An Experimental Evaluation \r\n of Black Box Algorithms for Optimizing Cloud Workloads <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">Proc. VLDB Endow., <\/span><span class=\"tp_pub_additional_volume\">13 <\/span><span class=\"tp_pub_additional_number\">(11), <\/span><span class=\"tp_pub_additional_pages\">pp. 
2563\u20132575, <\/span><span class=\"tp_pub_additional_year\">2020<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_resource_link\"><a id=\"tp_links_sh_314\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('314','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_314\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('314','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_314\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{DBLP:journals\/pvldb\/0007SCR20,<br \/>\r\ntitle = {Do the Best Cloud Configurations Grow on Trees? An Experimental Evaluation <br \/>\r\n of Black Box Algorithms for Optimizing Cloud Workloads},<br \/>\r\nauthor = {Muhammad Bilal and Marco Serafini and Marco Canini and Rodrigo Rodrigues},<br \/>\r\nurl = {http:\/\/www.vldb.org\/pvldb\/vol13\/p2563-bilal.pdf},<br \/>\r\nyear  = {2020},<br \/>\r\ndate = {2020-01-01},<br \/>\r\njournal = {Proc. 
VLDB Endow.},<br \/>\r\nvolume = {13},<br \/>\r\nnumber = {11},<br \/>\r\npages = {2563--2575},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('314','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_314\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"http:\/\/www.vldb.org\/pvldb\/vol13\/p2563-bilal.pdf\" title=\"http:\/\/www.vldb.org\/pvldb\/vol13\/p2563-bilal.pdf\" target=\"_blank\">http:\/\/www.vldb.org\/pvldb\/vol13\/p2563-bilal.pdf<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('314','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Matteo Brucato, Miro Mannino, Azza Abouzied, Peter J Haas, Alexandra Meliou<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('310','tp_abstract')\" style=\"cursor:pointer;\">sPaQLTooLs: A Stochastic Package Query Interface for Scalable Constrained  Optimization<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">Proc. VLDB Endow., <\/span><span class=\"tp_pub_additional_volume\">13 <\/span><span class=\"tp_pub_additional_number\">(12), <\/span><span class=\"tp_pub_additional_pages\">pp. 
2881\u20132884, <\/span><span class=\"tp_pub_additional_year\">2020<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_310\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('310','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_310\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('310','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_310\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('310','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_310\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{DBLP:journals\/pvldb\/BrucatoMAHM20,<br \/>\r\ntitle = {sPaQLTooLs: A Stochastic Package Query Interface for Scalable Constrained  Optimization},<br \/>\r\nauthor = {Matteo Brucato and Miro Mannino and Azza Abouzied and Peter J Haas and Alexandra Meliou},<br \/>\r\nurl = {http:\/\/www.vldb.org\/pvldb\/vol13\/p2881-brucato.pdf},<br \/>\r\nyear  = {2020},<br \/>\r\ndate = {2020-01-01},<br \/>\r\njournal = {Proc. VLDB Endow.},<br \/>\r\nvolume = {13},<br \/>\r\nnumber = {12},<br \/>\r\npages = {2881--2884},<br \/>\r\nabstract = {Everyone needs to make decisions under uncertainty and with limited resources, e.g., an investor who is building a stock portfolio subject to an investment budget and a bounded risk tolerance. Doing this with current technology is hard. There is a disconnect between software tools for data management, stochastic predictive modeling (e.g., simulation of future stock prices), and optimization; this leads to cumbersome analytical workflows. Moreover, current methods do not scale. 
To handle a broad class of uncertainty models, analysts approximate the original stochastic optimization problem by a large deterministic optimization problem that incorporates many \u201cscenarios\u201d, i.e., sample realizations of the uncertain data values. For large problems, a huge number of scenarios is required, often causing the solver to fail. We demonstrate sPaQLTooLs, a system for in-database specification and scalable solution of constrained optimization problems. The key ingredients are (i) a database-oriented specification of constrained stochastic optimization problems as \u201cstochastic package queries\u201d (SPQs), (ii) use of a Monte Carlo database to incorporate stochastic predictive models, and (iii) a new SUMMARYSEARCH algorithm for scalably solving SPQs with approximation guarantees. In this demonstration, the attendees will experience first-hand the difficulty of manually constructing feasible and high-quality portfolios, using real-world stock market data. We will then demonstrate how SUMMARYSEARCH can easily and efficiently help them find very good portfolios, while being orders of magnitude faster than prior methods.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('310','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_310\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Everyone needs to make decisions under uncertainty and with limited resources, e.g., an investor who is building a stock portfolio subject to an investment budget and a bounded risk tolerance. Doing this with current technology is hard. There is a disconnect between software tools for data management, stochastic predictive modeling (e.g., simulation of future stock prices), and optimization; this leads to cumbersome analytical workflows. 
Moreover, current methods do not scale. To handle a broad class of uncertainty models, analysts approximate the original stochastic optimization problem by a large deterministic optimization problem that incorporates many \u201cscenarios\u201d, i.e., sample realizations of the uncertain data values. For large problems, a huge number of scenarios is required, often causing the solver to fail. We demonstrate sPaQLTooLs, a system for in-database specification and scalable solution of constrained optimization problems. The key ingredients are (i) a database-oriented specification of constrained stochastic optimization problems as \u201cstochastic package queries\u201d (SPQs), (ii) use of a Monte Carlo database to incorporate stochastic predictive models, and (iii) a new SUMMARYSEARCH algorithm for scalably solving SPQs with approximation guarantees. In this demonstration, the attendees will experience first-hand the difficulty of manually constructing feasible and high-quality portfolios, using real-world stock market data. 
We will then demonstrate how SUMMARYSEARCH can easily and efficiently help them find very good portfolios, while being orders of magnitude faster than prior methods.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('310','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_310\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"http:\/\/www.vldb.org\/pvldb\/vol13\/p2881-brucato.pdf\" title=\"http:\/\/www.vldb.org\/pvldb\/vol13\/p2881-brucato.pdf\" target=\"_blank\">http:\/\/www.vldb.org\/pvldb\/vol13\/p2881-brucato.pdf<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('310','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Dan Zhang, Yoshihiko Suhara, Jinfeng Li, Madelon Hulsebos, \u00c7, Wang -<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('308','tp_abstract')\" style=\"cursor:pointer;\">Sato: Contextual Semantic Type Detection in Tables<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">Proc. VLDB Endow., <\/span><span class=\"tp_pub_additional_volume\">13 <\/span><span class=\"tp_pub_additional_number\">(11), <\/span><span class=\"tp_pub_additional_pages\">pp. 
1835\u20131848, <\/span><span class=\"tp_pub_additional_year\">2020<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_308\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('308','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_308\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('308','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_308\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('308','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_308\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{DBLP:journals\/pvldb\/ZhangSLHDT20,<br \/>\r\ntitle = {Sato: Contextual Semantic Type Detection in Tables},<br \/>\r\nauthor = {Dan Zhang and Yoshihiko Suhara and Jinfeng Li and Madelon Hulsebos and \u00c7a\u011fatay Demiralp and Wang-Chiew Tan},<br \/>\r\nurl = {http:\/\/www.vldb.org\/pvldb\/vol13\/p1835-zhang.pdf},<br \/>\r\nyear  = {2020},<br \/>\r\ndate = {2020-01-01},<br \/>\r\njournal = {Proc. VLDB Endow.},<br \/>\r\nvolume = {13},<br \/>\r\nnumber = {11},<br \/>\r\npages = {1835--1848},<br \/>\r\nabstract = {Detecting the semantic types of data columns in relational tables is important for various data preparation and information retrieval tasks such as data cleaning, schema matching, data discovery, and semantic search. However, existing detection approaches either perform poorly with dirty data, support only a limited number of semantic types, fail to incorporate the table context of columns or rely on large sample sizes for training data. 
We introduce Sato, a hybrid machine learning model to automatically detect the semantic types of columns in tables, exploiting the signals from the table context as well as the column values. 
Sato combines a deep learning model trained on a large-scale table corpus with topic modeling and structured prediction to achieve support-weighted and macro average F1 scores of 0.925 and 0.735, respectively, exceeding the state-of-the-art performance by a significant margin. We extensively analyze the overall and per-type performance of Sato, discussing how individual modeling components, as well as feature categories, contribute to its performance.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('308','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_308\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Detecting the semantic types of data columns in relational tables is important for various data preparation and information retrieval tasks such as data cleaning, schema matching, data discovery, and semantic search. However, existing detection approaches either perform poorly with dirty data, support only a limited number of semantic types, fail to incorporate the table context of columns or rely on large sample sizes for training data. We introduce Sato, a hybrid machine learning model to automatically detect the semantic types of columns in tables, exploiting the signals from the table context as well as the column values. Sato combines a deep learning model trained on a large-scale table corpus with topic modeling and structured prediction to achieve support-weighted and macro average F1 scores of 0.925 and 0.735, respectively, exceeding the state-of-the-art performance by a significant margin. 
We extensively analyze the overall and per-type performance of Sato, discussing how individual modeling components, as well as feature categories, contribute to its performance.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('308','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_308\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"http:\/\/www.vldb.org\/pvldb\/vol13\/p1835-zhang.pdf\" title=\"http:\/\/www.vldb.org\/pvldb\/vol13\/p1835-zhang.pdf\" target=\"_blank\">http:\/\/www.vldb.org\/pvldb\/vol13\/p1835-zhang.pdf<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('308','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Anna Fariha, Matteo Brucato, Peter J Haas, Alexandra Meliou<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('309','tp_abstract')\" style=\"cursor:pointer;\">SuDocu: Summarizing Documents by Example<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">Proc. VLDB Endow., <\/span><span class=\"tp_pub_additional_volume\">13 <\/span><span class=\"tp_pub_additional_number\">(12), <\/span><span class=\"tp_pub_additional_pages\">pp. 
2861\u20132864, <\/span><span class=\"tp_pub_additional_year\">2020<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_309\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('309','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_309\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('309','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_309\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('309','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_309\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{DBLP:journals\/pvldb\/FarihaBHM20,<br \/>\r\ntitle = {SuDocu: Summarizing Documents by Example},<br \/>\r\nauthor = {Anna Fariha and Matteo Brucato and Peter J Haas and Alexandra Meliou},<br \/>\r\nurl = {http:\/\/www.vldb.org\/pvldb\/vol13\/p2861-fariha.pdf},<br \/>\r\nyear  = {2020},<br \/>\r\ndate = {2020-01-01},<br \/>\r\njournal = {Proc. VLDB Endow.},<br \/>\r\nvolume = {13},<br \/>\r\nnumber = {12},<br \/>\r\npages = {2861--2864},<br \/>\r\nabstract = {Text document summarization refers to the task of producing a brief representation of a document for easy human consumption. Existing text summarization techniques mostly focus on generic summarization, but users often require personalized summarization that targets their specific preferences and needs. However, precisely expressing preferences is challenging, and current methods are often ambiguous, outside the user\u2019s control, or require costly training data. 
We propose a novel and effective way to express summarization intent (preferences) via examples: the user provides a few example summaries for a small number of documents in a collection, and the system summarizes the rest. We demonstrate SUDOCU, an example-based personalized DOCUment SUmmarization system. Through a simple interface, SUDOCU allows the users to provide example summaries, learns the summarization intent from the examples, and produces summaries for new documents that reflect the user\u2019s summarization intent. SUDOCU further explains the captured summarization intent in the form of a package query, an extension of a traditional SQL query that handles complex constraints and preferences over answer sets. SUDOCU combines topic modeling, semantic similarity discovery, and in-database optimization in a novel way to achieve example-driven document summarization.<br \/>\r\nWe demonstrate how SUDOCU can detect complex summarization intents from a few example summaries and produce accurate summaries for new documents effectively and efficiently.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('309','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_309\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Text document summarization refers to the task of producing a brief representation of a document for easy human consumption. Existing text summarization techniques mostly focus on generic summarization, but users often require personalized summarization that targets their specific preferences and needs. However, precisely expressing preferences is challenging, and current methods are often ambiguous, outside the user\u2019s control, or require costly training data. 
We propose a novel and effective way to express summarization intent (preferences) via examples: the user provides a few example summaries for a small number of documents in a collection, and the system summarizes the rest. We demonstrate SUDOCU, an example-based personalized DOCUment SUmmarization system. Through a simple interface, SUDOCU allows the users to provide example summaries, learns the summarization intent from the examples, and produces summaries for new documents that reflect the user\u2019s summarization intent. SUDOCU further explains the captured summarization intent in the form of a package query, an extension of a traditional SQL query that handles complex constraints and preferences over answer sets. SUDOCU combines topic modeling, semantic similarity discovery, and in-database optimization in a novel way to achieve example-driven document summarization.<br \/>\r\nWe demonstrate how SUDOCU can detect complex summarization intents from a few example summaries and produce accurate summaries for new documents effectively and efficiently.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('309','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_309\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"http:\/\/www.vldb.org\/pvldb\/vol13\/p2861-fariha.pdf\" title=\"http:\/\/www.vldb.org\/pvldb\/vol13\/p2861-fariha.pdf\" target=\"_blank\">http:\/\/www.vldb.org\/pvldb\/vol13\/p2861-fariha.pdf<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('309','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr>\r\n                    <td>\r\n                        <h3 class=\"tp_h3\" id=\"tp_h3_2019\">2019<\/h3>\r\n                    <\/td>\r\n                <\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p 
class=\"tp_pub_author\">Ahmed Elgohary, Matthias Boehm, Peter J Haas, Frederick R Reiss, Berthold Reinwald<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('276','tp_abstract')\" style=\"cursor:pointer;\">Compressed linear algebra for declarative large-scale machine learning<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">Commun. ACM, <\/span><span class=\"tp_pub_additional_volume\">62 <\/span><span class=\"tp_pub_additional_number\">(5), <\/span><span class=\"tp_pub_additional_pages\">pp. 83\u201391, <\/span><span class=\"tp_pub_additional_year\">2019<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_276\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('276','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_276\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('276','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_276\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('276','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_276\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{DBLP:journals\/cacm\/ElgoharyBHRR19,<br \/>\r\ntitle = {Compressed linear algebra for declarative large-scale machine learning},<br \/>\r\nauthor = {Ahmed Elgohary and Matthias Boehm and Peter J Haas and Frederick R Reiss and Berthold Reinwald},<br \/>\r\nurl = {https:\/\/doi.org\/10.1145\/3318221},<br \/>\r\ndoi = {10.1145\/3318221},<br \/>\r\nyear  = {2019},<br \/>\r\ndate = {2019-01-01},<br \/>\r\njournal = {Commun. 
ACM},<br \/>\r\nvolume = {62},<br \/>\r\nnumber = {5},<br \/>\r\npages = {83--91},<br \/>\r\nabstract = {Large-scale Machine Learning (ML) algorithms are often iterative, using repeated read-only data access and I\/O-bound matrix-vector multiplications. Hence, it is crucial for performance to fit the data into single-node or distributed main memory to enable fast matrix-vector operations. General-purpose compression struggles to achieve both good compression ratios and fast decompression for blockwise uncompressed operations. Therefore, we introduce Compressed Linear Algebra (CLA) for lossless matrix compression. CLA encodes matrices with lightweight, value-based compression techniques and executes linear algebra operations directly on the compressed representations. We contribute effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm. Our experiments show good compression ratios and operations performance close to the uncompressed case, which enables fitting larger datasets into available memory. We thereby obtain significant end-to-end performance improvements.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('276','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_276\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Large-scale Machine Learning (ML) algorithms are often iterative, using repeated read-only data access and I\/O-bound matrix-vector multiplications. Hence, it is crucial for performance to fit the data into single-node or distributed main memory to enable fast matrix-vector operations. General-purpose compression struggles to achieve both good compression ratios and fast decompression for blockwise uncompressed operations. 
Therefore, we introduce Compressed Linear Algebra (CLA) for lossless matrix compression. CLA encodes matrices with lightweight, value-based compression techniques and executes linear algebra operations directly on the compressed representations. We contribute effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm. Our experiments show good compression ratios and operations performance close to the uncompressed case, which enables fitting larger datasets into available memory. We thereby obtain significant end-to-end performance improvements.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('276','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_276\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/doi.org\/10.1145\/3318221\" title=\"https:\/\/doi.org\/10.1145\/3318221\" target=\"_blank\">https:\/\/doi.org\/10.1145\/3318221<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1145\/3318221\" title=\"Follow DOI:10.1145\/3318221\" target=\"_blank\">doi:10.1145\/3318221<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('276','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Yanlei Diao, Pawel Guzewicz, Ioana Manolescu, Mirjana Mazuran<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('277','tp_abstract')\" style=\"cursor:pointer;\">Spade: A Modular Framework for Analytical Exploration of RDF Graphs<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">PVLDB, <\/span><span 
class=\"tp_pub_additional_volume\">12 <\/span><span class=\"tp_pub_additional_number\">(12), <\/span><span class=\"tp_pub_additional_pages\">pp. 1926\u20131929, <\/span><span class=\"tp_pub_additional_year\">2019<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_277\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('277','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_277\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('277','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_277\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('277','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_277\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{DBLP:journals\/pvldb\/DiaoGMM19,<br \/>\r\ntitle = {Spade: A Modular Framework for Analytical Exploration of RDF Graphs},<br \/>\r\nauthor = {Yanlei Diao and Pawel Guzewicz and Ioana Manolescu and Mirjana Mazuran},<br \/>\r\nurl = {http:\/\/www.vldb.org\/pvldb\/vol12\/p1926-diao.pdf<br \/>\r\nhttps:\/\/hal.inria.fr\/hal-02152844\/document},<br \/>\r\ndoi = {10.14778\/3352063.3352101},<br \/>\r\nyear  = {2019},<br \/>\r\ndate = {2019-01-01},<br \/>\r\njournal = {PVLDB},<br \/>\r\nvolume = {12},<br \/>\r\nnumber = {12},<br \/>\r\npages = {1926--1929},<br \/>\r\nabstract = {RDF data is complex; exploring it is hard, and can be done through many different metaphors. We have developed and propose to demonstrate Spade, a tool helping users discover meaningful content of an RDF graph by showing them the results of aggregation (OLAP-style) queries automatically identified from the data. 
Spade chooses aggregates that are visually interesting, a property formally based on statistic properties of the aggregation query results. While well understood for relational data, such exploration raises multiple challenges for RDF: facts, dimensions and measures have to be identified (as opposed to known beforehand); as there are more candidate aggregates, assessing their interestingness can be very costly; finally, ontologies bring novel specific challenges but also novel opportunities, enabling ontology-driven exploration from an aggregate initially proposed by the system. Spade is a generic, extensible framework, which we instantiated with: (i) novel methods for enumerating candidate measures and dimensions in the vast space of possibilities provided by an RDF graph; (ii) a set of aggregate interestingness functions; (iii) ontology-based interactive exploration, and (iv) efficient early-stop techniques for estimating the interestingness of an aggregate query. The demonstration will comprise interactive scenarios on a variety of large, interesting RDF graphs.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('277','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_277\" style=\"display:none;\"><div class=\"tp_abstract_entry\">RDF data is complex; exploring it is hard, and can be done through many different metaphors. We have developed and propose to demonstrate Spade, a tool helping users discover meaningful content of an RDF graph by showing them the results of aggregation (OLAP-style) queries automatically identified from the data. Spade chooses aggregates that are visually interesting, a property formally based on statistic properties of the aggregation query results. 
While well understood for relational data, such exploration raises multiple challenges for RDF: facts, dimensions and measures have to be identified (as opposed to known beforehand); as there are more candidate aggregates, assessing their interestingness can be very costly; finally, ontologies bring novel specific challenges but also novel opportunities, enabling ontology-driven exploration from an aggregate initially proposed by the system. Spade is a generic, extensible framework, which we instantiated with: (i) novel methods for enumerating candidate measures and dimensions in the vast space of possibilities provided by an RDF graph; (ii) a set of aggregate interestingness functions; (iii) ontology-based interactive exploration, and (iv) efficient early-stop techniques for estimating the interestingness of an aggregate query. The demonstration will comprise interactive scenarios on a variety of large, interesting RDF graphs.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('277','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_277\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"http:\/\/www.vldb.org\/pvldb\/vol12\/p1926-diao.pdf\" title=\"http:\/\/www.vldb.org\/pvldb\/vol12\/p1926-diao.pdf\" target=\"_blank\">http:\/\/www.vldb.org\/pvldb\/vol12\/p1926-diao.pdf<\/a><\/li><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/hal.inria.fr\/hal-02152844\/document\" title=\"https:\/\/hal.inria.fr\/hal-02152844\/document\" target=\"_blank\">https:\/\/hal.inria.fr\/hal-02152844\/document<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.14778\/3352063.3352101\" title=\"Follow DOI:10.14778\/3352063.3352101\" target=\"_blank\">doi:10.14778\/3352063.3352101<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" 
onclick=\"teachpress_pub_showhide('277','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Khaled Zaouk, Fei Song, Chenghao Lyu, Arnab Sinha, Yanlei Diao, Prashant J Shenoy<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('278','tp_abstract')\" style=\"cursor:pointer;\">UDAO: A Next-Generation Unified Data Analytics Optimizer<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">PVLDB, <\/span><span class=\"tp_pub_additional_volume\">12 <\/span><span class=\"tp_pub_additional_number\">(12), <\/span><span class=\"tp_pub_additional_pages\">pp. 1934\u20131937, <\/span><span class=\"tp_pub_additional_year\">2019<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_278\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('278','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_278\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('278','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_278\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('278','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_278\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{DBLP:journals\/pvldb\/ZaoukSLSDS19,<br \/>\r\ntitle = {UDAO: A Next-Generation Unified Data Analytics Optimizer},<br \/>\r\nauthor = {Khaled Zaouk and Fei Song and Chenghao Lyu and Arnab Sinha and Yanlei Diao and Prashant J Shenoy},<br \/>\r\nurl = {http:\/\/www.vldb.org\/pvldb\/vol12\/p1934-zaouk.pdf},<br \/>\r\ndoi = {10.14778\/3352063.3352103},<br \/>\r\nyear  = 
{2019},<br \/>\r\ndate = {2019-01-01},<br \/>\r\njournal = {PVLDB},<br \/>\r\nvolume = {12},<br \/>\r\nnumber = {12},<br \/>\r\npages = {1934--1937},<br \/>\r\nabstract = {Big data analytics systems today still lack the ability to take user performance goals and budgetary constraints, collectively referred to as \u201cobjectives\u201d, and automatically configure an analytic job to achieve the objectives. This paper presents UDAO, a unified data analytics optimizer that can automatically determine the parameters of the runtime system, collectively called a job configuration, for general dataflow programs based on user objectives. UDAO embodies key techniques including in-situ modeling, which learns a model for each user objective in the same computing environment as the job is run, and multi-objective optimization, which computes a Pareto optimal set of job configurations to reveal tradeoffs between different objectives. Using benchmarks developed based on industry needs, our demonstration will allow the user to explore (1) learned models to gain insights into how various parameters affect user objectives; (2) Pareto frontiers to understand interesting tradeoffs between different objectives and how a configuration recommended by the optimizer explores these tradeoffs; (3) end-to-end benefits that UDAO can provide over default configurations or those manually tuned by engineers.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('278','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_278\" style=\"display:none;\"><div class=\"tp_abstract_entry\">Big data analytics systems today still lack the ability to take user performance goals and budgetary constraints, collectively referred to as \u201cobjectives\u201d, and automatically configure an analytic job to achieve the objectives. 
This paper presents UDAO, a unified data analytics optimizer that can automatically determine the parameters of the runtime system, collectively called a job configuration, for general dataflow programs based on user objectives. UDAO embodies key techniques including in-situ modeling, which learns a model for each user objective in the same computing environment as the job is run, and multi-objective optimization, which computes a Pareto optimal set of job configurations to reveal tradeoffs between different objectives. Using benchmarks developed based on industry needs, our demonstration will allow the user to explore (1) learned models to gain insights into how various parameters affect user objectives; (2) Pareto frontiers to understand interesting tradeoffs between different objectives and how a configuration recommended by the optimizer explores these tradeoffs; (3) end-to-end benefits that UDAO can provide over default configurations or those manually tuned by engineers.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('278','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_278\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"http:\/\/www.vldb.org\/pvldb\/vol12\/p1934-zaouk.pdf\" title=\"http:\/\/www.vldb.org\/pvldb\/vol12\/p1934-zaouk.pdf\" target=\"_blank\">http:\/\/www.vldb.org\/pvldb\/vol12\/p1934-zaouk.pdf<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.14778\/3352063.3352103\" title=\"Follow DOI:10.14778\/3352063.3352103\" target=\"_blank\">doi:10.14778\/3352063.3352103<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('278','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Ios 
Kotsogiannis, Yuchao Tao, Xi He, Maryam Fanaeepour, Ashwin Machanavajjhala, Michael Hay, Gerome Miklau<\/p><p class=\"tp_pub_title\">PrivateSQL: A Differentially Private SQL Query Engine <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">PVLDB, <\/span><span class=\"tp_pub_additional_volume\">12 <\/span><span class=\"tp_pub_additional_number\">(11), <\/span><span class=\"tp_pub_additional_pages\">pp. 1371\u20131384, <\/span><span class=\"tp_pub_additional_year\">2019<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_resource_link\"><a id=\"tp_links_sh_282\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('282','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_282\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('282','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_282\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{DBLP:journals\/pvldb\/KotsogiannisTHF19,<br \/>\r\ntitle = {PrivateSQL: A Differentially Private SQL Query Engine},<br \/>\r\nauthor = {Ios Kotsogiannis and Yuchao Tao and Xi He and Maryam Fanaeepour and Ashwin Machanavajjhala and Michael Hay and Gerome Miklau},<br \/>\r\nurl = {http:\/\/www.vldb.org\/pvldb\/vol12\/p1371-kotsogiannis.pdf},<br \/>\r\ndoi = {10.14778\/3342263.3342274},<br \/>\r\nyear  = {2019},<br \/>\r\ndate = {2019-01-01},<br \/>\r\njournal = {PVLDB},<br \/>\r\nvolume = {12},<br \/>\r\nnumber = {11},<br \/>\r\npages = {1371--1384},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('282','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_282\" 
style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"http:\/\/www.vldb.org\/pvldb\/vol12\/p1371-kotsogiannis.pdf\" title=\"http:\/\/www.vldb.org\/pvldb\/vol12\/p1371-kotsogiannis.pdf\" target=\"_blank\">http:\/\/www.vldb.org\/pvldb\/vol12\/p1371-kotsogiannis.pdf<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.14778\/3342263.3342274\" title=\"Follow DOI:10.14778\/3342263.3342274\" target=\"_blank\">doi:10.14778\/3342263.3342274<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('282','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Dan Zhang, Ryan McKenna, Ios Kotsogiannis, George Bissias, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau<\/p><p class=\"tp_pub_title\">Ektelo: A Framework for Defining Differentially-Private Computations <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">SIGMOD Record, <\/span><span class=\"tp_pub_additional_volume\">48 <\/span><span class=\"tp_pub_additional_number\">(1), <\/span><span class=\"tp_pub_additional_pages\">pp. 
15\u201322, <\/span><span class=\"tp_pub_additional_year\">2019<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_resource_link\"><a id=\"tp_links_sh_283\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('283','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_283\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('283','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_283\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{DBLP:journals\/sigmod\/ZhangMKBHMM19,<br \/>\r\ntitle = {Ektelo: A Framework for Defining Differentially-Private Computations},<br \/>\r\nauthor = {Dan Zhang and Ryan McKenna and Ios Kotsogiannis and George Bissias and Michael Hay and Ashwin Machanavajjhala and Gerome Miklau},<br \/>\r\nurl = {https:\/\/doi.org\/10.1145\/3371316.3371321},<br \/>\r\ndoi = {10.1145\/3371316.3371321},<br \/>\r\nyear  = {2019},<br \/>\r\ndate = {2019-01-01},<br \/>\r\njournal = {SIGMOD Record},<br \/>\r\nvolume = {48},<br \/>\r\nnumber = {1},<br \/>\r\npages = {15--22},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('283','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_283\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/doi.org\/10.1145\/3371316.3371321\" title=\"https:\/\/doi.org\/10.1145\/3371316.3371321\" target=\"_blank\">https:\/\/doi.org\/10.1145\/3371316.3371321<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1145\/3371316.3371321\" title=\"Follow DOI:10.1145\/3371316.3371321\" 
target=\"_blank\">doi:10.1145\/3371316.3371321<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('283','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><tr class=\"tp_publication tp_publication_article\"><td class=\"tp_pub_info\"><p class=\"tp_pub_author\">Brian Hentschel, Peter J Haas, Yuanyuan Tian<\/p><p class=\"tp_pub_title\"><a class=\"tp_title_link\" onclick=\"teachpress_pub_showhide('279','tp_abstract')\" style=\"cursor:pointer;\">Online Model Management via Temporally Biased Sampling<\/a> <span class=\"tp_pub_type article\">Journal Article<\/span> <\/p><p class=\"tp_pub_additional\"><span class=\"tp_pub_additional_journal\">SIGMOD Record, <\/span><span class=\"tp_pub_additional_volume\">48 <\/span><span class=\"tp_pub_additional_number\">(1), <\/span><span class=\"tp_pub_additional_pages\">pp. 69\u201376, <\/span><span class=\"tp_pub_additional_year\">2019<\/span>.<\/p><p class=\"tp_pub_tags\"><span class=\"tp_abstract_link\"><a id=\"tp_abstract_sh_279\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('279','tp_abstract')\" title=\"Show abstract\" style=\"cursor:pointer;\">Abstract<\/a><\/span> | <span class=\"tp_resource_link\"><a id=\"tp_links_sh_279\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('279','tp_links')\" title=\"Show links and resources\" style=\"cursor:pointer;\">Links<\/a><\/span> | <span class=\"tp_bibtex_link\"><a id=\"tp_bibtex_sh_279\" class=\"tp_show\" onclick=\"teachpress_pub_showhide('279','tp_bibtex')\" title=\"Show BibTeX entry\" style=\"cursor:pointer;\">BibTeX<\/a><\/span><\/p><div class=\"tp_bibtex\" id=\"tp_bibtex_279\" style=\"display:none;\"><div class=\"tp_bibtex_entry\">@article{DBLP:journals\/sigmod\/HentschelHT19,<br \/>\r\ntitle = {Online Model Management via Temporally Biased Sampling},<br \/>\r\nauthor = {Brian Hentschel and Peter J Haas and Yuanyuan Tian},<br \/>\r\nurl = {https:\/\/doi.org\/10.1145\/3371316.3371333<br 
\/>\r\nhttps:\/\/people.cs.umass.edu\/~phaas\/files\/tbs-sigmod-record.pdf},<br \/>\r\ndoi = {10.1145\/3371316.3371333},<br \/>\r\nyear  = {2019},<br \/>\r\ndate = {2019-01-01},<br \/>\r\njournal = {SIGMOD Record},<br \/>\r\nvolume = {48},<br \/>\r\nnumber = {1},<br \/>\r\npages = {69--76},<br \/>\r\nabstract = {To maintain the accuracy of supervised learning models in the presence of evolving data streams, we provide temporally-biased sampling schemes that weight recent data most heavily, with inclusion probabilities for a given data item decaying exponentially over time. We then periodically retrain the models on the current sample. We provide and analyze both a simple sampling scheme (T-TBS) that probabilistically maintains a target sample size and a novel reservoir-based scheme (R-TBS) that is the first to provide both control over the decay rate and a guaranteed upper bound on the sample size. The R-TBS and T-TBS schemes are of independent interest, extending the known set of unequal-probability sampling schemes. We discuss distributed implementation strategies; experiments in Spark show that our approach can increase machine learning accuracy and robustness in the face of evolving data.},<br \/>\r\nkeywords = {},<br \/>\r\npubstate = {published},<br \/>\r\ntppubtype = {article}<br \/>\r\n}<br \/>\r\n<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('279','tp_bibtex')\">Close<\/a><\/p><\/div><div class=\"tp_abstract\" id=\"tp_abstract_279\" style=\"display:none;\"><div class=\"tp_abstract_entry\">To maintain the accuracy of supervised learning models in the presence of evolving data streams, we provide temporally-biased sampling schemes that weight recent data most heavily, with inclusion probabilities for a given data item decaying exponentially over time. We then periodically retrain the models on the current sample. 
We provide and analyze both a simple sampling scheme (T-TBS) that probabilistically maintains a target sample size and a novel reservoir-based scheme (R-TBS) that is the first to provide both control over the decay rate and a guaranteed upper bound on the sample size. The R-TBS and T-TBS schemes are of independent interest, extending the known set of unequal-probability sampling schemes. We discuss distributed implementation strategies; experiments in Spark show that our approach can increase machine learning accuracy and robustness in the face of evolving data.<\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('279','tp_abstract')\">Close<\/a><\/p><\/div><div class=\"tp_links\" id=\"tp_links_279\" style=\"display:none;\"><div class=\"tp_links_entry\"><ul class=\"tp_pub_list\"><li><i class=\"fas fa-globe\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/doi.org\/10.1145\/3371316.3371333\" title=\"https:\/\/doi.org\/10.1145\/3371316.3371333\" target=\"_blank\">https:\/\/doi.org\/10.1145\/3371316.3371333<\/a><\/li><li><i class=\"fas fa-file-pdf\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/people.cs.umass.edu\/~phaas\/files\/tbs-sigmod-record.pdf\" title=\"https:\/\/people.cs.umass.edu\/~phaas\/files\/tbs-sigmod-record.pdf\" target=\"_blank\">https:\/\/people.cs.umass.edu\/~phaas\/files\/tbs-sigmod-record.pdf<\/a><\/li><li><i class=\"ai ai-doi\"><\/i><a class=\"tp_pub_list\" href=\"https:\/\/dx.doi.org\/10.1145\/3371316.3371333\" title=\"Follow DOI:10.1145\/3371316.3371333\" target=\"_blank\">doi:10.1145\/3371316.3371333<\/a><\/li><\/ul><\/div><p class=\"tp_close_menu\"><a class=\"tp_close\" onclick=\"teachpress_pub_showhide('279','tp_links')\">Close<\/a><\/p><\/div><\/td><\/tr><\/table><div class=\"tablenav\"><div class=\"tablenav-pages\"><span class=\"displaying-num\">249 entries<\/span> <a class=\"page-numbers button disabled\">&laquo;<\/a> <a class=\"page-numbers button disabled\">&lsaquo;<\/a>  1 of 5 <a 
href=\"https:\/\/dream.cs.umass.edu\/?page_id=92&amp;limit=2&amp;tgid=&amp;yr=&amp;type=&amp;usr=&amp;auth=&amp;tsr=#tppubs\" title=\"next page\" class=\"page-numbers button\">&rsaquo;<\/a> <a href=\"https:\/\/dream.cs.umass.edu\/?page_id=92&amp;limit=5&amp;tgid=&amp;yr=&amp;type=&amp;usr=&amp;auth=&amp;tsr=#tppubs\" title=\"last page\" class=\"page-numbers button\">&raquo;<\/a> <\/div><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":15,"comment_status":"closed","ping_status":"closed","template":"Templates\/nosidebar-template.php","meta":[],"_links":{"self":[{"href":"https:\/\/dream.cs.umass.edu\/index.php?rest_route=\/wp\/v2\/pages\/92"}],"collection":[{"href":"https:\/\/dream.cs.umass.edu\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/dream.cs.umass.edu\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/dream.cs.umass.edu\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dream.cs.umass.edu\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=92"}],"version-history":[{"count":26,"href":"https:\/\/dream.cs.umass.edu\/index.php?rest_route=\/wp\/v2\/pages\/92\/revisions"}],"predecessor-version":[{"id":1335,"href":"https:\/\/dream.cs.umass.edu\/index.php?rest_route=\/wp\/v2\/pages\/92\/revisions\/1335"}],"wp:attachment":[{"href":"https:\/\/dream.cs.umass.edu\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=92"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}