Computational intractability as an ally and adversary in genomic privacy, analysis, and engineering

学友会セミナー

開催情報

開催日時	2020年6月25日（木）10:00～11:00
開催場所	Zoomにて開催（参加希望者は、世話人（imoto@ims.u-tokyo.ac.jp）に連絡下さい）
講師	Robert Daniel Barish
所属・職名	東京大学医科学研究所ヒトゲノム解析センター医療データ情報学分野・特任研究員
国名	日本
演題	Computational intractability as an ally and adversary in genomic privacy, analysis, and engineering
世話人	主たる世話人：井元清哉（健康医療インテリジェンス分野）世話人：渋谷哲朗（医療データ情報学分野）

概要

The foundational assumption of computational complexity theory and cryptography is thatsome problems are "harder" than others, in the sense that the time or space resources (e.g.CPU cycles, memory, or kilowatt-hours) required to solve them scales asymptotically faster asa function of their size in the average or worst case. When these required resources "blow up" exponentially or super-exponentially with only linearly or polynomially increasing problemsizes, one typically refers to such problems as being intractable. In this seminar, focusing onapplications to genomic medicine, I will discuss my current and past research concerning, inone direction, the exploitation of computational intractability, and in the opposite direction,the development of tools and techniques to mitigate or otherwise circumvent walls ofcomputational intractability when forced to solve NP-hard decision problems or #P-hardcounting problems (i.e. problems as hard as any on the polynomial hierarchy generalizingcomplexity classes P and NP).
Towards exploiting computational intractability, I will discuss my current research on anovel hierarchical block transposition scheme for the privacy preserving release of slowlyshuffled genomic and clinical longitudinal data, the latter being the type of data oneencounters in monitoring patient clinical trajectories or in pathogen contact tracing studies.More specifically, letting D ∊ D^n be a database modeled as a vector of n entries or recordsand adopting the notion of differential privacy due to Dwork et. al. (Dwork et. al.; Proc. TCC3876; 2006), I will show how extensions of the work of Diaconis & Shahshahani (Diaconis &Shahshahani; Z. Wahr. Verw. Geb. 57; 1981) on card shuffling, as well as direct mixing timecalculations, can be used to rigorously establish the intractability of guessing the existence ofan individual's records in D at the cost of adding a quantifiable amount of noise to usefulpopulation statistics and properties (e.g. haplotype block clusterings (Gabriel et. al.; Science296 (5576); 2002)).
In terms of mitigating or otherwise managing computational hardness, I will discuss myresearch on a problem instance classification approach using graph classes --- which I'veshown can be largely automated using Courcelle's meta-theorem --- to obtain a fine-grainedmap of the often sharp (and at times surprising) boundaries between tractability andintractability for instances of omics-related cycle and path counting problems that areprovably #P-complete in the general case. In particular, I will discuss my analysis (Barish &Suyama; LNCS 10307; 2017) of the complexity of counting and enumerating pathways inmetabolic networks denoted Elementary Flux Modes (EFMs) and Extreme Pathways (EPs), aswell as methods of minimally perturbing the topology of metabolic networks to allow for suchcounting and enumeration problems to be solved in polynomial time via dynamicprogramming. I will also discuss how limited instances of these NP-hard and #P-hardproblems can be directly solved when problem instance classification approaches fail.
Pertaining the same theme, I will discuss older work of mine (Barish et. al.; Nano Letters 5(12); 2005), (Barish & Schulman et. al.; PNAS 106(15); 2009) on finding practical solutionsto the NP-hard (or PSPACE-hard depending on its formulation) problem of creating compactrule sets for molecular implementations of 1D cellular automata (Rothemund et. al.; PLoSBiology 2(12); 2004) whose space-time histories correspond to 2D lattices DNA withWatson-Crick addressable nucleic acids at sets of target coordinates separated by distances of ≈5 nanometers to ≈10 micrometers. This work has allowed for the positioning and orientingof carbon nanotubes (Maune, Han, & Barish et. al.; Nat. Nanotechnol. 5(1); 2010), and I willelaborate on further applications to the combinatoric assembly of genomic sequences beyondwhat is generally possible with split-pool synthesis.