[95th Anniversary Lecture Series] A Diffusion Perspective of Manifold Clustering

Date: 2020-07-29

Guanghua Lecture Series: Forum of Prominent Figures and Entrepreneurs, No. 5836

(Online lecture)

Topic: A Diffusion Perspective of Manifold Clustering

Speaker: Xiaohui Chen, Associate Professor, University of Illinois at Urbana-Champaign

Host: Professor Jinyuan Chang, School of Statistics

Time: Friday, July 31, 2020, 10:00-11:20

Platform: Zoom, meeting ID 367 123 8320

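Organizers: Center of Statistical Research; Joint Laboratory of Data Science and Business Intelligence; School of Statistics; Office of Scientific Research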

About the speaker:

Xiaohui Chen received a Ph.D. in Electrical and Computer Engineering in 2013 from the University of British Columbia (UBC), Vancouver, Canada. He was a postdoctoral fellow at the Toyota Technological Institute at Chicago (TTIC), a philanthropically endowed academic computer science institute located on the University of Chicago campus. In 2013 he joined the University of Illinois at Urbana-Champaign (UIUC) as an Assistant Professor of Statistics. He has been an Associate Professor of Statistics at UIUC since 2019 and a member of the Discovery Partners Institute (DPI) since 2020. In 2019-2020 he held a Visiting Faculty position in the Institute for Data, Systems, and Society (IDSS) at the Massachusetts Institute of Technology (MIT). His awards include an NSF CAREER Award (2018), an Arnold O. Beckman Award at UIUC (2018), an Outstanding Young Researcher Award from the International Chinese Statistical Association (ICSA) (2019), an Associate appointment in the Center for Advanced Study at UIUC (2020-2021), and a Simons Fellowship in Mathematics from the Simons Foundation (2020-2021). His teaching has been recognized three times on the University of Illinois List of Teachers Ranked as Excellent by Their Students.


Abstract:

We introduce the diffusion K-means clustering method on Riemannian submanifolds, which maximizes within-cluster connectedness measured by the diffusion distance. Diffusion K-means constructs a random walk on a similarity graph whose vertices are data points randomly sampled from the manifolds and whose edge weights are similarities given by a kernel that captures the local geometry of the manifolds. Diffusion K-means is a multi-scale clustering tool suited to data with non-linear and non-Euclidean geometric features in mixed dimensions. Given the number of clusters, we propose a polynomial-time convex relaxation algorithm via semidefinite programming (SDP) to solve diffusion K-means. We also propose a nuclear norm regularized SDP that is adaptive to the number of clusters. In both cases, we show that the SDPs for diffusion K-means achieve exact recovery under suitable between-cluster separability and within-cluster connectedness of the submanifolds, two conditions that together quantify the hardness of the manifold clustering problem. We further propose localized diffusion K-means, which uses a locally adaptive bandwidth estimated from nearest neighbors. We show that exact recovery of localized diffusion K-means is fully adaptive to the local probability density and geometric structures of the underlying submanifolds. Joint work with Yun Yang (UIUC).
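The graph-and-random-walk construction described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the speakers' implementation: a Gaussian kernel whose per-point bandwidth is the distance to the k-th nearest neighbor stands in for the talk's localized bandwidth (one common self-tuning choice), the diffusion distance follows the standard Coifman-Lafon definition, and all function names (`local_bandwidths`, `kernel_matrix`, `diffusion_distances`, `kmeans_objective`) and parameters (`k`, `t`) are hypothetical. The sketch only evaluates the diffusion K-means cost of a given partition; it does not solve the SDP relaxation.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical helpers: the kernel and bandwidth choices are assumptions,
# not the exact construction used in the talk.
Point = Sequence[float]
Matrix = List[List[float]]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_bandwidths(points: Sequence[Point], k: int) -> List[float]:
    """Assumed local bandwidth: distance from each point to its k-th nearest neighbor.
    Duplicate points would give a zero bandwidth; that case is not handled here."""
    bandwidths = []
    for i, p in enumerate(points):
        dists = sorted(math.sqrt(sq_dist(p, q)) for j, q in enumerate(points) if j != i)
        bandwidths.append(dists[k - 1])
    return bandwidths


def kernel_matrix(points: Sequence[Point], bandwidths: List[float]) -> Matrix:
    """Symmetric Gaussian affinity W_ij = exp(-|x_i - x_j|^2 / (s_i * s_j))."""
    n = len(points)
    return [[math.exp(-sq_dist(points[i], points[j]) / (bandwidths[i] * bandwidths[j]))
             for j in range(n)] for i in range(n)]


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product A @ B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def diffusion_distances(W: Matrix, t: int) -> Matrix:
    """Diffusion distance D_t(i, j)^2 = sum_k (P^t[i][k] - P^t[j][k])^2 / pi_k,
    where P = D^{-1} W is the random-walk transition matrix and pi is its
    stationary distribution (pi_k = degree_k / total degree, valid for symmetric W)."""
    n = len(W)
    degrees = [sum(row) for row in W]
    total = sum(degrees)
    pi = [d / total for d in degrees]
    P = [[w / deg for w in row] for row, deg in zip(W, degrees)]
    Pt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0 = I
    for _ in range(t):
        Pt = mat_mul(Pt, P)  # after the loop, Pt = P^t
    return [[math.sqrt(sum((Pt[i][k] - Pt[j][k]) ** 2 / pi[k] for k in range(n)))
             for j in range(n)] for i in range(n)]


def kmeans_objective(D: Matrix, labels: Sequence[int]) -> float:
    """K-means cost of a partition from pairwise (Euclidean-type) distances:
    sum over clusters G of (1 / (2|G|)) * sum_{i, j in G} D_ij^2."""
    clusters: Dict[int, List[int]] = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    return sum(sum(D[i][j] ** 2 for i in g for j in g) / (2 * len(g))
               for g in clusters.values())


if __name__ == "__main__":
    # Two well-separated groups of three points on a line.
    pts = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 0.0), (5.1, 0.0), (5.2, 0.0)]
    W = kernel_matrix(pts, local_bandwidths(pts, k=2))
    D = diffusion_distances(W, t=3)
    print(kmeans_objective(D, [0, 0, 0, 1, 1, 1]), kmeans_objective(D, [0, 1, 0, 1, 0, 1]))
```

Because the diffusion distance is a Euclidean distance between the rows of P^t rescaled by the square root of pi, the pairwise form of the K-means cost applies to it directly. Increasing `t` lets the random walk mix over longer ranges, which is one way to read the abstract's description of diffusion K-means as a multi-scale tool.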

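The exact SDP formulations used in the talk are not given on this page. As an assumption, they can be pictured as the standard K-means SDP of Peng and Wei applied to an n-by-n affinity matrix A built from the diffusion process, for example A_ij = -D_t(x_i, x_j)^2 / 2, which makes <A, Z*> equal to minus the diffusion K-means cost sum_k (2|G_k|)^{-1} sum_{i,j in G_k} D_t(x_i, x_j)^2. The penalty weight lambda in the second program is a hypothetical tuning parameter; the talk's actual programs and recovery conditions may differ.

```latex
% Known K: convex relaxation over n x n matrices Z (assumed Peng-Wei form)
\hat Z \in \operatorname*{arg\,max}_{Z \in \mathbb{R}^{n\times n}} \ \langle A, Z \rangle
\quad \text{s.t.} \quad Z \succeq 0,\; Z \ge 0,\; Z\mathbf{1}_n = \mathbf{1}_n,\; \operatorname{tr}(Z) = K.

% Unknown K: drop the trace constraint and penalize the nuclear norm,
% which equals tr(Z) on the positive semidefinite cone
\hat Z_\lambda \in \operatorname*{arg\,max}_{Z \in \mathbb{R}^{n\times n}} \ \langle A, Z \rangle - \lambda \|Z\|_*
\quad \text{s.t.} \quad Z \succeq 0,\; Z \ge 0,\; Z\mathbf{1}_n = \mathbf{1}_n.

% Exact recovery: the solution equals the normalized membership matrix of the true clusters G_1, ..., G_K
Z^*_{ij} = \begin{cases} 1/|G_k| & \text{if } i, j \in G_k, \\ 0 & \text{if } i \text{ and } j \text{ lie in different clusters.} \end{cases}
```

Z* is feasible for both programs: it is positive semidefinite and nonnegative, each row sums to one, and its trace is sum_k |G_k| / |G_k| = K.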
