Ricardo Baeza-Yates
NTENT & Northeastern University at SV, USA
Big Data or Right Data? Opportunities and Challenges
Monday December 10, 09:00 – 10:00 | Ada Lovelace Auditorium, IST Dept.
Big data is nowadays a fashionable topic, regardless of what people mean when they use the term. But being big is just a matter of volume, and there is no clear agreement on the size threshold. On the other hand, it is easy to capture large amounts of data using a brute-force approach. So the real goal should not be big data but to ask ourselves, for a given problem, what the right data is and how much of it is needed. For some problems this would imply big data, but for most problems much less data is needed. Hence, in this presentation, we cover the opportunities and the challenges behind big data. Regarding the challenges, we explore the trade-offs involved in the main problems that arise with big data: scalability, redundancy, bias, the filter bubble and privacy.
Ricardo Baeza-Yates's areas of expertise are information retrieval, web search and data mining, data science and algorithms. He has been a Professor at Northeastern University, Silicon Valley campus, since August 2017. He has also been CTO of NTENT, a semantic search technology company based in California, since June 2016. Before that, he was VP of Research at Yahoo Labs, based in Sunnyvale, California, from August 2014 to February 2016. Earlier, from 2006 to 2015, he founded and led the Yahoo Labs in Barcelona and Santiago de Chile. Between 2008 and 2012 he also oversaw Yahoo Labs in Haifa, Israel, and started the London lab in 2012. He is a part-time Professor at the Dept. of Information and Communication Technologies (DTIC) of the Universitat Pompeu Fabra (UPF), in Barcelona, Spain, as well as at the Dept. of Computing Science (DCC) of Universidad de Chile in Santiago. During 2005, he was an ICREA research professor at UPF. Until 2004 he was Professor and founding director of the Center for Web Research at Universidad de Chile. He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989. Before that, he obtained two master's degrees (M.Sc. CS & M.Eng. EE) and the electronics engineer degree from the University of Chile in Santiago. He is co-author of the best-selling textbook Modern Information Retrieval, published in 1999 by Addison-Wesley with a second enlarged edition in 2011, which won the ASIST 2012 Book of the Year award. He is also co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 600 other publications. Within ACM he was the Chilean site director of the regional ACM Programming Contest and a member of the South American steering committee from 1998 to 2005. Later he was a member of the ACM Publications Board from 2007 to 2009 and of the ACM European Council from 2010 to 2014. Finally, he was elected to the ACM Council from July 2012 to June 2016. From 2002 to 2004 he was elected to the board of governors of the IEEE Computer Society. He has received the Organization of American States award for young researchers in exact sciences (1993), the Graham Medal for innovation in computing given by the University of Waterloo to distinguished alumni (2007), the CLEI Latin American distinction for contributions to CS in the region (2009), and the National Award of the Chilean Association of Engineers (2010), among other distinctions. In 2003, he was the first computer scientist to be elected to the Chilean Academy of Sciences, and since 2010 he has been a founding member of the Chilean Academy of Engineering. In 2009, he was named ACM Fellow and in 2011 IEEE Fellow.
C. Chandra Sekhar
Indian Institute of Technology Madras, India
Deep Learning Models for Image Processing Tasks
Monday December 10, 10:30 – 12:30 | Ada Lovelace Auditorium, IST Dept.
Shallow learning models based on conventional machine learning techniques for pattern classification, such as Gaussian mixture models, multilayer feedforward neural networks and support vector machines, use hand-picked features as input. Recently, several deep learning models have been explored for learning a suitable representation from image data and then using the learnt representation to perform image pattern analysis tasks such as image classification, annotation and captioning. In this talk, we present deep learning models such as the stacked autoencoder, the deep convolutional neural network and the stacked restricted Boltzmann machine for learning a suitable representation from image data. Then, we present deep-learning-based approaches to image classification, image annotation and image captioning.
Prof. C. Chandra Sekhar received his B.Tech. degree in Electronics and Communication Engineering from Sri Venkateswara University, Tirupati, India, in 1984. He received his M.Tech. degree in Electrical Engineering and Ph.D. degree in Computer Science and Engineering from Indian Institute of Technology (IIT) Madras in 1986 and 1997, respectively. He has been a Professor in the Department of Computer Science and Engineering at IIT Madras since 2010. He was a Japan Society for the Promotion of Science (JSPS) post-doctoral fellow at the Center for Integrated Acoustic Information Research, Nagoya University, Nagoya, Japan, from May 2000 to May 2002. Prof. Chandra Sekhar received the “Srimathi Marti Annapurna Gurunath Award for Excellence in Teaching at IIT Madras” for the year 2016. His current research interests are in speech processing, kernel methods, deep learning, distance metric learning and content-based information retrieval of multimedia data.
Hema A Murthy
Indian Institute of Technology Madras, India
Signal Processing guided Machine Learning
Monday December 10, 10:30 – 12:30 | Turing Hall, CSE Dept.
Machine learning has become ubiquitous today. Big data analytics has become the buzzword. Build, train, and deploy/transfer is the paradigm that has become the “mantra” today. The more data that is available, the more robust the systems are at prediction. The major problem with machine learning is obtaining a huge amount of curated data. In the context of speech technologies, in a country like India with large linguistic diversity, getting data that is accurate enough for training is difficult. Another issue is that of simultaneous collection of data in multiple languages. Is there a way to reduce the amount of data required for training a machine learning system?
In this tutorial, we show how signal processing can be used to guide machine learning algorithms. In particular, we study problems in speech synthesis, recognition, Indian music analysis, and computational brain research, where efforts are made to first process the signal before subjecting it to machine learning. Signal processing yields accurate results in the particular, while it may lead to a large number of insertions, deletions, and substitutions. Using machine learning and signal processing in tandem, we show that the amount of data required for training systems can be reduced significantly.
Prof. Hema A Murthy received her B.E. degree in Electronics and Communication Engineering from Osmania University, Hyderabad, India, in 1980. She received her M.E. degree in Electrical and Computer Engineering from McMaster University, Canada, in 1986 and her Ph.D. degree in Computer Science and Engineering from Indian Institute of Technology (IIT) Madras in 1992. She has been a Professor in the Department of Computer Science and Engineering at IIT Madras since 2006. Prof. Hema A Murthy has received the “Manthan award” and the Prof. Rais Ahmed Memorial Lecture Award of the Acoustical Society of India for the year 2012. Her current research interests are in Speech Processing, Speech synthesis and recognition, Network transfer analysis Modelling and Music Processing.
Jaya Sreevalsan Nair
International Institute of Information Technology Bangalore, India
Visual Analytics: “Bringing data to life”
Monday December 10, 01:30 – 03:30 | Ada Lovelace Auditorium, IST Dept.
John Tukey, the mathematician, once said the following about analytics: “This is my favorite part about analytics: Taking boring flat data and bringing it to life through visualization.” It remains true to a great extent even today, in the time of big data. The objective of this tutorial is to impress upon the audience the need for visualization as an essential part of larger data science workflows. Visualization itself has evolved from providing summaries to facilitating complex exploratory analysis of data. This tutorial will demonstrate how data can be formatted to make the best use of some of the time-tested visualization techniques, and how visualizations aid the overall data analysis.
Dr. Jaya Sreevalsan Nair is currently with the International Institute of Information Technology Bangalore. Her research interests include exploiting spatial locality and other analytical processes in data visualization. She applies these approaches to LiDAR point cloud analysis, multiplex networks in biology and society, and multivariate data in health informatics. She graduated with a B.Tech. in aerospace engineering from IIT Madras, an M.S. in computational engineering from Mississippi State University, and a Ph.D. in computer science from University of California at Davis. She received the Early Career Research Award from SERB in 2017.
Manikandan Narayanan
Indian Institute of Technology Madras, India
Biological/Genomic Data Science: Moving Beyond Correlation to Causation
Monday December 10, 01:30 – 03:30 | Turing Hall, CSE Dept.
Discovering causal relations in a complex system is a fundamental pursuit in many sciences and disciplines. When controlled intervention experiments to determine cause and effect are not feasible or ethical, causal inference is, surprisingly, possible from observational data alone. Its theory (models/assumptions/language) and practice (concrete applications in biology/medicine) are the focus of this tutorial. You will find this tutorial appealing if you find causal inference from observational data intriguing (e.g., how can one break the symmetry of an observed correlation between two variables to determine the causal direction, or sever the links to not only known but also unknown confounding factors?) and valuable (in terms of its broad applications, including bioinformatics applications ranging from identifying causal risk factors of human diseases to gene regulatory networks underlying living cells).
We will start with causal discovery between two variables using the so-called mediation-based and Mendelian Randomization (MR) approaches, which are analogous to the Randomized Controlled Trials popularized by Ronald Fisher, and then move on to multivariate causal discovery using the framework of Bayesian networks and do-calculus pioneered by Judea Pearl. We intend to cover modern developments and data resources that aid causal discovery from biomedical/genomic data (for instance, one recent resource pools 11 billion correlations between genetic variants and health/disease-related outcomes from genome-wide association studies, which are waiting to be mined for new causal factors for human health and disease).
All relevant biology and causality concepts will be introduced. A basic knowledge of probability/statistics is assumed.
Dr. Manikandan Narayanan is an Associate Professor at the Indian Institute of Technology (IIT) Madras in the Department of Computer Science and Engineering. He is also a core faculty member at the Initiative for Biological Systems Engineering and a faculty member of the Robert-Bosch Center for Data Science and Artificial Intelligence at IIT Madras. His research areas include bioinformatics, computational network biology, and systems biology/genomics in health and disease; specifically, multilayer probabilistic graphical models and graph algorithms that can help analyse large-scale biomolecular data to dissect tissue-tissue communication and disease-disease interactions. He is a Wellcome Trust / DBT India Alliance Intermediate Fellowship Awardee (2018). Prior to joining IIT, Manikandan was a Staff Scientist in the Systems Genomics and Bioinformatics Unit at the National Institutes of Health (NIH). He obtained his Ph.D. in Computer Science (with an emphasis in computational and genomic biology) from the University of California, Berkeley and has also held a Sr. Research Scientist position at Merck Research Labs in Seattle and Boston.
Shalini R. Urs
MYRA School of Business, Mysore, India
Social Network Analysis: Making the invisible visible
Monday December 10, 04:00 – 05:30 | Ada Lovelace Auditorium, IST Dept.
Over the past decade, there has been a growing public fascination with the complex "connectedness" of modern society, especially since the emergence of social networking sites. Whether it is the rapid spread of news, the tipping point of social/political movements gathering momentum, or the cascading of epidemics and financial crises around the world with alacrity and intensity, it is attributed to the connectedness of today’s society. Many scientific disciplines have come together and evolved into a new discipline of network science focused on understanding how these complex connected systems operate. Social Network Analysis (SNA) has emerged as an approach and a tool to uncover and understand the hidden side of connections.
This tutorial will introduce the basic concepts of a network, their attributes and their measures such as Centrality, Components, Cohesion, Geodesic, Density and Degree, Cores, Cliques, and others. We will also introduce graph theory, which underpins network science and serves as a primary tool in the broader examination of networks. With the help of examples from across different domains, we will help participants understand the dynamics of social networks and how this understanding can be used in applications ranging from uncovering terrorist networks to "The Network of Global Corporate Control."
This tutorial will introduce the participants to some of the essential software tools such as Gephi, Pajek, NodeXL, Cytoscape and NetworkX. Participants will be shown how to install, import data and analyze the network with the help of examples. A comparison of these five software tools concerning features and performance will be presented.
Dr. Shalini Urs is an internationally recognized academic and an institution builder whose brainchildren are the MYRA School of Business (www.myra.ac.in) and the International School of Information Management (www.isim.ac.in). She started her career in academia 42 years ago, joining the University of Mysore in 1976.
As an information scientist, Dr. Urs has a research interest in all matters of the mind, from creativity to cognition to culture. Taking a 360-degree view of information, she has researched issues ranging from the theoretical foundations of information sciences to informatics. Her areas of research include information retrieval, ontology development, and social media and network analysis. She conceptualized and developed the Vidyanidhi Digital Library and eScholarship portal in the year 2000 with funding from the Government of India, Department of Scientific and Industrial Research under their National Information System for Science and Technology (NISSAT), which became a national initiative with further funding from the Ford Foundation. She is lauded as a pioneer in the fields of digital libraries and Electronic Theses and Dissertations (ETD) in India. She built an online digital library way back in 2000, much before the digital era captured the imagination in India. She has published papers in international journals and peer-reviewed conference proceedings published by Springer, Emerald, and others.
Widely traveled and invited to speak at various international and national conferences, Dr. Urs speaks on wide-ranging topics focusing primarily on ‘Technology for Learning and Development.’ Dr. Urs has won many awards, including the Best Paper Award at Infotex 1995, the NDLTD - Adobe Leadership Award in 2004, and the Emerald Research Fund Award 2007-08. She was named the Mortenson Distinguished Lecturer 2010 by the University of Illinois at Urbana-Champaign, USA, and was invited to deliver their annual lecture series, where she lectured on how digital technologies have transformed the learning experience. She was conferred the Women Leadership Award at the Dewang Mehta Business School Awards in 2014 and the Education Evangelist Award of India by Skill Tree Foundation in 2015 for her outstanding contribution to higher education.
She was a Fulbright scholar and a visiting professor at the Department of Computer Science, Virginia Tech, USA, during 2000-2001. She was an adjunct faculty member at the International Institute of Information Technology, Bangalore (IIITB) during 2005-2007. She was also a visiting professor at the Indian Statistical Institute, Bangalore, during 1998 and 1999. She has published more than 200 papers in peer-reviewed journals and prestigious academic conferences. She has served as a UNESCO expert on several occasions and has been commissioned to conduct many studies in the last ten years.
With decades of experience in academia and educational entrepreneurship, she founded the MYRA School of Business in 2010. With the mission of ‘learning unlimited, excellence unbounded’ she has been leading MYRA’s efforts to be a truly global and uniquely Indian business school with Data Analytics embedded in its DNA.