Sign-up information will be available soon.
Machine learning with PyTorch for science of science and innovation
AI has made remarkable contributions to science and creativity in recent years. It has led to advances on difficult and fundamental problems such as protein folding in biology, has beaten humans in strategy games such as chess and Go, and has reached human-level performance in the recognition and synthesis of images and text. With the availability of powerful, high-level packages for AI, such as PyTorch, it is easier than ever to apply AI to computational social science (CSS) problems. However, a large fraction of the CSS community may find it difficult to get started with AI. One reason is that most AI tutorials are not designed with the CSS community in mind. They are either too high-level or too mathematical for CSS researchers. They also often don’t cover the range of problems in CSS, such as network analysis, natural language processing, and image processing. We bridge this gap with a tutorial tailored to the CSS audience. We discuss some of the most important AI techniques for CSS and review in detail a few papers that use AI in CSS. We then discuss the structure of the most important deep learning models for CSS. We provide an interactive notebook that teaches how to design and use these methods in PyTorch.
Web-crawling is an increasingly popular tool for collecting data from websites, online databases, and social media—but it also remains a common stumbling block for computational social science. With such a wide range of tools and languages available (HTML, APIs, and XPath are just a few), developing a crawling pipeline can be a frustrating experience for researchers unfamiliar with the promises and pitfalls of web-based data. This is especially the case for scholars without formal training in Computer Science, for whom attempting to collect new web corpora often means reinventing the flat tire.
Whatever your background, this workshop will give you the building blocks to use web-crawling in your research. We will tackle common problems including collecting web addresses (by automated Google search); focused, narrow crawling of a limited number of websites (with Requests and BeautifulSoup); and flexible, broad crawling of heterogeneous web corpora (with Scrapy). We will explore the tradeoff between precision and extensibility and challenge conventional skepticism toward noisier but scalable crawling.
We will build on best practices to create a decision hierarchy promoting accessible and efficient workflows. First, if there’s an API, use that before scraping; the data provider prefers this and it’s probably easier. Second, precise scraping is good—but not necessarily at the expense of scale or for methods less sensitive to noise (e.g., raw word counts). Finally, when collecting web data representative of a population, it’s okay to break individual sites’ Terms of Service, but be polite and don’t release anything private or copyrighted.
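The "be polite" step of the hierarchy above can be sketched in a few lines. This is only an illustration under stated assumptions, not the workshop's own code: it swaps Python's standard-library `urllib.robotparser` and `html.parser` in for Requests and BeautifulSoup so that it runs offline, and the robots.txt rules, the `example-crawler` user agent, the `example.org` URLs, and the sample HTML are all invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

# Hypothetical page standing in for a downloaded HTML document.
SAMPLE_HTML = """
<html><body>
  <a href="/about">About</a>
  <a href="/private/records">Records</a>
  <a href="https://example.org/news/1">News</a>
</body></html>
"""

USER_AGENT = "example-crawler"     # assumed name, for illustration only
BASE_URL = "https://example.org/"  # assumed site, for illustration only


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def load_robots(robots_txt):
    """Parse robots.txt text that has already been fetched."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    return robots


def allowed_links(html, robots, base_url=BASE_URL, user_agent=USER_AGENT):
    """Return the links found in `html` that robots.txt lets `user_agent` fetch."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [url for url in collector.links if robots.can_fetch(user_agent, url)]


robots = load_robots(ROBOTS_TXT)
links = allowed_links(SAMPLE_HTML, robots)
delay = robots.crawl_delay(USER_AGENT)  # seconds to wait between requests, if declared
```

In a real pipeline the crawler would sleep for `delay` seconds between requests and skip every URL that `can_fetch` rejects; the same filtering logic applies whichever HTTP and parsing libraries do the fetching.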
Analyze the Sentiment or Emotion Reflected in Newspapers
This tutorial will: (1) engage participants in discussion and exercises that explore baseline and state-of-the-art approaches to sentiment classification and (2) provide access to current and historical global and local newspaper titles, which are in high demand among computational social scientists for text-mining research.
Sentiment classification is a sub-task of text classification that attempts to assign an affective or emotional state to a text. Compared with the related task of sentiment analysis, which aims to assign a single valence or Likert score from ‘very negative’ to ‘very positive’, sentiment classification is less studied. In this tutorial, we will analyze newspaper articles in order to identify the following emotions: ‘anger’, ‘disgust’, ‘fear’, ‘sadness’, ‘happiness’, ‘love’, ‘surprise’, and ‘neutral’.
Classifying sentiment can be especially valuable for newspaper content, with many potential applications, such as:
• What are the ‘typical’ public emotions surrounding political events? Tragic events?
• How do emotions or affective states as represented by a newspaper outlet vary by location and era?
• What are the most common emotion transitions?
As part of the tutorial, participants will (1) create newspaper datasets focusing on sentiment-related topics and (2) walk through, understand, and run Python code for both baseline bag-of-words (BoW) models and a state-of-the-art (SoTA) BERT-based model, which was developed in collaboration with students and researchers at the University of Michigan. Because the tutorial uses TDM Studio, a text and data mining solution from ProQuest, researchers will be given access to this solution and to hundreds of newspaper titles during the exercise.
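To make the idea of a baseline BoW model concrete, here is a minimal sketch of a bag-of-words emotion classifier written in plain Python. It is not the tutorial's own code: it assumes a multinomial Naive Bayes classifier with add-one smoothing as the baseline, and the six training sentences and their labels are invented purely for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Toy training data invented for this sketch; real labels would come from annotated articles.
TRAIN = [
    ("protesters furious over the outrageous ruling", "anger"),
    ("residents angry and furious at the council", "anger"),
    ("families mourn the tragic loss of the victims", "sadness"),
    ("a grieving town mourns after the tragic flood", "sadness"),
    ("fans celebrate a joyful and delighted victory", "happiness"),
    ("crowds cheer and celebrate the joyful festival", "happiness"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class BowNaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, examples):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of documents
        self.vocab = set()
        for text, label in examples:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.label_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenize(text):
                if token in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


model = BowNaiveBayes().fit(TRAIN)
prediction = model.predict("mourners grieve after tragic accident")
```

A baseline like this ignores word order and context entirely, which is the gap that a BERT-based model is meant to close.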
Despite its inevitable nature and the incontrovertible wisdom that “failure is the mother of success”, our quantitative understanding of failure remains limited, in part due to the lack of systematic datasets that record the frequently occurring yet often neglected failures within individuals, teams, and organizations. This situation is changing radically, however, thanks to newly available large-scale datasets spanning social, scientific, and technical domains. In this tutorial, we will touch on different examples of failure through the use of behavioral experiments, sociological theories, data analytics, causal inference, and mathematical modeling, hoping to illustrate that a computational social science agenda on failure, one that combines canonical social science frameworks, big data, and computational tools from AI and the complexity sciences, offers exciting new opportunities and challenges. By helping improve our understanding and predictions of the why, how, and when of failure, advances in this area not only hold potential policy implications; they could also substantially further our ability to imagine and create by revealing the total pipeline of creativity.
High Throughput Experimentation for Computational Social Science
Thinking with Deep Learning: An exposition of deep (representation) learning for social science research
A deluge of digital content is generated daily by web-based platforms and sensors that capture digital traces of communication and connection, and complex states of society, the economy, the human mind, and the physical world. Emerging deep learning methods enable the integration and analysis of these complex data in order to address research and real-world problems by designing and discovering successful solutions. Our tutorial serves as a companion to our book, “Thinking with Deep Learning”. This book takes the position that the real power of deep learning is unleashed by thinking with deep learning to reformulate and solve problems traditional machine learning methods cannot address. These include fusing diverse data like text, images, tabular and network data into integrated and comprehensive “digital doubles” of the subjects and scenarios you want to model, the generation of promising recommendations, and the creation of AI assistants to radically augment an analyst or system’s intelligence. For scientists, social scientists, humanists, and other researchers who seek to understand their subjects more deeply, deep learned representations facilitate the opportunity to not only predict and simulate them but also to provide novel insights, associations, and understanding available for analysis and reuse.
The tutorial will walk attendees through various non-neural representations of social text, image, and network data, and the distance metrics we can use to compare these representations. We then introduce neural models and their use in modern science and computing, with a focus on the social sciences. After introducing neural architectures, we will explore how they are used with various multi-modal social data, and how their power can be unleashed by integrating and aligning these representations.
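As a rough illustration of the first step, the sketch below builds a simple non-neural representation of text (word counts) and compares documents with three common distance metrics. The choice of metrics (cosine, Euclidean, Jaccard) and the three example sentences are assumptions made for this sketch, not material from the book or tutorial.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a text as a mapping from lowercase word to count."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 minus the cosine of the angle between two count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # convention chosen here for empty documents
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two count vectors."""
    return math.sqrt(sum((a[w] - b[w]) ** 2 for w in a.keys() | b.keys()))


def jaccard_distance(a, b):
    """1 minus the share of distinct words the two texts have in common."""
    union = a.keys() | b.keys()
    if not union:
        return 0.0
    return 1.0 - len(a.keys() & b.keys()) / len(union)


# Invented example sentences.
doc_budget_pass = bag_of_words("the senate passed the budget")
doc_budget_fail = bag_of_words("the senate rejected the budget")
doc_sports = bag_of_words("fans cheered the home team")

near = cosine_distance(doc_budget_pass, doc_budget_fail)
far = cosine_distance(doc_budget_pass, doc_sports)
```

Here `near` comes out smaller than `far`, since the two budget sentences share most of their words. Note that count-based distances treat "passed" and "rejected" as merely different words, one limitation that learned representations aim to address.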