Chapter 1 Welcome

Welcome to the Urban Informatics Toolkit. This online book is intended to serve as an effective introduction to the understanding and analysis of the urban commons. In it you will find material on the theoretical underpinnings of Urban Informatics and an introduction to the statistical programming language R which will allow you to work confidently with the data you will encounter in your research, coursework, or elsewhere.

This book is many things at once. Within it are two years of study in the Urban Informatics Program at Northeastern University, five years of self-directed education in R, two years of teaching R, and innumerable hours of frustration spent attempting to understand what could otherwise have been a very digestible language.

I will do my best to avoid overly technical and verbose descriptions of theory and concepts. I will attempt to explain everything as I would to a friend.

By the end of these pages I hope that you will have become self-sufficient in your analyses. My intention is not to teach you everything that you will need to know because this is impossible. Data analytics are always changing and what is current will be dated by tomorrow. As I wrote this, in fact, one of the most commonly used tools was altered to such a degree that much of what I had already written became dated. The purpose of this book, then, is not to brief you on the latest developments, but to equip you with the proper tools with which navigate a rapidly evolving field.

This book has three goals. First, to ensure in the reader an understanding of the fundamentals of scientific inquiry in the Big Data era. Second, to familiarize the reader with effective approaches to data analysis. And third, to equip the reader with enough knowledge of R that they might understand what’s been happening under the hood, so to speak.

1.1 Structure of the book

This book is in four sections, each with it’s own theme. The first section, Foundations of Urban Informatics, will explore the nature of big data and its role in the field. This section will review different approaches to scientific inquiry, seeking to understand some of the newer methods enabled by big data, before closing with a review of Broken Windows Theory and the development of ecometrics.

The second section, Toolkit Foundations, will introduce you to R. This section is highly technical. We will first ensure that you download the software and data that is needed to follow along with the exercises in this book. This section will cover everything from reading data to manipulating it, while also discussing some fundamental R theory.

The third section, Visualization Strategies, is the largest, and is dedicated entirely to information visualization. In this section you will learn how to craft visualizations and understand when to make what kind of graphic. It will also discuss various traditional graphic types, and how to update and expand them. This section is expansive due to the importance of visualization.

The final section, Expanding the Toolkit, will deal with multiple datasets, statistics, and spatial data. Of course, the section on spatial analysis is, by necessity, lacking, as the subject is so expansive that even a cursory sketch of its contents would fill multiple books.

1.2 Considerations

As the fields of the digital humanities continue to grow and as the number of tools available in R increase I will work to continually add to and improve this writing. If there are topics that you would like to see included or expanded upon, please submit an issue on GitHub or reach out to me directly.