Urban Big Data Analytics

Lecture 11
Machine Learning +
Future of Urban Data Science


August 1, 2019

Instructor: Andy Hong, PhD
Lead Urban Health Scientist
The George Institute for Global Health
University of Oxford

Machine Learning

What is machine learning?

  • All useful programs "learn something"
  • Linear regression is one form of learning
  • Program that can learn from experience
"Field of study that gives computers the ability to learn without being explicitly programmed."
- Arthur Samuel 1959
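
As a minimal sketch of the point that linear regression is one form of learning, the example below estimates a slope and intercept from example data instead of having them written into the program. The function name `fit_line` and the toy data are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# The "experience" is the data; the learned parameters are slope and intercept
slope, intercept = fit_line(xs, ys)

# Use the learned parameters to predict an unseen input
prediction = slope * 5.0 + intercept
```

The rule relating x to y is never coded by hand; it is recovered from the examples, which is the sense of "learning" used in the quote above.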

Why Machine Learning?

Placemeter

Machine learning is . . .

Machine Learning Steps

Adapted from Google Cloud

Types of Learning

k-Means clustering

  • Basic machine learning method
  • A type of unsupervised learning
  • Assign data points to a cluster with the nearest mean

k-Means steps

  1. Choose k initial centroids (for example, k randomly selected data points)
  2. Assign each data point to the cluster whose centroid is nearest
  3. Move each centroid to the mean of the points assigned to it
  4. Repeat steps 2 and 3 until the assignments no longer change
  5. Goal: minimize the sum of squared distances between each point and its cluster centroid
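
The steps above can be sketched in plain Python. This is an illustrative version, not the code used in the demo: it assumes the initial centroids are k randomly chosen data points, and the names `k_means`, `nearest`, and `squared_distance`, as well as the toy points, are hypothetical.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Assumption: initialize with k distinct data points chosen at random
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each point to the cluster with the nearest centroid
        new_labels = [nearest(p, centroids) for p in points]
        # Stop when no assignment changed since the previous pass
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centroid to the mean of its assigned points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Hypothetical toy data: two well-separated groups of three points
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = k_means(points, k=2)
```

On data this cleanly separated the two groups are recovered whatever the starting centroids are. On real data the result can depend on the initialization, so it is common to run the algorithm several times and keep the best solution.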

K-Means Example

http://web.stanford.edu/class/ee103/visualizations/kmeans/kmeans.html

K-Means demo

Lecture 11 - Group Session Part 1

Random Forest

  • A collection of decision trees
  • Uses Bootstrap Aggregation (Bagging)
  • Bagging: train the same model on many bootstrap samples of the training data and combine the predictions to reduce variance and increase accuracy

What is A Decision Tree?

Bagging and Random Forest

$$ \hat{f}_{bag}(x) = \frac{1}{B} \sum_{b=1}^B \hat{f}^{*b}(x) $$
  • Bagging: Draw B bootstrap samples (sampling with replacement) from the training data set; fit a decision tree $\hat{f}^{*b}$ to each sample; and average the B predictions
  • Random Forest: At each split, consider only a random sample of m of the p predictors instead of all of them, typically $m \approx \sqrt p$
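
A rough sketch of both ideas in plain Python follows. To keep it short, each "tree" is a one-split regression stump, not a full decision tree. That is a simplification made for this example, and it means sampling m predictors per stump is the same as sampling them per split. The names `fit_stump`, `predict_stump`, `fit_forest`, and `predict_forest`, and the toy data, are hypothetical.

```python
import math
import random

def fit_stump(X, y, features):
    """Fit a one-split regression tree using only the given feature indices."""
    best = None
    for f in features:
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[f] <= threshold]
            right = [t for row, t in zip(X, y) if row[f] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((t - left_mean) ** 2 for t in left)
                   + sum((t - right_mean) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, f, threshold, left_mean, right_mean)
    if best is None:
        # No split possible (sampled features are constant): predict the mean
        mean = sum(y) / len(y)
        return (features[0], math.inf, mean, mean)
    return best[1:]

def predict_stump(stump, row):
    f, threshold, left_mean, right_mean = stump
    return left_mean if row[f] <= threshold else right_mean

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(1, round(math.sqrt(p)))  # m is approximately sqrt(p)
    forest = []
    for _ in range(n_trees):
        # Bagging: bootstrap sample of the rows, drawn with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        # Random forest: only m randomly chosen predictors are considered
        features = rng.sample(range(p), m)
        forest.append(fit_stump(Xb, yb, features))
    return forest

def predict_forest(forest, row):
    """Average the B individual predictions, as in the bagging formula."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)

# Hypothetical toy data: 10 rows, p = 4 predictors, response increases with row index
X = [(float(i), 2.0 * i, float(i * i), 10.0 + i) for i in range(10)]
y = [float(i) for i in range(10)]

forest = fit_forest(X, y)
low = predict_forest(forest, X[0])
high = predict_forest(forest, X[-1])
```

Here `predict_forest` plays the role of $\hat{f}_{bag}(x)$ and each stump plays the role of one $\hat{f}^{*b}(x)$, with B equal to `n_trees`.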

Random Forest

Random forest demo

Lecture 11 - Group Session Part 2

Future of urban data science

Smart cities opportunities

https://www.youtube.com/watch?v=nnyRZotnPSU

Challenges of Smart Cities

  • Privacy issues
  • Biases in algorithms and automated systems
  • Competition between humans and robots
  • Equity vs. efficiency

Concluding remarks

Thank you

For all the course materials, go to urbanbigdata.github.io