Urban Big Data Analytics
					
					Lecture 10
					Advanced Modeling
					
					
					July 31, 2019
					
					Instructor: Andy Hong, PhD
					
					Lead Urban Health Scientist
					The George Institute for Global Health
					University of Oxford
					
					
				
				
				
				
					Final group project
					
						- Five elements: 1. Problem; 2. Hypotheses; 3. Data and methods; 4. Results and Interpretation; 5. Conclusions
 
						- Minimum 6 pages, single-spaced, Times New Roman, 12-point font
 
						- August 8 (Weds), 12:00 Midnight
 
					
				
				
					
				
				
					Final group presentation
					
						- Presentation (12 mins), Q&A (2 mins)
 
						- Each group member needs to present
 
						- Share some preliminary results
 					
					
					
				
				
				
				Special Guest Speakers
				A 14-week program to work on collaborative projects
				
					
						
				
				
					2019 DSSG Fellows
					
					
											
				
				
				
					
					Beyond linear regressions
											
						- Linear models work well for continuous numeric outcomes
 
						- But what about categorical outcomes?
 
						- What about survey data with yes or no questions?
 
						- Can you convert categories into numbers?
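A small R sketch of the problem, using simulated, hypothetical data (not one of the course datasets). A linear model fitted to a 0/1 outcome can predict "probabilities" below 0 or above 1, while a logistic model keeps every prediction between 0 and 1.

```r
# Hypothetical simulated data: a 0/1 outcome y that becomes more likely as x grows
set.seed(42)
x = seq(-5, 5, length.out = 200)
y = rbinom(length(x), size = 1, prob = plogis(1.5 * x))

m_lin = lm(y ~ x)                          # linear model fitted to a 0/1 outcome
m_logit = glm(y ~ x, family = "binomial")  # logistic regression

# Predict just outside the observed range of x
new_x = data.frame(x = c(-7, 7))
p_lin = predict(m_lin, newdata = new_x)                         # falls below 0 and above 1
p_logit = predict(m_logit, newdata = new_x, type = "response")  # stays strictly between 0 and 1
p_lin
p_logit
```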
 
					
				
				
				
				
					
				BC Generations Survey
  				  
					
				
				
				
				
					
				Logistic Regression
				
				
					- Also called "logit" regression or the "logit" model
 
					- Developed by David Cox in 1958
 
					- A regression model for a categorical outcome Y
 
				
				
				
					
						Sir David Cox (age 95)
						
					
									
				
					
				Why Logistic Regression?
				
					- A linear model is not appropriate for a qualitative (categorical) response
 
					- Example question: How would you rate this course?
 
					- Awful - Okay - Good - Very Good - Excellent
 
					- Can we turn this into 1-2-3-4-5?
 
					- No, because the distances between adjacent categories are not necessarily equal
 
				
				
				
					
				Types of Logistic Regression
				
					- Binary (binomial) logistic model
						
							- Binary outcome: "0" and "1"
 
							- Pass/Fail, Win/Lose, Dead/Alive, Sick/Healthy
 
						
					 
					- Multinomial logistic model
						
							- Outcome with three or more categories
 
							- e.g., Unsatisfied - Satisfied - Very Satisfied
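A minimal R sketch on simulated, hypothetical data contrasting the two model types: `glm(..., family = "binomial")` for a 0/1 outcome, and `multinom()` from the `nnet` package (one common choice, assumed here rather than taken from the lecture) for an outcome with three categories. The multinomial model estimates a separate intercept and slope for each non-reference category.

```r
library(nnet)  # recommended package bundled with standard R installs; provides multinom()

set.seed(1)
n = 300
x = rnorm(n)   # hypothetical predictor

# Binary outcome (0/1): binomial logistic model with glm()
pass = rbinom(n, size = 1, prob = plogis(0.5 + 1.2 * x))
m_bin = glm(pass ~ x, family = "binomial")

# Outcome with three categories: multinomial logistic model with nnet::multinom()
lv = c("Unsatisfied", "Satisfied", "Very Satisfied")
satisfaction = factor(sample(lv, n, replace = TRUE), levels = lv)
m_multi = multinom(satisfaction ~ x, trace = FALSE)

coef(m_bin)    # one intercept and one slope
coef(m_multi)  # one row of (intercept, slope) per category, each compared with "Unsatisfied"
```

Note that `multinom()` treats the satisfaction levels as unordered categories; it does not use their ranking.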
 
						
					 
				
				
				
				
					
				
					
				Linear vs. Logistic Regression
				
					
						
					| Linear function | Logistic function |
					|---|---|
					| $$ f(x) = \beta_{0} + \beta_{1}x $$ | $$ f(x) = \frac{e^{\beta_{0} + \beta_{1}x}}{1 + e^{\beta_{0} + \beta_{1}x}} $$ |
					| | $$ \log_e\left(\frac{p}{1-p}\right) = \beta_{0} + \beta_{1}x $$ |

					where p = f(x) is the probability that Y = 1 for a given x
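Rearranging the logistic function shows why it is equivalent to the log-odds form. Here $p = f(x)$ and $\eta = \beta_{0} + \beta_{1}x$ is used as shorthand:

$$ p = \frac{e^{\eta}}{1 + e^{\eta}} \;\Rightarrow\; 1 - p = \frac{1}{1 + e^{\eta}} \;\Rightarrow\; \frac{p}{1-p} = e^{\eta} \;\Rightarrow\; \log_e\left(\frac{p}{1-p}\right) = \eta = \beta_{0} + \beta_{1}x $$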
					
				
				
				
				
					
						Log Odds
						
					
				
				
					
						Odds
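A worked illustration with hypothetical probabilities. The odds compare the chance that an event happens with the chance that it does not, and the log odds is their natural logarithm. Even odds (p = 0.5) correspond to a log odds of 0:

$$ \text{odds} = \frac{p}{1-p}: \quad p = 0.5 \Rightarrow \frac{0.5}{0.5} = 1,\; \log_e(1) = 0; \qquad p = 0.75 \Rightarrow \frac{0.75}{0.25} = 3,\; \log_e(3) \approx 1.099 $$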
						
					
												
				
					
						Odds and Odds Ratio
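Starting from the log-odds equation $\log_e\left(\frac{p}{1-p}\right) = \beta_{0} + \beta_{1}x$, the odds at $x$ are $e^{\beta_{0} + \beta_{1}x}$. The odds ratio for a one-unit increase in $x$ therefore reduces to $e^{\beta_{1}}$, which is why exponentiated logistic coefficients are read as odds ratios:

$$ \text{OR} = \frac{\text{odds}(x+1)}{\text{odds}(x)} = \frac{e^{\beta_{0} + \beta_{1}(x+1)}}{e^{\beta_{0} + \beta_{1}x}} = e^{\beta_{1}} $$

An OR above 1 means the odds rise as $x$ increases; an OR below 1 means they fall.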
						
					
				
				
					
						Odds Ratio Example
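As a stand-in example, this R sketch uses hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed people are sick. It shows that the odds ratio computed by hand from the 2x2 table matches `exp(coef())` from a logistic regression with a single binary predictor.

```r
# Hypothetical counts (not from the lecture): sick (1) / healthy (0) by exposure group
exposed = c(rep(1, 100), rep(0, 100))
sick    = c(rep(1, 30), rep(0, 70),    # exposed: 30 sick, 70 healthy
            rep(1, 10), rep(0, 90))    # unexposed: 10 sick, 90 healthy

# Odds ratio by hand from the 2x2 table
odds_exposed   = 30 / 70
odds_unexposed = 10 / 90
or_table = odds_exposed / odds_unexposed   # (30/70) / (10/90) = 27/7

# The same odds ratio from a logistic regression
m = glm(sick ~ exposed, family = "binomial")
or_glm = exp(coef(m))["exposed"]

or_table
or_glm
```

Both values should equal 27/7 (about 3.86). The exponentiated intercept equals the odds in the unexposed group (10/90).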
						
					
					
				
				
					
				Simple Logistic Regression
				$$ \text{corruption} \approx f(\text{income}) $$
					
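					# Simple Logit Model
					# corruption must be a binary outcome (0/1, TRUE/FALSE, or a two-level factor)
					m1 = glm(corruption ~ income,
							 data = gapminder,
							 family = "binomial")

					summary(m1)          # coefficients on the log-odds scale
					exp(coef(m1))        # odds ratios
					exp(confint(m1))     # 95% confidence intervals for the odds ratios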
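The `gapminder` data used in class, with its binary `corruption` indicator, is not included here. This sketch therefore simulates a hypothetical stand-in data frame `sim`, and its coefficients are illustrative only. It shows how `predict(..., type = "response")` turns fitted log odds into probabilities, and that the result equals applying the logistic function `plogis()` to $\beta_{0} + \beta_{1}x$ by hand.

```r
# Hypothetical stand-in for the course's gapminder extract (simulated, not real data)
set.seed(2019)
n = 150
sim = data.frame(income = runif(n, 1, 60))   # hypothetical income, in $1,000s
sim$corruption = rbinom(n, size = 1, prob = plogis(2 - 0.1 * sim$income))  # 1 = corrupt

m1_sim = glm(corruption ~ income, data = sim, family = "binomial")

# Predicted probability of corruption at three hypothetical income levels
new = data.frame(income = c(5, 20, 50))
p_hat = predict(m1_sim, newdata = new, type = "response")

# Same numbers by hand: apply the logistic function to b0 + b1 * income
b = coef(m1_sim)
p_manual = plogis(b[1] + b[2] * new$income)

exp(b[2])   # odds ratio for a $1,000 increase in income
```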
					
					
				
				
				
					
				Multiple Logistic Regression
				$$ \text{corruption} \approx f(\text{income}, \text{population}, \text{democracy}) $$
					
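					# Multiple Logit Model
					# Each odds ratio is adjusted for the other predictors in the model
					m2 = glm(corruption ~ income + population + democracy,
							 data = gapminder,
							 family = "binomial")

					summary(m2)          # coefficients on the log-odds scale
					exp(coef(m2))        # adjusted odds ratios
					exp(confint(m2))     # 95% confidence intervals for the odds ratios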
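One way to ask whether the extra predictors help is a likelihood-ratio test between the nested simple and multiple models. This sketch uses simulated, hypothetical data in place of the course's `gapminder` extract. With real data, check that both models are fit to the same rows; missing values in `population` or `democracy` can make `m2` drop observations that `m1` keeps, and `anova()` then cannot compare them.

```r
# Hypothetical simulated stand-in data (not real country data)
set.seed(7)
n = 200
sim = data.frame(
  income     = runif(n, 1, 60),          # hypothetical income, $1,000s
  population = rlnorm(n, log(10), 1.5),  # hypothetical population, millions
  democracy  = runif(n, 0, 10)           # hypothetical 0-10 democracy index
)
sim$corruption = rbinom(n, size = 1,
                        prob = plogis(3 - 0.05 * sim$income - 0.3 * sim$democracy))

m1_sim = glm(corruption ~ income, data = sim, family = "binomial")
m2_sim = glm(corruption ~ income + population + democracy, data = sim, family = "binomial")

exp(coef(m2_sim))   # odds ratios, each holding the other predictors constant

# Likelihood-ratio test: do population and democracy improve on income alone?
cmp = anova(m1_sim, m2_sim, test = "Chisq")
cmp
```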
					
					
				
				
					