Urban Big Data Analytics

Lecture 4
Database and SQL

July 22, 2019

Instructor: Andy Hong, PhD
Lead Urban Health Scientist
The George Institute for Global Health
University of Oxford

Assignment 2 check in

  • Gapminder offline tool
  • Some people having trouble understanding what issue means
  • An issue can be as specific as low child mortality
  • Or as broad as poor child welfare
  • It's about how you frame the problem, big or small

Database

What is database?

  • A systematic collection of data
  • Support storage and manipulation of data
  • Make data management quick and easy
  • Examples of database:
    • Online telephone directory
    • Amazon products database
    • Facebook posts database

Why use database?

  • Only way to handle large "big" data
    (over 10 million rows)
  • Faster to search and query
  • Easy to manage and reliable
  • Keep data and relationships separately
  • Handle multiple users concurrently

Why NOT use database?

  • Difficult to set up
  • Need a special software
  • Require expert knowledge
  • Overkill for small data
  • Don't have time and money

Two most common types

When to use what?

Relational DB Non-relational DB
  • Large structured data
  • Existing organization and relationships (corporates)
  • Scalability and reliability matter
  • Large unstructured data
  • Scalability and speed matter (startups)
  • Changing structure and flexible

Example
Relational database

Warner Bros Database

Example
Non-relational database

LinkedIn Member Distance DB

Structured
Query
Language

What is SQL?

  • Standard language for relational database management systems (RDBMS)
  • Been around for more than 40 years
  • Universal language for database:
    • Build databases: CREATE
    • Manipulate databases: INSERT, UPDATE
    • Query databases: SELECT

Basic SQL commands

  • CREATE TABLE: Creating a new table
  • DROP TABLE: Removing a table
  • UPDATE TABLE: Changing a table
  • CREATE INDEX: Optimizing a table
  • SELECT * FROM [table name] WHERE x = y: Querying a table

Basic SQLite commands

  • .tables: View tables
  • .schema [table name]: View header definitions
  • .quit: Close the SQLite program
  • .headers on: Turn on display of headers
  • .mode csv: Comma separated values
  • .mode column: Left aligned columns

SQLite Demo


						Andys-MacBook-Pro$ sqlite3 seattle_crime_data.db
						
						SQLite version 3.13.0 2016-05-18 10:57:30
						Enter ".help" for usage hints.
						 
						sqlite> .headers on
						sqlite> .mode column
						sqlite> .tables
						
					

SQLite Demo


						sqlite> .schema crime
						CREATE TABLE crime(
						  "ID" TEXT,
						  "GeneralOffenseNumber" TEXT,
						  "OffenseCode" TEXT,
						  "OffenseCodeExtension" TEXT,
						  "OffenseType" TEXT,
						  "SummaryOffenseCode" TEXT,
						  "SummarizedOffenseDescription" TEXT,
						  "DateReported" TEXT,
						  "OccurredDateorDateRangeStart" TEXT,
						  "OccurredDateRangeEnd" TEXT,
						  "HundredBlockLocation" TEXT,
						  "DistrictSector" TEXT,
						  "ZoneBeat" TEXT,
						  "CensusTract2000" TEXT,
						  "Longitude" REAL,
						  "Latitude" REAL,
						  "Location" TEXT,
						  "Month" INTEGER,
						  "Year" INTEGER
						);
						
					

SQLite Demo


						sqlite> SELECT Year, OffenseType FROM crime LIMIT 10;
						
						Year        OffenseType
						----------  ------------
						Year        Offense Type
						2010        PROPERTY DAM
						2012        THEFT-SHOPLI
						2012        THEFT-OTH
						2014        VEH-THEFT-TR
						2014        BURGLARY-FOR
						2010        ROBBERY-STRE
						2014        VEH-THEFT-AU
						2014        VEH-THEFT-AU
						2014        ROBBERY-BUSI
						
					

Assignment 3

Instruction

Class 4 - Group session

Instruction

Any questions?

For all the course materials, go to urbanbigdata.github.io