Prerequisites

For this group session, you will need SQLite. Please download the following file and follow the instruction.

For Windows:

For Mac:

Get the Seattle crime data:


Instruction

1. Synopsis

The purpose of this group session is to get you familiar with database sysem. There are two steps in this group session. First, you will create your database file and create a table under the database. Then you will import the Seattle Crime data (csv file) into the table.

Second, you will write a custom code to create an interface in Exploratory to your database file you just created. RSQLite package allows you to create a database connection and load the table into the R environment.


2. Create a database table and import CSV file

First, open up the terminal and go to your group session folder.

# For Windows:
cd C:\Users\[your user name]\Documents\vsp_bigdata\group_session\04-lecture

# For Mac:
cd /Users/[your user name]/Documents/vsp_bigdata/group_session/04-lecture

Run the SQLite program on a terminal and a command line.

sqlite3 seattle_crime_data.db

Create the first table and define database schema.

CREATE TABLE crime(
  "ID" TEXT,
  "GeneralOffenseNumber" TEXT,
  "OffenseCode" TEXT,
  "OffenseCodeExtension" TEXT,
  "OffenseType" TEXT,
  "SummaryOffenseCode" TEXT,
  "SummarizedOffenseDescription" TEXT,
  "DateReported" TEXT,
  "OccurredDateorDateRangeStart" TEXT,
  "OccurredDateRangeEnd" TEXT,
  "HundredBlockLocation" TEXT,
  "DistrictSector" TEXT,
  "ZoneBeat" TEXT,
  "CensusTract2000" TEXT,
  "Longitude" TEXT,
  "Latitude" TEXT,
  "Location" TEXT,
  "Month" TEXT,
  "Year" TEXT
);

Check the schema.

.schema crime

Import the csv into the database table.

.mode csv
.import "/Users/[your user name]/Documents/vsp_bigdata/group_session/04-lecture/seattle_crime.csv" crime

Turn on several options to format the table

.headers on
.mode column

# CREATE INDEX crime_year ON crime (Year);
# DROP INDEX crime_year;
# 
# SELECT Year, COUNT(*) FROM crime GROUP BY Year;
#  
# 
# SELECT Year,GeneralOffenseNumber,OffenseCode,OffenseType FROM crime LIMIT 10;

Query the top 10 rows from the database.

SELECT * FROM crime LIMIT 10;

Close the SQLite program.

.quit


3. Interface with SQLite database in R

You can use the RSQLite package to directly access your SQLite database file.
First, open up Exploratory, select R Packages, and install RSQLite package.
Click + on the Data Frames and select R Script. Copy and paste the following code. modify the file name to point to your SQLite database path.

library(RSQLite)
library(DBI)

# Define database path
filename = "/Users/[your user name]/Documents/vsp_bigdata/group_session/04-lecture/seattle_crime_data.db"

# Connect to db
con = dbConnect(drv=RSQLite::SQLite(), dbname=filename)

# Issue a query to read the data
# my_data = dbGetQuery(con, "SELECT * FROM crime")

my_data = dbGetQuery(con, "SELECT Year, Month, SummarizedOffenseDescription, DateReported, DistrictSector FROM crime WHERE Year >= 2000 AND Year <=2014")

Click Run and give the data frame name seattle_crime.

Congratulations! You successfully loaded your first database table into the R environment.

4. Each group is asked to change the SQL code to complete the specific query:

  • Group 1: Choose year 2006
  • Group 2: Choose year 2006, and select only BURGLARY for the type of crime
  • Group 3: Choose year 2008
  • Group 4: Choose year 2008
  • Group 5: Choose year 2010
  • Group 6: Choose year 2010
  • Group 7: Choose year 2012, and select only ASSAULT for the type of crime
  • Group 8: Choose year 2012
  • Group 9: Choose year 2013
  • Group 10: Choose year 2013


5. Publish your data and codes

Once you group completed the session, please publish your data and codes using the following steps.

  1. Click the publish button as shown below:



  2. Write your title and select options as shown below:


Share your story