For this group session, you will need SQLite. Please download the following file and follow the instruction.
For Windows:
vsp_bigdata
folder and create a folder named sqlite
.sqlite
folder.cmd
set PATH=%PATH%;C:\Users\[your user name]\Documents\vsp_bigdata\sqlite
set PATH=%PATH%;C:\Users\andyhong\Documents\vsp_bigdata\sqlite
path
For Mac:
vsp_bigdata
folder and create a folder named sqlite
.sqlite
folder.Applications
> Utilities
> Terminal
export PATH=$PATH:/Users/[your user name]/Documents/vsp_bigdata/sqlite
export PATH=$PATH:/Users/andyhong/Documents/vsp_bigdata/sqlite
echo $PATH
Get the Seattle crime data:
vsp_bigdata/group_session/04-lecture
.vsp_bigdata/group_session/04-lecture
.The purpose of this group session is to get you familiar with database sysem. There are two steps in this group session. First, you will create your database file and create a table under the database. Then you will import the Seattle Crime data (csv file) into the table.
Second, you will write a custom code to create an interface in Exploratory to your database file you just created. RSQLite
package allows you to create a database connection and load the table into the R environment.
First, open up the terminal and go to your group session folder.
# For Windows:
cd C:\Users\[your user name]\Documents\vsp_bigdata\group_session\04-lecture
# For Mac:
cd /Users/[your user name]/Documents/vsp_bigdata/group_session/04-lecture
Run the SQLite program on a terminal and a command line.
sqlite3 seattle_crime_data.db
Create the first table and define database schema.
CREATE TABLE crime(
"ID" TEXT,
"GeneralOffenseNumber" TEXT,
"OffenseCode" TEXT,
"OffenseCodeExtension" TEXT,
"OffenseType" TEXT,
"SummaryOffenseCode" TEXT,
"SummarizedOffenseDescription" TEXT,
"DateReported" TEXT,
"OccurredDateorDateRangeStart" TEXT,
"OccurredDateRangeEnd" TEXT,
"HundredBlockLocation" TEXT,
"DistrictSector" TEXT,
"ZoneBeat" TEXT,
"CensusTract2000" TEXT,
"Longitude" TEXT,
"Latitude" TEXT,
"Location" TEXT,
"Month" TEXT,
"Year" TEXT
);
Check the schema.
.schema crime
Import the csv into the database table.
.mode csv
.import "/Users/[your user name]/Documents/vsp_bigdata/group_session/04-lecture/seattle_crime.csv" crime
Turn on several options to format the table
.headers on
.mode column
# CREATE INDEX crime_year ON crime (Year);
# DROP INDEX crime_year;
#
# SELECT Year, COUNT(*) FROM crime GROUP BY Year;
#
#
# SELECT Year,GeneralOffenseNumber,OffenseCode,OffenseType FROM crime LIMIT 10;
Query the top 10 rows from the database.
SELECT * FROM crime LIMIT 10;
Close the SQLite program.
.quit
You can use the RSQLite package to directly access your SQLite database file.
First, open up Exploratory, select R Packages, and install RSQLite
package.
Click +
on the Data Frames and select R Script
. Copy and paste the following code. modify the file name to point to your SQLite database path.
library(RSQLite)
library(DBI)
# Define database path
filename = "/Users/[your user name]/Documents/vsp_bigdata/group_session/04-lecture/seattle_crime_data.db"
# Connect to db
con = dbConnect(drv=RSQLite::SQLite(), dbname=filename)
# Issue a query to read the data
# my_data = dbGetQuery(con, "SELECT * FROM crime")
my_data = dbGetQuery(con, "SELECT Year, Month, SummarizedOffenseDescription, DateReported, DistrictSector FROM crime WHERE Year >= 2000 AND Year <=2014")
Click Run
and give the data frame name seattle_crime
.
Congratulations! You successfully loaded your first database table into the R environment.
Once you group completed the session, please publish your data and codes using the following steps.
Click the publish button as shown below:
Write your title and select options as shown below: