上QQ阅读APP看书，第一时间看更新

Example – Using SQL to get sales by region

In this example, we will use the R package sqldf to show the difference in sales and units in four sales regions. Note that this is an artificial example since we will first generate separate data for the West region with sales figures that are higher than the other three regions:

install.packages("sqldf") 
library(sqldf) 
set.seed(10) 
rows=100 
y <- rbind( 
data.frame(indv=factor(paste("TransId-", 1:100, sep = "")), 
                Sales=rnorm(rows, mean=1500000,sd=100000), 
                Units=round(rnorm(rows,mean=10, sd=3)), 
                Region=sample(c("North","East","South"),rows, replace=TRUE)), 
data.frame(indv=factor(paste("TransId-", 101:200, sep = "")), 
                Sales=rnorm(rows, mean=2000000,sd=100000), 
                Units=round(rnorm(rows,mean=10, sd=3)), 
                Region=sample(c("West"),rows, replace=TRUE)) 
) 
query <- "select Region,avg(Sales),avg(Units) from y group by Region" 
results <- sqldf(query,stringsAsFactors = FALSE) 
results

The results will appear in the console window:

This is randomly generated data. The functions rnorm() and sample() in the written code are giveaways. Generating random data is a great way to begin to test code and algorithms since you will always have a better idea of what kind of results to expect, given the assumptions.