R09228001 楊宇翔, second-year master's student, Department of Geography, National Taiwan University

Warmup Tasks

1. salary differences in gender (F/M) among departments

2. which department provides a better career path?

rm(list=ls())
setwd("~/Desktop/110-1/110-1 data visualization/week1 0923")
library(readxl)
d=read_excel("employee.xlsx") 
d=na.omit(d)
# Libraries
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.1.1
options(scipen = 999)

1. salary differences in gender (F/M) among departments

### Compute group means
employee <- d[,c("Gender","Dept","Salary")]
library(dplyr)
## 
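## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# The dplyr package merges data frame rows that share the same name, summing the values in each column.
# %>% is the pipe operator: the data goes on the left and the function to apply goes on the right.

# After grouping the data frame with group_by, summarise_all with sum merges the duplicate rows by summing them.
#summarise_all(sum)

# To merge the duplicate rows by their mean instead, replace sum with mean.
#summarise_all(mean)

# To merge by both sum and mean, pass both functions to summarise_all.
#summarise_all(list(sum = sum, mean = mean))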

group_data<-employee %>%
  group_by(Gender,Dept) %>%
  summarise_all(mean)

group_data
## # A tibble: 10 x 3
## # Groups:   Gender [2]
##    Gender Dept  Salary
##    <chr>  <chr>  <dbl>
##  1 F      ACCT  58604.
##  2 F      ADMN  81434.
##  3 F      FINC  57140.
##  4 F      MKTG  64496.
##  5 F      SALE  64188.
##  6 M      ACCT  59626.
##  7 M      ADMN  80963.
##  8 M      FINC  72968.
##  9 M      MKTG  99063.
## 10 M      SALE  85224.
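
The notes in the code comments above describe passing several summary functions at once. A minimal sketch of that idea follows, using a small made-up data frame (the `toy` values are hypothetical and are not taken from employee.xlsx). It is written with `across()`, which current dplyr recommends in place of the superseded `summarise_all()`; this is an alternative spelling, not the call used in this report.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- data.frame(
  Gender = c("F", "F", "M", "M"),
  Dept   = c("ACCT", "ACCT", "ACCT", "ACCT"),
  Salary = c(50000, 60000, 55000, 65000)
)

# Apply both sum and mean to every non-grouping column;
# across() names the results <column>_<function>
toy_summary <- toy %>%
  group_by(Gender, Dept) %>%
  summarise(across(everything(), list(sum = sum, mean = mean)),
            .groups = "drop")

toy_summary
```

With a single value column, the result has one `Salary_sum` and one `Salary_mean` column per Gender and Dept combination.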
# barplot
# Use position=position_dodge()
ggplot(data=group_data, aes(x=Dept, y=Salary, fill=Gender)) +
  geom_bar(stat="identity", position=position_dodge())+
  scale_fill_brewer(palette = "Set1") +
  theme(legend.position="bottom")

Interpretation:

The mean salary for each gender and department group gives the following findings. Among men, the MKTG department has the highest mean salary and the ACCT department the lowest. Among women, the ADMN department has the highest mean salary and the FINC department the lowest. In general, men earn more than women in every department except ADMN, where the mean salary of women is higher than that of men.
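
The comparison above can be made explicit by computing the male minus female gap per department. The sketch below is only an illustration: it retypes the group means from the printed tibble (rounded to whole units) instead of reading employee.xlsx, and the names `means`, `tab`, and `gap` are introduced here.

```r
# Group means copied from the tibble printed above (rounded)
means <- data.frame(
  Dept   = rep(c("ACCT", "ADMN", "FINC", "MKTG", "SALE"), times = 2),
  Gender = rep(c("F", "M"), each = 5),
  Salary = c(58604, 81434, 57140, 64496, 64188,
             59626, 80963, 72968, 99063, 85224)
)

# Cross-tabulate into a Dept x Gender matrix (one value per cell)
tab <- tapply(means$Salary, list(means$Dept, means$Gender), mean)

# Positive gap: men earn more on average; negative: women earn more
gap <- tab[, "M"] - tab[, "F"]
sort(gap, decreasing = TRUE)
```

Only ADMN has a negative gap, which matches the exception noted in the interpretation.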

2. which department provides a better career path?

### Compute group means
path <- d[,c("Years","Dept","Salary")]
library(dplyr)
group_data<-path %>%
  group_by(Dept,Years) %>%
  summarise_all(mean)
group_data
## # A tibble: 27 x 3
## # Groups:   Dept [5]
##    Dept  Years  Salary
##    <chr> <dbl>   <dbl>
##  1 ACCT      3  46125.
##  2 ACCT      5  49705.
##  3 ACCT      9  70316.
##  4 ADMN      2  61055.
##  5 ADMN      4  72675.
##  6 ADMN      6  69442.
##  7 ADMN      7  53788.
##  8 ADMN     18 122563.
##  9 ADMN     24 108138.
## 10 FINC      7  57140.
## # … with 17 more rows
#line chart
p<-ggplot(group_data, aes(x=Years, y=Salary)) +
  geom_line(aes(color=Dept))+
  geom_point(aes(color=Dept))
p

Interpretation:

According to the line chart of salary against years, the SALE department provides the best career path and MKTG provides the worst one. ADMN is the second-best choice.
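
One possible way to put a number on "career path" is the slope of mean salary on years within each department. This is an assumption of the sketch below, not the method behind the interpretation above, which was read off the chart. Only the ACCT and ADMN rows visible in the printed output are used, because the remaining 17 rows are not shown; the names `shown` and `slopes` are introduced here.

```r
# Rows copied from the printed tibble (ACCT and ADMN only; values rounded)
shown <- data.frame(
  Dept   = c("ACCT", "ACCT", "ACCT",
             "ADMN", "ADMN", "ADMN", "ADMN", "ADMN", "ADMN"),
  Years  = c(3, 5, 9, 2, 4, 6, 7, 18, 24),
  Salary = c(46125, 49705, 70316,
             61055, 72675, 69442, 53788, 122563, 108138)
)

# Fit Salary ~ Years separately per department and keep the Years coefficient,
# i.e. the estimated change in mean salary per additional year
slopes <- sapply(split(shown, shown$Dept), function(g) {
  coef(lm(Salary ~ Years, data = g))[["Years"]]
})
slopes
```

Running the same code on the full `group_data` would allow all five departments to be compared on one scale.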

Homework

rm(list=ls())
setwd("~/Desktop/110-1/110-1 data visualization/week1 0923")
library(readxl)
d=read_excel("a_lvr_land_a.xlsx")#, fileEncoding = "BIG5")  # specify the text encoding (encoding=big5)

for(i in 1:nrow(d)){
  if (d$鄉鎮市區[i]=="士林區") d$region[i]="北區"
  else if (d$鄉鎮市區[i]=="北投區") d$region[i]="北區"
  else if (d$鄉鎮市區[i]=="文山區") d$region[i]="南區"
  else if (d$鄉鎮市區[i]=="內湖區") d$region[i]="東區"
  else if (d$鄉鎮市區[i]=="南港區") d$region[i]="東區"
  else if (d$鄉鎮市區[i]=="大安區") d$region[i]="中區"
  else if (d$鄉鎮市區[i]=="信義區") d$region[i]="中區"
  else if (d$鄉鎮市區[i]=="松山區") d$region[i]="中區"
  else d$region[i]="西區"
}
## Warning: Unknown or uninitialised column: `region`.
#table(d$region)
#table(t$建物型態)
#table(d$主要用途)
#table(d$單價元平方公尺)
#table(d$建物現況格局.房)
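
The district-to-region loop above can also be written as a vectorized lookup. The following is a minimal sketch under that assumption: the mapping is copied from the loop, while `region_lookup` and `assign_region` are hypothetical names introduced here, and 萬華區 is only an example of a district that falls through to the default.

```r
# District-to-region mapping taken from the loop above
region_lookup <- c(
  "士林區" = "北區", "北投區" = "北區",
  "文山區" = "南區",
  "內湖區" = "東區", "南港區" = "東區",
  "大安區" = "中區", "信義區" = "中區", "松山區" = "中區"
)

# Hypothetical helper: districts missing from the lookup fall back to 西區,
# mirroring the final else branch of the loop
assign_region <- function(district) {
  region <- unname(region_lookup[district])
  region[is.na(region)] <- "西區"
  region
}

assign_region(c("士林區", "大安區", "萬華區"))
```

Used as `d$region <- assign_region(d$鄉鎮市區)`, this assigns the whole column in one step, so `d$region[i]` is never read before the column exists.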

t=d[which(d$主要用途=="住家用"),c(12,17,23,29)]
colnames(t)=c("type","room","price","region")
t$room=as.numeric(t$room)
t$price =as.numeric(t$price)


### Compute group means
library(dplyr)
group_data=t %>%
  group_by(region,room,type) %>%
  summarise_all(mean)
group_data
## # A tibble: 80 x 4
## # Groups:   region, room [32]
##    region  room type                         price
##    <chr>  <dbl> <chr>                        <dbl>
##  1 北區       0 華廈(10層含以下有電梯)     219877 
##  2 北區       1 公寓(5樓含以下無電梯)      307730 
##  3 北區       1 華廈(10層含以下有電梯)     203852.
##  4 北區       1 住宅大樓(11層含以上有電梯) 156939 
##  5 北區       2 公寓(5樓含以下無電梯)      147406 
##  6 北區       2 華廈(10層含以下有電梯)     147692.
##  7 北區       2 透天厝                     319437 
##  8 北區       2 住宅大樓(11層含以上有電梯) 221508.
##  9 北區       3 公寓(5樓含以下無電梯)      119464.
## 10 北區       3 華廈(10層含以下有電梯)     163312 
## # … with 70 more rows
#table(t$room)

#table(group_data$room)

# Libraries
library(ggplot2)
options(scipen = 999)

# Fix Chinese text rendering in plots
#install.packages("showtext")
library(showtext)
## Warning: package 'showtext' was built under R version 4.1.1
## Loading required package: sysfonts
## Warning: package 'sysfonts' was built under R version 4.1.1
## Loading required package: showtextdb
## Warning: package 'showtextdb' was built under R version 4.1.1
showtext_auto()

mansion=group_data[which(group_data$type=="華廈(10層含以下有電梯)"),]
condor=group_data[which(group_data$type=="公寓(5樓含以下無電梯)"),]
highbuild=group_data[which(group_data$type=="住宅大樓(11層含以上有電梯)"),]
house=group_data[which(group_data$type=="透天厝"),]

# group_data has no 建物型態 column (it was renamed to type above), so house keeps the rows selected by the type filter.
# Use grid.arrange to put plots in columns
#grid.arrange(plot1, plot2, plot3,plot4, ncol=2)

options(scipen=999)
ggplot(group_data)+ 
  geom_line(aes(x=room,y=price,col=region))+
  geom_point(aes(x=room,y=price,col=region))+
  facet_wrap(~ type, ncol = 2)+
  labs(title = "臺北市各區建物別每坪售價與房間數關係圖", x = "建物現況格局(房間數)", y = "每平方公尺平均單價(住宅用)")+
  scale_x_continuous(limits=c(1,7), breaks=seq(1,7,1))+
  ylim(0,700000)
## Warning: Removed 2 row(s) containing missing values (geom_path).
## Warning: Removed 7 rows containing missing values (geom_point).

Interpretation:

The figure above contains four line charts, one per building type, showing the relation between the number of rooms and the unit price (per square meter) in different areas of Taipei City, Taiwan. The four building types are flat (公寓), mansion (華廈), house (透天厝), and tall building (住宅大樓). Overall, in most areas and for most building types, the fewer the rooms, the higher the unit price.

For flats, the largest difference appears between 1-room and 2-room units. In the south and west areas of Taipei, 2-room units cost more per unit area than 1-room units, whereas in the north and central areas the opposite holds. For mansions, the central area of Taipei shows the unit price decreasing as the number of rooms increases, and this decline is milder in the other areas.

For houses, northern Taipei has more cases than the other areas, and its curve is bell-shaped, with the unit price peaking at 4 rooms. The other areas show a decreasing price trend. Finally, for tall buildings, no clear association between the number of rooms and the unit price can be distinguished.

To sum up, the general trend is that the more rooms a property has, the lower its unit price. Some areas contradict this pattern, but it is the main picture.