Wickets
This analysis identifies trends in how wickets fall in IPL matches and in the runs scored between successive wickets.
Summary of the cleaned data (FilePath: 1750 entries, class character):

| Statistic | Year | Inning | SumRuns |
|---|---|---|---|
| Min. | 2008 | 1.000 | 1.0 |
| 1st Qu. | 2011 | 1.000 | 114.0 |
| Median | 2014 | 1.000 | 138.0 |
| Mean | 2014 | 1.513 | 134.3 |
| 3rd Qu. | 2018 | 2.000 | 160.0 |
| Max. | 2021 | 5.000 | 263.0 |
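The runs scored between wickets can be derived from ball-by-ball data by taking the running total at each fall of wicket and differencing successive values. The following is a minimal sketch in base R using made-up deliveries; the vector names are illustrative only. Unlike the script at the end of this page, this version counts runs scored off the wicket delivery itself.

```r
# Hypothetical ball-by-ball data for one innings (not real IPL data):
# runs off each delivery and whether a wicket fell on that delivery.
runs_per_ball <- c(1, 4, 0, 0, 2, 6, 0, 1, 1, 0)
wicket_fell   <- c(FALSE, FALSE, TRUE, FALSE, FALSE,
                   FALSE, TRUE, FALSE, FALSE, TRUE)

# Running team total after each delivery.
running_total <- cumsum(runs_per_ball)

# Team score at the fall of each wicket.
score_at_fall <- running_total[wicket_fell]

# Runs added between successive wickets (the first is measured from zero).
runs_between_wickets <- diff(c(0, score_at_fall))
print(runs_between_wickets)
```

Any runs scored after the last wicket do not appear in `runs_between_wickets`, which matches how the script treats an unbroken final partnership.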
Trends
As expected, batsmen lower down the order contribute progressively less to the score than those higher up. The plot below shows the mean across all years. Most of the runs are scored during the first three wickets, followed by a precipitous drop towards the mid-order.
As is apparent, more than 60% of all runs come from just the first three batsmen. The next three contribute about 30%, and the last four together contribute only 6.7% of the total score.
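The grouping into top order, mid order, and tail can be sketched as follows. This is an illustration in base R on made-up innings, not the author's exact computation; `innings` and `grouped` are hypothetical names.

```r
# Hypothetical runs-per-wicket vectors for three innings (not real IPL data).
innings <- list(
  c(30, 45, 20, 10, 5),
  c(12, 60, 8, 25, 15, 4, 6),
  c(50, 10, 35)
)

total_runs <- sum(unlist(innings))

# Runs added for wicket i, summed over every innings that reached wicket i.
runs_for_wicket <- sapply(1:10, function(i) {
  sum(sapply(innings, function(x) if (length(x) >= i) x[i] else 0))
})

# Share of all recorded runs attributable to each wicket, in percent.
share <- 100 * runs_for_wicket / total_runs

grouped <- c(
  FirstThree = sum(share[1:3]),
  MidThree   = sum(share[4:6]),
  LastFour   = sum(share[7:10])
)
print(round(grouped, 1))
```

Because every recorded run belongs to exactly one wicket, the three groups sum to 100% when all ten wickets are included.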
The graph below indicates that the largest number of innings end with 4 or 5 wickets down. Very few innings end with fewer than 3 wickets taken.
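Counting wickets per innings amounts to taking the length of each runs-per-wicket vector and tabulating the result. A small sketch in base R on made-up innings (the names are illustrative):

```r
# Hypothetical runs-per-wicket vectors, one per innings (not real IPL data).
innings <- list(
  c(30, 45, 20, 10, 5),
  c(12, 60, 8, 25, 15, 4, 6),
  c(50, 10, 35),
  c(22, 18, 40, 9, 11)
)

# Number of wickets that fell in each innings.
wickets_lost <- lengths(innings)

# Tabulate over the full 0-10 range so that empty counts are shown too.
counts <- table(factor(wickets_lost, levels = 0:10))

# The most frequent number of wickets lost.
most_common <- as.integer(names(counts)[which.max(counts)])
print(counts)
```

Fixing the factor levels to `0:10` keeps one bar per possible wicket count if the table is later plotted.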
Time
While the previous graph analyses all of the data, it is interesting to see how these contributions have changed over time.
This is the same graph as before, but separated by season. The lighter the color of the line, the more recent the season; for example, the 2009 performance is the darkest blue. A second version of the same graph, with a distinct color for each year, has also been embedded below.
It is also helpful to view statistics of individual seasons. There are some interesting comparisons shown below.
Comparing the first 7 seasons of the IPL with the next 7 shows a clear difference in the runs scored. The most significant differences are in the second and third wickets. Together, the first two wickets after 2014 scored 4% more than before 2014. Gradually, openers are scoring more and more runs compared with the mid-order.
The openers in 2021 scored almost 10% more than the openers in 2009. It is notable that IPL 2009 was played in South Africa, while IPL 2020 and IPL 2021 were held in the UAE. This may have contributed to the large differences.
When only seasons held in India are considered (all except 2009, 2020, and 2021), using the same logic as in the previous graph, there is still a difference in the order: the mid-order contributed a few percentage points more, relative to the top order, in seasons before 2015 than in later seasons.
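A comparison between two groups of seasons can be sketched by averaging each wicket's share within each group and subtracting. The numbers below are invented purely for illustration, and the column names are assumptions rather than those of the author's data.

```r
# Hypothetical share of runs (%) added for wickets 1-3, by season.
# These values are made up and are not real IPL figures.
shares <- data.frame(
  Year   = rep(c(2013, 2014, 2016, 2017), each = 3),
  Wicket = rep(1:3, times = 4),
  Share  = c(20, 22, 18,
             22, 20, 18,
             24, 23, 17,
             26, 21, 17)
)

# Label each season with the group it belongs to.
shares$Era <- ifelse(shares$Year <= 2014, "2014 and earlier", "2015 and later")

# Mean share per wicket within each group (rows: wickets, columns: groups).
era_means <- tapply(shares$Share, list(shares$Wicket, shares$Era), mean)

# Change in percentage points from the earlier group to the later one.
difference <- era_means[, "2015 and later"] - era_means[, "2014 and earlier"]
print(difference)
```

Reporting the change in percentage points, rather than as a relative percentage, avoids ambiguity when the quantities being compared are themselves percentages.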
Teams
The graph below shows the distribution of runs for each team. The table shows the percentage of runs from the first 3 wickets, the next 3, and the final 4 wickets. From it, it is clear which teams rely most on their openers and which rely more on the mid-order.
| Team | FirstThree (%) | MidThree (%) | LastFour (%) |
|---|---|---|---|
| Chennai Super Kings | 67.73494 | 27.42740 | 4.635938 |
| Rising Pune Supergiants | 67.65579 | 29.87141 | 2.448071 |
| Sunrisers Hyderabad | 67.21706 | 27.07490 | 5.434664 |
| Punjab Kings | 64.63445 | 27.44905 | 7.469464 |
| Royal Challengers Bangalore | 62.62457 | 29.87506 | 7.121077 |
| Rajasthan Royals | 62.58892 | 30.04643 | 6.887592 |
| Delhi Capitals | 60.71671 | 32.33790 | 6.727227 |
| Kochi Tuskers Kerala | 60.68477 | 28.80756 | 10.448642 |
| Kolkata Knight Riders | 60.67838 | 31.85087 | 7.231028 |
| Gujarat Lions | 59.76492 | 32.68535 | 7.527125 |
| Mumbai Indians | 59.43563 | 33.75729 | 6.668023 |
| Deccan Chargers | 58.77142 | 34.39245 | 6.797612 |
| Pune Warriors | 54.78576 | 37.18037 | 7.930200 |
Among the major teams, CSK have relied on their first three batsmen for the vast majority of their runs, on par with SRH at about 67%. PBKS, RCB, and RR follow with between 62% and 65%. DC, KKR, and MI rely more on their mid order, with each getting less than 61% of their runs from the top order.
The mid order shows more of the same, but reveals some further details: DC, despite being in the middle of the pack for top-order runs, relies heavily on the mid order (about 32%). Similarly, MI and KKR depend on the mid order for roughly a third of their runs. SRH relies the least on the mid order, closely followed by CSK and PBKS, all at about 27%.
CSK is also the only major team with less than 5% of total runs coming from the last 4 batsmen. RCB, PBKS, and KKR all get about 7% of their runs from the last 4 batsmen, which may reflect a batting order prone to rapid collapse, but may equally reflect good batting from the bowlers in those teams.
Notes
The data for this exploration is obtained from CricSheet.org.
All data has been processed and plotted with R using CricSheet data. The script has been embedded below.
library(ggplot2)
library(dplyr)
library(plotly)
library(knitr)
library(rjson)
library(tidyverse)
library(readr)
library(data.table)
FILE_PATH = params$FILE_PATH
FILTER_YEAR = params$FILTER_YEAR
PROCESS_DATA = params$PROCESS_DATA
PROCESS_LINK = params$PROCESS_LINK
PROCESS_OUTPUT = params$PROCESS_OUTPUT
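# Read the match index from README.txt, skipping its first 24 lines.
readmePath <- paste(FILE_PATH, "/", "README.txt", sep = "")
readmeData <- read_lines(readmePath, skip = 24)
matches <- tribble( ~ Date, ~ Id)
for (d in readmeData) {
  # Fields are separated by " - "; keep the year from the first field
  # and the match id from the fifth.
  items = strsplit(d, " - ")[[1]]
  year <- str_split(items[[1]], "-")[[1]][[1]]
  matches <- matches %>% add_row(Date = year, Id = items[[5]])
}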
if (FILTER_YEAR != "NONE") {
  filteredMatches <- matches %>% filter(Date == FILTER_YEAR)
} else {
  filteredMatches <- matches
}
files <-
  filteredMatches %>% mutate(path = paste(FILE_PATH, "/", Id, ".json", sep = ""))
# Map earlier franchise names to the names used throughout this analysis.
nameChanges <- tribble(
  ~ Old, ~ New,
  "Delhi Daredevils", "Delhi Capitals",
  "Rising Pune Supergiant", "Rising Pune Supergiants",
  "Kings XI Punjab", "Punjab Kings"
)
processName <- function (oldName) {
  if (oldName %in% nameChanges$Old) {
    i <- which(oldName == nameChanges$Old)
    nameChanges$New[[i]]
  } else {
    oldName
  }
}
if (PROCESS_DATA) {
  matchData <- tribble(~ FilePath, ~ RunsPerWicket, ~ Year, ~ Team, ~ Inning)
  for (file in files$path) {
    result <- fromJSON(file = file, simplify = TRUE)
    date <- strsplit(result$info$dates[[1]], "-")[[1]][[1]]
    inningNumber <- 1
    for (inning in result$innings) {
      wickets <- c()
      runs <- 0
      prevRuns <- 0
      for (over in inning$overs) {
        for (delivery in over$deliveries) {
          if (length(delivery$wickets) > 0) {
            # A wicket fell: record the runs added since the previous wicket.
            # Note: runs off the wicket delivery itself are not added to the
            # total, and runs scored after the last wicket are never recorded.
            wickets <- append(wickets, runs - prevRuns)
            prevRuns <- runs
          } else {
            runs <- runs + delivery$runs$total
          }
        }
      }
      matchData <- add_row(matchData, FilePath = file, RunsPerWicket = list(wickets), Year = date, Team = processName(inning$team), Inning = inningNumber)
      inningNumber <- inningNumber + 1
    }
  }
} else {
  # Load previously processed data; RunsPerWicket is stored as a
  # "|"-separated string and is split back into a numeric vector here.
  initData <- as_tibble(fread(PROCESS_LINK)) %>% mutate(R = RunsPerWicket)
  matchData <- initData %>%
    mutate(RunsPerWicket = lapply(strsplit(R, split = "\\|"), as.numeric))
}
cleaned <- na.omit(matchData) %>%
  mutate(SumRuns = sapply(RunsPerWicket, sum)) %>%
  filter(SumRuns > 0)
if (PROCESS_OUTPUT && PROCESS_DATA) {
  fwrite(cleaned, PROCESS_LINK)
}
# Per-year statistics for each wicket (1 to 10).
tData <- tribble(~ Wicket, ~ Value, ~ Year, ~ PercentValue)
for (y in unique(cleaned$Year)) {
  sumP <- 0
  sumL <- 0
  yearFiltered <- cleaned %>% filter(as.numeric(Year) == as.numeric(y))
  yearSum <- mean(yearFiltered$SumRuns)
  ys <- sum(yearFiltered$SumRuns)
  for (i in 1:10) {
    r <- c()
    totals <- c()
    for (ia in seq_along(yearFiltered$RunsPerWicket)) {
      run <- yearFiltered$RunsPerWicket[[ia]]
      if (length(run) >= i) {
        r <- append(r, run[[i]])
        # Index the innings total by innings (ia), not by wicket (i).
        totals <- append(totals, yearFiltered$SumRuns[[ia]])
      }
    }
    sumP <- sumP + (mean(r)/yearSum)
    sumL <- sumL + (sum(r)/ys)
    # PercentValue: runs added for wicket i as a fraction of the year's recorded runs.
    tData <- tData %>% add_row(Wicket = i, Value = mean(r, na.rm = TRUE), Year = y, PercentValue = (sum(r)/ys))
  }
}
# Per-team statistics for each wicket (1 to 10).
teamData <- tribble(~ Wicket, ~ Value, ~ PercentValue, ~ Team)
for (y in unique(cleaned$Team)) {
  sumP <- 0
  sumL <- 0
  teamF <- cleaned %>% filter(Team == y)
  yearSum <- mean(teamF$SumRuns)
  ys <- sum(teamF$SumRuns)
  for (i in 1:10) {
    r <- c()
    totals <- c()
    for (ia in seq_along(teamF$RunsPerWicket)) {
      run <- teamF$RunsPerWicket[[ia]]
      if (length(run) >= i) {
        r <- append(r, run[[i]])
        # Index the innings total by innings (ia), not by wicket (i).
        totals <- append(totals, teamF$SumRuns[[ia]])
      }
    }
    sumP <- sumP + (mean(r)/yearSum)
    sumL <- sumL + (sum(r)/ys)
    # PercentValue: runs added for wicket i as a fraction of the team's recorded runs.
    teamData <- teamData %>% add_row(Wicket = i, Value = mean(r, na.rm = TRUE), Team = y, PercentValue = (sum(r)/ys))
  }
}
# Embed 1
summary(cleaned %>% select(FilePath, Year, Inning, SumRuns))
# Embed 2
# Average each wicket's statistics across years. Wicket is the grouping
# column, so it is carried through without being re-summarised.
meanData <- tData %>%
  group_by(Wicket) %>%
  summarise(Value = mean(Value), PercentValue = mean(PercentValue))
pmean <- ggplot(meanData, aes(x = Wicket, y = PercentValue)) +
  geom_line() +
  geom_point() +
  xlab("Wicket") +
  ylab("% of Total Runs")
ggplotly(pmean)
# Embed 3
runsLengths <- matchData %>%
  filter(Inning < 3) %>%
  rowwise() %>%
  mutate(NumberWickets = length(RunsPerWicket))
prl <- ggplot(runsLengths, aes(x = NumberWickets)) +
  geom_histogram(bins = 10)
ggplotly(prl)
# Embed 4
pteams <- ggplot(teamData, aes(x = Wicket, y = PercentValue, colour = Team)) +
  geom_line() +
  geom_point() +
  xlab("Wicket") +
  ylab("% of Total Runs")
ggplotly(pteams)
# Embed 5
ptimeC <- ggplot(tData, aes(x = Wicket, y = PercentValue, colour = factor(Year))) +
  geom_line() +
  geom_point() +
  xlab("Wicket") +
  ylab("% of Total Runs")
ggplotly(ptimeC)
# Average PercentValue per wicket over the given years, labelled by the range.
compileYears <- function (years) {
  a <- tData %>%
    filter(Year %in% years) %>%
    group_by(Wicket) %>%
    summarise(
      AvgVal = mean(PercentValue),
      YearVal = paste(years[[1]], "to", years[[length(years)]], sep = " ")
    )
  a
}
# Plot two groups of years against each other on the same axes.
combinedPlot <- function (year1, year2) {
  comp1 <- compileYears(year1)
  comp2 <- compileYears(year2)
  comData <- bind_rows(comp1, comp2)
  pcom <- ggplot(comData, aes(x = Wicket, y = AvgVal, colour = YearVal)) +
    geom_line() +
    geom_point() +
    xlab("Wicket") +
    ylab("% of Total Runs")
  ggplotly(pcom)
}
# Embed 6
combinedPlot(c(2008, 2010, 2011, 2012, 2013, 2014), c(2015, 2016, 2017, 2018, 2019))
# Embed 7
pteams <- ggplot(teamData, aes(x = Wicket, y = PercentValue, colour = Team)) +
  geom_line() +
  geom_point() +
  xlab("Wicket") +
  ylab("% of Total Runs")
ggplotly(pteams)
# Embed 8
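# Percentage of each team's runs from wickets 1-3, 4-6, and the tail.
order1 <- teamData %>%
  filter(Wicket == 1 | Wicket == 2 | Wicket == 3) %>%
  group_by(Team) %>%
  summarise(FirstThree = sum(PercentValue) * 100)
order2 <- teamData %>%
  filter(Wicket == 4 | Wicket == 5 | Wicket == 6) %>%
  group_by(Team) %>%
  summarise(MidThree = sum(PercentValue) * 100) %>%
  select(MidThree)
# Note: despite the column name, this filter covers wickets 7-9 only;
# the 10th wicket is excluded, so each row sums to slightly under 100.
order3 <- teamData %>%
  filter(Wicket == 7 | Wicket == 8 | Wicket == 9) %>%
  group_by(Team) %>%
  summarise(LastFour = sum(PercentValue) * 100) %>%
  select(LastFour)
# Each summary is sorted by Team, so the columns line up row by row.
combined <- bind_cols(order1, order2, order3) %>% arrange(desc(FirstThree))
kable(combined)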
#END