Wickets

This analysis focuses on identifying trends in IPL matches in the fall of wickets and the runs scored in between those wickets.

##    FilePath              Year          Inning         SumRuns     
##  Length:1750        Min.   :2008   Min.   :1.000   Min.   :  1.0  
##  Class :character   1st Qu.:2011   1st Qu.:1.000   1st Qu.:114.0  
##  Mode  :character   Median :2014   Median :1.000   Median :138.0  
##                     Mean   :2014   Mean   :1.513   Mean   :134.3  
##                     3rd Qu.:2018   3rd Qu.:2.000   3rd Qu.:160.0  
##                     Max.   :2021   Max.   :5.000   Max.   :263.0
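
To make the measured quantity concrete, here is a minimal sketch of how the runs scored between wickets could be derived from a ball-by-ball record. The `deliveries` data frame and the helper `runs_between_wickets` are hypothetical and exist only for illustration; the values are invented. As in the script at the end of this page, runs scored after the last wicket of an innings are not attributed to any wicket.

```r
# Toy innings (invented data): total runs off each delivery and whether a wicket fell
deliveries <- data.frame(
  runs   = c(1, 4, 0, 0, 6, 2, 0, 1, 1, 0),
  wicket = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Hypothetical helper: runs added between consecutive wickets
runs_between_wickets <- function(deliveries) {
  # Running total after each delivery
  running <- cumsum(deliveries$runs)
  # Score at the moment each wicket fell
  score_at_fall <- running[deliveries$wicket]
  # Differences between consecutive falls give the runs per wicket
  diff(c(0, score_at_fall))
}

partnerships <- runs_between_wickets(deliveries)
print(partnerships)
```

Each element of `partnerships` corresponds to one wicket, so the length of the vector is also the number of wickets lost in the innings.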

Trends

As expected, players lower in the order contribute less to the score than players higher up. The plot below shows the mean across all years. Most runs are scored before the third wicket falls, followed by a precipitous drop towards the mid-order.

More than 60% of all runs are scored before the third wicket falls. The next three wickets contribute about 30%, and the last four together contribute only 6.7% of the total score.
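
The percentages above are shares of total runs, accumulated wicket by wicket. A rough sketch of that calculation follows, assuming each innings is stored as a vector of runs per wicket; the `innings` list and the `wicket_share` helper are hypothetical and the numbers are invented.

```r
# Invented data: runs added before each wicket fell, one vector per innings
innings <- list(
  c(30, 20, 40, 10),
  c(50, 10, 20, 10, 5, 5),
  c(10, 60, 30)
)

# Hypothetical helper: share of all runs attributed to each wicket position
wicket_share <- function(innings, max_wickets = 10) {
  total <- sum(unlist(innings))
  sapply(seq_len(max_wickets), function(i) {
    # Collect the runs for wicket i from innings that lost at least i wickets
    r <- unlist(lapply(innings, function(x) if (length(x) >= i) x[i] else NULL))
    sum(r) / total
  })
}

share <- wicket_share(innings)
# Cumulative share: element 3 is the fraction of runs scored by the third wicket
cumulative <- cumsum(share)
print(round(100 * cumulative, 1))
```

Reading the cumulative vector at positions 3 and 6 gives the kind of "first three" and "first six" figures quoted above.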

The graph below indicates that the largest number of innings end with 4 to 5 wickets down. Very few innings end with fewer than 3 wickets taken.

Time

While the previous graph analyses all of the data together, it is interesting to see how these contributions have changed over time.

This is the same graph as before, but separated by year. The lighter the color of the line, the more recent the season. For example, the 2009 performance is the darkest blue. Another version of the same graph, with a separate color for each year, has also been embedded below.

It is also helpful to view statistics of individual seasons. There are some interesting comparisons shown below.

If the first 7 seasons of the IPL are compared to the next 7, there is a clear difference in the distribution of runs. The most significant differences are in the second and third wickets. Together, the first two wickets after 2014 accounted for 4% more of the runs than before 2014. Gradually, openers are scoring more and more runs compared to the mid-order.

The opening partnership in 2021 contributed almost 10% more than the opening partnership in 2009. It is notable that IPL 2009 was played in South Africa, while IPL 2020 and the latter part of IPL 2021 were held in the UAE. This may have contributed to the large differences.

When only seasons held in India are considered (all except 2009, 2020, and 2021) and the same comparison is repeated, there is still a difference in the order: in seasons before 2015, the mid-order was stronger relative to the top order by a few percentage points.
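
The era comparison amounts to averaging each wicket's share over two sets of seasons and taking the difference in percentage points. A small sketch under that assumption is shown below; the `shares` table, its values, and the `era_mean` helper are invented for illustration and are not results from the analysis.

```r
# Invented per-season shares of total runs for the first three wickets
shares <- data.frame(
  Year   = rep(c(2012, 2013, 2016, 2017), each = 3),
  Wicket = rep(1:3, times = 4),
  PercentValue = c(0.20, 0.18, 0.16,
                   0.22, 0.18, 0.14,
                   0.24, 0.20, 0.14,
                   0.26, 0.20, 0.12)
)

# Hypothetical helper: mean share per wicket over the selected seasons
era_mean <- function(shares, years) {
  rows <- shares[shares$Year %in% years, ]
  tapply(rows$PercentValue, rows$Wicket, mean)
}

early <- era_mean(shares, c(2012, 2013))
late  <- era_mean(shares, c(2016, 2017))

# Difference between eras, expressed in percentage points of total runs
diff_points <- 100 * (late - early)
print(round(diff_points, 1))
```

A positive entry means that wicket accounted for a larger share of runs in the later era; a negative entry means a smaller share.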

Teams

The graph below shows the distribution of runs for each team. The table shows, for each team, the percentage of runs contributed by the first 3 wickets, the next 3, and the final 4. From it, it is clear which teams rely most on their openers and which rely more on the mid-order.

Team	FirstThree	MidThree	LastFour
Chennai Super Kings	67.73494	27.42740	4.635938
Rising Pune Supergiants	67.65579	29.87141	2.448071
Sunrisers Hyderabad	67.21706	27.07490	5.434664
Punjab Kings	64.63445	27.44905	7.469464
Royal Challengers Bangalore	62.62457	29.87506	7.121077
Rajasthan Royals	62.58892	30.04643	6.887592
Delhi Capitals	60.71671	32.33790	6.727227
Kochi Tuskers Kerala	60.68477	28.80756	10.448642
Kolkata Knight Riders	60.67838	31.85087	7.231028
Gujarat Lions	59.76492	32.68535	7.527125
Mumbai Indians	59.43563	33.75729	6.668023
Deccan Chargers	58.77142	34.39245	6.797612
Pune Warriors	54.78576	37.18037	7.930200

Among the major teams, CSK have relied on their first three batsmen for the vast majority of their runs, on par with SRH at about 67%. PBKS, RCB, and RR follow with between 62% and 65%. DC, KKR, and MI rely more on their mid order, with all of them having <61% of the runs coming from the top order.

The mid order shows more of the same, but reveals some more interesting details: DC, despite being in the middle of the pack, relies a lot more on the mid order than other teams. Similarly, MI and KKR also depend on the mid order for about a third of their runs. SRH relies the least on the mid order, followed closely by CSK and PBKS.

CSK is also the only major team with less than 5% of total runs coming from the last 4 batsmen. RCB, PBKS, and KKR all have ~7% of their runs coming from the last 4 batsmen, which may reflect a batting order that can rapidly collapse, but may also reflect good batting from their bowlers.
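
The three bands in the table can be formed by binning wicket numbers and summing the shares within each bin. The sketch below assumes a long-format table of per-wicket shares; the team names "A" and "B" and all values in `team_shares` are invented for illustration.

```r
# Invented per-wicket shares of total runs for two hypothetical teams
team_shares <- data.frame(
  Team   = rep(c("A", "B"), each = 10),
  Wicket = rep(1:10, times = 2),
  PercentValue = c(0.30, 0.20, 0.15, 0.10, 0.08, 0.06, 0.04, 0.03, 0.02, 0.01,
                   0.20, 0.18, 0.16, 0.14, 0.10, 0.08, 0.05, 0.04, 0.03, 0.02)
)

# Bin wickets 1-3, 4-6, and 7-10 (intervals are closed on the right)
team_shares$Band <- cut(
  team_shares$Wicket,
  breaks = c(0, 3, 6, 10),
  labels = c("FirstThree", "MidThree", "LastFour")
)

# Sum the shares within each team and band, expressed as percentages
bands <- xtabs(PercentValue ~ Team + Band, data = team_shares) * 100
print(round(bands, 1))
```

Using `cut()` with explicit breaks keeps the band boundaries in one place, which makes it harder to leave a wicket (such as the tenth) out of a band by accident.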

Notes

The data for this exploration is obtained from CricSheet.org.
All data has been processed and plotted with R using CricSheet data. The script has been embedded below.

library(ggplot2)
library(dplyr)
library(plotly)
library(knitr)
library(rjson)
library(tidyverse)
library(readr)
library(data.table)

FILE_PATH = params$FILE_PATH
FILTER_YEAR = params$FILTER_YEAR
PROCESS_DATA = params$PROCESS_DATA
PROCESS_LINK = params$PROCESS_LINK
PROCESS_OUTPUT = params$PROCESS_OUTPUT

readmePath <- paste(FILE_PATH, "/", "README.txt", sep = "")
readmeData <- read_lines(readmePath, skip = 24)

matches <- tribble( ~ Date, ~ Id)
for (d in readmeData) {
  items = strsplit(d, " - ")[[1]]
  year <- str_split(items[[1]], "-")[[1]][[1]]
  matches <- matches %>% add_row(Date = year, Id = items[[5]])
}

if (FILTER_YEAR != "NONE") {
  filteredMatches <- matches %>% filter(Date == FILTER_YEAR)
} else {
  filteredMatches <- matches
}
files <-
  filteredMatches %>% mutate(path = paste(FILE_PATH, "/", Id, ".json", sep = ""))

nameChanges <- tribble(
  ~ Old, ~ New,
  "Delhi Daredevils", "Delhi Capitals",
  "Rising Pune Supergiant", "Rising Pune Supergiants",
  "Kings XI Punjab", "Punjab Kings"
)

processName <- function (oldName) {
  if (oldName %in% nameChanges$Old) {
    i <- which(oldName == nameChanges$Old)
    nameChanges$New[[i]]
  } else {
    oldName
  }
}

if (PROCESS_DATA) {
  matchData <- tribble(~ FilePath, ~ RunsPerWicket, ~ Year, ~ Team, ~ Inning)
  for (file in files$path) {
    result <- fromJSON(file = file, simplify = TRUE)
    date <- strsplit(result$info$dates[[1]], "-")[[1]][[1]]
    inningNumber <- 1
    for (inning in result$innings) {
      wickets <- c()
      runs <- 0
      prevRuns <- 0
      for (over in inning$overs) {
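        for (delivery in over$deliveries) {
          # Count the runs on every delivery, including one on which a wicket falls (e.g. a run-out)
          runs <- runs + delivery$runs$total
          if (length(delivery$wickets) > 0) {
            # Record the runs added since the previous wicket
            wickets <- append(wickets, runs - prevRuns)
            prevRuns <- runs
          }
        }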
      }
      matchData <- add_row(matchData, FilePath = file, RunsPerWicket = list(wickets), Year = date, Team = processName(inning$team), Inning = inningNumber)
      inningNumber <- inningNumber + 1
    }
  }
} else {
  initData <- as_tibble(fread(PROCESS_LINK)) %>% mutate(R = RunsPerWicket)
  matchData <- initData %>% 
    mutate(RunsPerWicket = lapply(strsplit(R, split = "\\|"), as.numeric))
}

cleaned <- na.omit(matchData) %>% 
  mutate(SumRuns = sapply(RunsPerWicket, sum)) %>%
  filter(SumRuns > 0)

if (PROCESS_OUTPUT && PROCESS_DATA) {
  fwrite(cleaned, PROCESS_LINK)
}

tData <- tribble(~ Wicket, ~ Value, ~ Year, ~ PercentValue)
for (y in unique(cleaned$Year)) {
  sumP <- 0
  sumL <- 0
  yearFiltered <- cleaned %>% filter(as.numeric(Year) == as.numeric(y))
  yearSum <- mean(yearFiltered$SumRuns)
  ys <- sum(yearFiltered$SumRuns)
  for (i in 1:10) {
    r <- c()
    totals <- c()
    for (ia in 1:length(yearFiltered$RunsPerWicket)) {
      run <- yearFiltered$RunsPerWicket[[ia]]
      if (length(run) >= i) {
        r <- append(r, run[[i]])
        totals <- append(totals, yearFiltered$SumRuns[[ia]])
      }
    }
    sumP <- sumP + (mean(r)/yearSum)
    sumL <- sumL + (sum(r)/ys)
    tData <- tData %>% add_row(Wicket = i, Value = mean(r, na.rm = TRUE), Year = y, PercentValue = (sum(r)/ys))
  }
}

teamData <- tribble(~ Wicket, ~ Value, ~ PercentValue, ~ Team)
for (y in unique(cleaned$Team)) {
  sumP <- 0
  sumL <- 0
  teamF <- cleaned %>% filter(Team == y)
  yearSum <- mean(teamF$SumRuns)
  ys <- sum(teamF$SumRuns)
  for (i in 1:10) {
    r <- c()
    totals <- c()
    for (ia in 1:length(teamF$RunsPerWicket)) {
      run <- teamF$RunsPerWicket[[ia]]
      if (length(run) >= i) {
        r <- append(r, run[[i]])
        totals <- append(totals, teamF$SumRuns[[ia]])
      }
    }
    sumP <- sumP + (mean(r)/yearSum)
    sumL <- sumL + (sum(r)/ys)
    teamData <- teamData %>% add_row(Wicket = i, Value = mean(r, na.rm = TRUE), Team = y, PercentValue = (sum(r)/ys))
  }
}

# Embed 1
summary(cleaned %>% select(FilePath, Year, Inning, SumRuns))

# Embed 2
meanData <- tData %>% 
  group_by(Wicket) %>% 
  summarise(Value = mean(Value), PercentValue = mean(PercentValue))
pmean <- ggplot(meanData, aes(x = Wicket, y = PercentValue)) +
  geom_line() +
  geom_point() +
  xlab("Wicket") +
  ylab("% of Total Runs")
ggplotly(pmean)

# Embed 3
runsLengths <- matchData %>% filter(Inning < 3) %>% rowwise() %>% mutate(NumberWickets = length(RunsPerWicket))
prl <- ggplot(runsLengths, aes(x = NumberWickets)) +
  geom_histogram(bins = 10)
ggplotly(prl)

# Embed 4
pteams <- ggplot(teamData, aes(x = Wicket, y = PercentValue, colour = Team)) +
  geom_line() +
  geom_point() +
  xlab("Wicket") +
  ylab("% of Total Runs")
ggplotly(pteams)

# Embed 5
ptimeC <- ggplot(tData, aes(x = Wicket, y = PercentValue, colour = factor(Year))) +
  geom_line() +
  geom_point() +
  xlab("Wicket") +
  ylab("% of Total Runs")
ggplotly(ptimeC)

compileYears <- function (years) {
  a <- tData %>% 
    filter(Year %in% years) %>% 
    group_by(Wicket) %>% 
    summarise(
      AvgVal = mean(PercentValue), 
      YearVal = paste(years[[1]], "to", years[[length(years)]], sep = " ")
    )
  a
}

combinedPlot <- function (year1, year2) {
  comp1 <- compileYears(year1)
  comp2 <- compileYears(year2)
  
  comData <- bind_rows(comp1, comp2)

  pcom <- ggplot(comData, aes(x = Wicket, y = AvgVal, colour = YearVal)) +
    geom_line() +
    geom_point() +
    xlab("Wicket") +
    ylab("% of Total Runs")
  ggplotly(pcom)
}

# Embed 6
combinedPlot(c(2008, 2010, 2011, 2012, 2013, 2014), c(2015, 2016, 2017, 2018, 2019))

# Embed 7
pteams <- ggplot(teamData, aes(x = Wicket, y = PercentValue, colour = Team)) +
  geom_line() +
  geom_point() +
  xlab("Wicket") +
  ylab("% of Total Runs")
ggplotly(pteams)

# Embed 8
order1 <- teamData %>% 
  filter(Wicket == 1 | Wicket == 2 | Wicket == 3) %>%
  group_by(Team) %>%
  summarise(FirstThree = sum(PercentValue) * 100)

order2 <- teamData %>% 
  filter(Wicket == 4 | Wicket == 5 | Wicket == 6) %>%
  group_by(Team) %>%
  summarise(MidThree = sum(PercentValue) * 100) %>%
  select(MidThree)

order3 <- teamData %>% 
  filter(Wicket %in% 7:10) %>%
  group_by(Team) %>%
  summarise(LastFour = sum(PercentValue) * 100) %>%
  select(LastFour)

combined <- bind_cols(order1, order2, order3) %>% arrange(desc(FirstThree))

kable(combined)

#END