Wickets


This analysis focuses on identifying trends in IPL matches in the fall of wickets and the runs scored in between those wickets.
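To make "runs scored in between wickets" concrete, here is a minimal base R sketch for a single innings. The `deliveries` data frame and its values are made up for illustration and are not CricSheet data. As in the script embedded at the end of this page, runs scored after the last wicket are left out of the count.

```r
# Hypothetical ball-by-ball record for one innings: runs off each delivery
# and whether a wicket fell on it. Values are invented for illustration.
deliveries <- data.frame(
  runs   = c(1, 4, 0, 0, 6, 2, 0, 1, 0),
  wicket = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
)

# Running score after each delivery.
score <- cumsum(deliveries$runs)

# Score at the moment each wicket fell.
fall_of_wicket <- score[deliveries$wicket]

# Runs added between successive wickets (the first entry counts from zero).
runs_per_wicket <- diff(c(0, fall_of_wicket))

# Each wicket's share of the runs that were counted.
share <- runs_per_wicket / sum(runs_per_wicket)
print(data.frame(Wicket = seq_along(runs_per_wicket), runs_per_wicket, share))
```

The "% of Total Runs" axis in the graphs is this kind of share, aggregated over many innings rather than taken from a single one.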

##    FilePath              Year          Inning         SumRuns     
##  Length:1750        Min.   :2008   Min.   :1.000   Min.   :  1.0  
##  Class :character   1st Qu.:2011   1st Qu.:1.000   1st Qu.:114.0  
##  Mode  :character   Median :2014   Median :1.000   Median :138.0  
##                     Mean   :2014   Mean   :1.513   Mean   :134.3  
##                     3rd Qu.:2018   3rd Qu.:2.000   3rd Qu.:160.0  
##                     Max.   :2021   Max.   :5.000   Max.   :263.0

Time

While the previous graph analyses all of the data together, it is also interesting to see how each wicket's share of the runs has changed over time.

[Figure: % of Total Runs (y-axis) by Wicket (x-axis), one line per season, colored by Year on a continuous scale]

This is the same graph as before, but separated by year. The lighter the color of the line, the more recent the season, so the earliest seasons appear in the darkest blue. Another version of the same graph, with a separate color for each individual year, has also been embedded below.

[Figure: % of Total Runs (y-axis) by Wicket (x-axis), one line per season with a distinct color for each year from 2008 to 2021]

It is also helpful to view statistics of individual seasons. There are some interesting comparisons shown below.

If the first 7 seasons of the IPL are compared with the next 7, there is a clear difference in where the runs are scored. The most significant differences are in the second and third wickets. Together, the first two wickets after 2014 accounted for 4% more of the total runs than before 2014. Over time, the openers have been scoring a growing share of the runs compared to the mid-order.
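The era comparison can be sketched in a few lines of base R. The `shares` values below are invented and cover only two wickets and four seasons; only the column names `Year`, `Wicket`, and `PercentValue` mirror the script embedded at the end of this page. The sketch averages each wicket's share within an era and reports the change in percentage points.

```r
# Made-up per-season shares of total runs (NOT the IPL figures).
shares <- data.frame(
  Year         = c(2013, 2014, 2015, 2016, 2013, 2014, 2015, 2016),
  Wicket       = c(1, 1, 1, 1, 2, 2, 2, 2),
  PercentValue = c(0.20, 0.22, 0.24, 0.26, 0.18, 0.18, 0.20, 0.22)
)

# Label each season with its era.
shares$Era <- ifelse(shares$Year <= 2014, "2008 to 2014", "2015 to 2021")

# Mean share per wicket (rows) and era (columns).
era_means <- tapply(shares$PercentValue, list(shares$Wicket, shares$Era), mean)

# Change between eras, expressed in percentage points of total runs.
change_pp <- (era_means[, "2015 to 2021"] - era_means[, "2008 to 2014"]) * 100
print(change_pp)
```

Reporting the change in percentage points avoids the ambiguity between "4% of total runs" and "4% more than the earlier share".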

[Figure: % of Total Runs (y-axis) by Wicket (x-axis), comparing 2008 to 2014 with 2015 to 2021]

The openers in 2021 scored almost 10% more than the openers in 2009. It is notable that IPL 2009 was played in South Africa, while IPL 2020 and IPL 2021 were held in the UAE. This may have contributed to the large differences.

When considering only the seasons held in India (all except 2009, 2020, and 2021), with the same split as in the previous graph, there is still a difference in the order: once again, the mid-order was relatively stronger, by a few percentage points, in the seasons before 2015.

[Figure: % of Total Runs (y-axis) by Wicket (x-axis), comparing 2008 to 2014 with 2015 to 2019]

Teams

The graph below shows the distribution of runs for each team. The table shows the percentage of runs from the first 3 wickets, the next 3, and the final 4 wickets. Together they show which teams rely most on their openers and which rely more on the mid-order.

[Figure: % of Total Runs (y-axis) by Wicket (x-axis), one line per team]
Team                          FirstThree   MidThree    LastFour
Chennai Super Kings             67.73494   27.42740    4.635938
Rising Pune Supergiants         67.65579   29.87141    2.448071
Sunrisers Hyderabad             67.21706   27.07490    5.434664
Punjab Kings                    64.63445   27.44905    7.469464
Royal Challengers Bangalore     62.62457   29.87506    7.121077
Rajasthan Royals                62.58892   30.04643    6.887592
Delhi Capitals                  60.71671   32.33790    6.727227
Kochi Tuskers Kerala            60.68477   28.80756   10.448642
Kolkata Knight Riders           60.67838   31.85087    7.231028
Gujarat Lions                   59.76492   32.68535    7.527125
Mumbai Indians                  59.43563   33.75729    6.668023
Deccan Chargers                 58.77142   34.39245    6.797612
Pune Warriors                   54.78576   37.18037    7.930200
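The grouping of wickets into three bands can be sketched with base R's `cut()`. The two teams and their per-wicket shares below are invented for illustration; only the column names `Team`, `Wicket`, and `PercentValue` and the band labels follow the page. This is one possible way to build such a table, not necessarily how the figures above were produced.

```r
# Made-up per-wicket shares for two hypothetical teams (each sums to 1).
team_shares <- data.frame(
  Team   = rep(c("Team A", "Team B"), each = 10),
  Wicket = rep(1:10, times = 2),
  PercentValue = c(
    0.25, 0.20, 0.15, 0.12, 0.10, 0.08, 0.04, 0.03, 0.02, 0.01,
    0.20, 0.18, 0.14, 0.13, 0.12, 0.10, 0.05, 0.04, 0.03, 0.01
  )
)

# Assign wickets 1-3, 4-6, and 7-10 to the three bands.
team_shares$Band <- cut(
  team_shares$Wicket,
  breaks = c(0, 3, 6, 10),
  labels = c("FirstThree", "MidThree", "LastFour")
)

# Sum the shares per team and band, as percentages.
bands <- tapply(
  team_shares$PercentValue,
  list(team_shares$Team, team_shares$Band),
  sum
) * 100

# Rank teams by their reliance on the first three wickets.
ranked <- bands[order(-bands[, "FirstThree"]), ]
print(ranked)
```

Because every wicket falls into exactly one band, each row of the result sums to 100, which is a quick sanity check on the banding.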

Among the major teams, CSK have relied on their first four batsmen for the vast majority of their runs, on par with SRH at about 67%. PBKS, RCB, and RR follow with between 62% and 65%. DC, KKR, and MI rely more on their mid order, with each of them getting less than 61% of their runs from the top order.

The mid order shows more of the same, but reveals some more interesting details: DC, despite being in the middle of the pack for top-order runs, relies more on the mid order (about 32%) than most other teams. Similarly, MI and KKR also depend on the mid order for roughly a third of their runs. SRH relies the least on the mid order, closely followed by CSK and PBKS, all at about 27%.

CSK is also the only major team with less than 5% of total runs coming from the last 4 batsmen. RCB, PBKS, and KKR all have ~7% of their runs coming from the last 4 batsmen, which may reflect a batting order that can rapidly collapse, but may also reflect good batting from the bowlers in their teams.

Notes

  • The data for this exploration is obtained from CricSheet.org.

  • All data has been processed and plotted with R. The script has been embedded below.


library(ggplot2)
library(dplyr)
library(plotly)
library(knitr)
library(rjson)
library(tidyverse)
library(readr)
library(data.table)

FILE_PATH = params$FILE_PATH
FILTER_YEAR = params$FILTER_YEAR
PROCESS_DATA = params$PROCESS_DATA
PROCESS_LINK = params$PROCESS_LINK
PROCESS_OUTPUT = params$PROCESS_OUTPUT

readmePath <- paste(FILE_PATH, "/", "README.txt", sep = "")
readmeData <- read_lines(readmePath, skip = 24)

matches <- tribble( ~ Date, ~ Id)
for (d in readmeData) {
  items = strsplit(d, " - ")[[1]]
  year <- str_split(items[[1]], "-")[[1]][[1]]
  matches <- matches %>% add_row(Date = year, Id = items[[5]])
}

if (FILTER_YEAR != "NONE") {
  filteredMatches <- matches %>% filter(Date == FILTER_YEAR)
} else {
  filteredMatches <- matches
}
files <-
  filteredMatches %>% mutate(path = paste(FILE_PATH, "/", Id, ".json", sep = ""))

nameChanges <- tribble(
  ~ Old, ~ New,
  "Delhi Daredevils", "Delhi Capitals",
  "Rising Pune Supergiant", "Rising Pune Supergiants",
  "Kings XI Punjab", "Punjab Kings"
)

processName <- function (oldName) {
  if (oldName %in% nameChanges$Old) {
    i <- which(oldName == nameChanges$Old)
    nameChanges$New[[i]]
  } else {
    oldName
  }
}

if (PROCESS_DATA) {
  matchData <- tribble(~ FilePath, ~ RunsPerWicket, ~ Year, ~ Team, ~ Inning)
  for (file in files$path) {
    result <- fromJSON(file = file, simplify = TRUE)
    date <- strsplit(result$info$dates[[1]], "-")[[1]][[1]]
    inningNumber <- 1
    for (inning in result$innings) {
      wickets <- c()
      runs <- 0
      prevRuns <- 0
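      for (over in inning$overs) {
        for (delivery in over$deliveries) {
          if (length(delivery$wickets) > 0) {
            # A wicket fell: record the runs added since the previous wicket.
            wickets <- append(wickets, runs - prevRuns)
            prevRuns <- runs
          } else {
            # Note: runs off a wicket-taking delivery are not added to the total.
            runs <- runs + delivery$runs$total
          }
        }
      }
      # Note: runs scored after the last wicket of the innings are not recorded.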
      matchData <- add_row(matchData, FilePath = file, RunsPerWicket = list(wickets), Year = date, Team = processName(inning$team), Inning = inningNumber)
      inningNumber <- inningNumber + 1
    }
  }
} else {
  initData <- as_tibble(fread(PROCESS_LINK)) %>% mutate(R = RunsPerWicket)
  matchData <- initData %>% 
    mutate(RunsPerWicket = lapply(strsplit(R, split = "\\|"), as.numeric))
}

cleaned <- na.omit(matchData) %>% 
  mutate(SumRuns = sapply(RunsPerWicket, sum)) %>%
  filter(SumRuns > 0)

if (PROCESS_OUTPUT && PROCESS_DATA) {
  fwrite(cleaned, PROCESS_LINK)
}

tData <- tribble(~ Wicket, ~ Value, ~ Year, ~ PercentValue)
for (y in unique(cleaned$Year)) {
  yearFiltered <- cleaned %>% filter(as.numeric(Year) == as.numeric(y))
  # Total counted runs across all innings of the season
  ys <- sum(yearFiltered$SumRuns)
  for (i in 1:10) {
    # Runs added for wicket i in every innings that lost at least i wickets
    r <- c()
    for (ia in seq_along(yearFiltered$RunsPerWicket)) {
      run <- yearFiltered$RunsPerWicket[[ia]]
      if (length(run) >= i) {
        r <- append(r, run[[i]])
      }
    }
    tData <- tData %>% add_row(Wicket = i, Value = mean(r, na.rm = TRUE), Year = y, PercentValue = (sum(r)/ys))
  }
}

teamData <- tribble(~ Wicket, ~ Value, ~ PercentValue, ~ Team)
for (y in unique(cleaned$Team)) {
  teamF <- cleaned %>% filter(Team == y)
  # Total counted runs across all innings of the team
  ys <- sum(teamF$SumRuns)
  for (i in 1:10) {
    # Runs added for wicket i in every innings that lost at least i wickets
    r <- c()
    for (ia in seq_along(teamF$RunsPerWicket)) {
      run <- teamF$RunsPerWicket[[ia]]
      if (length(run) >= i) {
        r <- append(r, run[[i]])
      }
    }
    teamData <- teamData %>% add_row(Wicket = i, Value = mean(r, na.rm = TRUE), Team = y, PercentValue = (sum(r)/ys))
  }
}

# Embed 1
summary(cleaned %>% select(FilePath, Year, Inning, SumRuns))

# Embed 2
# Wicket is the grouping column, so summarise() keeps it automatically
meanData <- tData %>% 
  group_by(Wicket) %>% 
  summarise(Value = mean(Value), PercentValue = mean(PercentValue))
pmean <- ggplot(meanData, aes(x = Wicket, y = PercentValue)) +
  geom_line() +
  geom_point() +
  xlab("Wicket") +
  ylab("% of Total Runs")
ggplotly(pmean)

# Embed 3
runsLengths <- matchData %>% filter(Inning < 3) %>% rowwise() %>% mutate(NumberWickets = length(RunsPerWicket))
prl <- ggplot(runsLengths, aes(x = NumberWickets)) +
  geom_histogram(bins = 10)
ggplotly(prl)

# Embed 4
pteams <- ggplot(teamData, aes(x = Wicket, y = PercentValue, colour = Team)) +
  geom_line() +
  geom_point() +
  xlab("Wicket") +
  ylab("% of Total Runs")
ggplotly(pteams)

# Embed 5
ptimeC <- ggplot(tData, aes(x = Wicket, y = PercentValue, colour = factor(Year))) +
  geom_line() +
  geom_point() +
  xlab("Wicket") +
  ylab("% of Total Runs")
ggplotly(ptimeC)

compileYears <- function (years) {
  a <- tData %>% 
    filter(Year %in% years) %>% 
    group_by(Wicket) %>% 
    summarise(
      AvgVal = mean(PercentValue), 
      YearVal = paste(years[[1]], "to", years[[length(years)]], sep = " ")
    )
  a
}

combinedPlot <- function (year1, year2) {
  comp1 <- compileYears(year1)
  comp2 <- compileYears(year2)
  
  comData <- bind_rows(comp1, comp2)

  pcom <- ggplot(comData, aes(x = Wicket, y = AvgVal, colour = YearVal)) +
    geom_line() +
    geom_point() +
    xlab("Wicket") +
    ylab("% of Total Runs")
  ggplotly(pcom)
}

# Embed 6
combinedPlot(c(2008, 2010, 2011, 2012, 2013, 2014), c(2015, 2016, 2017, 2018, 2019))

# Embed 7
pteams <- ggplot(teamData, aes(x = Wicket, y = PercentValue, colour = Team)) +
  geom_line() +
  geom_point() +
  xlab("Wicket") +
  ylab("% of Total Runs")
ggplotly(pteams)

# Embed 8
order1 <- teamData %>% 
  filter(Wicket == 1 | Wicket == 2 | Wicket == 3) %>%
  group_by(Team) %>%
  summarise(FirstThree = sum(PercentValue) * 100)

order2 <- teamData %>% 
  filter(Wicket == 4 | Wicket == 5 | Wicket == 6) %>%
  group_by(Team) %>%
  summarise(MidThree = sum(PercentValue) * 100) %>%
  select(MidThree)

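# Note: despite the name LastFour, this filter covers wickets 7 to 9 only;
# use Wicket %in% 7:10 to include the tenth wicket as well.
order3 <- teamData %>% 
  filter(Wicket == 7 | Wicket == 8 | Wicket == 9) %>%
  group_by(Team) %>%
  summarise(LastFour = sum(PercentValue) * 100) %>%
  select(LastFour)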

combined <- bind_cols(order1, order2, order3) %>% arrange(desc(FirstThree))

kable(combined)

#END
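
The nested loops that fill `tData` can also be written without explicit indexing. The following is a minimal base R sketch under stated assumptions: `cleaned_demo` is a made-up stand-in for `cleaned`, with only a `Year` column and a `RunsPerWicket` list column, and the result reproduces the `PercentValue` definition used above (runs for a wicket divided by all counted runs in that year). It is an alternative sketch, not the method used to produce the graphs.

```r
# Hypothetical stand-in for `cleaned`: three innings across two seasons.
cleaned_demo <- data.frame(Year = c(2008, 2008, 2009))
cleaned_demo$RunsPerWicket <- list(c(30, 10, 5), c(20, 40), c(15, 15, 30))

# Expand the list column into one row per (innings, wicket).
long <- do.call(rbind, lapply(seq_len(nrow(cleaned_demo)), function(k) {
  runs <- cleaned_demo$RunsPerWicket[[k]]
  data.frame(Year = cleaned_demo$Year[k], Wicket = seq_along(runs), Runs = runs)
}))

# Total runs per year and wicket.
totals <- aggregate(Runs ~ Year + Wicket, data = long, FUN = sum)

# Share of the year's counted runs that each wicket accounts for.
totals$PercentValue <- totals$Runs / ave(totals$Runs, totals$Year, FUN = sum)
print(totals[order(totals$Year, totals$Wicket), ])
```

Within each year the shares sum to 1, which makes it easy to check the output before plotting.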