Multiplicity

Author

Marcio Diniz | Michael Luu

Published

November 10, 2022

Libraries

First, let’s load the libraries to organize our dataset.

library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.8     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.1
✔ readr   2.1.2     ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

library(janitor)


Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test

Data

This dataset consists of expression level of 77 proteins of 78 mice divided among classes resulting from the combination of the following conditions:

Genotype: control (c) or trysomy (s)
Treatment: memantine (m) or saline (s)
Behavior: context-shock (CS) or shock-context (SC)

The complete dataset was generated and analysed in:

Ahmed MM, Dhanasekaran AR, Block A, Tong S, Costa ACS, Stasko M, et al. (2015) Protein Dynamics Associated with Failed and Rescued Learning in the Ts65Dn Mouse Model of Down Syndrome. PLoS ONE 10(3): e0119491.

The data that will be used in this class is a subset originally analysed:

Higuera C, Gardiner KJ, Cios KJ (2015) Self-Organizing Feature Maps Identify Proteins Critical to Learning in a Mouse Model of Down Syndrome. PLoS ONE 10(6): e0129126. [Web Link] journal.pone.0129126

The experimental design is showed below:

We will focus only on two classes: “t-CS-s”, “t-CS-m”. Moreover, each protein is measured in triplicates. There are specific statistical methods to handle triplicates that are out of the scope in this course. Therefore, we will adopt a very common and simple strategy used by basic scientists: average over the triplicates.

dataset <- read_csv('data/cortex_nuclear.csv') %>% clean_names()

Rows: 1080 Columns: 82
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (5): MouseID, Genotype, Treatment, Behavior, class
dbl (77): DYRK1A_N, ITSN1_N, BDNF_N, NR1_N, NR2A_N, pAKT_N, pBRAF_N, pCAMKII...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

glimpse(dataset)

Rows: 1,080
Columns: 82
$ mouse_id         <chr> "309_1", "309_2", "309_3", "309_4", "309_5", "309_6",…
$ dyrk1a_n         <dbl> 0.5036439, 0.5146171, 0.5091831, 0.4421067, 0.4349402…
$ itsn1_n          <dbl> 0.7471932, 0.6890635, 0.7302468, 0.6170762, 0.6174298…
$ bdnf_n           <dbl> 0.4301753, 0.4117703, 0.4183088, 0.3586263, 0.3588022…
$ nr1_n            <dbl> 2.816329, 2.789514, 2.687201, 2.466947, 2.365785, 2.3…
$ nr2a_n           <dbl> 5.990152, 5.685038, 5.622059, 4.979503, 4.718679, 4.8…
$ p_akt_n          <dbl> 0.2188300, 0.2116362, 0.2090109, 0.2228858, 0.2131059…
$ p_braf_n         <dbl> 0.1775655, 0.1728170, 0.1757222, 0.1764626, 0.1736270…
$ p_camkii_n       <dbl> 2.373744, 2.292150, 2.283337, 2.152301, 2.134014, 2.1…
$ p_creb_n         <dbl> 0.2322238, 0.2269721, 0.2302468, 0.2070042, 0.1921579…
$ p_elk_n          <dbl> 1.750936, 1.596377, 1.561316, 1.595086, 1.504230, 1.4…
$ p_erk_n          <dbl> 0.6879062, 0.6950062, 0.6773484, 0.5832768, 0.5509601…
$ p_jnk_n          <dbl> 0.3063817, 0.2990511, 0.2912761, 0.2967287, 0.2869612…
$ pkca_n           <dbl> 0.4026984, 0.3859868, 0.3810025, 0.3770870, 0.3635021…
$ p_mek_n          <dbl> 0.2969273, 0.2813189, 0.2817103, 0.3138320, 0.2779643…
$ p_nr1_n          <dbl> 1.0220603, 0.9566759, 1.0036350, 0.8753903, 0.8649120…
$ p_nr2a_n         <dbl> 0.6056726, 0.5875587, 0.6024488, 0.5202932, 0.5079898…
$ p_nr2b_n         <dbl> 1.877684, 1.725774, 1.731873, 1.566852, 1.480059, 1.5…
$ p_pkcab_n        <dbl> 2.308745, 2.043037, 2.017984, 2.132754, 2.013697, 1.9…
$ p_rsk_n          <dbl> 0.4415994, 0.4452219, 0.4676679, 0.4776707, 0.4834161…
$ akt_n            <dbl> 0.8593658, 0.8346593, 0.8143294, 0.7277046, 0.6877937…
$ braf_n           <dbl> 0.4162891, 0.4003642, 0.3998469, 0.3856387, 0.3675305…
$ camkii_n         <dbl> 0.3696080, 0.3561775, 0.3680888, 0.3629700, 0.3553109…
$ creb_n           <dbl> 0.1789443, 0.1736797, 0.1739047, 0.1794489, 0.1748355…
$ elk_n            <dbl> 1.8663581, 1.7610467, 1.7655443, 1.2862766, 1.3246945…
$ erk_n            <dbl> 3.685247, 3.485287, 3.571456, 2.970137, 2.896334, 2.9…
$ gsk3b_n          <dbl> 1.537227, 1.509249, 1.501244, 1.419710, 1.359876, 1.4…
$ jnk_n            <dbl> 0.2645263, 0.2557270, 0.2596135, 0.2595358, 0.2507050…
$ mek_n            <dbl> 0.3196770, 0.3044187, 0.3117467, 0.2792181, 0.2736672…
$ trka_n           <dbl> 0.8138665, 0.7805042, 0.7851540, 0.7344917, 0.7026991…
$ rsk_n            <dbl> 0.1658460, 0.1571935, 0.1608954, 0.1622099, 0.1548274…
$ app_n            <dbl> 0.4539098, 0.4309403, 0.4231873, 0.4106149, 0.3985498…
$ bcatenin_n       <dbl> 3.037621, 2.921882, 2.944136, 2.500204, 2.456560, 2.4…
$ sod1_n           <dbl> 0.3695096, 0.3422793, 0.3436962, 0.3445093, 0.3291258…
$ mtor_n           <dbl> 0.4585385, 0.4235599, 0.4250048, 0.4292113, 0.4087552…
$ p38_n            <dbl> 0.3353358, 0.3248347, 0.3248517, 0.3301208, 0.3134148…
$ p_mtor_n         <dbl> 0.8251920, 0.7617176, 0.7570308, 0.7469798, 0.6919565…
$ dscr1_n          <dbl> 0.5769155, 0.5450973, 0.5436197, 0.5467626, 0.5368605…
$ ampka_n          <dbl> 0.4480993, 0.4208761, 0.4046298, 0.3868603, 0.3608164…
$ nr2b_n           <dbl> 0.5862714, 0.5450973, 0.5529941, 0.5478485, 0.5128240…
$ p_numb_n         <dbl> 0.3947213, 0.3682546, 0.3638799, 0.3667707, 0.3515510…
$ raptor_n         <dbl> 0.3395706, 0.3219592, 0.3130859, 0.3284919, 0.3122063…
$ tiam1_n          <dbl> 0.4828639, 0.4545193, 0.4471972, 0.4426497, 0.4190949…
$ p_p70s6_n        <dbl> 0.2941698, 0.2764306, 0.2566482, 0.3985340, 0.3934470…
$ numb_n           <dbl> 0.1821505, 0.1820863, 0.1843877, 0.1617677, 0.1602002…
$ p70s6_n          <dbl> 0.8427252, 0.8476146, 0.8561658, 0.7602335, 0.7681129…
$ p_gsk3b_n        <dbl> 0.1926084, 0.1948153, 0.2007373, 0.1841694, 0.1857183…
$ p_pkcg_n         <dbl> 1.4430907, 1.4394598, 1.5243642, 1.6123821, 1.6458073…
$ cdk5_n           <dbl> 0.2947000, 0.2940598, 0.3018807, 0.2963818, 0.2968294…
$ s6_n             <dbl> 0.3546045, 0.3545483, 0.3860868, 0.2906795, 0.3093450…
$ adarb1_n         <dbl> 1.3390700, 1.3063231, 1.2796003, 1.1987645, 1.2069949…
$ acetyl_h3k9_n    <dbl> 0.17011879, 0.17142709, 0.18545629, 0.15979906, 0.164…
$ rrp1_n           <dbl> 0.1591024, 0.1581289, 0.1486963, 0.1661123, 0.1606870…
$ bax_n            <dbl> 0.1888517, 0.1845700, 0.1905322, 0.1853235, 0.1882214…
$ arc_n            <dbl> 0.10630521, 0.10659216, 0.10830306, 0.10318376, 0.104…
$ erbb4_n          <dbl> 0.1449893, 0.1504709, 0.1453302, 0.1406558, 0.1419830…
$ n_nos_n          <dbl> 0.1766677, 0.1783090, 0.1762129, 0.1638042, 0.1677096…
$ tau_n            <dbl> 0.1251904, 0.1342751, 0.1325604, 0.1232096, 0.1368377…
$ gfap_n           <dbl> 0.11529089, 0.11823450, 0.11776021, 0.11743941, 0.116…
$ glu_r3_n         <dbl> 0.2280435, 0.2380731, 0.2448173, 0.2349467, 0.2555277…
$ glu_r4_n         <dbl> 0.1427556, 0.1420366, 0.1424450, 0.1450682, 0.1408705…
$ il1b_n           <dbl> 0.4309575, 0.4571562, 0.5104723, 0.4309959, 0.4812265…
$ p3525_n          <dbl> 0.2475378, 0.2576322, 0.2553430, 0.2511031, 0.2517730…
$ p_casp9_n        <dbl> 1.603310, 1.671738, 1.663550, 1.484624, 1.534835, 1.5…
$ psd95_n          <dbl> 2.014875, 2.004605, 2.016831, 1.957233, 2.009109, 2.0…
$ snca_n           <dbl> 0.1082343, 0.1097485, 0.1081962, 0.1198832, 0.1195244…
$ ubiquitin_n      <dbl> 1.0449792, 1.0098831, 0.9968476, 0.9902247, 0.9977750…
$ p_gsk3b_tyr216_n <dbl> 0.8315565, 0.8492704, 0.8467087, 0.8332768, 0.8786678…
$ shh_n            <dbl> 0.1888517, 0.2004036, 0.1936845, 0.1921119, 0.2056042…
$ bad_n            <dbl> 0.1226520, 0.1166822, 0.1185082, 0.1327812, 0.1299541…
$ bcl2_n           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ p_s6_n           <dbl> 0.10630521, 0.10659216, 0.10830306, 0.10318376, 0.104…
$ p_cfos_n         <dbl> 0.1083359, 0.1043154, 0.1062193, 0.1112620, 0.1106939…
$ syp_n            <dbl> 0.4270992, 0.4415813, 0.4357769, 0.3916910, 0.4341538…
$ h3ac_k18_n       <dbl> 0.1147832, 0.1119735, 0.1118829, 0.1304053, 0.1184814…
$ egr1_n           <dbl> 0.1317900, 0.1351030, 0.1333618, 0.1474442, 0.1403143…
$ h3me_k4_n        <dbl> 0.1281856, 0.1311187, 0.1274311, 0.1469011, 0.1483799…
$ ca_na_n          <dbl> 1.675652, 1.743610, 1.926427, 1.700563, 1.839730, 1.8…
$ genotype         <chr> "Control", "Control", "Control", "Control", "Control"…
$ treatment        <chr> "Memantine", "Memantine", "Memantine", "Memantine", "…
$ behavior         <chr> "C/S", "C/S", "C/S", "C/S", "C/S", "C/S", "C/S", "C/S…
$ class            <chr> "c-CS-m", "c-CS-m", "c-CS-m", "c-CS-m", "c-CS-m", "c-…

dt <- dataset %>% 
  separate(col = mouse_id, 
           into = c("mouse_id", "replicate"), sep = "_") %>%
  group_by(mouse_id, genotype, treatment, behavior, class) %>%
  summarize(across(where(is.numeric), mean, na.rm = TRUE)) %>%
  arrange(., genotype, treatment, behavior, class) %>%
  ungroup() %>%
  filter(class == "t-CS-m" | class == "t-CS-s") %>%
  mutate(class = factor(class, levels = c("t-CS-s", "t-CS-m")
                        )
         )

`summarise()` has grouped output by 'mouse_id', 'genotype', 'treatment',
'behavior'. You can override using the `.groups` argument.

Comparing two groups

t.test(dyrk1a_n ~ class, data = dt)


    Welch Two Sample t-test

data:  dyrk1a_n by class
t = -1.4507, df = 12.781, p-value = 0.171
alternative hypothesis: true difference in means between group t-CS-s and group t-CS-m is not equal to 0
95 percent confidence interval:
 -0.23312987  0.04601195
sample estimates:
mean in group t-CS-s mean in group t-CS-m 
           0.5257350            0.6192939

t.test(itsn1_n ~ class, data = dt)


    Welch Two Sample t-test

data:  itsn1_n by class
t = -0.56208, df = 13.892, p-value = 0.583
alternative hypothesis: true difference in means between group t-CS-s and group t-CS-m is not equal to 0
95 percent confidence interval:
 -0.1804562  0.1055562
sample estimates:
mean in group t-CS-s mean in group t-CS-m 
           0.7595565            0.7970065

t.test(bdnf_n ~ class, data = dt)


    Welch Two Sample t-test

data:  bdnf_n by class
t = -0.41928, df = 13.132, p-value = 0.6818
alternative hypothesis: true difference in means between group t-CS-s and group t-CS-m is not equal to 0
95 percent confidence interval:
 -0.04470177  0.03015822
sample estimates:
mean in group t-CS-s mean in group t-CS-m 
           0.3054604            0.3127322

Hovewer, it does not seem a very productive process because there are 77 proteins, and we will need to check the p-value of each test individually. Moreover, we did not even check the assumptions of the t-test.

Lists

Lists are a class of objects in R that allow us to store other objects of different classes with different dimensions. Data frames are also a particular type of lists because allow us to store character, numeric, factor, logical vectors of the same length resulting in a rectangular shape object.

Many functions in R return lists as output that can be stored, even though their class is not explictly a list.

my.test <- t.test(dyrk1a_n ~ class, data = dt)
class(my.test)

[1] "htest"

The function ls help us to find out the objects that are contained in a list. We can access these objects using $ similar to a data frame. These objects are often pieces presented in the visual output of the function.

ls(my.test)

 [1] "alternative" "conf.int"    "data.name"   "estimate"    "method"     
 [6] "null.value"  "p.value"     "parameter"   "statistic"   "stderr"

my.test$conf.int

[1] -0.23312987  0.04601195
attr(,"conf.level")
[1] 0.95

my.test$p.value

[1] 0.1709624

my.test$method

[1] "Welch Two Sample t-test"

The ability to access individually objects for a procedure allow us to store only the information that we really need to perform our analysis.

my.pvalue <- my.test$p.value
my.pvalue

[1] 0.1709624

my.ci <- my.test$conf.int
my.ci

[1] -0.23312987  0.04601195
attr(,"conf.level")
[1] 0.95

my.method <- my.test$method
my.method

[1] "Welch Two Sample t-test"

Now, we can create a data frame that will contain such objects.

results <- data.frame(protein = NA, method = NA, difference = NA,
                  lower.ci = NA, upper.ci = NA, p.value = NA)

In this way, we can store only the objects that are useful for us.

my.test <- t.test(dyrk1a_n ~ class, data = dt)
results[1, 'protein'] <- "dyrk1a_n"
results[1, 'method'] <- my.test$method
results[1, 'fold'] <- my.test$estimate[2]/my.test$estimate[1]
results[1, 'lower.ci'] <- my.test$conf.int[1]
results[1, 'upper.ci'] <- my.test$conf.int[2]
results[1, 'p.value'] <- my.test$p.value

my.test <- t.test(itsn1_n ~ class, data = dt)
results[2, 'protein'] <- "itsn1_n"
results[2, 'method'] <- my.test$method
results[2, 'fold'] <- my.test$estimate[2]/my.test$estimate[1]
results[2, 'lower.ci'] <- my.test$conf.int[1]
results[2, 'upper.ci'] <- my.test$conf.int[2]
results[2, 'p.value'] <- my.test$p.value

my.test <- t.test(bdnf_n ~ class, data = dt)
results[3, 'protein'] <- "itsn1_n"
results[3, 'method'] <- my.test$method
results[3, 'fold'] <- my.test$estimate[2]/my.test$estimate[1]
results[3, 'lower.ci'] <- my.test$conf.int[1]
results[3, 'upper.ci'] <- my.test$conf.int[2]
results[3, 'p.value'] <- my.test$p.value

results

   protein                  method difference    lower.ci   upper.ci   p.value
1 dyrk1a_n Welch Two Sample t-test         NA -0.23312987 0.04601195 0.1709624
2  itsn1_n Welch Two Sample t-test         NA -0.18045624 0.10555618 0.5830175
3  itsn1_n Welch Two Sample t-test         NA -0.04470177 0.03015822 0.6817885
      fold
1 1.177958
2 1.049305
3 1.023806

Even though we do not need to check each output individually, we still need to code our testing procedure for each protein. The procedure is the same for all proteins, i.e., we have a recipe for all of them.

How can we simplify this procedure?

For loops

When we have repeated tasks, we can use for loop to execute a recipe several times.

for (i in 1:10){
  i
  print(i)
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

The for loop is controlled by a counter that will help us to run our analysis for all the proteins.

Therefore, we need to choose the limits of our counter:

names(dt)

 [1] "mouse_id"         "genotype"         "treatment"        "behavior"        
 [5] "class"            "dyrk1a_n"         "itsn1_n"          "bdnf_n"          
 [9] "nr1_n"            "nr2a_n"           "p_akt_n"          "p_braf_n"        
[13] "p_camkii_n"       "p_creb_n"         "p_elk_n"          "p_erk_n"         
[17] "p_jnk_n"          "pkca_n"           "p_mek_n"          "p_nr1_n"         
[21] "p_nr2a_n"         "p_nr2b_n"         "p_pkcab_n"        "p_rsk_n"         
[25] "akt_n"            "braf_n"           "camkii_n"         "creb_n"          
[29] "elk_n"            "erk_n"            "gsk3b_n"          "jnk_n"           
[33] "mek_n"            "trka_n"           "rsk_n"            "app_n"           
[37] "bcatenin_n"       "sod1_n"           "mtor_n"           "p38_n"           
[41] "p_mtor_n"         "dscr1_n"          "ampka_n"          "nr2b_n"          
[45] "p_numb_n"         "raptor_n"         "tiam1_n"          "p_p70s6_n"       
[49] "numb_n"           "p70s6_n"          "p_gsk3b_n"        "p_pkcg_n"        
[53] "cdk5_n"           "s6_n"             "adarb1_n"         "acetyl_h3k9_n"   
[57] "rrp1_n"           "bax_n"            "arc_n"            "erbb4_n"         
[61] "n_nos_n"          "tau_n"            "gfap_n"           "glu_r3_n"        
[65] "glu_r4_n"         "il1b_n"           "p3525_n"          "p_casp9_n"       
[69] "psd95_n"          "snca_n"           "ubiquitin_n"      "p_gsk3b_tyr216_n"
[73] "shh_n"            "bad_n"            "bcl2_n"           "p_s6_n"          
[77] "p_cfos_n"         "syp_n"            "h3ac_k18_n"       "egr1_n"          
[81] "h3me_k4_n"        "ca_na_n"

ncol(dt)

[1] 82

We found that our data.frame r, d has 82 columns and proteins are stored starting at column 6. Let’s double check it:

protein_names <- names(dt)[6:82]

The second step is to incorporate the counter in our test procedure:

# One way
t.test(dyrk1a_n ~ class, data = dt)


    Welch Two Sample t-test

data:  dyrk1a_n by class
t = -1.4507, df = 12.781, p-value = 0.171
alternative hypothesis: true difference in means between group t-CS-s and group t-CS-m is not equal to 0
95 percent confidence interval:
 -0.23312987  0.04601195
sample estimates:
mean in group t-CS-s mean in group t-CS-m 
           0.5257350            0.6192939

t.test(get(protein_names[1]) ~ class, data = dt)


    Welch Two Sample t-test

data:  get(protein_names[1]) by class
t = -1.4507, df = 12.781, p-value = 0.171
alternative hypothesis: true difference in means between group t-CS-s and group t-CS-m is not equal to 0
95 percent confidence interval:
 -0.23312987  0.04601195
sample estimates:
mean in group t-CS-s mean in group t-CS-m 
           0.5257350            0.6192939

Finally, we can obtain t-test result for all proteins:

for (i in 1:length(protein_names)){
  my.test <- t.test(get(protein_names[i]) ~ class, data = dt)
  results[i, 'protein'] <- names(dt)[i]
  results[i, 'method'] <- my.test$method
  results[i, 'fold'] <- my.test$estimate[2]/my.test$estimate[1]
  results[i, 'lower.ci'] <- my.test$conf.int[1]
  results[i, 'upper.ci'] <- my.test$conf.int[2]
  results[i, 'p.value'] <- my.test$p.value
}

results

            protein                  method difference      lower.ci
1          mouse_id Welch Two Sample t-test         NA -0.2331298679
2          genotype Welch Two Sample t-test         NA -0.1804562433
3         treatment Welch Two Sample t-test         NA -0.0447017691
4          behavior Welch Two Sample t-test         NA -0.2705101262
5             class Welch Two Sample t-test         NA -0.5979478852
6          dyrk1a_n Welch Two Sample t-test         NA -0.0346190608
7           itsn1_n Welch Two Sample t-test         NA -0.0269768286
8            bdnf_n Welch Two Sample t-test         NA -1.9282203775
9             nr1_n Welch Two Sample t-test         NA -0.0155205246
10           nr2a_n Welch Two Sample t-test         NA -0.2963585134
11          p_akt_n Welch Two Sample t-test         NA -0.3320380355
12         p_braf_n Welch Two Sample t-test         NA -0.0273300908
13       p_camkii_n Welch Two Sample t-test         NA -0.0739802326
14         p_creb_n Welch Two Sample t-test         NA -0.0392067366
15          p_elk_n Welch Two Sample t-test         NA -0.0860347851
16          p_erk_n Welch Two Sample t-test         NA -0.0667282489
17          p_jnk_n Welch Two Sample t-test         NA -0.0844160803
18           pkca_n Welch Two Sample t-test         NA -0.5138528237
19          p_mek_n Welch Two Sample t-test         NA -0.1222570633
20          p_nr1_n Welch Two Sample t-test         NA -0.0067605346
21         p_nr2a_n Welch Two Sample t-test         NA -0.2416263658
22         p_nr2b_n Welch Two Sample t-test         NA -0.0173320972
23        p_pkcab_n Welch Two Sample t-test         NA -0.0307405828
24          p_rsk_n Welch Two Sample t-test         NA -0.0155502188
25            akt_n Welch Two Sample t-test         NA -0.0432998824
26           braf_n Welch Two Sample t-test         NA -0.2497711413
27         camkii_n Welch Two Sample t-test         NA -0.0362414598
28           creb_n Welch Two Sample t-test         NA -0.0326955749
29            elk_n Welch Two Sample t-test         NA -0.0905849541
30            erk_n Welch Two Sample t-test         NA -0.0252604320
31          gsk3b_n Welch Two Sample t-test         NA -0.0322959168
32            jnk_n Welch Two Sample t-test         NA -0.3143189237
33            mek_n Welch Two Sample t-test         NA -0.0209624060
34           trka_n Welch Two Sample t-test         NA -0.0659138759
35            rsk_n Welch Two Sample t-test         NA -0.0772767206
36            app_n Welch Two Sample t-test         NA -0.1823801622
37       bcatenin_n Welch Two Sample t-test         NA -0.1446787837
38           sod1_n Welch Two Sample t-test         NA -0.0309708918
39           mtor_n Welch Two Sample t-test         NA -0.1302304291
40            p38_n Welch Two Sample t-test         NA -0.0178488114
41         p_mtor_n Welch Two Sample t-test         NA -0.0362763677
42          dscr1_n Welch Two Sample t-test         NA -0.0205825562
43          ampka_n Welch Two Sample t-test         NA -0.2825885243
44           nr2b_n Welch Two Sample t-test         NA  0.0157199341
45         p_numb_n Welch Two Sample t-test         NA -0.0134353480
46         raptor_n Welch Two Sample t-test         NA -0.0106127303
47          tiam1_n Welch Two Sample t-test         NA -0.7380428178
48        p_p70s6_n Welch Two Sample t-test         NA -0.0052828003
49           numb_n Welch Two Sample t-test         NA  0.0290637423
50          p70s6_n Welch Two Sample t-test         NA -0.1243853162
51        p_gsk3b_n Welch Two Sample t-test         NA -0.1107094505
52         p_pkcg_n Welch Two Sample t-test         NA -0.0065125486
53           cdk5_n Welch Two Sample t-test         NA -0.0045686958
54             s6_n Welch Two Sample t-test         NA -0.0025348467
55         adarb1_n Welch Two Sample t-test         NA -0.0025563687
56    acetyl_h3k9_n Welch Two Sample t-test         NA -0.0389455864
57           rrp1_n Welch Two Sample t-test         NA  0.0042160320
58            bax_n Welch Two Sample t-test         NA -0.0016815711
59            arc_n Welch Two Sample t-test         NA -0.0546651271
60          erbb4_n Welch Two Sample t-test         NA -0.0203556283
61          n_nos_n Welch Two Sample t-test         NA -0.0644190458
62            tau_n Welch Two Sample t-test         NA -0.0064086797
63           gfap_n Welch Two Sample t-test         NA -0.1796378789
64         glu_r3_n Welch Two Sample t-test         NA -0.2462807431
65         glu_r4_n Welch Two Sample t-test         NA  0.0001674341
66           il1b_n Welch Two Sample t-test         NA -0.0033679507
67          p3525_n Welch Two Sample t-test         NA -0.1223429871
68        p_casp9_n Welch Two Sample t-test         NA -0.0149151957
69          psd95_n Welch Two Sample t-test         NA -0.0280898396
70           snca_n Welch Two Sample t-test         NA -0.0320131410
71      ubiquitin_n Welch Two Sample t-test         NA -0.0025348467
72 p_gsk3b_tyr216_n Welch Two Sample t-test         NA -0.0210010435
73            shh_n Welch Two Sample t-test         NA -0.0555990810
74            bad_n Welch Two Sample t-test         NA -0.0413025934
75           bcl2_n Welch Two Sample t-test         NA -0.0285559844
76           p_s6_n Welch Two Sample t-test         NA -0.0326247170
77         p_cfos_n Welch Two Sample t-test         NA -0.3071693896
        upper.ci    p.value      fold
1   0.0460119490 0.17096237 1.1779584
2   0.1055561832 0.58301753 1.0493051
3   0.0301582199 0.68178851 1.0238059
4   0.2466411290 0.92202879 1.0054630
5   0.4957053361 0.84083031 1.0145444
6   0.0363075174 0.95878461 0.9960636
7   0.0086549884 0.28648605 1.0555899
8   0.6624227414 0.30820054 1.2542884
9   0.0288115052 0.52768963 0.9683609
10  0.2051517151 0.69868932 1.0300358
11  0.1221255918 0.32787788 1.1509789
12  0.0735824390 0.33884331 0.9230022
13  0.0529665091 0.72784487 1.0339358
14  0.0392265539 0.99956602 0.9999606
15  0.0721719723 0.85226744 1.0090652
16  0.2318006574 0.25519944 0.8718995
17  0.2273953591 0.34195169 0.9529007
18  0.6530769253 0.80175541 0.9604331
19  0.0360545512 0.26236721 1.1031920
20  0.1176030625 0.07661619 0.9133774
21  0.0392296120 0.14140883 1.2317234
22  0.0548473313 0.28342038 0.9460332
23  0.0047249986 0.13724037 1.0788989
24  0.3613179241 0.06919609 0.8600740
25  0.8520577885 0.07230040 0.8501262
26  0.1590605211 0.64040074 1.0367707
27  0.0184263865 0.49568647 1.0383236
28  0.0404492553 0.82243203 0.9852527
29  0.1271091566 0.72403495 0.9741872
30  0.0210173122 0.84569251 1.0133425
31  0.0467994810 0.69981707 0.9835753
32  0.2335883385 0.75543602 1.0202136
33  0.0369135384 0.56220321 0.9747409
34  0.0461734149 0.70497816 1.0254243
35  0.0004535698 0.05237683 1.1243384
36  0.1348572462 0.73391203 1.0371396
37  0.1474750541 0.98272222 0.9974512
38  0.0392669126 0.80364551 0.9875228
39  0.0571142791 0.39608340 1.0778105
40  0.0693980438 0.22482964 0.9291072
41  0.0224140814 0.61588302 1.0254583
42  0.0585778227 0.31868273 0.9532668
43  0.0626525919 0.19243129 1.3896551
44  0.0490595036 0.00145614 0.8443056
45  0.2024389589 0.08139178 0.9064139
46  0.0166155719 0.64307364 0.9824453
47  0.9529982871 0.78589290 0.9414939
48  0.0526384247 0.10080188 0.9246927
49  0.1823120279 0.01095842 0.8193837
50  0.6220825158 0.17087390 0.8050211
51  0.1582619688 0.70085366 0.8895628
52  0.0315959407 0.17084854 0.9265600
53  0.0233249553 0.16595340 0.9487837
54  0.0088503071 0.25386754 0.9716423
55  0.0111711428 0.19499877 0.9716673
56  0.0084273575 0.17327109 1.0935715
57  0.0545598842 0.02532592 0.8682246
58  0.0241327252 0.08136702 0.9131982
59 -0.0035917911 0.02850637 1.1482054
60  0.0059108936 0.25475983 1.0611046
61  0.0353151810 0.53428269 1.0300597
62  0.0458243812 0.12261580 0.9342263
63  0.2459994882 0.74080292 0.9777569
64  0.1339163459 0.53592647 1.0259821
65  0.0307742115 0.04787848 0.8992633
66  0.2171506051 0.05639016 0.9104577
67  0.0366702965 0.26327183 1.0510480
68  0.0253360811 0.58581422 0.9765448
69  0.0287135928 0.98116205 0.9979330
70  0.0283752762 0.89521653 1.0139717
71  0.0088503071 0.25386754 0.9716423
72  0.0094187909 0.41623719 1.0474796
73  0.0955776052 0.57319656 0.9538979
74  0.0590013406 0.70255575 0.9441286
75  0.0201861158 0.71119135 1.0264577
76  0.0536438551 0.59785341 0.9450997
77  0.1457542231 0.45176934 1.0519811

We can increase the complexity of our procedure such as including normality and homoscedasticity tests. Furthermore, we can replace the t-test for more conservative test such as Wilcox or Brunner-Munzel.

Volcano plots

We often use a volcano plot to visualize p-values:

results <- results %>% 
  mutate(label = ifelse(p.value < 0.05, "S", "NS"),
                        log_pvalue = -log10(p.value))


ggplot(results, aes(x = fold, y = log_pvalue, color = label)) +
  geom_point() +
  geom_hline(yintercept = -log10(0.05), linetype = "dotted") +
  geom_text(data = results %>% filter(label == "S"),
            aes(x = fold, y = log_pvalue,
                label = protein),
            nudge_x = 0.05,
            nudge_y = 0.05) +
  labs(x = "Fold Change", y = "-log(p value)") +
  theme_bw() + 
  theme(legend.position = "none")

If we want a box around the protein names:

ggplot(results, aes(x = fold, y = log_pvalue, color = label)) +
  geom_point() +
  geom_hline(yintercept = -log10(0.05), linetype = "dotted") +
  geom_label(data = results %>% filter(label == "S"),
            aes(x = fold, y = log_pvalue,
                label = protein),
            nudge_x = 0.05,
            nudge_y = 0.10) +
  labs(x = "Fold Change", y = "-log(p value)") +
  theme_bw() + 
  theme(legend.position = "none")

Multiple comparisons corrections

Do not forget to apply p value adjustments:

results <- results %>% 
  mutate(p.value_holm = p.adjust(p.value, method = "holm"),
         log_pvalue_holm = -log10(p.value_holm)) %>%
  mutate(p.value_fdr = p.adjust(p.value, method = "BH"),
         log_pvalue_fdr= -log10(p.value_fdr)) 


ggplot(results, aes(x = fold, y = log_pvalue_holm)) +
  geom_point() +
  geom_hline(yintercept = -log10(0.05), linetype = "dotted")  +
  labs(x = "Fold Change", y = "-log(p value)") +
  theme_bw() + 
  theme(legend.position = "none")

ggplot(results, aes(x = fold, y = log_pvalue_fdr)) +
  geom_point() +
  geom_hline(yintercept = -log10(0.05), linetype = "dotted")  +
  labs(x = "Fold Change", y = "-log(p value)") +
  theme_bw() + 
  theme(legend.position = "none")

Holm’s correction controls the Family Wise Error Rate (FWER), consequently, it also controls the False Discovery Rate (FDR). However, Benjamin and Hochberg’s correction only controls FDR, which might be acceptable if validation experiments are planned.