Multiplicity
Libraries
First, let’s load the libraries to organize our dataset.
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6 ✔ purrr 0.3.4
✔ tibble 3.1.8 ✔ dplyr 1.0.9
✔ tidyr 1.2.0 ✔ stringr 1.4.1
✔ readr 2.1.2 ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
library(janitor)
Attaching package: 'janitor'
The following objects are masked from 'package:stats':
chisq.test, fisher.test
Data
This dataset consists of expression level of 77 proteins of 78 mice divided among classes resulting from the combination of the following conditions:
- Genotype: control (c) or trysomy (s)
- Treatment: memantine (m) or saline (s)
- Behavior: context-shock (CS) or shock-context (SC)
The complete dataset was generated and analysed in:
Ahmed MM, Dhanasekaran AR, Block A, Tong S, Costa ACS, Stasko M, et al. (2015) Protein Dynamics Associated with Failed and Rescued Learning in the Ts65Dn Mouse Model of Down Syndrome. PLoS ONE 10(3): e0119491.
The data that will be used in this class is a subset originally analysed:
Higuera C, Gardiner KJ, Cios KJ (2015) Self-Organizing Feature Maps Identify Proteins Critical to Learning in a Mouse Model of Down Syndrome. PLoS ONE 10(6): e0129126. [Web Link] journal.pone.0129126
The experimental design is showed below:
We will focus only on two classes: “t-CS-s”, “t-CS-m”. Moreover, each protein is measured in triplicates. There are specific statistical methods to handle triplicates that are out of the scope in this course. Therefore, we will adopt a very common and simple strategy used by basic scientists: average over the triplicates.
<- read_csv('data/cortex_nuclear.csv') %>% clean_names() dataset
Rows: 1080 Columns: 82
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): MouseID, Genotype, Treatment, Behavior, class
dbl (77): DYRK1A_N, ITSN1_N, BDNF_N, NR1_N, NR2A_N, pAKT_N, pBRAF_N, pCAMKII...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(dataset)
Rows: 1,080
Columns: 82
$ mouse_id <chr> "309_1", "309_2", "309_3", "309_4", "309_5", "309_6",…
$ dyrk1a_n <dbl> 0.5036439, 0.5146171, 0.5091831, 0.4421067, 0.4349402…
$ itsn1_n <dbl> 0.7471932, 0.6890635, 0.7302468, 0.6170762, 0.6174298…
$ bdnf_n <dbl> 0.4301753, 0.4117703, 0.4183088, 0.3586263, 0.3588022…
$ nr1_n <dbl> 2.816329, 2.789514, 2.687201, 2.466947, 2.365785, 2.3…
$ nr2a_n <dbl> 5.990152, 5.685038, 5.622059, 4.979503, 4.718679, 4.8…
$ p_akt_n <dbl> 0.2188300, 0.2116362, 0.2090109, 0.2228858, 0.2131059…
$ p_braf_n <dbl> 0.1775655, 0.1728170, 0.1757222, 0.1764626, 0.1736270…
$ p_camkii_n <dbl> 2.373744, 2.292150, 2.283337, 2.152301, 2.134014, 2.1…
$ p_creb_n <dbl> 0.2322238, 0.2269721, 0.2302468, 0.2070042, 0.1921579…
$ p_elk_n <dbl> 1.750936, 1.596377, 1.561316, 1.595086, 1.504230, 1.4…
$ p_erk_n <dbl> 0.6879062, 0.6950062, 0.6773484, 0.5832768, 0.5509601…
$ p_jnk_n <dbl> 0.3063817, 0.2990511, 0.2912761, 0.2967287, 0.2869612…
$ pkca_n <dbl> 0.4026984, 0.3859868, 0.3810025, 0.3770870, 0.3635021…
$ p_mek_n <dbl> 0.2969273, 0.2813189, 0.2817103, 0.3138320, 0.2779643…
$ p_nr1_n <dbl> 1.0220603, 0.9566759, 1.0036350, 0.8753903, 0.8649120…
$ p_nr2a_n <dbl> 0.6056726, 0.5875587, 0.6024488, 0.5202932, 0.5079898…
$ p_nr2b_n <dbl> 1.877684, 1.725774, 1.731873, 1.566852, 1.480059, 1.5…
$ p_pkcab_n <dbl> 2.308745, 2.043037, 2.017984, 2.132754, 2.013697, 1.9…
$ p_rsk_n <dbl> 0.4415994, 0.4452219, 0.4676679, 0.4776707, 0.4834161…
$ akt_n <dbl> 0.8593658, 0.8346593, 0.8143294, 0.7277046, 0.6877937…
$ braf_n <dbl> 0.4162891, 0.4003642, 0.3998469, 0.3856387, 0.3675305…
$ camkii_n <dbl> 0.3696080, 0.3561775, 0.3680888, 0.3629700, 0.3553109…
$ creb_n <dbl> 0.1789443, 0.1736797, 0.1739047, 0.1794489, 0.1748355…
$ elk_n <dbl> 1.8663581, 1.7610467, 1.7655443, 1.2862766, 1.3246945…
$ erk_n <dbl> 3.685247, 3.485287, 3.571456, 2.970137, 2.896334, 2.9…
$ gsk3b_n <dbl> 1.537227, 1.509249, 1.501244, 1.419710, 1.359876, 1.4…
$ jnk_n <dbl> 0.2645263, 0.2557270, 0.2596135, 0.2595358, 0.2507050…
$ mek_n <dbl> 0.3196770, 0.3044187, 0.3117467, 0.2792181, 0.2736672…
$ trka_n <dbl> 0.8138665, 0.7805042, 0.7851540, 0.7344917, 0.7026991…
$ rsk_n <dbl> 0.1658460, 0.1571935, 0.1608954, 0.1622099, 0.1548274…
$ app_n <dbl> 0.4539098, 0.4309403, 0.4231873, 0.4106149, 0.3985498…
$ bcatenin_n <dbl> 3.037621, 2.921882, 2.944136, 2.500204, 2.456560, 2.4…
$ sod1_n <dbl> 0.3695096, 0.3422793, 0.3436962, 0.3445093, 0.3291258…
$ mtor_n <dbl> 0.4585385, 0.4235599, 0.4250048, 0.4292113, 0.4087552…
$ p38_n <dbl> 0.3353358, 0.3248347, 0.3248517, 0.3301208, 0.3134148…
$ p_mtor_n <dbl> 0.8251920, 0.7617176, 0.7570308, 0.7469798, 0.6919565…
$ dscr1_n <dbl> 0.5769155, 0.5450973, 0.5436197, 0.5467626, 0.5368605…
$ ampka_n <dbl> 0.4480993, 0.4208761, 0.4046298, 0.3868603, 0.3608164…
$ nr2b_n <dbl> 0.5862714, 0.5450973, 0.5529941, 0.5478485, 0.5128240…
$ p_numb_n <dbl> 0.3947213, 0.3682546, 0.3638799, 0.3667707, 0.3515510…
$ raptor_n <dbl> 0.3395706, 0.3219592, 0.3130859, 0.3284919, 0.3122063…
$ tiam1_n <dbl> 0.4828639, 0.4545193, 0.4471972, 0.4426497, 0.4190949…
$ p_p70s6_n <dbl> 0.2941698, 0.2764306, 0.2566482, 0.3985340, 0.3934470…
$ numb_n <dbl> 0.1821505, 0.1820863, 0.1843877, 0.1617677, 0.1602002…
$ p70s6_n <dbl> 0.8427252, 0.8476146, 0.8561658, 0.7602335, 0.7681129…
$ p_gsk3b_n <dbl> 0.1926084, 0.1948153, 0.2007373, 0.1841694, 0.1857183…
$ p_pkcg_n <dbl> 1.4430907, 1.4394598, 1.5243642, 1.6123821, 1.6458073…
$ cdk5_n <dbl> 0.2947000, 0.2940598, 0.3018807, 0.2963818, 0.2968294…
$ s6_n <dbl> 0.3546045, 0.3545483, 0.3860868, 0.2906795, 0.3093450…
$ adarb1_n <dbl> 1.3390700, 1.3063231, 1.2796003, 1.1987645, 1.2069949…
$ acetyl_h3k9_n <dbl> 0.17011879, 0.17142709, 0.18545629, 0.15979906, 0.164…
$ rrp1_n <dbl> 0.1591024, 0.1581289, 0.1486963, 0.1661123, 0.1606870…
$ bax_n <dbl> 0.1888517, 0.1845700, 0.1905322, 0.1853235, 0.1882214…
$ arc_n <dbl> 0.10630521, 0.10659216, 0.10830306, 0.10318376, 0.104…
$ erbb4_n <dbl> 0.1449893, 0.1504709, 0.1453302, 0.1406558, 0.1419830…
$ n_nos_n <dbl> 0.1766677, 0.1783090, 0.1762129, 0.1638042, 0.1677096…
$ tau_n <dbl> 0.1251904, 0.1342751, 0.1325604, 0.1232096, 0.1368377…
$ gfap_n <dbl> 0.11529089, 0.11823450, 0.11776021, 0.11743941, 0.116…
$ glu_r3_n <dbl> 0.2280435, 0.2380731, 0.2448173, 0.2349467, 0.2555277…
$ glu_r4_n <dbl> 0.1427556, 0.1420366, 0.1424450, 0.1450682, 0.1408705…
$ il1b_n <dbl> 0.4309575, 0.4571562, 0.5104723, 0.4309959, 0.4812265…
$ p3525_n <dbl> 0.2475378, 0.2576322, 0.2553430, 0.2511031, 0.2517730…
$ p_casp9_n <dbl> 1.603310, 1.671738, 1.663550, 1.484624, 1.534835, 1.5…
$ psd95_n <dbl> 2.014875, 2.004605, 2.016831, 1.957233, 2.009109, 2.0…
$ snca_n <dbl> 0.1082343, 0.1097485, 0.1081962, 0.1198832, 0.1195244…
$ ubiquitin_n <dbl> 1.0449792, 1.0098831, 0.9968476, 0.9902247, 0.9977750…
$ p_gsk3b_tyr216_n <dbl> 0.8315565, 0.8492704, 0.8467087, 0.8332768, 0.8786678…
$ shh_n <dbl> 0.1888517, 0.2004036, 0.1936845, 0.1921119, 0.2056042…
$ bad_n <dbl> 0.1226520, 0.1166822, 0.1185082, 0.1327812, 0.1299541…
$ bcl2_n <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ p_s6_n <dbl> 0.10630521, 0.10659216, 0.10830306, 0.10318376, 0.104…
$ p_cfos_n <dbl> 0.1083359, 0.1043154, 0.1062193, 0.1112620, 0.1106939…
$ syp_n <dbl> 0.4270992, 0.4415813, 0.4357769, 0.3916910, 0.4341538…
$ h3ac_k18_n <dbl> 0.1147832, 0.1119735, 0.1118829, 0.1304053, 0.1184814…
$ egr1_n <dbl> 0.1317900, 0.1351030, 0.1333618, 0.1474442, 0.1403143…
$ h3me_k4_n <dbl> 0.1281856, 0.1311187, 0.1274311, 0.1469011, 0.1483799…
$ ca_na_n <dbl> 1.675652, 1.743610, 1.926427, 1.700563, 1.839730, 1.8…
$ genotype <chr> "Control", "Control", "Control", "Control", "Control"…
$ treatment <chr> "Memantine", "Memantine", "Memantine", "Memantine", "…
$ behavior <chr> "C/S", "C/S", "C/S", "C/S", "C/S", "C/S", "C/S", "C/S…
$ class <chr> "c-CS-m", "c-CS-m", "c-CS-m", "c-CS-m", "c-CS-m", "c-…
<- dataset %>%
dt separate(col = mouse_id,
into = c("mouse_id", "replicate"), sep = "_") %>%
group_by(mouse_id, genotype, treatment, behavior, class) %>%
summarize(across(where(is.numeric), mean, na.rm = TRUE)) %>%
arrange(., genotype, treatment, behavior, class) %>%
ungroup() %>%
filter(class == "t-CS-m" | class == "t-CS-s") %>%
mutate(class = factor(class, levels = c("t-CS-s", "t-CS-m")
) )
`summarise()` has grouped output by 'mouse_id', 'genotype', 'treatment',
'behavior'. You can override using the `.groups` argument.
Comparing two groups
t.test(dyrk1a_n ~ class, data = dt)
Welch Two Sample t-test
data: dyrk1a_n by class
t = -1.4507, df = 12.781, p-value = 0.171
alternative hypothesis: true difference in means between group t-CS-s and group t-CS-m is not equal to 0
95 percent confidence interval:
-0.23312987 0.04601195
sample estimates:
mean in group t-CS-s mean in group t-CS-m
0.5257350 0.6192939
t.test(itsn1_n ~ class, data = dt)
Welch Two Sample t-test
data: itsn1_n by class
t = -0.56208, df = 13.892, p-value = 0.583
alternative hypothesis: true difference in means between group t-CS-s and group t-CS-m is not equal to 0
95 percent confidence interval:
-0.1804562 0.1055562
sample estimates:
mean in group t-CS-s mean in group t-CS-m
0.7595565 0.7970065
t.test(bdnf_n ~ class, data = dt)
Welch Two Sample t-test
data: bdnf_n by class
t = -0.41928, df = 13.132, p-value = 0.6818
alternative hypothesis: true difference in means between group t-CS-s and group t-CS-m is not equal to 0
95 percent confidence interval:
-0.04470177 0.03015822
sample estimates:
mean in group t-CS-s mean in group t-CS-m
0.3054604 0.3127322
Hovewer, it does not seem a very productive process because there are 77 proteins, and we will need to check the p-value of each test individually. Moreover, we did not even check the assumptions of the t-test.
Lists
Lists are a class of objects in R that allow us to store other objects of different classes with different dimensions. Data frames are also a particular type of lists because allow us to store character, numeric, factor, logical vectors of the same length resulting in a rectangular shape object.
Many functions in R return lists as output that can be stored, even though their class is not explictly a list.
<- t.test(dyrk1a_n ~ class, data = dt)
my.test class(my.test)
[1] "htest"
The function ls
help us to find out the objects that are contained in a list. We can access these objects using $ similar to a data frame. These objects are often pieces presented in the visual output of the function.
ls(my.test)
[1] "alternative" "conf.int" "data.name" "estimate" "method"
[6] "null.value" "p.value" "parameter" "statistic" "stderr"
$conf.int my.test
[1] -0.23312987 0.04601195
attr(,"conf.level")
[1] 0.95
$p.value my.test
[1] 0.1709624
$method my.test
[1] "Welch Two Sample t-test"
The ability to access individually objects for a procedure allow us to store only the information that we really need to perform our analysis.
<- my.test$p.value
my.pvalue my.pvalue
[1] 0.1709624
<- my.test$conf.int
my.ci my.ci
[1] -0.23312987 0.04601195
attr(,"conf.level")
[1] 0.95
<- my.test$method
my.method my.method
[1] "Welch Two Sample t-test"
Now, we can create a data frame that will contain such objects.
<- data.frame(protein = NA, method = NA, difference = NA,
results lower.ci = NA, upper.ci = NA, p.value = NA)
In this way, we can store only the objects that are useful for us.
<- t.test(dyrk1a_n ~ class, data = dt)
my.test 1, 'protein'] <- "dyrk1a_n"
results[1, 'method'] <- my.test$method
results[1, 'fold'] <- my.test$estimate[2]/my.test$estimate[1]
results[1, 'lower.ci'] <- my.test$conf.int[1]
results[1, 'upper.ci'] <- my.test$conf.int[2]
results[1, 'p.value'] <- my.test$p.value
results[
<- t.test(itsn1_n ~ class, data = dt)
my.test 2, 'protein'] <- "itsn1_n"
results[2, 'method'] <- my.test$method
results[2, 'fold'] <- my.test$estimate[2]/my.test$estimate[1]
results[2, 'lower.ci'] <- my.test$conf.int[1]
results[2, 'upper.ci'] <- my.test$conf.int[2]
results[2, 'p.value'] <- my.test$p.value
results[
<- t.test(bdnf_n ~ class, data = dt)
my.test 3, 'protein'] <- "itsn1_n"
results[3, 'method'] <- my.test$method
results[3, 'fold'] <- my.test$estimate[2]/my.test$estimate[1]
results[3, 'lower.ci'] <- my.test$conf.int[1]
results[3, 'upper.ci'] <- my.test$conf.int[2]
results[3, 'p.value'] <- my.test$p.value
results[
results
protein method difference lower.ci upper.ci p.value
1 dyrk1a_n Welch Two Sample t-test NA -0.23312987 0.04601195 0.1709624
2 itsn1_n Welch Two Sample t-test NA -0.18045624 0.10555618 0.5830175
3 itsn1_n Welch Two Sample t-test NA -0.04470177 0.03015822 0.6817885
fold
1 1.177958
2 1.049305
3 1.023806
Even though we do not need to check each output individually, we still need to code our testing procedure for each protein. The procedure is the same for all proteins, i.e., we have a recipe for all of them.
How can we simplify this procedure?
For loops
When we have repeated tasks, we can use for loop to execute a recipe several times.
for (i in 1:10){
iprint(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
The for loop is controlled by a counter that will help us to run our analysis for all the proteins.
Therefore, we need to choose the limits of our counter:
names(dt)
[1] "mouse_id" "genotype" "treatment" "behavior"
[5] "class" "dyrk1a_n" "itsn1_n" "bdnf_n"
[9] "nr1_n" "nr2a_n" "p_akt_n" "p_braf_n"
[13] "p_camkii_n" "p_creb_n" "p_elk_n" "p_erk_n"
[17] "p_jnk_n" "pkca_n" "p_mek_n" "p_nr1_n"
[21] "p_nr2a_n" "p_nr2b_n" "p_pkcab_n" "p_rsk_n"
[25] "akt_n" "braf_n" "camkii_n" "creb_n"
[29] "elk_n" "erk_n" "gsk3b_n" "jnk_n"
[33] "mek_n" "trka_n" "rsk_n" "app_n"
[37] "bcatenin_n" "sod1_n" "mtor_n" "p38_n"
[41] "p_mtor_n" "dscr1_n" "ampka_n" "nr2b_n"
[45] "p_numb_n" "raptor_n" "tiam1_n" "p_p70s6_n"
[49] "numb_n" "p70s6_n" "p_gsk3b_n" "p_pkcg_n"
[53] "cdk5_n" "s6_n" "adarb1_n" "acetyl_h3k9_n"
[57] "rrp1_n" "bax_n" "arc_n" "erbb4_n"
[61] "n_nos_n" "tau_n" "gfap_n" "glu_r3_n"
[65] "glu_r4_n" "il1b_n" "p3525_n" "p_casp9_n"
[69] "psd95_n" "snca_n" "ubiquitin_n" "p_gsk3b_tyr216_n"
[73] "shh_n" "bad_n" "bcl2_n" "p_s6_n"
[77] "p_cfos_n" "syp_n" "h3ac_k18_n" "egr1_n"
[81] "h3me_k4_n" "ca_na_n"
ncol(dt)
[1] 82
We found that our data.frame r, d
has 82 columns and proteins are stored starting at column 6. Let’s double check it:
<- names(dt)[6:82] protein_names
The second step is to incorporate the counter in our test procedure:
# One way
t.test(dyrk1a_n ~ class, data = dt)
Welch Two Sample t-test
data: dyrk1a_n by class
t = -1.4507, df = 12.781, p-value = 0.171
alternative hypothesis: true difference in means between group t-CS-s and group t-CS-m is not equal to 0
95 percent confidence interval:
-0.23312987 0.04601195
sample estimates:
mean in group t-CS-s mean in group t-CS-m
0.5257350 0.6192939
t.test(get(protein_names[1]) ~ class, data = dt)
Welch Two Sample t-test
data: get(protein_names[1]) by class
t = -1.4507, df = 12.781, p-value = 0.171
alternative hypothesis: true difference in means between group t-CS-s and group t-CS-m is not equal to 0
95 percent confidence interval:
-0.23312987 0.04601195
sample estimates:
mean in group t-CS-s mean in group t-CS-m
0.5257350 0.6192939
Finally, we can obtain t-test result for all proteins:
for (i in 1:length(protein_names)){
<- t.test(get(protein_names[i]) ~ class, data = dt)
my.test 'protein'] <- names(dt)[i]
results[i, 'method'] <- my.test$method
results[i, 'fold'] <- my.test$estimate[2]/my.test$estimate[1]
results[i, 'lower.ci'] <- my.test$conf.int[1]
results[i, 'upper.ci'] <- my.test$conf.int[2]
results[i, 'p.value'] <- my.test$p.value
results[i,
}
results
protein method difference lower.ci
1 mouse_id Welch Two Sample t-test NA -0.2331298679
2 genotype Welch Two Sample t-test NA -0.1804562433
3 treatment Welch Two Sample t-test NA -0.0447017691
4 behavior Welch Two Sample t-test NA -0.2705101262
5 class Welch Two Sample t-test NA -0.5979478852
6 dyrk1a_n Welch Two Sample t-test NA -0.0346190608
7 itsn1_n Welch Two Sample t-test NA -0.0269768286
8 bdnf_n Welch Two Sample t-test NA -1.9282203775
9 nr1_n Welch Two Sample t-test NA -0.0155205246
10 nr2a_n Welch Two Sample t-test NA -0.2963585134
11 p_akt_n Welch Two Sample t-test NA -0.3320380355
12 p_braf_n Welch Two Sample t-test NA -0.0273300908
13 p_camkii_n Welch Two Sample t-test NA -0.0739802326
14 p_creb_n Welch Two Sample t-test NA -0.0392067366
15 p_elk_n Welch Two Sample t-test NA -0.0860347851
16 p_erk_n Welch Two Sample t-test NA -0.0667282489
17 p_jnk_n Welch Two Sample t-test NA -0.0844160803
18 pkca_n Welch Two Sample t-test NA -0.5138528237
19 p_mek_n Welch Two Sample t-test NA -0.1222570633
20 p_nr1_n Welch Two Sample t-test NA -0.0067605346
21 p_nr2a_n Welch Two Sample t-test NA -0.2416263658
22 p_nr2b_n Welch Two Sample t-test NA -0.0173320972
23 p_pkcab_n Welch Two Sample t-test NA -0.0307405828
24 p_rsk_n Welch Two Sample t-test NA -0.0155502188
25 akt_n Welch Two Sample t-test NA -0.0432998824
26 braf_n Welch Two Sample t-test NA -0.2497711413
27 camkii_n Welch Two Sample t-test NA -0.0362414598
28 creb_n Welch Two Sample t-test NA -0.0326955749
29 elk_n Welch Two Sample t-test NA -0.0905849541
30 erk_n Welch Two Sample t-test NA -0.0252604320
31 gsk3b_n Welch Two Sample t-test NA -0.0322959168
32 jnk_n Welch Two Sample t-test NA -0.3143189237
33 mek_n Welch Two Sample t-test NA -0.0209624060
34 trka_n Welch Two Sample t-test NA -0.0659138759
35 rsk_n Welch Two Sample t-test NA -0.0772767206
36 app_n Welch Two Sample t-test NA -0.1823801622
37 bcatenin_n Welch Two Sample t-test NA -0.1446787837
38 sod1_n Welch Two Sample t-test NA -0.0309708918
39 mtor_n Welch Two Sample t-test NA -0.1302304291
40 p38_n Welch Two Sample t-test NA -0.0178488114
41 p_mtor_n Welch Two Sample t-test NA -0.0362763677
42 dscr1_n Welch Two Sample t-test NA -0.0205825562
43 ampka_n Welch Two Sample t-test NA -0.2825885243
44 nr2b_n Welch Two Sample t-test NA 0.0157199341
45 p_numb_n Welch Two Sample t-test NA -0.0134353480
46 raptor_n Welch Two Sample t-test NA -0.0106127303
47 tiam1_n Welch Two Sample t-test NA -0.7380428178
48 p_p70s6_n Welch Two Sample t-test NA -0.0052828003
49 numb_n Welch Two Sample t-test NA 0.0290637423
50 p70s6_n Welch Two Sample t-test NA -0.1243853162
51 p_gsk3b_n Welch Two Sample t-test NA -0.1107094505
52 p_pkcg_n Welch Two Sample t-test NA -0.0065125486
53 cdk5_n Welch Two Sample t-test NA -0.0045686958
54 s6_n Welch Two Sample t-test NA -0.0025348467
55 adarb1_n Welch Two Sample t-test NA -0.0025563687
56 acetyl_h3k9_n Welch Two Sample t-test NA -0.0389455864
57 rrp1_n Welch Two Sample t-test NA 0.0042160320
58 bax_n Welch Two Sample t-test NA -0.0016815711
59 arc_n Welch Two Sample t-test NA -0.0546651271
60 erbb4_n Welch Two Sample t-test NA -0.0203556283
61 n_nos_n Welch Two Sample t-test NA -0.0644190458
62 tau_n Welch Two Sample t-test NA -0.0064086797
63 gfap_n Welch Two Sample t-test NA -0.1796378789
64 glu_r3_n Welch Two Sample t-test NA -0.2462807431
65 glu_r4_n Welch Two Sample t-test NA 0.0001674341
66 il1b_n Welch Two Sample t-test NA -0.0033679507
67 p3525_n Welch Two Sample t-test NA -0.1223429871
68 p_casp9_n Welch Two Sample t-test NA -0.0149151957
69 psd95_n Welch Two Sample t-test NA -0.0280898396
70 snca_n Welch Two Sample t-test NA -0.0320131410
71 ubiquitin_n Welch Two Sample t-test NA -0.0025348467
72 p_gsk3b_tyr216_n Welch Two Sample t-test NA -0.0210010435
73 shh_n Welch Two Sample t-test NA -0.0555990810
74 bad_n Welch Two Sample t-test NA -0.0413025934
75 bcl2_n Welch Two Sample t-test NA -0.0285559844
76 p_s6_n Welch Two Sample t-test NA -0.0326247170
77 p_cfos_n Welch Two Sample t-test NA -0.3071693896
upper.ci p.value fold
1 0.0460119490 0.17096237 1.1779584
2 0.1055561832 0.58301753 1.0493051
3 0.0301582199 0.68178851 1.0238059
4 0.2466411290 0.92202879 1.0054630
5 0.4957053361 0.84083031 1.0145444
6 0.0363075174 0.95878461 0.9960636
7 0.0086549884 0.28648605 1.0555899
8 0.6624227414 0.30820054 1.2542884
9 0.0288115052 0.52768963 0.9683609
10 0.2051517151 0.69868932 1.0300358
11 0.1221255918 0.32787788 1.1509789
12 0.0735824390 0.33884331 0.9230022
13 0.0529665091 0.72784487 1.0339358
14 0.0392265539 0.99956602 0.9999606
15 0.0721719723 0.85226744 1.0090652
16 0.2318006574 0.25519944 0.8718995
17 0.2273953591 0.34195169 0.9529007
18 0.6530769253 0.80175541 0.9604331
19 0.0360545512 0.26236721 1.1031920
20 0.1176030625 0.07661619 0.9133774
21 0.0392296120 0.14140883 1.2317234
22 0.0548473313 0.28342038 0.9460332
23 0.0047249986 0.13724037 1.0788989
24 0.3613179241 0.06919609 0.8600740
25 0.8520577885 0.07230040 0.8501262
26 0.1590605211 0.64040074 1.0367707
27 0.0184263865 0.49568647 1.0383236
28 0.0404492553 0.82243203 0.9852527
29 0.1271091566 0.72403495 0.9741872
30 0.0210173122 0.84569251 1.0133425
31 0.0467994810 0.69981707 0.9835753
32 0.2335883385 0.75543602 1.0202136
33 0.0369135384 0.56220321 0.9747409
34 0.0461734149 0.70497816 1.0254243
35 0.0004535698 0.05237683 1.1243384
36 0.1348572462 0.73391203 1.0371396
37 0.1474750541 0.98272222 0.9974512
38 0.0392669126 0.80364551 0.9875228
39 0.0571142791 0.39608340 1.0778105
40 0.0693980438 0.22482964 0.9291072
41 0.0224140814 0.61588302 1.0254583
42 0.0585778227 0.31868273 0.9532668
43 0.0626525919 0.19243129 1.3896551
44 0.0490595036 0.00145614 0.8443056
45 0.2024389589 0.08139178 0.9064139
46 0.0166155719 0.64307364 0.9824453
47 0.9529982871 0.78589290 0.9414939
48 0.0526384247 0.10080188 0.9246927
49 0.1823120279 0.01095842 0.8193837
50 0.6220825158 0.17087390 0.8050211
51 0.1582619688 0.70085366 0.8895628
52 0.0315959407 0.17084854 0.9265600
53 0.0233249553 0.16595340 0.9487837
54 0.0088503071 0.25386754 0.9716423
55 0.0111711428 0.19499877 0.9716673
56 0.0084273575 0.17327109 1.0935715
57 0.0545598842 0.02532592 0.8682246
58 0.0241327252 0.08136702 0.9131982
59 -0.0035917911 0.02850637 1.1482054
60 0.0059108936 0.25475983 1.0611046
61 0.0353151810 0.53428269 1.0300597
62 0.0458243812 0.12261580 0.9342263
63 0.2459994882 0.74080292 0.9777569
64 0.1339163459 0.53592647 1.0259821
65 0.0307742115 0.04787848 0.8992633
66 0.2171506051 0.05639016 0.9104577
67 0.0366702965 0.26327183 1.0510480
68 0.0253360811 0.58581422 0.9765448
69 0.0287135928 0.98116205 0.9979330
70 0.0283752762 0.89521653 1.0139717
71 0.0088503071 0.25386754 0.9716423
72 0.0094187909 0.41623719 1.0474796
73 0.0955776052 0.57319656 0.9538979
74 0.0590013406 0.70255575 0.9441286
75 0.0201861158 0.71119135 1.0264577
76 0.0536438551 0.59785341 0.9450997
77 0.1457542231 0.45176934 1.0519811
We can increase the complexity of our procedure such as including normality and homoscedasticity tests. Furthermore, we can replace the t-test for more conservative test such as Wilcox or Brunner-Munzel.
Volcano plots
We often use a volcano plot to visualize p-values:
<- results %>%
results mutate(label = ifelse(p.value < 0.05, "S", "NS"),
log_pvalue = -log10(p.value))
ggplot(results, aes(x = fold, y = log_pvalue, color = label)) +
geom_point() +
geom_hline(yintercept = -log10(0.05), linetype = "dotted") +
geom_text(data = results %>% filter(label == "S"),
aes(x = fold, y = log_pvalue,
label = protein),
nudge_x = 0.05,
nudge_y = 0.05) +
labs(x = "Fold Change", y = "-log(p value)") +
theme_bw() +
theme(legend.position = "none")
If we want a box around the protein names:
ggplot(results, aes(x = fold, y = log_pvalue, color = label)) +
geom_point() +
geom_hline(yintercept = -log10(0.05), linetype = "dotted") +
geom_label(data = results %>% filter(label == "S"),
aes(x = fold, y = log_pvalue,
label = protein),
nudge_x = 0.05,
nudge_y = 0.10) +
labs(x = "Fold Change", y = "-log(p value)") +
theme_bw() +
theme(legend.position = "none")
Multiple comparisons corrections
Do not forget to apply p value adjustments:
<- results %>%
results mutate(p.value_holm = p.adjust(p.value, method = "holm"),
log_pvalue_holm = -log10(p.value_holm)) %>%
mutate(p.value_fdr = p.adjust(p.value, method = "BH"),
log_pvalue_fdr= -log10(p.value_fdr))
ggplot(results, aes(x = fold, y = log_pvalue_holm)) +
geom_point() +
geom_hline(yintercept = -log10(0.05), linetype = "dotted") +
labs(x = "Fold Change", y = "-log(p value)") +
theme_bw() +
theme(legend.position = "none")
ggplot(results, aes(x = fold, y = log_pvalue_fdr)) +
geom_point() +
geom_hline(yintercept = -log10(0.05), linetype = "dotted") +
labs(x = "Fold Change", y = "-log(p value)") +
theme_bw() +
theme(legend.position = "none")
Holm’s correction controls the Family Wise Error Rate (FWER), consequently, it also controls the False Discovery Rate (FDR). However, Benjamin and Hochberg’s correction only controls FDR, which might be acceptable if validation experiments are planned.