library(tidyverse)
library(fs)
library(here)
# plotting helpers
library(cowplot)
library(patchwork)
library(ggbeeswarm)
theme_set(theme_cowplot())
# Okabe Ito color scheme with amber for yellow; see https://easystats.github.io/see/reference/scale_color_okabeito.html
colors_oi <- grDevices::palette.colors()
colors_oi['yellow'] <- "#F5C710"
Experiment described in Drive here
Note that the file extension indicates a CSV file, but the file is not a properly formulated tabular file. So our first task is to parse the data in this file. In addition, there are different sets of data in this file, which we will need to understand and document.
It will also be helfpul to plot a plate-based view of the florescence values to compare to our expected plate layout. We’ll also want to import the info for each well, to link to the data
fn <- here('_data/nao/2023-04-24-stock-quantification/QuantIt Protocol_230425_1137.csv')
data_raw <- read_lines(fn)
data_raw %>% head(20)
[1] "Testname: QuantIt Protocol"
[2] "Date: 4/25/2023 Time: 11:28:38 AM"
[3] "ID1: 107960 ID2: ID3: "
[4] "No. of Channels / Multichromatics: 21"
[5] "No. of Intervals: 1"
[6] "Configuration: Fluorescence"
[7] "Used filter settings and gain values:"
[8] " No. of scan procedures: 1 "
[9] " 1: 480-10/510-10 --> 480-10/530-10 2598"
[10] " 1: No. of scan values: 21 "
[11] " 2: - "
[12] " 2: - "
[13] "Focal height [mm]: 13.5"
[14] "End_of_header"
[15] ""
[16] "Chromatic: 1"
[17] "Interval: 1"
[18] "Time [s]: 0"
[19] "A01:\t151015.0000"
[20] "A02:\t154240.0000"
The data starts with a header, which ends with a line ‘End_of_header’, adn then has multiple blocks of data delimited by blank lines.
Each block of data starts with three lines giving values for Chromatic, Intervla, and Time variables, and then gives the values for each well.
The values for the well are in the format [Well id]:\t[Value]
.
Our process for importing the data might look like:
Let’s work within a tibble,
data_raw_tbl <- tibble(raw = data_raw)
First, I’ll collapse the different data blocks into separate data frames,
I’ll save the header information in a separate data frame in case we wish to pull any info from this later on.
Next, I’ll define a function to import a data block,
import_block <- function(x) {
# metadata is contained in the first three columns, in the format
# 'Chromatic: 1'
meta <- x %>%
slice_head(n = 3) %>%
separate(col = raw, into = c('name', 'value'), sep = ": ") %>%
pivot_wider() %>%
janitor::clean_names()
# main data follows, in the format "A01:\t151015.0000"
main <- x %>%
slice_tail(n = -3) %>%
separate(col = raw, into = c('well', 'value'), sep = ":\t") %>%
mutate(across(value, as.numeric))
crossing(meta, main)
}
test_block <- data_nest %>% filter(block == 1) %>% pull(data) %>% .[[1]]
import_block(test_block)
# A tibble: 96 × 5
chromatic interval time_s well value
<chr> <chr> <chr> <chr> <dbl>
1 1 1 0 A01 151015
2 1 1 0 A02 154240
3 1 1 0 A03 1400
4 1 1 0 A04 7566
5 1 1 0 A05 1363
6 1 1 0 A06 1557
7 1 1 0 A07 1521
8 1 1 0 A08 1444
9 1 1 0 A09 1605
10 1 1 0 A10 1681
# … with 86 more rows
Process all data blocks into one data frame,
data_proc <- data_nest %>%
filter(block != 0) %>%
mutate(data = map(data, import_block)) %>%
unnest(data) %>%
separate(well, into = c('row', 'column'), sep = 1, remove = FALSE) %>%
ungroup %>%
mutate(
across(chromatic, ~fct_inseq(.x, ordered = TRUE)),
# for column, coerce to integer first to remove leading zeros
across(column, ~as.integer(.x) %>% ordered),
across(row, ordered, levels = LETTERS[1:8]),
)
ss <- 'https://docs.google.com/spreadsheets/d/1HXdFckwlqdy4NQ0NSfUcB9_nN-n3xPJH2fAsK9Mi-Uc'
meta <- googlesheets4::read_sheet(ss, sheet = 2, na = 'NA') %>% glimpse
Rows: 13
Columns: 7
$ well <chr> "A01", "B01", "C01", "D01", "E01", "A02", "B02",…
$ name_ari <chr> "Std 1", "Std 2", "Std 3", "Std 4", "Std 5", "St…
$ type <chr> "standard", "standard", "standard", "standard", …
$ vol_te <dbl> 0, 50, 90, 99, 100, 0, 50, 90, 99, 100, 98, 98, …
$ vol_na <dbl> 100, 50, 10, 1, 0, 100, 50, 10, 1, 0, 2, 2, 2
$ vol_reagent <dbl> 100, 100, 100, 100, 100, 100, 100, 100, 100, 100…
$ conc_ari <dbl> 1000, 500, 100, 10, 0, 1000, 500, 100, 10, 0, NA…
Visualize the first chromatic
data_proc %>%
filter(chromatic == 1) %>%
mutate(
# Reverse the row order so that the plate is oriented correctly in the plot
across(row, fct_rev)
) %>%
ggplot(aes(y = row, x = column, fill = log10(value))) +
coord_fixed() +
geom_tile() +
geom_text(aes(label = log10(value) %>% round(1)), size = 5, color = 'white') +
# scale_fill_brewer(type = "qual", palette = 2) +
labs(title = "Plate layout") +
theme(
legend.position = "bottom"
)
data_proc %>%
filter(column %in% c(1, 2, 3, 4)) %>%
ggplot(aes(x = value, y = well, color = chromatic)) +
scale_x_log10() +
geom_point()
Check for the standard curve
p1 <- data_proc %>%
filter(type == 'standard') %>%
ggplot(aes(x = conc_ari, y = value, color = chromatic)) +
geom_point()
p2 <- p1 +
scale_x_log10() +
scale_y_log10()
p1 / p2
Now let’s look at the SC for each chromatic, with the (geometric) mean value of the target sample marked.
chromatic | value |
---|---|
1 | 7522 |
2 | 7816 |
3 | 8295 |
4 | 8340 |
5 | 8380 |
6 | 8712 |
7 | 8827 |
8 | 9177 |
9 | 9431 |
10 | 9475 |
11 | 9564 |
12 | 9415 |
13 | 9415 |
14 | 9146 |
15 | 9567 |
16 | 9401 |
17 | 9624 |
18 | 9610 |
19 | 9738 |
20 | 9998 |
21 | 10174 |
data_proc %>%
filter(type == 'standard') %>%
ggplot(aes(x = conc_ari, y = value, color = chromatic)) +
facet_wrap(~chromatic) +
scale_x_log10() +
scale_y_log10() +
geom_hline(data = target_mean, aes(yintercept = value), linetype = 3) +
geom_point()
Note, I have not estimated and subtracted the background; possibly should, but want to understand why this is needed/suggested.
From these plots, it does seem like there is a non-linear relationship between the concentration and the flourescence signal.
We should not see this if the signal was RFU = background + slope * concentration
.
Simply subtracting the value of the blank from each sample will not fix this issue.
For each chromatic, the mean florescence value for the target sample appears in the non-linear part of the standard curve. To me, this suggests that the SC seen here may not be appropriate for a target sample in this range. The target sample was diluted 50-fold; but if it were not diluted, it would be in a good part of the SC.
The assay manual shows using a linear fit of non-transformed values. Is this appropriate?
TODO Question - what are the other chromatics?
TODO Baseline subtraction
TODO Standard curve estimation
TODO Power analysis - value of replicates for the SC and the target (?)