library(tidyverse)
library(fs)
library(here)
library(knitr)
library(broom)
# plotting helpers
library(cowplot)
library(patchwork)
library(ggbeeswarm)
theme_set(theme_cowplot())
# Okabe Ito color scheme with amber for yellow; see https://easystats.github.io/see/reference/scale_color_okabeito.html
colors_oi <- grDevices::palette.colors()
colors_oi['yellow'] <- "#F5C710"
# Custon qPCR helpers
source('_functions.R')
We are interested in using some old stocks of phagemid tracers as spike-in controls for an upcoming experiment. For that, we need an estimate of the tracer concentration in the stocks, in terms of Cq values and/or particle concentration.
Ari did two qPCRs on 2023-05-31; see the experiment folder in Drive.
Store in local data file for now; could perhaps put in top level data directory.
url <- "https://docs.google.com/spreadsheets/d/1VV7B6m-gy-VqTVpi9feP0f-8QUd5ZLNR"
dir_create('_data')
data_file <- '_data/5-31 28 Test.xls'
if (!file_exists(data_file)) {
googledrive::drive_download(url, path = data_file, overwrite = FALSE)
}
Import and clean as in previous analyses,
Rows: 9
Columns: 21
$ well <int> 1, 2, 3, 13, 14, 15, 25, 26, 27
$ well_position <chr> "A1", "A2", "A3", "B1", "B2", "B3", "…
$ sample_name <chr> "Blank", "NTC_10", "28", "Blank", "NT…
$ target_name <fct> Blank, B1_NTC, 10, Blank, B1_NTC, 10,…
$ ct <dbl> 33.604317, NA, 4.515600, 6.221345, NA…
$ ct_mean <chr> "16.076345443725586", NA, "4.59870433…
$ ct_sd <chr> "15.218825340270996", NA, "0.08602836…
$ automatic_ct_threshold <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FA…
$ ct_threshold <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.…
$ automatic_baseline <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…
$ baseline_start <int> 3, 3, 3, 3, 3, 3, 3, 3, 3
$ baseline_end <int> 27, 39, 3, 6, 20, 3, 6, 39, 3
$ amp_status <chr> "No Amp", "No Amp", "Amp", "Inconclus…
$ comments <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA
$ cq_conf <dbl> 0.0000000, 0.0000000, 0.9364234, 0.24…
$ cqconf <chr> "Y", "Y", "N", "Y", "Y", "N", "Y", "Y…
$ expfail <chr> "N", "Y", "N", "N", "N", "N", "N", "Y…
$ highsd <chr> "Y", "N", "N", "Y", "N", "N", "Y", "N…
$ spike <chr> "N", "N", "N", "N", "N", "N", "Y", "N…
$ noamp <chr> "N", "Y", "N", "N", "Y", "N", "N", "Y…
$ prfdrop <chr> "Y", "N", "N", "Y", "N", "N", "N", "N…
Check the threshold settings,
# A tibble: 1 × 3
automatic_ct_threshold ct_threshold n
<lgl> <dbl> <int>
1 FALSE 0.2 9
ct_threshold = 0.2
Load the amplification data — the relative florescence values (Rn and Delta Rn) — and remove the unused wells (based on those with no sample name).
For future reference, the line at which the data starts can vary between files; here it is 43, but in a previous experiment it was 40.
Rows: 360
Columns: 24
$ well <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ well_position <chr> "A1", "A1", "A1", "A1", "A1", "A1", "…
$ cycle <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12…
$ target_name <chr> "Blank", "Blank", "Blank", "Blank", "…
$ rn <dbl> 3.744115, 3.723282, 3.720500, 3.73922…
$ delta_rn <dbl> -1.644028e-02, -3.789984e-02, -4.1309…
$ sample_name <chr> "Blank", "Blank", "Blank", "Blank", "…
$ ct <dbl> 33.60432, 33.60432, 33.60432, 33.6043…
$ ct_mean <chr> "16.076345443725586", "16.07634544372…
$ ct_sd <chr> "15.218825340270996", "15.21882534027…
$ automatic_ct_threshold <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FA…
$ ct_threshold <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.…
$ automatic_baseline <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…
$ baseline_start <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
$ baseline_end <int> 27, 27, 27, 27, 27, 27, 27, 27, 27, 2…
$ amp_status <chr> "No Amp", "No Amp", "No Amp", "No Amp…
$ comments <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ cq_conf <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cqconf <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y…
$ expfail <chr> "N", "N", "N", "N", "N", "N", "N", "N…
$ highsd <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y…
$ spike <chr> "N", "N", "N", "N", "N", "N", "N", "N…
$ noamp <chr> "N", "N", "N", "N", "N", "N", "N", "N…
$ prfdrop <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y…
And get the baseline coordinates for plotting,
Rows: 18
Columns: 23
$ well <int> 1, 1, 2, 2, 3, 3, 13, 13, 14, 14, 15,…
$ well_position <chr> "A1", "A1", "A2", "A2", "A3", "A3", "…
$ sample_name <chr> "Blank", "Blank", "NTC_10", "NTC_10",…
$ target_name <fct> Blank, Blank, B1_NTC, B1_NTC, 10, 10,…
$ ct <dbl> 33.604317, 33.604317, NA, NA, 4.51560…
$ ct_mean <chr> "16.076345443725586", "16.07634544372…
$ ct_sd <chr> "15.218825340270996", "15.21882534027…
$ automatic_ct_threshold <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FA…
$ ct_threshold <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.…
$ automatic_baseline <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…
$ amp_status <chr> "No Amp", "No Amp", "No Amp", "No Amp…
$ comments <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ cq_conf <dbl> 0.0000000, 0.0000000, 0.0000000, 0.00…
$ cqconf <chr> "Y", "Y", "Y", "Y", "N", "N", "Y", "Y…
$ expfail <chr> "N", "N", "Y", "Y", "N", "N", "N", "N…
$ highsd <chr> "Y", "Y", "N", "N", "N", "N", "Y", "Y…
$ spike <chr> "N", "N", "N", "N", "N", "N", "N", "N…
$ noamp <chr> "N", "N", "Y", "Y", "N", "N", "N", "N…
$ prfdrop <chr> "Y", "Y", "N", "N", "N", "N", "Y", "Y…
$ baseline_boundary <chr> "start", "end", "start", "end", "star…
$ cycle <int> 3, 27, 3, 39, 3, 3, 3, 6, 3, 20, 3, 3…
$ rn <dbl> 3.720500, 3.710590, 5.606404, 5.66603…
$ delta_rn <dbl> -0.0413092077, -0.0662675351, -0.0023…
# A tibble: 3 × 2
sample_name n
<chr> <int>
1 28 3
2 Blank 3
3 NTC_10 3
delta_rn_min <- 1e-3
p1 <- amp %>%
filter(!is.na(target_name)) %>%
ggplot(aes(cycle, pmax(delta_rn, delta_rn_min), color = target_name)) +
geom_line(aes(group = well)) +
geom_hline(yintercept = ct_threshold, alpha = 0.3) +
scale_color_brewer(type = 'qual') +
geom_point(data = baselines, aes(shape = baseline_boundary), size = 3) +
scale_shape_manual(values = c(1, 4)) +
labs(y = 'Delta Rn', x = 'Cycle', color = 'Target')
p2 <- p1 +
scale_y_log10()
p1 / p2
Note the weird stuff going on with the blanks, particularly the one that appears to grow steadily. This may be an artefact of baseline subtraction, so let’s look at the raw Rn.
p3 <- amp %>%
filter(!is.na(target_name)) %>%
ggplot(aes(cycle, rn, color = target_name)) +
geom_line(aes(group = well)) +
scale_color_brewer(type = 'qual') +
geom_point(data = baselines, aes(shape = baseline_boundary), size = 3) +
scale_shape_manual(values = c(1, 4)) +
labs(y = 'Rn', x = 'Cycle', color = 'Target')
p4 <- p3 +
scale_y_log10()
p4 / p2
Comparing Delta Rn to Rn shows that the blanks didn,t have amplication and the apparent growth is just an artefact of baseline subtraction, as I hypothesized.
Note, the dRn trajectories of the target samples are already saturating at the threshold point.
delta_rn_min <- 1e-3
p1 <- amp %>%
filter(target_name == '10', cycle <= 8) %>%
pivot_longer(c(rn, delta_rn)) %>%
ggplot(aes(cycle, value)) +
facet_wrap(~name, scales = 'free_y') +
geom_line(aes(group = well)) +
# geom_hline(yintercept = expected_ct_threshold, alpha = 0.3) +
scale_color_brewer(type = 'qual') +
scale_shape_manual(values = c(1, 4)) +
labs(y = '(Delta) Rn', x = 'Cycle')
p2 <- p1 +
scale_y_log10()
p1 / p2
I’m confused about why the rn curves aren’t being affected by the log scaling; something seems to be off with the faceting and y-axis scaling interaction.
I’m concerned that baseline subtraction may not be functioning properly given the high amount of phagemid that is present.
The Ct mean as reported by the software is 4.599.
Store in local data file for now; could perhaps put in top level data directory.
url <- "https://docs.google.com/spreadsheets/d/1NdmA43mYqXuHeu3E_QJJ40NijfXPVWwc"
data_file <- '_data/5-31 28 Test 2.xls'
if (!file_exists(data_file)) {
googledrive::drive_download(url, path = data_file, overwrite = FALSE)
}
Import and clean as in previous analyses. Here, some sample names are numerical and indicate the dilution factor, so I’ll parse those.
results <- data_file %>%
read_qpcr_results %>%
# Remove some unused columns
select(-c(omit, task, reporter, quencher, starts_with('quantity'),
y_intercept:efficiency, target_color)) %>%
mutate(
dilution_factor = str_replace(sample_name, ',', '') %>% as.numeric,
) %>%
glimpse
Rows: 21
Columns: 20
$ well <int> 1, 2, 3, 4, 5, 6, 7, 13, 14, 15, 16, …
$ well_position <chr> "A1", "A2", "A3", "A4", "A5", "A6", "…
$ sample_name <chr> "Blank", "NTC_10", "1", "10", "100", …
$ target_name <fct> Blank, B1_NTC, 10, 10, 10, 10, 10, Bl…
$ ct <dbl> NA, 24.986382, 4.538310, 7.290561, 10…
$ ct_mean <chr> NA, "25.81987190246582", "4.574169158…
$ ct_sd <chr> NA, "0.73780524730682373", "0.0433460…
$ automatic_ct_threshold <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FA…
$ ct_threshold <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.…
$ automatic_baseline <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…
$ baseline_start <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
$ baseline_end <int> 39, 21, 3, 4, 6, 10, 12, 16, 19, 3, 4…
$ amp_status <chr> "Inconclusive", "Amp", "Amp", "Amp", …
$ comments <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ cq_conf <dbl> 0.0000000, 0.9817477, 0.9352701, 0.95…
$ cqconf <chr> "Y", "N", "N", "N", "N", "N", "N", "Y…
$ expfail <chr> "Y", "N", "N", "N", "N", "N", "N", "N…
$ highsd <chr> "N", "Y", "N", "N", "N", "N", "N", "N…
$ spike <chr> "N", "N", "N", "N", "N", "N", "N", "Y…
$ dilution_factor <dbl> NA, NA, 1, 10, 100, 1000, 10000, NA, …
# A tibble: 7 × 3
sample_name dilution_factor n
<chr> <dbl> <int>
1 1 1 3
2 1,000 1000 3
3 10 10 3
4 10,000 10000 3
5 100 100 3
6 Blank NA 3
7 NTC_10 NA 3
Check the threshold settings,
# A tibble: 1 × 3
automatic_ct_threshold ct_threshold n
<lgl> <dbl> <int>
1 FALSE 0.2 21
ct_threshold = 0.2
Check the well breakdown by sample name,
Load the amplification data — the relative florescence values (Rn and Delta Rn) — and remove the unused wells (based on those with no sample name).
For future reference, the line at which the data starts can vary between files; here it is 43, but in a previous experiment it was 40.
amp <- data_file %>%
read_qpcr_amplification(start = 43, results = TRUE) %>%
# Remove some unused columns
select(-c(omit, task, reporter, quencher, starts_with('quantity'),
y_intercept:efficiency, target_color)) %>%
filter(!is.na(sample_name)) %>%
mutate(
dilution_factor = str_replace(sample_name, ',', '') %>% as.numeric,
) %>%
glimpse
Rows: 840
Columns: 23
$ well <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ well_position <chr> "A1", "A1", "A1", "A1", "A1", "A1", "…
$ cycle <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12…
$ target_name <chr> "Blank", "Blank", "Blank", "Blank", "…
$ rn <dbl> 3.925871, 3.929747, 3.918463, 3.89655…
$ delta_rn <dbl> 0.136036664, 0.136512622, 0.121829681…
$ sample_name <chr> "Blank", "Blank", "Blank", "Blank", "…
$ ct <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ ct_mean <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ ct_sd <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ automatic_ct_threshold <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FA…
$ ct_threshold <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.…
$ automatic_baseline <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…
$ baseline_start <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
$ baseline_end <int> 39, 39, 39, 39, 39, 39, 39, 39, 39, 3…
$ amp_status <chr> "Inconclusive", "Inconclusive", "Inco…
$ comments <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ cq_conf <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cqconf <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y…
$ expfail <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y…
$ highsd <chr> "N", "N", "N", "N", "N", "N", "N", "N…
$ spike <chr> "N", "N", "N", "N", "N", "N", "N", "N…
$ dilution_factor <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
And get the baseline coordinates for plotting,
Rows: 42
Columns: 22
$ well <int> 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7…
$ well_position <chr> "A1", "A1", "A2", "A2", "A3", "A3", "…
$ sample_name <chr> "Blank", "Blank", "NTC_10", "NTC_10",…
$ target_name <fct> Blank, Blank, B1_NTC, B1_NTC, 10, 10,…
$ ct <dbl> NA, NA, 24.986382, 24.986382, 4.53831…
$ ct_mean <chr> NA, NA, "25.81987190246582", "25.8198…
$ ct_sd <chr> NA, NA, "0.73780524730682373", "0.737…
$ automatic_ct_threshold <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FA…
$ ct_threshold <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.…
$ automatic_baseline <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…
$ amp_status <chr> "Inconclusive", "Inconclusive", "Amp"…
$ comments <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ cq_conf <dbl> 0.0000000, 0.0000000, 0.9817477, 0.98…
$ cqconf <chr> "Y", "Y", "N", "N", "N", "N", "N", "N…
$ expfail <chr> "Y", "Y", "N", "N", "N", "N", "N", "N…
$ highsd <chr> "N", "N", "Y", "Y", "N", "N", "N", "N…
$ spike <chr> "N", "N", "N", "N", "N", "N", "N", "N…
$ dilution_factor <dbl> NA, NA, NA, NA, 1, 1, 10, 10, 100, 10…
$ baseline_boundary <chr> "start", "end", "start", "end", "star…
$ cycle <int> 3, 39, 3, 21, 3, 3, 3, 4, 3, 6, 3, 10…
$ rn <dbl> 3.918463, 4.017810, 6.315196, 6.36331…
$ delta_rn <dbl> 0.1218296811, 0.0988022089, -0.000196…
Plot with the baselines,
delta_rn_min <- 1e-3
p1 <- amp %>%
filter(!is.na(target_name)) %>%
ggplot(aes(cycle, pmax(delta_rn, delta_rn_min), color = target_name)) +
geom_line(aes(group = well)) +
geom_hline(yintercept = ct_threshold, alpha = 0.3) +
scale_color_brewer(type = 'qual') +
geom_point(data = baselines, aes(shape = baseline_boundary), size = 3) +
scale_shape_manual(values = c(1, 4)) +
labs(y = 'Delta Rn', x = 'Cycle', color = 'Target')
p2 <- p1 +
scale_y_log10()
p1 / p2
results %>%
mutate(rel_conc = 1 / dilution_factor) %>%
ggplot(aes(rel_conc, ct)) +
labs(x = 'Concentration relative to stock') +
scale_x_log10() +
stat_smooth(method = 'lm') +
geom_point()
Examine the linear fit,
Call:
lm(formula = ct ~ rel_conc_log10, data = results1)
Residuals:
Min 1Q Median 3Q Max
-0.43246 -0.16149 0.03244 0.14476 0.44235
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.43792 0.10797 41.10 3.77e-15 ***
rel_conc_log10 -3.28510 0.04408 -74.53 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2414 on 13 degrees of freedom
(6 observations deleted due to missingness)
Multiple R-squared: 0.9977, Adjusted R-squared: 0.9975
F-statistic: 5554 on 1 and 13 DF, p-value: < 2.2e-16
Estimate the efficiency,
rel_conc_log10
1.0156
stan_fit <- stan_glm(
ct ~ rel_conc_log10,
data = results1,
)
stan_fit
stan_glm
family: gaussian [identity]
formula: ct ~ rel_conc_log10
observations: 15
predictors: 2
------
Median MAD_SD
(Intercept) 4.4 0.1
rel_conc_log10 -3.3 0.0
Auxiliary parameter(s):
Median MAD_SD
sigma 0.3 0.1
------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg
Use the posterior samples of the slope param to get the posterior of the efficiency,
slope_post <- rstan::extract(stan_fit$stanfit)$beta
efficiency_post <- 10^(-1/slope_post) - 1
efficiency_post %>% median_hdci(.width = 0.9)
y ymin ymax .width .point .interval
1 1.015849 0.9779378 1.04701 0.9 median hdci
efficiency_post %>% qplot +
labs(x = 'Efficiency', y = 'Posterior density') +
geom_vline(xintercept = median(efficiency_post), color = 'darkred')
The median is slighty larger than 1 and a value of 1 or less is quite plausible. However, it would be useful to think about whether my concern with the undiluted sample above would tend to cause the efficiency estimate to increase or decrease.
notes
In the actual experiment, we used a dilution factor of 20000 for the spike-in stock. Of this spike-in stock, 100 ul was added to each 40 mL sample (an additional 400-fold dilution), for a total dilution factor of 8e6 in the experiment samples relative to the initial stock.
The sample processing added some concentration: On the order of 40 mL of sample was used (with little loss), and eluted into 80 uL, for a concentration factor of ~500. Hence the final elutant is only diluted by a factor of 1.6e4 relative to the initial stock.
What are the corresponding Ct values?
1 2 3
18.56548 27.11276 18.24715
Therefore we expect Ct values of around 18.6 for the spike-in stock, 27.1 for the spiked sample, and 18.2 for the extracted NA solution.