Description Usage Arguments Details Value Direction and magnitude of the log-fold change Blocking on uninteresting factors Regressing out unwanted factors Author(s) References See Also Examples

Perform pairwise Welch t-tests between groups of cells, possibly after blocking on uninteresting factors of variation.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | ```
pairwiseTTests(x, ...)
## S4 method for signature 'ANY'
pairwiseTTests(
x,
groups,
block = NULL,
design = NULL,
restrict = NULL,
exclude = NULL,
direction = c("any", "up", "down"),
lfc = 0,
std.lfc = FALSE,
log.p = FALSE,
gene.names = NULL,
subset.row = NULL,
BPPARAM = SerialParam()
)
## S4 method for signature 'SummarizedExperiment'
pairwiseTTests(x, ..., assay.type = "logcounts")
## S4 method for signature 'SingleCellExperiment'
pairwiseTTests(x, groups = colLabels(x, onAbsence = "error"), ...)
``` |

`x` |
A numeric matrix-like object of normalized log-expression values, where each column corresponds to a cell and each row corresponds to an endogenous gene. Alternatively, a SummarizedExperiment or SingleCellExperiment object containing such a matrix. |

`...` |
For the generic, further arguments to pass to specific methods. For the SummarizedExperiment method, further arguments to pass to the ANY method. For the SingleCellExperiment method, further arguments to pass to the SummarizedExperiment method. |

`groups` |
A vector of length equal to |

`block` |
A factor specifying the blocking level for each cell. |

`design` |
A numeric matrix containing blocking terms for uninteresting factors.
Note that these factors should not be confounded with |

`restrict` |
A vector specifying the levels of |

`exclude` |
A vector specifying the levels of |

`direction` |
A string specifying the direction of log-fold changes to be considered in the alternative hypothesis. |

`lfc` |
A positive numeric scalar specifying the log-fold change threshold to be tested against. |

`std.lfc` |
A logical scalar indicating whether log-fold changes should be standardized. |

`log.p` |
A logical scalar indicating if log-transformed p-values/FDRs should be returned. |

`gene.names` |
Deprecated, use |

`subset.row` |
See |

`BPPARAM` |
A BiocParallelParam object indicating whether and how parallelization should be performed across genes. |

`assay.type` |
A string specifying which assay values to use, usually |

This function performs t-tests to identify differentially expressed genes (DEGs) between pairs of groups of cells.
A typical aim is to use the DEGs to determine cluster identity based on expression of marker genes with known biological activity.
A list of tables is returned where each table contains per-gene statistics for a comparison between one pair of groups.
This can be examined directly or used as input to `combineMarkers`

for marker gene detection.

We use t-tests as they are simple, fast and perform reasonably well for single-cell data (Soneson and Robinson, 2018). However, if one of the groups contains fewer than two cells, no p-value will be reported for comparisons involving that group. A warning will also be raised about insufficient degrees of freedom (d.f.) in such cases.

When `log.p=TRUE`

, the log-transformed p-values and FDRs are reported using the natural base.
This is useful in cases with many cells such that reporting the p-values directly would lead to double-precision underflow.

If `restrict`

is specified, comparisons are only performed between pairs of groups in `restrict`

.
This can be used to focus on DEGs distinguishing between a subset of the groups (e.g., closely related cell subtypes).

If `exclude`

is specified, comparisons are not performed between groups in `exclude`

.
Similarly, if any entries of `groups`

are `NA`

, the corresponding cells are are ignored.

A list is returned containing `statistics`

and `pairs`

.

The `statistics`

element is itself a list of DataFrames.
Each DataFrame contains the statistics for a comparison between a pair of groups,
including the log-fold changes, p-values and false discovery rates.

The `pairs`

element is a DataFrame where each row corresponds to an entry of `statistics`

.
This contains the fields `first`

and `second`

,
specifying the two groups under comparison in the corresponding DataFrame in `statistics`

.

In each DataFrame in `statistics`

, the log-fold change represents the change in the `first`

group compared to the `second`

group.

Log-fold changes are reported as differences in the values of `x`

.
Thus, all log-fold changes have the same base as whatever was used to perform the log-transformation in `x`

.
If `logNormCounts`

was used, this would be base 2.

If `direction="any"`

, two-sided tests will be performed for each pairwise comparisons between groups.
Otherwise, one-sided tests in the specified direction will be used instead.
This can be used to focus on genes that are upregulated in each group of interest, which is often easier to interpret when assigning cell type to a cluster.

To interpret the setting of `direction`

, consider the DataFrame for group X, in which we are comparing to another group Y.
If `direction="up"`

, genes will only be significant in this DataFrame if they are upregulated in group X compared to Y.
If `direction="down"`

, genes will only be significant if they are downregulated in group X compared to Y.

The magnitude of the log-fold changes can also be tested by setting `lfc`

.
By default, `lfc=0`

meaning that we will reject the null upon detecting any differential expression.
If this is set to some other positive value, the null hypothesis will change depending on `direction`

:

If

`direction="any"`

, the null hypothesis is that the true log-fold change is either`-lfc`

or`lfc`

with equal probability. A two-sided p-value is computed against this composite null.If

`direction="up"`

, the null hypothesis is that the true log-fold change is`lfc`

, and a one-sided p-value is computed.If

`direction="down"`

, the null hypothesis is that the true log-fold change is`-lfc`

, and a one-sided p-value is computed.

This is similar to the approach used in `treat`

and allows users to focus on genes with strong log-fold changes.

If `std.lfc=TRUE`

, the log-fold change for each gene is standardized by the variance.
When the Welch t-test is being used, this is equivalent to Cohen's d.
Standardized log-fold changes may be more appealing for visualization as it avoids large fold changes due to large variance.
The choice of `std.lfc`

does not affect the calculation of the p-values.

If `block`

is specified, Welch t-tests are performed between groups within each level of `block`

.
For each pair of groups, the p-values for each gene across all levels of `block`

are combined using Stouffer's weighted Z-score method.
The reported log-fold change for each gene is also a weighted average of log-fold changes across levels.

The weight for a particular level is defined as *(1/N_x + 1/N_y)^{-1}*,
where *Nx* and *Ny* are the number of cells in groups X and Y, respectively, for that level.
This is inversely proportional to the expected variance of the log-fold change, provided that all groups and blocking levels have the same variance.

When comparing two groups, blocking levels are ignored if no p-value was reported, e.g., if there were insufficient cells for a group in a particular level.
This includes levels that contain fewer than two cells for either group, as this cannot yield a p-value from the Welch t-test.
If all levels are ignored in this manner, the entire comparison will only contain `NA`

p-values and a warning will be emitted.

If `design`

is specified, a linear model is instead fitted to the expression profile for each gene.
This linear model will include the `groups`

as well as any blocking factors in `design`

.
A t-test is then performed to identify DEGs between pairs of groups, using the values of the relevant coefficients and the gene-wise residual variance.

Note that `design`

must be full rank when combined with the `groups`

terms,
i.e., there should not be any confounding variables.
We make an exception for the common situation where `design`

contains an `"(Intercept)"`

column,
which is automatically detected and removed (emitting a warning along the way).

We recommend using `block`

instead of `design`

for uninteresting categorical factors of variation.
The former accommodates differences in the variance of expression in each group via Welch's t-test.
As a result, it is more robust to misspecification of the groups, as misspecified groups (and inflated variances) do not affect the inferences for other groups.
Use of `block`

also avoids assuming additivity of effects between the blocking factors and the groups.

Nonetheless, use of `design`

is unavoidable when blocking on real-valued covariates.
It is also useful for ensuring that log-fold changes/p-values are computed for comparisons between all pairs of groups
(assuming that `design`

is not confounded with the groups).
This may not be the case with `block`

if a pair of groups never co-occur in a single blocking level.

Aaron Lun

Whitlock MC (2005).
Combining probability from independent tests: the weighted Z-method is superior to Fisher's approach.
*J. Evol. Biol.* 18, 5:1368-73.

Soneson C and Robinson MD (2018).
Bias, robustness and scalability in single-cell differential expression analysis.
*Nat. Methods*

Lun ATL (2018).
Comments on marker detection in *scran*.
https://ltla.github.io/SingleCellThoughts/software/marker_detection/comments.html

`t.test`

, on which this function is based.

`combineMarkers`

, to combine pairwise comparisons into a single DataFrame per group.

`getTopMarkers`

, to obtain the top markers from each pairwise comparison.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | ```
library(scuttle)
sce <- mockSCE()
sce <- logNormCounts(sce)
# Any clustering method is okay.
kout <- kmeans(t(logcounts(sce)), centers=3)
# Vanilla application:
out <- pairwiseTTests(logcounts(sce), groups=kout$cluster)
out
# Directional with log-fold change threshold:
out <- pairwiseTTests(logcounts(sce), groups=kout$cluster,
direction="up", lfc=0.2)
out
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.