3 Grouping ppp-Hypercolumn
library(groupedHyperframe)
These packages (Note 1) are a one-person project undergoing rapid evolution. Backward compatibility (per Hadley Wickham) is provided as a courtesy rather than a guarantee.
Until further notice, these packages should
- not be used as a basis for research grant applications,
- not be cited as an actively maintained tool in a peer-reviewed manuscript,
- not be used to support or fulfill requirements for pursuing an academic degree.
In addition, work primarily based on these packages (Note 1) should not be presented at academic conferences or similar scholarly venues.
Furthermore, a person’s ability to use these packages (Note 1) does not necessarily imply an understanding of their underlying mechanisms. Accordingly, demonstration of their use alone should not be considered sufficient evidence of expertise, nor should it be credited as a basis for academic promotion or advancement.
These statements do not apply to the contributors (Tip 1) to these packages (Note 1) with respect to their specific contributions.
These statements do not apply when the maintainer of these packages (Note 1), Tingting Zhan, is credited as the first author, the lead author, and/or the corresponding author in a peer-reviewed manuscript, or as the Principal Investigator or Co-Principal Investigator in a research grant application and/or a final research progress report.
These statements are advisory in nature and do not modify or restrict the rights granted under the GNU General Public License https://www.r-project.org/Licenses/.
The examples in Chapter 3 require the package groupedHyperframe, attached by the call library(groupedHyperframe).
In Chapter 3, the author
- creates a grouped hyper data frame with one-and-only-one point-pattern (ppp) hypercolumn (Section 3.1);
- discusses the spatial point-pattern analyses applicable to the point-pattern hypercolumn(s) of a (grouped) hyper data frame (Section 3.2);
- summarizes the outcome(s) (Section 3.3);
- aggregates the summary statistics over the nested grouping structure (Section 3.4).
3.1 Creation
Listing 3.1 creates a grouped hyper data frame s with one-and-only-one point-pattern (ppp, Chapter 36) hypercolumn from the data frame wrobel_lung. This process (Chapter 45)
- takes a data frame wrobel_lung as input;
- creates a point-pattern hypercolumn ppp. from the \(x\)- and \(y\)-coordinates, the numeric mark hladr and the multi-type mark phenotype, per image_id nested within patient_id;
- aggregates other variables of interest, e.g., OS, gender and age, at the level of image_id nested within patient_id. Those variables must be identical within the nested grouping structure ~patient_id/image_id;
- returns a grouped hyper data frame s.
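The requirement that the aggregated variables be identical within ~patient_id/image_id can be checked on a data frame before it is passed to grouped_ppp(). The following base-R sketch illustrates the idea only: the data frame toy and the helper is_constant_within() are hypothetical, and the code neither calls nor reproduces the internals of grouped_ppp().

```r
# Toy data (hypothetical): 2 patients, 3 images, 2 cells per image
toy = data.frame(
  patient_id = c('p1', 'p1', 'p1', 'p1', 'p2', 'p2'),
  image_id   = c('a',  'a',  'b',  'b',  'c',  'c'),
  x          = c(1, 2, 3, 4, 5, 6),
  y          = c(6, 5, 4, 3, 2, 1),
  age        = c(70, 70, 70, 70, 65, 65),  # constant within each image
  hladr      = c(.1, .2, .3, .4, .5, .6)   # varies within each image
)

# Hypothetical helper: is variable `v` constant within every patient_id/image_id cell?
is_constant_within = function(data, v) {
  cell = interaction(data$patient_id, data$image_id, drop = TRUE)
  n_unique = tapply(data[[v]], INDEX = cell, FUN = \(z) length(unique(z)))
  all(n_unique == 1L)
}

age_ok = is_constant_within(toy, 'age')      # TRUE: can be aggregated per image
hladr_ok = is_constant_within(toy, 'hladr')  # FALSE: a cell-level mark, not an image-level variable
```

In this toy example age qualifies as an image-level variable, whereas hladr varies from cell to cell and therefore belongs among the marks of the point-pattern.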
Listing 3.1: ppp-hypercolumn
s = wrobel_lung |>
  grouped_ppp(
    formula = hladr + phenotype ~ OS + gender + age,
    by = ~ patient_id/image_id
  )
Readers may view the grouped hyper data frame s (Listing 3.1) by simply typing s at the R console prompt and pressing Enter (Listing 3.2).
Listing 3.2: s (Listing 3.1)
s
# Grouped Hyper Data Frame: ~patient_id
#
# 3 patient_id
#
# OS gender age patient_id image_id ppp.
# 1 176 M 84 #03 2-080-378 [36953,13765].im3 (ppp)
# 2 176 M 84 #03 2-080-378 [39206,15250].im3 (ppp)
# 3 176 M 84 #03 2-080-378 [40242,17359].im3 (ppp)
# 4 176 M 84 #03 2-080-378 [40863,16444].im3 (ppp)
# 5 3488+ F 85 #01 0-889-121 [40864,18015].im3 (ppp)
# ✂️ --- output truncated --- ✂️
Also, readers may view the summary information of the grouped hyper data frame s (Listing 3.1) using the function summary() (Listing 3.3).
Listing 3.3: s (Listing 3.1)
s |>
  summary()
# Grouped Hyper Data Frame: ~patient_id
#
# 3 patient_id
#
# OS gender age patient_id image_id ppp.
# (Surv) (factor) (numeric) (factor) (factor) (ppp)
# <time-to-event> :(Surv) F: 5 Min. :66.00 #01 0-889-121:5 [36953,13765].im3:1
# [right-censored]:5 M:10 1st Qu.:66.00 #02 1-037-393:5 [39206,15250].im3:1
# [observed] :10 Median :84.00 #03 2-080-378:5 [40242,17359].im3:1
# Mean :78.33 [40863,16444].im3:1
# 3rd Qu.:85.00 [40864,18015].im3:1
# Max. :85.00 [41191,13764].im3:1
# (Other) :9
Readers must note that Chapter 2 and Section 3.1 describe two independent approaches to
- create a grouped hyper data frame, from a data frame (Chapter 2, Listing 2.2, Section 18.1);
- create a grouped hyper data frame with one-and-only-one point-pattern hypercolumn, from a data frame (Section 3.1, Listing 3.1, Chapter 45).
These two approaches are independent and unrelated to each other (Section 57.1).
3.2 Function-Value-Table on Eligible Marks
Listing 3.4 applies multiple functions to the eligible marks in each point-pattern of the point-pattern hypercolumn s$ppp. in the grouped hyper data frame s (Listing 3.1):
- the conditional mean \(E(r)\) of the numeric mark hladr using the function Emark_() (Table 36.22). The results are stored in the function-value-table (fv, Chapter 20) hypercolumn (Chapter 21) $hladr.E of the output grouped hyper data frame;
- the multi-type nearest-neighbor distance distribution function \(G_{\text{CK+.CD8- to CK-.CD8+}}(r)\) of the multi-type mark phenotype using the function Gcross_() (Table 36.24). The results are stored in the function-value-table hypercolumn $phenotype.G of the output grouped hyper data frame;
- the nearest-neighbor distance from CK+.CD8- to CK-.CD8+ marks in the multi-type mark phenotype using the function nncross_() (Table 36.25). The results are stored in the numeric-hypercolumn $phenotype.nnc of the output grouped hyper data frame.
Listing 3.4 substitutes the recommended function values outside the recommended range with the corresponding theoretical values using the function .disrecommend2theo() (Section 20.5.1). The function-value-table hypercolumns $hladr.E and $phenotype.G of the output are replaced with the substituted function-value-table hypercolumns.
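To make the two phenotype quantities above concrete, the following base-R sketch computes nearest-neighbor distances from CK+.CD8- points to CK-.CD8+ points on a made-up point-pattern, and then the uncorrected empirical distribution function of those distances. It is a conceptual illustration that assumes correction = 'none' means no edge correction; the data frame pts is hypothetical, and the code does not call nncross_() or Gcross_().

```r
# Toy marked point-pattern (hypothetical coordinates and types)
pts = data.frame(
  x = c(0, 10, 0, 3, 10),
  y = c(0,  0, 4, 0, 6),
  type = c('CK+.CD8-', 'CK+.CD8-', 'CK-.CD8+', 'CK-.CD8+', 'CK-.CD8+')
)
from = pts[pts$type == 'CK+.CD8-', c('x', 'y')]
to = pts[pts$type == 'CK-.CD8+', c('x', 'y')]

# Euclidean distance from every `from` point (rows) to every `to` point (columns)
d = outer(
  X = seq_len(nrow(from)), Y = seq_len(nrow(to)),
  FUN = \(i, j) sqrt((from$x[i] - to$x[j])^2 + (from$y[i] - to$y[j])^2)
)

# Nearest-neighbor distance from each CK+.CD8- point to the closest CK-.CD8+ point
nn = apply(d, MARGIN = 1L, FUN = min)

# Without edge correction, G(r) is the proportion of those distances that are <= r
r = seq.int(from = 0, to = 250, by = 10)
G = ecdf(nn)(r)
```

The vector nn has one element per CK+.CD8- point, so its length differs between images, whereas G has one value per element of r and is therefore comparable across images.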
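# distances at which the summary functions are evaluated
r = seq.int(from = 0, to = 250, by = 10)
out = s |>
  within(expr = {
    # conditional mean of the numeric mark hladr
    hladr.E = ppp. |>
      Emark_(r = r, correction = 'none') |>
      .disrecommend2theo()
    # nearest-neighbor distance distribution function, CK+.CD8- to CK-.CD8+
    phenotype.G = ppp. |>
      Gcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'none') |>
      .disrecommend2theo()
    # nearest-neighbor distances, CK+.CD8- to CK-.CD8+
    phenotype.nnc = ppp. |>
      nncross_(i = 'CK+.CD8-', j = 'CK-.CD8+', correction = 'none')
  })
out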
# Grouped Hyper Data Frame: ~patient_id
#
# 3 patient_id
#
# OS gender age patient_id image_id ppp. hladr.E phenotype.G phenotype.nnc
# 1 176 M 84 #03 2-080-378 [36953,13765].im3 (ppp) (fv) (fv) (numeric)
# 2 176 M 84 #03 2-080-378 [39206,15250].im3 (ppp) (fv) (fv) (numeric)
# 3 176 M 84 #03 2-080-378 [40242,17359].im3 (ppp) (fv) (fv) (numeric)
# 4 176 M 84 #03 2-080-378 [40863,16444].im3 (ppp) (fv) (fv) (numeric)
# 5 3488+ F 85 #01 0-889-121 [40864,18015].im3 (ppp) (fv) (fv) (numeric)
# ✂️ --- output truncated --- ✂️
3.3 Summarization
3.3.1 of Statistics of Point-Pattern Marks
Listing 3.5 summarizes various customized statistics of the numeric- and/or multi-type marks of the point-pattern hypercolumn s$ppp. in the grouped hyper data frame s (Listing 3.1) using the function aggregate_marks(). The results are stored in the numeric-hypercolumn $markstats of the output (Listing 3.5, Listing 3.6).
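The kind of per-phenotype summary requested by by = hladr ~ phenotype can be illustrated in base R on the marks of a single, made-up point-pattern. This is a sketch under stated assumptions: the data frame marks is hypothetical, the code does not call aggregate_marks(), and the element names it produces are not claimed to match those of the package.

```r
# Toy marks of one point-pattern (hypothetical values)
marks = data.frame(
  hladr = c(1, 3, 2, 6),
  phenotype = factor(c('CK+.CD8-', 'CK+.CD8-', 'CK-.CD8+', 'CK-.CD8+'))
)

# The same summary function as in the listing: mean and standard deviation
FUN = \(z) c(mean = mean(z), sd = sd(z))

# Apply FUN to hladr within each phenotype; result is a named list
by_type = lapply(split(marks$hladr, f = marks$phenotype), FUN = FUN)

# Flatten into one named numeric vector, one element per phenotype-statistic pair
markstats = unlist(by_type)
```

Flattening to a single numeric vector per image is what allows the statistics to be averaged element by element across the images of a patient later on.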
s_markstat = s |>
  within(expr = {
    markstats = ppp. |>
      aggregate_marks(by = hladr ~ phenotype, FUN = \(z) {
        c(mean = mean(z), sd = sd(z))
      }, vectorize = TRUE)
  })
s_markstat
# Grouped Hyper Data Frame: ~patient_id
#
# 3 patient_id
#
# OS gender age patient_id image_id ppp. markstats
# 1 176 M 84 #03 2-080-378 [36953,13765].im3 (ppp) (numeric)
# 2 176 M 84 #03 2-080-378 [39206,15250].im3 (ppp) (numeric)
# 3 176 M 84 #03 2-080-378 [40242,17359].im3 (ppp) (numeric)
# 4 176 M 84 #03 2-080-378 [40863,16444].im3 (ppp) (numeric)
# 5 3488+ F 85 #01 0-889-121 [40864,18015].im3 (ppp) (numeric)
# ✂️ --- output truncated --- ✂️
Listing 3.6: s_markstat$markstats: summarizing customized statistics of point-pattern marks (Listing 3.5)
s_markstat$markstats
# 1:
# CK-.CD8-.hladr.mean CK-.CD8-.hladr.sd CK+.CD8-.hladr.mean CK+.CD8-.hladr.sd CK-.CD8+.hladr.mean CK-.CD8+.hladr.sd
# 0.19611248 0.09574274 0.13157655 0.02220862 0.37042708 0.15887763
#
# 2:
# CK-.CD8-.hladr.mean CK-.CD8-.hladr.sd CK+.CD8-.hladr.mean CK+.CD8-.hladr.sd CK-.CD8+.hladr.mean CK-.CD8+.hladr.sd
# 0.32685110 0.26761137 0.11682620 0.04559459 0.56401579 0.26288830
#
# ✂️ --- output truncated --- ✂️
3.3.2 of fv-Hypercolumns
Listing 3.7 summarizes the function-value-table (fv, Chapter 20) hypercolumns (Chapter 21) out$hladr.E and out$phenotype.G from the batch processes (Listing 3.4),
- by the recommended function values using the function keyval() (Section 20.2). The results are stored in the numeric-hypercolumns $hladr.Ey and $phenotype.Gy of the output grouped hyper data frame;
- by the cumulative average vertical height of the trapezoidal integration of the recommended function values using the function cumvtrapz() (Section 11.2). The results are stored in the numeric-hypercolumns $hladr.E.cumv and $phenotype.G.cumv of the output grouped hyper data frame.
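One way to read "cumulative average vertical height of the trapezoidal integration" is the running trapezoidal area under the curve divided by the width covered so far. The base-R sketch below implements that reading only; the helper cum_avg_height() is hypothetical, and whether cumvtrapz() computes exactly this quantity (or what its argument drop = TRUE does) is an assumption not confirmed here.

```r
# Hypothetical helper: running trapezoidal area divided by the width covered so far
cum_avg_height = function(x, y) {
  # running trapezoidal integral over [x[1], x[k]], for k = 2, 3, ...
  area = cumsum(diff(x) * (head(y, n = -1L) + tail(y, n = -1L)) / 2)
  # average vertical height = area / width
  area / (x[-1L] - x[1L])
}

r = seq.int(from = 0, to = 250, by = 10)

# Constant curve y = 2: the average height is 2 at every r
flat = cum_avg_height(x = r, y = rep(2, times = length(r)))

# Linear curve y = r: the average height over [0, r] is r/2
ramp = cum_avg_height(x = r, y = r)
```

Under this reading, each result has one element fewer than r, because no average height is defined over an interval of zero width.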
out_fv = out |>
  within(expr = {
    hladr.Ey = keyval(hladr.E)
    phenotype.Gy = keyval(phenotype.G)
    hladr.E.cumv = cumvtrapz(hladr.E, drop = TRUE)
    phenotype.G.cumv = cumvtrapz(phenotype.G, drop = TRUE)
  })
out_fv
# Grouped Hyper Data Frame: ~patient_id
#
# 3 patient_id
#
# OS gender age patient_id image_id ppp. hladr.E phenotype.G phenotype.nnc hladr.Ey phenotype.Gy hladr.E.cumv phenotype.G.cumv
# 1 176 M 84 #03 2-080-378 [36953,13765].im3 (ppp) (fv) (fv) (numeric) (numeric) (numeric) (numeric) (numeric)
# 2 176 M 84 #03 2-080-378 [39206,15250].im3 (ppp) (fv) (fv) (numeric) (numeric) (numeric) (numeric) (numeric)
# 3 176 M 84 #03 2-080-378 [40242,17359].im3 (ppp) (fv) (fv) (numeric) (numeric) (numeric) (numeric) (numeric)
# 4 176 M 84 #03 2-080-378 [40863,16444].im3 (ppp) (fv) (fv) (numeric) (numeric) (numeric) (numeric) (numeric)
# 5 3488+ F 85 #01 0-889-121 [40864,18015].im3 (ppp) (fv) (fv) (numeric) (numeric) (numeric) (numeric) (numeric)
# ✂️ --- output truncated --- ✂️
3.3.3 of Quantiles
Listing 3.8 inspects the hypercolumns of the input grouped hyper data frame out (Listing 3.4) and finds the quantiles of
- the numeric-hypercolumn out$phenotype.nnc (Listing 3.4). The results are stored in the numeric-hypercolumn $phenotype.nnc.q of the output grouped hyper data frame;
- the numeric mark hladr in the point-pattern hypercolumn out$ppp. (Section 3.1, Listing 3.1). The results are stored in the numeric-hypercolumn $hladr.q of the output grouped hyper data frame.
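For a single image, the quantile step amounts to reducing a numeric vector of arbitrary length to a fixed number of deciles. The base-R sketch below shows this on a made-up vector nnc (hypothetical values) with stats::quantile() and its default settings; it does not use the hypercolumn methods of the package.

```r
# Toy nearest-neighbor distances for one image (hypothetical values)
nnc = 0:100

# Same probabilities as in the listing: 0%, 10%, ..., 100%
probs = seq.int(from = 0, to = 1, by = .1)

# Eleven deciles, regardless of how many distances the image contributes
q = quantile(nnc, probs = probs)
```

Because every image yields a vector of the same length (here eleven), the quantiles of different images can later be averaged position by position.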
out_q = out |>
  within(expr = {
    hladr.q = quantile(ppp., probs = seq.int(from = 0, to = 1, by = .1))
    phenotype.nnc.q = quantile(phenotype.nnc, probs = seq.int(from = 0, to = 1, by = .1))
  })
out_q
# Grouped Hyper Data Frame: ~patient_id
#
# 3 patient_id
#
# OS gender age patient_id image_id ppp. hladr.E phenotype.G phenotype.nnc hladr.q phenotype.nnc.q
# 1 176 M 84 #03 2-080-378 [36953,13765].im3 (ppp) (fv) (fv) (numeric) (numeric) (numeric)
# 2 176 M 84 #03 2-080-378 [39206,15250].im3 (ppp) (fv) (fv) (numeric) (numeric) (numeric)
# 3 176 M 84 #03 2-080-378 [40242,17359].im3 (ppp) (fv) (fv) (numeric) (numeric) (numeric)
# 4 176 M 84 #03 2-080-378 [40863,16444].im3 (ppp) (fv) (fv) (numeric) (numeric) (numeric)
# 5 3488+ F 85 #01 0-889-121 [40864,18015].im3 (ppp) (fv) (fv) (numeric) (numeric) (numeric)
# ✂️ --- output truncated --- ✂️
3.3.4 of Kernel Density Estimates
Listing 3.9 inspects the hypercolumns of the input grouped hyper data frame out (Listing 3.4) and finds the kernel density estimates of
- the numeric-hypercolumn out$phenotype.nnc (Listing 3.4). The results are stored in the numeric-hypercolumn $phenotype.nnc.kern of the output grouped hyper data frame;
- the numeric mark hladr in the point-pattern hypercolumn out$ppp. (Section 3.1, Listing 3.1). The results are stored in the numeric-hypercolumn $hladr.kern of the output grouped hyper data frame.
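Kernel density estimates from different images are only comparable element by element when they are evaluated on the same grid, which is presumably why a common range from and to is supplied. The base-R sketch below illustrates this with stats::density() on two made-up vectors (hypothetical values); that kerndens() behaves similarly is an assumption, and the code does not call it.

```r
# Toy numeric observations for two images (hypothetical values)
img1 = c(5, 12, 20, 33)
img2 = c(8, 9, 40, 55, 61)

# Common upper bound shared by all images
mdist = max(img1, img2)

# Evaluate each kernel density estimate on the same grid from 0 to mdist
k1 = density(img1, from = 0, to = mdist)
k2 = density(img2, from = 0, to = mdist)
```

Both estimates share identical evaluation points (k1$x and k2$x), so the density values k1$y and k2$y line up position by position even though the two images contribute different numbers of observations.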
# largest nearest-neighbor distance across all images, used as the common upper bound
mdist = out$phenotype.nnc |> unlist() |> max()
out_k = out |>
  within(expr = {
    phenotype.nnc.kern = phenotype.nnc |>
      kerndens(from = 0, to = mdist)
    hladr.kern = ppp. |>
      kerndens(from = 0, to = mdist)
  })
out_k
# Grouped Hyper Data Frame: ~patient_id
#
# 3 patient_id
#
# OS gender age patient_id image_id ppp. hladr.E phenotype.G phenotype.nnc phenotype.nnc.kern hladr.kern
# 1 176 M 84 #03 2-080-378 [36953,13765].im3 (ppp) (fv) (fv) (numeric) (numeric) (numeric)
# 2 176 M 84 #03 2-080-378 [39206,15250].im3 (ppp) (fv) (fv) (numeric) (numeric) (numeric)
# 3 176 M 84 #03 2-080-378 [40242,17359].im3 (ppp) (fv) (fv) (numeric) (numeric) (numeric)
# 4 176 M 84 #03 2-080-378 [40863,16444].im3 (ppp) (fv) (fv) (numeric) (numeric) (numeric)
# 5 3488+ F 85 #01 0-889-121 [40864,18015].im3 (ppp) (fv) (fv) (numeric) (numeric) (numeric)
# ✂️ --- output truncated --- ✂️
3.4 Aggregation
Listing 3.10 aggregates (Section 26.4) the customized statistics s_markstat$markstats (Listing 3.6) of the numeric- and/or multi-type marks of the point-pattern hypercolumn by patient_id using the point-wise mean pmean(), and returns a hyper data frame.
s_markstat |>
  aggregate() |>
  within(expr = {
    markstats = markstats |> do.call(what = pmean)
  })
# Variable(s) image_id removed from aggregation
# Hyperframe:
# OS gender age patient_id ppp. markstats
# 1 3488+ F 85 #01 0-889-121 (ppplist) (numeric)
# 2 1605 M 66 #02 1-037-393 (ppplist) (numeric)
# 3 176 M 84 #03 2-080-378 (ppplist) (numeric)
Listing 3.11 aggregates (Section 26.4) the summarized information from function-value-table hypercolumns (Listing 3.7) by patient_id using the point-wise mean pmean(), and returns a hyper data frame.
out_fv |>
  aggregate() |>
  within(expr = {
    hladr.Ey = hladr.Ey |> do.call(what = pmean)
    phenotype.Gy = phenotype.Gy |> do.call(what = pmean)
    hladr.E.cumv = hladr.E.cumv |> do.call(what = pmean)
    phenotype.G.cumv = phenotype.G.cumv |> do.call(what = pmean)
  })
# Variable(s) image_id removed from aggregation
# Hyperframe:
# OS gender age patient_id ppp. hladr.E phenotype.G phenotype.nnc hladr.Ey phenotype.Gy hladr.E.cumv phenotype.G.cumv
# 1 3488+ F 85 #01 0-889-121 (ppplist) (anylist) (anylist) (anylist) (numeric) (numeric) (numeric) (numeric)
# 2 1605 M 66 #02 1-037-393 (ppplist) (anylist) (anylist) (anylist) (numeric) (numeric) (numeric) (numeric)
# 3 176 M 84 #03 2-080-378 (ppplist) (anylist) (anylist) (anylist) (numeric) (numeric) (numeric) (numeric)
Listing 3.12 aggregates (Section 26.4) the quantiles from the numeric-hypercolumns or the numeric marks of the point-pattern hypercolumn (Listing 3.8) by patient_id using the point-wise mean pmean(), and returns a hyper data frame.
out_q |>
  aggregate() |>
  within(expr = {
    hladr.q = hladr.q |> do.call(what = pmean)
    phenotype.nnc.q = phenotype.nnc.q |> do.call(what = pmean)
  })
# Variable(s) image_id removed from aggregation
# Hyperframe:
# OS gender age patient_id ppp. hladr.E phenotype.G phenotype.nnc hladr.q phenotype.nnc.q
# 1 3488+ F 85 #01 0-889-121 (ppplist) (anylist) (anylist) (anylist) (numeric) (numeric)
# 2 1605 M 66 #02 1-037-393 (ppplist) (anylist) (anylist) (anylist) (numeric) (numeric)
# 3 176 M 84 #03 2-080-378 (ppplist) (anylist) (anylist) (anylist) (numeric) (numeric)
Listing 3.13 aggregates (Section 26.4) the kernel density estimates from the numeric-hypercolumns or the numeric marks of the point-pattern hypercolumn (Listing 3.9) by patient_id using the point-wise mean pmean(), and returns a hyper data frame.
out_k |>
  aggregate() |>
  within(expr = {
    hladr.kern = hladr.kern |> do.call(what = pmean)
    phenotype.nnc.kern = phenotype.nnc.kern |> do.call(what = pmean)
  })
# Variable(s) image_id removed from aggregation
# Hyperframe:
# OS gender age patient_id ppp. hladr.E phenotype.G phenotype.nnc phenotype.nnc.kern hladr.kern
# 1 3488+ F 85 #01 0-889-121 (ppplist) (anylist) (anylist) (anylist) (numeric) (numeric)
# 2 1605 M 66 #02 1-037-393 (ppplist) (anylist) (anylist) (anylist) (numeric) (numeric)
# 3 176 M 84 #03 2-080-378 (ppplist) (anylist) (anylist) (anylist) (numeric) (numeric)
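The aggregation listings in Section 3.4 all follow one pattern: the per-image numeric vectors of a patient are collected into a list, and do.call() passes them as separate arguments to a point-wise mean. The base-R sketch below mimics that pattern on made-up values; pmean_sketch() is a hypothetical stand-in written for this illustration and is not the package's pmean().

```r
# Hypothetical stand-in for a point-wise mean; not the package's pmean()
pmean_sketch = function(...) {
  m = rbind(...)  # one row per image, one column per position
  colMeans(m)     # average position by position
}

# Toy per-image summaries for one patient (hypothetical values), equal lengths
per_image = list(
  c(mean = 1, sd = 0.5),
  c(mean = 3, sd = 1.5),
  c(mean = 5, sd = 1.0)
)

# do.call() spreads the list elements as separate arguments, as in the listings
per_patient = per_image |> do.call(what = pmean_sketch)
```

The result has the same length and names as each per-image vector, which is why the summaries in Section 3.3 are constructed to have equal length across images.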