3  Grouping ppp-Hypercolumn

Important: Disclaimer

These packages (Note 1) are a one-person project undergoing rapid evolution. Backward compatibility (per Hadley Wickham) is provided as a courtesy rather than a guarantee.

Until further notice, these packages should

  • not be used as a basis for research grant applications,
  • not be cited as an actively maintained tool in a peer-reviewed manuscript,
  • not be used to support or fulfill requirements for pursuing an academic degree.

In addition, work primarily based on these packages (Note 1) should not be presented at academic conferences or similar scholarly venues.

Furthermore, a person’s ability to use these packages (Note 1) does not necessarily imply an understanding of their underlying mechanisms. Accordingly, demonstration of their use alone should not be considered sufficient evidence of expertise, nor should it be credited as a basis for academic promotion or advancement.

These statements do not apply to the contributors (Tip 1) to these packages (Note 1) with respect to their specific contributions.

These statements do not apply when the maintainer of these packages (Note 1), Tingting Zhan, is credited as the first author, the lead author, and/or the corresponding author in a peer-reviewed manuscript, or as the Principal Investigator or Co-Principal Investigator in a research grant application and/or a final research progress report.

These statements are advisory in nature and do not modify or restrict the rights granted under the GNU General Public License https://www.r-project.org/Licenses/.

Note

The examples in Chapter 3 require

library(groupedHyperframe)

In Chapter 3, the author

3.1 Creation

Listing 3.1 creates a grouped hyper data frame s with one-and-only-one point-pattern (ppp, Chapter 36) hypercolumn from the data frame wrobel_lung. This process (Chapter 45)

  • takes a data frame wrobel_lung as input;
  • creates a point-pattern hypercolumn ppp. from the \(x\)- and \(y\)-coordinates, the numeric mark hladr and the multi-type mark phenotype, per image_id nested within patient_id;
  • aggregates other variables of interest, e.g., OS, gender and age, at the level of image_id nested within patient_id. Those variables must be identical within the nested grouping structure ~patient_id/image_id;
  • returns a grouped hyper data frame s.

Listing 3.1: Create a grouped hyper data frame with one-and-only-one ppp-hypercolumn
s = wrobel_lung |>
   grouped_ppp(
     formula = hladr + phenotype ~ OS + gender + age, 
     by = ~ patient_id/image_id
   )
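
The requirement that aggregated variables be identical within ~patient_id/image_id can be illustrated with base R alone. The following is a minimal sketch on a hypothetical toy data frame cells; the helper is_constant() is introduced here for illustration only and is not part of the package, and the sketch does not reproduce the internals of grouped_ppp().

```r
# Toy data (hypothetical values), one row per cell
cells <- data.frame(
  patient_id = c('p1', 'p1', 'p1', 'p2', 'p2'),
  image_id   = c('a', 'a', 'b', 'c', 'c'),
  x          = c(1, 2, 3, 4, 5),
  y          = c(5, 4, 3, 2, 1),
  hladr      = c(.2, .4, .1, .3, .5),
  age        = c(84, 84, 84, 66, 66)
)

# One group per image_id nested within patient_id
grp <- interaction(cells$patient_id, cells$image_id, drop = TRUE, lex.order = TRUE)

# Hypothetical helper: a variable can be aggregated at the image level
# only if it takes a single value within every group
is_constant <- function(v, g) all(tapply(v, g, function(z) length(unique(z))) == 1L)

is_constant(cells$age, grp)    # TRUE: age is constant within every image
is_constant(cells$hladr, grp)  # FALSE: hladr varies from cell to cell

# Collapse to one row per image, keeping the image-level variable
per_image <- unique(cells[c('patient_id', 'image_id', 'age')])
nrow(per_image)  # 3
```

In this toy example age qualifies as an image-level variable, whereas hladr is a cell-level measurement and therefore belongs on the left-hand side of the formula as a mark.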

Readers may view the grouped hyper data frame s (Listing 3.1) by simply typing s at the R console prompt and pressing Enter (Listing 3.2),

Listing 3.2: View grouped hyper data frame s (Listing 3.1)
s
# Grouped Hyper Data Frame: ~patient_id
# 
# 3 patient_id
# 
#       OS gender age    patient_id          image_id  ppp.
# 1    176      M  84 #03 2-080-378 [36953,13765].im3 (ppp)
# 2    176      M  84 #03 2-080-378 [39206,15250].im3 (ppp)
# 3    176      M  84 #03 2-080-378 [40242,17359].im3 (ppp)
# 4    176      M  84 #03 2-080-378 [40863,16444].im3 (ppp)
# 5  3488+      F  85 #01 0-889-121 [40864,18015].im3 (ppp)
# ✂️ --- output truncated --- ✂️

Also, readers may view the summary information of the grouped hyper data frame s (Listing 3.1) using the function summary() (Listing 3.3),

Listing 3.3: Summarized information of grouped hyper data frame s (Listing 3.1)
s |> 
  summary()
# Grouped Hyper Data Frame: ~patient_id
# 
# 3 patient_id
# 
#                 OS         gender        age                patient_id              image_id ppp. 
#             (Surv)         (factor) (numeric)                 (factor)              (factor) (ppp)
#  <time-to-event> :(Surv)   F: 5     Min.   :66.00   #01 0-889-121:5    [36953,13765].im3:1        
#  [right-censored]:5        M:10     1st Qu.:66.00   #02 1-037-393:5    [39206,15250].im3:1        
#  [observed]      :10                Median :84.00   #03 2-080-378:5    [40242,17359].im3:1        
#                                     Mean   :78.33                      [40863,16444].im3:1        
#                                     3rd Qu.:85.00                      [40864,18015].im3:1        
#                                     Max.   :85.00                      [41191,13764].im3:1        
#                                                                        (Other)          :9

Readers must note that Chapter 2 and Section 3.1 describe two independent approaches to

  1. create a grouped hyper data frame, from a data frame (Chapter 2, Listing 2.2, Section 18.1);
  2. create a grouped hyper data frame with one-and-only-one point-pattern hypercolumn, from a data frame (Section 3.1, Listing 3.1, Chapter 45).

Neither approach depends on the other (Section 57.1).

3.2 Function-Value-Table on Eligible Marks

Listing 3.4 applies multiple functions to the eligible marks in each point-pattern of the point-pattern hypercolumn s$ppp. in the grouped hyper data frame s (Listing 3.1),

  • the conditional mean \(E(r)\) of the numeric mark hladr using the function Emark_() (Table 36.22). The results are stored in the function-value-table (fv, Chapter 20) hypercolumn (Chapter 21) $hladr.E of the output grouped hyper data frame;
  • the multi-type nearest-neighbor distance distribution function \(G_{\text{CK+.CD8- to CK-.CD8+}}(r)\) of the multi-type mark phenotype using the function Gcross_() (Table 36.24). The results are stored in the function-value-table hypercolumn $phenotype.G of the output grouped hyper data frame;
  • the nearest-neighbor distance from each CK+.CD8- point to its nearest CK-.CD8+ point in the multi-type mark phenotype using the function nncross_() (Table 36.25). The results are stored in the numeric-hypercolumn $phenotype.nnc of the output grouped hyper data frame.

Listing 3.4 substitutes the recommended function values outside the recommended range with the corresponding theoretical values using the function .disrecommend2theo() (Section 20.5.1). The function-value-table hypercolumns $hladr.E and $phenotype.G of the output are replaced with the substituted function-value-table hypercolumns.

Listing 3.4: Operations on Eligible Marks (Listing 3.1)
r = seq.int(from = 0, to = 250, by = 10)
out = s |>
  within(expr = {
    hladr.E = ppp. |> 
      Emark_(r = r, correction = 'none') |>
      .disrecommend2theo()
    phenotype.G = ppp. |> 
      Gcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'none') |>
      .disrecommend2theo()
    phenotype.nnc = ppp. |>
      nncross_(i = 'CK+.CD8-', j = 'CK-.CD8+', correction = 'none')
  })
out
# Grouped Hyper Data Frame: ~patient_id
# 
# 3 patient_id
# 
#       OS gender age    patient_id          image_id  ppp. hladr.E phenotype.G phenotype.nnc
# 1    176      M  84 #03 2-080-378 [36953,13765].im3 (ppp)    (fv)        (fv)     (numeric)
# 2    176      M  84 #03 2-080-378 [39206,15250].im3 (ppp)    (fv)        (fv)     (numeric)
# 3    176      M  84 #03 2-080-378 [40242,17359].im3 (ppp)    (fv)        (fv)     (numeric)
# 4    176      M  84 #03 2-080-378 [40863,16444].im3 (ppp)    (fv)        (fv)     (numeric)
# 5  3488+      F  85 #01 0-889-121 [40864,18015].im3 (ppp)    (fv)        (fv)     (numeric)
# ✂️ --- output truncated --- ✂️
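
To make the two phenotype summaries in Listing 3.4 more concrete, the following base-R sketch computes nearest-neighbor distances from one point type to another, and then the uncorrected empirical distribution of those distances on a grid of radii. The coordinates are hypothetical, and the sketch assumes that the estimate under correction = 'none' is the plain proportion of nearest-neighbor distances not exceeding \(r\); it is not the implementation of Gcross_() or nncross_().

```r
# Toy coordinates (hypothetical), two point types 'i' and 'j'
xi <- c(0, 10, 20); yi <- c(0, 0, 0)   # points of type i
xj <- c(3, 26);     yj <- c(4, 8)      # points of type j

# Distance from every type-i point (rows) to every type-j point (columns)
d <- sqrt(outer(xi, xj, '-')^2 + outer(yi, yj, '-')^2)

# Nearest type-j neighbor of each type-i point
nn <- apply(d, 1, min)
nn  # 5.000000 8.062258 10.000000

# Uncorrected G on a grid of radii: proportion of nearest-neighbor distances <= r
r <- seq.int(from = 0, to = 10, by = 5)
G_raw <- vapply(r, function(ri) mean(nn <= ri), numeric(1))
G_raw  # 0.0000000 0.3333333 1.0000000
```

The vector nn has one element per type-i point, so its length varies between images, whereas G_raw always has one element per radius in r.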

3.3 Summarization

3.3.1 of Statistics of Point-Pattern Marks

Listing 3.5 summarizes various customized statistics of the numeric- and/or multi-type marks of the point-pattern hypercolumn s$ppp. in the grouped hyper data frame s (Listing 3.1) using the function aggregate_marks(). The results are stored in the numeric-hypercolumn $markstats of the output (Listing 3.5, Listing 3.6).

Listing 3.5: Summarizing customized statistics of point-pattern marks (Listing 3.1)
s_markstat = s |>
  within(expr = {
    markstats = ppp. |>
      aggregate_marks(by = hladr ~ phenotype, FUN = \(z) {
        c(mean = mean(z), sd = sd(z))
      }, vectorize = TRUE)
  })
s_markstat
# Grouped Hyper Data Frame: ~patient_id
# 
# 3 patient_id
# 
#       OS gender age    patient_id          image_id  ppp. markstats
# 1    176      M  84 #03 2-080-378 [36953,13765].im3 (ppp) (numeric)
# 2    176      M  84 #03 2-080-378 [39206,15250].im3 (ppp) (numeric)
# 3    176      M  84 #03 2-080-378 [40242,17359].im3 (ppp) (numeric)
# 4    176      M  84 #03 2-080-378 [40863,16444].im3 (ppp) (numeric)
# 5  3488+      F  85 #01 0-889-121 [40864,18015].im3 (ppp) (numeric)
# ✂️ --- output truncated --- ✂️

Listing 3.6: Numeric-hypercolumn s_markstat$markstats: summarizing customized statistics of point-pattern marks (Listing 3.5)
s_markstat$markstats
# 1:
# CK-.CD8-.hladr.mean   CK-.CD8-.hladr.sd CK+.CD8-.hladr.mean   CK+.CD8-.hladr.sd CK-.CD8+.hladr.mean   CK-.CD8+.hladr.sd 
#          0.19611248          0.09574274          0.13157655          0.02220862          0.37042708          0.15887763 
# 
# 2:
# CK-.CD8-.hladr.mean   CK-.CD8-.hladr.sd CK+.CD8-.hladr.mean   CK+.CD8-.hladr.sd CK-.CD8+.hladr.mean   CK-.CD8+.hladr.sd 
#          0.32685110          0.26761137          0.11682620          0.04559459          0.56401579          0.26288830 
# 
# ✂️ --- output truncated --- ✂️
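
The shape of each element of $markstats (Listing 3.6) can be mimicked in base R for a single image. The sketch below uses hypothetical mark values and the same statistic function as Listing 3.5; it only illustrates the idea of computing a statistic per phenotype and flattening the result, and the element names it produces differ from those of aggregate_marks().

```r
# Toy marks (hypothetical values) for the cells of a single image
hladr     <- c(.10, .30, .20, .40, .60)
phenotype <- factor(c('A', 'A', 'B', 'B', 'B'))

# Same statistic function as in Listing 3.5
stat <- \(z) c(mean = mean(z), sd = sd(z))

# Apply per phenotype, then flatten into one named numeric vector
by_type <- lapply(split(hladr, phenotype), stat)
flat <- unlist(by_type)
names(flat)  # "A.mean" "A.sd" "B.mean" "B.sd"
```

Because every image yields a numeric vector with the same names in the same order, such vectors can later be averaged element by element across the images of a patient.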

3.3.2 of fv-Hypercolumns

Listing 3.7 summarizes the function-value-table (fv, Chapter 20) hypercolumns (Chapter 21) out$hladr.E and out$phenotype.G from the batch processes (Listing 3.4),

  • by the recommended function values using the function keyval() (Section 20.2). The results are stored in the numeric-hypercolumns $hladr.Ey and $phenotype.Gy of the output grouped hyper data frame;
  • by the cumulative average vertical height of the trapezoidal integration of the recommended function values using the function cumvtrapz() (Section 11.2). The results are stored in the numeric-hypercolumns $hladr.E.cumv and $phenotype.G.cumv of the output grouped hyper data frame.

Listing 3.7: Summarizing function-value-table hypercolumns (Listing 3.4)
out_fv = out |>
  within(expr = {
    hladr.Ey = keyval(hladr.E)
    phenotype.Gy = keyval(phenotype.G)
    hladr.E.cumv = cumvtrapz(hladr.E, drop = TRUE)
    phenotype.G.cumv = cumvtrapz(phenotype.G, drop = TRUE)
  })
out_fv
# Grouped Hyper Data Frame: ~patient_id
# 
# 3 patient_id
# 
#       OS gender age    patient_id          image_id  ppp. hladr.E phenotype.G phenotype.nnc  hladr.Ey phenotype.Gy hladr.E.cumv phenotype.G.cumv
# 1    176      M  84 #03 2-080-378 [36953,13765].im3 (ppp)    (fv)        (fv)     (numeric) (numeric)    (numeric)    (numeric)        (numeric)
# 2    176      M  84 #03 2-080-378 [39206,15250].im3 (ppp)    (fv)        (fv)     (numeric) (numeric)    (numeric)    (numeric)        (numeric)
# 3    176      M  84 #03 2-080-378 [40242,17359].im3 (ppp)    (fv)        (fv)     (numeric) (numeric)    (numeric)    (numeric)        (numeric)
# 4    176      M  84 #03 2-080-378 [40863,16444].im3 (ppp)    (fv)        (fv)     (numeric) (numeric)    (numeric)    (numeric)        (numeric)
# 5  3488+      F  85 #01 0-889-121 [40864,18015].im3 (ppp)    (fv)        (fv)     (numeric) (numeric)    (numeric)    (numeric)        (numeric)
# ✂️ --- output truncated --- ✂️
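
The second summary in Listing 3.7 can be sketched in base R. The sketch below assumes that "cumulative average vertical height" means the cumulative trapezoidal area divided by the width of the interval integrated so far; readers should consult Section 11.2 for the definition actually used by cumvtrapz(). The grid r and the values y are hypothetical.

```r
# Toy function values on an equally spaced grid (hypothetical numbers)
r <- c(0, 10, 20, 30)
y <- c(1, 3, 5, 3)

# Area of each trapezoid between consecutive grid points
area <- diff(r) * (head(y, -1) + tail(y, -1)) / 2

# Cumulative trapezoidal integral from r[1] up to each later grid point
cum_area <- cumsum(area)
cum_area  # 20 60 100

# Assumed definition: cumulative area divided by the width covered so far
cum_height <- cum_area / (r[-1] - r[1])
cum_height  # 2.000000 3.000000 3.333333
```

Dividing by the width puts the result back on the scale of the function values, so summaries taken at different radii are directly comparable.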

3.3.3 of Quantiles

Listing 3.8 inspects the hypercolumns of the input grouped hyper data frame out (Listing 3.4) and finds the quantiles of,

  • the numeric-hypercolumn out$phenotype.nnc (Listing 3.4). The results are stored in the numeric-hypercolumn $phenotype.nnc.q of the output grouped hyper data frame;
  • the numeric mark hladr in the point-pattern hypercolumn out$ppp. (Section 3.1, Listing 3.1). The results are stored in the numeric-hypercolumn $hladr.q of the output grouped hyper data frame.

Listing 3.8: Summarizing quantiles (Listing 3.4)
out_q = out |>
  within(expr = {
    hladr.q = quantile(ppp., probs = seq.int(from = 0, to = 1, by = .1))
    phenotype.nnc.q = quantile(phenotype.nnc, probs = seq.int(from = 0, to = 1, by = .1))
  })
out_q
# Grouped Hyper Data Frame: ~patient_id
# 
# 3 patient_id
# 
#       OS gender age    patient_id          image_id  ppp. hladr.E phenotype.G phenotype.nnc   hladr.q phenotype.nnc.q
# 1    176      M  84 #03 2-080-378 [36953,13765].im3 (ppp)    (fv)        (fv)     (numeric) (numeric)       (numeric)
# 2    176      M  84 #03 2-080-378 [39206,15250].im3 (ppp)    (fv)        (fv)     (numeric) (numeric)       (numeric)
# 3    176      M  84 #03 2-080-378 [40242,17359].im3 (ppp)    (fv)        (fv)     (numeric) (numeric)       (numeric)
# 4    176      M  84 #03 2-080-378 [40863,16444].im3 (ppp)    (fv)        (fv)     (numeric) (numeric)       (numeric)
# 5  3488+      F  85 #01 0-889-121 [40864,18015].im3 (ppp)    (fv)        (fv)     (numeric) (numeric)       (numeric)
# ✂️ --- output truncated --- ✂️
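
A short base-R sketch may clarify why quantiles are a convenient summary here. The list nnc below is a hypothetical stand-in for a numeric-hypercolumn whose elements have different lengths; the sketch uses only stats::quantile() and does not reproduce the package's method for hypercolumns.

```r
# Hypothetical per-image numeric vectors of different lengths
nnc <- list(img1 = c(1, 2, 3, 4, 5), img2 = c(10, 20, 30))

# The same probabilities for every image
probs <- seq.int(from = 0, to = 1, by = .5)

# One quantile vector per image
q <- lapply(nnc, quantile, probs = probs)
q$img1     # 0%: 1, 50%: 3, 100%: 5
lengths(q) # 3 for every image, regardless of the number of observations
```

Fixing probs gives every image a summary of the same length, which is what makes a later point-wise aggregation across images well defined.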

3.3.4 of Kernel Density Estimates

Listing 3.9 inspects the hypercolumns of the input grouped hyper data frame out (Listing 3.4) and finds the kernel density estimates of,

  • the numeric-hypercolumn out$phenotype.nnc (Listing 3.4). The results are stored in the numeric-hypercolumn $phenotype.nnc.kern of the output grouped hyper data frame;
  • the numeric mark hladr in the point-pattern hypercolumn out$ppp. (Section 3.1, Listing 3.1). The results are stored in the numeric-hypercolumn $hladr.kern of the output grouped hyper data frame.

Listing 3.9: Summarizing kernel density estimates (Listing 3.4)
mdist = out$phenotype.nnc |> unlist() |> max()
out_k = out |> 
  within(expr = {
    phenotype.nnc.kern = phenotype.nnc |> 
      kerndens(from = 0, to = mdist)
    hladr.kern = ppp. |>
      kerndens(from = 0, to = mdist)
  })
out_k
# Grouped Hyper Data Frame: ~patient_id
# 
# 3 patient_id
# 
#       OS gender age    patient_id          image_id  ppp. hladr.E phenotype.G phenotype.nnc phenotype.nnc.kern hladr.kern
# 1    176      M  84 #03 2-080-378 [36953,13765].im3 (ppp)    (fv)        (fv)     (numeric)          (numeric)  (numeric)
# 2    176      M  84 #03 2-080-378 [39206,15250].im3 (ppp)    (fv)        (fv)     (numeric)          (numeric)  (numeric)
# 3    176      M  84 #03 2-080-378 [40242,17359].im3 (ppp)    (fv)        (fv)     (numeric)          (numeric)  (numeric)
# 4    176      M  84 #03 2-080-378 [40863,16444].im3 (ppp)    (fv)        (fv)     (numeric)          (numeric)  (numeric)
# 5  3488+      F  85 #01 0-889-121 [40864,18015].im3 (ppp)    (fv)        (fv)     (numeric)          (numeric)  (numeric)
# ✂️ --- output truncated --- ✂️
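
Listing 3.9 passes the same from and to to every kernel density estimate. The base-R sketch below suggests why a shared range matters, using stats::density() as a stand-in; whether kerndens() is built on stats::density() is an assumption, and the data and the grid size n = 16 are hypothetical.

```r
# Hypothetical observations from two images
x1 <- c(1, 2, 2.5, 4)
x2 <- c(3, 5, 6, 6.5, 8)

# A common upper limit, in the spirit of mdist in Listing 3.9
upper <- max(x1, x2)

# Evaluate both estimates on the same grid of 16 points
k1 <- density(x1, from = 0, to = upper, n = 16)
k2 <- density(x2, from = 0, to = upper, n = 16)

identical(k1$x, k2$x)  # TRUE: both estimates share the evaluation points
length(k1$y)           # 16
```

With a shared grid, the i-th density value of one image refers to the same location as the i-th value of another image, so the estimates can be compared or averaged position by position.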

3.4 Aggregation

Listing 3.10 aggregates (Section 26.4) the customized statistics s_markstat$markstats (Listing 3.6) of the numeric- and/or multi-type marks of the point-pattern hypercolumn by patient_id using the point-wise mean pmean(), and returns a hyper data frame.

Listing 3.10: Aggregates the customized statistics of the numeric- and/or multi-type marks (Listing 3.5)
s_markstat |>
  aggregate() |>
  within(expr = {
    markstats = markstats |> do.call(what = pmean)
  })
# Variable(s) image_id removed from aggregation
# Hyperframe:
#      OS gender age    patient_id      ppp. markstats
# 1 3488+      F  85 #01 0-889-121 (ppplist) (numeric)
# 2  1605      M  66 #02 1-037-393 (ppplist) (numeric)
# 3   176      M  84 #03 2-080-378 (ppplist) (numeric)
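
The idiom markstats |> do.call(what = pmean) in Listing 3.10 hands the per-image vectors of one patient to a point-wise mean as separate arguments. The base-R sketch below imitates this with a hypothetical stand-in pmean_sketch(); it is not the package's pmean(), and the per-image values are invented for illustration.

```r
# Hypothetical stand-in for a point-wise mean; NOT the package's pmean()
pmean_sketch <- function(...) {
  m <- cbind(...)   # one column per input vector
  rowMeans(m)       # mean across inputs at each position
}

# Per-image statistics of one patient (hypothetical values)
per_image <- list(
  c(mean = .20, sd = .10),
  c(mean = .30, sd = .20),
  c(mean = .40, sd = .30)
)

# Each list element becomes one argument of pmean_sketch()
per_patient <- per_image |> do.call(what = pmean_sketch)
per_patient  # mean = 0.3, sd = 0.2
```

The result has the same length and names as each per-image vector, so a patient with several images is reduced to a single vector of the same form.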

Listing 3.11 aggregates (Section 26.4) the summarized information from function-value-table hypercolumns (Listing 3.7) by patient_id using the point-wise mean pmean(), and returns a hyper data frame.

Listing 3.11: Aggregates the function-value-tables (Listing 3.7)
out_fv |>
  aggregate() |>
  within(expr = {
    hladr.Ey = hladr.Ey |> do.call(what = pmean)
    phenotype.Gy = phenotype.Gy |> do.call(what = pmean)
    hladr.E.cumv = hladr.E.cumv |> do.call(what = pmean)
    phenotype.G.cumv = phenotype.G.cumv |> do.call(what = pmean)
  })
# Variable(s) image_id removed from aggregation
# Hyperframe:
#      OS gender age    patient_id      ppp.   hladr.E phenotype.G phenotype.nnc  hladr.Ey phenotype.Gy hladr.E.cumv phenotype.G.cumv
# 1 3488+      F  85 #01 0-889-121 (ppplist) (anylist)   (anylist)     (anylist) (numeric)    (numeric)    (numeric)        (numeric)
# 2  1605      M  66 #02 1-037-393 (ppplist) (anylist)   (anylist)     (anylist) (numeric)    (numeric)    (numeric)        (numeric)
# 3   176      M  84 #03 2-080-378 (ppplist) (anylist)   (anylist)     (anylist) (numeric)    (numeric)    (numeric)        (numeric)

Listing 3.12 aggregates (Section 26.4) the quantiles from the numeric-hypercolumns or the numeric marks of the point-pattern hypercolumn (Listing 3.8) by patient_id using the point-wise mean pmean(), and returns a hyper data frame.

Listing 3.12: Aggregates the quantiles (Listing 3.8)
out_q |> 
  aggregate() |>
  within(expr = {
    hladr.q = hladr.q |> do.call(what = pmean)
    phenotype.nnc.q = phenotype.nnc.q |> do.call(what = pmean)
  })
# Variable(s) image_id removed from aggregation
# Hyperframe:
#      OS gender age    patient_id      ppp.   hladr.E phenotype.G phenotype.nnc   hladr.q phenotype.nnc.q
# 1 3488+      F  85 #01 0-889-121 (ppplist) (anylist)   (anylist)     (anylist) (numeric)       (numeric)
# 2  1605      M  66 #02 1-037-393 (ppplist) (anylist)   (anylist)     (anylist) (numeric)       (numeric)
# 3   176      M  84 #03 2-080-378 (ppplist) (anylist)   (anylist)     (anylist) (numeric)       (numeric)

Listing 3.13 aggregates (Section 26.4) the kernel density estimates from the numeric-hypercolumns or the numeric marks of the point-pattern hypercolumn (Listing 3.9) by patient_id using the point-wise mean pmean(), and returns a hyper data frame.

Listing 3.13: Aggregates the kernel density estimates (Listing 3.9)
out_k |>
  aggregate() |>
  within(expr = {
    hladr.kern = hladr.kern |> do.call(what = pmean)
    phenotype.nnc.kern = phenotype.nnc.kern |> do.call(what = pmean)
  })
# Variable(s) image_id removed from aggregation
# Hyperframe:
#      OS gender age    patient_id      ppp.   hladr.E phenotype.G phenotype.nnc phenotype.nnc.kern hladr.kern
# 1 3488+      F  85 #01 0-889-121 (ppplist) (anylist)   (anylist)     (anylist)          (numeric)  (numeric)
# 2  1605      M  66 #02 1-037-393 (ppplist) (anylist)   (anylist)     (anylist)          (numeric)  (numeric)
# 3   176      M  84 #03 2-080-378 (ppplist) (anylist)   (anylist)     (anylist)          (numeric)  (numeric)