2  Grouped Hyper Data Frame

Important: Disclaimer

These packages (Note 1) are a one-person project undergoing rapid evolution. Backward compatibility (per Hadley Wickham) is provided as a courtesy rather than a guarantee.

Until further notice, these packages should

  • not be used as a basis for research grant applications,
  • not be cited as an actively maintained tool in a peer-reviewed manuscript,
  • not be used to support or fulfill requirements for pursuing an academic degree.

In addition, work primarily based on these packages (Note 1) should not be presented at academic conferences or similar scholarly venues.

Furthermore, a person’s ability to use these packages (Note 1) does not necessarily imply an understanding of their underlying mechanisms. Accordingly, demonstration of their use alone should not be considered sufficient evidence of expertise, nor should it be credited as a basis for academic promotion or advancement.

These statements do not apply to the contributors (Tip 1) to these packages (Note 1) with respect to their specific contributions.

These statements do not apply when the maintainer of these packages (Note 1), Tingting Zhan, is credited as the first author, the lead author, and/or the corresponding author in a peer-reviewed manuscript, or as the Principal Investigator or Co-Principal Investigator in a research grant application and/or a final research progress report.

These statements are advisory in nature and do not modify or restrict the rights granted under the GNU General Public License (https://www.r-project.org/Licenses/).

Tip: Examples in Chapter 2 Require
library(groupedHyperframe)
Caution 2.1: Chapter 2 Prerequisite

Caution 1.1, Caution 1.2 and Section 1.2

The S3 class groupedData in the nlme package (Pinheiro et al. 2025, v3.1.168, GPL (>= 2)).

Package groupedHyperframe (v0.4.0, GPL-2) introduces the grouped hyper data frame, a hyper data frame augmented with a (nested) grouping structure (Chapter 25).

The author provides a toy dataset wrobel_lung, originally contributed by Dr. Julia Wrobel. Listing 2.1 creates a subset lung0. Under the nested grouping structure ~patient_id/image_id, the columns whose values are not identical within the lowest-level group image_id are hladr and phenotype.

Listing 2.1: Data frame lung0
lung0 = wrobel_lung |>
  within.data.frame(expr = {
    # assigning NULL inside within() removes the column
    x = y = NULL
    dapi = NULL
  })
lung0
#                image_id    patient_id gender hladr phenotype    OS age
# 1     [40864,18015].im3 #01 0-889-121      F 0.115  CK-.CD8- 3488+  85
# 2     [40864,18015].im3 #01 0-889-121      F 0.239  CK-.CD8- 3488+  85
# 3     [40864,18015].im3 #01 0-889-121      F 0.268  CK-.CD8- 3488+  85
# 4     [40864,18015].im3 #01 0-889-121      F 0.245  CK-.CD8- 3488+  85
# 5     [40864,18015].im3 #01 0-889-121      F 0.127  CK+.CD8- 3488+  85
# 6     [40864,18015].im3 #01 0-889-121      F 0.136  CK+.CD8- 3488+  85
# ✂️ --- output truncated --- ✂️
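
The notion of a column being "non-identical within the lowest-level group" can be checked by hand in base R. The sketch below is illustrative only: the data frame toy and its values are hypothetical stand-ins for lung0, and the check is not taken from the package source.

```r
# Toy data (hypothetical values), mimicking the layout of lung0
toy = data.frame(
  patient_id = c('p1', 'p1', 'p1', 'p1', 'p2', 'p2'),
  image_id = c('a', 'a', 'b', 'b', 'c', 'c'),
  gender = c('F', 'F', 'F', 'F', 'M', 'M'),
  hladr = c(0.11, 0.24, 0.27, 0.25, 0.13, 0.14)
)

# For each column, check whether it takes more than one distinct value
# within at least one image_id group
varies = vapply(toy, FUN = function(col) {
  any(tapply(col, INDEX = toy$image_id, FUN = function(v) length(unique(v)) > 1L))
}, FUN.VALUE = NA)

# Names of the columns that vary within image_id
names(varies)[varies]
```

In this toy example only hladr varies within image_id; patient_id and gender are constant within each image.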

Listing 2.2 creates a grouped hyper data frame lung_g from the data frame lung0 (Listing 2.1) by specifying a (nested) grouping structure (Section 18.1).

Listing 2.2: Grouped hyper data frame lung_g
lung_g = lung0 |> 
  aggregate2hyper(by = ~ patient_id/image_id)
# Hypercolumn(s) hladr, phenotype created!
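
To build intuition for what a hypercolumn holds, the following base-R sketch collapses a toy data frame to one row per image and stores the within-image values in a list column. This is a conceptual analogy under stated assumptions, not the implementation of aggregate2hyper(): the object names toy and collapsed and all values are hypothetical, and a plain data frame with a list column stands in for a hyper data frame.

```r
# Toy data (hypothetical values), mimicking the layout of lung0
toy = data.frame(
  patient_id = c('p1', 'p1', 'p1', 'p1', 'p2', 'p2'),
  image_id = c('a', 'a', 'b', 'b', 'c', 'c'),
  gender = c('F', 'F', 'F', 'F', 'M', 'M'),
  hladr = c(0.11, 0.24, 0.27, 0.25, 0.13, 0.14)
)

# One row per image_id: keep the first row of each group for the constant columns
collapsed = toy[!duplicated(toy$image_id), c('patient_id', 'image_id', 'gender')]
rownames(collapsed) = NULL

# Store the within-image values of hladr as a list column,
# one numeric vector per image, in the same order as the rows of `collapsed`
collapsed$hladr = unname(split(
  toy$hladr,
  f = factor(toy$image_id, levels = collapsed$image_id)
))

collapsed
```

Each element of collapsed$hladr is a numeric vector, which mirrors the (numeric) entries printed for the hypercolumn hladr when lung_g is displayed.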

Readers may view the grouped hyper data frame lung_g (Listing 2.2) by typing lung_g at the R console prompt and pressing Enter (Listing 2.3).

Listing 2.3: View grouped hyper data frame lung_g (Listing 2.2)
lung_g
# Grouped Hyper Data Frame: ~patient_id
# 
# 3 patient_id
# 
#             image_id    patient_id gender    OS age     hladr phenotype
# 1  [36953,13765].im3 #03 2-080-378      M   176  84 (numeric)  (factor)
# 2  [39206,15250].im3 #03 2-080-378      M   176  84 (numeric)  (factor)
# 3  [40242,17359].im3 #03 2-080-378      M   176  84 (numeric)  (factor)
# 4  [40863,16444].im3 #03 2-080-378      M   176  84 (numeric)  (factor)
# 5  [40864,18015].im3 #01 0-889-121      F 3488+  85 (numeric)  (factor)
# ✂️ --- output truncated --- ✂️

Readers may also view summary information for the grouped hyper data frame lung_g (Listing 2.2) using the function summary() (Listing 2.4).

Listing 2.4: Summarized information of grouped hyper data frame lung_g (Listing 2.2)
lung_g |> 
  summary()
# Grouped Hyper Data Frame: ~patient_id
# 
# 3 patient_id
# 
#               image_id         patient_id gender                  OS              age        hladr     phenotype
#               (factor)           (factor) (factor)            (Surv)         (numeric)       (numeric)  (factor)
#  [36953,13765].im3:1   #01 0-889-121:5    F: 5     <time-to-event> :(Surv)   Min.   :66.00                      
#  [39206,15250].im3:1   #02 1-037-393:5    M:10     [right-censored]:5        1st Qu.:66.00                      
#  [40242,17359].im3:1   #03 2-080-378:5             [observed]      :10       Median :84.00                      
#  [40863,16444].im3:1                                                         Mean   :78.33                      
#  [40864,18015].im3:1                                                         3rd Qu.:85.00                      
#  [41191,13764].im3:1                                                         Max.   :85.00                      
#  (Other)          :9

Listing 2.5 first computes the quantiles of each element of the numeric hypercolumn lung_g$hladr, i.e., one quantile vector per image_id. It then aggregates these quantile vectors to the biologically independent grouping level patient_id (Section 3.3.3, Section 3.4).

Listing 2.5: Aggregated quantiles
lung_g |>
  within(expr = {
    hladr.q = quantile(hladr, probs = seq.int(from = .01, to = .99, by = .01))
  }) |>
  aggregate() |>
  within(expr = {
    hladr.q = hladr.q |> do.call(what = pmean)
  })
# Variable(s) image_id removed from aggregation
# Hyperframe:
#      patient_id gender    OS age     hladr phenotype   hladr.q
# 1 #01 0-889-121      F 3488+  85 (anylist) (anylist) (numeric)
# 2 #02 1-037-393      M  1605  66 (anylist) (anylist) (numeric)
# 3 #03 2-080-378      M   176  84 (anylist) (anylist) (numeric)
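
The two-step logic of Listing 2.5 (quantiles per image, then one aggregated quantile vector per patient) can be traced by hand in base R. The sketch below assumes that pmean() computes a parallel, element-wise mean across its arguments, by analogy with base::pmax() and base::pmin(); that reading is an assumption, and Reduce() is used here as a stand-in. The vectors img1 and img2 are hypothetical marker values for two images of the same patient.

```r
# Hypothetical marker values for two images of the same patient
img1 = c(0.11, 0.24, 0.27, 0.25, 0.13)
img2 = c(0.14, 0.31, 0.09, 0.22, 0.18)
probs = seq.int(from = .01, to = .99, by = .01)

# Step 1: one quantile vector per image
q = lapply(list(img1, img2), FUN = quantile, probs = probs)

# Step 2: element-wise mean across images, one value per probability
# (stand-in for the assumed behavior of pmean())
q_patient = Reduce(f = `+`, x = q) / length(q)

length(q_patient)
```

The result has one entry per probability in probs, so each patient is represented by a single quantile vector regardless of how many images that patient contributes.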