24  groupedData

Important: Disclaimer

These packages (Note 1) are a one-person project undergoing rapid evolution. Backward compatibility (per Hadley Wickham) is provided as a courtesy rather than a guarantee.

Until further notice, these packages should

  • not be used as a basis for research grant applications,
  • not be cited as an actively maintained tool in a peer-reviewed manuscript,
  • not be used to support or fulfill requirements for pursuing an academic degree.

In addition, work primarily based on these packages (Note 1) should not be presented at academic conferences or similar scholarly venues.

Furthermore, a person’s ability to use these packages (Note 1) does not necessarily imply an understanding of their underlying mechanisms. Accordingly, demonstration of their use alone should not be considered sufficient evidence of expertise, nor should it be credited as a basis for academic promotion or advancement.

These statements do not apply to the contributors (Tip 1) to these packages (Note 1) with respect to their specific contributions.

These statements do not apply when the maintainer of these packages (Note 1), Tingting Zhan, is credited as the first author, the lead author, and/or the corresponding author in a peer-reviewed manuscript, or as the Principal Investigator or Co-Principal Investigator in a research grant application and/or a final research progress report.

These statements are advisory in nature and do not modify or restrict the rights granted under the GNU General Public License https://www.r-project.org/Licenses/.

The function nlme::groupedData() (Pinheiro et al. 2025, v3.1.168, GPL (>= 2)) creates a grouped data frame, i.e., an R object of S3 class 'groupedData'. Listing 24.1 summarizes the S3 methods for the class 'groupedData' in package nlme.

Listing 24.1: Existing (but not exported) S3 methods nlme::*.groupedData
library(nlme)
.S3methods(class = 'groupedData', all.names = TRUE) |> 
  attr(which = 'info', exact = TRUE)
#                           visible                from       generic  isS4
# [.groupedData               FALSE registered S3method             [ FALSE
# as.data.frame.groupedData   FALSE registered S3method as.data.frame FALSE
# asTable.groupedData         FALSE registered S3method       asTable FALSE
# collapse.groupedData        FALSE registered S3method      collapse FALSE
# formula.groupedData         FALSE registered S3method       formula FALSE
# isBalanced.groupedData      FALSE registered S3method    isBalanced FALSE
# lme.groupedData             FALSE registered S3method           lme FALSE
# lmList.groupedData          FALSE registered S3method        lmList FALSE
# print.groupedData           FALSE registered S3method         print FALSE
# update.groupedData          FALSE registered S3method        update FALSE
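
For readers who have not built a grouped data frame before, the following is a minimal sketch of a call to nlme::groupedData(). The data frame toy and its values are hypothetical and made up for illustration; only groupedData() and getGroupsFormula() come from package nlme.

```r
library(nlme)

# Hypothetical toy data: 2 subjects, 3 time points each (values are made up)
toy = data.frame(
  Subject = factor(rep(c('a', 'b'), each = 3L)),
  Time = rep(c(0, 1, 2), times = 2L),
  conc = c(1.2, 2.3, 1.9, 0.8, 1.7, 1.5)
)

# Formula layout: response ~ primary covariate | grouping factor
toy_g = groupedData(conc ~ Time | Subject, data = toy)

inherits(toy_g, what = 'groupedData')  # is it a grouped data frame?
getGroupsFormula(toy_g)                # extract the grouping structure
```

The grouping structure returned by getGroupsFormula() is the part of the formula after the vertical bar, which is the information the aggregation in Section 24.1 relies on.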
Note

The examples in Chapter 24 require

library(groupedHyperframe)

24.1 Aggregate

The S3 method aggregate2hyper.groupedData() (Section 18.1, Table 18.1) aggregates a grouped data frame into a (grouped) hyper data frame using its grouping structure.
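
The idea of aggregating by a grouping structure can be sketched in base R. The helper aggregate_sketch() below is hypothetical and is not the implementation used by aggregate2hyper.groupedData(); it assumes a single grouping variable and uses an ordinary data frame with list-columns as a stand-in for a hyper data frame. Columns that are constant within every group are kept as one value per group, and the remaining columns are collected into one vector per group.

```r
# Hypothetical helper, for illustration only; NOT the groupedHyperframe implementation
aggregate_sketch = function(data, group) {
  g = factor(data[[group]])               # factor() drops unused levels
  id = split(seq_len(nrow(data)), f = g)  # row indices per group
  # a column is "constant" if it takes one distinct value within every group
  is_const = vapply(data, FUN = function(x) {
    all(vapply(id, FUN = function(i) length(unique(x[i])) == 1L, FUN.VALUE = NA))
  }, FUN.VALUE = NA)
  first = vapply(id, FUN = `[`, 1L, FUN.VALUE = NA_integer_)  # first row of each group
  out = data[first, is_const, drop = FALSE]                   # one row per group
  rownames(out) = NULL
  for (nm in names(data)[!is_const]) {
    # varying columns become list-columns: one vector per group
    out[[nm]] = unname(lapply(id, FUN = function(i) data[[nm]][i]))
  }
  out
}

# Made-up data: Age is constant within Subject; Time and conc vary
toy = data.frame(
  Subject = rep(c('a', 'b'), each = 3L),
  Age = rep(c(30, 45), each = 3L),
  Time = rep(c(0, 1, 2), times = 2L),
  conc = c(1.2, 2.3, 1.9, 0.8, 1.7, 1.5)
)
toy_agg = aggregate_sketch(toy, group = 'Subject')
str(toy_agg)
```

In this sketch Subject and Age remain ordinary columns with one value per subject, while Time and conc each hold one numeric vector per subject, which mirrors the (numeric) entries printed for a hyper data frame.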

24.1.1 To Hyper Data Frame

Listing 24.3 aggregates the grouped data frame Remifentanil with grouping structure ~Subject (Listing 24.2) into a hyper data frame.

Listing 24.2: Data: Remifentanil
nlme::Remifentanil
# Grouped Data: conc ~ Time | Subject
#      ID Subject   Time   conc   Rate       Amt   Age    Sex  Ht    Wt    BSA     LBM
# 1     1       1   0.00     NA  71.99  107.9850 30.58   Male 171  72.0 1.8393 56.5075
# 2     1       1   1.50   9.51  71.99   35.9950 30.58   Male 171  72.0 1.8393 56.5075
# 3     1       1   2.00  11.50  71.99   37.4348 30.58   Male 171  72.0 1.8393 56.5075
# ✂️ --- output truncated --- ✂️
Listing 24.3: Example: function aggregate2hyper.groupedData() on Remifentanil
Remifentanil_g = nlme::Remifentanil |> 
  aggregate2hyper()
Remifentanil_g
# Hyperframe:
#    ID Subject   Age    Sex  Ht    Wt    BSA     LBM      Time      conc      Rate       Amt
# 1  30      30 21.00 Female 165  55.9 1.6095 42.8260 (numeric) (numeric) (numeric) (numeric)
# 2  21      21 24.00 Female 161  58.6 1.6131 43.0953 (numeric) (numeric) (numeric) (numeric)
# 3  25      25 32.00 Female 157  45.9 1.4278 36.4631 (numeric) (numeric) (numeric) (numeric)
# 4  23      23 23.00 Female 163  50.0 1.5215 39.5740 (numeric) (numeric) (numeric) (numeric)
# 5  29      29 25.00 Female 163  54.5 1.5782 41.7695 (numeric) (numeric) (numeric) (numeric)
# 6  28      28 30.00 Female 178  79.1 1.9708 55.4106 (numeric) (numeric) (numeric) (numeric)
# 7  32      32 54.00 Female 167  45.0 1.4806 37.4038 (numeric) (numeric) (numeric) (numeric)
# 8  64      64 55.00 Female 168  74.8 1.8455 50.6969 (numeric) (numeric) (numeric) (numeric)
# ✂️ --- output truncated --- ✂️

Listing 24.5 converts the grouped data frame bdf with grouping structure ~schoolNR (Listing 24.4) into a hyper data frame.

Listing 24.4: Data: bdf
nlme::bdf
# Grouped Data: langPOST ~ IQ.verb | schoolNR
# <environment: 0x1269150b0>
#      schoolNR pupilNR IQ.verb  IQ.perf sex Minority repeatgr aritPRET classNR aritPOST langPRET langPOST ses denomina schoolSES satiprin natitest meetings currmeet mixedgra percmino aritdiff homework
# 1           1   17001    15.0 12.33333   0        N        0       14     180       24       36       46  23        1        11  3.42857        0  1.70000  1.83333        0       60       12  2.33333
# 2           1   17002    14.5 10.00000   0        Y        0       12     180       19       36       45  10        1        11  3.42857        0  1.70000  1.83333        0       60       12  2.33333
# 3           1   17003     9.5 11.00000   0        N        0       10     180       24       33       33  15        1        11  3.42857        0  1.70000  1.83333        0       60       12  2.33333
# ✂️ --- output truncated --- ✂️
Listing 24.5: Example: function aggregate2hyper.groupedData() on bdf
bdf_g = nlme::bdf |> 
  aggregate2hyper()
bdf_g
# Hyperframe:
#     schoolNR denomina schoolSES satiprin natitest meetings currmeet aritdiff avg.IQ.ver.cen  pupilNR   IQ.verb   IQ.perf      sex Minority  repeatgr  aritPRET   classNR  aritPOST  langPRET  langPOST
# 1         47        3        11  2.85714        0  2.11111  1.83333       13  -3.5215620901 (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric)
# 2        103        3        12  3.00000        1  2.10000  2.16667       17  -5.0840620901 (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric)
# 3          2        1        11  3.00000        0  1.60000  1.66667       27  -2.8340620901 (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric)
# 4        123        2        20  3.14286        1  2.10000  2.00000       27  -2.0840620901 (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric)
# 5         10        1        15  3.00000        0  2.60000  2.66667       27  -1.3340620901 (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric)
# 6        258        3        14  2.71429        0  2.00000  2.00000       13  -1.1912049472 (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric)
# 7         27        1        17  3.28571        0  1.10000  1.00000        9  -0.4054906615 (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric)
# 8         12        1        20  3.14286        0  2.70000  2.66667       17  -2.4007287567 (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric)
# ✂️ --- output truncated --- ✂️

Converting a (grouped) data frame with a substantial amount of duplicated information into a (grouped) hyper data frame does not necessarily reduce the memory allocation (Listing 24.8), because the hyperframe object (Chapter 26) carries additional auxiliary information. Even when the conversion does reduce the memory allocation (Listing 24.6), it reduces the saved file size only slightly relative to the original data frame when xz compression is used for both (Listing 24.7).

Listing 24.6: Advanced: Listing 24.3 reduces memory allocation
unclass(object.size(Remifentanil_g) / object.size(nlme::Remifentanil))
# [1] 0.3070008
Listing 24.7: Advanced: Listing 24.3 does not reduce saved file size
f = replicate(n = 2L, expr = tempfile(fileext = '.rds'))
Remifentanil_g |> saveRDS(file = f[1L], compress = 'xz')
nlme::Remifentanil |> saveRDS(file = f[2L], compress = 'xz')
file.size(f[1L]) / file.size(f[2L])
# [1] 0.8992504
Listing 24.8: Advanced: Listing 24.5 does not reduce memory allocation
unclass(object.size(bdf_g) / object.size(nlme::bdf))
# [1] 25.78619
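
One general R mechanism that may contribute to a ratio above 1 is sketched below. This is an assumption offered for intuition, not a diagnosis of Listing 24.8. When a factor with many levels is split into per-group pieces, every piece keeps the full set of levels, and object.size() does not detect elements that are shared in memory, so the levels are counted once per piece. The variables pupil and school are simulated and hypothetical.

```r
set.seed(1)

# Simulated data: 2000 distinct identifiers, 100 groups of 20 (hypothetical)
pupil = factor(sample(sprintf('id%04d', 1:2000)))
school = rep(seq_len(100), each = 20L)

# Each piece is a factor of length 20 that still carries all 2000 levels
pieces = split(pupil, f = school)
nlevels(pieces[[1L]])

# Reported size of the split pieces relative to the single factor
ratio = unclass(object.size(pieces) / object.size(pupil))
ratio > 1
```

The reported ratio is therefore an upper-end estimate of the real memory cost, because R may store the repeated levels only once internally.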

24.1.2 To Grouped Hyper Data Frame

Listing 24.10 aggregates the grouped data frame Wafer with grouping structure ~Wafer/Site (Listing 24.9) into a grouped hyper data frame.

Listing 24.9: Data: Wafer
nlme::Wafer
# Grouped Data: current ~ voltage | Wafer/Site
#     Wafer Site voltage  current
# 1       1    1     0.8  0.90088
# 2       1    1     1.2  3.86820
# 3       1    1     1.6  7.64060
# ✂️ --- output truncated --- ✂️
Listing 24.10: Example: function aggregate2hyper.groupedData() on Wafer
nlme::Wafer |> 
  aggregate2hyper()
# Grouped Hyper Data Frame: ~Wafer
# 
# 10 Wafer
# 
#    Wafer Site   voltage   current
# 1      1    1 (numeric) (numeric)
# 2      2    1 (numeric) (numeric)
# 3      3    1 (numeric) (numeric)
# 4      4    1 (numeric) (numeric)
# 5      5    1 (numeric) (numeric)
# ✂️ --- output truncated --- ✂️
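
The nested grouping structure of Wafer can be inspected with functions from package nlme before any aggregation. The sketch below uses only getGroupsFormula() and getGroups(); the object names grp_levels and wafer_id are introduced here for illustration.

```r
library(nlme)

# Grouping structure of the grouped data frame, as a one-sided formula
getGroupsFormula(Wafer)

# One formula per nesting level, outermost level first
grp_levels = getGroupsFormula(Wafer, asList = TRUE)
names(grp_levels)

# Grouping factor at the outermost level
wafer_id = getGroups(Wafer, level = 1L)
nlevels(wafer_id)
```

Inspecting the levels this way shows which variable defines the outer grouping and which is nested inside it.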