Title: | Canonical Joint and Individual Variation Explained (CJIVE) |
---|---|
Description: | Joint and Individual Variation Explained (JIVE) is a method for decomposing multiple datasets obtained on the same subjects into shared structure, structure unique to each dataset, and noise. The two most common implementations are R.JIVE, an iterative approach, and AJIVE, which uses principal angle analysis. JIVE estimates subspaces but interpreting these subspaces can be challenging with AJIVE or R.JIVE. We expand upon insights into AJIVE as a canonical correlation analysis (CCA) of principal component scores. This reformulation, which we call CJIVE, 1) provides an ordering of joint components by the degree of correlation between corresponding canonical variables; 2) uses a computationally efficient permutation test for the number of joint components, which provides a p-value for each component; and 3) can be used to predict subject scores for out-of-sample observations. Please cite the following article when utilizing this package: Murden, R., Zhang, Z., Guo, Y., & Risk, B. (2022) <doi:10.3389/fnins.2022.969510>. |
Authors: | Raphiel Murden [aut, cre]
|
Maintainer: | Raphiel Murden <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2025-01-24 05:57:59 UTC |
Source: | https://github.com/cran/CJIVE |
Adjusts the proportion of total variation attributable to each signal component to predetermined values
AdjSigVarExp(J, I, N, JntVarEx, IndVarEx)
AdjSigVarExp(J, I, N, JntVarEx, IndVarEx)
J |
joint signal matrix of size n-by-p |
I |
individual signal matrix of size n-by-p |
N |
noise/error matrix of size n-by-p |
JntVarEx |
desired proportion of total variation explained by the joint signal |
IndVarEx |
desired proportion of total variation explained by the individual signal |
a list of 3 items: 1) adjusted joint signal matrix; 2) adjusted individual signal matrix; 3) data matrix additively comprised of the adjusted signal matrices
Performs Canonical JIVE as described in the CJVE manuscript. This method is equivalent to AJIVE for 2 data sets.
cc.jive( dat.blocks, signal.ranks = NULL, joint.rank = 1, perc.var = 0.95, perm.test = TRUE, center = FALSE, nperms = 1000 )
cc.jive( dat.blocks, signal.ranks = NULL, joint.rank = 1, perc.var = 0.95, perm.test = TRUE, center = FALSE, nperms = 1000 )
dat.blocks |
a list of two matrices with samples along rows and features along columns, which contain data on the same n individuals/sampling units |
signal.ranks |
a vector of length two which contains the rank for the signal within each data block. The rank corresponds to the number of principal components (PCs) to be retained within each data block. If NULL, the ranks are determined by the parameter 'perc.var.' Default is NULL |
joint.rank |
The rank of the joint subspace i.e., number of components in the joint subspace |
perc.var |
an alternative to signal.ranks that allows specification of ranks based on the desired proportion of total variation to be retained. F For perc.var = p (where 0<p<1), rank is determined as the minimum number of eigenvalues whose cumulative sum is at least p*(total sum of eigenvalues) Default is 0.95 (i.e. 95% of total variation preserved for each data block). |
perm.test |
logical (TRUE/FALSE) of whether permutation test for joint rank should be performed. Overrides 'joint.rank' parameter if TRUE. Default is TRUE |
center |
logical (TRUE/FALSE) indicating whether data should be column-centered prior to testing. Default is TRUE |
nperms |
integer value indicating the number of permutations that should be performed. Default is 1000 |
A list of two lists: 1) 'CanCorRes' contains results from the canonical correlation of PC scores including, the joint rank, joint subject sores, canonical correlations (and their respective p-values if perm.test was used), canonical loadings for the joint subspace, and total signal ranks 2) 'sJIVE', i.e. Simple JIVE results, correspond to the AJIVE when all ranks are known; includes the joint and individual signal matrices, concatenated PC scores, and the projection matrix used to project each data block onto the joint subspace
#Assign sample size and the number of features in each dataset n = 200 #sample size p1 = 100 #Number of features in data set X1 p2 = 100 #Number of features in data set X2 # Assign values of joint and individual signal ranks r.J = 1 #joint rank r.I1 = 2 #individual rank for data set X1 r.I2 = 2 #individual rank for data set X2 # Simulate data sets ToyDat = GenerateToyData(n = 200, p1 = p1, p2 = p2, JntVarEx1 = 0.05, JntVarEx2 = 0.05, IndVarEx1 = 0.25, IndVarEx2 = 0.25, jnt_rank = r.J, equal.eig = FALSE, ind_rank1 = r.I1, ind_rank2 = r.I2, SVD.plots = TRUE, Error = TRUE, print.cor = TRUE) # Store simulated data sets in an object called 'blocks' blocks <- ToyDat$'Data Blocks' # Save Subject scores as R objects JntScores = ToyDat[['Scores']][['Joint']] IndivScore.X = ToyDat[['Scores']][["Indiv_1"]] IndivScore.Y = ToyDat[['Scores']][["Indiv_2"]] # Save joint variable loadings as R objects JntLd.X = t(ToyDat$Loadings$Joint_1) JntLd.Y = t(ToyDat$Loadings$Joint_2) # Save individual variable loadings as R objects IndivLd.X =t(ToyDat$Loadings$Indiv_1) IndivLd.Y = t(ToyDat$Loadings$Indiv_2) # Save joint, individual, and noise signal matrices as R objects JX = ToyDat[[1]]$J1 JY = ToyDat[[1]]$J2 IX = ToyDat[[1]]$I1 IY = ToyDat[[1]]$I2 EX = ToyDat[[1]]$E1 EY = ToyDat[[1]]$E2 ## Check that proportions of variation explained are (approximately) equal to intended values JVE.X = MatVar(JX)/MatVar(blocks[[1]]) JVE.Y = MatVar(JY)/MatVar(blocks[[2]]) IVE.X = MatVar(IX)/MatVar(blocks[[1]]) IVE.Y = MatVar(IY)/MatVar(blocks[[2]]) TotVE.X = MatVar((JX + IX))/MatVar(blocks[[1]]) TotVE.Y = MatVar((JY + IY))/MatVar(blocks[[2]]) CJIVE.res = cc.jive(blocks, c(r.I1,r.I2)+r.J, r.J, perm.test = FALSE) # CJIVE signal matrix estimates J.hat = CJIVE.res$sJIVE$joint_matrices I.hat = CJIVE.res$sJIVE$indiv_matrices # CJIVE loading estimates WJ = lapply(J.hat, function(x) x[['v']]) WI = lapply(I.hat, function(x) x[['v']]) # Plots of CJIVE estimates against true counterparts and include an estimate of their chordal norm layout(matrix(1:6,2, byrow = TRUE)) plot(JntScores, CJIVE.res$CanCorRes$Jnt_Scores, xlab = "True Joint Scores", ylab = "CJIVE Joint Scores", sub = paste0("Chordal Norm = ", round(chord.norm.diff(JntScores, CJIVE.res$CanCorRes$Jnt_Scores), 3))) plot(JntLd.X, WJ[[1]][,1], xlab = "True Joint Loadings X", ylab = "CJIVE Joint Loadings X", sub = paste0("Chordal Norm = ", round(chord.norm.diff(JntLd.X, WJ[[1]][,1]), 3))) plot(JntLd.Y, WJ[[2]][,1], xlab = "True Joint Loadings Y", ylab = "CJIVE Joint Loadings Y", sub = paste0("Chordal Norm = ", round(chord.norm.diff(JntLd.Y, WJ[[2]][,1]), 3))) plot.new(); legend("left", paste("Comp.", 1:2), pch = 1, col = c("orange", "green"),bty = "n" ) plot(IndivLd.X, WI[[1]][,1:2], xlab = "True Individual Loadings X", ylab = "CJIVE Individual Loadings X", col = c(rep("orange",p1), rep("green",p2)), sub = paste0("Chordal Norm = ", round(chord.norm.diff(IndivLd.X, WI[[1]][,1:2]), 3))) plot(IndivLd.Y, WI[[2]][,1:2], xlab = "True Individual Loadings Y", ylab = "CJIVE Individual Loadings Y", col = c(rep("orange",p1), rep("green",p2)), sub = paste0("Chordal Norm = ", round(chord.norm.diff(IndivLd.Y, WI[[2]][,1:2]), 3))) layout(1)
#Assign sample size and the number of features in each dataset n = 200 #sample size p1 = 100 #Number of features in data set X1 p2 = 100 #Number of features in data set X2 # Assign values of joint and individual signal ranks r.J = 1 #joint rank r.I1 = 2 #individual rank for data set X1 r.I2 = 2 #individual rank for data set X2 # Simulate data sets ToyDat = GenerateToyData(n = 200, p1 = p1, p2 = p2, JntVarEx1 = 0.05, JntVarEx2 = 0.05, IndVarEx1 = 0.25, IndVarEx2 = 0.25, jnt_rank = r.J, equal.eig = FALSE, ind_rank1 = r.I1, ind_rank2 = r.I2, SVD.plots = TRUE, Error = TRUE, print.cor = TRUE) # Store simulated data sets in an object called 'blocks' blocks <- ToyDat$'Data Blocks' # Save Subject scores as R objects JntScores = ToyDat[['Scores']][['Joint']] IndivScore.X = ToyDat[['Scores']][["Indiv_1"]] IndivScore.Y = ToyDat[['Scores']][["Indiv_2"]] # Save joint variable loadings as R objects JntLd.X = t(ToyDat$Loadings$Joint_1) JntLd.Y = t(ToyDat$Loadings$Joint_2) # Save individual variable loadings as R objects IndivLd.X =t(ToyDat$Loadings$Indiv_1) IndivLd.Y = t(ToyDat$Loadings$Indiv_2) # Save joint, individual, and noise signal matrices as R objects JX = ToyDat[[1]]$J1 JY = ToyDat[[1]]$J2 IX = ToyDat[[1]]$I1 IY = ToyDat[[1]]$I2 EX = ToyDat[[1]]$E1 EY = ToyDat[[1]]$E2 ## Check that proportions of variation explained are (approximately) equal to intended values JVE.X = MatVar(JX)/MatVar(blocks[[1]]) JVE.Y = MatVar(JY)/MatVar(blocks[[2]]) IVE.X = MatVar(IX)/MatVar(blocks[[1]]) IVE.Y = MatVar(IY)/MatVar(blocks[[2]]) TotVE.X = MatVar((JX + IX))/MatVar(blocks[[1]]) TotVE.Y = MatVar((JY + IY))/MatVar(blocks[[2]]) CJIVE.res = cc.jive(blocks, c(r.I1,r.I2)+r.J, r.J, perm.test = FALSE) # CJIVE signal matrix estimates J.hat = CJIVE.res$sJIVE$joint_matrices I.hat = CJIVE.res$sJIVE$indiv_matrices # CJIVE loading estimates WJ = lapply(J.hat, function(x) x[['v']]) WI = lapply(I.hat, function(x) x[['v']]) # Plots of CJIVE estimates against true counterparts and include an estimate of their chordal norm layout(matrix(1:6,2, byrow = TRUE)) plot(JntScores, CJIVE.res$CanCorRes$Jnt_Scores, xlab = "True Joint Scores", ylab = "CJIVE Joint Scores", sub = paste0("Chordal Norm = ", round(chord.norm.diff(JntScores, CJIVE.res$CanCorRes$Jnt_Scores), 3))) plot(JntLd.X, WJ[[1]][,1], xlab = "True Joint Loadings X", ylab = "CJIVE Joint Loadings X", sub = paste0("Chordal Norm = ", round(chord.norm.diff(JntLd.X, WJ[[1]][,1]), 3))) plot(JntLd.Y, WJ[[2]][,1], xlab = "True Joint Loadings Y", ylab = "CJIVE Joint Loadings Y", sub = paste0("Chordal Norm = ", round(chord.norm.diff(JntLd.Y, WJ[[2]][,1]), 3))) plot.new(); legend("left", paste("Comp.", 1:2), pch = 1, col = c("orange", "green"),bty = "n" ) plot(IndivLd.X, WI[[1]][,1:2], xlab = "True Individual Loadings X", ylab = "CJIVE Individual Loadings X", col = c(rep("orange",p1), rep("green",p2)), sub = paste0("Chordal Norm = ", round(chord.norm.diff(IndivLd.X, WI[[1]][,1:2]), 3))) plot(IndivLd.Y, WI[[2]][,1:2], xlab = "True Individual Loadings Y", ylab = "CJIVE Individual Loadings Y", col = c(rep("orange",p1), rep("green",p2)), sub = paste0("Chordal Norm = ", round(chord.norm.diff(IndivLd.Y, WI[[2]][,1:2]), 3))) layout(1)
Predicts joint scores for new subjects based on CJIVE joint scores
cc.jive.pred( orig.dat.blocks, new.subjs, signal.ranks, cc.jive.loadings, can.cors )
cc.jive.pred( orig.dat.blocks, new.subjs, signal.ranks, cc.jive.loadings, can.cors )
orig.dat.blocks |
list of the two data matrices on which CJIVE was initially conducted |
new.subjs |
list of two data matrices containing information on new subjects |
signal.ranks |
a vector of length two which contains the rank for the signal within each data block. The rank corresponds to the number of principal components (PCs) to be retained within each data block. If NULL, the ranks are determined by the parameter 'perc.var.' Default is NULL |
cc.jive.loadings |
canonical loadings for the joint subspace |
can.cors |
canonical correlations from the PCs of the data on which CJIVE was initially conducted - notated as rho_j in CJIVE manuscript |
matrix of joint subject score for new subjects
n = 200 #sample size p1 = 100 #Number of features in data set X1 p2 = 100 #Number of features in data set X2 # Assign values of joint and individual signal ranks r.J = 1 #joint rank r.I1 = 2 #individual rank for data set X1 r.I2 = 2 #individual rank for data set X2 true_signal_ranks = r.J + c(r.I1,r.I2) # Simulate data sets ToyDat = GenerateToyData(n = n, p1 = p1, p2 = p2, JntVarEx1 = 0.05, JntVarEx2 = 0.05, IndVarEx1 = 0.25, IndVarEx2 = 0.25, jnt_rank = r.J, equal.eig = FALSE, ind_rank1 = r.I1, ind_rank2 = r.I2, SVD.plots = TRUE, Error = TRUE, print.cor = TRUE) # Store simulated data sets in an object called 'blocks' blocks <- ToyDat$'Data Blocks' # Split data randomly into two subsamples rnd.smp = sample(n, n/2) blocks.sub1 = lapply(blocks, function(x){x[rnd.smp,]}) blocks.sub2 = lapply(blocks, function(x){x[-rnd.smp,]}) # Joint scores for the two sub samples JntScores.1 = ToyDat[['Scores']][['Joint']][rnd.smp] JntScores.2 = ToyDat[['Scores']][['Joint']][-rnd.smp] # Conduct CJIVE analysis on the first sub-sample and store the canonical loadings and canonical # correlations cc.jive.res_sub1 = cc.jive(blocks.sub1, signal.ranks = r.J+c(r.I1,r.I2), center = FALSE, perm.test = FALSE, joint.rank = r.J) cc.ldgs1 = cc.jive.res_sub1$CanCorRes$Loadings can.cors = cc.jive.res_sub1$CanCorRes$Canonical_Correlations[1:r.J] # Conduct CJIVE analysis on the second sub-sample. We will predict these joint scores using the # results above cc.jive.res_sub2 = cc.jive(blocks.sub2, signal.ranks = true_signal_ranks, center = FALSE, perm.test = FALSE, joint.rank = r.J) cc.jnt.scores.sub2 = cc.jive.res_sub2$CanCorRes$Jnt_Scores cc.pred.jnt.scores.sub2 = cc.jive.pred(blocks.sub1, new.subjs = blocks.sub2, signal.ranks = true_signal_ranks, cc.jive.loadings = cc.ldgs1, can.cors = can.cors) # Calculate the Pearson correlation coefficient between predicted and calculated joint scores # for sub-sample 2 cc.pred.cor = diag(cor(cc.pred.jnt.scores.sub2, cc.jnt.scores.sub2)) print(cc.pred.cor) # Plots of CJIVE estimates against true counterparts and include an estimate of their chordal # norm layout(matrix(1:2, ncol = 2)) plot(JntScores.2, cc.pred.jnt.scores.sub2, ylab = "Predicted Joint Scores", xlab = "True Joint Scores", col = rep(1:r.J, each = n/2), main = paste("Chordal Norm = ", round(chord.norm.diff(JntScores.2, cc.pred.jnt.scores.sub2),2))) legend("topleft", legend = paste("Component", 1:r.J), col = 1:r.J, pch = 1) plot(cc.jnt.scores.sub2, cc.pred.jnt.scores.sub2, ylab = "Predicted Joint Scores", xlab = "Estimated Joint Scores", col = rep(1:r.J, each = n/2), main = paste("Chordal Norm = ", round(chord.norm.diff(cc.jnt.scores.sub2, cc.pred.jnt.scores.sub2),2))) layout(1)
n = 200 #sample size p1 = 100 #Number of features in data set X1 p2 = 100 #Number of features in data set X2 # Assign values of joint and individual signal ranks r.J = 1 #joint rank r.I1 = 2 #individual rank for data set X1 r.I2 = 2 #individual rank for data set X2 true_signal_ranks = r.J + c(r.I1,r.I2) # Simulate data sets ToyDat = GenerateToyData(n = n, p1 = p1, p2 = p2, JntVarEx1 = 0.05, JntVarEx2 = 0.05, IndVarEx1 = 0.25, IndVarEx2 = 0.25, jnt_rank = r.J, equal.eig = FALSE, ind_rank1 = r.I1, ind_rank2 = r.I2, SVD.plots = TRUE, Error = TRUE, print.cor = TRUE) # Store simulated data sets in an object called 'blocks' blocks <- ToyDat$'Data Blocks' # Split data randomly into two subsamples rnd.smp = sample(n, n/2) blocks.sub1 = lapply(blocks, function(x){x[rnd.smp,]}) blocks.sub2 = lapply(blocks, function(x){x[-rnd.smp,]}) # Joint scores for the two sub samples JntScores.1 = ToyDat[['Scores']][['Joint']][rnd.smp] JntScores.2 = ToyDat[['Scores']][['Joint']][-rnd.smp] # Conduct CJIVE analysis on the first sub-sample and store the canonical loadings and canonical # correlations cc.jive.res_sub1 = cc.jive(blocks.sub1, signal.ranks = r.J+c(r.I1,r.I2), center = FALSE, perm.test = FALSE, joint.rank = r.J) cc.ldgs1 = cc.jive.res_sub1$CanCorRes$Loadings can.cors = cc.jive.res_sub1$CanCorRes$Canonical_Correlations[1:r.J] # Conduct CJIVE analysis on the second sub-sample. We will predict these joint scores using the # results above cc.jive.res_sub2 = cc.jive(blocks.sub2, signal.ranks = true_signal_ranks, center = FALSE, perm.test = FALSE, joint.rank = r.J) cc.jnt.scores.sub2 = cc.jive.res_sub2$CanCorRes$Jnt_Scores cc.pred.jnt.scores.sub2 = cc.jive.pred(blocks.sub1, new.subjs = blocks.sub2, signal.ranks = true_signal_ranks, cc.jive.loadings = cc.ldgs1, can.cors = can.cors) # Calculate the Pearson correlation coefficient between predicted and calculated joint scores # for sub-sample 2 cc.pred.cor = diag(cor(cc.pred.jnt.scores.sub2, cc.jnt.scores.sub2)) print(cc.pred.cor) # Plots of CJIVE estimates against true counterparts and include an estimate of their chordal # norm layout(matrix(1:2, ncol = 2)) plot(JntScores.2, cc.pred.jnt.scores.sub2, ylab = "Predicted Joint Scores", xlab = "True Joint Scores", col = rep(1:r.J, each = n/2), main = paste("Chordal Norm = ", round(chord.norm.diff(JntScores.2, cc.pred.jnt.scores.sub2),2))) legend("topleft", legend = paste("Component", 1:r.J), col = 1:r.J, pch = 1) plot(cc.jnt.scores.sub2, cc.pred.jnt.scores.sub2, ylab = "Predicted Joint Scores", xlab = "Estimated Joint Scores", col = rep(1:r.J, each = n/2), main = paste("Chordal Norm = ", round(chord.norm.diff(cc.jnt.scores.sub2, cc.pred.jnt.scores.sub2),2))) layout(1)
Calculates the chordal norm between the column subspaces of two matrices. Matrices must have the same number of rows. Let U_x and U_y represent the singular vectors of matrices X and Y, respectively. The chordal norm can be calculated as the square root of the sum of the singular values of t(U_x)
chord.norm.diff(X, Y, tol = 1e-08)
chord.norm.diff(X, Y, tol = 1e-08)
X |
a matrix with the same number of rows as Y and any number of columns |
Y |
a matrix with the same number of rows as X and any number of columns |
tol |
threshold under which singular values of inner product are zeroed out |
(Numeric) Chordal norm between column-subspaces of X and Y, scaled to the interval [0,1]
Convert results from simulation study into a form for graphing with ggplot
ConvSims_gg(AllSims)
ConvSims_gg(AllSims)
AllSims |
matrix with each row representing results from a replicate in the simulation study described in CJIVE manuscript |
list of 2 items: 1) joint ranks determined by each method employed in the simulations study 2) chordal norms between true and estimated joint/individual loadings/scores for each method employed in the simulation study
Convert matrix representation of a network for graphical display via ggplot
create.graph.long(gmatrix, sort_indices = NULL)
create.graph.long(gmatrix, sort_indices = NULL)
gmatrix |
square matrix of size p-by-p in which entries represent the strength of (un-directed) edges between the p nodes |
sort_indices |
vector of length p by which nodes are sorted. If NULL, then nodes are not sorted. Default is NULL. |
a data frame of three variables: X1, which represents the row from which the edge comes; X2, which represents the column from which the edge comes; 3) value, matrix entry representing the strength of the edge between the nodes represented by X1 and X2
Generates two Simulated Datasets that follow JIVE Model using binary subject scores
GenerateToyData( n, p1, p2, JntVarEx1, JntVarEx2, IndVarEx1, IndVarEx2, jnt_rank = 1, equal.eig = FALSE, ind_rank1 = 2, ind_rank2 = 2, SVD.plots = TRUE, Error = TRUE, print.cor = TRUE )
GenerateToyData( n, p1, p2, JntVarEx1, JntVarEx2, IndVarEx1, IndVarEx2, jnt_rank = 1, equal.eig = FALSE, ind_rank1 = 2, ind_rank2 = 2, SVD.plots = TRUE, Error = TRUE, print.cor = TRUE )
n |
integer for sample size, i.e. number of subjects |
p1 |
integer for number of features/variables in first data set |
p2 |
integer for number of features/variables in second data set |
JntVarEx1 |
numeric between (0,1) which describes proportion of variance in the first data set which is attributable to the joint signal |
JntVarEx2 |
numeric between (0,1) which describes proportion of variance in the second data set which is attributable to the joint signal |
IndVarEx1 |
numeric between (0,1) which describes proportion of variance in the first data set which is attributable to the individual signal |
IndVarEx2 |
numeric between (0,1) which describes proportion of variance in the second data set which is attributable to the individual signal |
jnt_rank |
integer for rank of the joint signal, i.e., number of joint components |
equal.eig |
logical (TRUE/FALSE) for whether components should contribute equal variance to signal matrices - default is FALSE |
ind_rank1 |
integer for rank of the individual signal in first data set, i.e., number of joint components |
ind_rank2 |
integer for rank of the individual signal in second data set, i.e., number of joint components |
SVD.plots |
logical (TRUE/FALSE) for whether plots of singular values from signal should be produced - used to confirm number of components |
Error |
logical (TRUE/FALSE) final data sets should be noise contaminated - default is FALSE; use TRUE to obtain pure signal datasets |
print.cor |
logical (TRUE/FALSE) for whether to print matrix of correlations between subject scores) |
A 'list' object which contains 1) list of signal matrices which additively comprise the simulated data sets, i.e. joint, individual, and error matrices for each data set; 2) list of simulated data sets (each equal to the sum of the matrices in part 1); 3) list of joint subject scores and individual subject scores for each data set, and 4) lsit of joint and individual loadings for each data set
ToyDat = GenerateToyData(n = 200, p1 = 2000, p2 = 1000, JntVarEx1 = 0.05, JntVarEx2 = 0.05, IndVarEx1 = 0.25, IndVarEx2 = 0.25, jnt_rank = 1, equal.eig = FALSE, ind_rank1 = 2, ind_rank2 = 3, SVD.plots = TRUE, Error = TRUE, print.cor = TRUE)
ToyDat = GenerateToyData(n = 200, p1 = 2000, p2 = 1000, JntVarEx1 = 0.05, JntVarEx2 = 0.05, IndVarEx1 = 0.25, IndVarEx2 = 0.25, jnt_rank = 1, equal.eig = FALSE, ind_rank1 = 2, ind_rank2 = 3, SVD.plots = TRUE, Error = TRUE, print.cor = TRUE)
Retrives and compiles results from simulation study which are stored in a directory. A directory should contain separate .csv files (one per replicate), each of which will include all evaluation metrics and most experimental settings for that particular replicate. For the CJIVE manuscript, a directory houses results of all 100 replicates for each combination of experimental factors.
GetSimResults_Dir(sim.dir, p1, p2, Preds = FALSE)
GetSimResults_Dir(sim.dir, p1, p2, Preds = FALSE)
sim.dir |
(character string) file path for the directory from which results will be retrieved |
p1 |
number of features in data set 1 |
p2 |
number of features in data set 2 |
Preds |
(logical) do the replicate results contain correlations between predicted and true joint subject scores. Default is FALSE |
upper triangular p-by-p matrix
Graphically displays the center and spread of chordal norms for joint/individual subject score subspaces
gg.corr.plot(cor.dat, cols, show.legend = FALSE, text.size)
gg.corr.plot(cor.dat, cols, show.legend = FALSE, text.size)
cor.dat |
data frame with at least the 5 following variables: Norm - the value of the norm for a particular subspace; Type - the subspace for which the norm is given (i.e., joint/individual score/loading for dataset X1 or X2 (except for joint scores)) Method - the method by which the subspace was estimated, e.g. CJIVE, AJIVE, R.JIVE JVE_1 and JVE_2 - labels describing the proportion of joint variation explained in each dataset (and typically the number of variables in dataset X2) |
cols |
a vector of colors, must have length equal to the number of methods used in the simulation |
show.legend |
logical (TRUE/FALSE) for whether a legend should be included in the plot. Default is FALSE |
text.size |
numeric value for the font size |
graphical display (via ggplot2)
Graphically displays the center and spread of chordal norms for joint/individual variable loading subspaces
gg.load.norm.plot( norm.dat, cols, show.legend = FALSE, text.size, lty = 1, y.max = 1, x.lab.angle = 70 )
gg.load.norm.plot( norm.dat, cols, show.legend = FALSE, text.size, lty = 1, y.max = 1, x.lab.angle = 70 )
norm.dat |
data frame with at least the 5 following variables: Norm - the value of the norm for a particular subspace; Type - the subspace for which the norm is given (i.e., joint/individual variable loadings for dataset X1 or X2) Method - the method by which the subspace was estimated, e.g. CJIVE, AJIVE, R.JIVE JVE_1 and JVE_2 - labels describing the proportion of joint variation explained in each dataset (and typically the number of variables in dataset X2) |
cols |
a vector of colors, must have length equal to the number of methods used in the simulation |
show.legend |
logical (TRUE/FALSE) for whether a legend should be included in the plot. Default is FALSE |
text.size |
numeric value for the font size |
lty |
linetype (see ggplot2). Default = 1 |
y.max |
maximum value for the horizontal axis of the plot |
x.lab.angle |
angle at which x-axis labels are tilted |
graphical display (via ggplot2)
Graphically displays the center and spread of chordal norms for joint/individual score/loading subspaces
gg.norm.plot( norm.dat, cols, show.legend = FALSE, text.size, lty = 1, y.max = 1, x.lab.angle = 70 )
gg.norm.plot( norm.dat, cols, show.legend = FALSE, text.size, lty = 1, y.max = 1, x.lab.angle = 70 )
norm.dat |
data frame with at least the 5 following variables: Norm - the value of the norm for a particular subspace; Type - the subspace for which the norm is given (i.e., joint/individual score/loading for dataset X1 or X2 (except for joint scores)) Method - the method by which the subspace was estimated, e.g. CJIVE, AJIVE, R.JIVE JVE_1 and JVE_2 - labels describing the proportion of joint variation explained in each dataset (and typically the number of variables in dataset X2) |
cols |
a vector of colors, must have length equal to the number of methods used in the simulation |
show.legend |
logical (TRUE/FALSE) for whether a legend should be included in the plot. Default is FALSE |
text.size |
numeric value for the font size |
lty |
linetype (see ggplot2). Default = 1 |
y.max |
maximum value for the horizontal axis of the plot |
x.lab.angle |
angle at which x-axis labels are tilted |
graphical display (via ggplot2)
Graphically displays the count of joint ranks selected by each method employed in the simulation study described in the CJIVE manuscript
gg.rank.plot(rank.dat, cols, show.legend = FALSE, text.size, num.sims)
gg.rank.plot(rank.dat, cols, show.legend = FALSE, text.size, num.sims)
rank.dat |
data frame expected to be built with the functions dplyr::count and tidyr::complete, which should include the following variables Rank - numeric values of the rank selected by each method in each replicate simulation n - the number of times this value was selected as the rank Type - the subspace for which the norm is given (i.e., joint/individual score/loading for dataset X1 or X2 (except for joint scores)) Method - the method by which the subspace was estimated, e.g. CJIVE, AJIVE, R.JIVE JVE_1 and JVE_2 - labels describing the proportion of joint variation explained in each dataset (and typically the number of variables in dataset X2) |
cols |
a vector of colors, must have length equal to the number of methods used in the simulation |
show.legend |
logical (TRUE/FALSE) for whether a legend should be included in the plot. Default is FALSE |
text.size |
numeric value for the font size |
num.sims |
numeric value for the number of replicates evaluated in each full combination of experimental settings |
graphical display (via ggplot2)
Graphically displays the center and spread of chordal norms for joint/individual subject score subspaces
gg.score.norm.plot( norm.dat, cols, show.legend = FALSE, text.size, lty = 1, y.max = 1, x.lab.angle = 70 )
gg.score.norm.plot( norm.dat, cols, show.legend = FALSE, text.size, lty = 1, y.max = 1, x.lab.angle = 70 )
norm.dat |
data frame with at least the 5 following variables: Norm - the value of the norm for a particular subspace; Type - the subspace for which the norm is given (i.e., joint and individual subject scores for dataset X1 or X2 (except joint scores, which are for both datasets)) Method - the method by which the subspace was estimated, e.g. CJIVE, AJIVE, R.JIVE JVE_1 and JVE_2 - labels describing the proportion of joint variation explained in each dataset (and typically the number of variables in dataset X2) |
cols |
a vector of colors, must have length equal to the number of methods used in the simulation |
show.legend |
logical (TRUE/FALSE) for whether a legend should be included in the plot. Default is FALSE |
text.size |
numeric value for the font size |
lty |
linetype (see ggplot2). Default = 1 |
y.max |
maximum value for the horizontal axis of the plot |
x.lab.angle |
angle at which x-axis labels are tilted |
graphical display (via ggplot2)
Calculates the Frobenius norm of a matrix, which can be used as a measure of total variation
MatVar(X)
MatVar(X)
X |
a matrix of any size |
The Frobenius norm of the matrix X, calculated as the square root of the sum of squared entries in X
X = matrix(rnorm(10), 5,2) MatVar(X)
X = matrix(rnorm(10), 5,2) MatVar(X)
Calculates the Frobenius norm of a matrix, which can be used as a measure of total variation
MatVar2(X)
MatVar2(X)
X |
a matrix of any size |
The Frobenius norm of the matrix X, calculated as the square root of the trace of t(X)
X = matrix(rnorm(10), 5,2) MatVar2(X)
X = matrix(rnorm(10), 5,2) MatVar2(X)
Converts correlations of predicted to true joint subject scores into a format conducive to ggplot2
Melt.Sim.Cors(sim.dat, r.J, p1, p2)
Melt.Sim.Cors(sim.dat, r.J, p1, p2)
sim.dat |
matrix with each row representing results from a replicate in the simulation study described in CJIVE manuscript |
r.J |
(Numeric/integer) the joint rank, i.e. number of components in the joint subspace |
p1 |
number of variables/features in data set X1 |
p2 |
number of variables/features in data set X2 |
data frame with seven columns: one each for the joint variance explained in each data set, one column containing the method by which predictions were obtained, one column containing the component number (1,...,r.J),
Conducts the permutation test for the number of joint components as described in CJIVE manuscript. Briefly, canonical correlations (CC) between principal component vectors of the data are obtained (PC). Then for 1:nperms, the rows of one data set are permuted and CCs between PC vectors are calculated, retaining the maximum CC. These maximum CCs form a null distribution against which the original CCs are tested. The number of original CCs exceeding the (1-alpha)^th percentile is the returned as the joint rank.
perm.jntrank( dat.blocks, signal.ranks = NULL, nperms = 500, perc.var = 0.95, alpha = 0.05, center = TRUE )
perm.jntrank( dat.blocks, signal.ranks = NULL, nperms = 500, perc.var = 0.95, alpha = 0.05, center = TRUE )
dat.blocks |
a list of two matrices with samples along rows and features along columns, which contain data on the same n individuals/sampling units |
signal.ranks |
a vector of length two which contains the rank for the signal within each data block. The rank corresponds to the number of principal components (PCs) to be retained within each data block. If NULL, the ranks are determined by the parameter 'perc.var.' Default is NULL |
nperms |
integer value indicating the number of permutations that should be performed |
perc.var |
numeric value of either a scalar or of length 2: an alternative to signal.ranks that allows specification of signal ranks based on the desired proportion of total variation to be retained in each data block. For perc.var = p (where 0<p<1), rank is determined as the minimum number of eigenvalues whose cumulative sum is at least p*(total sum of eigenvalues). Default is 0.95 (i.e. 95% of total variation preserved for each data block). For p=c(p1,p2) pk is used to determine the rank of block k |
alpha |
nominal type-I error rate |
center |
logical (TRUE/FALSE) indicating whether data should be column-centered prior to testing. Default is TRUE |
The Frobenius norm of the matrix X, calculated as the sum of square entries in X
Scale loadings for a joint or individual component by its largest absolute value resulting in loadings between -1 and 1. Loadings are also sign-corrected to result in positive skewness
scale_loadings(loading.comp)
scale_loadings(loading.comp)
loading.comp |
numeric vector of variable loadings from a JIVE analysis |
numeric vector of loadings which have been scaled and sign-corrected
Visual display of a matrix as a heatmap with colors determined by entry values, and including a colorbar to aid interpretation of the heatmap
show.image.2( Image, ylab = "", xlab = "", net = FALSE, main = "", sub = "", colorbar = TRUE )
show.image.2( Image, ylab = "", xlab = "", net = FALSE, main = "", sub = "", colorbar = TRUE )
Image |
matrix to display |
ylab |
lab for y-axis of heatmap |
xlab |
lab for x-axis of heatmap |
net |
logical (TRUE/FALUSE) of whether entries correspond to edges between regions of interest in the Power-264 brain atlas. Default is FALSE |
main |
main title for heatmap |
sub |
subtitle for heatmap |
colorbar |
logical (TRUE/FALUSE) of whether colorabar shouldl be included to aid interpretation. Default is TRUE |
graphical display of matrix as a heatmap
Conducts AJIVE estimation under the assumption that all ranks are known and no components are discarded
sjive(blocks, signal_ranks, joint.rank, joint_scores = NULL)
sjive(blocks, signal_ranks, joint.rank, joint_scores = NULL)
blocks |
list of data blocks, i.e. matrices, all having the same number of rows, which correspond to the same sampling units (i.e. study participants, patients, etc.) |
signal_ranks |
numerical vector of the same length as 'blocks' with each entry corresponding to the rank of the respective matrix in 'blocks' |
joint.rank |
integer value corresponding to the rank of the joint signal subspace, i.e. number of components in the signal subspace |
joint_scores |
numerical matrix containing joint subject scores if they were calculated by some other method, e.g. Canonical Correlation of PC scores. Must have the same number of rows as each matrix in 'blocks' and number of columns equal to 'joint_rank'. If NULL, joint scores are calculated and returned. Default is NULL. |
list of 4 or 5 items: 1) joint signal matrices, their SVDs, and the proportion of total variation in each matrix that is attributable to the joint signal 2) individual signal matrices, their SVDs, and the proportion of total variation in each matrix that is attributable to the individual signal 3) concatenated PC scores, used to determine joint subspace 4) projection matrix for joint subspace 5) joint subject scores (only returned if not provided initially)
Converts a vector of size p choose 2 into a p-by-p lower triangular matrix
vec2net.l(invector)
vec2net.l(invector)
invector |
numeric vector of size p choose 2 |
lower triangular p-by-p matrix
Converts a vector of size p choose 2 into a p-by-p upper triangular matrix
vec2net.u(invector)
vec2net.u(invector)
invector |
numeric vector of size p choose 2 |
upper triangular p-by-p matrix