\(\Theta \in \mathbb{R}^{N \times T \times L}\) is the parameter tensor
\(A = (a_1, \ldots, a_r) \in \mathbb{R}^{N \times r}\) is the subject-topic matrix
\(B = (b_1, \ldots, b_r) \in \mathbb{R}^{T \times r}\) is the age-topic matrix
\(C = (c_1, \ldots, c_r) \in \mathbb{R}^{L \times r}\) is the disease-topic matrix
\(\lambda_f\) represents the importance of each topic
For a specific entry: \[\theta_{i,t,l} = \sum_{f=1}^r \lambda_f a_{i,f} b_{t,f} c_{l,f}\]
Where:
Only \(b_{t,f}\) varies with time
\(c_{l,f}\) (disease loading) is constant over time
Time variation is projected onto the same factor (\(b_{t,f}\)) for both topics and diseases
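To make the entry formula concrete, here is a minimal base R sketch that builds \(\Theta\) as a sum of rank-one terms and checks one entry against the sum over \(f\). The dimensions, the random factor values, and the name `n_time` (used instead of `T` so that R's shorthand for `TRUE` is not masked) are illustrative assumptions, not values from the analysis.

```r
set.seed(1)

# Illustrative sizes (assumptions): subjects, ages, diseases, topics
N <- 4; n_time <- 5; L <- 3; r <- 2

A <- matrix(rnorm(N * r), N, r)            # subject-topic matrix
B <- matrix(rnorm(n_time * r), n_time, r)  # age-topic matrix
C <- matrix(rnorm(L * r), L, r)            # disease-topic matrix
lambda <- c(2, 0.5)                        # topic importances

# Theta = sum_f lambda_f * (a_f outer b_f outer c_f)
Theta <- array(0, dim = c(N, n_time, L))
for (f in seq_len(r)) {
  Theta <- Theta + lambda[f] * (A[, f] %o% B[, f] %o% C[, f])
}

# Check a single entry against the element-wise formula
i <- 2; t_idx <- 3; l <- 1
entry <- sum(lambda * A[i, ] * B[t_idx, ] * C[l, ])
all.equal(Theta[i, t_idx, l], entry)
```

Because `C` carries no time index, the disease loadings in this construction cannot change with age; all time variation enters through `B`.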
\(\lambda_{ik}(t)\) represents the topic weight for individual \(i\) and topic \(k\) at time \(t\)
\(\phi_{kd}(t)\) represents the disease loading for topic \(k\) and disease \(d\) at time \(t\)
\(K_{\ell, \sigma^2}(t, t')\) is the covariance function for the Gaussian Process
The key difference in our model is that \(\lambda_{ik}(t)\) and \(\phi_{kd}(t)\) are modeled as separate Gaussian Processes, allowing for distinct temporal patterns.
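As a rough illustration of what "separate Gaussian Processes" means in practice, the sketch below draws one trajectory for a topic weight and one for a disease loading, each with its own length scale and variance. The form of \(K_{\ell, \sigma^2}\) is not given here, so a squared-exponential kernel is assumed; the helper names `se_kernel` and `sample_gp` and all hyperparameter values are hypothetical.

```r
set.seed(1)

# Assumed kernel: K(t, t') = sigma2 * exp(-(t - t')^2 / (2 * length_scale^2))
se_kernel <- function(times, length_scale, sigma2) {
  sq_dist <- outer(times, times, function(a, b) (a - b)^2)
  sigma2 * exp(-sq_dist / (2 * length_scale^2))
}

# Draw one zero-mean GP sample path on the given time grid
sample_gp <- function(times, length_scale, sigma2, jitter = 1e-6) {
  K_mat <- se_kernel(times, length_scale, sigma2) + diag(jitter, length(times))
  L_chol <- t(chol(K_mat))  # lower-triangular Cholesky factor
  as.vector(L_chol %*% rnorm(length(times)))
}

times <- 1:30
lambda_ik <- sample_gp(times, length_scale = 10, sigma2 = 1)   # slowly varying topic weight
phi_kd    <- sample_gp(times, length_scale = 3,  sigma2 = 0.5) # faster-varying disease loading
```

The two draws share nothing but the time grid, so their temporal shapes are unrelated by construction.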
In contrast, the Tucker decomposition approach constrains the time-variation pattern: \[
\text{Lambda}[i, k, t] = U_1[i, k] \cdot U_3[t, k]
\]\[
\text{Phi}[k, d, t] = U_2[d, k] \cdot U_3[t, k]
\] Here, \(U_3[t, k]\) imposes the same temporal pattern on both topic weights and disease loadings for a given topic \(k\). Our approach provides more flexibility by allowing:
Different temporal patterns for topic weights and disease loadings
Individual-specific temporal patterns for topic weights
Disease-specific temporal patterns for loadings within each topic
This increased flexibility in modeling temporal patterns separately for topic weights and disease loadings is a key advantage of our approach over standard tensor decomposition methods.
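The shared-factor constraint can be seen numerically. The following base R sketch builds `Lambda` and `Phi` from the two Tucker-style expressions and shows that, for a fixed topic, the ratio of any topic-weight trajectory to any disease-loading trajectory is constant in time. Sizes, the shape chosen for `U3`, and the name `n_time` are assumptions for illustration only.

```r
set.seed(42)

N <- 5; D <- 4; n_time <- 30; K <- 2  # illustrative sizes (assumptions)

U1 <- matrix(runif(N * K, 0.5, 1.5), N, K)  # individual-by-topic
U2 <- matrix(runif(D * K, 0.5, 1.5), D, K)  # disease-by-topic
# Shared temporal factor, shifted to stay positive so ratios are well defined
U3 <- sapply(seq_len(K), function(k) sin(seq(0, k * pi, length.out = n_time)) + 2)

Lambda <- array(0, dim = c(N, K, n_time))
Phi    <- array(0, dim = c(K, D, n_time))
for (k in seq_len(K)) {
  Lambda[, k, ] <- U1[, k] %o% U3[, k]  # Lambda[i, k, t] = U1[i, k] * U3[t, k]
  Phi[k, , ]    <- U2[, k] %o% U3[, k]  # Phi[k, d, t]    = U2[d, k] * U3[t, k]
}

# For topic 1, individual 1 vs disease 1: the ratio does not depend on t
ratio <- Lambda[1, 1, ] / Phi[1, 1, ]
range(ratio)
```

Every trajectory for topic \(k\) is a rescaled copy of `U3[, k]`, which is the restriction that separate processes for \(\lambda_{ik}(t)\) and \(\phi_{kd}(t)\) remove.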
Key Differences:
Both \(\lambda_{ik}(t)\) (topic weights) and \(\phi_{kd}(t)\) (disease loadings) are modeled as separate Gaussian Processes, allowing for distinct temporal patterns.
The covariance structure \(U_k\) for \(\phi_k\) captures disease correlation patterns over time, which is different from projecting both topics and diseases onto the same time factor (\(U_3\)) as in standard tensor decomposition.
Our model allows for:
Different temporal patterns for topic weights and disease loadings
Individual-specific temporal patterns for topic weights
Disease-specific temporal patterns for loadings within each topic
Explicit modeling of disease correlations over time through \(U_k\)
This formulation provides significantly more flexibility in modeling temporal patterns and disease interactions compared to the standard tensor decomposition approach. It allows for a more nuanced representation of how both topic weights and disease loadings evolve over time, as well as how diseases correlate within topics across time.
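One way such a disease covariance could enter is through a separable (matrix-normal) structure, where \(U_k\) couples diseases and a temporal kernel couples time points. This is an assumption made for illustration: the exact role of \(U_k\) is not spelled out here, and the matrix values, the squared-exponential kernel, and its length scale below are all hypothetical.

```r
set.seed(7)

D <- 3; n_time <- 20
times <- seq_len(n_time)

# Hypothetical disease covariance for topic k: diseases 1 and 2 are correlated
U_k <- matrix(c(1.0, 0.8, 0.0,
                0.8, 1.0, 0.0,
                0.0, 0.0, 1.0), D, D, byrow = TRUE)

# Assumed temporal kernel (squared exponential, length scale 4) with jitter
K_time <- exp(-outer(times, times, function(a, b) (a - b)^2) / (2 * 4^2))
K_time <- K_time + diag(1e-6, n_time)

# Matrix-normal draw: phi_k = L_U Z L_T', so Cov(vec(phi_k)) = K_time (x) U_k
L_U <- t(chol(U_k))
L_T <- t(chol(K_time))
Z <- matrix(rnorm(D * n_time), D, n_time)
phi_k <- L_U %*% Z %*% t(L_T)  # D x n_time matrix of loadings for topic k
```

Under this assumed structure, rows 1 and 2 of `phi_k` tend to move together over time while row 3 evolves independently.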
```r
library(rTensor)
library(ggplot2)
library(reshape2)
library(dplyr)
library(data.table)

set.seed(123)

# Dimensions
N <- 50  # Number of individuals
D <- 20  # Number of diseases
T <- 30  # Number of time points
K <- 3   # Number of latent factors/topics

# Create synthetic data
create_synthetic_data <- function(N, D, T, K) {
  # Individual-topic preferences (varying over time)
  individual_preferences <- array(0, dim = c(N, K, T))
  for (i in 1:N) {
    for (k in 1:K) {
      individual_preferences[i, k, ] <- cumsum(rnorm(T, 0, 0.1)) +
        sin(seq(0, runif(1, min = 0.5, max = 5) * pi, length.out = T))
    }
  }

  # Disease-topic associations (varying over time)
  disease_associations <- array(0, dim = c(D, K, T))
  for (d in 1:D) {
    for (k in 1:K) {
      disease_associations[d, k, ] <- cumsum(rnorm(T, 0, 0.1)) +
        cos(seq(0, runif(1, min = 0.5, max = 5) * pi, length.out = T))
    }
  }

  # Create tensor
  tensor <- array(0, dim = c(N, D, T))
  for (i in 1:N) {
    for (d in 1:D) {
      for (t in 1:T) {
        tensor[i, d, t] <- sum(individual_preferences[i, , t] * disease_associations[d, , t])
      }
    }
  }

  return(list(tensor = tensor,
              individual_preferences = individual_preferences,
              disease_associations = disease_associations))
}

# Generate data
data <- create_synthetic_data(N, D, T, K)

# Apply Tucker decomposition
tucker_result <- tucker(as.tensor(data$tensor), ranks = c(K, K, K))
```
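A natural follow-up is to ask how much of the tensor a rank-\((K, K, K)\) Tucker fit recovers. The sketch below does this on a small random tensor rather than on the simulated data; it assumes, to the best of our understanding of rTensor, that `tucker()` returns a list containing `est` (the reconstructed tensor), `norm_percent`, and `fnorm_resid`. The toy tensor and its dimensions are hypothetical.

```r
library(rTensor)
set.seed(123)

# Small toy tensor (an assumption, not the simulated data)
x <- as.tensor(array(rnorm(10 * 8 * 6), dim = c(10, 8, 6)))
fit <- tucker(x, ranks = c(3, 3, 3))

# Fit summaries reported by rTensor
fit$norm_percent  # percentage of the Frobenius norm explained
fit$fnorm_resid   # Frobenius norm of the residual

# Relative reconstruction error computed directly from the arrays
rel_err <- sqrt(sum((fit$est@data - x@data)^2)) / sqrt(sum(x@data^2))
rel_err
```

The same three lines of diagnostics could be applied to `tucker_result`; a large relative error there would indicate that a shared temporal factor cannot represent the separately varying preferences and associations.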
Next, we simulate under the Tucker setting, in which the disease loadings are constant over time and the time variation is projected onto the same factor for both topics and diseases.
```r
## Tucker-compatible

# Dimensions
N <- 50  # Number of individuals
D <- 20  # Number of diseases
T <- 30  # Number of time points
K <- 3   # Number of latent factors/topics

# Create Tucker-compatible synthetic data
create_tucker_compatible_data <- function(N, D, T, K) {
  # Fixed individual-topic preferences
  U1 <- matrix(runif(N * K), N, K)

  # Fixed disease-topic associations
  U2 <- matrix(runif(D * K), D, K)

  # Time-varying topic strengths
  U3 <- matrix(0, T, K)
  for (k in 1:K) {
    U3[, k] <- cumsum(rnorm(T, 0, 0.1)) + sin(seq(0, 2 * pi, length.out = T))
  }

  # Core tensor
  G <- array(runif(K^3), dim = c(K, K, K))

  # Create tensor
  tensor <- ttl(as.tensor(G), list(U1, U2, U3), ms = c(1, 2, 3))

  return(list(tensor = tensor@data, U1 = U1, U2 = U2, U3 = U3, G = G))
}

# Generate Tucker-compatible data
tucker_data <- create_tucker_compatible_data(N, D, T, K)

# Apply Tucker decomposition
tucker_result <- tucker(as.tensor(tucker_data$tensor), ranks = c(K, K, K))
```
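One way to see why the two simulations behave differently is to compare the ranks of the mode-wise unfoldings. The base R sketch below is a simplified stand-in for both settings: a Tucker-style tensor built from a \(K \times K \times K\) core, and a tensor whose individual and disease factors change at every time point. The helpers `unfold` and `mode_ranks`, the white-noise factors, and the small dimensions are assumptions for illustration and do not reproduce the simulations exactly.

```r
set.seed(123)
N <- 12; D <- 10; n_time <- 15; K <- 3  # small illustrative sizes

# Mode-m unfolding: rows index mode m, columns index the remaining modes
unfold <- function(X, mode) {
  d <- dim(X)
  perm <- c(mode, setdiff(seq_along(d), mode))
  matrix(aperm(X, perm), nrow = d[mode])
}

# Numerical rank of each unfolding, from its singular values
mode_ranks <- function(X, tol = 1e-8) {
  sapply(seq_along(dim(X)), function(m) {
    s <- svd(unfold(X, m), nu = 0, nv = 0)$d
    sum(s > tol * s[1])
  })
}

# Tucker-style tensor: mode-1 unfolding is U1 %*% G_(1) %*% t(U3 (x) U2)
U1 <- matrix(runif(N * K), N, K)
U2 <- matrix(runif(D * K), D, K)
U3 <- matrix(rnorm(n_time * K), n_time, K)
G  <- array(runif(K^3), dim = c(K, K, K))
X  <- array(U1 %*% matrix(G, K, K * K) %*% t(kronecker(U3, U2)),
            dim = c(N, D, n_time))

# Time-varying factors: Y[, , s] = P[, , s] %*% t(Q[, , s])
P <- array(rnorm(N * K * n_time), dim = c(N, K, n_time))
Q <- array(rnorm(D * K * n_time), dim = c(D, K, n_time))
Y <- array(0, dim = c(N, D, n_time))
for (s in seq_len(n_time)) Y[, , s] <- P[, , s] %*% t(Q[, , s])

mode_ranks(X)  # bounded by K in every mode
mode_ranks(Y)  # generally exceeds K
```

A Tucker fit with ranks `c(K, K, K)` can represent `X` exactly, whereas for `Y` it can only approximate, which is the contrast the two simulations are designed to expose.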