2. Tractable density DGM

Recall that $c_{i,j}$ is considered the evaluated value of pixel $x_{i,j}$ and $h_{i,j}$ is the predictive value of pixel $x_{i,j}$. Interestingly, $h_{i,j}$ is the pixel generated by PD within the aforementioned generation process. Returning to the generation process, without loss of generality, given $k$ randomized pixels $x_{i-1,1}, x_{i-1,2}, \ldots, x_{i-1,j+1}, \ldots, x_{i,1}, x_{i,2}, \ldots, x_{i,j}$, we generate the next pixel $x_{i,j+1}$ as follows. Firstly, the PD model must be trained on some dataset, i.e., a set of images. Secondly, the $k$ randomized pixels $x_{i-1,1}, x_{i-1,2}, \ldots, x_{i-1,j+1}, \ldots, x_{i,1}, x_{i,2}, \ldots, x_{i,j}$ are fed to PD again so as to update the $k$ sets of parameters $W^{(\cdot)}$, $U^{(\cdot)}$, and $b^{(\cdot)}$ and to compute the $k$ predictive values $h_{i-1,1}, h_{i-1,2}, \ldots, h_{i-1,j+1}, \ldots, h_{i,1}, h_{i,2}, \ldots, h_{i,j}$. Finally, the predictive value $h_{i,j+1}$ of the next pixel $(i,j+1)$ can be determined from $x_{i,j+1}$, $h_{i,j}$, and $h_{i-1,j+1}$ together with the learned parameters of the two-dimension LSTM PD. It is important to note that $x_{i,j+1}$ is randomized arbitrarily, whereas $h_{i,j}$ and $h_{i-1,j+1}$ were computed previously. The subsequent predictive values $h_{i,j+2}, h_{i,j+3}, \ldots, h_{i+1,j}, h_{i+1,j+1}$, etc. are generated by the same process. Note that the backpropagation algorithm can be applied to learning the two-dimension LSTM as usual; since backpropagation is often paired with the stochastic gradient descent (SGD) method, SGD deserves attention here.
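To make the generation step concrete, the sketch below implements a minimal two-dimension LSTM cell and the pixel-by-pixel generation loop in Python with numpy. It follows the dependency structure described above: $h_{i,j+1}$ is computed from the arbitrarily randomized input $x_{i,j+1}$ and the two previously computed states $h_{i,j}$ and $h_{i-1,j+1}$. All names (cell2d, generate, HIDDEN), the choice to average the two predecessor cell states, and the collapse of $h$ to a scalar pixel are illustrative assumptions, not the tutorial's definitive model; the parameters $W^{(\cdot)}$, $U^{(\cdot)}$, $b^{(\cdot)}$ are randomly initialized here rather than trained by backpropagation with SGD as the slide prescribes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

HIDDEN = 8                      # hidden-state size (illustrative choice)
rng = np.random.default_rng(0)  # parameters are random here, not trained

# One parameter set W(.), U(.), b(.) per gate: input "i", forget "f",
# output "o", and candidate "g". U has two blocks because each pixel has
# two predecessor hidden states (left neighbor and upper neighbor).
W = {g: rng.normal(scale=0.1, size=(HIDDEN, 1)) for g in "ifog"}
U = {g: rng.normal(scale=0.1, size=(HIDDEN, 2 * HIDDEN)) for g in "ifog"}
b = {g: np.zeros(HIDDEN) for g in "ifog"}

def cell2d(x, h_left, c_left, h_up, c_up):
    """One 2-D LSTM step for pixel x with its left and upper neighbor states."""
    h_prev = np.concatenate([h_left, h_up])  # both predecessor hidden states
    a = {g: W[g] @ np.atleast_1d(x) + U[g] @ h_prev + b[g] for g in "ifog"}
    i, f, o = sigmoid(a["i"]), sigmoid(a["f"]), sigmoid(a["o"])
    g = np.tanh(a["g"])
    c = f * 0.5 * (c_left + c_up) + i * g    # evaluated value c (averaged predecessors: an assumption)
    h = o * np.tanh(c)                       # predictive value h
    return h, c

def generate(n_rows, n_cols):
    """Generate predictive values pixel by pixel in raster order."""
    h = np.zeros((n_rows, n_cols, HIDDEN))
    c = np.zeros((n_rows, n_cols, HIDDEN))
    zero = np.zeros(HIDDEN)
    img = np.zeros((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            x = rng.uniform()  # the next pixel is "randomized arbitrarily"
            h_left, c_left = (h[i, j - 1], c[i, j - 1]) if j > 0 else (zero, zero)
            h_up, c_up = (h[i - 1, j], c[i - 1, j]) if i > 0 else (zero, zero)
            h[i, j], c[i, j] = cell2d(x, h_left, c_left, h_up, c_up)
            img[i, j] = h[i, j].mean()  # collapse h to a scalar pixel (assumption)
    return img

print(generate(4, 4))
```

In a trained PD, this same loop would simply run with parameters learned beforehand by backpropagation with SGD over an image dataset, so that the predictive values $h_{i,j}$ reflect the data distribution rather than random weights; the generation procedure itself is unchanged.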