Trending Topics on Science, a Tensor Memory Hypothesis Approach

Proceedings of the 4th Congress on Robotics and Neuroscience

Current human knowledge is written. Documentation is the most common way to preserve memories, and also to store fantastic stories. Thus, to distinguish reality from fiction, scientific writing cites previous works in addition to drawing on experimental setups. Books and scientific papers are only a small part of the existing literature, but they are considered more trustworthy as information sources. Information about the authors and the keywords in titles and abstracts is useful for finding more relations and for knowing where to focus the search on a topic. This is possible with relational databases or knowledge graphs, a semantic approach; with the tensor memory hypothesis, which adds a temporal dimension, it is also possible to process the information with an episodic memory approach. Although knowledge graphs are widely used in question answering and chatbots, they need a prior relational schema, generated automatically or by hand, stored in an easy-to-query file format. I use JATS, a standard format that allows scientific papers to be integrated into semantic searches but is not yet adopted by all scientific publishers, to extract the markup tags from PDF files of current-year journal articles on one particular topic, and then construct the memory tensors with their references to extract relations and predictions with statistical relational learning techniques.

Table 1. PCA variance for the number of latent components.

| Latent components | PCA variance (%) |
|---|---|
| 3 | 2.93 |
| 5 | 4.3 |
| 10 | 7.32 |
| 15 | 10.03 |
| 20 | 12.5 |
| 25 | 14.8 |
| 50 | 24.99 |
| 100 | 41.88 |
| 200 | 63.9 |

Table 2. Most probable words for the query with an entity type (columns are entity types).

| Latent components | Authors | Articles | Words |
|---|---|---|---|
| 3 | neuromodulation | neuromodulation | neuromodulation |
| 5 | stimulus, presented | stimulus, presented | stimulus, technique |
| 10 | presented | presented | presented |
| 15 | sleep, memory | sleep | sleep |
| 20 | stimulus, memory | stimulus, cued | stimulus, cued |
| 25 | memory, sws | memory, spatial, sws | memory, sws |
| 50 | sleep, stimulus | sleep, stimulus | sleep, stimulus |
| 100 | assr, memory | assr, memory | assr, memory |
| 200 | wireless, monitoring | sleep, slow | sleep, slow |

Table 3. Most probable words with NMF decomposition (columns are entity types).

| Latent components | Authors | Articles | Words |
|---|---|---|---|
| 3 | slow, sleep, auditory | stimulation, sleep | sleep, memory |
| 5 | spindles, auditory, sleep | sleep | sleep |
| 10 | sleep, stimulation | sleep, stimulation | sleep, memory |
| 15 | sleep, memory | brain, consolidation | sleep, memory |
| 20 | sleep, memory | oscillations, sleep | sleep, memory |
| 25 | sleep, stimulation | activity, memory | sleep, memory |
| 50 | sleep, memory | oscillations, humans | sleep, memory |
| 100 | sleep, role | reactivation, slow-wave | sleep, memory |
| 200 | sleep, slow | sleep, brain | sleep, memory |


Introduction
Memory is defined as the ability to record information and later recall it. Writing is a human invention that facilitates this capacity, in particular for declarative memories: facts or events that can be expressed with language. Declarative memories can be of two types, semantic or episodic (Tresp et al., 2017).
The memories and knowledge of humanity are stored in written documents, which gain reliability if they include references to previous works by other authors. Scientific articles are the model of well-structured presentation and storage of information: each has its own title, explicit authorship, and references to related information in other documents or within the same document. However, what is almost always relevant when deciding whether to read them, the retrieval action, is their publication year. Thus, their ordered structure makes it possible to use them as a representation of global human episodic knowledge and memories. Also, scientific publication, as a human activity, could be modeled as a social network. From this kind of network emerged the expression "trending topic", which names the most frequent term or word used in a specific temporal window and is understood as the principal theme or main subject of the information described in a piece of content.
In a mathematical and computational framework, semantic memories could be represented as knowledge graphs, where the entities are nodes and the links are the relations between them. A relation between entities can then be defined as a triple $(s, p, o)$ or as a simple subject-predicate-object sentence. An episodic memory adds a time marker, so a temporal prepositional phrase is added to the simple sentence, subject-predicate-object-temporal_preposition, giving a quad $(s, p, o, t)$. This approach is widely used in semantic web technologies under the Linked Data methodology (Bizer et al., 2011).
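As an illustration of the episodic representation, the following minimal sketch stores relations as subject, predicate, object and year, and counts the most frequent word in a temporal window. The `Quad` class, the `trending_topic` function, the predicate name "contains" and the sample data are all hypothetical and chosen only for this example.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One episodic fact: subject, predicate, object and the year it holds."""
    subject: str
    predicate: str
    obj: str
    year: int


def trending_topic(quads, start: int, end: int) -> Optional[str]:
    """Return the word most often contained in documents with start <= year <= end."""
    counts = Counter(
        q.obj
        for q in quads
        if q.predicate == "contains" and start <= q.year <= end
    )
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical sample data, for illustration only.
quads = [
    Quad("paper a", "contains", "sleep", 2017),
    Quad("paper b", "contains", "sleep", 2018),
    Quad("paper b", "contains", "memory", 2018),
    Quad("author x", "wrote", "paper b", 2018),
]
print(trending_topic(quads, 2017, 2018))
```

Dropping the `year` field from `Quad` gives the purely semantic triple, which is why the same data can serve both memory types.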
Thus, it is plausible to use complex network analysis tools to search for the most relevant relations between authors, paper titles or keywords. Scientific publication databases can easily contain millions of authors, papers and their respective citations. A specific topic query is expected to return a reduced number of relevant documents, not the thousands of results that search engines such as Google Scholar or publishers' own engines could generate for a given string of words.

The field of science of science studies these relations, and earlier works were carried out using knowledge graphs, which are expressed as adjacency matrices. If the temporal dimension and various types of relationships are considered, then it is possible to form fourth-order tensors. A matrix $X$ of the network could be bipartite ($X \in \mathbb{R}^{n \times m}$) if there are two types of nodes (authors-articles, authors-words, articles-words) or monopartite ($X \in \mathbb{R}^{n \times n}$); unweighted ($x_{ij} \in \{0, 1\}$) or weighted ($x_{ij} \in \mathbb{R}$); directed or undirected ($X^{\top} = X$) (Zeng et al., 2017).

Tresp and Ma (2017) introduced the Tensor Memory Hypothesis, where a knowledge graph is represented by a Tucker decomposition of the tensors. It is based on representational learning, i.e., a discrete entity $e$ is associated with a vector of real numbers $a_e$ called latent variables. Tresp and Ma (2017) also argue that representational learning might be the basis for perception, planning and decision making.

From a physiological point of view, there is evidence that the hippocampus plays a central role in the temporal organization of memories and supports the disambiguation of overlapping episodes (Eichenbaum, 2014a). In the standard consolidation theory of memory (SCT), episodic memory is a neocortical representation that arises from hippocampal activity, while in the multiple trace theory (MTT) episodic memory is represented only in the hippocampus and is used to form semantic memories in the neocortex. Also, there is evidence of the existence of "place cells" and "time cells" in the hippocampus and that these support associative networks that represent spatiotemporal relations between the entities of memories (Eichenbaum, 2014b).

There are some previous works on trending or hot topics in science. Griffiths and Steyvers (2004) used Latent Dirichlet Allocation (LDA) to analyze the abstracts of the Proceedings of the National Academy of Sciences (PNAS) from 1991 to 2001. Wei et al. (2013) performed a statistical analysis to find out whether scientists follow hot topics in their research, using papers published in the American Physical Society (APS) Physical Review journals from 1976 to 2009. Kang and Lin (2018) used non-smooth non-negative matrix factorization (nsNMF) to extract the most prominent topics from a dataset of keywords from scientific articles related to "Machine Learning" published from 2014 to 2016 in arXiv.org stat.ML; that work resembles the Tensor Memory Hypothesis in its use of matrix decomposition to reduce the rank of the matrix. Alshareef et al. (2018) used indexes based on cosine similarity to estimate a score that represents the anticipation of a prospective relationship between authors, working with two subsets of the IEEE digital library containing the keywords "database" and "multimedia".

Results
The number of latent components is not associated with a specific statistical measure of the data. However, as a point of reference, Table 1 presents the percentage of variance that would be explained if the same number of PCA components were employed. The words with the most relations in the complete tensor, before decomposition, are sleep, memory, stimulation, slow, brain, consolidation, auditory, spindles, reactivation, and activity. Table 2 is populated by selecting the most frequent words from queries of the type
$$\mathrm{word}_i = \arg\max_{o} \{P(s, o, t)\}, \tag{1}$$
where $s$ is each author, paper title or word in the database, $o$ is a word, $t$ is a year, and $i$ is the index of an entity. The same queries return more distinct words when more latent components are used than when only a few are used. For example, there are 21 different words in the query results using 200 latent components. On the other hand, for few latent components, the results of the queries are only the words shown in Table 2. Table 3 is populated using the NMF decomposition of the matrix collapsed over time, obtained by adding the weights of each year. The most frequent words are selected from those with the maximum weight for each topic, that is, each $k$-th row of the matrix $H$ of the decomposition. The same processing using the nsNMF decomposition yields the words sleep and memory as the most probable in all cases.
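The selection strategy used for Table 2 can be sketched as follows: for every subject entity, take the word with the highest probability in a given year, then count how often each word wins. The array layout (subject, object, year), the function name and the toy probabilities are assumptions made for this illustration, not the original script.

```python
from collections import Counter

import numpy as np


def most_probable_words(P, word_ids, words, year):
    """Count, over all subjects s, the word o that maximizes P[s, o, year].

    P        : array of shape (n_entities, n_entities, n_years), assumed layout
    word_ids : indices of the entities that are words
    words    : word strings aligned with word_ids
    year     : index of the year slice to query
    """
    scores = P[:, :, year][:, word_ids]   # shape (n_entities, n_words)
    best = scores.argmax(axis=1)          # winning word column per subject
    return Counter(words[j] for j in best)


# Toy example: entity 0 is an author, entities 1 and 2 are words.
P = np.zeros((3, 3, 1))
P[0, 1, 0], P[0, 2, 0] = 0.9, 0.1
P[1, 1, 0], P[1, 2, 0] = 0.2, 0.7
P[2, 1, 0], P[2, 2, 0] = 0.6, 0.3
counts = most_probable_words(P, word_ids=[1, 2], words=["sleep", "memory"], year=0)
print(counts.most_common())
```

Restricting the subjects to authors, articles or words before counting would give the three entity-type columns of the table.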
The analysis of relationships between entities needs a distance metric. Each entity is represented by a latent vector, so one possible metric is the Euclidean distance; however, for this particular type of data, content from documents, the metric usually employed is the cosine similarity. Computing distances in the original data space has a high computational cost. Using a reduced space lowers the cost of calculating distances, but requires a costly space transformation beforehand. Figure 1 is an example of the Euclidean distance and cosine similarity extracted from the tensor of the RESCAL factorization. The difference between the years of the sources and the years of papers that are only cited is most evident with fewer latent components. Moreover, the similarity is greater, and hence the Euclidean distance smaller, between the entities of the previous years.

Discussion
There are databases of scientific paper meta-data, and it is also possible to extract article meta-data from a specific journal or publisher. In practice, however, it is usual to start with a few references from a previous search that come from different journals or publishers. Therefore, to extract the meta-data I used the JATS format, a semantic web standard format for scientific papers popularized by the National Center for Biotechnology Information (NCBI). A more popular format is the Resource Description Framework (RDF), and various scientific publishers are adopting it.
The analysis of the statistical features of the tensor, without any other processing, could give information about the most related entities, such as the most cited author, the most cited article or the most used word in each time slice. However, a tensor decomposition technique allows the use of a latent component space, from which more information could be extracted because the relationships are expressed in fewer variables, thus clustering some properties of the data. This work is an example of how, from a small sample of documents with a known relationship between them (the topic was already known), some words that are not the most frequent could be extracted and provide a new perspective on the topics covered in the documents. Figure 1 is an example of extracted information that is not easy to visualize in the original space of the data. The comparison of different tensor decompositions and the search for the optimum number of latent components remain as future work to take advantage of relational data, which, thanks to semantic web technologies, is not restricted to formal scientific documents and is available for various types of data. Also, the proposal of (Tresp et al., 2017) to consider knowledge graphs as semantic and episodic memories provides a framework that links computational memory with biological memory. Its capacities and defects need to be explored. Curiously, "topic" comes from the Greek topos, or place, and the representation of place is, like memory, another of the known cognitive functions of the hippocampus.
Finally, the results show that sleep and memory are the most relevant words of the selected papers; these two words and slow are the few that also appear in the results of queries with the RESCAL decomposition. The nsNMF decomposition gives the same words for any number of components, so it is more robust to changes in the number of components.

Data extraction
The meta-data of 11 articles from different publishers (Table 4) related to "Stimulation during NREM sleep" were obtained from PDF files using the software CERMINE (Tkaczyk et al., 2015) and stored in JATS format. Then, with a Python script, the title, authors and abstract of each article were extracted, along with the titles and authors of its references within the time range 2008-2018. Next, the titles and abstracts were tokenized and part-of-speech tagged, using the nltk library, to extract the adjectives and nouns, which are considered the principal terms of the articles. To de-duplicate authors, all names were formatted as "(Last name) (First name initial.) (Middle name initial.)". To de-duplicate titles and words, all words were transformed to lowercase and special characters were eliminated.
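The two de-duplication rules can be sketched as below. This is a minimal illustration, not the original script: the function names are hypothetical, author names are assumed to arrive as "First Middle Last", and hyphens are assumed to be kept because hyphenated words such as slow-wave appear in the results.

```python
import re


def normalize_author(full_name: str) -> str:
    """Format 'First [Middle] Last' as 'Last F. M.' (input order is an assumption)."""
    parts = full_name.replace(".", " ").split()
    if not parts:
        return ""
    last, given = parts[-1], parts[:-1]
    initials = " ".join(f"{name[0].upper()}." for name in given)
    # capitalize() makes 'eichenbaum' and 'Eichenbaum' collapse to one key.
    return f"{last.capitalize()} {initials}".strip()


def normalize_text(text: str) -> str:
    """Lowercase and drop characters other than letters, digits, spaces and hyphens."""
    cleaned = re.sub(r"[^a-z0-9\s-]", "", text.lower())
    return " ".join(cleaned.split())


print(normalize_author("Howard B. Eichenbaum"))
print(normalize_text("Slow-Wave Sleep: A Review!"))
```

Names with particles or multi-word surnames would need extra rules that this sketch does not attempt.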
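For each year $t$, a square matrix of zeros $X_t \in \mathbb{R}^{(n_a + n_p + n_w) \times (n_a + n_p + n_w)}$, where $n_a$, $n_p$ and $n_w$ are the numbers of authors, articles and words, was populated with weighted and directed values. The following rules were applied only in the year $t$ corresponding to each relation, adding the increment to the element $x_{ij}$ for subject $i$ and object $j$:

| Subject $i$ | Relation | Object $j$ | Increment |
|---|---|---|---|
| author | co-wrote with | author | $x_{ij} \mathrel{+}= 2$ |
| author | cited | author | $x_{ij} \mathrel{+}= 1$ |
| author | wrote | article | $x_{ij} \mathrel{+}= 2$ |
| author | cited | article | $x_{ij} \mathrel{+}= 1$ |
| author | wrote | word | $x_{ij} \mathrel{+}= 1$ |
| article | cited | article | $x_{ij} \mathrel{+}= 1$ |
| article | contained | word | $x_{ij} \mathrel{+}= 1$ |
| word | is in the same document as | word | $x_{ij} \mathrel{+}= 1$ |

This way of expressing the relations simplifies the tensor representation, because the dimension corresponding to the predicate is implicit in the weighted values, and it allows the use of the RESCAL factorization. (Ma et al., 2018) explain other tensor decomposition methods that could be used to get the latent components.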
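A minimal sketch of this construction is given below, with one square slice per year and the increments accumulated according to the relation type. The relation labels, the function name and the sample entities are hypothetical; only the weights follow the rules stated above.

```python
import numpy as np

# Increment added for each relation type, following the rules in the text.
RELATION_WEIGHTS = {
    "author co-wrote with author": 2,
    "author cited author": 1,
    "author wrote article": 2,
    "author cited article": 1,
    "author wrote word": 1,
    "article cited article": 1,
    "article contained word": 1,
    "word in same document as word": 1,
}


def build_tensor(entities, years, relations):
    """Return an array of shape (n_entities, n_entities, n_years).

    entities  : unique entity names (authors, articles and words together)
    years     : years covered, one slice per year
    relations : iterable of (subject, relation, object, year) tuples
    """
    index = {name: i for i, name in enumerate(entities)}
    year_index = {year: k for k, year in enumerate(years)}
    tensor = np.zeros((len(entities), len(entities), len(years)))
    for subject, relation, obj, year in relations:
        weight = RELATION_WEIGHTS[relation]
        tensor[index[subject], index[obj], year_index[year]] += weight
    return tensor


# Hypothetical sample data, for illustration only.
entities = ["Author A.", "Author B.", "paper one", "sleep"]
years = [2017, 2018]
relations = [
    ("Author A.", "author co-wrote with author", "Author B.", 2018),
    ("Author A.", "author wrote article", "paper one", 2018),
    ("paper one", "article contained word", "sleep", 2018),
    ("Author A.", "author wrote word", "sleep", 2018),
    ("Author A.", "author wrote word", "sleep", 2017),
]
X = build_tensor(entities, years, relations)
print(X.shape)
```

Because the predicate is folded into the weight, the result is a third-order array rather than a fourth-order one.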

Cosine similarity
Cosine similarity is an adequate metric for vectors whose magnitude depends on the size of the sample, such as the frequency of words in a document.
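To make the point concrete, the sketch below compares cosine similarity, the dot product divided by the product of the norms, with the Euclidean distance on two word-count vectors that have the same proportions but different sizes. The function names and the counts are illustrative assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Same word proportions, but the second document is ten times longer.
short_doc = [1, 2, 0]
long_doc = [10, 20, 0]
print(cosine_similarity(short_doc, long_doc))
print(euclidean_distance(short_doc, long_doc))
```

The two documents point in the same direction, so the cosine similarity ignores their difference in length, while the Euclidean distance grows with it.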
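
Tensor memory hypothesis
A fourth-order tensor could be decomposed as
$$\mathcal{X} \approx \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \times_3 A^{(3)} \times_4 A^{(4)}.$$
The probability of the existence of the relationship between the entities of a quad is given by
$$P(s, p, o, t) = \operatorname{sig}(\theta_{s,p,o,t}),$$
where
$$\theta_{s,p,o,t} = \sum_{r_1, r_2, r_3, r_4} a_{e_s, r_1}\, a_{e_p, r_2}\, a_{e_o, r_3}\, a_{e_t, r_4}\, g(r_1, r_2, r_3, r_4).$$
The analysis of tensors, as for matrices, can be performed using a reduced form obtained by factorization. One popular factorization method for tensors is the Tucker representation; however, there are other matrix and tensor decomposition algorithms. Here, I used RESCAL, and the construction of the tensor with weighted values allows the predicate dimension to be omitted, so the characteristic function becomes
$$P(s, o, t) = \operatorname{sig}(\theta_{s,o,t}), \qquad \theta_{s,o,t} = a_{e_s}^{\top} R_t\, a_{e_o}.$$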
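As a rough sketch of how such a characteristic function could be evaluated, the code below assumes the bilinear RESCAL form, in which the score of subject $s$, object $o$ and year $t$ is the sigmoid of $a_s^{\top} R_t a_o$. The function names, the array layout (year first in `R`) and the toy values are assumptions for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, mapping real scores to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def quad_probability(A, R, s, o, t):
    """Score one (s, o, t) combination as sig(a_s^T R_t a_o).

    A : latent vectors, shape (n_entities, r)
    R : one (r, r) core matrix per year, shape (n_years, r, r)
    """
    return sigmoid(A[s] @ R[t] @ A[o])


def all_probabilities(A, R):
    """Scores for every (s, o, t) at once, shape (n_entities, n_entities, n_years)."""
    theta = np.einsum("ir,trq,jq->ijt", A, R, A)
    return sigmoid(theta)


# Toy latent space with two entities, two components and one year.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
R = np.array([[[0.0, 2.0], [-2.0, 0.0]]])
print(quad_probability(A, R, 0, 1, 0))
print(all_probabilities(A, R).shape)
```

The asymmetric core matrix gives different scores for the two directions of a relation, which is what allows directed relations to be represented.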

RESCAL
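This tensor decomposition was proposed by (Nickel, 2013). The decomposed tensor needs to have two dimensions of the same size, i.e., $\mathcal{X} \in \mathbb{R}^{n \times n \times m}$, and the results are a matrix $A \in \mathbb{R}^{n \times r}$ and a tensor $\mathcal{R} \in \mathbb{R}^{r \times r \times m}$:
$$\mathcal{X} \approx \mathcal{R} \times_1 A \times_2 A, \qquad X_k \approx A R_k A^{\top}. \tag{9}$$
The algorithm is an alternating least squares (ALS) procedure where the outputs are updated with:
$$A \leftarrow \Big[\sum_{k} X_k A R_k^{\top} + X_k^{\top} A R_k\Big] \Big[\sum_{k} R_k A^{\top} A R_k^{\top} + R_k^{\top} A^{\top} A R_k + \lambda I\Big]^{-1},$$
$$R_k \leftarrow V \big(P \circledast U^{\top} X_k U\big) V^{\top},$$
where $X_k$ is a slice of the tensor, $\circledast$ is the element-wise product and, for optimization, a singular value decomposition of the matrix $A = U \Sigma V^{\top}$ is employed. $P$ is the matrix such that $\operatorname{diag}(\operatorname{vec}(P)) = \hat{S}$, which can be constructed by rearranging the diagonal entries of $\hat{S}$ via the inverse vectorization operator $\operatorname{vec}^{-1}(\cdot)$. Then, for regularization, the Kronecker product of the diagonal matrix is employed: $\hat{S} = S (S^2 + \lambda I)^{-1}$ with $S = \Sigma \otimes \Sigma$.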
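The update of the core slices for a fixed $A$ can be sketched with NumPy as below. This is only the $R_k$ step, not a full ALS implementation; the function names, the slice-first array layout and the regularization parameter `lam` are assumptions for illustration. With `lam=0` and a tensor that is exactly of the form $A R_k A^{\top}$, the step recovers the core slices.

```python
import numpy as np


def reconstruct(A, R):
    """Rebuild every slice as A @ R_k @ A.T; returns shape (m, n, n)."""
    return np.stack([A @ R_k @ A.T for R_k in R])


def update_R(X, A, lam=0.0):
    """Least-squares update of each R_k for a fixed A, using the SVD of A.

    X   : tensor slices, shape (m, n, n)
    A   : latent matrix, shape (n, r), assumed to have full column rank
    lam : ridge regularization strength (0 means plain least squares)
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    S = np.kron(s, s)                    # diagonal of Sigma (x) Sigma
    S_hat = S / (S ** 2 + lam)           # diagonal of S (S^2 + lam I)^-1
    P = S_hat.reshape(len(s), len(s))    # inverse vectorization of that diagonal
    return np.stack([Vt.T @ (P * (U.T @ X_k @ U)) @ Vt for X_k in X])


rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))              # six entities, two latent components
R_true = rng.normal(size=(3, 2, 2))      # three slices (years)
X = reconstruct(A, R_true)
R_est = update_R(X, A)
print(np.abs(R_est - R_true).max())
```

Working in the basis given by the SVD avoids forming the large Kronecker matrix $A \otimes A$ explicitly.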
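
Non-negative Matrix Factorization (NMF)
This matrix factorization method finds two non-negative matrices $W \in \mathbb{R}^{n \times k}$ and $H \in \mathbb{R}^{k \times m}$ whose product minimizes the Frobenius norm of the difference from the original matrix $X \in \mathbb{R}^{n \times m}$:
$$\min_{W \ge 0,\, H \ge 0} \lVert X - W H \rVert_F^2 .$$
The updates using the algorithm proposed by (Lee and Seung, 2001) are:
$$H \leftarrow H \circledast \frac{W^{\top} X}{W^{\top} W H}, \qquad W \leftarrow W \circledast \frac{X H^{\top}}{W H H^{\top}},$$
where the product $\circledast$ and the division are element-wise.

Non-smooth Non-negative Matrix Factorization (nsNMF)
This decomposition is a modification of NMF, applied by (Kang and Lin, 2018), that places a smoothing matrix $S$ between the two factors:
$$\min_{W \ge 0,\, H \ge 0} \lVert X - W S H \rVert_F^2 ,$$
where
$$S = (1 - \theta) I + \frac{\theta}{k} \mathbf{1} \mathbf{1}^{\top},$$
and using the NMF updates with $W S$ in place of $W$ in the update of $H$, and $S H$ in place of $H$ in the update of $W$. Finally, the matrix decomposition could be expressed as
$$X \approx W S H .$$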
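Both factorizations can be sketched with the same multiplicative-update loop, since setting $\theta = 0$ makes the smoothing matrix the identity and reduces nsNMF to plain NMF. The code below is a minimal illustration under that assumption; the function name, the random initialization, the iteration count and the small constant `eps` that guards against division by zero are choices made for this example.

```python
import numpy as np


def nmf(X, k, n_iter=200, theta=0.0, seed=0, eps=1e-9):
    """Multiplicative-update factorization X ~ W S H with non-negative W and H.

    theta = 0 gives plain NMF (S is the identity); 0 < theta <= 1 gives the
    smoothed nsNMF variant with S = (1 - theta) I + (theta / k) 1 1^T.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    S = (1.0 - theta) * np.eye(k) + (theta / k) * np.ones((k, k))
    for _ in range(n_iter):
        WS = W @ S                                   # W S plays the role of W
        H *= (WS.T @ X) / (WS.T @ WS @ H + eps)
        SH = S @ H                                   # S H plays the role of H
        W *= (X @ SH.T) / (W @ SH @ SH.T + eps)
    return W, S, H


X = np.random.default_rng(1).random((6, 5))          # toy non-negative matrix
W, S, H = nmf(X, k=2)
top_column_per_topic = H.argmax(axis=1)              # strongest column in each row of H
print(np.linalg.norm(X - W @ S @ H))
print(top_column_per_topic)
```

Taking the maximum of each row of `H` mirrors the selection of the most probable words per topic described in the results.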

Funding
This work was supported by Beca Doctorado Nacional Conicyt, Folio No 21180640.