Citation analysis

By Adam Santoro

The concept of pattern separation emerged from the computational literature as a biological manifestation of orthogonalization. The idea describes the process whereby input stimuli to a network are parsed into more distinct outputs. For example, if two input stimuli are represented by cell populations that overlap by 80%, then pattern separation will result in output cell populations that overlap by less than 80%. The idea is quite powerful. It is generally accepted that unique patterns of cell population firing in the brain translate to unique behaviors; thus, if pattern separation can reduce the overlap between cell populations, then it can potentially induce different behaviors (such as an active response to one stimulus, but a suppressed response to a very similar-looking stimulus).
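To make the 80% example concrete, here is a minimal sketch of the arithmetic. It assumes that "overlap" means the number of shared cells divided by the size of the larger population; that definition, the `overlap` helper, and the cell numbers are all illustrative choices of mine, not something taken from the literature.

```python
def overlap(a: set, b: set) -> float:
    """Shared cells as a fraction of the larger population (assumed definition)."""
    largest = max(len(a), len(b))
    if largest == 0:
        return 0.0  # two empty populations share nothing
    return len(a & b) / largest

# Hypothetical input populations: 10 active cells each, 8 of them shared.
input_a = set(range(0, 10))   # cells 0..9
input_b = set(range(2, 12))   # cells 2..11

# Hypothetical output populations after separation: only 5 cells shared.
output_a = set(range(0, 10))  # cells 0..9
output_b = set(range(5, 15))  # cells 5..14

input_overlap = overlap(input_a, input_b)
output_overlap = overlap(output_a, output_b)
print(input_overlap, output_overlap)
```

Under this definition, any transformation whose output overlap is lower than its input overlap would count as pattern separation in the sense described above.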

It is no coincidence that the pattern separation literature is eclectic. The paragraph above mentions mathematical concepts, processes at the level of the cell population, and behavior. There is an active debate in the literature as to how these are linked (I have given my recent opinion on this topic elsewhere). Because of the complexity of the topic, I wanted to see if some new insights could emerge from a citation analysis of the pattern separation literature. Which papers have been the most influential? What types of papers are they (computational, behavioral, etc.)? Do we have enough evidence to say that pattern separation exists at the level of cell populations?

The following data are purely exploratory, and I have tried not to draw any conclusions; instead, the reader is free to draw his or her own conclusions from what is presented.

Methods & Results

I performed a PubMed query to obtain the largest yet most relevant subset of the pattern separation literature to sample:

("hippocampus" OR "hippocampal") AND ("spatial" or "conjunctive") AND "encoding"
OR
("hippocampus" OR "dentate") AND "pattern separation"

This query returned approximately 750 papers at the time of the search (March 2012). Each paper was manually searched for any mention of “pattern separation” or variants thereof (e.g., “pattern separator”). If such a statement was made (for example, “The dentate gyrus is thought to partake in pattern separation [1][2]”), then the paper was assigned a unique citation ID. In addition, the statement was recorded, and the associated citations were also given unique IDs; thus, if two papers cite the same source, that source has a consistent identification number (similar to a PubMed ID). If a statement had no citation, it was assigned a citation index of “N/A”.

Example:

Input: 
Paper "Pattern separation exists" contains the statement "the dentate gyrus is a pattern separator [Paper A, 1990] [Paper B, 1984]"
Paper "Pattern separation does not exist" contains the statement "the dentate gyrus is not a pattern separator [Paper A, 1990] [Paper C, 2003]"
Paper "Pattern separation might exist" contains the statement "the dentate gyrus might pattern separate"

Output:
[1] --> [2][3]
[4] --> [2][5]
[6] --> [N/A]
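The ID assignment was done by hand, but the bookkeeping can be sketched in a few lines. This is only an illustration of the scheme described above: the function name `assign_citation_ids` and the record format are hypothetical, and IDs are simply handed out in order of first appearance.

```python
from typing import Dict, List, Tuple

def assign_citation_ids(records: List[Tuple[str, List[str]]]):
    """Give every paper a stable integer ID; return (ids, edges)."""
    ids: Dict[str, int] = {}

    def get_id(key: str) -> int:
        # Reuse the ID if this paper was seen before, otherwise take the next one.
        if key not in ids:
            ids[key] = len(ids) + 1
        return ids[key]

    edges = []
    for citing, cited in records:
        source = get_id(citing)
        # A statement without any citation points at the placeholder "N/A".
        targets = [get_id(c) for c in cited] if cited else ["N/A"]
        edges.append((source, targets))
    return ids, edges

records = [
    ("Pattern separation exists", ["Paper A, 1990", "Paper B, 1984"]),
    ("Pattern separation does not exist", ["Paper A, 1990", "Paper C, 2003"]),
    ("Pattern separation might exist", []),
]
ids, edges = assign_citation_ids(records)
for source, targets in edges:
    print(f"[{source}] --> " + "".join(f"[{t}]" for t in targets))
```

Because "Paper A, 1990" is looked up rather than renumbered, it keeps ID 2 in both statements, which is what makes the later in-degree counts meaningful.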

In total, 172 citation IDs emerged. Only a handful (4) of these papers were not in the original search; they were included because they were cited by a paper that was in the original search. Importantly, however, 12 books that were cited were also included, although they were not in the original search (which only covered peer-reviewed articles).

The statements were then judged to be either positive, neutral, or critical towards the hypothesis that pattern separation occurs in the dentate gyrus. For example, a positive statement would use positive qualifiers:

"It is known that pattern separation occurs in the dentate gyrus"

…and a neutral statement:

"It has been suggested that pattern separation occurs in the dentate gyrus"

…and critical:

"Pattern separation does not occur in the dentate gyrus"

This is the most contentious aspect of the analysis, as it involves subjective interpretation of the English language. Context is also important (if a paragraph introduces the topic as a hypothesis, then any positive language is discounted, as the author’s viewpoint is known to be neutral). Statements were evaluated twice, the second time blind to any associated information (the statement’s authorship, article title, etc.). There are some obvious holes in this type of evaluation, but it was included purely for interest and can be ignored for the sake of the citation analysis.

Example:

Input:
Paper "Pattern separation exists" contains the statement "the dentate gyrus is a pattern separator [Paper A, 1990] [Paper B, 1984]"
Paper "Pattern separation does not exist" contains the statement "the dentate gyrus is not a pattern separator [Paper A, 1990] [Paper C, 2003]"

Output:
[1] --> [2P][3P]
[4] --> [2C][5C]

From this data set, the numbers of incoming and outgoing citations were quantified. Each ID was considered a node in the network, and a citation from one ID to another an edge (see Graph Theory). The resulting graph resembled a small-world network: for the most part, nodes were not connected with each other; however, there were a few papers with a large number of edges, so any paper can “reach” any other paper in a relatively small number of steps.
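As a rough sketch of what "quantifying" the network involves, the snippet below counts in-degree and out-degree from an edge list and measures how many steps separate two IDs. The edge list is the toy example from the Methods, not the real data, and treating edges as undirected when counting steps is my assumption about what "reach" means here.

```python
from collections import Counter, deque

# Toy edges (citing ID, cited ID) taken from the worked example above.
edges = [(1, 2), (1, 3), (4, 2), (4, 5)]

in_degree = Counter(cited for _, cited in edges)     # incoming citations per ID
out_degree = Counter(citing for citing, _ in edges)  # outgoing citations per ID

def steps_between(edges, start, goal):
    """Breadth-first search ignoring edge direction; None if unreachable."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

print(in_degree[2], out_degree[1], steps_between(edges, 3, 5))
```

In the toy network, ID 2 is the only node cited twice, and it is the shared citation that links the two otherwise separate papers.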

[Figure: Pattern Separation2]

The direction of the edges in the above image is clockwise (thus, for a given arc that is angled clockwise, the ID at the beginning of the arc cited the ID at the end of the arc). The IDs with the most citations are on the outside of the circle. In-degree (the number of incoming edges to that ID) is indicated by the size of the node (larger node = larger in-degree).

 

[Figure: PS]

In the above image you can see the IDs organized along a circle according to date, starting at the 3 o’clock position and going clockwise, finishing again at the 3 o’clock position. The blue bars indicate the number of incoming edges, or citations, to the ID (e.g., ID 32 has a large number of incoming edges, meaning many papers in the network cite it). The size of the red circle at the end of the bar indicates the proportion of these edges that were rated as “positive.” Naturally, the positivity measure is more meaningful for IDs with a larger number of citations (greater sample size). Important to note: there were only neutral or positive qualifications; no statement found was critical towards the hypothesis. Thus, the “positivity” of a particular paper represents the number of positive incoming citations divided by the total number of citations it received.
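The positivity index is simple enough to sketch directly. The labelled edges below are made up for illustration ("P" for positive, "N" for neutral), and the `positivity` function is a hypothetical name rather than the tool used to produce the figures.

```python
from collections import defaultdict

def positivity(labelled_edges):
    """Positive incoming citations divided by all incoming citations, per cited ID."""
    positive = defaultdict(int)
    total = defaultdict(int)
    for _citing, cited, label in labelled_edges:
        total[cited] += 1
        if label == "P":
            positive[cited] += 1
    return {cited: positive[cited] / total[cited] for cited in total}

# Hypothetical labelled edges: (citing ID, cited ID, rating of the statement).
labelled_edges = [
    (1, 2, "P"),
    (1, 3, "P"),
    (4, 2, "N"),
    (4, 5, "N"),
]
scores = positivity(labelled_edges)
print(scores)
```

Note how IDs 3 and 5 end up at the extremes on the strength of a single citation each, which is exactly the sample-size caveat mentioned above.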

The image below lists the 10 most cited papers in the network, with their associated positivity index.

[Figure: In-degree]

The list below shows the data broken down by journal:

[Figure: Journals]

And here it is in graphical format, with positive citations shown in green, and neutral in white:

[Figure: Journals Graph]

The graph below shows the frequency of positive vs. neutral citations each year, and the cumulative count over the years:

[Figure: Frequency]
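For anyone wanting to rebuild a plot like this from their own data, the tallying might look something like the sketch below. The years and ratings are invented, and I am assuming each citation is dated by the year of the citing paper.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical citations: (year of the citing paper, rating of the statement).
citations = [(1998, "P"), (1998, "N"), (2004, "N"), (2007, "P"), (2007, "P")]

years = sorted({year for year, _ in citations})
counts = Counter(citations)  # Counter returns 0 for (year, rating) pairs never seen

positive_per_year = [counts[(year, "P")] for year in years]
neutral_per_year = [counts[(year, "N")] for year in years]

# Running total of all citations (positive + neutral) up to and including each year.
cumulative_total = list(
    accumulate(p + n for p, n in zip(positive_per_year, neutral_per_year))
)

for row in zip(years, positive_per_year, neutral_per_year, cumulative_total):
    print(row)
```

Only years with at least one citation appear in this version; filling in the empty years would be needed before plotting on a continuous time axis.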

 

Discussion

Open to discussion 🙂

Appendix

ID list

 

 

One thought on “Citation analysis”

  1. It may be worth doing a similar analysis on Google Scholar as opposed to PubMed, as Google will often return documents that use terms in the body of the paper that do not appear in the abstract or keywords. I mention this because the phrases “pattern separation” and “interference”, and even some notable figures based on that premise, started being bandied about in 2004-2006, which is something of a dead zone in your analysis above. I believe one difference was that in those early years of work on neurogenesis function (prior to the Leutgeb study a few years later), while the DG was thought to have a strong pattern separation property, that wasn’t generally considered its sole function, at least not in the Treves / Rolls, O’Reilly / McClelland or McNaughton / Morris computational frameworks.
