Innovation

Research in Ranking

With the considerable advances in technology over the past few years, information overload has become a serious problem in our lives. In a world that continually asks for more space to keep more information, an efficient method for recalling information is an everyday necessity for all computer users.

Activation Based Ranking fulfils that need. It is not simply another attempt at a desktop search engine, but a flexible personal ranking technology that can be used in a variety of information management tasks.

Hebbian develops a search and retrieval system that applies the latest advances in the study of human-computer interaction and human memory processes to various aspects of computer information management. This technology centres on Activation Based Ranking, which ranks search results based on how well the information is "remembered".

Human thought processes are extremely complex and are still far from being understood. We do not claim that Activation ranking exactly matches the functions of the human brain. However, by applying the same general principles that are hypothesized to operate in human thought processes, the technology used by Hebbian Recall treats the usage of documents on a computer as a parallel to human memory processes such as practice and forgetting. In effect, users naturally rank their own documents simply through their everyday use of the computer.

About Activation Based Ranking

Traditional ranking algorithms compare the text in the query with the text in the document, and then rank documents solely on the similarity or dissimilarity between the two. However, this is a very difficult task because natural language text is ambiguous. Another common way of ranking documents is arranging them by attributes (or metadata). A typical example is sorting files by name or date. This method is efficient only when the volume of information is fairly small or the user is well organized.

However, if you provide people with keywords and ask them to recall documents based on those keywords, their results will not be based on how frequently those words occur in a particular document. Instead, they will rank documents based on how useful a document was to them, how much effort they spent on it, and when they last worked on it. All of these factors merge together to form what human memory researchers refer to as activation.

Activation is well known in the field of neuropsychology. Activation of a particular piece of information reflects the degree of the user's past experience with this information in association with a current context. It indicates how useful this information is at the current moment. In simple terms, things that you recall from the "top of your head" in a particular situation are top activated information items associated with the context of the situation. When information becomes easier to access within the brain as a result of having been used recently, it is more activated. Activation is tightly coupled with remembering (probability of recall) and forgetting.

There are several components of activation:

  • base level activation: depends on the strength of practicing the information (how much we have used it) and the recency of the information (how long since we have used it)
  • partial matching activation: depends on how well the information we are trying to recall matches the clues that we have
  • context (or distributed) activation: items which are activated because of the current context or elements of the goal can "spread" their activation to related items

For example, if someone were to imagine a tree, the tree that appears in their mind will either be a tree that they have seen recently in their backyard, or a tree that carries certain significance in their life. If you have just read an article about Maple syrup, however, a maple tree may override other trees in your memory because the information item for "maple" has already been partially activated.
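
As a rough illustration of how the three components might combine, the following Python sketch scores the tree example above. It is only a sketch under stated assumptions: the additive combination, the `Item` fields, the overlap-based `partial_match` and `spread` functions, and all numeric values are hypothetical and are not taken from the Hebbian implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    base: float                       # base level activation, assumed precomputed
    terms: frozenset = frozenset()    # words that describe the item
    links: frozenset = frozenset()    # names of related items


def partial_match(item, cues):
    """Fraction of the recall cues that the item matches (hypothetical measure)."""
    if not cues:
        return 0.0
    return len(item.terms & cues) / len(cues)


def spread(item, context):
    """Fraction of the currently active context items that link to this item."""
    if not context:
        return 0.0
    return len(item.links & context) / len(context)


def total_activation(item, cues, context):
    """Assumed additive combination of the three components."""
    return item.base + partial_match(item, cues) + spread(item, context)


backyard_oak = Item("backyard oak", base=1.0, terms=frozenset({"tree"}))
maple = Item("maple", base=0.5, terms=frozenset({"tree"}),
             links=frozenset({"maple syrup article"}))

cues = frozenset({"tree"})
no_context = frozenset()
after_article = frozenset({"maple syrup article"})

# Without context the recently seen oak scores higher; once the article is
# part of the context, the maple overtakes it.
for context in (no_context, after_article):
    print(total_activation(backyard_oak, cues, context),
          total_activation(maple, cues, context))
```

With these made-up numbers the oak wins when nothing else is active, and the maple wins once the maple syrup article is in the context, which mirrors the "override" effect described above.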

ACT-R and SOAR are just two of a number of well developed academic theories that model activation as a part of human cognition and thought processes.

In a graph of an item's activation over time, the spikes correspond to the moments when the user "practiced" or "used" the particular item of information. The size of each spike is proportional to the depth of processing experienced by the user, as well as the effort they exerted. After each practice point, activation decays due to time or other interference.

When the activation of an item drops below a certain point, the item is considered "forgotten". This decay is very fast at first and then slows down with time to a plateau (hence the label "negatively accelerating" curve). That is why we "forget" a lot of information fast (steep part of the curve), but can still recall the information if given the proper cues. The information is not truly forgotten; rather it is simply lying dormant with a low level of activation. Once retrieval or contextual cues are given, the activation of the item increases, and the item can thus be recalled. It is important to note also that practicing has a cumulative effect. Frequent practicing leads to a less steep decay curve and higher residual activation.
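
The decay behaviour described above resembles the base-level learning equation used in ACT-R, B = ln(sum over j of t_j^(-d)), where t_j is the time since the j-th practice and d is a decay rate (0.5 is the value conventionally used in the ACT-R literature). The Python sketch below uses that equation as a stand-in. The per-practice weight (representing the size of a spike), the function names, and the threshold value are assumptions made for illustration and are not the formula used in Hebbian products.

```python
import math


def base_level_activation(practices, now, decay=0.5):
    """ACT-R style base level activation.

    practices: list of (time, weight) pairs, each with time < now. The weight
    is an assumed extension standing in for the depth of processing.
    """
    if not practices:
        return float("-inf")  # never practiced, so nothing to recall
    total = sum(weight * (now - time) ** -decay for time, weight in practices)
    return math.log(total)


def is_forgotten(activation, threshold=-1.0):
    """Hypothetical threshold below which an item counts as forgotten."""
    return activation < threshold


# One practice at time 0: activation falls quickly at first, then more slowly.
single = [(0.0, 1.0)]
for now in (1.0, 4.0, 16.0):
    print(now, round(base_level_activation(single, now), 3))

# Repeated practice is cumulative: the same item practiced three times
# retains more activation at the same moment.
repeated = [(0.0, 1.0), (5.0, 1.0), (10.0, 1.0)]
print(base_level_activation(repeated, 16.0) > base_level_activation(single, 16.0))
```

Under these assumptions the once-practiced item has dropped below the example threshold by time 16, while the item practiced three times has not.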

Application of Activation Principles to Enterprise Search

Models for human memory have been based on empirical results, which have been observed and documented over several decades and with thousands of human participants.

One of the best-known methods for testing this pattern of human memory is the Word Recall Task. In this task, researchers present participants with test sentences, followed by separate words from the sentence. For each word, the participant is asked to recall the original sentence, and activation is measured based on how difficult it was to remember the sentence. The presentation of the sentences is the practice of information, recall based on the words given is the partial matching component, and previously recalled words and sentences form the activation context.

This same technique is used in Hebbian products to rank and search a user's documents. A document is treated similarly to the sentence in the above example, with words from a query used for the partial match and other activated documents or specific task goals used as a context. In order to build a complete activation model, depth and strength of processing have to be estimated. This estimation is based on one of the following stages in human memorization:

Attention -> Encoding -> Storage -> Retrieval

The processes involved in encoding, storage and retrieval are functions internal to the brain and are therefore difficult to quantify. However, user attention to a particular information item can be measured and approximated. Users can pay attention to a particular item of information by passively viewing or hearing it, by actively interacting with the document (editing, for example) or by simply thinking about the information item without having it in front of them. The last case is not measurable, and is not essential for our approximation.
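
One way to picture how measurable attention could be turned into a practice strength is the Python sketch below. It is purely illustrative: the event kinds, the weights, and the per-minute scaling are hypothetical choices, and the actual models described in this document are probabilistic and not specified here.

```python
# Hypothetical weights: active interaction is assumed to count for more
# than passive viewing. These numbers are illustrative only.
ATTENTION_WEIGHTS = {"view": 1.0, "edit": 3.0}


def practice_weight(events):
    """Estimate the strength of one practice from observed attention events.

    events: iterable of (kind, seconds) pairs. Kinds that cannot be observed,
    such as simply thinking about an item, contribute nothing.
    """
    return sum(ATTENTION_WEIGHTS.get(kind, 0.0) * seconds / 60.0
               for kind, seconds in events)


session = [("view", 120), ("edit", 60), ("think", 300)]
print(practice_weight(session))
```

The result could serve as the weight of a single practice point when estimating base level activation.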

The ranking method used is universal and can be applied to any kind of document (or more generally any information presentable to the user, such as a user interface element, fragments of documents, individual words, MP3 songs or movie characters). It does not depend on the document content, type of user activity, or user working habits and does not require specific actions from the user.

While our ranking is a simplified approximation of the processes that may happen in a human brain, it is based on tried-and-tested principles of practicing and forgetting. Given the wide range of information contained in documents (from texts to landscape photos) and the fact that computers are not telepathic or intelligent beings that share the user's values, we do not see a better approach that is generic enough to model the activation of documents.

Activation calculations can be improved for specific types of documents, specialized texts or tasks by classifying or understanding the document content, context or user intentions, but none of these methods is universal or accurate enough in the current state of computer science.

Research in Attention Tracking

Human factors research traditionally deals with the issues of human-computer interaction. There are well-established laws that describe how easy or difficult it is to perform certain actions at the computer. These principles are successfully used in user interface design. In our research we reversed this: by observing user behavior, we estimate how much effort the user had to spend to perform certain actions. We have also built a number of probabilistic models that estimate the amount of attention the user paid to certain information items on the screen. These estimates incorporate findings from eye tracking studies and human attention research.

Applications of Activation Based Ranking Technology

Search and retrieval, as implemented in Hebbian Recall, is an obvious use of activation data and technology. However, activation-based ranking is also applicable to a wide range of applications, which we are currently developing.

These include:

User Context
Given a fixed number of a user's most activated documents (the ranking determined by their individual activation), a context can be created to accurately predict a user's probable focus of interest. Because the activation of the documents is constantly changing, the context will change as necessary, and will be completely maintenance-free. This context information could be further used to personalize web searches, adjust user interfaces, organize information, and for other applications.
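
A minimal Python sketch of this idea, assuming each document already has an activation value, is shown below. The function name, the document names and the numbers are hypothetical.

```python
import heapq


def user_context(activations, size=3):
    """Return the names of the `size` most activated documents, highest first."""
    return heapq.nlargest(size, activations, key=activations.get)


activations = {
    "budget.xls": 2.1,
    "notes.txt": -0.4,
    "proposal.doc": 1.7,
    "old_memo.doc": -2.3,
}
print(user_context(activations, size=2))  # ['budget.xls', 'proposal.doc']
```

Because the context is recomputed from current activation values, it follows the user's focus without any manual upkeep.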

Collaboration
Activation data and user context can be exchanged, collected and mined to improve collaboration over intranets and the Internet. Possible uses include:

  • popularity feedback and ranking of online resources
  • enterprise search, ranking of informational assets
  • finding people and communities with similar interests
  • capturing and sharing the best resources used by experts in the enterprise

Storage Management
Backing up and moving data to maximize storage capacity is common practice throughout the information technology field. Activation ranking provides a superior mechanism for storage management. It can be used for both individual users and shared storage, and also provides an effective algorithm for cache management.
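
As a sketch of what activation-driven storage or cache management could look like, the Python function below picks the least activated files to move out until the remainder fits a capacity limit. The eviction policy, the function name and the sample data are assumptions for illustration, not a description of a shipped product.

```python
def evict_for_capacity(files, capacity):
    """Choose files to archive or evict, least activated first.

    files: dict mapping name -> (size, activation).
    Returns the names to move out so that the remaining files fit `capacity`.
    """
    used = sum(size for size, _ in files.values())
    evicted = []
    for name in sorted(files, key=lambda n: files[n][1]):
        if used <= capacity:
            break
        evicted.append(name)
        used -= files[name][0]
    return evicted


files = {
    "report.doc": (40, 1.8),
    "video.avi": (700, -2.5),
    "photos.zip": (300, -0.9),
    "todo.txt": (1, 2.4),
}
print(evict_for_capacity(files, capacity=400))
```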

Device Synchronization and Priority Information Transmission
Activation data will also allow much smarter data and file synchronization. For example, a user can specify a query or a few key attributes as clues for essential data to be synchronized to his or her PDA. Using base activation, partial matching, associative activation and the total size of the PDA memory, this technology can create a set of the most relevant documents that the user can then review and transfer to the PDA.
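
The selection step could be sketched in Python as a simple greedy pass, assuming each candidate document already carries a combined activation score for the user's query. The greedy strategy, the names and the sizes below are hypothetical; a real implementation might weigh size against activation differently.

```python
def select_for_sync(documents, memory_budget):
    """Pick documents for the device, most activated first, within a size budget.

    documents: list of (name, size, activation) tuples.
    Documents that do not fit in the remaining space are skipped.
    """
    chosen = []
    remaining = memory_budget
    for name, size, _ in sorted(documents, key=lambda d: d[2], reverse=True):
        if size <= remaining:
            chosen.append(name)
            remaining -= size
    return chosen


documents = [
    ("contacts.csv", 2, 2.2),
    ("slides.ppt", 30, 1.9),
    ("contract.pdf", 12, 1.1),
    ("archive.zip", 90, -1.5),
]
print(select_for_sync(documents, memory_budget=16))
```

The resulting list is only a proposal, which the user can review before transferring anything to the device.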