[1] The experiment was run from January through July. Technical memos (TMs) from June were not used in this study because very few TMs were published that month.
[2] Separate document profiles were maintained for the Keyword and LSI matching methods. A TM was added to a method's document profile only if that method had originally selected it (i.e., a TM was added to the document profile for the LSI match method only if the LSI match method selected it). This enabled us to analyze how the Keyword match-Document profile and LSI match-Document profile methods work "as a whole".
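A minimal sketch of the per-method profile rule, in Python. The method names `keyword` and `lsi`, the function `update_profiles`, and its arguments are hypothetical illustrations, not the study's code. For simplicity the sketch adds every relevant TM a method returned, whereas the study added only a chosen percentage of the relevant TMs each month.

```python
def update_profiles(
    profiles: dict[str, set[str]],
    selections: dict[str, list[str]],
    relevant: set[str],
) -> dict[str, set[str]]:
    """Add relevant TMs to the profile of each method that selected them.

    profiles:   method name -> TM ids already in that method's document profile
    selections: method name -> TM ids that method returned this month
    relevant:   TM ids the employee rated as relevant
    (All names are hypothetical; the study's percentage cutoff is omitted.)
    """
    for method, returned in selections.items():
        for tm_id in returned:
            # A TM enters a method's profile only if that method selected it.
            if tm_id in relevant:
                profiles.setdefault(method, set()).add(tm_id)
    return profiles


# Hypothetical month: "tm2" was picked by both methods, "tm3" only by LSI.
profiles = update_profiles(
    {"keyword": set(), "lsi": set()},
    {"keyword": ["tm1", "tm2"], "lsi": ["tm2", "tm3"]},
    relevant={"tm2", "tm3"},
)
print({m: sorted(s) for m, s in profiles.items()})
# prints {'keyword': ['tm2'], 'lsi': ['tm2', 'tm3']}
```

Because "tm3" was returned only by the LSI method, it never enters the Keyword profile, which is what lets each method-plus-profile combination be evaluated on its own.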
[3] A number of choices had to be made: how many TMs each method should return, what rating value would be considered relevant, and what percentage of the relevant TMs should be added to a profile in any given month. These values were chosen taking into account factors such as the number of TMs published per month and estimates of how many TMs would be relevant to a person in a month. Because full rating data on all TMs published each month was collected for some subjects, we could go back and test alternative values. The conclusions described below do not appear to be very sensitive to the choice of cutoff values.
[4] The fact that the filtering methods miss an estimated 50% of the relevant articles is not as poor as it might seem. First, the filtered TMs that employees examined represented only 11% of the TMs written, so 50% of the relevant TMs were retrieved by looking at only 11% of all TMs. Second, more relevant TMs could be retrieved simply by increasing the number of TMs returned to employees, which is easy to do with any retrieval method that ranks items in decreasing order of similarity.
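Both points can be made concrete with a short Python sketch. It compares the reported 50% recall with the roughly 11% a random sample of the same size would be expected to find, and shows that recall at a cutoff k cannot fall as k grows. The function `recall_at_k` and the ten-item ranked list are hypothetical toy data, not results from the study, and the list is assumed to contain no duplicates.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant items that appear among the top-k ranked items.

    Assumes `ranked` has no duplicate items and `relevant` is non-empty.
    """
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


# Figures from the text: about 50% of relevant TMs found by examining 11% of all TMs.
filtered_recall = 0.50
fraction_examined = 0.11
# A random 11% of TMs would be expected to contain about 11% of the relevant ones.
lift = filtered_recall / fraction_examined
print(f"about {lift:.1f}x the recall of a random sample of the same size")  # about 4.5x

# Hypothetical ranked output (most similar first) and relevance judgments.
ranked = ["t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10"]
relevant = {"t1", "t3", "t4", "t8"}
for k in (2, 4, 10):
    # Returning more items from the ranked list can only keep or raise recall.
    print(k, recall_at_k(ranked, relevant, k))  # 0.25, then 0.75, then 1.0
```

Raising k trades more reading for higher recall; with a ranked method the extra items come from further down the similarity order, so no new retrieval machinery is needed.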