One page abstract describing the motivation behind the MSR server:
(VekslerEtAl2007.pdf)
Measures of Semantic Relatedness (MSRs) are computational means for calculating the association strength between terms.
MSRs have been used to produce models of human web-browsing behavior (Pirolli, 2005), augmented search engine technology (Dumais, 2003), semantic relevancy maps (Veksler & Gray, 2007), and essay-grading algorithms for ETS (Landauer, Foltz, & Laham, 1998), and they could be useful for any cognitive models or AI agents that have to deal with text.
lead researchers: Vladislav D. Veksler and Wayne D. Gray
(please contact vekslv[at]rpi.edu about collaboration, bugs, feature requests, etc.)
contributors (alphabetical): Stephane Gamard, Alex Grintsvayg, Robert Lindsey, Michael Stipicevic
Many external sources are used (Google; LSA from CU Boulder; GLSA from Xerox PARC; WordNet from Princeton by way of UMN; etc.):
- LSA engine used here is http://lsa.colorado.edu/
for more information about Latent Semantic Analysis please find: Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211-240.
- WordNet engine used here is http://marimba.d.umn.edu; for more information about WordNet please visit http://wordnet.princeton.edu
- GLSA engine used here is glsa.parc.com; please read http://glsa.parc.com/faq.wml and "Generalized Latent Semantic Analysis for Term Representation", I. Matveeva, G. Levow, A. Farahat, Chr. Royer, RANLP 2005 for more information.
- The PMI equation may be found in: Turney, P. (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.
- NSS stands for Normalized Search Similarity, and is adapted from Normalized Google Distance, proposed in: Cilibrasi, R. & Vitanyi, P.M.B. (2006). Similarity of objects and the meaning of words. Proc. 3rd Conf. Theory and Applications of Models of Computation (TAMC), J.-Y. Cai, S. B. Cooper, and A. Li (Eds.), Lecture Notes in Computer Science, Vol. 3959, Springer-Verlag, Berlin.
- SA stands for Spreading Activation, and is based on the Spreading Activation formula from Anderson & Pirolli (1984), as cited by Farahat, Pirolli, & Markova (2004). Incremental Methods for Computing Word Pair Similarity. PARC technical report TR-04-6.
- VGEM (Vector Generation from Explicitly-defined Multidimensional space) is an experimental measure, developed at CogWorks, that requires a list of terms to define the dimensions of the semantic space (VGG08.pdf)
VGEM199RSUMO uses 100 random terms and 99 terms from the SUMO noun ontology
VGEM-NSS-G has no dimensions. In order to define dimensions for this measure to work, you must enter terms into the Context field.
- All measures that end with "-Y" use the Yahoo search engine.
- All measures that end with "-G" use the Google search engine. -G searches the entire web, -Gnytimes searches only *.nytimes.com on Google, -Gwikipedia searches only *.wikipedia.org, -Ggoogle searches only *.google.com, -Greuters searches only *.reuters.com, -Gyahoo searches only *.yahoo.com, -Ggutenberg searches only *.gutenberg.org, and -Ggutenbergfiles searches *.gutenberg.org/files.
- Different corpora are used by the MSRs on this server. Please see (Lindsey, Veksler, Grintsvayg, & Gray, submitted) for more information about corpus selection.
- All measures on this server are normalized such that 1 is the highest possible score and 0 signifies a lack of semantic relatedness.
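The search-engine-based measures listed above can be illustrated with a rough sketch. This is not the server's code: the function names, the hit-count arguments, and the total page count are hypothetical. PMI is written in its standard form, and the distance follows the published Normalized Google Distance formula. Treating NSS as 1 minus that distance, clipped to the 0 to 1 range, is an assumption, as is computing VGEM as the cosine between vectors of base-measure scores against the dimension terms.

```python
import math
from typing import Callable, Sequence


def pmi(hits_x: int, hits_y: int, hits_xy: int, total_pages: int) -> float:
    """Pointwise mutual information, log2(p(x, y) / (p(x) * p(y))),
    with probabilities estimated as hit counts divided by total_pages."""
    if min(hits_x, hits_y, hits_xy) <= 0:
        return float("-inf")
    return math.log2(hits_xy * total_pages / (hits_x * hits_y))


def ngd(hits_x: int, hits_y: int, hits_xy: int, total_pages: int) -> float:
    """Normalized Google Distance computed from search hit counts."""
    if min(hits_x, hits_y, hits_xy) <= 0:
        return float("inf")
    if total_pages <= min(hits_x, hits_y):
        raise ValueError("total_pages must exceed the smaller hit count")
    log_x, log_y, log_xy = math.log(hits_x), math.log(hits_y), math.log(hits_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(total_pages) - min(log_x, log_y))


def nss(hits_x: int, hits_y: int, hits_xy: int, total_pages: int) -> float:
    """Assumed similarity form: 1 - NGD, clipped so that 1 is the highest
    score and 0 signifies no relatedness."""
    return min(1.0, max(0.0, 1.0 - ngd(hits_x, hits_y, hits_xy, total_pages)))


def vgem(term_a: str, term_b: str, dimensions: Sequence[str],
         base_msr: Callable[[str, str], float]) -> float:
    """Assumed VGEM form: score each term against every dimension term with
    a base measure, then compare the two resulting vectors by cosine."""
    vec_a = [base_msr(term_a, d) for d in dimensions]
    vec_b = [base_msr(term_b, d) for d in dimensions]
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm = math.sqrt(sum(a * a for a in vec_a)) * math.sqrt(sum(b * b for b in vec_b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Made-up hit counts, for illustration only.
    print(nss(hits_x=1000, hits_y=2000, hits_xy=500, total_pages=10**6))
```

In this sketch the dimension list passed to vgem plays the role of the terms entered into the Context field for VGEM-NSS-G, and base_msr stands in for whichever underlying measure supplies the term-to-dimension scores.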
References
Blackmon, M. H., Kitajima, M., & Polson, P. G. (2005). Tool for accurately predicting website navigation problems, non-problems, problem severity, and effectiveness of repairs. In ACM CHI 2005 Conference on Human Factors in Computing Systems (pp. 31-40). New York: ACM Press.
Budiu, R., Royer, C., & Pirolli, P. (2007). Modeling information scent: A comparison of LSA, PMI and GLSA similarity measures on common tests and corpora. Proceedings of RIAO'07, Pittsburgh, PA, May 2007.
Cilibrasi, R., & Vitanyi, P. M. B. (2007). The Google similarity distance. IEEE Transactions on Knowledge and Data Engineering, 19(3), 370-383.
Dumais, S. (2003). Data-driven approaches to information access. Cognitive Science, 27(3), 491-524.
Farahat, Pirolli, & Markova (2004). Incremental Methods for Computing Word Pair Similarity. PARC technical report TR-04-6.
Gounon, P., & Lemaire, B. (2002). Semantic comparison of texts for learning environments. In Advances in Artificial Intelligence - Iberamia 2002, Proceedings (Vol. 2527, pp. 724-733). Berlin: Springer-Verlag Berlin.
Howes, A., & Payne, S. J. (1990). Display-based competence: Towards user models for menu-driven interfaces. International Journal of Man-Machine Studies, 33(6), 637-655.
Juvina, I., van Oostendorp, H., Karbor, P., & Pauw, B. (2005). Towards modeling contextual information in web navigation. In B. G. Bara, L. Barsalou & M. Bucciarelli (Eds.), 27th Annual Meeting of the Cognitive Science Society, CogSci2005 (pp. 1078-1083). Austin, TX: The Cognitive Science Society, Inc.
Kaur, I., & Hornof, A. J. (2005). A comparison of LSA, WordNet, and PMI-IR for predicting user click behavior. In ACM CHI 2005 Conference on Human Factors in Computing Systems (pp. 51-60). New York: ACM Press.
Landauer, T. K. (2002). On the computational basis of learning and cognition: Arguments from LSA. Psychology of learning and motivation: Advances in research and theory, 41, 43-84.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211-240.
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to latent semantic analysis. Discourse Processes, 25, 259-284.
Landauer, T. K., Laham, D., & Derr, M. (2004). From paragraph to graph: Latent semantic analysis for information visualization. Proceedings of the National Academy of Sciences of the United States of America, 101, 5214-5219.
Lee, M. D., Pincombe, B., & Welsh, M. (2005). An empirical evaluation of models of text document similarity. In B. G. Bara, L. Barsalou & M. Bucciarelli (Eds.), 27th Annual Meeting of the Cognitive Science Society, CogSci2005 (pp. 1254-1259). Austin, TX: The Cognitive Science Society, Inc.
Lemaire, B., & Denhiere, G. (2004). Incremental construction of an associative network from a corpus. In K. D. Forbus, D. Gentner & T. Regier (Eds.), 26th Annual Meeting of the Cognitive Science Society, CogSci2004. Hillsdale, NJ: Lawrence Erlbaum Publisher.
Lemaire, B., Denhiere, G., Bellissens, C., & Jhean-Larose, S. (2006). A computational model for simulating text comprehension. Behavior Research Methods, 38(4), 628-637.
Lemaire, B., & Dessus, P. (2001). A system to assess the semantic content of student essays. Journal of Educational Computing Research, 24(3), 305-320.
Lindsey, R., Veksler, V. D., Grintsvayg, A., & Gray, W. D. (2007). Effects of Corpus Selection on Semantic Relatedness. 8th International Conference on Cognitive Modeling, ICCM2007, Ann Arbor, MI.
Matveeva, I., Levow, G., Farahat, A., & Royer, C. (2005). Term representation with generalized latent semantic analysis. Paper presented at the 2005 Conference on Recent Advances in Natural Language Processing.
Pirolli, P. (2005). Rational analyses of information foraging on the Web. Cognitive Science, 29(3), 343-373.
Pirolli, P., & Fu, W. T. (2003). SNIF-ACT: A model of information foraging on the World Wide Web. Lecture Notes in Computer Science, 2702, 45-54.
Turney, P. (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.
Veksler, V. D., Govostes, R. Z., & Gray, W. D. (submitted). Defining the dimensions of the human semantic space.
Veksler, V. D., & Gray, W. D. (2006). Test case selection for evaluating measures of semantic distance. Paper presented at the 28th Annual Meeting of the Cognitive Science Society, Vancouver, BC.
Veksler, V. D., & Gray, W. D. (2007). Mapping semantic relevancy of information displays. Paper presented at CHI 2007, San Jose, CA.
Veksler, V. D., Grintsvayg, A., Lindsey, R., & Gray, W. D. (2007). A proxy for all your semantic needs. 29th Annual Meeting of the Cognitive Science Society, CogSci2007, Nashville, TN.
Zampa, V., & Lemaire, B. (2001). Latent semantic analysis for user modeling. Journal of Intelligent Information Systems, 18(1), 15-30.