NLP Reading Group

The Natural Language Processing reading group attempts to keep abreast of interesting research ideas and results that may be useful to us. We typically read and discuss one paper per week. All our past papers are listed below.

The reading group is listed every semester as a 1-credit course, 600.765 ("Selected Topics in NLP"). Contact the instructor (Jason Eisner) to get on the mailing list. At the first course meeting, we brainstorm a bunch of topics for the semester, and vote on which ones to pursue. We then spend about 4 weeks per topic. Although some topics are within NLP, many of them explore potentially relevant work from related fields such as machine learning and linguistics.

During the summer we usually catch up on the latest NLP conference papers.

Instructions on how to present in reading group.

Jason's advice on how to read a paper.

Other reading groups at CLSP are the ones on Text Meaning, Generation, and Translation (Tuesdays) and on Machine Learning (Wednesdays). There is also the weekly Machine Learning Tea.

Spring 2012

Thursdays 12-1:15 in Hackerman 306.

Spectral learning

May 3 (Meher Yeleti)

Apr 26 (Matt Gormley)

Apr 19 (Michael Paul): TBD

Reinforcement learning

Apr 12 (Ves Stoyanov)

Apr 5 (Nathaniel Filardo)

Mar 29 (Jay Feldman)

Non-convex optimization

Mar 15 (Frank Ferraro)

Mar 8 (Tim Vieira, but I'm happy to switch to RL)

Mar 1 (Nicholas Andrews)

Unsupervised/semisupervised learning of linguistic structure

Feb 23 (Olivia Buzek)

Feb 16 (Adam Teichert): TBA

Feb 9 (Jason Smith): Joao V. Graca, Kuzman Ganchev, and Ben Taskar (2007). Expectation Maximization and Posterior Constraints. In Advances in Neural Information Processing Systems, Vol. 20.

A longer treatment is Ganchev et al. (2010), Posterior Regularization for Structured Latent Variable Models, JMLR.

An application to unsupervised dependency parsing is Gillenwater et al. (2011), Posterior Sparsity in Unsupervised Dependency Parsing, JMLR.

Fall 2011

Knowledge representation and reasoning

Dec 1 (Meher Vijay Yeleti): D. Koller, A. Levy, and A. Pfeffer (1997). P-Classic: A Tractable Probabilistic Description Logic. AAAI.

Nov 17 (Ves Stoyanov): Franz Baader and Werner Nutt (2002). Basic Description Logics. In the Description Logic Handbook.

Nov 10 (Nick Andrews): Nir Friedman et al. (1999). Learning Probabilistic Relational Models. IJCAI.

Nov 3 (Matt Gormley): Hector J. Levesque (1986). Knowledge Representation and Reasoning. Annual Review of Computer Science, Vol. 1: 255-287.

Music modeling

Oct 27 (Adam Teichert): Jean-François Paiement, Yves Grandvalet & Samy Bengio (2009). Predictive models for music. Connection Science 21(2-3):253-272.

Oct 20 (Nathaniel Filardo): David Temperley (2010). Modeling Common-Practice Rhythm. Music Perception 27(5):355-376.

Oct 13 (Michael Paul): Gerhard Nierhaus (2008). "Genetic Algorithms in Algorithmic Composition". Algorithmic Composition: Paradigms of Automated Music Generation, Chapter 7.4, pp. 157-186.

Oct 6 (Frank Ferraro): Fred Lerdahl and Ray Jackendoff (1983). "An Overview of Hierarchical Structure in Music." Music Perception: An Interdisciplinary Journal. Vol. 1, No. 2, Hierarchical Structure in Music (Winter 1983/1984), pp. 229-252.

Resources

http://www.musictheory.net/lessons provides a sequence of interactive music theory lessons.
A virtual keyboard: http://www.bgfl.org/bgfl/custom/resources_ftp/client_ftp/ks2/music/piano/.

ML in information retrieval

Sep 29 (Olivia Buzek): Shuang-Hong Yang, Bo Long, Alexander J. Smola, Hongyuan Zha, and Zhaohui Zheng (2011). Collaborative competitive filtering: learning recommender using context of user choice. SIGIR.

Sep 22 (Tim Vieira): Brian McFee and Gert Lanckriet (2010). Metric Learning to Rank. ICML.

Sep 15 (Adam Teichert): P. Carpena, P. Bernaola-Galvan, M. Hackenberg, A.V. Coronado, and J. L. Oliver (2009). Level statistics of words: Finding keywords in literary texts and symbolic sequences. Physical Review; Rada Mihalcea, Courtney Corley, and Carlo Strapparava (2006). Corpus-based and Knowledge-based Measures of Text Semantic Similarity. AAAI.

Sep 8 (Travis Wolfe): Dafna Shahaf, Carlos Guestrin (2010). Connecting the dots between news articles. Proc. of KDD.

Summer 2011

Summer conference papers

Aug 16 (Matt Gormley): Taylor Berg-Kirkpatrick, Dan Klein (2011). Simple Effective Decipherment via Combinatorial Optimization. Proc. of EMNLP.

Jul 19 (Matt Gormley): Alexander M. Rush and Michael Collins (2011). Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation. Proc. of ACL. (Slides.)

Jul 12 (Wes Filardo): Daniel Gildea (2010). Optimal Parsing Strategies for Linear Context-Free Rewriting Systems. Proc. of NAACL. (Slides.)

Jun 14 (Xiaoxu Kang): Limin Yao, Sebastian Riedel, and Andrew McCallum (2010). Collective Cross-Document Relation Extraction Without Labelled Data. Proc. of EMNLP.

Jun 7 (Nicholas Andrews): Harr Chen, Edward Benson, Tahira Naseem, and Regina Barzilay (2011). In-Domain Relation Discovery with Meta-Constraints via Posterior Regularization. Proc. of ACL.

Spring 2011

Combinatorial optimization

May 5 (Wes Filardo)

Daniel J. Lehmann (1977). Algebraic structures for transitive closure. Theoretical Computer Science 4(1):59-76.

Also consider Tarjan (1981a, 1981b).

Apr 28 (Jason Smith): R. McDonald, F. Pereira, K. Ribarov, and J. Hajic (2005). Non-projective dependency parsing using spanning tree algorithms. In Proc. HLT/EMNLP, pages 523–530

Apr 21 (Byung Gyu Ahn): David Sontag, Amir Globerson, Tommi Jaakkola (2010). Introduction to Dual Decomposition for Inference. To appear in Optimization for Machine Learning, editors S. Sra, S. Nowozin, and S. J. Wright: MIT Press, 2010.

Apr 14 (Adam Teichert): Jack Edmonds (1965). Paths, Trees, and Flowers. Canadian Journal of Mathematics 17: 449--467.

Game-theoretic approaches to discourse pragmatics and to language evolution

Apr 7 (Michael Paul): Paul Vogt (2005). The emergence of compositional structures in perceptually grounded language games. Artificial Intelligence 167(1-2): 206-242.

Mar 31 (Rachael Richardson): David Golland, Percy Liang, Dan Klein (2010). A Game-Theoretic Approach to Generating Spatial Descriptions. EMNLP 2010.

Mar 17 (Xuchen Yao)

Gerhard Jäger (2008). Game theory in semantics and pragmatics. Unpublished manuscript.

Note: This looks quite different from the 2011 manuscript that has the same title and author.

March 10 (Luke Orland): Gerhard Jäger (2008). Applications of Game Theory in Linguistics

Variational inference

March 3 (Nicholas Andrews): Percy Liang, Slav Petrov, Michael I. Jordan, Dan Klein (2007). The infinite PCFG using hierarchical Dirichlet processes. EMNLP.

Feb 24 (Nathaniel Filardo): Matthew Beal (2003). Variational Bayesian Hidden Markov Models. Appears as Chapter 3 of Variational Algorithms for Approximate Bayesian Inference, Ph.D. Thesis, Gatsby Computational Neuroscience Unit, University College London.

David MacKay (1997). Ensemble Learning for Hidden Markov Models. Unpublished technical report, Cavendish Laboratory, University of Cambridge.

Slides from Mark Johnson (2007). Why doesn't EM find good HMM POS-taggers?. EMNLP.

Feb 17 (Adam Teichert): David M. Blei, Andrew Y. Ng, and Michael I. Jordan (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research.

Feb 10 (Matt Gormley)

Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul (1999). An introduction to variational methods for graphical models. Machine Learning.

To get an intuition first, start with Jason's high-level explanation of variational inference. For another reference, try the ACL 2007 tutorial slides by Percy Liang and Dan Klein.

Fall 2010

Unsupervised discriminative learning

Dec 9 (Adam Teichert): Continue with last week's reading: chapter 3.

Dec 2 (Wes Filardo): Continue with last week's reading: finish chapter 2.

Nov 18 (Jason Smith): Csaba Szepesvári, Algorithms for Reinforcement Learning. This week we'll read the preface, chapter 1, and the first section of chapter 2. If you're trying to access this outside of JHU, try this link.

Nov 11 (Ves Stoyanov): Yves Grandvalet and Yoshua Bengio, Entropy Regularization, in: Semi-Supervised Learning, pages 151--168, MIT Press, 2006

Nov 4 (Michael Paul): Baoxun Wang, Xiaolong Wang, Chengjie Sun, Bingquan Liu, Lin Sun (2010). Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities.

Oct 28 (Adam Teichert): Noah Smith and Jason Eisner (2005). Guiding Unsupervised Grammar Induction Using Contrastive Estimation.

Semantic parsing

Oct 21 (Svitlana Volkova): Mihai Surdeanu, Richard Johansson, Adam Meyers, Lluís Màrquez, Joakim Nivre (2008). The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies. Slides.

Oct 14 (Xuchen Yao): Wei Lu, Hwee Tou Ng, Wee Sun Lee, Luke S. Zettlemoyer (2008). A Generative Model for Parsing Natural Language to Meaning. EMNLP. Slides.

Oct 7 (Matt Gormley): Dipanjan Das, Nathan Schneider, Desai Chen and Noah A. Smith (2010). Probabilistic Frame-Semantic Parsing. NAACL.

Sep 30 (Nicholas Andrews): Luke S. Zettlemoyer and Michael Collins (2009). Learning Context-Dependent Mappings from Sentences to Logical Form. ACL.

Graph-based methods and random walks

Sep 23 (Adam Teichert): Jie Cai and Michael Strube (2010). End-to-End Coreference Resolution via Hypergraph Partitioning. ACL.

Sep 16 (Delip Rao): Goldenberg, A., Zheng, A. X., Fienberg, S. E., and Airoldi, E. M. (2010). A Survey of Statistical Network Models. Foundations and Trends in Machine Learning 2, 2 (Feb.), 129-233.

Sep 9 (Svitlana Volkova): Einat Minkov and William W. Cohen (2008). Learning Graph Walk Based Similarity Measures for Parsed Text. EMNLP.

Summer 2010

Summer conference papers

Aug 12 (Jason Smith): Alexander Clark (2010). Efficient, Correct, Unsupervised Learning for Context-Sensitive Languages. CoNLL.

Aug 5 (Veselin Stoyanov): Hoifung Poon and Pedro Domingos (2010). Unsupervised Ontology Induction from Text. ACL.

Jul 20: General discussion of ACL 2010 papers.

Jul 15 (Nicholas Andrews): Shay B. Cohen, David M. Blei and Noah A. Smith (2010). Variational Inference for Adaptor Grammars. NAACL.

Jul 6 (Veselin Stoyanov): D. Chiang, J. Graehl, K. Knight, A. Pauls, and S. Ravi (2010). Bayesian Inference for Finite-State Transducers. NAACL.

Jun 29 (Matt Gormley): Percy Liang, Michael I. Jordan, and Dan Klein (2010). Type-Based MCMC. NAACL. Slides.

Jun 22 (Spence Green): David Burkett, John Blitzer, and Dan Klein (2010). Joint Parsing and Alignment with Weakly Synchronized Grammars. NAACL. Slides

Relevant background:

David A. Smith and Jason Eisner (2009). Parser adaptation and projection with quasi-synchronous grammar features. EMNLP.
David A. Smith and Jason Eisner (2008). Dependency parsing by belief propagation. EMNLP.
David Burkett and Dan Klein (2008). Two Languages are Better than one (for Syntactic Parsing). EMNLP.

Jun 17 (Ves Stoyanov): Aria Haghighi and Dan Klein (2010). Coreference Resolution in a Modular, Entity-Centered Model. NAACL.

Jun 10: General discussion of NAACL 2010 papers.

Spring 2010

Visual scene parsing

May 6 (Rizwan Chaudhry): S. Fidler, M. Boben, A. Leonardis (2009). Learning Hierarchical Compositional Representations of Object Structure. In Sven J. Dickinson, Aleš Leonardis, and Bernt Schiele (eds.), Object Categorization: Computer and Human Vision Perspectives.

See also the talk that Geoff Hinton gave here last week, Deep learning with multiplicative interactions.

April 22 (Zach Pezzementi) and April 29 (Balakrishnan V)

Song-Chun Zhu and David Mumford (2006). A Stochastic Grammar of Images. Foundations and Trends in Computer Graphics and Vision, 2(4):259-362. Slides

Official final version is good for screen reading but wastes paper.

April 15 (Nick Andrews): Sven Dickinson (2009). The Evolution of Object Categorization and the Challenge of Image Abstraction. In Sven J. Dickinson, Aleš Leonardis, and Bernt Schiele (eds.), Object Categorization: Computer and Human Vision Perspectives.

Generalized A* and related coarse-to-fine ideas

April 8 (Matt Gormley): André F. T. Martins, Noah A. Smith, and Eric P. Xing (2009). Concise Integer Linear Programming Formulations for Dependency Parsing. ACL-IJCNLP.

April 1 (Adam Gerber): Aria Haghighi, John DeNero, and Dan Klein (2007). Approximate Factoring for A* Search. HLT-NAACL 2007. Slides

March 25 (Zhifei Li): Mark Hopkins and Greg Langmead (2009). Cube pruning as heuristic search. EMNLP 2009.

March 11 (Jason Smith): Adam Pauls and Dan Klein (2009). K-Best A* Parsing. ACL. Slides

March 4 (Nathaniel Filardo): Pedro Felzenszwalb and David McAllester (2007). The Generalized A* Architecture. Journal of Artificial Intelligence Research. Slides from [1].

Weakly supervised learning of semantics

There's also a nice list of papers at the UT reading group on Connecting Language Acquisition with Machine Perception.

Feb 25 (Nick Andrews): Luke Zettlemoyer and Michael Collins (2005). Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. In Proceedings of the Twenty First Conference on Uncertainty in Artificial Intelligence (UAI-05).

~~Feb 11~~ Feb 18 (Ves Stoyanov): S.R.K. Branavan, Harr Chen, Luke S. Zettlemoyer, Regina Barzilay (2009). Reinforcement Learning for Mapping Instructions to Actions. ACL-IJCNLP.

Feb 4 (Rachael Richardson): Percy Liang, Michael I. Jordan, and Dan Klein (2009). Learning Semantic Correspondences with Less Supervision. ACL-IJCNLP.

Fall 2009

Bayesian methods

Jan 21 (Zhifei Li): Percy Liang, Slav Petrov, Michael I. Jordan, Dan Klein (2007). The infinite PCFG using hierarchical Dirichlet processes. EMNLP 2007.

Jan 14 (Jason Smith)

Matthew J. Beal, Zoubin Ghahramani, and Carl Edward Rasmussen (2002). The Infinite Hidden Markov Model. NIPS.

Also discussed in section 7 of last week's paper.

Jan 7 (Jason Eisner)

Long lecture on the Dirichlet process (infinite) mixture model.

Reading: Yee Whye Teh, Michael Jordan, Matthew Beal and David Blei (2005), Hierarchical Dirichlet Processes.

There's also a stack of relevant slides from Jordan's 2005 NIPS tutorial.

Dec 3 (Jason Smith)

Sharon Goldwater and Thomas L. Griffiths (2007), A Fully Bayesian Approach to Unsupervised Part-of-Speech Tagging. ACL.

This paper uses a Gibbs sampler. See also the following papers, which compare Gibbs sampling with Variational Bayes and other methods for the same problem:

Mark Johnson (2007), Why doesn’t EM find good HMM POS-taggers?. EMNLP.
Jianfeng Gao and Mark Johnson (2008), A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers. EMNLP.

Nov 26 (Mechanical Turkey): Mary McGlohon (2007), Fried Chicken Bucket Processes. SIGBOVIK.

Nov 19 (Jason Eisner): Lecture on Gibbs sampling and variational Bayes for LDA and its finite-state generalizations.

Nov 12 (Jason Eisner): Yee Whye Teh (2009), Nonparametric Bayesian Models. Video tutorial at Machine Learning Summer School.

Nov 5 (Zhifei Li): David M. Blei, Andrew Y. Ng, & Michael I. Jordan (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research 3 (2003) 993-1022.

Inference methods

Oct 29 (Markus Dreyer): Koller & Friedman, Chapter 11, Optimization as Inference

Oct 22 (Puyang Xu): Koller & Friedman, Chapter 12: Particle-Based Methods

Oct 15 (Ariya Rastrow): Koller & Friedman, Chapters 3 & 4

Oct 1 (Anoop Deoras), Oct 8 (Carolina Parada): MacKay (2003), Monte Carlo Methods and Efficient Monte Carlo Methods. Chapters 29-30 of Information Theory, Inference, and Learning Algorithms.

Multilingual/cross-lingual learning

Sep 24 (Omar F. Zaidan): David Burkett and Dan Klein, (2008). Two Languages are Better than One (for Syntactic Parsing). EMNLP, 2008.

Sep 17 (Rachael Richardson): Alexander Fraser, Renjing Wang, and Hinrich Schütze (2009). Rich Bitext Projection Features for Parse Reranking. EACL 2009.

Sep 10 (Delip Rao): Benjamin Snyder, Tahira Naseem, Jacob Eisenstein, and Regina Barzilay (2009). Adding More Languages Improves Unsupervised Multilingual Part-of-Speech Tagging: a Bayesian Non-Parametric Approach. NAACL 2009.

Summary: There are several approaches to learning syntax in an unsupervised fashion, but this paper belongs to the growing body of work that exploits multiple languages to reduce ambiguity in the learning task. The most important take-home message is that it is possible to consistently reduce the gap between supervised and unsupervised learning by progressively adding more languages to the mix. This is akin to the multi-view learning results in the machine learning literature. Earlier work by the same authors (EMNLP'08) showed that carefully selecting pairs of languages in multilingual learning yields better accuracy. The current paper builds on that result and shows that it is not really necessary to hand-pick the bilingual pairs; robust performance is guaranteed by blindly adding more languages.

Well, not so blindly. Adding more languages to the setup means estimating more parameters in the model. Without careful implementation, such a model can become intractable. Section 3 explains the generative setup and the inference procedure in detail. Starting with a Goldwater-style monolingual HMM tagging setup for each language, the HMMs are stitched together using alignment links and latent variables called "superlingual tags", leading to a product-of-experts model. The superlingual tags can be thought of as tags that generate similar kinds of syntactic entities in each of the languages. As with any non-trivial nonparametric Bayesian setup, the inference procedure involves computing integrals that have no closed-form solution. Monte Carlo sampling is a standard approach to such problems, and Gibbs sampling is one such method. The details of the sampling process are in sections 3.5-3.7. This part is a bit technical and will be discussed tomorrow and/or in the sessions on nonparametric Bayesian methods. Alternatives include variational methods and expectation propagation.

Summer 2009

Summer conference papers

July 23 (Zhifei Li): Joris Mooij and Bert Kappen, (2008). Bounds on marginal probability distributions. NIPS, 2008.

July 16 (Markus Dreyer): Fabien Cromierès, Sadao Kurohashi (2009). An Alignment Algorithm Using Belief Propagation and a Structure-Based Distortion Model. EACL 2009.

June 25 (Markus Dreyer): Hoifung Poon, Colin Cherry, Kristina Toutanova (2009). Unsupervised Morphological Segmentation with Log-Linear Models. NAACL 2009.

June 19 (Zhifei Li): David Chiang, Wei Wang and Kevin Knight, (2009). 11,001 new features for statistical machine translation. NAACL 2009.

Spring 2009

Information extraction (relevant to TAC)

Apr 30 (Chuan Liu): Jun Wang (2009). Mean-Variance Analysis: A New Document Ranking Theory in Information Retrieval. European Conference on Information Retrieval.

Apr 23 (Wes Filardo): Jun Zhu, Zaiqing Nie, Xiaojing Liu, Bo Zhang, Ji-Rong Wen (2009). StatSnowball: a Statistical Approach to Extracting Entity Relationships. WWW 2009.

Apr 16 (Carolina Parada): Julien Ah-Pine, Guillaume Jacquet (2009). Clique-Based Clustering for improving Named Entity Recognition systems. EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics. Athens, Greece, March 30 - April 3, 2009

Apr 9 (Jason Smith): Marius Pasca (2009). Outclassing Wikipedia in Open-Domain Information Extraction: Weakly-Supervised Acquisition of Attributes over Conceptual Hierarchies. EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics. Athens, Greece, March 30 - April 3, 2009

Domain adaptation across text genres

Apr 2 (Arnab Ghoshal): Corinna Cortes, Mehryar Mohri, Michael Riley, Afshin Rostamizadeh. Sample Selection Bias Correction Theory. In Proceedings of The 19th International Conference on Algorithmic Learning Theory (ALT 2008).

Mar 26 (Ariya Rastrow): Yishay Mansour, Mehryar Mohri, Afshin Rostamizadeh (2008). Domain Adaptation with Multiple Sources. In Proceedings of Advances in Neural Information Processing Systems (NIPS)

Optional Reading: John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jenn Wortman. Learning Bounds for Domain Adaptation. Neural Information Processing Systems - NIPS 2007

Mar 12 (Delip Rao): Schweikert G, Widmer C, Scholkopf B, Ratsch G (2008) An empirical analysis of domain adaptation algorithms for genomic sequence analysis. In Proceedings of Advances in Neural Information Processing Systems (NIPS)

Optional Reading: Marx Z, Rosenstein MT, Dietterich TG, Kaelbling LP (2008) Two algorithms for transfer learning. In: Inductive Transfer: 10 years later

Mar 5 (Omar F. Zaidan): Su-In Lee, Vassil Chatalbashev, David Vickrey, and Daphne Koller (2007). Learning a Meta-Level Prior for Feature Relevance from Multiple Related Tasks. ICML 2007.

Recent good papers

Feb 26 (Zhifei Li): John DeNero, Alex Bouchard, and Dan Klein (2008). Sampling Alignment Structure under a Bayesian Translation Model. EMNLP 2008.

Feb 19 (Jason Eisner): Impromptu lecture on Dirichlet distributions, Dirichlet processes, etc.

Feb 12 (Markus Dreyer): Tom Minka (2005). Divergence measures and message passing. Microsoft Research Technical Report. (slides: pdf slides ppt)

Feb 5 (Delip Rao): David J. Hand (2006). Classifier Technology and The Illusion of Progress. Statistical Science.

Fall 2008

Programming languages for AI

Dec 13-14: NIPS workshop on probabilistic programming (see probabilistic-programming.org), which mentioned a number of other languages and libraries.

Dec 4 (Omar F. Zaidan)

Jeff Bilmes (~2002). The Graphical Models Toolkit (GMTK).

The above link includes a draft of the documentation and a tutorial, as well as the binaries.

Nov 20 (Wren Thornton)

Avi Pfeffer (2006). IBAL Tutorial.

Installed in masters*:~wren/local/bin (linux only, so not masters01 or masters02) and clsp:~wren/local/bin. Add this directory to your PATH.

See also other materials, including this paper: Avi Pfeffer (2007). The design and implementation of IBAL: A general-purpose probabilistic language. In Lise Getoor and Ben Taskar (eds.), Introduction to Statistical Relational Learning.

Nov 13 (Nathaniel Filardo)

Marc Sumner and Pedro Domingos (2007). The Alchemy Tutorial. (slides)

System is installed in masters*:~nwf/public/alchemy. There is a tutorial subdirectory. You should be able to follow along in the tutorial by running commands like

~nwf/public/alchemy/bin/infer \
   -i ~nwf/public/alchemy/tutorial/basics/uniform.mln \
   -e ~nwf/public/alchemy/tutorial/empty.db \
   -r uniform.results \
   -q Heads

Miscellaneous

Oct 30, Nov 6: Discussion of the EMNLP 2008 papers.

Oct 23 (Damianos Karakos)

I. Csiszar and G. Tusnady (1984). Information geometry and alternating minimization procedures. Statistics and Decisions, Suppl. Issue 1, pp. 205-237.

The paper is not online, but there are online course notes from Sanjeev Khudanpur.

Probabilistic relational models

Oct 16 (Nathaniel Filardo): Pedro Domingos et al. (2008). Markov Logic. In L. De Raedt, P. Frasconi, K. Kersting and S. Muggleton (eds.), Probabilistic Inductive Logic Programming (pp. 92-117). New York: Springer.

Oct 1 (Balakrishnan Varadarajan?)

Nir Friedman, Lise Getoor, Daphne Koller, and Avi Pfeffer (1999). Learning Probabilistic Relational Models. In IJCAI.

A longer book chapter version is linked from here, but the link is dead.

Sep 25 (Zhifei Li): David Smith and Jason Eisner (2008). Dependency Parsing by Belief Propagation. In EMNLP.

Creative uses of classifiers in NLP

Sep 18 (Markus Dreyer): D. Rosenberg, D. Klein and B. Taskar (2007). Mixture-of-Parents Maximum Entropy Markov Models. Uncertainty in Artificial Intelligence (UAI), Vancouver, BC, July.

Sep 11 (Nikesh Garera): Yoav Goldberg and Michael Elhadad (2007). SVM Model Tampering and Anchored Learning: A Case Study in Hebrew NP Chunking. In ACL 2007.

Libin Shen and Aravind K. Joshi (2003). An SVM-based voting algorithm with application to parse reranking. In HLT-NAACL 2003.

Summer 2008

Good current papers

August 19 (Zhifei Li): Ahmad Emami and Frederick Jelinek (2005). A neural syntactic language model. Machine Learning, volume 60, numbers 1-3, September 2005.

August 5 (Zhifei Li): Libin Shen, Jinxi Xu and Ralph Weischedel (2008). A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model. In ACL 2008.

July 29 (David Smith): Ronan Collobert and Jason Weston (2008). A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. ICML 2008: Helsinki, Finland.

July 22 (Nikesh Garera): Zornitsa Kozareva, Ellen Riloff and Eduard Hovy (2008). Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs. Proc. of ACL-08: HLT, Columbus, OH.

July 15 (Markus Dreyer): Sittichai Jiampojamarn, Colin Cherry, and Grzegorz Kondrak (2008). Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. Proc. of ACL-08: HLT, Columbus, OH.

July 8 (Delip Rao): Liang Sun, Shuiwang Ji, and Jieping Ye (2008). A Least Squares formulation for Canonical Correlation Analysis. Proc. of ICML-08, Helsinki

Hotelling, in 1936, proposed a method to characterize the relationship between two sets of variables, which became widely known as "Canonical Correlation Analysis" (CCA). This involves solving a generalized eigenvalue problem of the form Ax = \lambda Bx, which in the CCA case can be further reduced to a symmetric eigenvalue problem (via Cholesky decomposition). The statistics literature has a general interest in connecting different statistical models to the least squares problem, not only to exploit the simpler solution methods available for it but also to relate the models to other methods. The least squares formulation also allows the models to be extended using the regularization framework. The least squares formulation for CCA ties together an older result showing the equivalence of CCA and Fisher LDA, and a recent least squares formulation of multi-class LDA.
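
The reduction mentioned above (generalized eigenproblem Ax = \lambda Bx to a symmetric eigenproblem via Cholesky) can be sketched as follows. This is a minimal illustration assuming NumPy, not the least squares method of the paper; the function name cca_first_pair and the small ridge term are assumptions introduced here to keep the covariance matrices invertible.

```python
import numpy as np

def cca_first_pair(X, Y, ridge=1e-8):
    """Leading canonical correlation and directions for data X (n x p), Y (n x q).

    Hypothetical helper: solves A w = lambda B w with
    A = Cxy Cyy^{-1} Cyx and B = Cxx by a Cholesky change of variables.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + ridge * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + ridge * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    A = Cxy @ np.linalg.solve(Cyy, Cxy.T)   # symmetric positive semidefinite
    L = np.linalg.cholesky(Cxx)             # B = L L^T

    # Substituting u = L^T w turns A w = lambda B w into M u = lambda u,
    # where M = L^{-1} A L^{-T} is symmetric.
    Linv_A = np.linalg.solve(L, A)
    M = np.linalg.solve(L, Linv_A.T).T
    M = (M + M.T) / 2                       # remove round-off asymmetry

    evals, evecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    lam, u = evals[-1], evecs[:, -1]

    wx = np.linalg.solve(L.T, u)            # map back: w = L^{-T} u
    wy = np.linalg.solve(Cyy, Cxy.T @ wx)   # matching direction for Y
    rho = np.sqrt(max(lam, 0.0))            # canonical correlation
    return rho, wx, wy
```

On data where one column of X and one column of Y share a latent signal, the returned rho should match the sample correlation between X @ wx and Y @ wy up to the effect of the ridge term.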

CCA has been applied traditionally in social sciences and more recently in IR. There is literature applying CCA for problems in cross-lingual IR, image retrieval, and learning lexicons. Interestingly, the ACL'08 paper by Haghighi et al. on learning bilingual lexicons using CCA is not the first paper to do that. There is at least one paper as early as 2004 by Cancedda & friends from XRCE that does something similar and does not get cited in the ACL paper.

June 12 (Zhifei Li): Hao Zhang, Chris Quirk, Robert C. Moore and Daniel Gildea (2008). Bayesian Learning of Non-compositional Phrases with Synchronous Parsing. Proc. of ACL-08: HLT, Columbus, OH.

June 5 (Markus Dreyer): Kuzman Ganchev, João Graça and Ben Taskar (2008). Better Alignments = Better Translations? Proc. of ACL-08: HLT, Columbus, OH.

May 29 (Nikesh Garera): Aria Haghighi, Percy Liang, Taylor Berg-Kirkpatrick and Dan Klein (2008). Learning Bilingual Lexicons from Monolingual Corpora. Proc. of ACL-08: HLT, Columbus, OH.

Spring 2008

Dynamic programming speedups

May 15 (David Smith): Geoffrey Zweig and Mukund Padmanabhan (2000). Exact Alpha-Beta Computation in Logarithmic Space with Application to MAP Word Graph Construction. Proc. of ICSLP, Beijing.

This is a specialization to HMMs of the DBN version given earlier by Binder, Murphy & Russell (1997). See also section 3.7.1 of Kevin Murphy's thesis.

Related work: This kind of trick was really pioneered by D. S. Hirschberg (1975), who cut the space requirements of longest common subsequence from quadratic all the way down to linear. Hirschberg's version can be nicely adapted to edit distance. Now, edit distance (and more generally, multiple sequence alignment) is really just a special case of shortest path in a graph. Hirschberg (1975), above, was generalized by Korf (1999)'s "Divide and Conquer Bidirectional Search," which Korf & Zhang (2000) (who discuss all these algorithms) further improved to "Divide and Conquer Frontier Search." Edelkamp & Meyer (2001) give log-space methods for improving A* search for the shortest path in a graph. (Note that A* search often fits in memory for our DP problems; reducing its memory requirements becomes paramount when we are searching trees that branch without rejoining, e.g., chess.) Bidirectional search, which is distantly related to A*, is also pretty well studied, including recent work at JHU's AMS Dept.
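
To make the Hirschberg trick concrete, here is a minimal sketch of divide-and-conquer longest common subsequence. The function names lcs_last_row and hirschberg_lcs are illustrative choices, not taken from any of the papers above. Each scoring pass keeps only one row of the DP table; the recursion uses slicing for brevity, so an index-based version would be needed to meet the linear space bound strictly.

```python
def lcs_last_row(a, b):
    """Last row of the LCS-length table for a vs. b, keeping one row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev


def hirschberg_lcs(a, b):
    """Return one longest common subsequence of a and b as a list."""
    if not a or not b:
        return []
    if len(a) == 1:
        return [a[0]] if a[0] in b else []

    mid = len(a) // 2
    n = len(b)
    # Scores of the first half of a against every prefix of b.
    left = lcs_last_row(a[:mid], b)
    # Scores of the second half of a against every suffix of b,
    # computed by running the same DP on the reversed sequences.
    right = lcs_last_row(a[mid:][::-1], b[::-1])
    # Split b where the two halves together score best.
    k = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    return hirschberg_lcs(a[:mid], b[:k]) + hirschberg_lcs(a[mid:], b[k:])
```

For example, "".join(hirschberg_lcs("AGGTAB", "GXTXAYB")) yields a common subsequence of length 4. The same split-in-the-middle idea is what carries over to edit distance and to the alpha-beta computation discussed above.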

May 1 (John Blatz): Pedro Felzenszwalb and David McAllester (2006). The Generalized A* Architecture. To appear in the Journal of Artificial Intelligence Research.

Apr. 24 (Zhifei Li): Liang Huang (2008). Forest Reranking: Discriminative Parsing with Non-Local Features. To appear in Proceedings of ACL 2008, Columbus, OH.

Apr. 17 (Arnab Ghoshal): Liang Huang and David Chiang (2005). Better k-best parsing. Proceedings International Workshop on Parsing Technologies.

Grammatical inference

Apr. 10 (Wren Thornton): Carl de Marcken (1996), Linguistic structure as composition and perturbation. ACL. Also see thesis version.

Apr. 3 (Nathaniel Filardo): A. Clark (2006). Learning Deterministic Context Free Grammars: The Omphalos Competition.

Mar. 27 (Nikesh Garera): Stolcke, A. and Omohundro, S. (1993). Hidden Markov model induction by Bayesian model merging. Advances in Neural Information Processing Systems (Morgan Kaufmann, San Mateo, CA), 5, 11-18.

Inference in graphical models

Mar. 20 (Delip Rao): Jonathan Yedidia, William Freeman, and Yair Weiss (2001). Bethe free energy, Kikuchi approximations and belief propagation algorithms. MERL TR-2001-16.

Mar. 6&13 (Markus Dreyer): M. J. Wainwright, T. Jaakkola and A. S. Willsky (2005). A new class of upper bounds on the log partition function. IEEE Trans. on Information Theory, 51, 2313--2335.

Feb. 28 (David Smith): David MacKay (2003). Variational methods. Chapter 33 of Information Theory, Inference, and Learning Algorithms.

Feb. 21 (David Smith): Michael I. Jordan et al. (1999). An Introduction to Variational Methods for Graphical Models. Machine Learning, 37, 183–233.

Feb. 7&14 (Delip Rao): M. I. Jordan and Y. Weiss (2002). Probabilistic Inference in Graphical Models, The Handbook of Brain Theory and Neural Networks (MIT Press).

Fall 2007

Semisupervised learning

Dec. 12 (Delip Rao): M. Belkin, P. Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, TechReport, UChicago, TR-2002-01; Mikhail Belkin, Partha Niyogi, Vikas Sindhwani, On Manifold Regularization, AISTATS 2005

Nov. 17 (David Smith): X. Zhu, Semi-Supervised Learning Literature Survey

Recent parsing papers

Nov. 3 (Christo Kirov): I. Titov, J. Henderson, Constituent Parsing with Incremental Sigmoid Belief Networks, ACL 2007

Oct. 26 (Christo Kirov): Seginer, Yoav, Fast Unsupervised Incremental Parsing (syntax induction), ACL 2007

Oct. 17 (Markus Dreyer): Nakagawa, Tetsuji, Multilingual Dependency Parsing Using Global Features, EMNLP-CoNLL 2007

Text compression

Oct. 10 (Nathaniel W Filardo): Mahoney, Matthew, Adaptive Weighting of Context Models for Lossless Data Compression, Florida Institute of Technology, CS Department, Technical report CS-2005-16, EMNLP-CoNLL 2007

Some other possible papers that we didn't read (not vetted):

Approaches that consider recursive text structure
- Charikar et al. (2005), The smallest grammar problem
- de Marcken (1996), Linguistic structure as composition and perturbation (thesis version) - read later on 4/10/08
- Katajainen et al. (1986), Syntax-Directed Compression of Program Files

Approaches that learn hidden state
- Cormack & Horspool (1987), Data Compression Using Dynamic Markov Modelling
- Hu et al. (year?), Language Modeling with Stochastic Automata

Approaches that allow searches inside the compressed text
- Antonio Farina Martinez (2005), New Compression Codes for Text Databases (dissertation)
- Culpepper & Moffat (2006), Phrase-Based Pattern Matching in Compressed Text
- Shibata et al. (2000), A Boyer-Moore type algorithm for compressed pattern matching
- Shibata et al. (1999), Byte Pair Encoding: A Text Compression Scheme That Accelerates Pattern Matching
- Udi Manber (1997), A text compression scheme that allows fast searching directly in the compressed file

Domain adaptation

Oct. 3 (David Smith): Shai Ben-David, John Blitzer, Koby Crammer, Fernando Pereira, Analysis of Representations for Domain Adaptation

Sep. 26 (Omar F Zaidan): J. Blitzer, R. McDonald, F. Pereira, Domain Adaptation with Structural Correspondence Learning, EMNLP 2006

Summer 2007

Good current papers

Aug. 30 (Delip Rao): Gideon S. Mann, Simple, Robust, Scalable Semi-supervised Learning via Expectation Regularization, Proceedings of the 24th International Conference on Machine Learning 2007

Aug. 18 (Markus Dreyer): D. Talbot, M. Osborne, Randomised Language Modelling for Statistical Machine Translation, ACL 2007. They use a space-efficient randomized data structure (Bloom filter) to store very large n-gram models. There is a companion paper that people might want to have a quick look at as well, for comparison: D. Talbot, M. Osborne, Smoothed Bloom Filter Language Models: Tera-Scale LMs on the Cheap, ACL 2007

Aug. 11 (Nikesh Garera): L. Shen, G. Satta, A. Joshi, Guided learning for bidirectional sequence classification, ACL 2007

Aug. 3 (Yi Su): M. Galley, K. McKeown, Lexicalized Markov Grammars for Sentence Compression, NAACL-HLT 2007

Jul. 18 (David Smith): P. Liang, S. Petrov, M. Jordan, D. Klein, The Infinite PCFG Using Hierarchical Dirichlet Processes, EMNLP-CoNLL 2007

Jul. 6 (Christopher White)

A. Braunstein, M. Mezard, R. Zecchina, Survey propagation: an algorithm for satisfiability, Random Structures and Algorithms, 2005.

We sent some questions to Zecchina.

Lukas Kroc, Ashish Sabharwal and Bart Selman. Survey propagation revisited: An empirical study. 23rd UAI, 2007.

Jun. 21 (Christopher White)

K. Murphy, Y. Weiss, M. Jordan, Loopy belief propagation for approximate inference: An empirical study, 15th UAI, pages 467-475, 1999

... discussing (loopy) belief propagation as background for survey propagation, a topic which has been getting more attention lately for its ability to "solve very large hard combinatorial problems, such as determining the satisfiability of Boolean formulas." Chapter 8 of Chris Bishop's textbook is supposed to be a good treatment of graphical models overall. He covers BP in section 8.4.4 after first presenting factor graphs in 8.4.3. David MacKay's treatment of BP, also in terms of factor graphs, is in chapter 26 of his book [2]. It's worth reading this chapter in full, perhaps first reading chapter 16. ... the update equations are given as (26.11) and (26.12) ... [substantial further discussion by Jason was here] Some people may prefer Bishop's style, others MacKay's.

Jun. 14 (David Smith): X. Zhu, Z. Ghahramani, J. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, ICML 2003

Jun. 6 (Nikesh Garera): A. Alexandrescu, K. Kirchhoff, Data-Driven Graph Construction for Semi-Supervised Graph-Based Learning in NLP, HLT/NAACL 2007

Jun. 2 (Erin Fitzgerald): J. Jiang, C. Zhai, A Systematic Exploration of the Feature Space for Relation Extraction, HLT/NAACL 2007

May 17 (Markus Dreyer): M. Galley, K. McKeown, Lexicalized Markov Grammars for Sentence Compression, HLT/NAACL 2007

May 10 (David Smith): M. Johnson, T. Griffiths, and S. Goldwater, Bayesian Inference for PCFGs via Markov Chain Monte Carlo, HLT/NAACL 2007

Spring 2007

Integrating search and learning

Apr. 19 (John Blatz): A. Prieditis, Machine Discovery of Effective Admissible Heuristics, Machine Learning Journal, 1993

Apr. 12 (Markus Dreyer): A. Haghighi, J. DeNero and D. Klein, Approximate Factoring for A* Search, NAACL-HLT 2007

Mar. 29 & Apr. 5 (Zhifei Li): H. Daume III, J. Langford, and D. Marcu, Search-based structured prediction, Machine Learning Journal, forthcoming

Mar. 8 (David Smith): H. Daume III & D. Marcu, Learning as search optimization: approximate large margin methods for structured prediction, ICML 2005

Recent IR/QA papers (with an NLP or multilingual focus)

Mar. 1 (Wei Chen)

M. Kaisser, S. Scheible, and B. Webber, Experiments at the University of Edinburgh for the TREC 2006 QA track, TREC-15

They do some fairly deep interpretation of sentences, extracting their predicate-argument structure.

Feb. 22 (Eric Harley): K. Kan Lo & W. Lam, Using Semantic Relations with World Knowledge for Question Answering, TREC-15

Unsupervised learning of morphology

Feb. 15 (Nikhil Bojja): C. Monson et al., Unsupervised Induction of Natural Language Morphology Inflection Classes, ACL Student Workshop '04

Feb. 8 (Delip Rao)

P. Schone and D. Jurafsky, Knowledge-free induction of morphology using latent semantic analysis, CoNLL 2000

However, there was an extension of this work reported in NAACL-2001 that looks at circumfixes and prefix/affix combinations. [3]

Feb. 1 (Nikesh Garera)

D. Yarowsky and R. Wicentowski, Minimally supervised morphological analysis by multimodal alignment, ACL 2000

For more details refer to Chapter 4 of Wicentowski's thesis.

Fall 2006

Syntax-based MT

Dec. 13 (Delip Rao): J. Carbonell et al., Context-based machine translation, AMTA 2006

Dec. 6 (Jason Smith): M. Galley et al., Scalable Inference and Training of Context-Rich Syntactic Translation Models, ACL 2006. It may also be helpful to look at: M. Galley et al., What's in a translation rule?, HLT/NAACL 2004

Nov. 29 (Balakrishnan V): D. Marcu et al., SPMT: Statistical Machine Translation with Syntactified Target Language Phrases, EMNLP 2006

Nov. 15 (Eric Harley): D. Chiang, An introduction to synchronous grammars, ACL 2006 Tutorial; Slides from the talk are also available. [4]

Linguistics: Syntactic formalisms

Nov. 8 (Elliott Drabek)

K. Shklovsky, A Grammatical Sketch of Petalcingo Tzeltal, Undergraduate Thesis, Reed College, 2005

It is 77 pages long, but not dense, and I will be skipping the following sections (listed by page range):

01-14 Phonetics and phonology
18-18 Polyvalence
21-21 Inherent possession and ...
46-55 Tense and aspect and other sections

Nov. 1 (Yi Su): M. Steedman, Gapping as Constituent Coordination, Linguistics and Philosophy, Vol. 13, 1990, pp. 207-264. See Yi for photocopies.

Oct. 25 (Markus Dreyer): S. Riezler et al., Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques, ACL 2002

Oct. 18 (Erin Fitzgerald)

J. Bresnan & R.M. Kaplan, Lexical-Functional Grammar: A Formal System for Grammatical Representation, The Mental Representation of Grammatical Relations, MIT Press, 1982

The edited collection that this appears in is generally interesting. Bresnan defends and develops lexicalized grammars in general; the idea of separate surface and semantic roles; and Bresnan & Kaplan's LFG in particular. You should know that she originated (in 1978) the extremely influential idea of lexicalized syntax -- the idea that a grammar is simply a collection of lexical entries to be assembled in standard language-independent ways, but that there are also "lexical redundancy rules" that relate, e.g., active and passive entries for the same verb. Some chapters address morphological and cognitive issues pertaining to lexicalization, including an essay by Pinker on lexicalist learning. Slides from Erin's presentation can be found here.

Machine learning: Margin methods and structured classification

Oct. 11 (John Blatz): L. Xu, D. Wilkinson, F. Southey, & D. Schuurmans, Discriminative Unsupervised Learning of Structured Predictors, ICML 2006

Oct. 4 (Nikesh Garera): A. Culotta & J. Sorensen, Dependency Tree Kernels for Relation Extraction, ACL 2004

D. Zelenko, C. Aone, & A. Richardella, Kernel Methods for Relation Extraction, JMLR, Volume 3, 2003

Sep. 27 (David Smith)

C. Cortes, P. Haffner, & M. Mohri, Rational Kernels, NIPS 2003

Papers extending rational kernels, including results on positive semidefinite cases, are at: [5]. For the record, and not to be read, is an interesting parallel line of research in Fisher Kernels over strings, e.g. this paper by Saunders, Shawe-Taylor and Vinokourov: [6]

Sep. 20 (Elliott Drabek): K.Q. Weinberger, F. Sha, & L.K. Saul, Learning a kernel matrix for nonlinear dimensionality reduction, ICML 2004

S.T. Roweis & L.K. Saul, Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science, 22 December 2000

J.B. Tenenbaum, V. De Silva, & J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science, 22 December 2000

Sep. 13 (Roy Tromble): L. Xu, J. Neufeld, B. Larson, & D. Schuurmans, Maximum Margin Clustering, NIPS 2004

Summer 2006

Recent HLT-NAACL papers

Aug. 4 (David Smith)

Sharon Goldwater, Thomas L. Griffiths, Mark Johnson, Contextual Dependencies in Unsupervised Word Segmentation, ACL 2006

Anyone looking for a more straight-up language modeling discussion can compare:

Yee Whye Teh, A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes, ACL 2006

More resources:

Machine Learning MLPedia page on Dirichlet Processes
Michael Jordan's NIPS 2005 tutorial: Nonparametric Bayesian Methods: Dirichlet Processes, Chinese Restaurant Processes and All That
Y. Teh, M. Jordan, M. Beal, and D. Blei, Hierarchical Dirichlet processes, Journal of the American Statistical Association, 2006

Jul. 20 (Roy Tromble): Mehryar Mohri, Brian Roark, Probabilistic Context-Free Grammar Induction Based on Structural Zeros, HLT-NAACL, 2006

Jul. 6 (Keith Hall): Charles Sutton, Michael Sindelar, Andrew McCallum, Reducing Weight Undertraining in Structured Discriminative Learning, HLT-NAACL, 2006

Jun. 31 (Markus Dreyer): Joakim Nivre, Johan Hall et al., Labeled Pseudo-Projective Dependency Parsing with Support Vector Machines, CoNLL 2006; J. Nivre, J. Nilsson, Pseudo-Projective Dependency Parsing, ACL 2005

Jun. 24 (David Smith): Percy Liang, Ben Taskar, Dan Klein, Alignment by Agreement, HLT-NAACL 2006

Spring 2006

Algorithms for NLP (mostly)

May 18 (Markus Dreyer): Jonathan May, Kevin Knight, A Better N-Best List: Practical Determinization of Weighted Finite Tree Automata, Proc. NAACL-HLT, 2006

May 11 (John Blatz): M. Gengler, An introduction to parallel dynamic programming, Lecture Notes in Computer Science, 1996

May 4 (David Smith): C. E. R. Alves, E. N. Cáceres, F. Dehne, Parallel dynamic programming for solving the string editing problem on a CGM/BSP, SPAA 2002

Apr. 20 (Balakrishnan V): Richard M. Karp, Michael O. Rabin, Efficient Randomized Pattern-Matching Algorithms, IBM Journal of Research and Development, 1987

Mar. 31, Apr. 6 (Eric Harley): Ben Taskar, Simon Lacoste-Julien, Dan Klein, A Discriminative Matching Approach to Word Alignment, ACL 2005. A related paper is: Ryan McDonald, Fernando Pereira, Kiril Ribarov, Jan Hajic, Non-projective Dependency Parsing using Spanning Tree Algorithms, HLT-EMNLP 2005

Mar. 17 (Elliott Franco Drabek): Necip Fazil Ayan, Bonnie J. Dorr, Christof Monz, Alignment Link Projection Using Transformation-Based Learning, HLT-EMNLP 2005

Mar. 10 (Roy Tromble): Terry Koo, Michael Collins, Hidden-Variable Models for Discriminative Reranking, HLT-EMNLP 2005

Mar. 3 (Jason Riesa): Hal Daume III, Daniel Marcu, Domain Adaptation for Statistical Classifiers, Journal of Artificial Intelligence Research, 2006; J. Gorman, J. Curran, Approximate Searching for Distributional Similarity, Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition, 2005

Feb. 23 (Omar F. Zaidan): Ravichandran, Pantel, Hovy, Randomized Algorithms and NLP: Using Locality Sensitive Hash Function for High Speed Noun Clustering, ACL 2005

Consensus decoding

Feb. 16 (Noah A Smith): Khalil Sima'an, Computational Complexity of Probabilistic Disambiguation by means of Tree-Grammars, COLING 1996; Francisco Casacuberta, Colin de la Higuera, Computational complexity of problems on probabilistic grammars and transducers, LNAI 1981. For a longer and more HMM/compbio view and extended results, see: Rune B. Lyngsoe, Christian N. S. Pedersen, The Consensus String Problem and the Complexity of Comparing Hidden Markov Models, Journal of Computer and System Sciences 65:545-69, 2002

Extracting idioms

Feb. 9 (John Blatz): Dominic Widdows, Beate Dorow, Automatic Extraction of Idioms using Graph Analysis and Asymmetric Lexicosyntactic Patterns, Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition, 2005; Afsaneh Fazly, Suzanne Stevenson, Automatic Acquisition of Knowledge about Multiword Predicates, Proceedings of the 19th Pacific Asia Conference on Language, Information, and Computation (PACLIC 2005).

Fall 2005

Good recent papers

Nov. 23 (Roy Tromble): Sutton, Charles and McCallum, Andrew, Composition of Conditional Random Fields for Transfer Learning, HLT-EMNLP 2005

Nov. 16 (Safiullah Shareef): Hassan Sawaf, Jörg Zaplo, Hermann Ney, Statistical Classification Methods for Arabic News Articles

Nov. 4 (Jason Riesa): Luke S. Zettlemoyer, Michael Collins, Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars, UAI 2005

Oct. 27 (Markus Dreyer): D. Roth and W. Yih, Integer Linear Programming Inference for Conditional Random Fields, ICML 2005

Oct. 20 (Roy Tromble): Sheila M. Reynolds, Jeff A. Bilmes, Part-of-Speech Tagging using Virtual Evidence and Negative Training, HLT-EMNLP 2005

Statistical learning theory

Sep. 21 (Arnab Ghoshal): M. Jordan, Statistical Learning Theory, Chapters 2-3

Sep. 14 (Nikesh Garera): M. Jordan, Statistical Learning Theory, Chapter 8 (Exponential family and Generalized linear models)

Summer 2005

Gibbs sampling

Sep. 1 (John, Markus, & Nikesh): B. Walsh, Markov Chain Monte Carlo and Gibbs Sampling, Lecture Notes for EEB 581, version 26 April 2004

Aug. 26 (Roy Tromble): Jenny Rose Finkel, Trond Grenager, Christopher Manning, Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling, ACL 2005

AI

Aug. 19 (John Blatz): Niyogi, Sourabh, Steps Toward Deep Lexical Acquisition, ACL 2005

Unsupervised or semi-supervised EM

Aug. 5 (Adam): Duh, Kevin and Kirchhoff, Katrin, Tagging of Dialectal Arabic: A Minimally Supervised Approach, ACL 2005

Jul. 28 (Zak): Takuya Matsuzaki, Yusuke Miyao, Jun'ichi Tsujii, Probabilistic CFG with Latent Annotations, ACL 2005

Jul. 21 (Keith): Sharon Goldwater and Mark Johnson, Representational Bias in Unsupervised Learning of Syllable Structure, ACL 2005

Jul. 21 (Damianos): Ando, Rie and Zhang, Tong, A High-Performance Semi-Supervised Learning Method for Text Chunking, ACL 2005

Learning optimality-theoretic grammars

Jul. 14 (John Blatz): Ying Lin, Learning Stochastic OT Grammars: A Bayesian Approach using Data Augmentation and Gibbs Sampling, ACL 2005

Jul. 14 (Roy Tromble): Sharon Goldwater and Mark Johnson, Learning OT Constraint Rankings Using a Maximum Entropy Model, Proceedings of the Workshop on Variation within Optimality Theory, 2003

Spring 2005

May 7 (Markus Dreyer): M. Diligenti, F.M. Coetzee, S. Lawrence, C.L. Giles, M. Gori, Focused Crawling Using Context Graphs, 26th International Conference on Very Large Databases, VLDB 2000; Adam Kilgarriff, Gregory Grefenstette, Introduction to the Special Issue on the Web as Corpus, Computational Linguistics, 2003

Apr. 28 (Damianos Karakos): Alessandro Moschitti and Roberto Basili, Complex Linguistic Features for Text Classification: A comprehensive study, Proceedings of the 26th European Conference on Information Retrieval Research (ECIR 2004)

Apr. 21 (Omar F. Zaidan): Tin Kam Ho, Jonathan J. Hull, Sargur N. Srihari, Decision Combination in Multiple Classifier Systems, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 1, Jan. 1994; Dan Klein, Kristina Toutanova, H. Tolga Ilhan, Sepandar D. Kamvar and Christopher D. Manning, Combining Heterogeneous Classifiers for Word-Sense Disambiguation, ACL 2002

Apr. 16 (Brock Pytlik)

V. Lavrenko, S.L Feng, R. Manmatha, Statistical models for automatic video annotation and retrieval, ICASSP 2004

S.L Feng, R. Manmatha, V. Lavrenko, Multiple Bernoulli Relevance Models for Image and Video Annotation

The first is a short paper about the relevance model. The second is a follow-up paper that details a subsequent model based on the CRM.

Apr. 9 (Noah A Smith): G. Elidan, N. Friedman, The Information Bottleneck EM Algorithm, UAI 2003; G. Elidan, N. Friedman, Learning Hidden Variable Networks, JMLR 2005

Feb. 25, Mar. 4, Mar. 11, Apr. 2 (David Smith): M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, Learning in Graphical Models, MIT Press, 1999

Fall 2004

Nov. 27 (Jia Cui): David M. Blei, Andrew Y. Ng, Michael I. Jordan, Latent Dirichlet Allocation, JMLR 2003; Other papers on LDA: [www.cs.toronto.edu/~ywteh/research/npbayes/report.pdf], [7]

Nov. 20 (David Smith): Olle Häggström and Karin Nelander, On Exact Simulation of Markov Random Fields Using Coupling from the Past, Foundation of the Scandinavian Journal of Statistics, 1999; James Fill and Mark Huber, The Randomness Recycler: A New Technique for Perfect Sampling, FOCS 2000

Nov. 13 (Charles Schafer)

Endika Bengoetxea, Inexact Graph Matching Using Estimation of Distribution Algorithms, Chapter 2: The graph matching problem, Ph.D. dissertation, 2002

This chapter is general to the field although pretty sweeping and unspecific as a result. It probably makes a good introduction, since it gives an idea of the scope and diversity of the problem and proposed techniques ...

Yakov Keselman, Ali Shokoufandeh, M. Fatih Demirci, Sven Dickinson, Many-to-Many Graph Matching via Metric Embedding, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2003

This is a state-of-the-art paper which is quite dense but quite interesting. It solves a very general formulation of inexact graph matching by first embedding graphs into a normed space ...

Nov. 5 (Michelle Vanni): Robert S. Swier and Suzanne Stevenson, Unsupervised Semantic Role Labelling, EMNLP 2004; Nianwen Xue and Martha Palmer, Calibrating Features for Semantic Role Labelling, EMNLP 2004

Oct. 29 (Eric Goldlust): Stephen Clark and James Curran, Parsing the WSJ using CCG and Log-Linear Models, ACL 2004

Oct. 22 (Michelle Vanni): Dekang Lin and Franz Och, Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence, ACL 2004; Babych and Hartley, Extending the BLEU MT Evaluation Method with Frequency Weightings, ACL 2004

Oct. 15 (John Blatz): Daichi Mochihashi, Genichiro Kikui, Kenji Kita, Learning Nonstructural Distance Metric by Minimum Cluster Distortions, EMNLP 2004

Oct. 2 (Nguyen Bach)

Background knowledge on SVM and Graphical Models:

Sep. 24, Oct. 7 (Roy Tromble): B. Taskar, C. Guestrin and D. Koller, Max-Margin Markov Networks, Neural Information Processing Systems Conference (NIPS03), 2003; B. Taskar, D. Klein, M. Collins, D. Koller and C. Manning, Max-Margin Parsing, EMNLP 2004

Sep. 9 (John Blatz): Pascale Fung and Percy Cheung, Mining Very-Non-Parallel Corpora: Parallel Sentence and Lexicon Extraction via Bootstrapping and EM, ACL 2004; Dragos Stefan Munteanu, Alexander Fraser and Daniel Marcu, Improved Machine Translation Performance via Parallel Sentence Extraction from Comparable Corpora, ACL 2004

Sep. 2 (Gideon Mann): Xin Li, Paul Morie, and Dan Roth, Robust Reading: Identification and Tracing of Ambiguous Names, ACL 2004; Cheng Niu, Wei Li, Rohini K. Srihari, Weakly Supervised Learning for Cross-Document Person-Name Disambiguation Supported by Information Extraction, ACL 2004

Aug. 27 (David Smith): I. Dan Melamed, Statistical Machine Translation by Parsing, ACL 2004; Daniel Gildea, Dependencies vs. Constituents for Tree-Based Alignment, ACL 2004

Aug. 20 (Damianos Karakos, Charles Schafer): P. Pantel and D. Lin, Discovering word senses from text, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, 2002; Diana McCarthy, Rob Koeling, Julie Weeds, John Carroll, Finding Predominant Word Senses in Untagged Text, 2004

Spring 2004

Information extraction

May 15 (Roy Tromble): Fuchun Peng, Andrew McCallum, Accurate Information Extraction from Research Papers using Conditional Random Fields, 2004

May 1 (Izhak Shafran): Eric J. Friedman, Strong Monotonicity in Surplus Sharing, 1999. Tom Dietterich has a web page on probabilistic relational models: [8]

Apr. 24 (David Smith)

McCallum and Jensen, Extraction and Data Mining using Conditional-Probability Relational Models, IJCAI'03 Workshop on Learning Statistical Models from Relational Data, 2003

The paper is a survey of recent trends in IE and data mining (biased of course towards the authors' work) and a proposal to unify them with conditional random fields.

Combinatorial optimization

Apr. 17 (Elliott Franco Drabek): Rina Dechter, Mini-Buckets: A General Scheme for Generating Approximations in Automated Reasoning, 2001

Apr. 10 (Noah Ashton Smith): Denys Duchier, Axiomatizing Dependency Parsing Using Set Constraints, Sixth Meeting on Mathematics of Language, 2000

Apr. 3 (Roy Tromble): Roman Bartak, Constraint Programming: In Pursuit of the Holy Grail, 1999

Learning how to search

Mar. 25 (Eric Goldlust): Boyan and Moore, Learning Evaluation Functions to Improve Optimization by Local Search, Journal of Machine Learning Research, 2000

Discourse, summarization, paraphrase

Mar. 18 (Markus Dreyer): Eugene Charniak, Niyu Ge, John Hale, A Statistical Approach to Anaphora Resolution, Proceedings of the Sixth Workshop on Very Large Corpora, 1998

Mar. 5 (Charles Schafer): Daniel Marcu, Theory and Practice of Discourse Parsing and Summarization, Chapters 2 & 3, The MIT Press, 2000

Feb. 19 (David Smith): Barzilay and Lee, Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment, HLT 2003

Optimality theory

Feb. 12 (Brock Pytlik): Bob Frank, Giorgio Satta, Optimality theory and the Generative Complexity of Constraint Violability, MIT Press

Feb. 5 (Brock Pytlik): Jessica A. Barlow and Judith A. Gierut, Optimality theory in phonological acquisition, Journal of Speech, Language and Hearing 42, 1999; Paul Boersma, Joost Dekkers and Jeroen van de Weijer, Introduction, in Optimality Theory: Phonology, Syntax and Acquisition, Oxford University Press 2000

Fall 2003

Dec. 12 (Paola Virga): Kamal Nigam and Rayid Ghani, Analyzing the Effectiveness and Applicability of Co-training, Ninth International Conference on Information and Knowledge Management 2000

Nov. 20 (Noah A. Smith): Rebecca Hwa, Miles Osborne, Anoop Sarkar, Mark Steedman, Corrected Co-training for Statistical Parsers, ICML 2003

Nov. 13 (Markus Dreyer): Goldman and Zhou, Enhancing Supervised Learning with Unlabeled Data, ICML 2000. An additional paper with some experiments: Clark, Curran and Osborne, Bootstrapping POS taggers using Unlabelled Data, CoNLL 2003

Nov. 6 (Brock Pytlik): Stuart M. Shieber, Transducers as a Substrate for Natural Language Processing

Oct. 31 (Roy Tromble): Dekai Wu, An algorithm for simultaneously bracketing parallel texts by aligning words, ACL 1995

Oct. 24 (Markus Dreyer): Stuart M. Shieber, Yves Schabes, Synchronous Tree-Adjoining Grammars, Coling 1990; An additional closely related paper: Stuart M. Shieber, Yves Schabes, Generation and Synchronous Tree-Adjoining Grammars, Fifth International Workshop on Natural Language Generation

Oct. 10 (David Smith): Bernard Comrie, Language Universals and Linguistic Typology: Syntax and Morphology, Chapters 6-7, Blackwell (1989)

Oct. 3 (Michelle Vanni): Bernard Comrie, Language Universals and Linguistic Typology: Syntax and Morphology, Chapters 4-6, Blackwell (1989)

Sep. 18 (David Smith): Bernard Comrie, Language Universals and Linguistic Typology: Syntax and Morphology, Chapters 2-3, Blackwell (1989)

Sep. 11 (Elliott Franco Drabek): Bernard Comrie, Language Universals and Linguistic Typology: Syntax and Morphology, Chapter 1, Blackwell (1989)

Spring 2003

May 15 (Chal Haithaidharm): V. N. Vapnik, The Nature of Statistical Learning Theory, Chapters 7B, 8, 9

May 8 (Noah Smith): V. N. Vapnik, The Nature of Statistical Learning Theory, Chapters 6B - 7A

May 1 (Noah Smith): V. N. Vapnik, The Nature of Statistical Learning Theory, Chapters 5B - 6A

Apr. 24 (Paola Virga): V. N. Vapnik, The Nature of Statistical Learning Theory, Chapters 4B - 5A

Apr. 17 (Roy Tromble): V. N. Vapnik, The Nature of Statistical Learning Theory, Chapters 2B - 4A

Apr. 10: V. N. Vapnik, The Nature of Statistical Learning Theory, Intro and Chapters 1, 2A

Mar. 20 (Roy Tromble): Nikita Schmid, Ahmed Patel, Using Tree Automata and Regular Expressions to Manipulate Hierarchically Structured Data

Mar. 6 (Paola Virga): Carl M. Kadie, Christopher Meek, David Heckerman, A Collaborative Filtering System Using Posteriors Over Weights of Evidence, Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 2002.

Feb. 26 (Elliott Drabek): Steven Abney, Bootstrapping, ACL'02

Feb. 19 (Elliott Drabek): A. Lopez, M. Nossal, R. Hwa, P. Resnik, Word-level Alignment for Multilingual Resource Acquisition, Proceedings of the 2002 LREC Workshop on Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Language Data

Feb. 13 (David Smith): K. Church, Empirical Estimates of Adaptation: The chance of Two Noriega's is closer to p/2 than p², COLING 2000, pp. 173-179

Fall 2002

Jul. 31 (Paola Virga): Kenji Yamada, Kevin Knight, A decoder for Syntax-based Statistical MT, ACL 2002

Jul. 24 (Michelle Vanni): Paola Merlo, A Multilingual Paradigm for Automatic Verb Classification, ACL 2002

Dec. 5 (Silviu Cucerzan): Darren Pearce, A Comparative Evaluation of Collocation Extraction Techniques, LREC 2002; D. Lin, Automatic identification of non-compositional phrases, ACL 1999

Nov. 21 (Silviu Cucerzan): Ueda, Nakano, Ghahramani, Hinton, SMEM Algorithm for Mixture Models, Neural Information Processing Systems 1998

Nov. 14 (Michelle Vanni): Marti Hearst, Untangling Text Data Mining, ACL 1999

Nov. 7 (Neda Khalili): Yamamoto, Church, Using Suffix Arrays to Compute Term Frequency and Document Frequency for All Substrings in a Corpus, Computational Linguistics 2001; A related paper: Kageura, Bigram Statistics Revisited: A Comparative Examination of Some Statistical Measures in Morphological Analysis of Japanese Kanji Sequences

Nov. 1 (Chalaporn Hathaidharm): J. Gao, J. Goodman, M. Li, K. Lee, Toward A Unified Approach To Statistical Language Modeling For Chinese, ACM Transactions on Asian Language Information Processing, Vol. 1, No. 1, pp 3-33. 2002.

Oct. 24 (Roy Tromble): Han, Benjamin, Building a Bilingual Dictionary with Scarce Resources: A Genetic Algorithm Approach

Oct. 17 (David Smith): Cotton, Bird, An Integrated Framework for Treebanks and Multilayer Annotations, LREC 2002

Oct. 8 (Elliott Franco Drabek): Ravichandran, Hovy, Learning Surface Text Patterns for a Question Answering System, ACL 2001; A similar paper: Lin, Pantel, Discovery of Inference Rules for Question Answering, KDD 2001

Oct. 2 (Gideon Mann): Gildea, Jurafsky, Automatic Labeling of Semantic Roles, ACL 2001

Sep. 26 (Paul Ruhlen): Hwa, Resnik, Weinberg, Kolak, Evaluating Translational Correspondence using Annotation Projection, ACL 2002

Sep. 19 (Paola Virga): Yamada, Knight, A decoder for Syntax-based Statistical MT, ACL 2002

Sep. 10 (Noah A. Smith): Collins, Duffy, New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron, ACL 2002

Spring 2002

Apr. 25 (Paul Ruhlen): H. Al-Adhaileh, Kong, Melamed, Malay-English Bitext Mapping and Alignment Using SIMR/GSA Algorithms, Malaysian National Conference on Research and Development on Linguistics 2001

Apr. 18 (Paul Ruhlen): N. A. Rao, K. Rose, Deterministically annealed design of hidden Markov model speech recognizers, IEEE Trans. on Speech and Audio Processing, vol. 9, (no. 2), Feb. 2001

Apr. 11 (Paola Virga): Neal, Hinton, A view of the EM algorithm that justifies incremental, sparse, and other variants, Learning in Graphical Models, 1999; And this article builds on the above. It tests an incremental version of EM (carefully choosing how incremental it will be), as well as a "lazy EM" version that visits "significant" cases more often.

Mar. 28 (Swapna Somasundaran): Crestan, El-Beze, Improving supervised WSD by including rough semantic features in a Multilevel view of the Context, SEMPRO Workshop, Edinburgh, 2001.

Mar. 14 (Noah A. Smith): Ratnaparkhi, A Simple Introduction to Maximum Entropy Models for NLP, Institute for Research in Cognitive Science, Univ. of Penn.

Feb. 28 (Silviu Cucerzan): Marcu, Towards a Unified Approach to Memory- and Statistical-Based Machine Translation, Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, 2001

Feb. 21 (Jia Cui): Barzilay, McKeown, Extracting Paraphrases from a Parallel Corpus, Computer Science Department, Columbia Univ.

Feb. 14 (Charles Schafer): Yaser, Germann, Translating with Scarce Resources, American Association for Artificial Intelligence 2000

Feb. 7 (Paola Virga): Knight, Graehl, Machine Transliteration, ACL-EACL 1997

Fall 2001

Dec. 14 (Jia Cui): Jerome Bellegarda, Exploiting latent semantic information in statistical language models, Proceedings of the IEEE, 88:8, Aug. 2000

Nov. 29 (Silviu Cucerzan): Mike Collins, Yoram Singer, Unsupervised Models for Named Entity Classification, EMNLP/VLC'99

Nov. 20 (Radu Florian): Blum, Mitchell, Combining Labeled and Unlabeled Data with Co-Training, COLT 1998

Nov. 16 (Richard Wicentowski)

Eisner, Satta, Efficient parsing for bilexical context-free grammars and head automaton grammars, ACL 1999

Plagiarism detection systems might be relevant to bitext alignment. A message to the Corpora list yesterday announced the following review paper: [9]

Nov. 2 (Paul Ruhlen): Manning, Schuetze, Foundations of Statistical Natural Language Processing, Section 14 (clustering), pp. 495-527, MIT Press

Oct. 26 (Gideon Mann)

Tishby, Pereira, Bialek, The information bottleneck method

The paper describes a clustering method which is a generalization of their earlier work on "Distributional Clustering of English Words" (Pereira, Tishby and Lee '93).