NLP Reading Group
The Natural Language Processing reading group attempts to keep abreast of interesting research ideas and results that may be useful to us. We typically read and discuss one paper per week. All our past papers are listed below.
The reading group is listed every semester as a 1-credit course, 601.865 ("Selected Topics in NLP"). Contact the instructor (Jason Eisner) to get on the mailing list. At the first course meeting, we brainstorm a bunch of topics for the semester, and vote on which ones to pursue. We then spend about 4 weeks per topic. Although some topics are within NLP, many of them explore potentially relevant work from related fields such as machine learning and linguistics.
During the summer we usually catch up on the latest NLP conference papers.
 Instructions on how to present in reading group.
 Jason's advice on how to read a paper.
Spring 2019
Wednesdays 12-1:15pm, Hackerman 306.
Dataset Shift in NLP
 Organizer: Desh Raj
 We may also look at changepoint detection.
 May 1 (Suzanna Sia)
 Long, M., Cao, Y., Wang, J., & Jordan, M. I. (2015) Learning Transferable Features with Deep Adaptation Networks. ICML.
 Apr 24 (David Mueller and Craig Guo)
 Hal Daumé III (2007) Frustratingly Easy Domain Adaptation. ACL.
 Young-Bum Kim, Karl Stratos, and Ruhi Sarikaya (2016) Frustratingly Easy Neural Domain Adaptation. COLING.
 Apr 17 (Desh Raj)
 Amos Storkey. When Training and Test Sets are Different: Characterising Learning Transfer
Grounded Language
 Organizer: Mitchell Gordon
 Apr 10 (Xiao Liu)
 Apr 3 (Mitchell Gordon)
 Hessel, Jack and Mimno, David and Lee, Lillian (2018). Quantifying the visual concreteness of words and topics in multimodal datasets. NAACL.
 Mar 27 (Elias Stengel-Eskin)
 Bisk, Yonatan and Shih, Kevin J. and Choi, Yejin and Marcu, Daniel (2018). Learning Interpretable Spatial Operations in a Rich 3D Blocks World. AAAI.
Linguistic Content of Word Embeddings
 Organizer: Shijie Wu
 Mar 13 (Shijie Wu)
 Kelly W. Zhang, Samuel R. Bowman (2018). Language Modeling Teaches You More Syntax than Translation Does: Lessons Learned Through Auxiliary Task Analysis. EMNLP.
 Mar 6 (Arya McCarthy)
 Conneau, A., Kruszewski, G., Lample, G., Barrault, L., & Baroni, M. (2018). What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. ACL.
 Feb 27 (Lisa Li)
 Matthew E. Peters, Mark Neumann, Luke Zettlemoyer, Wen-tau Yih (2018). Dissecting Contextual Word Embeddings: Architecture and Representation. EMNLP.
Multitask Learning and Transfer Learning
 Organizers: Fei Wu & Oliver Adams
 Feb 20 (Yunmo Chen)
 Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. AI.
 Feb 13 (Fei Wu)
 Jeremy Howard and Sebastian Ruder (2018). Universal Language Model Fine-tuning for Text Classification. ACL.
 Feb 6 (Fei Wu and Oliver Adams)
 Joachim Bingel and Anders Søgaard (2017). Identifying beneficial task relations for multi-task learning in deep neural networks. EACL.
Fall 2018
Random Interesting Papers
 Dec 5 (David Mueller & Suzanna Sia)
 Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum (2018). Linguistically-Informed Self-Attention for Semantic Role Labeling. EMNLP.
 Nov 28 (Chu-Cheng Lin)
 Hao Peng, Roy Schwartz, Sam Thomson, and Noah A. Smith (2018). Rational Recurrences. EMNLP.
 Preceding paper: Schwartz, Thomson, and Smith (2018). SoPa: Bridging CNNs, RNNs and Weighted Finite-State Machines. ACL.
 Nov 14 (Arya McCarthy)
 Olivia Winn and Smaranda Muresan (2018). ‘Lighter’ Can Still Be Dark: Modeling Comparative Color Descriptions. ACL. Best short paper award.
 Noga Zaslavsky, Charles Kemp, Terry Regier, and Naftali Tishby (2018). Efficient compression in color naming and its evolution. PNAS.
 Nov 7
 EMNLP Debrief.
ML Scholarship
 Organizer: Patrick Xia
 Oct 31 (Xuan Zhang)
 Yoav Goldberg (2017). An Adversarial Review of “Adversarial Generation of Natural Language”. Medium.
 Joshua Goodman (2002) Extended Comment on Language Trees and Zipping. arXiv.
 Oct 24 (Patrick Xia)
 Zachary C. Lipton and Jacob Steinhardt (2018). Troubling Trends in Machine Learning Scholarship (pdf), (html). ICML Debates.
 summary on Medium; online discussion on HackerNews and Reddit
 D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, and Michael Young (2014). Machine Learning: The High-Interest Credit Card of Technical Debt. SE4ML.
Deep Generative Modeling
 Organizer: Sebastian Mielke
 Oct 17 (Kelly Marchisio)
 Sam R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio (2016). Generating Sentences from a Continuous Space. CoNLL.
 Oct 10 (Suzanna Sia)
 Stanislaus Lauly, Yin Zheng, Alexandre Allauzen, Hugo Larochelle (2017). Document Neural Autoregressive Distribution Estimation. JMLR.
 Expands on ideas in Larochelle & Lauly (2017).
 Sep 26 (Sebastian Mielke)
 Benigno Uria, Marc-Alexandre Côté, Karol Gregor, Iain Murray, Hugo Larochelle (2016). Neural Autoregressive Distribution Estimation. JMLR.
Test of Time Award Papers
 Organizer: Arya McCarthy
 Sep 19 (Desh Raj)
 Michael Collins (2002). Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. EMNLP 2002.
 Sep 12 (David Mueller)
 Regina Barzilay and Mirella Lapata (2005). Modeling Local Coherence: An Entity-based Approach. ACL 2005.
 Sep 5 (Arya McCarthy)
 Dan Roth and Wen-tau Yih (2004). A Linear Programming Formulation for Global Inference in Natural Language Tasks. CoNLL 2004.
Summer 2018
 Aug 23 (Chenxi Liu)
 Adam Santoro, Felix Hill, David Barrett, Ari Morcos, and Timothy Lillicrap (2018). Measuring abstract reasoning in neural networks. ICML 2018.
 Aug 16 (Sebastian Mielke)
 André F. T. Martins and Ramón F. Astudillo (2016). From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification. ICML 2016.
 Vlad Niculae, André F. T. Martins, Mathieu Blondel, and Claire Cardie (2018). SparseMAP: Differentiable Sparse Structured Inference. ICML 2018.
 Aug 9 (Jacob Buckman)
 Andrew Trask, Felix Hill, Scott Reed, Jack Rae, Chris Dyer, and Phil Blunsom (2018). Neural Arithmetic Logic Units. arXiv.
 Aug 2 (Garrett Nicolai)
 Daniel Deutsch, John Hewitt and Dan Roth (2018). A Distributional and Orthographic Aggregation Model for English Derivational Morphology. ACL. slides
 Jul 26
 ACL debriefing session.
 Jul 19 (Chu-Cheng Lin)
 Chu-Cheng Lin and Jason Eisner (2018). Neural Particle Smoothing for Sampling from Conditional Sequence Models. NAACL. poster
 Jul 12 (Xuan Zhang)
 Yoshua Bengio, Jerome Louradour, Ronan Collobert, and Jason Weston (2009). Curriculum Learning. ICML. slides
 Jul 5 (Pamela Shapiro)
 Mohit Iyyer, John Wieting, Kevin Gimpel, and Luke Zettlemoyer (2018). Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. NAACL.
 Jun 29 (Xuan Zhang)
 Alane Suhr, Srinivasan Iyer, and Yoav Artzi (2018). Learning to Map Context-Dependent Sentences to Executable Formal Queries. NAACL (outstanding paper award). slides
 Jun 21 (Arya McCarthy)
 Matthew E. Peters et al. (2018). Deep Contextualized Word Representations. NAACL (outstanding paper award).
 This is the ELMo paper.
 Jun 14 (Sebastian Mielke)
 Chaitanya Malaviya, Matthew R. Gormley, and Graham Neubig (2018). Neural Factor Graph Models for Cross-lingual Morphological Tagging. ACL.
 Bonus paper: Austin Matthews, Graham Neubig, and Chris Dyer (2018). Using Morphological Knowledge in Open-Vocabulary Neural Language Models. NAACL.
 Jun 8
 NAACL debriefing session.
Spring 2018
Optimal Transport
 Organizer: Matthew Francis-Landau
 May 3 (Patrick Xia)
 Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, Bernhard Schoelkopf (2018). Wasserstein Auto-Encoders. ICLR.
 Apr 26 (Chu-Cheng Lin)
 Meng Zhang, Yang Liu, Huanbo Luan, and Maosong Sun (2017). Earth Mover’s Distance Minimization for Unsupervised Bilingual Lexicon Induction. EMNLP.
 Apr 19 (Matthew Francis-Landau)
 Gabriel Peyré and Marco Cuturi (2018). Computational Optimal Transport, sections 2-2.3, 6-6.2, 4.2 and 9.1. [1] slides
Inference Networks / Stochastic Inversion
 Organizer: Sebastian Mielke
 Apr 12 (Annabelle Carrell)
 Lifu Tu and Kevin Gimpel (2018). Learning Approximate Inference Networks for Structured Prediction. ICLR.
 Apr 5 (Shijie Wu)
 Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu (2017). Neural Discrete Representation Learning. NIPS.
 Mar 28 (Sebastian Mielke)
 Hanjun Dai, Yingtao Tian, Bo Dai, Steven Skiena, and Le Song (2018). Syntax-Directed Variational Autoencoder for Structured Data. ICLR.
Cooperative Dialog and Emergence of Language
 Organizers: Patrick Xia and Tom McCoy
 Mar 15 (Annabelle Carrell)
 He He, Anusha Balakrishnan, Mihail Eric, and Percy Liang (2017). Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings. ACL.
 Mar 8 (Tom McCoy)
 Florencia Reali, Nick Chater, and Morten H. Christiansen (2018). Simpler grammar, larger vocabulary: How population size affects language. Proceedings of the Royal Society B.
 Simon Kirby, Hannah Cornish, and Kenny Smith (2008). Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language. PNAS.
 Mar 1 (Patrick Xia)
 Angeliki Lazaridou, Alexander Peysakhovich, and Marco Baroni (2017). Multi-Agent Cooperation and the Emergence of (Natural) Language. ICLR.
 Satwik Kottur, José M.F. Moura, Stefan Lee, Dhruv Batra (2017). Natural Language Does Not Emerge ‘Naturally’ in Multi-Agent Dialog. EMNLP.
Computational Historical Linguistics
 Organizer: Arya McCarthy
 Feb 22 (Tom McCoy)
 William L. Hamilton, Jure Leskovec, and Dan Jurafsky (2016). Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change. ACL.
 Feb 15 (Arya McCarthy)
 David Hall and Dan Klein (2010). Finding Cognate Groups using Phylogenies. ACL.
 Feb 8 (Arya McCarthy)
 Lyle Campbell (2013). Historical Linguistics: An Introduction, chapter 5. (See also chapter 1.)
Fall 2017
Inducing "Syntax" for Semantics
Organizer: Adam Poliak
 Dec 14
 NIPS debriefing session
 Nov 30 (Chu-Cheng Lin)
 Franklin Chang, Gary S. Dell, and Kathryn Bock (2006). Becoming syntactic. Psychological Review. followup
 Nov 16 (Adam Poliak)
 Gormley, Mitchell, Van Durme, Dredze (2014). Low-resource semantic role labeling. ACL.
 Williams, Drozdov, Bowman (2018) Learning to parse from a semantic objective: It works. Is it syntax?. TACL.
 Other suggested papers
 Swabha Swayamdipta, Sam Thomson, Chris Dyer, and Noah A. Smith (2017). Frame-semantic parsing with softmax-margin segmental RNNs and a syntactic scaffold. arXiv.
 Luheng He, Kenton Lee, Mike Lewis, and Luke Zettlemoyer (2017). Deep Semantic Role Labeling: What Works and What's Next. ACL.
Evaluation Metrics
Organizer: Pamela Shapiro
 Nov 9 (Becky Marvin)
 Philipp Koehn (2004). Statistical Significance Tests for Machine Translation Evaluation. EMNLP.
 Ying Zhang, Stephan Vogel, and Alex Waibel (2004). Interpreting BLEU/NIST Scores: How Much Improvement Do We Need to Have a Better System? LREC.
 Nov 2 (Pamela Shapiro)
 Chris Callison-Burch, Miles Osborne, and Philipp Koehn (2006). Re-evaluating the Role of BLEU in Machine Translation Research. EACL.
 Yvette Graham, Timothy Baldwin, Alistair Moffat, and Justin Zobel (2014). Is Machine Translation Getting Better over Time? EACL.
 Oct 19 (Harrison Huh)
 Neha Nayak, Gabor Angeli, and Christopher D. Manning (2016). Evaluating Word Embeddings Using a Representative Suite of Practical Tasks. ACL.
 Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, and Chris Dyer (2016). Problems With Evaluation of Word Embeddings Using Word Similarity Tasks. ACL.
Derivational Morphology
Organizer: Arya McCarthy
 Oct 26 (Garrett Nicolai)
 Angeliki Lazaridou, Marco Marelli, Roberto Zamparelli, and Marco Baroni (2013). Compositional-ly (sic) Derived Representations of Morphologically Complex Words in Distributional Semantics. ACL.
 Max Kisselew, Sebastian Pado, Alexis Palmer, and Jan Snajder (2015). Obtaining a Better Understanding of Distributional Models of German Derivational Morphology. Proceedings of the 11th International Conference on Computational Semantics.
 Oct 12 (Shijie Wu)
 Noam Chomsky (1968). Remarks on Nominalization. Linguistics Club, Indiana University.
 Oct 5 (Arya McCarthy)
 Ryan Cotterell, Ekaterina Vylomova, Huda Khayrallah, Christo Kirov, and David Yarowsky (2017). Paradigm Completion for Derivational Morphology. EMNLP.
 Ekaterina Vylomova, Ryan Cotterell, and Timothy Baldwin (2016). Context-Aware Prediction of Derivational Word-forms. ACL.
Meaning Representation Formalisms
Organizer: Sebastian Mielke
Paper ideas and suggestions: Google doc
 Sep 28 (Seth Ebner)
 Baldridge and Kruijff (2002). Coupling CCG and Hybrid Logic Dependency Semantics. ACL.
 Sep 21 (Brian Leonard)
 Emily Bender, Dan Flickinger, Stephan Oepen, Woodley Packard, and Ann Copestake (2015). Layers of Interpretation: On Grammar and Compositionality. 11th International Conference on Computational Semantics.
 Sep 14 (Sebastian Mielke)
 Angelina Ivanova, Stephan Oepen, Lilja Øvrelid, and Dan Flickinger (2012). Who Did What to Whom? A Contrastive Study of Syntacto-Semantic Dependencies. 6th Linguistic Annotation Workshop.
 Omri Abend and Ari Rappoport (2017). The State of the Art in Semantic Representation. ACL.
Summer 2017
 August 25 (Jason/group)
 Jacob Andreas, Anca Dragan, Dan Klein (2017). Translating Neuralese. ACL.
 August 18 (Keisuke Sakaguchi)
 Keisuke Sakaguchi, Matt Post, Benjamin Van Durme (2017). Error-repair Dependency Parsing for Ungrammatical Texts. ACL. slides.
 Second segment (Various)
 Ilya Sutskever (2013). Training Recurrent Neural Networks. University of Toronto.
 First segment (Various)
 Percy Liang (2011). Learning Dependency-Based Compositional Semantics. UC Berkeley.
(other dissertations considered for discussion)
Spring 2017
Point Processes
Organizer: Ryan Cotterell
 May 4 (Keisuke Sakaguchi)
 Alex Kulesza and Ben Taskar (2010). Structured Determinantal Point Processes. NIPS.
 Apr 27 (Hongyuan Mei)
 Hongyuan Mei and Jason Eisner (2016). The Neural Hawkes Process: A Neurally Self-Modulating Multivariate Point Process. arXiv.
 Apr 20 (Ryan Cotterell)
 Ben Taskar. Determinantal Point Processes (tutorial).
Transfer Learning
Organizer: Becky Marvin
 Apr 13 (Chu-Cheng Lin)
 Mikhail Kozhevnikov and Ivan Titov (2013). Cross-lingual Transfer of Semantic Role Labeling Models. ACL.
 Apr 6 (Xiaochen Li)
 Oscar Tackstrom, Ryan McDonald, and Jakob Uszkoreit (2012). Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure. NAACL.
 Mar 30 (Becky Marvin)
 Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight (2016). Transfer Learning for Low-Resource Neural Machine Translation. EMNLP.
Dialog
Applications of the two previous topics below. Organizer: Patrick Xia.
 Mar 16 (Matthew Francis-Landau)
 Tsung-Hsien Wen (2016). A Network-based End-to-End Trainable Task-oriented Dialogue System.
 Mar 9 (Patrick Xia)
 Jiwei Li (2017). Adversarial Learning for Neural Dialogue Generation.
Deep Reinforcement Learning
Organizers: Hongyuan Mei, Tim Vieira
 Mar 2 (Shijie Wu)
 Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, and Pieter Abbeel (2016). Value Iteration Networks. NIPS.
 David Silver's lecture on value iteration from his course might be helpful.
 Feb 23 (Hongyuan Mei and Tim Vieira)
 David Silver (2016). Tutorial: Deep Reinforcement Learning. ICML. Slides, video.
Generative Adversarial Nets
Organizers: Ryan Cotterell, Dingquan Wang
 Feb 9 and Feb 16 (Ryan Cotterell, Dingquan Wang)
 Ian Goodfellow (2016). NIPS 2016 Tutorial: Generative Adversarial Networks. slides
 Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio (2014). Generative Adversarial Nets. arXiv.
 Martin Arjovsky, Soumith Chintala, Léon Bottou (2017). Wasserstein GAN. arXiv. slides slides_pdf
 Martin Arjovsky, Léon Bottou (2017). Towards Principled Methods for Training Generative Adversarial Networks. ICLR.
Fall 2016
Interpretation and visualization of deep networks
Organizer: Nanyun (Violet) Peng
 Dec 8 (Zach Wood-Doughty)
 Li, Yixuan, et al. (2015) Convergent Learning: Do different neural networks learn the same representations? slides: http://s.yosinski.com/yosinski_160503_iclr_convergent.pdf
 Bonus paper:
 Kádár, Ákos, Grzegorz Chrupała, and Afra Alishahi. Representation of linguistic form and function in recurrent neural networks.
 Dec 1 (Pamela Shapiro)
 Andrej Karpathy, Justin Johnson and Li Fei-Fei (2016) Visualizing and understanding recurrent networks. ICLR.
 Nov 17 (Nanyun (Violet) Peng)
 Tao Lei, Regina Barzilay and Tommi Jaakkola (2016) Rationalizing Neural Predictions. EMNLP.
Neural MT and generation
Organizer: Chu-Cheng Lin
 Nov 10
 EMNLP debriefing session.
 Nov 3 (Becky Marvin)
 Akiko Eriguchi, Kazuma Hashimoto, and Yoshimasa Tsuruoka (2016). Tree-to-Sequence Attentional Neural Machine Translation. ACL.
 Oct 27 (Chu-Cheng Lin)
 Ilya Sutskever, Oriol Vinyals, and Quoc V. Le (2014). Sequence to Sequence Learning with Neural Networks. NIPS.
 Dzmitry Bahdanau, KyungHyun Cho, and Yoshua Bengio (2015). Neural Machine Translation by Jointly Learning to Align and Translate. ICLR.
Deep learning in structured prediction
Organizer: Tim Vieira
 Sep 29 (Patrick Xia and Matthew Francis-Landau)
 Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros and Noah A. Smith (2016) Recurrent Neural Network Grammars. NAACL.
 Sep 22 (Chu-Cheng Lin and Hongyuan Mei)
 David Belanger and Andrew McCallum (2016). Structured Prediction Energy Networks. ICML.
 Sep 15 (Matthew Francis-Landau)
 Jacob Andreas, Marcus Rohrbach, Trevor Darrell and Dan Klein (2016). Learning to Compose Neural Networks for Question Answering. NAACL.
Hypergraph algorithms
Organizer: Travis Wolfe
 Oct 13 (Becky Marvin)
 Alexander M. Rush, Yin-Wen Chang, and Michael Collins (2013). Optimal Beam Search for Machine Translation. EMNLP.
 Oct 6 (Tim Vieira and Zach Wood-Doughty)
 Zhifei Li and Jason Eisner (2009). First- and Second-Order Expectation Semirings with Applications to Minimum-Risk Training on Translation Forests. EMNLP.
 Sep 8 (Travis Wolfe)
 Liang Huang (2008). Advanced Dynamic Programming in Semiring and Hypergraph Frameworks. COLING tutorial notes.
Spring 2016
Interpretable ML
 Apr 28
 Anoop Korattikara, Vivek Rathod, Kevin Murphy, Max Welling (2015). Bayesian Dark Knowledge. Submitted to NIPS.
 Apr 21
 Marco Tulio Ribeiro, Sameer Singh and Carlos Guestrin (2016). “Why Should I Trust You?” Explaining the Predictions of Any Classifier. CHI Workshop on Human-Centred Machine Learning (HCML).
 Optional background reading: Bayesian Learning via Stochastic Gradient Langevin Dynamics (ICML 2011).
 Apr 14
 Letham, Rudin, McCormick, and Madigan (2012). Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model.
Open-Domain Information Extraction
Nanyun will organize this unit.
 Apr 7 (Nanyun Peng)
 Jayant Krishnamurthy and Tom M Mitchell (2015). Learning a Compositional Semantics for Freebase with an Open Predicate Vocabulary. TACL.
 Mar 31 (Dingquan Wang)
 Sebastian Riedel, Limin Yao, Benjamin M. Marlin and Andrew McCallum (2013). Relation Extraction with Matrix Factorization and Universal Schemas. NAACL.
 Mar 24 (Nanyun Peng)
 T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, J. Welling (2015). Never-Ending Language Learning. AAAI.
Reinforcement learning
Keisuke and Tim will organize this unit.
 Mar 10 (Keisuke Sakaguchi, Tim Vieira)
 Sergey Levine and Vladlen Koltun (2013). Guided Policy Search. ICML.
 Mar 3 (Nick Andrews)
 David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller (2014). Deterministic Policy Gradient Algorithms. ICML.
 Feb 11, 18, 25 (Tim Vieira, Travis Wolfe)
 Léon Bottou, Jonas Peters, Joaquin Quiñonero-Candela, Denis X. Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard and Ed Snelson (2013). Counterfactual Reasoning and Learning Systems. arXiv. slides
 Feb 4 (Keisuke Sakaguchi)
 Merwan Barlier, Julien Perolat, Romain Laroche, and Olivier Pietquin (2015). Human-Machine Dialogue as a Stochastic Game. SIGDIAL. slides
 Optional background: Verena Rieser and Oliver Lemon (2011). Reinforcement Learning. In Reinforcement Learning for Adaptive Dialogue Systems, chapter 3.
Fall 2015
Tensor Decomp
 December 17
 NIPS debriefing session.
 December 3 (Pushpendre Rastogi)
 Schein et al. (2015). Bayesian Poisson Tensor Factorization for Inferring Multilateral Relations from Sparse Dyadic Event Counts. International Conference on Knowledge Discovery and Data Mining (KDD). blog post
 November 19 (Satya Prateek)
 Singh et al. (2015). Towards Combined Matrix and Tensor Factorization for Universal Schema Relation Extraction. NAACL.
 November 12 (Pushpendre Rastogi)
 Tao Lei et al. (2014). Low Rank Tensors For Scoring Dependency Structures. ACL. Best paper award.
Abstract Meaning Representation (AMR)
 Nov 5 (Darcey Riley)
 Frank Drewes, Hans-Jörg Kreowski, and Annegret Habel (1997). Hyperedge Replacement Graph Grammars. In Handbook of Graph Grammars and Computing by Graph Transformation, pp. 95-162.
 Oct 29 (Darcey Riley)
 Jones, Bevan, Jacob Andreas, Daniel Bauer, Karl Moritz Hermann, and Kevin Knight (2012). Semantics-Based Machine Translation with Hyperedge Replacement Grammars. Proc. COLING.
 Oct 22 (Darcey Riley)
 Banarescu et al. (2013). Abstract Meaning Representation for Sembanking. Proc. Linguistic Annotation Workshop.
Deep + Probabilistic / Deep + Attention
 Oct 15 (Kevin Duh)
 Deep learning tutorial talk.
 October 8 (Chu-Cheng Lin)
 Andriy Mnih and Karol Gregor (2014). Neural Variational Inference and Learning in Belief Networks. ICML.
Adaptive Inference
 October 1 (Tim Vieira)
 S. M. Ali Eslami, Daniel Tarlow, Pushmeet Kohli, and John Winn (2014). Just-In-Time Learning for Fast and Flexible Inference. NIPS. supplementary material (http://arkitus.com/files/nips14eslamijustintimesupplementary.zip), poster
 September 24
 EMNLP debriefing session.
 September 17 (Pushpendre Rastogi)
 David Weiss and Ben Taskar (2013). Learning Adaptive Value of Information for Structured Prediction. NIPS.
 September 10 (Travis Wolfe)
 Jacob Steinhardt and Percy Liang (2015). Reified Context Models.
 September 3 (Tim Vieira and Adam Teichert)
 Shi, Tianlin, Jacob Steinhardt, and Percy Liang (2015). Learning Where to Sample in Structured Prediction. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics.
Spring 2015
Thursdays 12-1:15pm, Hackerman 306.
Extreme Learning Machine & Computational Learning Theory (w/ practical applications)
 Apr 23 (Mozhi Zhang)
 Maclaurin et al. (2015) Gradient-based Hyperparameter Optimization through Reversible Learning. arXiv.
 Apr 16 (Tim Vieira)
 Ross et al. (2011) A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. AISTATS.
 Apr 9 (Dingquan Wang)
 Long et al. (2010) Restricted Boltzmann Machines are Hard to Approximately Evaluate or Simulate. ICML.
 Apr 2 (Tongfei Chen)
 Blum et al. (1999) Beating the Hold-Out: Bounds for K-fold and Progressive Cross-Validation. COLT.
 Mar 26 (Satya Prateek)
 Yosinski et al. (2014) How transferable are features in deep neural networks? arXiv.
 Mar 12 (Travis Wolfe)
 Huang et al. (2006) Extreme Learning Machine: Theory and Applications. Neurocomputing.
Transition-based parsing
 Feb 19 (Mo Yu)
 Huang et al. (2012) Structured Perceptron with Inexact Search. NAACL.
 Feb 12 (Keisuke Sakaguchi)
 Sartorio et al. (2013) A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy. ACL.
 Feb 5 (Travis Wolfe)
 Yamada and Matsumoto (2003) Statistical Dependency Analysis With Support Vector Machines. IWPT.
Fall 2014
Thursdays 12-1:15pm, Hackerman 306.
Scientific practice
 Dec 4 (Michael Paul): Open science and publishing models
 Jason Priem (2013). Beyond the paper. Nature.
 Timothy Gowers and Michael Nielsen (2009). Massively collaborative mathematics. Nature.
 Yann LeCun (2011?). A new publishing model in computer science. Blog post.
 Donald Geman (2007). Ten reasons why conference papers should be abolished. Manuscript.
 Eric Price (2014). The NIPS experiment. Blog post.
 Bert Huang (2014). On the NIPS experiment and review process. Blog post.
 Nov 20 (Dingquan Wang)
 Eisenstein (2013) What to do about bad language on the internet. NAACL.
 Nov 6 (Matt Gormley)
 Clark et al. (2011) Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability. ACL.
 Søgaard et al. (2014) What's in a p-value in NLP? CoNLL.
Probabilistic semantics
 Oct 30 (Elan Hourticolon-Retzler)
 Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, Luke Zettlemoyer (2013). Scaling Semantic Parsers with On-the-fly Ontology Matching. EMNLP.
 Oct 23 (Violet Nanyun Peng)
 Jonathan Berant and Percy Liang (2014). Semantic Parsing via Paraphrasing. ACL.
 Oct 16 (Darcey Riley)
 Noah D. Goodman and Daniel Lassiter. Probabilistic Semantics and Pragmatics: Uncertainty in Language and Thought. Chapter for Handbook of Semantics.
Probabilistic programming
 Oct 9 (Adam Teichert)
 Examples section of Noah Goodman and Andreas Stuhlmüller (2014). The Design and Implementation of Probabilistic Programming Languages. Electronic book at http://dippl.org.
 Oct 2 (Travis Wolfe)
 Chapters 4,5,6 of Noah Goodman and Andreas Stuhlmüller (2014). The Design and Implementation of Probabilistic Programming Languages. Electronic book at http://dippl.org.
 Sep 25 (Pushpendre Rastogi)
 Chapters 2,3 of Noah Goodman and Andreas Stuhlmüller (2014). The Design and Implementation of Probabilistic Programming Languages. Electronic book at http://dippl.org.
Beyond MCMC
 Sep 18 (Chandler May)
 Aaron Li, Amr Ahmed, Sujith Ravi, and Alexander J Smola (2014). Reducing the Sampling Complexity of Topic Models. KDD.
 Background: alias sampling
 Sep 11 (Frank Ferraro)
 Luke Bornn, Yutian Chen, Nando de Freitas, Mareija Eskelin, Jing Fang, and Max Welling (2013). Herded Gibbs Sampling. ICLR.
 Sep 4 (Nicholas Andrews)
 Anoop Korattikara, Yutian Chen, and Max Welling (2014). Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget. ICML.
Summer 2014
 Aug 14 (Adam Teichert)
 Joseph Gonzalez, Yucheng Low, Arthur Gretton, and Carlos Guestrin (2011). Parallel Gibbs sampling: From colored fields to thin junction trees. AISTATS.
 July 24 (Tim Vieira)
 Alexandre Bouchard-Côté, Slav Petrov, and Dan Klein (2009). Randomized Pruning: Efficiently Calculating Expectations in Large Dynamic Programs. NIPS.
 July 24 (Matt Gormley)
 Michael U. Gutmann and Aapo Hyvärinen (2010). A new estimation principle for unnormalized statistical models. AISTATS.
 July 17 (Juneki Hong)
 TBA
 May 15 (Tim Vieira)
 TBA
 May 8 (Travis Wolfe)
 Percy Liang, Hal Daume, and Dan Klein (2008). Structure Compilation: Trading Structure for Features. ICML.
Spring 2014
Recent papers
 May 1 (Michael Paul)
 Thang Nguyen, Yuening Hu, and Jordan Boyd-Graber (2014). Anchors Regularized: Adding Robustness and Extensibility to Scalable Topic-Modeling Algorithms. ACL.
 Apr 24 (Adam Teichert)
 Dani Yogatama and Noah A. Smith (2014). Linguistic Structured Sparsity in Text Categorization. ACL.
Semantic parsing
 Apr 17 (Juneki Hong)
 Dipanjan Das, Andre F. T. Martins, and Noah Smith (2012). An Exact Dual Decomposition Algorithm for Shallow Semantic Parsing with Constraints. *SEM. slides
 Apr 10 (Keisuke Sakaguchi & Yiran Zhang)
 Yoav Artzi and Luke Zettlemoyer (2013). Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions. TACL.
 Apr 3 (Xuchen Yao)
 Percy Liang, Michael I. Jordan, and Dan Klein (2011). Learning dependency-based compositional semantics. ACL.
Clever MT algorithms
 Mar 27 (Matt Gormley)
 Michel Galley, Chris Quirk, Colin Cherry, and Kristina Toutanova (2013). Regularized Minimum Error Rate Training. EMNLP.
 Mar 13 (Dan Deutsch)
 Adam Pauls and Dan Klein (2009). K-Best A* Parsing. ACL.
 Mar 6 (Nanyun Peng)
 Andrei Simion, Michael Collins, and Clifford Stein (2013). A Convex Alternative to IBM Model 2. EMNLP.
Online inference
 Feb 27 (Nicholas Andrews)
 Michael Bryant and Erik B. Sudderth (2012). Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes. NIPS. Nick's slides
 Feb 13 (Frank Ferraro), Feb 20 (Ryan Cotterell)
 Matthew D. Hoffman, David M. Blei, Chong Wang, and John Paisley (2013). Stochastic variational inference. JMLR.
 Feb 6 (Ryan Cotterell)
 Percy Liang and Dan Klein (2009). Online EM for unsupervised models. NAACL. slides
Fall 2013
Recent Papers
 Dec 12
 NIPS debriefing.
 Dec 5 (Xuchen Yao)
 Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang (2013). Semantic parsing on Freebase from question-answer pairs. EMNLP. supplement
 See also: Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, and Luke Zettlemoyer (2013). Scaling Semantic Parsers with On-the-Fly Ontology Matching. EMNLP.
 Nov 21 (Adam Teichert)
 Alexander M. Rush, Yin-Wen Chang, and Michael Collins (2013). Optimal Beam Search for Machine Translation. EMNLP.
 Nov 14 (Frank Ferraro)
 Yi Yang and Jacob Eisenstein (2013). A Log-Linear Model for Unsupervised Text Normalization. EMNLP.
ML for Annotation / Active Learning
 Nov 7 (Michael Paul)
 Dan Garrette and Jason Baldridge (2013). Learning a Part-of-Speech Tagger from Two Hours of Annotation. NAACL.
 Oct 31 (Ryan Cotterell)
 Burr Settles (2012). Active Learning, chapters 3-5. Synthesis Lectures on Artificial Intelligence and Machine Learning.
 Oct 24
 EMNLP debriefing.
 Oct 17 (Tim Vieira)
 Burr Settles (2012). Active Learning, chapters 1-3. Synthesis Lectures on Artificial Intelligence and Machine Learning.
Informal Domains
 Oct 10 (Naomi Saphra)
 Jacob Eisenstein (2012). Phonological Factors in Social Media Writing. Proceedings of NAACL Workshop on Language Analysis in Social Media.
 Oct 3 (Juneki Hong)
 Alan Ritter, Sam Clark, Mausam, and Oren Etzioni (2011). Named Entity Recognition in Tweets: An Experimental Study. EMNLP. slides
Deep Learning for NLP
 Sep 26 (Nick Andrews)
 Richard Socher and Christopher Manning (2013). Deep Learning for NLP (without Magic). Tutorial at NAACL, continued.
 Sep 19 (Matt Gormley)
 Richard Socher and Christopher Manning (2013). Deep Learning for NLP (without Magic). Tutorial at NAACL.
 Sep 12 (Travis Wolfe)
 Ronan Collobert and Jason Weston (2008). A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. ICML.
Summer 2013
 Aug 15
 ACL debriefing.
 Jun 20
 NAACL debriefing.
 Jun 13 (Nicholas Andrews)
 Marta Recasens, Marie-Catherine de Marneffe, and Christopher Potts (2013). The Life and Death of Discourse Entities: Identifying Singleton Mentions. NAACL (short paper).
Spring 2013
Thursdays 12-1:15pm in Hackerman 306.
Recent NLP papers
 May 2 (Gaurav Kumar)
 Oscar Tackstrom, Dipanjan Das, Slav Petrov, Ryan McDonald, and Joakim Nivre (2013). Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging. TACL.
 Apr 25 (Nicholas Andrews)
 J Gillenwater, A Kulesza, and B Taskar (2012). Discovering Diverse and Salient Threads in Document Collections. EMNLP.
 Apr 18 (Michael Paul)
 R Socher, M Ganjoo, H Sridhar, O Bastani, CD Manning, and AY Ng (2013). Zero-Shot Learning Through Cross-Modal Transfer. arXiv, March.
Inference for NLP
 Apr 11 (Matt Gormley)
 J. Domke (2011). Dual decomposition for marginal inference. AAAI.
 Apr 4 (Tim Vieira)
 J. Paisley, D. Blei, and M. Jordan (2012). Variational Bayesian Inference with Stochastic Search. ICML.
 Mar 28 (Adam Teichert)
 D. Weiss, B. Sapp, and B. Taskar (2012). Structured Prediction Cascades. arXiv, August.
Semantics in NLP
 Mar 14 (Violet Nanyun Peng)
 Dipanjan Das and Noah A. Smith (2011). Semi-Supervised Frame-Semantic Parsing for Unknown Predicates. ACL.
 Mar 7 (Frank Ferraro)
 David Chen (2012). Fast Online Lexicon Learning for Grounded Language Acquisition. ACL.
 Feb 28 (Darcey Riley)
 Cynthia Matuszek, Nicholas FitzGerald, Luke Zettlemoyer, Liefeng Bo, and Dieter Fox (2012). A Joint Model of Language and Perception for Grounded Attribute Learning. ICML.
Alignment
 Feb 21 (Henry Pao)
 Chris Dyer, Jonathan Clark, Alon Lavie, and Noah A. Smith (2011). Unsupervised Word Alignment with Arbitrary Features. ACL.
 Feb 14 (Travis Wolfe)
 Adam Pauls, Dan Klein, David Chiang, and Kevin Knight (2010). Unsupervised Syntactic Alignment with Inversion Transduction Grammars. NAACL.
 Feb 7 (Xuchen Yao)
 Mohit Bansal, Chris Quirk, and Robert C. Moore (2011). Gappy Phrasal Alignment By Agreement. ACL.
Fall 2012
Good recent ML papers
 Jan 24 (Nick Andrews)
 Tony Jebara and Anna Choromanska (2012). Majorization for CRFs and Latent Likelihoods. NIPS.
 Jan 17 (Adam Teichert)
 Po-Ling Loh and Martin Wainwright (2012). Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses. NIPS.
 Jan 10 (Tim Vieira)
 Thomas Furmston and David Barber (2012). A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes. NIPS.
 Jan 3 (Jason Eisner)
 Robert Gens and Pedro Domingos (2012). Discriminative Learning of Sum-Product Networks. NIPS. Slides.
Good recent NLP papers
 Dec 13 (Nathaniel Filardo)
 Sebastian Riedel, David Smith, and Andrew McCallum (2012). Parse, Price and Cut: Delayed Column and Row Generation for Graph Based Parsers. ACL. background
 Dec 6 (Gaurav Kumar)
 Liang Huang, Suphan Fayong, and Yang Guo (2012). Structured Perceptron with Inexact Search. NAACL.
 Nov 29 (Frank Ferraro)
 Jason Naradowsky, Sebastian Riedel, and David Smith (2012). Improving NLP through Marginalization of Hidden Syntactic Structure. EMNLP.
 Nov 15 (Henry Pao)
 Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng (2012). Semantic compositionality through recursive matrix-vector spaces. ACL.
Human sentence processing
 Nov 8 (Olivia Buzek)
 Steven T. Piantadosi, Harry Tily, and Edward Gibson (2011). The communicative function of ambiguity in language. Cognition.
 Nov 1 (Aric Velbel)
 Roger Levy and T. Florian Jaeger (2007). Speakers optimize information density through syntactic reduction. Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems.
 Oct 25 (Keith Levin)
 Bock, K., & Levelt, W. J. M. (1994). Language production: Grammatical encoding. In M.A. Gernsbacher (Ed.), Handbook of Psycholinguistics (pp. 945-984). London: Academic Press.
Streaming/online algorithms in NLP
 Oct 18 (Travis Wolfe)
 Martins, Gimpel, Smith, Xing, Figueiredo, and Aguiar (2010). Aggressive Online Learning of Structured Classifiers. Tech report.
 (also seen online as "Learning Structured Classifiers with Dual Coordinate Ascent")
 Oct 11 (Violet (Nanyun) Peng)
 Graham Cormode (2011). Sketch Techniques for Approximate Query Processing. Foundations and Trends in Database.
 Oct 4 (Matt Gormley)
 Benjamin Van Durme (2012). Streaming Analysis of Discourse Participants. EMNLP.
Events/Narratives in text
 Sep 27 (Xuchen Yao)
 Quang Do, Wei Lu, Dan Roth (2012). Joint Inference for Event Timeline Construction. EMNLP.
 Sep 20 (Adam Teichert)
 Roi Reichart and Regina Barzilay (2012). Multi Event Extraction Guided by Global Constraints. NAACL.
 Sep 13 (Michael Paul)
 Nathanael Chambers and Dan Jurafsky (2009). Unsupervised Learning of Narrative Schemas and their Participants. ACL.
Summer 2012
Summer conference papers
 Aug 30 (Darcey Riley)
 Sindhu Raghavan, Raymond Mooney, and Hyeonseo Ku (2012). Learning to "Read Between the Lines" using Bayesian Logic Programs. ACL.
 Aug 23 (Wes Filardo)
 Zhiheng Huang et al. (2012). Iterative Viterbi A* Algorithm for K-Best Sequential Decoding. ACL.
 Aug 16 (Travis Wolfe)
 Alex Kulesza and Ben Taskar (2011). Learning Determinantal Point Processes. UAI.
 Aug 10 (Nick Andrews)
 David Hall and Dan Klein (2012). Training Factored PCFGs with Expectation Propagation. EMNLP.
 Aug 3 (Michael Paul)
 Quang Do, Wei Lu, Dan Roth (2012). Joint Inference for Event Timeline Construction. EMNLP.
 Jul 5 (Tim Vieira)
 David Burkett and Dan Klein (2012). Fast Inference in Phrase Extraction Models with Belief Propagation. NAACL. Slides.
 Jun 29 (Adam Teichert)
 Oscar Täckström, Ryan McDonald, and Jakob Uszkoreit (2012). Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure. NAACL.
Spring 2012
Spectral learning
 May 3 (Xuchen Yao)
 Paramveer Dhillon, Dean Foster and Lyle Ungar (2011). Multi-View Learning of Word Embeddings via CCA. NIPS 24, Granada, Spain, Dec. 2011.
 Apr 26 (Matt Gormley)
 Franco M. Luque, Ariadna Quattoni, Borja Balle, and Xavier Carreras (2012). Spectral Learning for Non-Deterministic Dependency Parsing. EACL 2012. Best paper award.
 Apr 19 (Michael Paul)
 Daniel Hsu, Sham M. Kakade, and Tong Zhang (2009). A Spectral Algorithm for Learning Hidden Markov Models. Twenty-Second Annual Conference on Learning Theory (COLT).
Reinforcement learning
 Apr 12 (Travis Wolfe)
 Wilson, Fern, Ray, and Tadepalli (2007). Multi-Task Reinforcement Learning: A Hierarchical Bayesian Approach. ICML.
 Apr 5 (Nathaniel Filardo)
 Wingate, David et al. (2011). Bayesian Policy Search with Policy Priors. International Joint Conference on Artificial Intelligence (IJCAI).
 Mar 29 (Jay Feldman)
 Gergely Neu and Csaba Szepesvári (2009). Training parsers by inverse reinforcement learning, Machine Learning Volume 77, Issue 2. Published online by Springer Netherlands.
Nonconvex optimization
 Mar 15 (Frank Ferraro)
Main reading: Robert Michael Lewis, Virginia Torczon, and Michael W. Trosset (2000). Direct search methods: then and now. Journal of Computational and Applied Mathematics, Volume 124, Issues 1-2, December, pp. 191-207.
Optional/supplemental reading: Tamara G. Kolda, Robert Michael Lewis, and Virginia Torczon (2003). Optimization by direct search: new perspectives on some classical and modern methods. SIAM Review, Vol. 45, Issue 3, pages 385-482.
 Mar 8 (Tim Vieira)
Eric Brochu, Vlad M. Cora and Nando de Freitas (2009). A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. pages 123.
 Mar 1 (Nicholas Andrews)
 Main reading (Part 1): M. Ebden (2008). Gaussian Processes for Regression: A Quick Introduction. TR.
 Extra reading (Chapter 2): Carl Edward Rasmussen and Christopher K. I. Williams (2006). Gaussian Processes for Machine Learning. MIT Press.
 Extra extra reading (Chapter 45): David J.C. MacKay (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press.
Unsupervised/semisupervised learning of linguistic structure
 Feb 23 (Olivia Buzek)
 Sharon Goldwater, Thomas L. Griffiths, Mark Johnson (2009). A Bayesian framework for word segmentation: Exploring the effects of context. Cognition 112 (1), pp. 21-54.
 Feb 16 (Adam Teichert)
 Tahira Naseem, Harr Chen, Regina Barzilay, and Mark Johnson (2010). Using Universal Linguistic Knowledge to Guide Grammar Induction, EMNLP.
 Feb 9 (Jason Smith)
 Joao V. Graca, Kuzman Ganchev, and Ben Taskar (2007). Expectation Maximization and Posterior Constraints. In Advances in Neural Information Processing Systems, Vol. 20.
 A longer treatment is Ganchev et al. (2010), Posterior Regularization for Structured Latent Variable Models, JMLR.
 An application to unsupervised dependency parsing is Gillenwater et al. (2011), Posterior Sparsity in Unsupervised Dependency Parsing, JMLR.
Fall 2011
Knowledge representation and reasoning
 Dec 1 (Meher Vijay Yeleti)
 D. Koller, A. Levy, and A. Pfeffer (1997). PClassic: A Tractable Probabilistic Description Logic. AAAI.
 Nov 17 (Ves Stoyanov)
 Franz Baader and Werner Nutt (2002). Basic Description Logics. In the Description Logic Handbook.
 Nov 10 (Nick Andrews)
 Nir Friedman et al. (1999). Learning Probabilistic Relational Models. IJCAI.
 Nov 3 (Matt Gormley)
 Hector J. Levesque (1986). Knowledge Representation and Reasoning. Annual Review of Computer Science, Vol. 1: 255-287.
Music modeling
 Oct 27 (Adam Teichert)
 Jean-François Paiement, Yves Grandvalet & Samy Bengio (2009). Predictive models for music. Connection Science 21(2-3):253-272.
 Oct 20 (Nathaniel Filardo)
 David Temperley (2010). Modeling Common-Practice Rhythm. Music Perception 27(5):355-376.
 Oct 13 (Michael Paul)
 Gerhard Nierhaus (2008). "Genetic Algorithms in Algorithmic Composition". Algorithmic Composition: Paradigms of Automated Music Generation, Chapter 7.4, pp. 157-186.
 Oct 6 (Frank Ferraro)
 Fred Lerdahl and Ray Jackendoff (1983). "An Overview of Hierarchical Structure in Music." Music Perception: An Interdisciplinary Journal. Vol. 1, No. 2, Hierarchical Structure in Music (Winter 1983/1984), pp. 229-252.
 Resources
 http://www.musictheory.net/lessons provides a sequence of interactive music theory lessons.
 A virtual keyboard: http://www.bgfl.org/bgfl/custom/resources_ftp/client_ftp/ks2/music/piano/.
ML in information retrieval
 Sep 29 (Olivia Buzek)
 Shuang-Hong Yang, Bo Long, Alexander J. Smola, Hongyuan Zha, and Zhaohui Zheng (2011). Collaborative competitive filtering: learning recommender using context of user choice. SIGIR.
 Sep 22 (Tim Vieira)
 Brian McFee and Gert Lanckriet (2010). Metric Learning to Rank. ICML.
 Sep 15 (Adam Teichert)
 P. Carpena, P. Bernaola-Galvan, M. Hackenberg, A.V. Coronado, and J. L. Oliver (2009). Level statistics of words: Finding keywords in literary texts and symbolic sequences. Physical Review.
 Rada Mihalcea, Courtney Corley, and Carlo Strapparava (2006). Corpus-based and Knowledge-based Measures of Text Semantic Similarity. AAAI.
 Sep 8 (Travis Wolfe)
 Dafna Shahaf, Carlos Guestrin (2010). Connecting the dots between news articles. Proc. of KDD.
Summer 2011
Summer conference papers
 Aug 16 (Matt Gormley)
 Taylor Berg-Kirkpatrick, Dan Klein (2011). Simple Effective Decipherment via Combinatorial Optimization. Proc. of EMNLP.
 Jul 19 (Matt Gormley)
 Alexander M. Rush and Michael Collins (2011). Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation. Proc. of ACL. Slides.
 Jul 12 (Wes Filardo)
 Daniel Gildea (2010). Optimal Parsing Strategies for Linear Context-Free Rewriting Systems. Proc. of NAACL. Slides.
 Jun 14 (Xiaoxu Kang)
 Limin Yao, Sebastian Riedel, and Andrew McCallum (2010). Collective Cross-Document Relation Extraction Without Labelled Data. Proc. of EMNLP.
 Jun 7 (Nicholas Andrews)
 Harr Chen, Edward Benson, Tahira Naseem, and Regina Barzilay (2011). In-Domain Relation Discovery with Meta-Constraints via Posterior Regularization. Proc. of ACL.
Spring 2011
Combinatorial optimization
 May 5 (Wes Filardo)
 Daniel J. Lehmann (1977). Algebraic structures for transitive closure. Theoretical Computer Science 4(1):59-76.
 Apr 28 (Jason Smith)
 R. McDonald, F. Pereira, K. Ribarov, and J. Hajic (2005). Non-projective dependency parsing using spanning tree algorithms. In Proc. HLT/EMNLP, pages 523–530.
 Apr 21 (Byung Gyu Ahn)
 David Sontag, Amir Globerson, Tommi Jaakkola (2010). Introduction to Dual Decomposition for Inference. To appear in Optimization for Machine Learning, editors S. Sra, S. Nowozin, and S. J. Wright: MIT Press, 2010.
 Apr 14 (Adam Teichert)
 Jack Edmonds (1965). Paths, Trees, and Flowers. Canadian Journal of Mathematics 17: 449-467.
Game-theoretic approaches to discourse pragmatics and to language evolution
 Apr 7 (Michael Paul)
 Paul Vogt (2005). The emergence of compositional structures in perceptually grounded language games. Artificial Intelligence 167(1-2): 206-242.
 Mar 31 (Rachael Richardson)
 David Golland, Percy Liang, Dan Klein (2010). A Game-Theoretic Approach to Generating Spatial Descriptions. EMNLP 2010.
 Mar 17 (Xuchen Yao)
 Gerhard Jäger (2008). Game theory in semantics and pragmatics. Unpublished manuscript.
 Note: This looks quite different from the 2011 manuscript that has the same title and author.
 March 10 (Luke Orland)
 Gerhard Jäger (2008). Applications of Game Theory in Linguistics
Variational inference
 March 3 (Nicholas Andrews)
 Percy Liang, Slav Petrov, Michael I. Jordan, Dan Klein (2007). The infinite PCFG using hierarchical Dirichlet processes. EMNLP.
 Feb 24 (Nathaniel Filardo)
 Matthew Beal (2003). Variational Bayesian Hidden Markov Models. Appears as Chapter 3 of Variational Algorithms for Approximate Bayesian Inference, Ph.D. Thesis, Gatsby Computational Neuroscience Unit, University College London.
 David MacKay (1997). Ensemble Learning for Hidden Markov Models. Unpublished technical report, Cavendish Laboratory, University of Cambridge.
 Slides from Mark Johnson (2007). Why doesn't EM find good HMM POS-taggers? EMNLP.
 Feb 17 (Adam Teichert)
 David M. Blei, Andrew Y. Ng, and Michael I. Jordan (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research.
 Feb 10 (Matt Gormley)
 Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul (1999). An introduction to variational methods for graphical models. Machine Learning.
 To get an intuition first, start with Jason's high-level explanation of variational inference. For another reference, try the ACL 2007 tutorial slides by Percy Liang and Dan Klein.
Fall 2010
Unsupervised discriminative learning
 Dec 9 (Adam Teichert)
 Continue with last week's reading: chapter 3.
 Dec 2 (Wes Filardo)
 Continue with last week's reading: finish chapter 2.
 Nov 18 (Jason Smith)
 Csaba Szepesvári, Algorithms for Reinforcement Learning. This week we'll read the preface, chapter 1, and the first section of chapter 2. If you're trying to access this outside of JHU, try this link.
 Nov 11 (Ves Stoyanov)
 Yves Grandvalet and Yoshua Bengio, Entropy Regularization, in: Semi-Supervised Learning, pages 151-168, MIT Press, 2006.
 Nov 4 (Michael Paul)
 Baoxun Wang, Xiaolong Wang, Chengjie Sun, Bingquan Liu, Lin Sun (2010). Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities.
 Oct 28 (Adam Teichert)
 Noah Smith and Jason Eisner (2005). Guiding Unsupervised Grammar Induction Using Contrastive Estimation.
Semantic parsing
 Oct 21 (Svitlana Volkova)
 Mihai Surdeanu, Richard Johansson, Adam Meyers, Lluís Màrquez, Joakim Nivre (2008). The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies. Slides.
 Oct 14 (Xuchen Yao)
 Wei Lu, Hwee Tou Ng, Wee Sun Lee, Luke S. Zettlemoyer (2008). A Generative Model for Parsing Natural Language to Meaning. EMNLP. Slides.
 Oct 7 (Matt Gormley)
 Dipanjan Das, Nathan Schneider, Desai Chen and Noah A. Smith (2010). Probabilistic Frame-Semantic Parsing. NAACL.
 Sep 30 (Nicholas Andrews)
 Luke S. Zettlemoyer and Michael Collins (2009). Learning Context-Dependent Mappings from Sentences to Logical Form. ACL.
Graph-based methods and random walks
 Sep 23 (Adam Teichert)
 Jie Cai and Michael Strube (2010). End-to-End Coreference Resolution via Hypergraph Partitioning. ACL.
 Sep 16 (Delip Rao)
 Goldenberg, A., Zheng, A. X., Fienberg, S. E., and Airoldi, E. M. (2010). A Survey of Statistical Network Models. Foundations and Trends in Machine Learning 2, 2 (Feb.), 129-233.
 Sep 9 (Svitlana Volkova)
 Einat Minkov and William W. Cohen (2008). Learning Graph Walk Based Similarity Measures for Parsed Text. EMNLP.
Summer 2010
Summer conference papers
 Aug 12 (Jason Smith)
 Alexander Clark (2010). Efficient, Correct, Unsupervised Learning for Context-Sensitive Languages. CoNLL.
 Aug 5 (Veselin Stoyanov)
 Hoifung Poon and Pedro Domingos (2010). Unsupervised Ontology Induction from Text. ACL.
 Jul 20
 General discussion of ACL 2010 papers.
 Jul 15 (Nicholas Andrews)
 Shay B. Cohen, David M. Blei and Noah A. Smith (2010). Variational Inference for Adaptor Grammars. NAACL.
 Jul 6 (Veselin Stoyanov)
 D. Chiang, J. Graehl, K. Knight, A. Pauls, and S. Ravi (2010). Bayesian Inference for Finite-State Transducers. NAACL.
 Jun 29 (Matt Gormley)
 Percy Liang, Michael I. Jordan, and Dan Klein (2010). Type-Based MCMC. NAACL. Slides.
 Jun 22 (Spence Green)
 David Burkett, John Blitzer, and Dan Klein (2010). Joint Parsing and Alignment with Weakly Synchronized Grammars. NAACL. Slides.
 Relevant background:
 David A. Smith and Jason Eisner (2009). Parser adaptation and projection with quasi-synchronous grammar features. EMNLP.
 David A. Smith and Jason Eisner (2008). Dependency parsing by belief propagation. EMNLP.
 David Burkett and Dan Klein (2008). Two Languages are Better than one (for Syntactic Parsing). EMNLP.
 Jun 17 (Ves Stoyanov)
 Aria Haghighi and Dan Klein (2010). Coreference Resolution in a Modular, Entity-Centered Model. NAACL.
 Jun 10
 General discussion of NAACL 2010 papers.
Spring 2010
Visual scene parsing
 May 6 (Rizwan Chaudhry)
 S. Fidler, M. Boben, A. Leonardis (2009). Learning Hierarchical Compositional Representations of Object Structure. In Sven J. Dickinson, Aleš Leonardis, and Bernt Schiele (eds.), Object Categorization: Computer and Human Vision Perspectives.
 See also the talk that Geoff Hinton gave here last week, Deep learning with multiplicative interactions.
 April 22 (Zach Pezzementi) and April 29 (Balakrishnan V)
 Song-Chun Zhu and David Mumford (2006). A Stochastic Grammar of Images. Foundations and Trends in Computer Graphics and Vision, 2(4):259-362. Slides.
 Official final version is good for screen reading but wastes paper.
 April 15 (Nick Andrews)
 Sven Dickinson (2009). The Evolution of Object Categorization and the Challenge of Image Abstraction. In Sven J. Dickinson, Aleš Leonardis, and Bernt Schiele (eds.), Object Categorization: Computer and Human Vision Perspectives.
 April 8 (Matt Gormley)
 André F. T. Martins, Noah A. Smith, and Eric P. Xing (2009). Concise Integer Linear Programming Formulations for Dependency Parsing. ACL-IJCNLP.
 April 1 (Adam Gerber)
 Aria Haghighi, John DeNero, and Dan Klein (2007). Approximate Factoring for A* Search. HLT-NAACL 2007. Slides.
 March 25 (Zhifei Li)
 Mark Hopkins and Greg Langmead (2009). Cube pruning as heuristic search. EMNLP 2009.
 March 11 (Jason Smith)
 Adam Pauls and Dan Klein (2009). K-Best A* Parsing. ACL. Slides.
 March 4 (Nathaniel Filardo)
 Pedro Felzenszwalb and David McAllester (2007). The Generalized A* Architecture. Journal of Artificial Intelligence Research. Slides from [2].
Weakly supervised learning of semantics
 There's also a nice list of papers at the UT reading group on Connecting Language Acquisition with Machine Perception.
 Feb 25 (Nick Andrews)
 Luke Zettlemoyer and Michael Collins (2005). Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. In Proceedings of the Twenty First Conference on Uncertainty in Artificial Intelligence (UAI05).

 Feb 11-Feb 18 (Ves Stoyanov)
 S.R.K. Branavan, Harr Chen, Luke S. Zettlemoyer, Regina Barzilay (2009). Reinforcement Learning for Mapping Instructions to Actions. ACL-IJCNLP.
 Feb 4 (Rachael Richardson)
 Percy Liang, Michael I. Jordan, and Dan Klein (2009). Learning Semantic Correspondences with Less Supervision. ACL-IJCNLP.
Fall 2009
Bayesian methods
 Jan 21 (Zhifei Li)
 Percy Liang, Slav Petrov, Michael I. Jordan, Dan Klein (2007). The infinite PCFG using hierarchical Dirichlet processes. EMNLP 2007.
 Jan 14 (Jason Smith)
 Matthew J. Beal, Zoubin Ghahramani, and Carl Edward Rasmussen (2002). The Infinite Hidden Markov Model. NIPS.
 Also discussed in section 7 of last week's paper.
 Jan 7 (Jason Eisner)
 Long lecture on the Dirichlet process (infinite) mixture model.
 Reading: Yee Whye Teh, Michael Jordan, Matthew Beal and David Blei (2005), Hierarchical Dirichlet Processes.
 There's also a stack of relevant slides from Jordan's 2005 NIPS tutorial.
 Dec 3 (Jason Smith)
 Sharon Goldwater and Thomas L. Griffiths (2007), A Fully Bayesian Approach to Unsupervised Part-of-Speech Tagging. ACL.
 This paper uses a Gibbs sampler. See also the following papers, which compare Gibbs sampling with Variational Bayes and other methods for the same problem:
 Mark Johnson (2007), Why doesn't EM find good HMM POS-taggers? EMNLP.
 Jianfeng Gao and Mark Johnson (2008), A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers. EMNLP.
 Nov 26 (Mechanical Turkey)
 Mary McGlohon (2007), Fried Chicken Bucket Processes. SIGBOVIK.
 Nov 19 (Jason Eisner)
 Lecture on Gibbs sampling and variational Bayes for LDA and its finite-state generalizations.
 Nov 12 (Jason Eisner)
 Yee Whye Teh (2009), Nonparametric Bayesian Models. Video tutorial at Machine Learning Summer School.
 Nov 5 (Zhifei Li)
 David M. Blei, Andrew Y. Ng, & Michael I. Jordan (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research 3:993-1022.
Inference methods
 Oct 29 (Markus Dreyer)
 Koller & Friedman, Chapter 11: Inference as Optimization
 Oct 22 (Puyang Xu)
 Koller & Friedman, Chapter 12: Particle-Based Methods
 Oct 15 (Ariya Rastrow)
 Koller & Friedman, Chapters 3 & 4
 Oct 1 (Anoop Deoras), Oct 8 (Carolina Parada)
 MacKay (2003), Monte Carlo Methods and Efficient Monte Carlo Methods. Chapters 29-30 of Information Theory, Inference, and Learning Algorithms.
Multilingual/Cross-lingual learning
 Sep 24 (Omar F. Zaidan)
 David Burkett and Dan Klein (2008). Two Languages are Better than One (for Syntactic Parsing). EMNLP 2008.
 Sep 17 (Rachael Richardson)
 Alexander Fraser, Renjing Wang, and Hinrich Schütze (2009). Rich Bitext Projection Features for Parse Reranking. EACL 2009.
 Sep 10 (Delip Rao)
 Benjamin Snyder, Tahira Naseem, Jacob Eisenstein, and Regina Barzilay (2009). Adding More Languages Improves Unsupervised Multilingual Part-of-Speech Tagging: a Bayesian Non-Parametric Approach. NAACL 2009.
 Summary: There are several approaches to learning syntax in an unsupervised fashion, but this paper belongs to the growing line of work that exploits multiple languages to reduce ambiguity in the learning task. The most important take-home message is that it is possible to consistently reduce the gap between supervised and unsupervised learning by progressively adding more languages to the mix. This is akin to the multi-view learning results in the machine learning literature. An earlier work by the same authors (EMNLP'08) showed that carefully selecting pairs of languages in multilingual learning can achieve better accuracies. The current paper builds on that result and shows that it is not really necessary to hand-pick the bilingual pairs; robust performance is guaranteed by blindly adding more languages.
 Well, not so blindly. Adding more languages to the setup means estimating more parameters in the model. Without careful implementation, such a model can become intractable. Section 3 explains the generative setup and the inference procedure in detail. Starting with a Goldwater-style monolingual HMM tagging setup for each language, the HMMs are stitched together using alignment links and latent variables called "superlingual tags," leading to a product-of-experts model. The superlingual tags can be thought of as tags that generate similar kinds of syntactic entities in each of the languages. As with any nontrivial nonparametric Bayesian setup, the inference procedure involves computing integrals that have no closed-form solution. Monte Carlo sampling is a standard approach to such problems, and Gibbs sampling is one such method. The details of the sampling process are in Sections 3.5-3.7. This part is a bit technical and will be discussed tomorrow and/or in the sessions on nonparametric Bayesian methods. One could instead use other methods, such as variational methods or expectation propagation.
Summer 2009
Summer conference papers
 July 23 (Zhifei Li)
 Joris Mooij and Bert Kappen (2008). Bounds on marginal probability distributions. NIPS 2008.
 July 16 (Markus Dreyer)
 Fabien Cromierès, Sadao Kurohashi (2009). An Alignment Algorithm Using Belief Propagation and a Structure-Based Distortion Model. EACL 2009.
 June 25 (Markus Dreyer)
 Hoifung Poon, Colin Cherry, Kristina Toutanova (2009). Unsupervised Morphological Segmentation with Log-Linear Models. NAACL 2009.
 June 19 (Zhifei Li)
 David Chiang, Wei Wang and Kevin Knight (2009). 11,001 new features for statistical machine translation. NAACL 2009.
Spring 2009
Information extraction (relevant to TAC)
 Apr 30 (Chuan Liu)
 Jun Wang (2009). Mean-Variance Analysis: A New Document Ranking Theory in Information Retrieval. European Conference on Information Retrieval.
 Apr 23 (Wes Filardo)
 Jun Zhu, Zaiqing Nie, Xiaojing Liu, Bo Zhang, Ji-Rong Wen (2009). StatSnowball: a Statistical Approach to Extracting Entity Relationships. WWW 2009.
 Apr 16 (Carolina Parada)
 Julien Ah-Pine, Guillaume Jacquet (2009). Clique-Based Clustering for improving Named Entity Recognition systems. EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics. Athens, Greece, March 30 - April 3, 2009.
 Apr 9 (Jason Smith)
 Marius Pasca (2009). Outclassing Wikipedia in Open-Domain Information Extraction: Weakly-Supervised Acquisition of Attributes over Conceptual Hierarchies. EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics. Athens, Greece, March 30 - April 3, 2009.
Domain adaptation across text genres
 Apr 2 (Arnab Ghoshal)
 Corinna Cortes, Mehryar Mohri, Michael Riley, Afshin Rostamizadeh. Sample Selection Bias Correction Theory. In Proceedings of The 19th International Conference on Algorithmic Learning Theory (ALT 2008).
 Mar 26 (Ariya Rastrow)
 Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh (2008). Domain Adaptation with Multiple Sources. In Proceedings of Advances in Neural Information Processing Systems (NIPS).
 Optional Reading: John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jenn Wortman. Learning Bounds for Domain Adaptation. Neural Information Processing Systems (NIPS), 2007.
 Mar 12 (Delip Rao)
 Schweikert G, Widmer C, Schölkopf B, Rätsch G (2008). An empirical analysis of domain adaptation algorithms for genomic sequence analysis. In Proceedings of Advances in Neural Information Processing Systems (NIPS).
 Optional Reading: Marx Z, Rosenstein MT, Dietterich TG, Kaelbling LP (2008) Two algorithms for transfer learning. In: Inductive Transfer: 10 years later
 Mar 5 (Omar F. Zaidan)
 Su-In Lee, Vassil Chatalbashev, David Vickrey, and Daphne Koller (2007). Learning a Meta-Level Prior for Feature Relevance from Multiple Related Tasks. ICML 2007.
Recent good papers
 Feb 26 (Zhifei Li)
 John DeNero, Alex Bouchard, and Dan Klein (2008). Sampling Alignment Structure under a Bayesian Translation Model. EMNLP 2008.
 Feb 19 (Jason Eisner)
 Impromptu lecture on Dirichlet distributions, Dirichlet processes, etc.
 Feb 12 (Markus Dreyer)
 Tom Minka (2005). Divergence measures and message passing. Microsoft Research Technical Report. Slides: pdf, ppt.
 Feb 5 (Delip Rao)
 David J. Hand (2006). Classifier Technology and The Illusion of Progress. Statistical Science.
Fall 2008
Programming languages for AI
 Dec 13-14
 NIPS workshop on probabilistic programming (see probabilisticprogramming.org), which mentioned a number of other languages and libraries.
 Dec 4 (Omar F. Zaidan)
 Jeff Bilmes (~2002). The Graphical Models Toolkit (GMTK).
 The above link includes a draft of the documentation and a tutorial, as well as the binaries.
 Nov 20 (Wren Thornton)
 Avi Pfeffer (2006). IBAL Tutorial.
 Installed in masters*:~wren/local/bin (Linux only, so not masters01 or masters02) and clsp:~wren/local/bin. Add this directory to your PATH.
 See also other materials, including this paper: Avi Pfeffer (2007). The design and implementation of IBAL: A general-purpose probabilistic language. In Lise Getoor and Ben Taskar (eds.), Introduction to Statistical Relational Learning.
 Nov 13 (Nathaniel Filardo)
 Marc Sumner and Pedro Domingos (2007). The Alchemy Tutorial. Slides.
 System is installed in masters*:~nwf/public/alchemy. There is a tutorial subdirectory. You should be able to follow along in the tutorial by running commands like
~nwf/public/alchemy/bin/infer \
  -i ~nwf/public/alchemy/tutorial/basics/uniform.mln \
  -e ~nwf/public/alchemy/tutorial/empty.db \
  -r uniform.results \
  -q Heads
Miscellaneous
 Oct 30, Nov 6
 Discussion of the EMNLP 2008 papers.
 Oct 23 (Damianos Karakos)
 I. Csiszar and G. Tusnady (1984). Information geometry and alternating minimization procedures. Statistics and Decisions, Suppl. Issue 1, pp. 205-237.
 The paper is not online, but there are online course notes from Sanjeev Khudanpur.
Probabilistic relational models
 Oct 16 (Nathaniel Filardo)
 Pedro Domingos et al. (2008). Markov Logic. In L. De Raedt, P. Frasconi, K. Kersting and S. Muggleton (eds.), Probabilistic Inductive Logic Programming (pp. 92-117). New York: Springer.
 Oct 1 (Balakrishnan Varadarajan?)
 Nir Friedman, Lise Getoor, Daphne Koller, and Avi Pfeffer (1999). Learning Probabilistic Relational Models. In IJCAI.
 A longer book chapter version is linked from here, but the link is dead.
 Sep 25 (Zhifei Li)
 David Smith and Jason Eisner (2008). Dependency Parsing by Belief Propagation. In EMNLP.
Creative uses of classifiers in NLP
 Sep 18 (Markus Dreyer)
 D. Rosenberg, D. Klein and B. Taskar (2007). Mixture-of-Parents Maximum Entropy Markov Models. Uncertainty in Artificial Intelligence (UAI), Vancouver, BC, July.
 Sep 11 (Nikesh Garera)
 Yoav Goldberg and Michael Elhadad (2007). SVM Model Tampering and Anchored Learning: A Case Study in Hebrew NP Chunking. In ACL 2007.
 Libin Shen and Aravind K. Joshi (2003). An SVM-based voting algorithm with application to parse reranking. In HLT-NAACL 2003.
Summer 2008
Good current papers
 August 19 (Zhifei Li)
 Ahmad Emami and Frederick Jelinek (2005). A neural syntactic language model. Machine Learning, volume 60, numbers 1-3, September 2005.
 August 5 (Zhifei Li)
 Libin Shen, Jinxi Xu and Ralph Weischedel (2008). A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model. In ACL 2008.
 July 29 (David Smith)
 Ronan Collobert and Jason Weston (2008). A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. ICML 2008: Helsinki, Finland.
 July 22 (Nikesh Garera)
 Zornitsa Kozareva, Ellen Riloff and Eduard Hovy (2008). Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs. Proc. of ACL-08: HLT, Columbus, OH.
 July 15 (Markus Dreyer)
 Sittichai Jiampojamarn, Colin Cherry, and Grzegorz Kondrak (2008). Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. Proc. of ACL-08: HLT, Columbus, OH.
 July 8 (Delip Rao)
 Liang Sun, Shuiwang Ji, and Jieping Ye (2008). A Least Squares formulation for Canonical Correlation Analysis. Proc. of ICML-08, Helsinki.
 Hotelling, in 1936, proposed a method to characterize the relationship between two sets of variables, which became widely known as "Canonical Correlation Analysis" (CCA). This involves solving a generalized eigenvalue problem of the form Ax = \lambda Bx, which can further be reduced to a symmetric eigenvalue problem (via Cholesky decomposition) in the CCA case. There is a general interest in the statistics literature in connecting different statistical models to the least squares problem, not only to exploit the simpler solution methods available for such problems but also to relate the models to other methods. The least squares formulation also allows the models to be extended using the regularization framework. The least squares formulation for the CCA model ties together an older result showing the equivalence of CCA and Fisher LDA, and a recent least squares formulation of multiclass LDA.
 CCA has traditionally been applied in the social sciences and more recently in IR. There is literature applying CCA to problems in cross-lingual IR, image retrieval, and learning lexicons. Interestingly, the ACL'08 paper by Haghighi et al. on learning bilingual lexicons using CCA is not the first paper to do that. There is at least one paper as early as 2004 by Cancedda & friends from XRCE that does something similar and is not cited in the ACL paper.
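 For readers who want the reduction in the summary above spelled out, here is a sketch of the standard textbook derivation. The notation is an assumption made for this note and is not taken from the paper: C_{xx} and C_{yy} are the covariance matrices of the two centered views (assumed invertible), C_{xy} is their cross-covariance, and w_x, w_y are the projection directions.

```latex
% Notation (assumed here, not taken from the paper): centered data with
% covariance blocks C_{xx}, C_{yy} and cross-covariance C_{xy} = C_{yx}^\top.
\begin{align*}
\rho &= \max_{w_x, w_y}
  \frac{w_x^\top C_{xy} w_y}
       {\sqrt{w_x^\top C_{xx} w_x}\,\sqrt{w_y^\top C_{yy} w_y}} \\
% Stationarity conditions give a generalized eigenvalue problem A x = \lambda B x
% with A = C_{xy} C_{yy}^{-1} C_{yx}, B = C_{xx}, \lambda = \rho^2:
C_{xy} C_{yy}^{-1} C_{yx}\, w_x &= \rho^2\, C_{xx}\, w_x \\
% Cholesky factor C_{xx} = L L^\top and substitute u = L^\top w_x:
L^{-1} C_{xy} C_{yy}^{-1} C_{yx} L^{-\top}\, u &= \rho^2\, u
\end{align*}
```

 The last line is an ordinary symmetric eigenvalue problem, which is the reduction via Cholesky decomposition that the summary refers to.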
 June 12 (Zhifei Li)
 Hao Zhang, Chris Quirk, Robert C. Moore and Daniel Gildea (2008). Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing. Proc. of ACL-08: HLT, Columbus, OH.
 June 5 (Markus Dreyer)
 Kuzman Ganchev, João Graça and Ben Taskar (2008). Better Alignments = Better Translations? Proc. of ACL-08: HLT, Columbus, OH.
 May 29 (Nikesh Garera)
 Aria Haghighi, Percy Liang, Taylor Berg-Kirkpatrick and Dan Klein (2008). Learning Bilingual Lexicons from Monolingual Corpora. Proc. of ACL-08: HLT, Columbus, OH.
Spring 2008
Dynamic programming speedups
 May 15 (David Smith)
 Geoffrey Zweig and Mukund Padmanabhan (2000). Exact Alpha-Beta Computation in Logarithmic Space with Application to MAP Word Graph Construction. Proc. of ICSLP, Beijing.
 This is a specialization to HMMs of the DBN version given earlier by Binder, Murphy & Russell (1997). See also section 3.7.1 of Kevin Murphy's thesis.
 Related work: This kind of trick was really pioneered by D. S. Hirschberg (1975), who cut the space requirements of longest common subsequence from quadratic all the way down to linear. Hirschberg's version can be nicely adapted to edit distance. Now, edit distance (and more generally, multiple sequence alignment) is really just a special case of shortest path in a graph. Hirschberg (1975), above, was generalized by Korf (1999)'s "Divide and Conquer Bidirectional Search," which Korf & Zhang (2000) (who discuss all these algorithms) further improved to "Divide and Conquer Frontier Search." Edelkamp & Meyer (2001) give log-space methods for improving A* search for the shortest path in a graph. (Note that A* search often fits in memory for our DP problems; reducing its memory requirements becomes paramount when we are searching trees that branch without rejoining, e.g., chess.) Bidirectional search, which is distantly related to A*, is also pretty well studied, including recent work at JHU's AMS Dept.
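 The linear-space trick credited to Hirschberg above can be sketched as follows. This is a minimal illustration of the divide-and-conquer idea for longest common subsequence, not code from any of the cited papers; the function names lcs_lengths and hirschberg_lcs are hypothetical.

```python
def lcs_lengths(a, b):
    """Return the last row of the LCS-length table for a vs. b.

    Only two rows are kept at a time, so the space used is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b):
            if x == y:
                curr.append(prev[j] + 1)
            else:
                curr.append(max(prev[j + 1], curr[j]))
        prev = curr
    return prev


def hirschberg_lcs(a, b):
    """Return one longest common subsequence of a and b as a list."""
    if not a or not b:
        return []
    if len(a) == 1:
        return [a[0]] if a[0] in b else []
    mid = len(a) // 2
    # Forward scores for the first half of a, backward scores for the second half.
    left = lcs_lengths(a[:mid], b)
    right = lcs_lengths(a[mid:][::-1], b[::-1])
    n = len(b)
    # Split b where the two halves together give the best total score.
    k = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    return hirschberg_lcs(a[:mid], b[:k]) + hirschberg_lcs(a[mid:], b[k:])


example = "".join(hirschberg_lcs("AGGTAB", "GXTXAYB"))
```

 The recursion recomputes score rows instead of storing the full quadratic table, which roughly doubles the running time in exchange for linear memory.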
 May 1 (John Blatz)
 Pedro Felzenszwalb and David McAllester (2006). The Generalized A* Architecture. To appear in the Journal of Artificial Intelligence Research.
 Apr. 24 (Zhifei Li)
 Liang Huang (2008). Forest Reranking: Discriminative Parsing with Non-Local Features. To appear in Proceedings of ACL 2008, Columbus, OH.
 Apr. 17 (Arnab Ghoshal)
 Liang Huang and David Chiang (2005). Better k-best parsing. Proceedings International Workshop on Parsing Technologies.
Grammatical inference
 Apr. 10 (Wren Thornton)
 Carl de Marcken (1996), Linguistic structure as composition and perturbation. ACL.
 Also see thesis version.
 Apr. 3 (Nathaniel Filardo)
 A. Clark (2006). Learning Deterministic Context Free Grammars: The Omphalos Competition.
 Mar. 27 (Nikesh Garera)
 Stolcke, A. and Omohundro, S. (1993). Hidden Markov model induction by Bayesian model merging. Advances in Neural Information Processing Systems (Morgan Kaufmann, San Mateo, CA), 5, 11-18.
Inference in graphical models
 Mar. 20 (Delip Rao)
 Jonathan Yedidia, William Freeman, and Yair Weiss (2001). Bethe free energy, Kikuchi approximations and belief propagation algorithms. MERL TR2001-16.
 Mar. 6&13 (Markus Dreyer)
 M. J. Wainwright, T. Jaakkola and A. S. Willsky (2005). A new class of upper bounds on the log partition function. IEEE Trans. on Information Theory, 51, 2313-2335.
 Feb. 28 (David Smith)
 David MacKay (2003). Variational methods. Chapter 33 of Information Theory, Inference, and Learning Algorithms.
 Feb. 21 (David Smith)
 Michael I. Jordan et al. (1999). An Introduction to Variational Methods for Graphical Models. Machine Learning, 37, 183–233.
 Feb. 7&14 (Delip Rao)
 M. I. Jordan and Y. Weiss (2002). Probabilistic Inference in Graphical Models, The Handbook of Brain Theory and Neural Networks (MIT Press).
Fall 2007
Semi-supervised learning
 Dec. 12 (Delip Rao)
 M. Belkin, P. Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, Tech Report, UChicago, TR-2002-01
 Mikhail Belkin, Partha Niyogi, Vikas Sindhwani, On Manifold Regularization, AISTATS 2005
 Nov. 17 (David Smith)
 X. Zhu, Semi-Supervised Learning Literature Survey
Recent parsing papers
 Nov. 3 (Christo Kirov)
 I. Titov, J. Henderson, Constituent Parsing with Incremental Sigmoid Belief Networks, ACL 2007
 Oct. 26 (Christo Kirov)
 Seginer, Yoav, Fast Unsupervised Incremental Parsing (syntax induction), ACL 2007
 Oct. 17 (Markus Dreyer)
 Nakagawa, Tetsuji, Multilingual Dependency Parsing Using Global Features, EMNLP-CoNLL 2007
Text compression
 Oct. 10 (Nathaniel W Filardo)
 Mahoney, Matthew, Adaptive Weighting of Context Models for Lossless Data Compression, Florida Institute of Technology, CS Department, Technical report CS-2005-16, EMNLP-CoNLL 2007
Some other possible papers that we didn't read (not vetted):
 Approaches that consider recursive text structure
 Charikar et al. (2005), The smallest grammar problem
 de Marcken (1996), Linguistic structure as composition and perturbation (thesis version); read later on 4/10/08
 Katajainen et al. (1986), Syntax-Directed Compression of Program Files
 Approaches that learn hidden state
 Cormack & Horspool (1987), Data Compression Using Dynamic Markov Modelling
 Hu et al. (year?), Language Modeling with Stochastic Automata
 Approaches that allow searches inside the compressed text
 Antonio Farina Martinez (2005), New Compression Codes for Text Databases (dissertation)
 Culpepper & Moffat (2006), Phrase-Based Pattern Matching in Compressed Text
 Shibata et al. (2000), A Boyer-Moore type algorithm for compressed pattern matching
 Shibata et al. (1999), Byte Pair Encoding: A Text Compression Scheme That Accelerates Pattern Matching
 Udi Manber (1997), A text compression scheme that allows fast searching directly in the compressed file
Domain adaptation
 Oct. 3 (David Smith)
 Shai Ben-David, John Blitzer, Koby Crammer, Fernando Pereira, Analysis of Representations for Domain Adaptation
 Sep. 26 (Omar F Zaidan)
 J. Blitzer, R. McDonald, F. Pereira, Domain Adaptation with Structural Correspondence Learning, EMNLP 2006
Summer 2007
Good current papers
 Aug. 30 (Delip Rao)
 Gideon S. Mann, Simple, Robust, Scalable Semi-supervised Learning via Expectation Regularization, Proceedings of the 24th International Conference on Machine Learning 2007
 Aug. 18 (Markus Dreyer)
 D. Talbot, M. Osborne, Randomised Language Modelling for Statistical Machine Translation, ACL 2007
 They use a space-efficient randomized data structure (Bloom Filter) to store very large n-gram models. There is a companion paper that people might want to have a quick look at as well, for comparison:
 D. Talbot, M. Osborne, Smoothed Bloom Filter Language Models: Tera-Scale LMs on the Cheap, ACL 2007
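 As background for the Bloom filter mentioned above, here is a minimal sketch of the basic data structure used as an approximate set of n-grams. It shows only membership testing, not how the papers attach counts or probabilities to n-grams. The class name, the parameters, and the choice of salted SHA-256 as the hash family are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Approximate set: no false negatives, occasional false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function (an assumption).
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(str(seed).encode("ascii") + b":" + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Toy usage: store a few trigrams in 1024 bits (128 bytes).
ngrams = BloomFilter(num_bits=1024, num_hashes=3)
for trigram in ["the cat sat", "cat sat on", "sat on the"]:
    ngrams.add(trigram)
```

 A query for a stored trigram always succeeds; a query for an unseen one usually fails but can succeed by chance, which is the error the space saving is traded for.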
 Aug. 11 (Nikesh Garera)
 L. Shen, G. Satta, A. Joshi, Guided learning for bidirectional sequence classification, ACL 2007
 Aug. 3 (Yi Su)
 M. Galley, K. McKeown, Lexicalized Markov Grammars for Sentence Compression, NAACL-HLT 2007
 Jul. 18 (David Smith)
 P. Liang, S. Petrov, M. Jordan, D. Klein, The Infinite PCFG Using Hierarchical Dirichlet Processes, EMNLP-CoNLL 2007
 Jul. 6 (Christopher White)
 A. Braunstein, M. Mezard, R. Zecchina, Survey propagation: an algorithm for satisfiability, Random Structures and Algorithms, 2005.
 We sent some questions to Zecchina.
 Lukas Kroc, Ashish Sabharwal and Bart Selman. Survey propagation revisited: An empirical study. 23rd UAI, 2007.
 Jun. 21 (Christopher White)
 K. Murphy, Y. Weiss, M. Jordan, Loopy belief propagation for approximate inference: An empirical study, 15th UAI, pages 467-475, 1999
 ... discussing (loopy) belief propagation as background for survey propagation, a topic which has been getting more attention lately for its ability to "solve very large hard combinatorial problems, such as determining the satisfiability of Boolean formulas." Chapter 8 of Chris Bishop's textbook is supposed to be a good treatment of graphical models overall. He covers BP in section 8.4.4 after first presenting factor graphs in 8.4.3. David MacKay's treatment of BP, also in terms of factor graphs, is in chapter 26 of his book [3]. It's worth reading this chapter in full, perhaps first reading chapter 16. ... the update equations are given as (26.11) and (26.12) ... [substantial further discussion by Jason was here] Some people may prefer Bishop's style, others MacKay's.
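 The factor-graph message updates that the note points to can be sketched in a few lines. This is a minimal sum-product sketch on a hypothetical three-variable chain, not a transcription of Bishop's or MacKay's equations. The function names, the "flooding" schedule that updates all messages each round, and the toy factor tables are all assumptions.

```python
from itertools import product


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def sum_product(domains, factors, iterations=10):
    """domains: {var: num_states}; factors: list of (vars, table) pairs,
    where table maps an assignment tuple to a non-negative value.
    Returns {var: normalized belief}. Exact on trees, approximate with loops."""
    edges = [(m, v) for m, (vs, _) in enumerate(factors) for v in vs]
    q = {e: [1.0] * domains[e[1]] for e in edges}  # variable -> factor messages
    r = {e: [1.0] * domains[e[1]] for e in edges}  # factor -> variable messages
    for _ in range(iterations):
        new_r = {}
        for m, (vs, table) in enumerate(factors):
            for i, v in enumerate(vs):
                msg = [0.0] * domains[v]
                # Sum the factor over all other variables, weighted by their messages.
                for assign in product(*(range(domains[u]) for u in vs)):
                    weight = table[assign]
                    for j, u in enumerate(vs):
                        if j != i:
                            weight *= q[(m, u)][assign[j]]
                    msg[assign[i]] += weight
                new_r[(m, v)] = normalize(msg)
        r = new_r
        new_q = {}
        for (m, v) in edges:
            # Multiply the messages arriving at v from every other factor.
            msg = [1.0] * domains[v]
            for (m2, v2), incoming in r.items():
                if v2 == v and m2 != m:
                    msg = [a * b for a, b in zip(msg, incoming)]
            new_q[(m, v)] = normalize(msg)
        q = new_q
    beliefs = {}
    for v, size in domains.items():
        belief = [1.0] * size
        for (m, v2), incoming in r.items():
            if v2 == v:
                belief = [a * b for a, b in zip(belief, incoming)]
        beliefs[v] = normalize(belief)
    return beliefs


# Toy chain A - B - C, written as P(A) P(B|A) P(C|B).
domains = {"A": 2, "B": 2, "C": 2}
factors = [
    (("A",), {(0,): 0.6, (1,): 0.4}),
    (("A", "B"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}),
    (("B", "C"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.5}),
]
beliefs = sum_product(domains, factors)
```

 On this tree-structured example the beliefs match the exact marginals after a few rounds; on a graph with cycles the same updates give the "loopy" approximation that the paper evaluates.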
 Jun. 14 (David Smith)
 X. Zhu, Z. Ghahramani, J. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, ICML 2003
 Jun. 6 (Nikesh Garera)
 A. Alexandrescu, K. Kirchhoff, Data-Driven Graph Construction for Semi-Supervised Graph-Based Learning in NLP, HLT/NAACL 2007
 Jun. 2 (Erin Fitzgerald)
 J. Jiang, C. Zhai, A Systematic Exploration of the Feature Space for Relation Extraction, HLT/NAACL 2007
 May 17 (Markus Dreyer)
 M. Galley, K. McKeown, Lexicalized Markov Grammars for Sentence Compression, HLT/NAACL 2007
 May 10 (David Smith)
 M. Johnson, T. Griffiths, and S. Goldwater, Bayesian Inference for PCFGs via Markov Chain Monte Carlo, HLT/NAACL 2007
Spring 2007
Integrating search and learning
 Apr. 19 (John Blatz)
 A. Prieditis, Machine discovery of Effective Admissible Heuristics, Machine Learning Journal, 1993
 Apr. 12 (Markus Dreyer)
 A. Haghighi, J. DeNero and D. Klein, Approximate Factoring for A* Search, NAACL-HLT 2007
 Mar. 29 & Apr. 5 (Zhifei Li)
 H. Daume III, J. Langford, and D. Marcu, Search-based structured prediction, Machine Learning Journal, forthcoming
 Mar. 8 (David Smith)
 H. Daume III & D. Marcu, Learning as search optimization: approximate large margin methods for structured prediction, ICML 2005
Recent IR/QA papers (with an NLP or multilingual focus)
 Mar. 1 (Wei Chen)
 M. Kaisser, S. Scheible, and B. Webber, Experiments at the University of Edinburgh for the TREC 2006 QA track, TREC15
 They do some fairly deep interpretation of sentences, extracting their predicate-argument structure.
 Feb. 22 (Eric Harley)
 K. Kan Lo & W. Lam, Using Semantic Relations with World Knowledge for Question Answering, TREC15
Unsupervised learning of morphology
 Feb. 15 (Nikhil Bojja)
 C. Monson et al., Unsupervised Induction of Natural Language Morphology Inflection Classes, ACL Student Workshop '04
 Feb. 8 (Delip Rao)
 P. Schone and D. Jurafsky, Knowledge-free induction of morphology using latent semantic analysis, CoNLL 2000
 However, there was an extension of this work reported in NAACL-2001 that looks at circumfixes and prefix/affix combinations. [4]
 Feb. 1 (Nikesh Garera)
 D. Yarowsky and R. Wicentowski, Minimally supervised morphological analysis by multimodal alignment, ACL 2000
 For more details refer to Chapter 4 of Wicentowski's thesis.
Fall 2006
Syntax-based MT
 Dec. 13 (Delip Rao)
 J. Carbonell et al., Context-based machine translation, AMTA 2006
 Dec. 6 (Jason Smith)
 M. Galley et al., Scalable Inference and Training of Context-Rich Syntactic Translation Models, ACL 2006
 It may also be helpful to look at:
 M. Galley et al., What's in a translation rule?, HLT/NAACL 2004
 Nov. 29 (Balakrishnan V)
 D. Marcu et al., SPMT: Statistical Machine Translation with Syntactified Target Language Phrases, EMNLP 2006
 Nov. 15 (Eric Harley)
 D. Chiang, An introduction to synchronous grammars, ACL 2006 Tutorial
 Slides from the talk are also available. [5]
Linguistics: Syntactic formalisms
 Nov. 8 (Elliott Drabek)
 K. Shklovsky, A Grammatical Sketch of Petalcingo Tzeltal, Undergraduate Thesis, Reed College, 2005
 It is 77 pages long, but not dense, and I will be skipping the following sections: Pages
 01-14 Phonetics and phonology
 18-18 Polyvalence
 21-21 Inherent possession and ...
 46-55 Tense and aspect and other sections
 Nov. 1 (Yi Su)
 M. Steedman, Gapping as Constituent Coordination, Linguistics and Philosophy, Vol. 13, 1990, pp. 207-264.
 See Yi for photocopies.
 Oct. 25 (Markus Dreyer)
 S. Riezler et al., Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques, ACL 2002
 Oct. 18 (Erin Fitzgerald)
 J. Bresnan & R.M. Kaplan, Lexical-Functional Grammar: A Formal System for Grammatical Representation, The Mental Representation of Grammatical Relations, MIT Press, 1982
 The edited collection that this appears in is generally interesting. Bresnan defends and develops lexicalized grammars in general; the idea of separate surface and semantic roles; and Bresnan & Kaplan's LFG in particular. You should know that she originated (in 1978) the extremely influential idea of lexicalized syntax: the idea that a grammar is simply a collection of lexical entries to be assembled in standard language-independent ways, but that there are also "lexical redundancy rules" that relate, e.g., active and passive entries for the same verb. Some chapters address morphological and cognitive issues pertaining to lexicalization, including an essay by Pinker on lexicalist learning. Slides from Erin's presentation can be found here.
Machine learning: Margin methods and structured classification
 Oct. 11 (John Blatz)
 L. Xu, D. Wilkinson, F. Southey, & D. Schuurmans, Discriminative Unsupervised Learning of Structured Predictors, ICML 2006
 Oct. 4 (Nikesh Garera)
 A. Culotta & J. Sorensen, Dependency Tree Kernels for Relation Extraction, ACL 2004
 D. Zelenko, C. Aone, & A. Richardella, Kernel Methods for Relation Extraction, JMLR, Volume 3, 2003
 Sep. 27 (David Smith)
 C. Cortes, P. Haffner, & M. Mohri, Rational Kernels, NIPS 2003
 Sep. 20 (Elliott Drabek)
 K.Q. Weinberger, F. Sha, & L.K. Saul, Learning a kernel matrix for nonlinear dimensionality reduction, ICML 2004
 S.T. Roweis & L.K. Saul, Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science, 22 December 2000
 J.B. Tenenbaum, V. De Silva, & J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science, 22 December 2000
 Sep. 13 (Roy Tromble)
 L. Xu, J. Neufeld, B. Larson, & D. Schuurmans, Maximum Margin Clustering, NIPS 2004
Summer 2006
Recent HLT-NAACL papers
 Aug. 4 (David Smith)
 Sharon Goldwater, Thomas L. Griffiths, Mark Johnson, Contextual Dependencies in Unsupervised Word Segmentation, ACL 2006
 Anyone looking for a more straight-up language modeling discussion can compare:
 Yee Whye Teh, A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes, ACL 2006
 More resources:
 Machine Learning MLPedia page on Dirichlet Processes
 Michael Jordan's NIPS 2005 tutorial: Nonparametric Bayesian Methods: Dirichlet Processes, Chinese Restaurant Processes and All That
 Y. Teh, M. Jordan, M. Beal, and D. Blei, Hierarchical Dirichlet processes, Journal of the American Statistical Association, 2006
 Jul. 20 (Roy Tromble)
 Mehryar Mohri, Brian Roark, Probabilistic Context-Free Grammar Induction Based on Structural Zeros, HLT-NAACL, 2006
 Jul. 6 (Keith Hall)
 Charles Sutton, Michael Sindelar, Andrew McCallum, Reducing Weight Undertraining in Structured Discriminative Learning, HLT-NAACL, 2006
 Jun. 31 (Markus Dreyer)
 Joakim Nivre, Johan Hall et al., Labeled Pseudo-Projective Dependency Parsing with Support Vector Machines, CoNLL 2006
 J. Nivre, J. Nilsson, Pseudo-Projective Dependency Parsing, ACL 2005
 Jun. 24 (David Smith)
 Percy Liang, Ben Taskar, Dan Klein, Alignment by Agreement, HLT-NAACL 2006
Spring 2006
Algorithms for NLP (mostly)
 May 18 (Markus Dreyer)
 Jonathan May, Kevin Knight, A Better N-Best List: Practical Determinization of Weighted Finite Tree Automata, Proc. NAACL-HLT, 2006
 May 11 (John Blatz)
 M. Gengler, An introduction to parallel dynamic programming, Lecture Notes in Computer Science, 1996
 May 4 (David Smith)
 C. E. R. Alves, E. N. Cáceres, F. Dehne, Parallel dynamic programming for solving the string editing problem on a CGM/BSP, SPAA 2002
 Apr. 20 (Balakrishnan V)
 Richard M. Karp, Michael O. Rabin, Efficient Randomized Pattern-Matching Algorithms, IBM Journal of Research and Development, 1987
 Mar. 31, Apr. 6 (Eric Harley)
 Ben Taskar, Simon Lacoste-Julien, Dan Klein, A Discriminative Matching Approach to Word Alignment, ACL 2005
 A related paper is
 Ryan McDonald, Fernando Pereira, Kiril Ribarov, Jan Hajic, Non-projective Dependency Parsing using Spanning Tree Algorithms, HLT-EMNLP 2005
 Mar. 17 (Elliott Franco Drabek)
 Necip Fazil Ayan, Bonnie J. Dorr, Christof Monz, Alignment Link Projection Using Transformation-Based Learning, HLT-EMNLP 2005
 Mar. 10 (Roy Tromble)
 Terry Koo, Michael Collins, Hidden-Variable Models for Discriminative Reranking, HLT-EMNLP 2005
 Mar. 3 (Jason Riesa)
 Hal Daume III, Daniel Marcu, Domain Adaptation for Statistical Classifiers, Journal of Artificial Intelligence Research, 2006
 J. Gorman, J. Curran, Approximate Searching for Distributional Similarity, Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition, 2005
 Feb. 23 (Omar F. Zaidan)
 Ravichandran, Pantel, Hovy, Randomized Algorithms and NLP: Using Locality Sensitive Hash Function for High Speed Noun Clustering, ACL 2005
Consensus decoding
 Feb. 16 (Noah A Smith)
 Khalil Sima'an, Computational Complexity of Probabilistic Disambiguation by means of Tree-Grammars, COLING 1996
 Francisco Casacuberta, Colin de la Higuera, Computational complexity of problems on probabilistic grammars and transducers, LNAI 1981
 For a longer and more HMM/compbio view and extended results, see
 Rune B. Lyngsø, Christian N. S. Pedersen, The Consensus String Problem and the Complexity of Comparing Hidden Markov Models, Journal of Computer and System Sciences 65:545-569, 2002
Extracting idioms
 Feb. 9 (John Blatz)
 Dominic Widdows, Beate Dorow, Automatic Extraction of Idioms using Graph Analysis and Asymmetric Lexicosyntactic Patterns, Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition, 2005
 Afsaneh Fazly, Suzanne Stevenson, Automatic Acquisition of Knowledge about Multiword Predicates, Proceedings of the 19th Pacific Asia Conference on Language, Information, and Computation (PACLIC 2005).
Fall 2005
Good recent papers
 Nov. 23 (Roy Tromble)
 Sutton, Charles and McCallum, Andrew, Composition of Conditional Random Fields for Transfer Learning, HLT-EMNLP 2005
 Nov. 16 (Safiullah Shareef)
 Hassan Sawaf, Jörg Zaplo, Hermann Ney, Statistical Classification Methods for Arabic News Articles
 Nov. 4 (Jason Riesa)
 Luke S. Zettlemoyer, Michael Collins, Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars, UAI 2005
 Oct. 27 (Markus Dreyer)
 D. Roth and W. Yih, Integer Linear Programming Inference for Conditional Random Fields, ICML 2005
 Oct. 20 (Roy Tromble)
 Sheila M. Reynolds, Jeff A. Bilmes, Part-of-Speech Tagging using Virtual Evidence and Negative Training, HLT-EMNLP 2005
Statistical learning theory
 Sep. 21 (Arnab Ghoshal)
 M. Jordan, Statistical Learning Theory, Chapters 2-3
 Sep. 14 (Nikesh Garera)
 M. Jordan, Statistical Learning Theory, Chapter 8 (Exponential family and Generalized linear models)
Summer 2005
Gibbs sampling
 Sep. 1 (John, Markus, & Nikesh)
 B. Walsh, Markov Chain Monte Carlo and Gibbs Sampling, Lecture Notes for EEB 581, version 26 April 2004
 Aug. 26 (Roy Tromble)
 Jenny Rose Finkel, Trond Grenager, Christopher Manning, Incorporating Nonlocal Information into Information Extraction Systems by Gibbs Sampling, ACL 2005
AI
 Aug. 19 (John Blatz)
 Niyogi, Sourabh, Steps Toward Deep Lexical Acquisition, ACL 2005
Unsupervised or semi-supervised EM
 Aug. 5 (Adam)
 Duh, Kevin and Kirchhoff, Katrin, Tagging of Dialectal Arabic: A Minimally Supervised Approach, ACL 2005
 Jul. 28 (Zak)
 Takuya Matsuzaki, Yusuke Miyao, Jun'ichi Tsujii, Probabilistic CFG with Latent Annotations, ACL 2005
 Jul. 21 (Keith)
 Sharon Goldwater and Mark Johnson, Representational Bias in Unsupervised Learning of Syllable Structure, ACL 2005
 Jul. 21 (Damianos)
 Ando, Rie and Zhang, Tong, A High-Performance Semi-Supervised Learning Method for Text Chunking, ACL 2005
Learning optimality-theoretic grammars
 Jul. 14 (John Blatz)
 Ying Lin, Learning Stochastic OT Grammars: A Bayesian Approach using Data Augmentation and Gibbs Sampling, ACL 2005
 Jul. 14 (Roy Tromble)
 Sharon Goldwater and Mark Johnson, Learning OT Constraint Rankings Using a Maximum Entropy Model, Proceedings of the Workshop on Variation within Optimality Theory, 2003
Spring 2005
 May 7 (Markus Dreyer)
 M. Diligenti, F.M. Coetzee, S. Lawrence, C.L. Giles, M. Gori, Focused Crawling Using Context Graphs, 26th International Conference on Very Large Databases, VLDB 2000
 Adam Kilgarriff, Gregory Grefenstette, Introduction to the Special Issue on the Web as Corpus, Computational Linguistics, 2003
 Apr. 28 (Damianos Karakos)
 Alessandro Moschitti and Roberto Basili, Complex Linguistic Features for Text Classification: A comprehensive study, Proceedings of the 26th European Conference on Information Retrieval Research (ECIR 2004)
 Apr. 21 (Omar F. Zaidan)
 Tin Kam Ho, Jonathan J. Hull, Sargur N. Srihari, Decision Combination in Multiple Classifier Systems, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 1, Jan. 1994
 Dan Klein, Kristina Toutanova, H. Tolga Ilhan, Sepandar D. Kamvar and Christopher D. Manning, Combining Heterogeneous Classifiers for Word-Sense Disambiguation, ACL 2002
 Apr. 16 (Brock Pytlik)
 V. Lavrenko, S.L Feng, R. Manmatha, Statistical models for automatic video annotation and retrieval, ICASSP 2004
 S.L Feng, R. Manmatha, V. Lavrenko, Multiple Bernoulli Relevance Models for Image and Video Annotation
 The first is a short paper about the relevance model. The second is a follow-up paper that details a subsequent model based on the CRM.
 Apr. 9 (Noah A Smith)
 G. Elidan, N. Friedman, The Information Bottleneck EM Algorithm, UAI 2003
 G. Elidan, N. Friedman, Learning Hidden Variable Networks, JMLR 2005
 Feb. 25, Mar. 4, Mar. 11, Apr. 2 (David Smith)
 M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, Learning in Graphical Models, MIT Press, 1999
Fall 2004
 Nov. 27 (Jia Cui)
 David M. Blei, Andrew Y. Ng, Michael I. Jordan, Latent Dirichlet Allocation, JMLR 2003
 Other papers on LDA: [www.cs.toronto.edu/~ywteh/research/npbayes/report.pdf], [8]
 Nov. 20 (David Smith)
 Olle Häggström and Karin Nelander, On Exact Simulation of Markov Random Fields Using Coupling from the Past, Foundation of the Scandinavian Journal of Statistics, 1999
 James Fill and Mark Huber, The Randomness Recycler: A New Technique for Perfect Sampling, FOCS 2000
 Nov. 13 (Charles Schafer)
 Endika Bengoetxea, Inexact Graph Matching Using Estimation of Distribution Algorithms, Chapter 2: The graph matching problem, Ph.D. dissertation, 2002
 This chapter is general to the field although pretty sweeping and unspecific as a result. It probably makes a good introduction, since it gives an idea of the scope and diversity of the problem and proposed techniques ...
 Yakov Keselman, Ali Shokoufandeh, M. Fatih Demirci, Sven Dickinson, Many-to-Many Graph Matching via Metric Embedding, Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE
 This is a state-of-the-art paper that is quite dense but quite interesting. It solves a very general formulation of inexact graph matching by first embedding graphs into a normed space ...
 Nov. 5 (Michelle Vanni)
 Robert S. Swier and Suzanne Stevenson, Unsupervised Semantic Role Labelling, EMNLP 2004
 Nianwen Xue and Martha Palmer, Calibrating Features for Semantic Role Labelling, EMNLP 2004
 Oct. 29 (Eric Goldlust)
 Stephen Clark and James Curran, Parsing the WSJ using CCG and Log-Linear Models, ACL 2004
 Oct. 22 (Michelle Vanni)
 Dekang Lin and Franz Och, Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence, ACL 2004
 Babych and Hartley, Extending the BLEU MT Evaluation Method with Frequency Weightings, ACL 2004
 Oct. 15 (John Blatz)
 Daichi Mochihashi, Genichiro Kikui, Kenji Kita, Learning Nonstructural Distance Metric by Minimum Cluster Distortions, EMNLP 2004
 Oct. 2 (Nguyen Bach)
 Background knowledge on SVM and Graphical Models:
 Sep. 24, Oct. 7 (Roy Tromble)
 B. Taskar, C. Guestrin and D. Koller, Max-Margin Markov Networks, Neural Information Processing Systems Conference (NIPS03), 2003
 B. Taskar, D. Klein, M. Collins, D. Koller and C. Manning, Max-Margin Parsing, EMNLP 2004
 Sep. 9 (John Blatz)
 Pascale Fung and Percy Cheung, Mining Very-Non-Parallel Corpora: Parallel Sentence and Lexicon Extraction via Bootstrapping and EM, ACL 2004
 Dragos Stefan Munteanu, Alexander Fraser and Daniel Marcu, Improved Machine Translation Performance via Parallel Sentence Extraction from Comparable Corpora, ACL 2004
 Sep. 2 (Gideon Mann)
 Xin Li, Paul Morie, and Dan Roth, Robust Reading: Identification and Tracing of Ambiguous Names, ACL 2004
 Cheng Niu, Wei Li, Rohini K. Srihari, Weakly Supervised Learning for Cross-Document Person-Name Disambiguation Supported by Information Extraction, ACL 2004
 Aug. 27 (David Smith)
 I. Dan Melamed, Statistical Machine Translation by Parsing, ACL 2004
 Daniel Gildea, Dependencies vs. Constituents for Tree-Based Alignment, ACL 2004
 Aug. 20 (Damianos Karakos, Charles Schafer)
 P. Pantel and D. Lin, Discovering word senses from text, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, 2002
 Diana McCarthy, Rob Koeling, Julie Weeds, John Carroll, Finding Predominant Word Senses in Untagged Text, 2004
Spring 2004
Information extraction
 May 15 (Roy Tromble)
 Fuchun Peng, Andrew McCallum, Accurate Information Extraction from Research Papers using Conditional Random Fields, 2004
 May 1 (Izhak Shafran)
 Eric J. Friedman, Strong Monotonicity in Surplus Sharing, 1999
 Tom Dietterich has a web page on probabilistic relational models: [9]
 Apr. 24 (David Smith)
 McCallum and Jensen, Extraction and Data Mining using Conditional-Probability Relational Models, IJCAI'03 Workshop on Learning Statistical Models from Relational Data, 2003
 The paper is a survey of recent trends in IE and data mining (biased of course towards the authors' work) and a proposal to unify them with conditional random fields.
Combinatorial optimization
 Apr. 17 (Elliott Franco Drabek)
 Rina Dechter, Mini-Buckets: A General Scheme for Generating Approximations in Automated Reasoning, 2001
 Apr. 10 (Noah Ashton Smith)
 Denys Duchier, Axiomatizing Dependency Parsing Using Set Constraints, Sixth Meeting on Mathematics of Language, 2000
 Apr. 3 (Roy Tromble)
 Roman Bartak, Constraint Programming: In Pursuit of the Holy Grail, 1999
Learning how to search
 Mar. 25 (Eric Goldlust)
 Boyan and Moore, Learning Evaluation Functions to Improve Optimization by Local Search, Journal of Machine Learning Research, 2000
Discourse, summarization, paraphrase
 Mar. 18 (Markus Dreyer)
 Eugene Charniak, Niyu Ge, John Hale, A Statistical Approach to Anaphora Resolution, Proceedings of the Sixth Workshop on Very Large Corpora, 1998
 Mar. 5 (Charles Schafer)
 Daniel Marcu, Theory and Practice of Discourse Parsing and Summarization, Chapters 2 & 3, The MIT Press, 2000
 Feb. 19 (David Smith)
 Barzilay and Lee, Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment, HLT 2003
Optimality theory
 Feb. 12 (Brock Pytlik)
 Bob Frank, Giorgio Satta, Optimality theory and the Generative Complexity of Constraint Violability, MIT Press
 Feb. 5 (Brock Pytlik)
 Jessica A. Barlow and Judith A. Gierut, Optimality theory in phonological acquisition, Journal of Speech, Language and Hearing 42, 1999
 Paul Boersma, Joost Dekkers and Jeroen van de Weijer, Introduction. In Optimality Theory: Phonology, Syntax and Acquisition, Oxford University Press 2000
Fall 2003
 Dec. 12 (Paola Virga)
 Kamal Nigam and Rayid Ghani, Analyzing the Effectiveness and Applicability of Co-training, Ninth International Conference on Information and Knowledge Management 2000
 Nov. 20 (Noah A. Smith)
 Rebecca Hwa, Miles Osborne, Anoop Sarkar, Mark Steedman, Corrected Co-training for Statistical Parsers, ICML 2003
 Nov. 13 (Markus Dreyer)
 Goldman and Zhou, Enhancing Supervised Learning with Unlabeled Data, ICML 2000
 An additional paper with some experiments:
 Clark, Curran and Osborne, Bootstrapping POS taggers using Unlabelled Data, CoNLL 2003
 Nov. 6 (Brock Pytlik)
 Stuart M. Shieber, Transducers as a Substrate for Natural Language Processing
 Oct. 31 (Roy Tromble)
 Dekai Wu, An algorithm for simultaneously bracketing parallel texts by aligning words, ACL 1995
 Oct. 24 (Markus Dreyer)
 Stuart M. Shieber, Yves Schabes, Synchronous Tree-Adjoining Grammars, Coling 1990
 An additional closely related paper: Stuart M. Shieber, Yves Schabes, Generation and Synchronous Tree-Adjoining Grammars, Fifth International Workshop on Natural Language Generation
 Oct. 10 (David Smith)
 Bernard Comrie, Language Universals and Linguistic Typology: Syntax and Morphology, Chapters 6-7, Blackwell (1989)
 Oct. 3 (Michelle Vanni)
 Bernard Comrie, Language Universals and Linguistic Typology: Syntax and Morphology, Chapters 4-6, Blackwell (1989)
 Sep. 18 (David Smith)
 Bernard Comrie, Language Universals and Linguistic Typology: Syntax and Morphology, Chapters 2-3, Blackwell (1989)
 Sep. 11 (Elliott Franco Drabek)
 Bernard Comrie, Language Universals and Linguistic Typology: Syntax and Morphology, Chapter 1, Blackwell (1989)
Spring 2003
 May 15 (Chal Haithaidharm)
 V. N. Vapnik, The Nature of Statistical Learning Theory, Chapters 7B, 8, 9
 May 8 (Noah Smith)
 V. N. Vapnik, The Nature of Statistical Learning Theory, Chapters 6B-7A
 May 1 (Noah Smith)
 V. N. Vapnik, The Nature of Statistical Learning Theory, Chapters 5B-6A
 Apr. 24 (Paola Virga)
 V. N. Vapnik, The Nature of Statistical Learning Theory, Chapters 4B-5A
 Apr. 17 (Roy Tromble)
 V. N. Vapnik, The Nature of Statistical Learning Theory, Chapters 2B-4A
 Apr. 10
 V. N. Vapnik, The Nature of Statistical Learning Theory, Intro and Chapters 1, 2A
 Mar. 20 (Roy Tromble)
 Nikita Schmid, Ahmed Patel, Using Tree Automata and Regular Expressions to Manipulate Hierarchically Structured Data
 Mar. 6 (Paola Virga)
 Carl M. Kadie, Christopher Meek, David Heckerman, A Collaborative Filtering System Using Posteriors Over Weights of Evidence, Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 2002.
 Feb. 26 (Elliott Drabek)
 Steven Abney, Bootstrapping, ACL'02
 Feb. 19 (Elliott Drabek)
 A. Lopez, M. Nossal, R. Hwa, P. Resnik, Word-level Alignment for Multilingual Resource Acquisition, Proceedings of the 2002 LREC Workshop on Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Language Data
 Feb. 13 (David Smith)
 K. Church, Empirical Estimates of Adaptation: The Chance of Two Noriega's is closer to p/2 than p^{2}, COLING 2000, pp. 173-179
Fall 2002
 Jul. 31 (Paola Virga)
 Kenji Yamada, Kevin Knight, A decoder for Syntax-based Statistical MT, ACL 2002
 Jul. 24 (Michelle Vanni)
 Paola Merlo, A Multilingual Paradigm for Automatic Verb Classification, ACL 2002
 Dec. 5 (Silviu Cucerzan)
 Darren Pearce, A Comparative Evaluation of Collocation Extraction Techniques, LREC 2002
 D. Lin, Automatic identification of noncompositional phrases, ACL 1999
 Nov. 21 (Silviu Cucerzan)
 Ueda, Nakano, Ghahramani, Hinton, SMEM Algorithm for Mixture Models, Neural Information Processing Systems 1998
 Nov. 14 (Michelle Vanni)
 Marti Hearst, Untangling Text Data Mining, ACL 1999
 Nov. 7 (Neda Khalili)
 Yamamoto, Church, Using Suffix Arrays to Compute Term Frequency and Document Frequency for All Substrings in a Corpus, Computational Linguistics 2001
 A related paper: Kageura, Bigram Statistics Revisited A Comparative Examination of Some Statistical Measures in Morphological Analysis of Japanese Kanji Sequences
 Nov. 1 (Chalaporn Hathaidharm)
 J. Gao, J. Goodman, M. Li, K. Lee, Toward A Unified Approach To Statistical Language Modeling For Chinese, ACM Transactions on Asian Language Information Processing, Vol. 1, No. 1, pp. 3-33. 2002.
 Oct. 24 (Roy Tromble)
 Han, Benjamin, Building a Bilingual Dictionary with Scarce Resources: A Genetic Algorithm Approach
 Oct. 17 (David Smith)
 Cotton, Bird, An Integrated Framework for Treebanks and Multilayer Annotations, LREC 2002
 Oct. 8 (Elliott Franco Drabek)
 Ravichandran, Hovy, Learning Surface Text Patterns for a Question Answering System, ACL 2001
 A similar paper: Lin, Pantel, Discovery of Inference Rules for Question Answering, KDD 2001
 Oct. 2 (Gideon Mann)
 Gildea, Jurafsky, Automatic Labeling of Semantic Roles, ACL 2001
 Sep. 26 (Paul Ruhlen)
 Hwa, Resnik, Weinberg, Kolak, Evaluating Translational Correspondence using Annotation Projection, ACL 2002
 Sep. 19 (Paola Virga)
 Yamada, Knight, A Decoder for Syntax-based Statistical MT, ACL 2002
 Sep. 10 (Noah A. Smith)
 Collins, Duffy, New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron, ACL 2002
Spring 2002
 Apr. 25 (Paul Ruhlen)
 H. Al-Adhaileh, Kong, Melamed, Malay-English Bitext Mapping and Alignment Using SIMR/GSA Algorithms, Malaysian National Conference on Research and Development on Linguistics 2001
 Apr. 18 (Paul Ruhlen)
 N. A. Rao, K. Rose, Deterministically annealed design of hidden Markov model speech recognizers, IEEE Trans. on Speech and Audio Processing, vol. 9, no. 2, Feb. 2001
 Apr. 11 (Paola Virga)
 Neal, Hinton, A view of the EM algorithm that justifies incremental, sparse, and other variants, Learning in Graphical Models, 1999
 And this article builds on the above. It tests an incremental version of EM (carefully choosing how incremental it will be), as well as a "lazy EM" version that visits "significant" cases more often.
 Mar. 28 (Swapna Somasundaran)
 Crestan, El-Beze, Improving supervised WSD by including rough semantic features in a Multi-level view of the Context, SEMPRO Workshop, Edinburgh, 2001.
 Mar. 14 (Noah A. Smith)
 Ratnaparkhi, A Simple Introduction to Maximum Entropy Models for NLP, Institute for Research in Cognitive Science, Univ. of Penn.
 Feb. 28 (Silviu Cucerzan)
 Marcu, Towards a Unified Approach to Memory- and Statistical-Based Machine Translation, Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, 2001
 Feb. 21 (Jia Cui)
 Barzilay, McKeown, Extracting Paraphrases from a Parallel Corpus, Computer Science Department, Columbia Univ.
 Feb. 14 (Charles Schafer)
 Al-Onaizan, Germann, et al., Translating with Scarce Resources, American Association for Artificial Intelligence 2000
 Feb. 7 (Paola Virga)
 Knight, Graehl, Machine Transliteration, ACL-EACL 1997
Fall 2001
 Dec. 14 (Jia Cui)
 Jerome Bellegarda, Exploiting latent semantic information in statistical language models, Proceedings of the IEEE, 88:8, Aug. 2000
 Nov. 29 (Silviu Cucerzan)
 Mike Collins, Yoram Singer, Unsupervised Models for Named Entity Classification, EMNLP/VLC'99
 Nov. 20 (Radu Florian)
 Blum, Mitchell, Combining Labeled and Unlabeled Data with Co-Training, COLT 1998
 Nov. 16 (Richard Wicentowski)
 Eisner, Satta, Efficient parsing for bilexical context-free grammars and head automaton grammars, ACL 1999
 Plagiarism detection systems might be relevant to bitext alignment. A message to the Corpora list yesterday announced the following review paper: [10]
 Nov. 2 (Paul Ruhlen)
 Manning, Schuetze, Foundations of Statistical Natural Language Processing, Section 14 (clustering), pp. 495-527, MIT Press
 Oct. 26 (Gideon Mann)
 Tishby, Pereira, Bialek, The information bottleneck method
 The paper describes a clustering method which is a generalization of their earlier work on "Distributional Clustering of English Words" (Pereira, Tishby and Lee '93).