NLP Reading Group

From CLSP Wiki
The Natural Language Processing reading group attempts to keep abreast of interesting research ideas and results that may be useful to us. We typically read and discuss one paper per week.  All our past papers are listed below.


The reading group is listed every semester as a 1-credit course, 601.865 ("Selected Topics in NLP").  The instructor is Jason Eisner; contact him to get on the mailing list.  At the first course meeting, we brainstorm a bunch of topics for the semester, and vote on which ones to pursue.  We then spend about 4 weeks per topic.  Although some topics are within NLP, many of them explore potentially relevant work from related fields such as machine learning and linguistics.


During the summer we usually catch up on the latest NLP conference papers.


:''Instructions on [[NLP Reading Group/Presenting|how to present in reading group]].''


:''Jason's advice on [http://cs.jhu.edu/~jason/advice/how-to-read-a-paper.html how to read a paper].''
 
:''Other weekly reading groups led by [http://www.clsp.jhu.edu CLSP] faculty are listed on the CLSP wiki's [[Main Page]].'' <!-- ... are the ones on [https://groups.google.com/forum/?fromgroups#!forum/text-choreography semantics] (Fridays), [http://isis.jhu.edu/classes/EN.600.771 probabilistic formal languages], and [http://isis.jhu.edu/classes/EN.600.775 machine learning] (Mondays).  There is also the weekly [http://ml.jhu.edu/wiki/index.php?title=Machine_Learning_Tea Machine Learning Tea]. -->
__NOTOC__
 
== Fall 2024 ==
 
''Wednesdays 12pm, Hackerman 306.''
 
=== Adversarial exploitation of LLMs ===
 
;Dec 4 (Henry Li)
 
:Nicholas Carlini et al. (2024). [https://arxiv.org/abs/2403.06634 Stealing Part of a Production Language Model]. ICML.
 
;Nov 20 (Sophia Hager)
 
:Andy Zou et al. (2023). [https://arxiv.org/pdf/2307.15043 Universal and Transferable Adversarial Attacks on Aligned Language Models].
 
;Nov 13 (TJ Bai)
 
:Nicholas Carlini et al. (2023). [https://arxiv.org/abs/2306.15447 Are aligned neural networks adversarially aligned?] NeurIPS.
 
=== Fine-tuning methods ===
 
;Nov 6 (Brian Lu)
:Lucas Lehnert, Sainbayar Sukhbaatar, DiJia Su, Qinqing Zheng, Paul Mcvay, Michael Rabbat, Yuandong Tian (2024). [https://arxiv.org/abs/2402.14083 Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping].
:DiJia Su, Sainbayar Sukhbaatar, Michael Rabbat, Yuandong Tian, Qinqing Zheng (2024). [https://arxiv.org/abs/2410.09918v1 Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces].
 
;Oct 30 (Jiahui Li)
:Alisa Liu et al. (2024). [https://arxiv.org/abs/2401.08565 Tuning Language Models by Proxy]. COLM.
 
 
;Oct 23 (Leo Du)
:Tobias Schnabel, Jennifer Neville (2024). [https://arxiv.org/abs/2404.02319 Symbolic Prompt Program Search: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization].
 
=== LLMs for scientific discovery ===
 
;Oct 16 (Pristina W)
: Chenglei Si et al. (2024). [https://arxiv.org/abs/2409.04109 Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers]. CL.
;Oct 9 (Yu Lu Liu)
: Qingyun Wang, Doug Downey, Heng Ji, Tom Hope (2024). [https://aclanthology.org/2024.acl-long.18.pdf SciMON: Scientific Inspiration Machines Optimized for Novelty]. ACL.
 
;Oct 2 (Cole Molloy)
: Nils Dycke, Matej Zečević, Ilia Kuznetsov, Beatrix Suess, Kristian Kersting, Iryna Gurevych (2024). [https://arxiv.org/abs/2409.05367 Diagnostic Reasoning in Natural Language: Computational Model and Application].
 
=== Training agentic workflows ===
 
;Sep 25 (Nikhil Sharma)
:Yucheng Jiang, Yijia Shao, Dekun Ma, Sina J. Semnani, Monica S. Lam (2024). [https://arxiv.org/abs/2408.15232 Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations].
: Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, Yu Su (2024). [https://arxiv.org/abs/2405.14831 HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models].
 
;Sep 18 (Tom Wang)
: Yongchao Chen, Jacob Arkin, Yilun Hao, Yang Zhang, Nicholas Roy, Chuchu Fan (2024). [https://arxiv.org/abs/2402.08702 PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Heuristic-based Sampling].
: Yuchi Liu, Jaskirat Singh, Gaowen Liu, Ali Payani, Liang Zheng (2024). [https://arxiv.org/abs/2405.20252 Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization].
 
;Sep 11 (Shepard Xia)
: Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao (2023). [https://arxiv.org/abs/2303.11366 Reflexion: Language Agents with Verbal Reinforcement Learning].
 
;Sep 4
: Group search-skim-nominate session.
 
== Spring 2024 ==
 
=== Probing and editing LLMs (mechanistic interpretability) ===
 
;Apr 24 (Leo Du)
:Survey and tutorial on positional embedding in Transformers.
 
;Apr 17 (Zike Hu)
 
:Peter Hase et al. (2023). [https://arxiv.org/pdf/2301.04213 Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models]. NeurIPS.
 
=== Explicit reasoning within LLMs using neuro-symbolic methods ===
 
;Apr 10 (Brian Lu)
:Gabriel Poesia et al. (2023). [https://arxiv.org/abs/2306.04031 Certified Deductive Reasoning with Language Models]. arXiv.
 
;Apr 3 (TJ Bai)
:Ben Prystawski et al. (2023). [https://arxiv.org/abs/2304.03843 Why think step by step? Reasoning emerges from the locality of experience]. NeurIPS.
 
;Mar 27 (Pristina Wang)
:Lionel Wong & Gabriel Grand (2023).  [https://arxiv.org/abs/2306.12672 From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought]. arXiv.
 
=== ML for managing LLM calls  ===
 
''Calibration, imputation, prompt learning, active learning, reinforcement learning, etc.''
 
;Mar 13 (Yixuan Wang)
:Yecheng Jason Ma et al. (2023). [https://arxiv.org/abs/2310.12931 Eureka: Human-Level Reward Design via Coding Large Language Models.] arXiv.
 
;Mar 6 (Jiahui Li)
:Xinyuan Wang et al. (2023). [https://arxiv.org/abs/2310.16427 PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization.] arXiv.
;Feb 28 (Tom Wang)
:Fang, Meng, Yuan Li, and Trevor Cohn (2017). [https://arxiv.org/abs/1708.02383 Learning how to active learn: A deep reinforcement learning approach.] arXiv.
 
=== LLM decoding schemes ===
 
;Feb 21 (Henry Li)
:Schick et al. (2023). [https://arxiv.org/pdf/2302.04761.pdf Toolformer: Language Models Can Teach Themselves to Use Tools]. NeurIPS.
 
;Feb 14 (Cole Molloy)
:Saibo Geng, Berkay Döner, Chris Wendler, Martin Josifoski, Robert West (2024).  [https://arxiv.org/abs/2401.09967 Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access].  arXiv.
:(Optional) Saibo Geng, Martin Josifoski, Maxime Peyrard, Robert West (2023).  [https://aclanthology.org/2023.emnlp-main.674 Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning].  EMNLP.
 
;Feb 7 (Shepard Xia)
:Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg (2023).  [https://arxiv.org/abs/2306.03341 Inference-Time Intervention: Eliciting Truthful Answers from a Language Model].  NeurIPS.
 
== Fall 2023 ==
 
=== Training language models on small corpora ===
 
;Dec 6  (Ashi Garg)
:David Samuel, Andrey Kutuzov, Lilja Øvrelid, Erik Velldal (2023).  [https://arxiv.org/abs/2303.09859 Trained on 100 million words and still in shape: BERT meets British National Corpus].  EACL.
:Inar Timiryasov, Jean-Loup Tastet (2023). [https://arxiv.org/abs/2308.02019 Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty].  CoNLL-CMCL Shared Task.
 
;Nov 29 (Nikhil Sharma)
:Lucas Georges Gabriel Charpentier, David Samuel (2023). [https://arxiv.org/abs/2311.02265 Not all layers are equally as important: Every Layer Counts BERT].  CoNLL-CMCL Shared Task.
:Chengxu Zhuang, Evelina Fedorenko, Jacob Andreas (2023). [https://arxiv.org/abs/2310.13257 Visual Grounding Helps Learn Word Meanings in Low-Data Regimes].  arXiv.
 
;Nov 15 (Cole Molloy)
:Alex Warstadt, Leshem Choshen, Aaron Mueller, Adina Williams, Ethan Wilcox, Chengxu Zhuang (2023). [https://arxiv.org/abs/2301.11796 Call for Papers - The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus]. 
::[https://babylm.github.io/timeline.html BabyLM Challenge website]
:Venkata S Govindarajan, Juan Diego Rodriguez, Kaj Bostrom, Kyle Mahowald (2023).  [https://arxiv.org/abs/2301.11796 Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways].  CoNLL-CMCL Shared Task.
 
=== Psycholinguistics on Transformers ===
 
:We also considered reading [https://arxiv.org/abs/2211.09748], [https://arxiv.org/abs/2305.02386], [https://aclanthology.org/2020.emnlp-main.389/].
 
;Nov 8 (Henry Li)
:Haoyu Zhao, Abhishek Panigrahi, Rong Ge, Sanjeev Arora (2023). [https://arxiv.org/abs/2303.08117 Do Transformers Parse while Predicting the Masked Word?]  EMNLP.
 
;Nov 1 (Cole Molloy)
:Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, David Bau (2023). [https://arxiv.org/abs/2210.07229 Mass-Editing Memory in a Transformer].  ICLR.
::''Background:'' Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov (2022).  [https://arxiv.org/abs/2202.05262 Locating and Editing Factual Associations in GPT].  NeurIPS.
 
;Oct 25 (Yaohan Guan)
:Evan Hernandez, Belinda Z. Li, Jacob Andreas (2023). [https://arxiv.org/abs/2304.00740 Inspecting and Editing Knowledge Representations in Language Models].  arXiv.
 
=== Machine learning for combinatorial optimization / AutoML ===
 
;Oct 18
:Discussion of AutoML topics
 
;Oct 11 (Matthew Francis-Landau)
:Yoshua Bengio, Andrea Lodi, and Antoine Prouvost (2021).  [https://www.sciencedirect.com/science/article/pii/S0377221720306895 Machine learning for combinatorial optimization: A methodological tour d’horizon].  European Journal of Operational Research, 290(2):405-421.
 
;Oct 4 (Jason Eisner)
:Andrea Lodi and Giulia Zarpellon (2017).  [https://link.springer.com/article/10.1007/s11750-017-0451-6 On learning and branching: a survey].  TOP 25:207-236.
 
=== Connecting language models to world models === 
 
:''Some possible papers include [https://arxiv.org/abs/2302.02801], [https://arxiv.org/abs/2212.10012], [https://arxiv.org/abs/2308.09687], [https://arxiv.org/abs/2308.08614].''
:''We also considered the related topic of connecting language models to reasoning: [https://aclanthology.org/2023.acl-long.294 survey], [https://github.com/zjunlp/Prompt4ReasoningPapers repo of papers], [https://wenting-zhao.github.io/complex-reasoning-tutorial/ tutorial].''
 
;Sep 27 (Sophia Hager)
:Belinda Z. Li, Maxwell Nye, and Jacob Andreas (2023). [https://aclanthology.org/2023.findings-acl.795.pdf Language Modeling with Latent Situations]. ACL.
 
;Sep 20 (Yaohan Guan)
:Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Michal Podstawski, Hubert Niewiadomski, Piotr Nyczyk, and Torsten Hoefler (2023). [https://arxiv.org/abs/2308.09687/ Graph of Thoughts: Solving Elaborate Problems with Large Language Models].  AAAI.
 
;Sep 13 (Brian Lu)
:Ramsés J. Sánchez, Lukas Conrads, Pascal Welke, Kostadin Cvejoski, and César Ojeda (2023). [https://aclanthology.org/2023.acl-long.263/ Hidden Schema Networks]. ACL.
 
;Sep 6 (Jason Eisner)
:Informal review of inference and optimization in graphical models.
 
== Spring 2023 ==
 
;Apr 26 (Matthew Francis-Landau + Nikhil Sharma)
: Business models for large LMs - is there a report from Forrester Research, Gartner Group, Deloitte, IDC, Frost & Sullivan, ... ?
 
;Apr 19 (Tim Vieira)
:Schulman et al. (2015). [https://arxiv.org/abs/1502.05477 Trust Region Policy Optimization]. ICML.
::See also Schulman et al. (2017), [https://arxiv.org/abs/1707.06347 Proximal Policy Optimization Algorithms]. 
 
=== Neurosymbolic Methods ===
 
;Apr 12 (Matthew Francis-Landau)
:Alex Gu, Tamara Mitrovska, Daniela Velez, Jacob Andreas, Armando Solar-Lezama (2022).  [https://arxiv.org/abs/2210.11468 ObSynth: An Interactive Synthesis System for Generating Object Models from Natural Language Specifications]. arXiv.
 
;Apr 5 (Brian Lu)
:Hao Tang, Kevin Ellis (2022).  [https://haotang1995.github.io/files/PLDI_MAPS_2022.pdf From Perception to Programs: Regularize, Overparameterize, and Amortize].  MAPS.
 
=== Making Transformers more efficient / Long-form generation ===
 
;Mar 29 (Henry Li Xinyuan)
:Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller (2020). [https://arxiv.org/abs/2009.14794 Rethinking Attention with Performers]
::See also Tay et al. (2020), [https://arxiv.org/abs/2011.04006 Long Range Arena: A Benchmark for Efficient Transformers], and Qin et al. (2023), [https://arxiv.org/abs/2202.07856 The NLP Task Effectiveness of Long-Range Transformers].
 
;Mar 15 (Sophia Hager)
:Tri Dao, Daniel Y. Fu, Khaled K. Saab, Armin W. Thomas, Atri Rudra, and Christopher Ré (2023). [https://openreview.net/pdf?id=COZDy0WYGg Hungry Hungry Hippos: Towards Language Modeling with State Space Models]
 
;Mar 8 (Yunmo Chen)
:Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré (2022).  [https://arxiv.org/abs/2205.14135 FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness.]  arXiv.
 
=== Fancy Generative Models (normalizing/reversible flows, GANs, score-based/diffusion models, iterative editing, VAE, DPP,  ...) ===
 
;Mar 1 (Nikhil Sharma)
:Ben Poole, Ajay Jain, Jonathan T. Barron, Ben Mildenhall (2023).  [https://openreview.net/forum?id=FjNys5c7VyY DreamFusion: Text-to-3D using 2D Diffusion].  ICLR.
 
;Feb 22 (Yaohan Guan)
:Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, Yuan Cao (2023).  [https://openreview.net/forum?id=WE_vluYUL-X ReAct: Synergizing Reasoning and Acting in Language Models].  ICLR.
:Antonia Creswell, Murray Shanahan, Irina Higgins (2023).  [https://openreview.net/forum?id=3Pf3Wg6o-A4 Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning].  ICLR.
 
;Feb 8, Feb 15 (Leo Du)
:Albert Gu, Karan Goel, Christopher Ré (2022). [https://arxiv.org/abs/2111.00396 Efficiently Modeling Long Sequences with Structured State Spaces]. [https://arxiv.org/abs/2111.00396 ICLR].
:: ''This one is a repeat.  See also Sasha Rush and Sidd Karamcheti (2022), [https://srush.github.io/annotated-s4/ The Annotated S4] (blog post).''
 
=== Current Trends ===
 
;Feb 1
:Sameer Singh, [https://twimlai.com/podcast/twimlai/ai-trends-2023-natural-language-proc-chatgpt-gpt-4-and-cutting-edge-research/ AI Trends 2023: Natural Language Proc – ChatGPT, GPT-4 and Cutting Edge Research] (podcast with Sam Charrington).
::Links to papers there and at [https://twitter.com/sameer_/status/1617722150349328385].
 
== Fall 2022 ==
 
;Dec 7
:Qi Liu, Dani Yogatama, Phil Blunsom (2022). [https://aclanthology.org/2022.tacl-1.32/ Relational Memory-Augmented Language Models].  TACL.
:Dani Yogatama, Cyprien de Masson d’Autume, Lingpeng Kong (2021).  [https://aclanthology.org/2021.tacl-1.22/ Adaptive Semiparametric Language Models].  TACL.
 
;Nov 30 (Matthew Francis-Landau)
:Clark Barrett, Roberto Sebastiani, Sanjit A. Seshia and Cesare Tinelli (2008). [https://people.eecs.berkeley.edu/~sseshia/pubdir/SMT-BookChapter.pdf Satisfiability Modulo Theories].  Chapter 12 from Biere et al. (eds.), ''Handbook of Satisfiability''.
 
;Nov 16 (Guest&#58; Jennifer White)
:Jennifer White and Ryan Cotterell (2022). [https://arxiv.org/abs/2209.10926 Equivariant Transduction through Invariant Alignment]. COLING.
 
;Nov 9
:Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Ré (2020).  [https://arxiv.org/abs/2008.07669 HiPPO: Recurrent Memory with Optimal Polynomial Projections].  arXiv.
 
;Oct 26 (Leo Du)
:Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud, Chris J. Maddison (2021). [https://proceedings.mlr.press/v139/grathwohl21a.html Oops I Took A Gradient: Scalable Sampling for Discrete Distributions]. ICML.
 
;Oct 19 (Sophia Sklaviadis)
:Ongoing work on RNNG and VAE.
 
;Oct 12 (Brian Lu)
:Pengcheng Yin, Chunting Zhou, Junxian He, and Graham Neubig (2018).  [https://aclanthology.org/P18-1070/ StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing].  ACL.
 
;Oct 5 (Suzanna Sia)
:Review of VAE variants.
 
;Sep 28
:Timo Schick et al. (2022).  [https://arxiv.org/abs/2208.11663 PEER: A collaborative language model].  arXiv.
 
;Sep 21 (Lisa Li)
:Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, and Tatsunori B. Hashimoto (2022).  [https://arxiv.org/abs/2205.14217 Diffusion-LM Improves Controllable Text Generation].  arXiv.
 
;Sep 14
:VAE discussion.
 
;Sep 7
:Calvin Luo (2022).  [https://arxiv.org/abs/2208.11970 Understanding Diffusion Models: A Unified Perspective].  arXiv.
 
== Summer 2022 ==
 
;Aug 24 (Leo Du)
:Guy Emerson (2020). [https://aclanthology.org/2020.acl-main.367/ Autoencoding Pixies: Amortised Variational Inference with Graph Convolutions for Functional Distributional Semantics]. ACL.
::See also Guy Emerson's [https://www.cl.cam.ac.uk/~gete2/thesis.pdf thesis].
 
;Jul 27, Aug 3 (Brian Lu)
:Abulhair Saparov and Tom Mitchell (2022). [https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00463/110435/Towards-General-Natural-Language-Understanding Towards General Natural Language Understanding with Probabilistic Worldbuilding]. TACL.
 
;Jun 15 (Leo Du)
:Goldblum, Geiping, et al. (2020). [https://openreview.net/forum?id=HyxyIgHFvr Truth or backpropaganda? An empirical investigation of deep learning theory]. ICLR.
 
== Spring 2022 ==
 
;May 4 (Felix Yu)
:Xiujun Li et al. (2020). [https://arxiv.org/pdf/2004.06165.pdf Oscar: Object-semantics aligned pre-training for vision-language tasks]. ECCV.
:Luowei Zhou et al. (2020). [https://arxiv.org/pdf/1909.11059.pdf Unified vision-language pre-training for image captioning and VQA]. AAAI.
:Hu, Xiaowei, et al. (2020). [https://arxiv.org/pdf/2009.13682.pdf VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning]. arXiv.
 
;Apr 20 (Tim Vieira)
 
:Esparza, Kiefer, and Luttenberger (2007). [https://link.springer.com/content/pdf/10.1007/978-3-540-73208-2_17.pdf An Extension of Newton’s Method to ω-Continuous Semirings]. In Proceedings of the International Conference on Developments in Language Theory.
 
;Apr 13 (Leo Du)
:Andrew M. Saxe, James L. McClelland, Surya Ganguli (2014). [https://arxiv.org/abs/1312.6120 Exact solutions to the nonlinear dynamics of learning in deep linear neural networks]. ICLR.
:Andrew M. Saxe, James L. McClelland, Surya Ganguli (2019). [https://www.pnas.org/doi/10.1073/pnas.1820226116 A mathematical theory of semantic development in deep neural networks]. PNAS.
:Anthropic (2022). [https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html In-context Learning and Induction Heads].
 
;Apr 6 (Suzanna Sia)
:Anthropic (2021). [https://transformer-circuits.pub/2021/framework/index.html A Mathematical Framework for Transformer Circuits].
 
;Mar 30 (Jason Eisner)
:Liang Huang, Suphan Fayong, & Yang Guo (2012). [https://aclanthology.org/N12-1015.pdf Structured Perceptron with Inexact Search].  NAACL.  [https://web.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NYCNLP.pdf slides]
 
;Mar 16 (Brian Lu)
:Max Welling, Yee Whye Teh (2011). [https://www.stats.ox.ac.uk/~teh/research/compstats/WelTeh2011a.pdf Bayesian Learning via Stochastic Gradient Langevin Dynamics].  ICML.
 
;Mar 9 (Cihan Xiao)
:Mathias Niepert, Pasquale Minervini, and Luca Franceschi (2021). [https://arxiv.org/pdf/2106.01798.pdf Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions].  NeurIPS.
::Relevant background: Papandreou and Yuille (2011).  [https://home.ttic.edu/~gpapan/pubs/confr/PapandreouYuille_PerturbAndMap_ieee-c-iccv11.pdf Perturb-and-MAP Random Fields: Using Discrete Optimization to Learn and Sample from Energy Models].
 
;Mar 2 (Steven Tan)
:Yang Song (2021).  [https://yang-song.github.io/blog/2021/score/ Generative Modeling by Estimating Gradients of the Data Distribution].  Blog post.
 
;Feb 23 (Sabrina Mielke)
:Albert Gu, Karan Goel, Christopher Ré (2022). [https://arxiv.org/abs/2111.00396 Efficiently Modeling Long Sequences with Structured State Spaces]. [https://openreview.net/forum?id=uYLFoz1vlAC ICLR].
 
;Feb 16 (Ryan Cotterell)
:David Chiang, Jacob Andreas, Daniel Bauer, Karl Moritz Hermann, Bevan Jones, Kevin Knight (2013). [https://aclanthology.org/P13-1091/ Parsing Graphs with Hyperedge Replacement Grammars]. ACL.
 
;Feb 9 (Matthew Francis-Landau)
:Elaine Angelino, Nicholas Larus-Stone, Daniel Alabi, Margo Seltzer, Cynthia Rudin (2018). [https://www.jmlr.org/papers/volume18/17-716/17-716.pdf Learning Certifiably Optimal Rule Lists for Categorical Data]. JMLR.
 
;Feb 2 (Sophia Sklaviadis)
:Yizhou Zhao, Liang Qiu, Wensi Ai, Feng Shi, Song-Chun Zhu (2020). [https://arxiv.org/pdf/2011.09078.pdf Vertical-Horizontal Structured Attention for Generating Music with Chords].  arXiv.
 
== Fall 2021 ==
 
 
''Wednesdays 12pm, in Hackerman 306.''
;Dec 1 (Felix Yu)
:Chandan Singh, W. James Murdoch, Bin Yu (2019). [https://arxiv.org/pdf/1806.05337.pdf Hierarchical Interpretations for Neural Network Predictions]. ICLR.
:Xisen Jin, Zhongyu Wei, Junyi Du, Xiangyang Xue, Xiang Ren (2020). [https://arxiv.org/pdf/1911.06194.pdf Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models]. ICLR.
 
;Nov 17 (Suzanna Sia)
:Henri Prade and Gilles Richard (2021). [https://www.ijcai.org/proceedings/2021/0621.pdf Analogical Proportions: Why They Are Useful in AI]. IJCAI. [https://suzyahyah.github.io/machine%20learning/2021/11/18/analogies.html Suz summary blogpost]
 
;Nov 10 (Leo Du)
:Jingjing Xu, Hao Zhou, Chun Gan, Zaixiang Zheng, Lei Li (2021). [https://aclanthology.org/2021.acl-long.571/ Vocabulary Learning via Optimal Transport for Neural Machine Translation]. ACL.
 
;Nov 3 (Matthew Francis-Landau)
:Song Han, Huizi Mao, William J. Dally (2016). [https://arxiv.org/abs/1510.00149 Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding]. ICLR.
 
;Oct 27 (Sophia Sklaviadis)
:Jiong Cai, Yong Jiang, Kewei Tu (2017). [https://aclanthology.org/D17-1171.pdf CRF Autoencoder for Unsupervised Dependency Parsing]. EMNLP.
:Waleed Ammar, Chris Dyer, Noah A. Smith (2014). [https://proceedings.neurips.cc/paper/2014/file/b9f94c77652c9a76fc8a442748cd54bd-Paper.pdf Conditional Random Field Autoencoders for Unsupervised Structured Prediction]. NeurIPS.
:Yu Zhang, Zhenghua Li, Min Zhang (2020). [https://aclanthology.org/2020.acl-main.302.pdf Efficient Second-Order TreeCRF for Neural Dependency Parsing]. ACL.
 
;Oct 20 (Chenghao Yang)
:Weizhe Yuan, Graham Neubig, Pengfei Liu (2021). [https://arxiv.org/pdf/2106.11520.pdf BARTScore: Evaluating Generated Text as Text Generation]. NeurIPS.
:Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi (2020). [https://arxiv.org/pdf/1904.09675.pdf BERTScore: Evaluating Text Generation with BERT]. ICLR.
:Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M. Meyer, Steffen Eger (2019). [https://arxiv.org/pdf/1909.02622.pdf MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance]. EMNLP.
 
;Oct 13 (Brian Lu)
:Patrice Y. Simard, Saleema Amershi, David M. Chickering, Alicia Edelman Pelton, Soroush Ghorashi, Christopher Meek, Gonzalo Ramos, Jina Suh, Johan Verwey, Mo Wang, John Wernsing (2017). [https://arxiv.org/abs/1707.06742 Machine Teaching: A New Paradigm for Building Machine Learning Systems]. arXiv.
:Gonzalo Ramos, Christopher Meek, Patrice Simard, Jina Suh, Soroush Ghorashi (2020). [https://www.tandfonline.com/doi/abs/10.1080/07370024.2020.1734931 Interactive machine teaching: a human-centered approach to building machine-learned models]. Human-Computer Interaction.
 
;Oct 6 (Steven Tan)
:Jiatao Gu, Changhan Wang, Jake Zhao (2019). [https://arxiv.org/pdf/1905.11006.pdf Levenshtein Transformer]. NeurIPS.
 
;Sep 29 (Jason Eisner)
: Gabriel Peyre (2019). [https://youtu.be/mITml5ZpqM8 Optimal transport for machine learning] (talk video). The Alan Turing Institute.
 
;Sep 22 (Devanshu Singh)
:Mehrad Moradshahi, Hamid Palangi, Monica S. Lam, Paul Smolensky, Jianfeng Gao (2019). [https://arxiv.org/abs/1910.12647 HUBERT Untangles BERT to Improve Transfer across NLP Tasks]. arXiv.
:Yichen Jiang, Asli Celikyilmaz, Paul Smolensky, Paul Soulos, Sudha Rao, Hamid Palangi, Roland Fernandez, Caitlin Smith, Mohit Bansal, Jianfeng Gao (2021). [https://arxiv.org/abs/2106.01317 Enriching Transformers with Structured Tensor-Product Representations for Abstractive Summarization]. NAACL.
:Denis Kleyko, Mike Davies, E. Paxon Frady, Pentti Kanerva, Spencer J. Kent, Bruno A. Olshausen, Evgeny Osipov, Jan M. Rabaey, Dmitri A. Rachkovskij, Abbas Rahimi, Friedrich T. Sommer (2021). [https://arxiv.org/abs/2106.05268 Vector Symbolic Architectures as a Computing Framework for Nanoscale Hardware]. arXiv.
:Paul Soulos, Tom McCoy, Tal Linzen, Paul Smolensky (2019). [https://arxiv.org/abs/1910.09113 Discovering the Compositional Structure of Vector Representations with Role Learning Networks]. BlackboxNLP.
:Hamid Palangi, Paul Smolensky, Xiaodong He, Li Deng (2018). [https://arxiv.org/abs/1705.08432 Question-Answering with Grammatically-Interpretable Representations]. AAAI.
 
;Sep 15 (Hongyuan Mei)
:Vardan Papyan, X.Y. Han, David L. Donoho (2020). [https://arxiv.org/pdf/2008.08186.pdf Prevalence of Neural Collapse during the terminal phase of deep learning training]. PNAS.
 
;Sep 8 (Brian Lu)
:Jonathan Lorraine, Paul Vicol, David Duvenaud (2020). [https://arxiv.org/abs/1911.02590 Optimizing Millions of Hyperparameters by Implicit Differentiation]. AISTATS. ([http://proceedings.mlr.press/v108/lorraine20a/lorraine20a.pdf mlr link]/[https://github.com/lorraine2/implicit-hyper-opt implementation])
 
== Summer 2021 ==
 
;Aug 25 (Sabrina Mielke)
:Max B Paulus, Chris J. Maddison, Andreas Krause (2021). [https://arxiv.org/abs/2010.04838 Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator]. ICLR. ([https://openreview.net/forum?id=Mk6PZtgAgfq forum]/[https://iclr.cc/virtual/2021/oral/3508 slides])
 
;Aug 18 (Matthew Francis-Landau)
:Anselm Paulus, Michal Rolínek, Vít Musil, Brandon Amos, Georg Martius (2021). [https://arxiv.org/abs/2105.02343 CombOptNet: Fit the Right NP-Hard Problem by Learning Integer Programming Constraints]. ICML.
 
;Aug 4 (Tim Vieira)
:Jialin Song, Yuxin Chen, Yisong Yue (2019). [https://arxiv.org/abs/1811.00755 A General Framework for Multi-fidelity Bayesian Optimization with Gaussian Processes]. AISTATS.
 
;July 28 (Chu-Cheng Lin)
:Belinda Z. Li, Maxwell Nye, Jacob Andreas (2021). [https://arxiv.org/abs/2106.00737 Implicit Representations of Meaning in Neural Language Models.] ACL.
 
;July 21 (Leo Du)
:''Discussion of graph signal processing (graph fourier transform, graph convolutions etc.).''
 
;July 14 (Chenghao Yang)
:Joshua Robinson, Ching-Yao Chuang, Suvrit Sra, Stefanie Jegelka (2021). [https://arxiv.org/abs/2010.04592 Contrastive Learning with Hard Negative Samples.] ICLR.
::Recommended readings: [https://lilianweng.github.io/lil-log/2021/05/31/contrastive-representation-learning.html Lilian Weng's blogposts about Contrastive Representation Learning], [https://arxiv.org/abs/2002.05709 SimCLR] (ICML'20), [https://arxiv.org/abs/1902.09229 A Theoretical Analysis of Contrastive Unsupervised Representation Learning] (ICML'19), and [https://cseweb.ucsd.edu/~elkan/posonly.pdf PU-learning] (KDD'08).
 
;July 7 (Matthew Francis-Landau)
:Qiaochu Chen, Aaron Lamoreaux, Xinyu Wang, Greg Durrett, Osbert Bastani, Isil Dillig (2021). [https://dl.acm.org/doi/10.1145/3453483.3454047 Web question answering with neurosymbolic program synthesis]. PLDI.
 
;Jun 23 (Sabrina Mielke)
:Explanation of BPE ([https://www.derczynski.com/papers/archive/BPE_Gage.pdf Gage, 1994]; [https://www.aclweb.org/anthology/P16-1162/ Sennrich et al., 2016]) and Unigram LM [https://www.aclweb.org/anthology/P18-1007/ (Kudo, 2018)] subword tokenizers, leading into modeling of the unigram (word) distribution as a two-stage process a la [https://dl.acm.org/doi/10.5555/2976248.2976306 Goldwater et al. (2006)], neuralized by [https://arxiv.org/abs/2106.02289 Nikkarinen et al. (2021)].
 
;Jun 16 (Suzanna Sia)
:Kawin Ethayarajh, Dan Jurafsky (2021). [https://arxiv.org/abs/2105.14652 Attention flows are Shapley value explanations.] ACL.
 
;Jun 2 (Brian Lu)
:Muhammad Khalifa, Hady Elsahar, Marc Dymetman (2021).  [https://openreview.net/pdf?id=jWkw45-9AbL A distributional approach to controlled text generation].  ICLR.
 
;May 26 (Ryan Cotterell)
:''Discussion of group-equivariant architectures.''
 
;May 19 (Hongyuan Mei)
:Yuval Atzmon, Felix Kreuk, Uri Shalit, Gal Chechik (2020).  [https://arxiv.org/abs/2006.14610 A causal view of compositional zero-shot recognition].  NeurIPS.
 
== Spring 2021 ==
 
;May 5 (Tim Vieira)
:Viktor Leis, Bernhard Radke, Andrey Gubichev, Alfons Kemper, Thomas Neumann (2017). [http://cidrdb.org/cidr2017/papers/p9-leis-cidr17.pdf Cardinality Estimation Done Right: Index-Based Join Sampling]. Conference on Innovative Data Systems Research.
 
;Apr 28 (Matthew Francis-Landau)
:Ameesh Shah, Eric Zhan, Jennifer J. Sun, Abhinav Verma, Yisong Yue, Swarat Chaudhuri (2020). [https://arxiv.org/abs/2007.12101 Learning differentiable programs with admissible neural heuristics]. NeurIPS.
 
;Apr 21 (Nathaniel Weir)
:Kevin Ellis, Catherine Wong, Maxwell Nye, Mathias Sable-Meyer, Luc Cary, Lucas Morales, Luke Hewitt, Armando Solar-Lezama, Joshua B. Tenenbaum (2020).  [https://arxiv.org/abs/2006.08381 DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning].  arXiv.  [https://web.mit.edu/ellisk/www/documents/dreamcoder_with_supplement.pdf supplement]
 
;Apr 14 (Xiang Lisa Li)
:Yang Song, Stefano Ermon (2019). [https://arxiv.org/pdf/1907.05600.pdf Generative Modeling by Estimating Gradients of the Data Distribution]. NeurIPS.
 
;Apr 7 (Chenghao Yang)
:Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, and Lingpeng Kong (2021).  [https://openreview.net/pdf?id=QtTKTdVrFBB Random feature attention].  ICLR.
 
;Mar 31 (Suzanna Sia)
:Siddhant M. Jayakumar et al. (2020).  [https://openreview.net/pdf?id=rylnK6VtDH Multiplicative interactions and where to find them].  ICLR.
 
;Mar 24 (Ryan Cotterell)
:K. Vijay-Shanker and David J. Weir (1989).  [https://www.aclweb.org/anthology/W89-0218.pdf  Recognition of combinatory categorial grammars and linear indexed grammars].  ACL.
 
;Mar 17 (Ryan Cotterell)
:Dasgupta, Papadimitriou, & Vazirani (2006).  Quantum algorithms.  Chapter 10 of ''[https://github.com/eherbold/berkeleytextbooks/blob/master/Algorithms%20-%20Sanjoy%20Dasgupta%2C%20Christos%20H.%20Papadimitriou%2C%20and%20Umesh%20V.%20Vazirani.pdf Algorithms]''.  McGraw Hill.
::''See also [https://www.math3ma.com/blog/a-first-look-at-quantum-probability-part-1 blog posts by Tai-Danae Bradley].''
 
;Mar 10 (Sabrina Mielke)
:MCMC pt. 3 (RJMCMC).
:Philippe Gagnon & Arnaud Doucet (2020). [https://arxiv.org/abs/1911.01340 Non-reversible jump algorithms for Bayesian nested model selection].  J. of Computational and Graphical Statistics.  [http://cs.jhu.edu/~jason/865/nrj_slides.pdf slides]
 
;Mar 3 (Sabrina Mielke)
:MCMC pt. 2 (Irreversible chains, continuous transforms and Jacobians).
 
;Feb 24 (Sabrina Mielke)
:MCMC pt. 1 (Markov chains, balance condition).
:Span Spanbauer, Cameron Freer, Vikash Mansinghka (2020). [https://arxiv.org/abs/2006.15167 Deep involutive generative models for neural MCMC]. arXiv.
:Marco Cusumano-Towner, Alexander K. Lew, Vikash K. Mansinghka (2020). [https://arxiv.org/abs/2007.09871 Automating involutive MCMC using probabilistic and differentiable programming]. arXiv.
:Kirill Neklyudov, Max Welling, Evgenii Egorov, Dmitry Vetrov (2020).  [https://arxiv.org/abs/2006.16653 Involutive MCMC: A unifying framework]. arXiv.
 
;Feb 17 (Devanshu Singh)
:Bellanger and McCallum (2016). [https://arxiv.org/abs/1511.06350 Structured prediction energy networks]. ICML.
 
;Feb 10 (Brian Lu)
:Sachan and Xing (2017). [https://www.aclweb.org/anthology/S17-1029/ Learning to solve geometry problems from natural language demonstrations in textbooks]. *SEM.
 
;Feb 3 (Chu-Cheng Lin & Matthew Francis-Landau)
:Wang et al. (2019). [https://arxiv.org/abs/1905.12149 SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver]. ICML.
 
;Jan 27 (Tim Vieira)
:Anna Harutyunyan et al. (2019). [https://arxiv.org/abs/1912.02503 Hindsight credit assignment]. NeurIPS.
 
;Jan 20 (Ryan Cotterell)
:Andrew Drozdov, Pat Verga, Mohit Yadav, Mohit Iyyer, Andrew McCallum (2019). [https://arxiv.org/abs/1904.02142 Unsupervised latent tree induction with deep inside-outside recursive autoencoders]. NAACL.
 
;Jan 13 (Suzanna Sia)
:Simon Du, Wei Hu (2019). [https://blog.ml.cmu.edu/2019/10/03/ultra-wide-deep-nets-and-the-neural-tangent-kernel-ntk/ Ultra-Wide deep nets and the neural tangent kernel (NTK)].  Blog post summarizing multiple papers.
 
;Jan 6 (Jason Eisner)
:Nando de Freitas, Pedro Højen-Sørensen, Michael I. Jordan, Stuart Russell (2001).  [https://arxiv.org/pdf/1301.2266.pdf Variational MCMC].  UAI.
:Ardavan Saeedi, Tejas D. Kulkarni, Vikash K. Mansinghka, Samuel J. Gershman (2017).  [https://www.jmlr.org/papers/volume18/15-615/15-615.pdf Variational particle approximations].  JMLR.
:Christian A. Naesseth, Scott W. Linderman, Rajesh Ranganath, David M. Blei (2018).  [https://arxiv.org/pdf/1705.11140.pdf Variational sequential Monte Carlo].  AISTATS.
 
== Fall 2020 ==
 
=== Understanding Neural Magic ===
 
;Dec 10 (Xiao Liu)
:Elena Voita, Ivan Titov (2020). [https://www.aclweb.org/anthology/2020.emnlp-main.14/ Information-Theoretic Probing with Minimum Description Length]. EMNLP.
 
;Dec 3 (Aaron Mueller)
:Jesse Vig*, Sebastian Gehrmann*, Yonatan Belinkov*, Sharon Qian, Daniel Nevo, Simas Sakenis, Jason Huang, Yaron Singer, Stuart Shieber (2020). [https://arxiv.org/pdf/2004.12265.pdf Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias]. arXiv.
 
=== Multimodal Fusion ===
 
;Nov 12 (Suzanna Sia)
:Weiyao Wang, Du Tran, and Matt Feiszli (2020).  [https://arxiv.org/pdf/1905.12681.pdf What Makes Training Multi-modal Classification Networks Hard?]  CVPR.
 
=== Music Modeling ===
 
;Oct 29 (Amrit Nidhi)
:Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, and Douglas Eck (2018). [https://arxiv.org/pdf/1809.04281.pdf Music Transformer: Generating Music with Long Term Structure].  arXiv.  [https://storage.googleapis.com/music-transformer/index.html music samples], [https://magenta.tensorflow.org/music-transformer interactive demo]
 
=== Neuro-Symbolic Hybrids ===
 
;Oct 15 (Ankur Kejriwal)
:Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, and Jiajun Wu (2019). [http://nscl.csail.mit.edu/data/papers/2019ICLR-NSCL.pdf The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, And Sentences From Natural Supervision]. ICLR.
 
;Oct 8 (Tim Vieira)
:Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, Josh Tenenbaum (2018). [https://papers.nips.cc/paper/7845-learning-to-infer-graphics-programs-from-hand-drawn-images Learning to Infer Graphics Programs from Hand-Drawn Images]. NeurIPS.
 
=== Neural Nearest Neighbor Methods ===
''Organizer: Brian Lu''
 
;Oct 1 (Anton Belyy)
:Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel and Douwe Kiela (2020). [https://arxiv.org/abs/2005.11401 Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks]. arXiv.
 
;Sep 24 (Matthew Francis-Landau)
:Yu. A. Malkov and D. A. Yashunin (2018).  [https://arxiv.org/abs/1603.09320 Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World (HNSW) graphs].  arXiv.
 
;Sep 17 (Brian Lu)
:Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer and Mike Lewis (2020). [https://openreview.net/pdf?id=HklBjCEKvH Generalization through Memorization: Nearest Neighbor Language Models]. ICLR.
 
=== Parsing as Tagging ===
;Sep 10 (Jason Eisner)
:Nikita Kitaev and Dan Klein (2020). [https://www.aclweb.org/anthology/2020.acl-main.557/ Tetra-Tagging: Word-Synchronous Parsing with Linear-Time Inference]. ACL.
 
== Summer 2020 ==
 
=== Special Session on GPT-3 ===
 
;Jun 10 (Aaron Mueller, João Sedoc, Patrick Xia)
: OpenAI (2020). [https://arxiv.org/abs/2005.14165 Language Models are Few-Shot Learners]. ArXiv.
 
== Spring 2020 ==
 
=== More Efficient Transformers ===
''Organizer: Patrick Xia''
 
;May 13 (Rachel Wicks)
: Beltagy, Peters, Cohan (2020). [https://arxiv.org/abs/2004.05150 Longformer: The Long-Document Transformer]. Arxiv
 
;Apr 28 (Arya McCarthy) (practice talk)
: McCarthy, Li, Gu, Dong (2020). [https://www.aclweb.org/anthology/2020.acl-main.753/ Addressing Posterior Collapse with Mutual Information for Improved Variational Neural Machine Translation]. ACL
 
;Apr 22 (Patrick Xia)
: Kitaev, Kaiser, Levskaya (2020). [https://openreview.net/forum?id=rkgNKkHtvB Reformer: The Efficient Transformer]. ICLR ([https://docs.google.com/presentation/d/1nHQwhTGQaqwLd3V9CPbIj1ikrLC3xMqReJgPPjIeQHE/edit?usp=sharing Slides from this session])
 
=== Controlled Text Generation ===
''Organizer: Mitchell Gordon''
 
;Apr 15 (Zili Huang)
: Holtzman, Buys, Du, Forbes, Choi (2020). [https://openreview.net/forum?id=rygGQyrFvH The Curious Case of Neural Text Degeneration]. ICLR
: Welleck, Kulikov, Roller, Dinan, Cho, Weston (2020). [https://openreview.net/forum?id=SJeYe0NtvH Neural Text Generation With Unlikelihood Training]. ICLR
 
;Apr 8 (Nathaniel Weir)
: Shu, Nakayama, Cho (2019). [https://www.aclweb.org/anthology/P19-1177.pdf Generating Diverse Translations with Sentence Codes]. ACL
 
;Apr 1 (Mitchell Gordon)
: Dathathri, Madotto, Lan, Hung, Frank, Molino, Yosinski, Liu (2020). [https://openreview.net/forum?id=H1edEyBKDS Plug and Play Language Models: A Simple Approach to Controlled Text Generation]. ICLR
 
=== Human-In-The-Loop / Active Learning ===
''Organizer: Anton Belyy''
 
;Mar 25 (Joshua Miller)
:Hu, Lipton, Anandkumar, Ramanan (2019). [https://arxiv.org/pdf/1802.07427.pdf Active Learning with Partial Feedback]. ICLR
 
;Mar 11 (Anton Belyy)
:Yuan, Zhang, Van Durme, Findlater, Boyd-Graber (2019). [https://arxiv.org/pdf/1911.03070.pdf Interactive Refinement of Cross-Lingual Word Embeddings]. Arxiv
 
;Mar 4 (Anton Belyy)
:Ribeiro, Singh, Guestrin (2018). [https://www.aclweb.org/anthology/P18-1079.pdf Semantically Equivalent Adversarial Rules for Debugging NLP Models]. ACL
 
=== Hyperbolic Deep Learning ===
''Organizer: Desh Raj''
 
;Feb 26 (Arya McCarthy)
:Xu, Durrett (2018). [https://www.aclweb.org/anthology/D18-1480 Spherical Latent Spaces for Stable Variational Autoencoders]. EMNLP
 
;Feb 12 (Suzanna Sia)
:Meng, Huang, Wang, Zhang, Zhuang, Kaplan, Han (2019). [https://arxiv.org/pdf/1911.01196.pdf Spherical Text Embeddings]. NeurIPS
 
;Feb 5 (Desh Raj)
:Nickel, Kiela (2017). [https://arxiv.org/pdf/1705.08039.pdf Poincaré Embeddings for Learning Hierarchical Representations]. NeurIPS
 
== Fall 2019 ==
 
=== Current Happenings ===
''Organizer: Various''
 
;Dec 4 (Various)
:ACL Paper Reading and Feedback.
 
; Nov 20 (João Sedoc)
:Raffel et al. (2019). [https://arxiv.org/abs/1910.10683 Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer]. ''arXiv''.
:Bowman (2019). [https://syncedreview.com/2019/11/07/google-t5-explores-the-limits-of-transfer-learning/ Google T5 Explores the Limits of Transfer Learning]. ''Synced Review (Blog Post)''.
 
;Nov 13 (Various)
:EMNLP Favorites presented by those who were able to attend the conference in-person.
 
=== Model Fairness and Interpretability ===
''Organizer: Keith Harrigian''
 
;Nov 6 (Rachel Wicks)
:Gonen & Goldberg (2019). [https://www.aclweb.org/anthology/N19-1061/ Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them]. ''NAACL''.
 
;Oct 30 (Alexandra DeLucia)
:Ribeiro, Singh, & Guestrin (2016). [https://dl.acm.org/doi/pdf/10.1145/2939672.2939778? "Why Should I Trust You?" Explaining the Predictions of Any Classifier]. ''KDD''.
:Lundberg & Lee (2017). [http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions A Unified Approach to Interpreting Model Predictions]. ''NeurIPS''.
 
;Oct 23 (Keith Harrigian)
:Lipton (2016). [https://dl.acm.org/doi/pdf/10.1145/3236386.3241340 The Mythos of Model Interpretability]. ''ACM''.
:Serrano & Smith (2019). [https://www.aclweb.org/anthology/P19-1282/ Is Attention Interpretable?]. ''ACL''.
:Jain & Wallace (2019). [https://www.aclweb.org/anthology/N19-1357/ Attention is not Explanation]. ''NAACL''.
:Wiegreffe & Pinter (2019). [https://www.aclweb.org/anthology/D19-1002/ Attention is not not Explanation]. ''EMNLP''.
 
;Oct 16 (Suzanna Sia)
:Shen et al. (2019). [https://openreview.net/pdf?id=B1l6qiR5F7 Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks]. ''ICLR''.
:Dyer, Melis, & Blunsom (2019). [https://arxiv.org/pdf/1909.09428.pdf A Critical Analysis of Biased Parsers in Unsupervised Parsing]. ''arXiv''.
 
=== Spectral Learning ===
''Organizer: Sabrina J. Mielke''
 
;Sep 25 (Desh Raj)
:Guillaume Rabusseau, Tianyu Li, Doina Precup (2019). [https://arxiv.org/abs/1807.01406 Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning].  AISTATS.
:Comments: https://docs.google.com/document/d/1CgE65xFCDty3gZepKxBjOAV8Dd7dS10I2okw2jDb-EE/edit?usp=sharing
 
;Sep 18 (Sabrina J. Mielke)
:Borja Balle, Ariadna Quattoni, and Xavier Carreras (2014).  [https://www.cs.upc.edu/~bballe/slides/tutorial-emnlp14.pdf Spectral Learning Techniques for Weighted Automata, Transducers, and Grammars (tutorial slides)], sections 1-2.  EMNLP.
:The main corresponding paper:
::Balle, Carreras, Luque, Quattoni (2014). [https://link.springer.com/content/pdf/10.1007%2Fs10994-013-5416-x.pdf Spectral learning of weighted automata: A forward-backward perspective]. That part of the tutorial will recap FSAs, introduce Hankel matrices, motivate their correspondence to FSAs, and give a general estimation recipe.
:If time permits, we can move on to a recent extension of this work that makes the approach work well in practice:
::Quattoni and Carreras (2019), [https://www.aclweb.org/anthology/P19-1594 Interpolated Spectral N-Gram Language Models].  ACL.
:Comments: https://docs.google.com/document/d/1-MSFlhhNyLfkK-I01VQI05f4F5gJyruHDNU3KBRXwqU
 
== Spring 2019 ==
 
''This bit of the wiki got lost in a disk crash, but we should reconstruct the paper list from the emails at the time.''
 
=== Causal NLP ===
''Organizers: Suzanna Sia and Zach Wood-Doughty''
 
=== Dataset Shift for NLP / Changepoint Detection ===
''Organizer: Desh Raj''
 
=== Grounded Language ===
''Organizer: Mitchell Gordon''
 
=== What's Learned During Representation Learning? ===
''Organizer: Shijie Wu''
 
=== Multitask/Transfer Learning for NLP ===
''Organizers: Fei Wu and Oliver Adams''
 
== Fall 2018 ==
 
=== Random Interesting Papers ===
 
;Dec 5 (David Mueller)
:Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum (2018).  [https://www.aclweb.org/anthology/D18-1548/ Linguistically-Informed Self-Attention for Semantic Role Labeling]. EMNLP.
 
;Nov 28 (Hao Zhu and Chu-Cheng Lin)
:Hao Peng, Roy Schwartz, Sam Thomson, and Noah A. Smith (2018).  [https://arxiv.org/abs/1808.09357 Rational Recurrences].  EMNLP.
 
;Nov 14 (Arya McCarthy)
:Olivia Winn and Smaranda Muresan (2018). [https://www.aclweb.org/anthology/P18-2125/ ‘Lighter’ Can Still Be Dark: Modeling Comparative Color Descriptions]. ACL 2018
:Noga Zaslavsky, Charles Kemp, Terry Regier, Naftali Tishby (2018). [https://www.pnas.org/content/pnas/115/31/7937.full.pdf Efficient compression in color naming and its evolution]. PNAS 2018
 
;Nov 7
:EMNLP debrief
 
=== ML Scholarship ===
 
''Organizer: Patrick Xia''
 
;Oct 31 (Xuan Zhang)
: Yoav Goldberg (2017). [https://medium.com/@yoav.goldberg/an-adversarial-review-of-adversarial-generation-of-natural-language-409ac3378bd7 An Adversarial Review of "Adversarial Generation of Natural Language"]. Medium blog post. [https://www.facebook.com/yann.lecun/posts/10154498539442143 Yann LeCun's response (on Facebook)] and [https://medium.com/@yoav.goldberg/a-response-to-yann-lecuns-response-245125295c02 Yoav's response to that (on Medium)]. Original papers referenced: [https://arxiv.org/pdf/1705.10929.pdf Adversarial Generation of Natural Language (Rajeswar et al., 2017, arXiv)] and [https://arxiv.org/pdf/1703.00955.pdf Toward Controlled Generation of Text (Hu et al., 2017, ICML)].
: Joshua Goodman (2002). [https://arxiv.org/pdf/cond-mat/0202383.pdf Extended Comment on Language Trees and Zipping]. Extended version of Comment submitted to Physical Review Letters. [https://arxiv.org/pdf/cond-mat/0203275.pdf On J. Goodman's comment to Language Trees and Zipping (the response)].
 
;Oct 24 (Patrick Xia)
: Zachary Lipton and Jacob Steinhardt (2018). [https://arxiv.org/pdf/1807.03341.pdf Troubling Trends in Machine Learning Scholarship]. ICML Debates 2018.
: D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young (2015). [http://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf Hidden Technical Debt in Machine Learning Systems]. NeurIPS 2015
 
=== Deep Generative Modeling ===
 
''Organizer: Sabrina J. Mielke''
 
;Oct 17 (Kelly Marchisio)
:Bowman et al. (2016). [https://arxiv.org/abs/1511.06349 Generating Sentences from a Continuous Space].
 
;Oct 10 (Suzanna Sia)
:Lauly, S., Zheng, Y., Allauzen, A., & Larochelle, H. (2017). [http://www.jmlr.org/papers/volume18/16-017/16-017.pdf Document Neural Autoregressive Distribution Estimation]. JMLR.
:Comments: https://docs.google.com/document/d/12F8uLt5vEm-Ctou1XrtVctLWSmTePHH1BuE6uztFoNc/edit?usp=sharing
 
;Sep 20 (Sabrina J. Mielke)
:Uria, B., Côté, M. A., Gregor, K., Murray, I., & Larochelle, H. (2016). [http://www.jmlr.org/papers/volume17/16-272/16-272.pdf Neural Autoregressive Distribution Estimation]. JMLR.
 
=== Test of Time Award Papers ===
 
''Organizer: Arya McCarthy''
 
;Sep 19 (Desh Raj)
: Michael Collins (2002). [https://www.aclweb.org/anthology/W02-1001/ Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms]. EMNLP 2002.
 
;Sep 12 (David Mueller)
: Regina Barzilay and Mirella Lapata (2005). [https://www.aclweb.org/anthology/P05-1018/ Modeling Local Coherence: An Entity-Based Approach]. ACL 2005.
 
;Sep 5 (Arya McCarthy)
:Dan Roth and Wen-tau Yih (2004). [http://www.aclweb.org/anthology/W/W04/W04-2401.pdf A Linear Programming Formulation for Global Inference in Natural Language Tasks]. CoNLL 2004.
 
== Summer 2018 ==
 
;Aug 23 (Chenxi Liu)
:Adam Santoro, Felix Hill, David Barrett, Ari Morcos, and Timothy Lillicrap (2018). [http://proceedings.mlr.press/v80/santoro18a/santoro18a.pdf Measuring abstract reasoning in neural networks]. ICML 2018.
 
;Aug 16 (Sebastian Mielke)
:André F. T. Martins and Ramón F. Astudillo (2016). [http://proceedings.mlr.press/v48/martins16.pdf From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification]. ICML 2016.
:Vlad Niculae, André F. T. Martins, Mathieu Blondel, and Claire Cardie (2018). [http://proceedings.mlr.press/v80/niculae18a/niculae18a.pdf SparseMAP: Differentiable Sparse Structured Inference]. ICML 2018.
 
;Aug 9 (Jacob Buckman)
:Andrew Trask, Felix Hill, Scott Reed, Jack Rae, Chris Dyer, and Phil Blunsom (2018). [https://arxiv.org/abs/1808.00508 Neural Arithmetic Logic Units].  arXiv.
 
;Aug 2 (Garrett Nicolai)
:Daniel Deutsch, John Hewitt and Dan Roth (2018).  [http://aclweb.org/anthology/P18-1180 A Distributional and Orthographic Aggregation Model for English Derivational Morphology].  ACL.  [https://danieldeutsch.github.io/papers/acl2018/deutsch-hewitt-roth-2018.pdf slides]
 
;Jul 26
:ACL debriefing session.
 
;Jul 19 (Chu-Cheng Lin)
:Chu-Cheng Lin and Jason Eisner (2018).  [https://arxiv.org/abs/1804.10747 Neural Particle Smoothing for Sampling from Conditional Sequence Models].  NAACL.  [http://www.cs.jhu.edu/~jason/papers/lin+eisner.naacl18.poster.pdf poster]
 
;Jul 12 (Xuan Zhang)
:Yoshua Bengio, Jerome Louradour, Ronan Collobert, and Jason Weston (2009).  [https://ronan.collobert.com/pub/matos/2009_curriculum_icml.pdf Curriculum Learning].  ICML.  [http://wiki.clsp.jhu.edu/w/images/6/61/Curriculum_learning.pdf slides]
 
;Jul 5 (Pamela Shapiro)
:Mohit Iyyer, John Wieting, Kevin Gimpel, and Luke Zettlemoyer (2018).  [http://www.aclweb.org/anthology/N18-1170 Adversarial Example Generation with Syntactically Controlled Paraphrase Networks].  NAACL.
 
;Jun 29 (Xuan Zhang)
:Alane Suhr, Srinivasan Iyer, and Yoav Artzi (2018).  [http://yoavartzi.com/pub/sia-naacl.2018.pdf Learning to Map Context-Dependent Sentences to Executable Formal Queries].  NAACL (outstanding paper award).  [http://wiki.clsp.jhu.edu/w/images/0/01/Learning_to_map_cd_sentences_to_ef_queries.slides.pdf slides]
 
;Jun 21 (Arya McCarthy)
:Matthew E. Peters et al. (2018).  [http://aclweb.org/anthology/N18-1202 Deep Contextualized Word Representations].  NAACL (outstanding paper award).
:This is the ELMo paper.
 
;Jun 14 (Sebastian Mielke)
:Chaitanya Malaviya, Matthew R. Gormley, and Graham Neubig (2018).  [https://arxiv.org/abs/1805.04570 Neural Factor Graph Models for Cross-lingual Morphological Tagging].  ACL.
::''Bonus paper:'' Austin Matthews, Graham Neubig, and Chris Dyer (2018).  [http://aclweb.org/anthology/N18-1130 Using Morphological Knowledge in Open-Vocabulary Neural Language Models].  NAACL.
 
;Jun 8
:NAACL debriefing session.
 
== Spring 2018 ==
 
=== Optimal Transport ===
''Organizer: Matthew Francis-Landau''
 
;May 3 (Patrick Xia)
:Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, Bernhard Schoelkopf (2018).  [https://openreview.net/forum?id=HkL7n1-0b Wasserstein Auto-Encoders].  ICLR.
 
;Apr 26 (Chu-Cheng Lin)
:Meng Zhang, Yang Liu, Huanbo Luan, and Maosong Sun (2017).  [http://aclweb.org/anthology/D17-1207 Earth Mover’s Distance Minimization for Unsupervised Bilingual Lexicon Induction]. EMNLP.
 
;Apr 19 (Matthew Francis-Landau)
:Gabriel Peyré and Marco Cuturi (2018).  [https://arxiv.org/abs/1803.00567 Computational Optimal Transport], sections 2-2.3, 6-6.2, 4.2 and 9.1. [http://optimaltransport.github.io/resources resources] [https://www.overleaf.com/read/mphxmvhnprzj slides]
 
=== Inference Networks / Stochastic Inversion ===
''Organizer: Sebastian Mielke''
 
;Apr 12 (Annabelle Carrell)
:Lifu Tu and Kevin Gimpel (2018).  [https://openreview.net/forum?id=H1WgVz-AZ Learning Approximate Inference Networks for Structured Prediction].  ICLR.
 
;Apr 5 (Shijie Wu)
:Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu (2017).  [https://arxiv.org/pdf/1711.00937.pdf Neural Discrete Representation Learning].  NIPS.
 
;Mar 28 (Sebastian Mielke)
:Hanjun Dai, Yingtao Tian, Bo Dai, Steven Skiena, and Le Song (2018).  [https://openreview.net/forum?id=SyqShMZRb Syntax-Directed Variational Autoencoder for Structured Data].  ICLR.
 
=== Cooperative Dialog and Emergence of Language ===
''Organizers: Patrick Xia and Tom McCoy''
 
;Mar 15 (Annabelle Carrell)
:He He, Anusha Balakrishnan, Mihail Eric, and Percy Liang (2017).  [https://arxiv.org/pdf/1704.07130.pdf Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings].  ACL.
 
;Mar 8 (Tom McCoy)
:Florencia Reali, Nick Chater, and Morten H. Christiansen (2018). [http://rspb.royalsocietypublishing.org/content/royprsb/285/1871/20172586.full.pdf Simpler grammar, larger vocabulary: How population size affects language]. Proceedings of the Royal Society B.
:Simon Kirby, Hannah Cornish, and Kenny Smith (2008).  [http://www.pnas.org/content/pnas/105/31/10681.full.pdf Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language].  PNAS.
 
;Mar 1 (Patrick Xia)
:Angeliki Lazaridou, Alexander Peysakhovich, and Marco Baroni (2017).  [https://arxiv.org/abs/1612.07182 Multi-Agent Cooperation and the Emergence of (Natural) Language]. ICLR.
:Satwik Kottur, José M.F. Moura, Stefan Lee, Dhruv Batra (2017).  [http://aclweb.org/anthology/D17-1321 Natural Language Does Not Emerge ‘Naturally’ in Multi-Agent Dialog]. EMNLP.
 
=== Computational Historical Linguistics ===
''Organizer: Arya McCarthy''
 
;Feb 22 (Tom McCoy)
:William A. Hamilton, Jure Leskovec, and Dan Jurafsky (2016).  [https://arxiv.org/pdf/1605.09096.pdf Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change].  ACL.
 
;Feb 15 (Arya McCarthy)
:David Hall and Dan Klein (2010).  [https://www.aclweb.org/anthology/P10-1105 Finding Cognate Groups using Phylogenies].  ACL.
 
;Feb 8 (Arya McCarthy)
:Lyle Campbell (2013).  [https://books.google.com/books/about/Historical_Linguistics.html?id=6pskDQAAQBAJ&amp;printsec=frontcover Historical Linguistics: An Introduction], chapter 5.  (See also chapter 1.)
 
== Fall 2017 ==
 
=== Inducing "Syntax" for Semantics ===
''Organizer: Adam Poliak''
 
;Dec 14
:NIPS debriefing session
 
;Nov 30 (Chu-Cheng Lin)
:Franklin Chang, Gary S. Dell, and Kathryn Bock (2006).  [http://www.kecl.ntt.co.jp/clip/member/chang/papers/chang,dell,bock.pdf Becoming syntactic].  Psychological Review. [https://sites.google.com/site/sentenceproductionmodel/cv/chang%2Cfitz%2C2014%28OHLP%29.pdf?attredirects=0&amp;d=1 followup]
 
;Nov 16 (Adam Poliak)
:Gormley, Mitchell, Van Durme, Dredze (2014). [http://www.cs.cmu.edu/~mgormley/papers/gormley+al.acl.2014.pdf Low-resource semantic role labeling]. ACL.
:Williams, Drozdov, Bowman (2018). [https://arxiv.org/abs/1709.01121 Learning to parse from a semantic objective: It works. Is it syntax?]. TACL.
 
;Other suggested papers
:Swabha Swayamdipta, Sam Thomson, Chris Dyer, and Noah A. Smith (2017). [https://arxiv.org/abs/1706.09528 Frame-semantic parsing with softmax-margin segmental RNNs and a syntactic scaffold]. arXiv.
:Luheng He, Kenton Lee, Mike Lewis, and Luke Zettlemoyer (2017). [https://homes.cs.washington.edu/~luheng/files/acl2017_hllz.pdf Deep Semantic Role Labeling: What Works and What's Next]. ACL.
 
=== Evaluation Metrics ===
''Organizer: Pamela Shapiro''
 
;Nov 9 (Becky Marvin)
:Philipp Koehn (2004). [http://people.csail.mit.edu/koehn/publications/bootstrap2004.pdf Statistical Significance Tests for Machine Translation Evaluation]. EMNLP.
:Ying Zhang, Stephan Vogel, and Alex Waibel (2004). [http://www.lrec-conf.org/proceedings/lrec2004/pdf/755.pdf Interpreting BLEU/NIST Scores: How Much Improvement Do We Need to Have a Better System?] LREC.
 
;Nov 2 (Pamela Shapiro)
:Chris Callison-Burch, Miles Osborne, and Philipp Koehn (2006). [http://www.cs.jhu.edu/~ccb/publications/re-evaluating-the-role-of-bleu-in-mt-research.pdf Re-evaluating the Role of BLEU in Machine Translation Research]. EACL.
:Yvette Graham, Timothy Baldwin, Alistair Moffat, and Justin Zobel (2014). [https://pdfs.semanticscholar.org/4eed/3f806234a4e5e055ded21193ac3ae9e4b1ca.pdf Is Machine Translation Getting Better over Time?] EACL.
 
;Oct 19 (Harrison Huh)
:Neha Nayak, Gabor Angeli, and Christopher D. Manning (2016). [https://cs.stanford.edu/~angeli/papers/2016-acl-veceval.pdf Evaluating Word Embeddings Using a Representative Suite of Practical Tasks]. ACL.
:Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, and Chris Dyer (2016). [https://aclweb.org/anthology/W/W16/W16-2506.pdf Problems With Evaluation of Word Embeddings Using Word Similarity Tasks]. ACL.
 
=== Derivational Morphology ===
''Organizer: Arya McCarthy''
 
;Oct 26 (Garrett Nicolai)
:Angeliki Lazaridou, Marco Marelli, Roberto Zamparelli, and Marco Baroni (2013). [http://www.aclweb.org/anthology/P/P13/P13-1149.pdf Compositional-ly (sic) Derived Representations of Morphologically Complex Words in Distributional Semantics]. ACL.
:Max Kisselew, Sebastian Pado, Alexis Palmer, and Jan Snajder (2015). [http://www.aclweb.org/anthology/W15-0108.pdf Obtaining a Better Understanding of Distributional Models of German Derivational Morphology]. Proceedings of the 11th International Conference on Computational Semantics.
 
;Oct 12 (Shijie Wu)
:Noam Chomsky (1968). [http://babel.ucsc.edu/~hank/mrg.readings/Chomsky1970_Nominalization.pdf Remarks on Nominalization]. Linguistics Club, Indiana University.
 
;Oct 5 (Arya McCarthy)
:Ryan Cotterell, Ekaterina Vylomova, Huda Khayrallah, Christo Kirov, and David Yarowsky (2017). [http://www.aclweb.org/anthology/D17-1075 Paradigm Completion for Derivational Morphology]. EMNLP.
:Ekaterina Vylomova, Ryan Cotterell, and Timothy Baldwin (2017). [http://www.aclweb.org/anthology/E17-2019 Context-Aware Prediction of Derivational Word-forms]. EACL.
 
=== Meaning Representation Formalisms ===
''Organizer: Sebastian Mielke''

Paper ideas and suggestions: [https://docs.google.com/document/d/12lTn3_b0aV-pNKCFrfRpT0FuLx0xGwEi6-CsHApguMk/edit?usp=sharing Google doc]
 
;Sep 28 (Seth Ebner)
:Baldridge and Kruijff (2002). [http://aclweb.org/anthology/P/P02/P02-1041.pdf Coupling CCG and Hybrid Logic Dependency Semantics]. ACL.
 
;Sep 21 (Brian Leonard)
:Emily Bender, Dan Flickinger, Stephan Oepen, Woodley Packard, and Ann Copestake (2015). [http://aclweb.org/anthology/W/W15/W15-0128.pdf Layers of Interpretation: On Grammar and Compositionality]. 11th International Conference on Computational Semantics.
 
;Sep 14 (Sebastian Mielke)
:Angelina Ivanova, Stephan Oepen, Lilja Øvrelid, and Dan Flickinger (2012). [http://www.aclweb.org/anthology/W12-3602 Who Did What to Whom? A Contrastive Study of Syntacto-Semantic Dependencies]. 6th Linguistic Annotation Workshop.
:Omri Abend and Ari Rappoport (2017). [https://www.aclweb.org/anthology/P/P17/P17-1008.pdf The State of the Art in Semantic Representation]. ACL.
 
== Fall 2017 ==
''Thursdays 12-1:15pm, Hackerman 306.''
 
=== Inducing "Syntax" for Semantics ===
''Organizer: Adam Poliak''
 
;Suggestions:
:Swayamdipta, Thomson, Dyer, Smith (2017). [https://arxiv.org/pdf/1706.09528.pdf Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold]. arXiv. (Chu-Cheng)
:He, Lee, Lewis, Zettlemoyer (2017). [https://homes.cs.washington.edu/~luheng/files/acl2017_hllz.pdf Deep Semantic Role Labeling: What Works and What's Next]. ACL.

;Dec 7

;Nov 30 (Chu-Cheng)
:Swayamdipta, Thomson, Dyer, Smith (2017). [https://arxiv.org/pdf/1706.09528.pdf Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold]. arXiv.

;Nov 16 (Adam Poliak)
:Gormley, Mitchell, Van Durme, Dredze (2014). [http://www.cs.cmu.edu/~mgormley/papers/gormley+al.acl.2014.pdf Low-Resource Semantic Role Labeling]. ACL.
:Williams, Drozdov, Bowman (2017) [https://arxiv.org/pdf/1709.01121.pdf Learning to parse from a semantic objective: It works. Is it syntax?]. TACL Submission
 
== Summer 2017 ==
 
;August 25 (Jason/group)
:Jacob Andreas, Anca Dragan, Dan Klein (2017). [https://arxiv.org/pdf/1704.06960.pdf Translating Neuralese]. ACL.
 
;August 18 (Keisuke Sakaguchi)
:Keisuke Sakaguchi, Matt Post, Benjamin Van Durme (2017). [http://www.aclweb.org/anthology/P/P17/P17-2030.pdf Error-repair Dependency Parsing for Ungrammatical Texts]. ACL. [https://www.slideshare.net/keisks/201707-acl slides].
 
;Second segment (Various)
:Ilya Sutskever (2013). [http://www.cs.utoronto.ca/~ilya/pubs/ilya_sutskever_phd_thesis.pdf Training Recurrent Neural Networks]. University of Toronto.
 
;First segment (Various)
:Percy Liang (2011). [https://cs.stanford.edu/~pliang/papers/dcs-thesis2011.pdf Learning Dependency-Based Compositional Semantics]. UC Berkeley.
 
[https://docs.google.com/spreadsheets/d/e/2PACX-1vR0o1nfARFT5weG8NFE8MRKhkG1K4AeMRBpFZYrkMtPD9QrUOAlaPWfLMwh-bzWbTcoSBPg1nRxVTIz/pubhtml (other dissertations considered for discussion)]
 
== Spring 2017 ==
 
=== Point Processes ===
''Organizer: Ryan Cotterell''
 
;May 4 (Keisuke Sakaguchi)
:Alex Kulesza and Ben Taskar (2010). [https://papers.nips.cc/paper/3969-structured-determinantal-point-processes.pdf Structured Determinantal Point Processes]. NIPS.
 
;Apr 27 (Hongyuan Mei)
:Hongyuan Mei and Jason Eisner (2016). [https://arxiv.org/abs/1612.09328 The Neural Hawkes Process: A Neurally Self-Modulating Multivariate Point Process].  arXiv. 
 
;Apr 20 (Ryan Cotterell)
:Ben Taskar. [http://homes.cs.washington.edu/~taskar/pubs/dpp_tut.pdf Determinantal Point Processes] (tutorial).
 
=== Transfer Learning ===
''Organizer: Becky Marvin''
 
;Apr 13 (Chu-Cheng Lin)
: Mikhail Kozhevnikov and Ivan Titov (2013).  [http://www.aclweb.org/anthology/P13-1117 Cross-lingual Transfer of Semantic Role Labeling Models].  ACL.
 
;Apr 6 (Xiaochen Li)
:Oscar Täckström, Ryan McDonald, and Jakob Uszkoreit (2012).  [http://soda.swedish-ict.se/5251/1/paper.pdf Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure].  NAACL.

;Mar 30 (Becky Marvin)
:Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight (2016).  [https://aclweb.org/anthology/D16-1163 Transfer Learning for Low-Resource Neural Machine Translation].  EMNLP.
 
=== Dialog ===
''Applications of the two previous topics below.  Organizer: Patrick Xia.''
 
;Mar 16 (Matthew Francis-Landau)
:Tsung-Hsien Wen et al. (2016). [https://arxiv.org/pdf/1604.04562 A Network-based End-to-End Trainable Task-oriented Dialogue System].
 
;Mar 9 (Patrick Xia)
:Jiwei Li et al. (2017). [https://arxiv.org/pdf/1701.06547.pdf Adversarial Learning for Neural Dialogue Generation].
 
=== Deep Reinforcement Learning ===
''Organizers: Hongyuan Mei, Tim Vieira''
 
;Mar 2 (Shijie Wu)
:Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, and Pieter Abbeel (2016).  [https://arxiv.org/pdf/1602.02867.pdf Value Iteration Networks].  NIPS.
::''David Silver's [http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/DP.pdf lecture on value iteration] from his [http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html course] might be helpful.''
 
;Feb 23 (Hongyuan Mei and Tim Vieira)
:David Silver (2016).  Tutorial: Deep Reinforcement Learning.  ICML. [http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf Slides], [http://techtalks.tv/talks/deep-reinforcement-learning/62360/ video].
 
=== Generative Adversarial Nets ===
''Organizers: Ryan Cotterell, Dingquan Wang''
 
;Feb 9 and Feb 16 (Ryan Cotterell, Dingquan Wang)
 
:Ian Goodfellow (2016). [https://arxiv.org/pdf/1701.00160v3.pdf NIPS 2016 Tutorial: Generative Adversarial Networks]. [http://www.iangoodfellow.com/slides/2016-12-04-NIPS.pdf slides]
 
:Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio (2014). [https://arxiv.org/pdf/1406.2661.pdf Generative Adversarial Nets].  arXiv.
 
:Martin Arjovsky, Soumith Chintala, Léon Bottou (2017). [https://arxiv.org/pdf/1701.07875.pdf Wasserstein GAN]. arXiv. [http://cs.jhu.edu/~wdd/docs/WGAN.pptx slides] [http://cs.jhu.edu/~wdd/docs/WGAN.pdf slides_pdf]
 
:Martin Arjovsky, Léon Bottou (2017). [https://arxiv.org/pdf/1701.04862.pdf Towards Principled Methods for Training Generative Adversarial Networks]. ICLR.
 
== Fall 2016 ==
 
=== Interpretation and visualization of deep networks ===
 
''Organizer: Nanyun (Violet) Peng''
 
;Dec 8 (Zach Wood-Doughty)
:Li, Yixuan, et al. (2015) [https://arxiv.org/abs/1511.07543 Convergent Learning: Do different neural networks learn the same representations?] [http://s.yosinski.com/yosinski_160503_iclr_convergent.pdf slides]
 
::''Bonus paper:'' Kádár, Ákos, Grzegorz Chrupała, and Afra Alishahi. [https://arxiv.org/pdf/1602.08952.pdf Representation of linguistic form and function in recurrent neural networks].
 
;Dec 1 (Pamela Shapiro)
:Andrej Karpathy, Justin Johnson and Li Fei-Fei (2016) [http://vision.stanford.edu/pdf/KarpathyICLR2016.pdf Visualizing and understanding recurrent networks].  ICLR.
 
;Nov 17 (Nanyun (Violet) Peng)
:Tao Lei, Regina Barzilay and Tommi Jaakkola (2016) [https://people.csail.mit.edu/taolei/papers/emnlp16_rationale.pdf Rationalizing Neural Predictions].  EMNLP.
 
=== Neural MT and generation ===
 
''Organizer: Chu-Cheng Lin''
 
;Nov 10
: EMNLP debriefing session.
 
;Nov 3 (Becky Marvin)
:Akiko Eriguchi, Kazuma Hashimoto, and Yoshimasa Tsuruoka (2016).  [http://www.aclweb.org/anthology/P16-1078 Tree-to-Sequence Attentional Neural Machine Translation].  ACL.
 
;Oct 27 (Chu-Cheng Lin)
:Ilya Sutskever, Oriol Vinyals, and Quoc V. Le (2014).  [https://via.hypothes.is/https://arxiv.org/pdf/1409.3215v3.pdf Sequence to Sequence Learning with Neural Networks].  NIPS.
:Dzmitry Bahdanau, KyungHyun Cho, and Yoshua Bengio (2015). [https://via.hypothes.is/https://arxiv.org/pdf/1409.0473v7.pdf Neural Machine Translation by Jointly Learning to Align and Translate].  ICLR.
 
=== Deep learning in structured prediction ===
 
''Organizer: Tim Vieira''
 
;Sep 29 (Patrick Xia and Matthew Francis-Landau)
:Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros and Noah A. Smith (2016) [http://www.aclweb.org/anthology/N16-1024 Recurrent Neural Network Grammars].  NAACL.
 
;Sep 22 (Chu-Cheng Lin and Hongyuan Mei)
:David Belanger and Andrew McCallum (2016). [https://via.hypothes.is/https://people.cs.umass.edu/~belanger/belanger_spen_icml.pdf#annotations:J2yD_n33EeavDL91X0IyDg Structured Prediction Energy Networks].  ICML.
 
;Sep 15 (Matthew Francis-Landau)
:Jacob Andreas, Marcus Rohrbach, Trevor Darrell and Dan Klein (2016). [http://arxiv.org/pdf/1601.01705v4.pdf Learning to Compose Neural Networks for Question Answering].  NAACL.
 
=== Hypergraph algorithms ===
 
''Organizer: Travis Wolfe''
 
;Oct 13 (Becky Marvin)
:Alexander M. Rush, Yin-Wen Chang, and Michael Collins (2013).  [http://www.aclweb.org/anthology/D13-1022 Optimal Beam Search for Machine Translation].  EMNLP.
 
;Oct 6 (Tim Vieira and Zach Wood-Doughty)
:Zhifei Li and Jason Eisner (2009). [https://www.cs.jhu.edu/~jason/papers/#li-eisner-2009 First- and Second-Order Expectation Semirings with Applications to Minimum-Risk Training on Translation Forests].  EMNLP.
 
;Sep 8 (Travis Wolfe)
:Liang Huang (2008). [http://www.aclweb.org/anthology/C08-5001 Advanced Dynamic Programming in Semiring and Hypergraph Frameworks].  COLING tutorial notes.
 
== Spring 2016 ==
 
=== Interpretable ML ===
 
;Apr 28
: Anoop Korattikara, Vivek Rathod, Kevin Murphy, Max Welling (2015).  [http://arxiv.org/abs/1506.04416 Bayesian Dark Knowledge].  Submitted to NIPS.
 
;Apr 21
: Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin (2016). [http://arxiv.org/pdf/1602.04938v1.pdf "Why Should I Trust You?": Explaining the Predictions of Any Classifier]. CHI Workshop on Human-Centred Machine Learning (HCML).
::''Optional background reading: [http://www.icml-2011.org/papers/398_icmlpaper.pdf Bayesian Learning via Stochastic Gradient Langevin Dynamics] (ICML 2011).''
 
;Apr 14
: Letham, Rudin, McCormick, and Madigan (2012).  [https://www.stat.washington.edu/research/reports/2012/tr609.pdf Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model].
 
=== Open-Domain Information Extraction ===
 
''Nanyun will organize this unit.''
 
;Apr 7 (Nanyun Peng)
:Jayant Krishnamurthy and Tom M Mitchell (2015). [http://aclweb.org/anthology/Q/Q15/Q15-1019.pdf Learning a Compositional Semantics for Freebase with an Open Predicate Vocabulary]. TACL.
 
;Mar 31 (Dingquan Wang)
:Sebastian Riedel, Limin Yao, Benjamin M. Marlin and Andrew McCallum (2013). [http://www.riedelcastro.org//publications/papers/riedel13relation.pdf Relation Extraction with Matrix Factorization and Universal Schemas]. NAACL.
 
;Mar 24 (Nanyun Peng)
:T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, J. Welling (2015). [http://www.cs.cmu.edu/~tom/pubs/NELL_aaai15.pdf Never-Ending Language Learning]. AAAI.
 
=== Reinforcement learning ===
 
''Keisuke and Tim will organize this unit.''
 
;Mar 10 (Keisuke Sakaguchi, Tim Vieira)
:Sergey Levine and Vladlen Koltun (2013). [https://graphics.stanford.edu/projects/gpspaper/gps_full.pdf Guided Policy Search]. ICML.
 
;Mar 3 (Nick Andrews)
:David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller (2014).  [http://jmlr.org/proceedings/papers/v32/silver14.pdf Deterministic Policy Gradient Algorithms].  ICML.
 
;Feb 11, 18, 25 (Tim Vieira, Travis Wolfe)
:Léon Bottou, Jonas Peters, Joaquin Quiñonero-Candela, Denis X. Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, and Ed Snelson (2013).  [http://arxiv.org/pdf/1209.2355v5.pdf Counterfactual Reasoning and Learning Systems].  arXiv.  [http://leon.bottou.org/slides/counterfactuals/mlsummit2013.pdf slides]
 
;Feb 4 (Keisuke Sakaguchi)
:Merwan Barlier, Julien Perolat, Romain Laroche, and Olivier Pietquin (2015). [http://www.aclweb.org/anthology/W15-4602 Human-Machine Dialogue as a Stochastic Game]. SIGDIAL. [http://www.superlectures.com/sigdial2015/downloadFile?id=3&type=slides&filename=human-machine-dialogue-as-a-stochastic-game slides]
::''Optional background:'' Verena Rieser and Oliver Lemon (2011). [http://link.springer.com/chapter/10.1007/978-3-642-24942-6_3 Reinforcement Learning].  In [http://link.springer.com/book/10.1007/978-3-642-24942-6 Reinforcement Learning for Adaptive Dialogue Systems], chapter 3.
 
== Fall 2015 ==
 
=== Tensor Decomp ===
 
; December 17
: NIPS debriefing session.
 
; December 3 (Pushpendre Rastogi)
: Schein et al. (2015). [http://www.columbia.edu/~jwp2128/Papers/Scheinetal2015.pdf Bayesian Poisson Tensor Factorization for Inferring Multilateral Relations from Sparse Dyadic Event Counts]. International Conference on Knowledge Discovery and Data Mining (KDD).  [http://www.hongliangjie.com/2015/08/17/poisson-matrix-factorization/ blog post]
 
; November 19 (Satya Prateek)
: Singh et al. (2015). [http://rockt.github.io/pdf/singh2015towards.pdf Towards Combined Matrix and Tensor Factorization for Universal Schema Relation Extraction]. NAACL.
 
; November 12 (Pushpendre Rastogi)
: Tao Lei et al. (2014). [https://people.csail.mit.edu/tommi/papers/Lei-ACL14.pdf Low Rank Tensors For Scoring Dependency Structures]. ACL (Best Paper).
 
=== Abstract Meaning Representation (AMR)  ===
 
; Nov 5 (Darcey Riley)
: Frank Drewes, Hans-Jörg Kreowski, and Annegret Habel (1997).  [http://www.informatik.uni-bremen.de/theorie/teach/gratra/2004-1/Skript/hr.pdf Hyperedge Replacement Graph Grammars].  In ''Handbook of Graph Grammars and Computing by Graph Transformation'', pp. 95-162.
 
; Oct 29 (Darcey Riley)
: Jones, Bevan, Jacob Andreas, Daniel Bauer, Karl Moritz Hermann, and Kevin Knight (2012).  [http://www.isi.edu/natural-language/people/sembmt-coling-12.pdf Semantics-Based Machine Translation with Hyperedge Replacement Grammars].  Proc. COLING.
 
; Oct 22 (Darcey Riley)
: Banarescu et al. (2013).  [http://amr.isi.edu/a.pdf Abstract Meaning Representation for Sembanking.]  Proc. Linguistic Annotation Workshop.
 
=== Deep + Probabilistic / Deep + Attention ===
 
; Oct 15 (Kevin Duh)
: Deep learning tutorial talk.
 
;October 8 (Chu-Cheng Lin)
: Andriy Mnih and Karol Gregor (2014).  [http://arxiv.org/abs/1402.0030 Neural Variational Inference and Learning in Belief Networks].  ICML.
 
=== Adaptive Inference ===
 
;October 1 (Tim Vieira)
: S. M. Ali Eslami, Daniel Tarlow, Pushmeet Kohli, and John Winn (2014).  [http://arkitus.com/files/nips-14-eslami-just-in-time.pdf Just-In-Time Learning for Fast and Flexible Inference].  NIPS.  [http://arkitus.com/files/nips-14-eslami-just-in-time-supplementary.zip supplementary material], [http://arkitus.com/files/nips-14-eslami-just-in-time-poster.pdf poster]
 
;September 24
: EMNLP debriefing session.
 
;September 17 (Pushpendre Rastogi)
: David Weiss and Ben Taskar (2013).  [http://papers.nips.cc/paper/5142-learning-adaptive-value-of-information-for-structured-prediction Learning Adaptive Value of Information for Structured Prediction]. NIPS.
 
;September 10 (Travis Wolfe)
: Jacob Steinhardt and Percy Liang (2015). [http://arxiv.org/pdf/1502.06665v1.pdf Reified Context Models].
 
;September 3 (Tim Vieira and Adam Teichert)
: Shi, Tianlin, Jacob Steinhardt, and Percy Liang (2015). [http://www.jmlr.org/proceedings/papers/v38/shi15.pdf Learning Where to Sample in Structured Prediction]. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics.
 
== Spring 2015 ==
 
''Thursdays 12-1:15pm, Hackerman 306.''
 
=== Extreme Learning Machine & Computational Learning Theory (w/ practical applications) ===
;Apr 23 (Mozhi Zhang)
: Maclaurin et al. (2015) [http://arxiv.org/pdf/1502.03492.pdf Gradient-based Hyperparameter Optimization through Reversible Learning.] arXiv.
 
;Apr 16 (Tim Vieira)
: Ross et al. (2011) [http://arxiv.org/pdf/1011.0686.pdf A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning.] AISTATS.
 
;Apr 9 (Dingquan Wang)
: Long et al. (2010) [http://www.cs.columbia.edu/~rocco/Public/final-camera-ready-icml10.pdf Restricted Boltzmann Machines are Hard to Approximately Evaluate or Simulate.] ICML.
 
;Apr 2 (Tongfei Chen)
: Blum et al. (1999) [http://hunch.net/~jl/projects/prediction_bounds/progressive_validation/coltfinal.pdf Beating the Hold-Out: Bounds for K-fold and Progressive Cross-Validation.] COLT.
 
;Mar 26 (Satya Prateek)
:Yosinski et al. (2014) [http://arxiv.org/abs/1411.1792 How transferable are features in deep neural networks?] arXiv.
 
;Mar 12 (Travis Wolfe)
:Huang et al. (2006) [http://www.sciencedirect.com/science/article/pii/S0925231206000385# Extreme Learning Machine: Theory and Applications.] Neurocomputing.
 
=== Transition-based parsing ===
;Feb 19 (Mo Yu)
:Huang et al. (2012) [http://www.aclweb.org/anthology/N/N12/N12-1015.pdf Structured Perceptron with Inexact Search.] NAACL.
 
;Feb 12 (Keisuke Sakaguchi)
:Sartorio et al. (2013) [http://www.aclweb.org/anthology/P13-1014.pdf A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy.] ACL.
 
;Feb 5 (Travis Wolfe)
: Yamada and Matsumoto (2003) [https://faculty.cs.byu.edu/~ringger/CS479/papers/YamadaMatsumoto_DepParse-iwpt2003.pdf Statistical Dependency Analysis With Support Vector Machines.] IWPT.
 
== Fall 2014 ==
 
''Thursdays 12-1:15pm, Hackerman 306.''
 
=== Scientific practice ===
 
;Dec 4 (Michael Paul) - Open science and publishing models
:Jason Priem (2013).  [http://www.nature.com/nature/journal/v495/n7442/full/495437a.html Beyond the paper].  Nature.
:Timothy Gowers and Michael Nielsen (2009).  [http://www.nature.com/nature/journal/v461/n7266/full/461879a.html Massively collaborative mathematics].  Nature.
:Yann LeCun (2011?).  [http://yann.lecun.com/ex/pamphlets/publishing-models.html A new publishing model in computer science].  Blog post.
:Donald Geman (2007).  [http://www.cis.jhu.edu/publications/papers_in_database/GEMAN/Ten_Reasons.pdf Ten reasons why conference papers should be abolished].  Manuscript.
:Eric Price (2014).  [http://blog.mrtz.org/2014/12/15/the-nips-experiment.html The NIPS experiment]. Blog post.
:Bert Huang (2014).  [https://berthuang.wordpress.com/2014/12/18/on-the-nips-experiment-and-review-process/ On the NIPS experiment and review process].  Blog post.
 
;Nov 20 (Dingquan Wang)
: Eisenstein (2013) [http://www.cc.gatech.edu/~jeisenst/papers/naacl2013-badlanguage.pdf What to do about bad language on the internet]. NAACL.
 
;Nov 6 (Matt Gormley)
:Clark et al. (2011) [http://www.aclweb.org/anthology/P11-2031 Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability]. ACL.
:Søgaard et al. (2014) [http://www.aclweb.org/anthology/W14-1601 What's a p-value in NLP?]. CoNLL.
 
=== Probabilistic semantics ===
 
;Oct 30 (Elan Hourticolon-Retzler)
:Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, Luke Zettlemoyer (2013). [http://homes.cs.washington.edu/~lsz/papers/kcaz-emnlp13.pdf Scaling Semantic Parsers with On-the-fly Ontology Matching]. EMNLP.
 
;Oct 23 (Violet Nanyun Peng)
:Jonathan Berant and Percy Liang (2014). [http://cs.stanford.edu/~pliang/papers/paraphrasing-acl2014.pdf Semantic Parsing via Paraphrasing]. ACL.
 
;Oct 16 (Darcey Riley)
:Noah D. Goodman and Daniel Lassiter.  [http://web.stanford.edu/~ngoodman/papers/Goodman-HCS-final.pdf Probabilistic Semantics and Pragmatics: Uncertainty in Language and Thought].  Chapter for ''Handbook of Semantics''.
 
=== Probabilistic programming ===
 
;Oct 9 (Adam Teichert)
:Examples section of Noah Goodman and Andreas Stuhlmüller (2014).  [http://dippl.org The Design and Implementation of Probabilistic Programming Languages].  Electronic book at http://dippl.org.
 
;Oct 2 (Travis Wolfe)
:Chapters 4,5,6 of Noah Goodman and Andreas Stuhlmüller (2014).  [http://dippl.org The Design and Implementation of Probabilistic Programming Languages].  Electronic book at http://dippl.org.
 
;Sep 25 (Pushpendre Rastogi)
:Chapters 2,3 of Noah Goodman and Andreas Stuhlmüller (2014).  [http://dippl.org The Design and Implementation of Probabilistic Programming Languages].  Electronic book at http://dippl.org.
 
=== Beyond MCMC ===
 
;Sep 18 (Chandler May)
:Aaron Li, Amr Ahmed, Sujith Ravi, and Alexander J Smola (2014). [http://www.sravi.org/pubs/fastlda-kdd2014.pdf Reducing the Sampling Complexity of Topic Models]. KDD.
::''Background: [http://www.keithschwarz.com/darts-dice-coins/ alias sampling]''
 
;Sep 11 (Frank Ferraro)
:Luke Bornn, Yutian Chen, Nando de Freitas, Mareija Eskelin, Jing Fang, and Max Welling (2013). [http://arxiv.org/abs/1301.4168 Herded Gibbs Sampling]. ICLR.
 
;Sep 4 (Nicholas Andrews)
:Anoop Korattikara, Yutian Chen, and Max Welling (2014). [http://arxiv.org/pdf/1304.5299v4 Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget]. ICML.
 
== Summer 2014 ==
 
;Aug 14 (Adam Teichert)
:Joseph Gonzalez, Yucheng Low, Arthur Gretton, and Carlos Guestrin (2011). [http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2011_GonzalezLGG11.pdf Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees]. AISTATS.
 
;July 24 (Tim Vieira)
:Alexandre Bouchard-Côté, Slav Petrov, and Dan Klein (2009). [http://www.stat.ubc.ca/~bouchard/pub/bouchard-auxv.pdf Randomized Pruning: Efficiently Calculating Expectations in Large Dynamic Programs]. NIPS.
 
;July 24 (Matt Gormley)
:Michael U. Gutmann and Aapo Hyvärinen (2010). [http://jmlr.org/proceedings/papers/v9/gutmann10a/gutmann10a.pdf Noise-contrastive estimation: A new estimation principle for unnormalized statistical models]. AISTATS.
 
;July 17 (Juneki Hong)
:TBA
 
;May 15 (Tim Vieira)
:TBA
 
;May 8 (Travis Wolfe)
:Percy Liang, Hal Daume, and Dan Klein (2008). [http://www.umiacs.umd.edu/~hal/docs/daume08flat.pdf Structure Compilation: Trading Structure for Features]. ICML.
 
== Spring 2014 ==
 
=== Recent papers ===
 
;May 1 (Michael Paul)
:Thang Nguyen, Yuening Hu, and Jordan Boyd-Graber (2014). [http://www.cs.umd.edu/~ynhu/publications/acl2014_spectral.pdf Anchors Regularized: Adding Robustness and Extensibility to Scalable Topic-Modeling Algorithms]. ACL.
 
;Apr 24 (Adam Teichert)
:Dani Yogatama and Noah A. Smith (2014). [http://www.cs.cmu.edu/~nasmith/papers/yogatama+smith.acl14.pdf Linguistic Structured Sparsity in Text Categorization]. ACL.
 
=== Semantic parsing ===
 
;Apr 17 (Juneki Hong)
:Dipanjan Das, Andre F. T. Martins, and Noah Smith (2012). [http://www.cs.cmu.edu/~nasmith/papers/das+martins+smith.starsem12.pdf An Exact Dual Decomposition Algorithm for Shallow Semantic Parsing with Constraints]. *SEM. [http://www.dipanjandas.com/files/starsemslides.pdf slides]
 
;Apr 10 (Keisuke Sakaguchi & Yiran Zhang)
:Yoav Artzi and Luke Zettlemoyer (2013). [http://yoavartzi.com/pub/az-tacl.2013.pdf Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions]. TACL.
 
;Apr 3 (Xuchen Yao)
:Percy Liang, Michael I. Jordan, and Dan Klein (2011).  [http://cs.stanford.edu/~pliang/papers/dcs-acl2011.pdf Learning dependency-based compositional semantics]. ACL.
 
=== Clever MT algorithms ===
 
;Mar 27 (Matt Gormley)
: Michel Galley, Chris Quirk, Colin Cherry, and Kristina Toutanova (2013). [http://research.microsoft.com/pubs/201106/regularized_mert.pdf Regularized Minimum Error Rate Training]. EMNLP.
 
;Mar 13 (Dan Deutsch)
:Adam Pauls and Dan Klein (2009). [http://www.cs.berkeley.edu/~adpauls/PAPERS/acl2009.pdf K-Best A* Parsing]. ACL.
 
;Mar 6 (Nanyun Peng)
:Andrei Simion, Michael Collins, and Clifford Stein (2013).  [http://www.cs.columbia.edu/~mcollins/papers/ibm2convex.pdf A Convex Alternative to IBM Model 2]. EMNLP.

=== Online inference ===
 
;Feb 27 (Nicholas Andrews)
:Michael Bryant and Erik B. Sudderth (2012).  [http://cs.brown.edu/~sudderth/papers/nips12hdpOnline.pdf Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes].  NIPS. [https://docs.google.com/presentation/d/1XW2D0vcyhd01ImdFRRaHJfN8clQu9NsgWXg2ZLJRECE/edit?usp=sharing Nick's slides]
 
;Feb 13 (Frank Ferraro), Feb 20 (Ryan Cotterell)
:Matthew D. Hoffman, David M. Blei, Chong Wang, and John Paisley (2013).  [http://jmlr.org/papers/volume14/hoffman13a/hoffman13a.pdf  Stochastic variational inference]. JMLR.
 
;Feb 6 (Ryan Cotterell)
:Percy Liang and Dan Klein (2009). [http://nlp.cs.berkeley.edu/pubs/Liang-Klein_2009_OnlineEM_paper.pdf Online EM for unsupervised models].  NAACL.  [http://www-cs.stanford.edu/~pliang/papers/online-naacl2009-talk.pdf slides]
 
== Fall 2013 ==
 
=== Recent Papers ===
 
;Dec 12
:NIPS debriefing.
 
;Dec 5 (Xuchen Yao)
:Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang (2013).  [http://cs.stanford.edu/~pliang/papers/freebase-emnlp2013.pdf Semantic parsing on Freebase from question-answer pairs].  EMNLP.  [http://arxiv.org/pdf/1309.4408.pdf supplement]
::''See also:'' Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, and Luke Zettlemoyer (2013).  [https://homes.cs.washington.edu/~tomk/KCAZemnlp2013.pdf Scaling Semantic Parsers with On-the-Fly Ontology Matching].  EMNLP.
 
;Nov 21 (Adam Teichert)
:Alexander M. Rush, Yin-Wen Chang, and Michael Collins (2013). [http://aclweb.org/anthology/D/D13/D13-1022.pdf Optimal Beam Search for Machine Translation]. EMNLP.
 
;Nov 14 (Frank Ferraro)
:Yi Yang and Jacob Eisenstein (2013). [http://aclweb.org/anthology/D/D13/D13-1007.pdf A Log-Linear Model for Unsupervised Text Normalization]. EMNLP.
 
=== ML for Annotation / Active Learning ===
 
;Nov 7 (Michael Paul)
:Dan Garrette and Jason Baldridge (2013).  [http://www.cs.utexas.edu/users/dhg/papers/garrette_baldridge_naacl2013.pdf Learning a Part-of-Speech Tagger from Two Hours of Annotation].  NAACL.
 
;Oct 31 (Ryan Cotterell)
:Burr Settles (2012). [http://www.morganclaypool.com/doi/pdf/10.2200/S00429ED1V01Y201207AIM018  Active Learning], chapters 3-5.  Synthesis Lectures on Artificial Intelligence and Machine Learning.
 
;Oct 24
:EMNLP debriefing.
 
;Oct 17 (Tim Vieira)
:Burr Settles (2012). [http://www.morganclaypool.com/doi/pdf/10.2200/S00429ED1V01Y201207AIM018  Active Learning], chapters 1-3.  Synthesis Lectures on Artificial Intelligence and Machine Learning.
 
=== Informal Domains ===
 
;Oct 10 (Naomi Saphra)
: Jacob Eisenstein (2013). [http://www.cc.gatech.edu/~jeisenst/papers/lasm13-phono.pdf Phonological Factors in Social Media Writing]. Proceedings of NAACL Workshop on Language Analysis in Social Media.
 
;Oct 3 (Juneki Hong)
: Alan Ritter, Sam Clark, Mausam, and Oren Etzioni (2011). [http://aritter.github.io/twitter_ner.pdf Named Entity Recognition in Tweets: An Experimental Study]. EMNLP. [http://aritter.github.io/twitter_ner.pptx slides]
 
=== Deep Learning for NLP ===
 
;Sep 26 (Nick Andrews)
: Richard Socher and Christopher Manning (2013). [http://nlp.stanford.edu/courses/NAACL2013/ Deep Learning for NLP (without Magic)]. Tutorial at NAACL, continued.
 
;Sep 19 (Matt Gormley)
: Richard Socher and Christopher Manning (2013). [http://nlp.stanford.edu/courses/NAACL2013/ Deep Learning for NLP (without Magic)]. Tutorial at NAACL.
 
;Sep 12 (Travis Wolfe)
: Ronan Collobert and Jason Weston (2008). [http://ronan.collobert.com/pub/matos/2008_nlp_icml.pdf A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning]. ICML.
 
== Summer 2013 ==
 
;Aug 15
:ACL debriefing.
 
;Jun 20
:NAACL debriefing.
 
;Jun 13 (Nicholas Andrews)
:Marta Recasens, Marie-Catherine de Marneffe, and Christopher Potts (2013). [http://nlp.stanford.edu/pubs/discourse-referent-lifespans.pdf The Life and Death of Discourse Entities: Identifying Singleton Mentions]. NAACL (short paper).
 
== Spring 2013 ==
 
''Thursdays 12-1:15pm in Hackerman 306.''
 
=== Recent NLP papers ===
 
;May 2 (Gaurav Kumar)
:Oscar Täckström, Dipanjan Das, Slav Petrov, Ryan McDonald, and Joakim Nivre (2013). [http://www.aclweb.org/anthology-new/Q/Q13/Q13-1001.pdf Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging]. TACL.
 
;Apr 25 (Nicholas Andrews)
: J Gillenwater, A Kulesza, and B Taskar (2012). [http://aclweb.org/anthology/D/D12/D12-1065.pdf Discovering Diverse and Salient Threads in Document Collections]. EMNLP.
 
;Apr 18 (Michael Paul)
: R Socher, M Ganjoo, H Sridhar, O Bastani, CD Manning, and AY Ng (2013). [http://arxiv.org/pdf/1301.3666v2.pdf Zero-Shot Learning Through Cross-Modal Transfer]. arXiv, March.
 
=== Inference for NLP ===
 
;Apr 11 (Matt Gormley)
:J. Domke (2011). [http://www.aaai.org/ocs/index.php/AAAI/AAAI11/paper/viewFile/3718/3998 Dual decomposition for marginal inference]. AAAI.
 
;Apr 4 (Tim Vieira)
: J. Paisley, D. Blei, and M. Jordan (2012). [http://arxiv.org/abs/1206.6430 Variational Bayesian Inference with Stochastic Search]. ICML.
 
;Mar 28 (Adam Teichert)
: D. Weiss, B. Sapp, and B. Taskar (2012). [http://arxiv.org/pdf/1208.3279.pdf Structured Prediction Cascades]. arXiv, August.
 
=== Semantics in NLP ===
 
;Mar 14 (Violet Nanyun Peng)
:Dipanjan Das and Noah A. Smith (2011). [http://www.cs.cmu.edu/~nasmith/papers/das+smith.acl11.pdf Semi-Supervised Frame-Semantic Parsing for Unknown Predicates]. ACL.
 
;Mar 7 (Frank Ferraro)
: David Chen (2012). [http://www.aclweb.org/anthology/P/P12/P12-1045.pdf Fast Online Lexicon Learning for Grounded Language Acquisition]. ACL.
 
;Feb 28 (Darcey Riley)
: Cynthia Matuszek, Nicholas FitzGerald, Luke Zettlemoyer, Liefeng Bo, and Dieter Fox (2012).  [http://homes.cs.washington.edu/~lsz/papers/mfzbf-icml12.pdf A Joint Model of Language and Perception for Grounded Attribute Learning].  ICML.
 
=== Alignment ===
 
;Feb 21 (Henry Pao)
: Chris Dyer, Jonathan Clark, Alon Lavie, and Noah A. Smith (2011). [http://www.aclweb.org/anthology/P11-1042 Unsupervised Word Alignment with Arbitrary Features]. ACL.
 
;Feb 14 (Travis Wolfe)
: Adam Pauls, Dan Klein, David Chiang, and Kevin Knight (2010). [http://www.cs.berkeley.edu/~adpauls/PAPERS/naacl2010.pdf Unsupervised Syntactic Alignment with Inversion Transduction Grammars]. NAACL.
 
;Feb 7 (Xuchen Yao)
: Mohit Bansal, Chris Quirk, and Robert C. Moore (2011).  [http://nlp.cs.berkeley.edu/pubs/Bansal-Quirk-Moore_2011_GappyAlignment_paper.pdf Gappy Phrasal Alignment By Agreement]. ACL.
 
== Fall 2012 ==
 
=== Good recent ML papers ===
 
;Jan 24 (Nick Andrews)
:Tony Jebara and Anna Choromanska (2012). [http://books.nips.cc/papers/files/nips25/NIPS2012_0279.pdf Majorization for CRFs and Latent Likelihoods]. NIPS.
 
;Jan 17 (Adam Teichert)
:Po-Ling Loh and Martin Wainwright (2012). [http://books.nips.cc/papers/files/nips25/NIPS2012_1027.pdf Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses]. NIPS.
 
;Jan 10 (Tim Vieira)
:Thomas Furmston and David Barber (2012). [http://books.nips.cc/papers/files/nips25/NIPS2012_1261.pdf A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes]. NIPS.
 
;Jan 3 (Jason Eisner)
:Robert Gens and Pedro Domingos (2012).  [http://homes.cs.washington.edu/~rcg/papers/dspn.pdf Discriminative Learning of Sum-Product Networks].  NIPS.  [http://homes.cs.washington.edu/~rcg/talks/Gens_DSPN_NIPS2012.pdf Slides].
 
=== Good recent NLP papers ===
 
;Dec 13 (Nathaniel Filardo)
:Sebastian Riedel, David Smith, and Andrew McCallum (2012).  [http://aclweb.org/anthology-new/D/D12/D12-1067.pdf Parse, Price and Cut: Delayed Column and Row Generation for Graph Based Parsers]. EMNLP.  [http://en.wikipedia.org/wiki/Quadratic_programming background]
 
;Dec 6 (Gaurav Kumar)
:Liang Huang, Suphan Fayong, and Yang Guo (2012). [http://www.isi.edu/~lhuang/perc-inexact.pdf Structured Perceptron with Inexact Search.] NAACL.
 
;Nov 29 (Frank Ferraro)
:Jason Naradowsky, Sebastian Riedel, and David Smith (2012). [http://www.aclweb.org/anthology/D/D12/D12-1074.pdf Improving NLP through Marginalization of Hidden Syntactic Structure]. EMNLP.
 
;Nov 15 (Henry Pao)
:Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng (2012).  [http://www.aclweb.org/anthology/D12-1110 Semantic compositionality through recursive matrix-vector spaces].  EMNLP.
 
=== Human sentence processing ===
 
;Nov 8 (Olivia Buzek)
:Steven T. Piantadosi, Harry Tily, and Edward Gibson (2011).  [http://dx.doi.org/10.1016/j.cognition.2011.10.004 The communicative function of ambiguity in language].  Cognition.
 
;Nov 1 (Aric Velbel)
:Roger Levy and T. Florian Jaeger (2007). [http://idiom.ucsd.edu/~rlevy/papers/paper_info_density_optimize.pdf Speakers optimize information density through syntactic reduction]. Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems.
 
;Oct 25 (Keith Levin)
:Bock, K., & Levelt, W. J. M. (1994). [http://www.mpi.nl/world/materials/publications/levelt/Bock_Levelt_Language_1994.pdf Language production: Grammatical encoding.] In M.A. Gernsbacher (Ed.), Handbook of Psycholinguistics (pp. 945-984). London: Academic Press.
 
=== Streaming/online algorithms in NLP ===
 
;Oct 18 (Travis Wolfe)
:Martins, Gimpel, Smith, Xing, Figueiredo, and Aguiar (2010). [http://www.cs.cmu.edu/~nasmith/papers/martins+etal.tr10.pdf Aggressive Online Learning of Structured Classifiers]. Tech report.
:(also seen online as "Learning Structured Classifiers with Dual Coordinate Ascent")
 
;Oct 11 (Violet (Nanyun) Peng)
:Graham Cormode (2011). [http://people.cs.umass.edu/~mcgregor/711S12/sketches1.pdf Sketch Techniques for Approximate Query Processing]. Foundations and Trends in Databases.
 
;Oct 4 (Matt Gormley)
:Benjamin Van Durme (2012). [http://www.cs.jhu.edu/~vandurme/papers/VanDurmeEMNLP12.pdf Streaming Analysis of Discourse Participants]. EMNLP.
 
=== Events/Narratives in text ===
 
;Sep 27 (Xuchen Yao)
: Quang Do, Wei Lu, Dan Roth (2012). [http://l2r.cs.uiuc.edu/%7Equangdo2/papers/DoLuRo12.pdf Joint Inference for Event Timeline Construction]. EMNLP.
 
;Sep 20 (Adam Teichert)
:Roi Reichart and Regina Barzilay (2012).  [http://aclweb.org/anthology-new/N/N12/N12-1008.pdf Multi Event Extraction Guided by Global Constraints].  NAACL.
 
;Sep 13 (Michael Paul)
:Nathanael Chambers and Dan Jurafsky (2009).  [http://www.usna.edu/Users/cs/nchamber/pubs/acl09-narrative-schema.pdf Unsupervised Learning of Narrative Schemas and their Participants].  ACL.
 
== Summer 2012 ==
 
=== Summer conference papers ===
 
;Aug 30 (Darcey Riley)
:Sindhu Raghavan, Raymond Mooney, and Hyeonseo Ku (2012).  [http://aclweb.org/anthology-new/P/P12/P12-1037.pdf Learning to "Read Between the Lines" using Bayesian Logic Programs].  ACL.
 
;Aug 23 (Wes Filardo)
:Zhiheng Huang et al. (2012). [http://aclweb.org/anthology-new/P/P12/P12-1064.pdf Iterative Viterbi A* Algorithm for K-Best Sequential Decoding].  ACL.
 
;Aug 16 (Travis Wolfe)
:Alex Kulesza and Ben Taskar (2011). [http://www.cis.upenn.edu/~taskar/pubs/ldpps_uai11.pdf Learning Determinantal Point Processes]. UAI.
 
;Aug 10 (Nick Andrews)
:David Hall and Dan Klein (2012). [http://aclweb.org/anthology-new/D/D12/D12-1105.pdf  Training Factored PCFGs with Expectation Propagation]. EMNLP.
 
;Aug 3 (Michael Paul)
:Quang Do, Wei Lu, and Dan Roth (2012). [http://aclweb.org/anthology-new/D/D12/D12-1062.pdf Joint Inference for Event Timeline Construction]. EMNLP.
 
;Jul 5 (Tim Vieira)
:David Burkett and Dan Klein (2012). [http://www.cs.berkeley.edu/~dburkett/papers/burkett12-bp_alignment.pdf Fast Inference in Phrase Extraction Models with Belief Propagation]. NAACL. [http://www.cs.berkeley.edu/~dburkett/slides/burkett12-bp_alignment-slides.pdf Slides].
 
;Jun 29 (Adam Teichert)
:Oscar Täckström, Ryan McDonald, and Jakob Uszkoreit (2012). [http://aclweb.org/anthology-new/N/N12/N12-1052.pdf Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure]. NAACL.
 
== Spring 2012 ==
 
=== Spectral learning ===
 
;May 3  (Xuchen Yao)
: Paramveer Dhillon, Dean Foster, and Lyle Ungar (2011). [http://www.pdhillon.com/nips11dhillon.pdf Multi-View Learning of Word Embeddings via CCA]. NIPS 24, Granada, Spain, Dec. 2011.
 
;Apr 26 (Matt Gormley)
: Franco M. Luque, Ariadna Quattoni, Borja Balle, and Xavier Carreras (2012).  [http://www.lsi.upc.edu/~carreras/pubs/2012-eacl-lqbc.pdf Spectral Learning for Non-Deterministic Dependency Parsing].  EACL 2012. Best paper award.
 
;Apr 19 (Michael Paul)
: Daniel Hsu, Sham M. Kakade, and Tong Zhang (2009).  [http://www.cs.mcgill.ca/~colt2009/papers/011.pdf A Spectral Algorithm for Learning Hidden Markov Models].  Twenty-Second Annual Conference on Learning Theory (COLT).
 
=== Reinforcement learning ===
 
;Apr 12 (Travis Wolfe)
: Wilson, Fern, Ray, and Tadepalli (2007). [http://engr.case.edu/ray_soumya/papers/mtrl-hb.icml07.pdf Multi-Task Reinforcement Learning: A Hierarchical Bayesian Approach]. ICML.
 
;Apr 5 (Nathaniel Filardo)
: Wingate, David et al. (2011). [http://www.mit.edu/~wingated/papers/ijcaipp.pdf Bayesian Policy Search with Policy Priors].  International Joint Conference on Artificial Intelligence (IJCAI).
 
;Mar 29 (Jay Feldman)
:Gergely Neu and Csaba Szepesvári (2009). [http://www.springerlink.com/content/y117l00k52q41235/ Training parsers by inverse reinforcement learning], Machine Learning Volume 77, Issue 2. Published online by Springer Netherlands.
 
=== Non-convex optimization ===
 
;Mar 15 (Frank Ferraro)
:Main reading: Robert Michael Lewis, Virginia Torczon, and Michael W. Trosset (2000). [http://www.cs.wm.edu/~va/research/jcam.pdf Direct search methods: then and now]. Journal of Computational and Applied Mathematics, Volume 124, Issues 1-2, December, pp. 191-207.
:Optional/supplemental reading: Tamara G. Kolda, Robert Michael Lewis, and Virginia Torczon (2003). [http://www.cs.wm.edu/~va/research/sirev.pdf Optimization by direct search: new perspectives on some classical and modern methods].  SIAM Review, Vol. 45, Issue 3, pages 385-482.
 
;Mar 8 (Tim Vieira)
 
:Eric Brochu, Vlad M. Cora, and Nando de Freitas (2009). [http://www.personal.psu.edu/users/s/d/sdt144/MyResearch/ComputerExperiments/Literature/BrochuCoraFreitas2009.pdf A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning]. '''Pages 1-23.'''
 
;Mar 1 (Nicholas Andrews)
:Main reading (Part 1): M. Ebden (2008). [http://www.robots.ox.ac.uk/~mebden/reports/GPtutorial.pdf Gaussian Processes for Regression: A Quick Introduction.] TR.
:Extra reading (Chapter 2): Carl Edward Rasmussen and Christopher K. I. Williams (2006). [http://www.gaussianprocess.org/gpml/chapters/RW2.pdf Gaussian Processes for Machine Learning]. MIT Press.
:Extra extra reading (Chapter 45): David J.C. MacKay (2003). [http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/534.548.pdf Information Theory, Inference, and Learning Algorithms]. Cambridge University Press.
 
=== Unsupervised/semisupervised learning of linguistic structure ===
 
;Feb 23 (Olivia Buzek)
:Sharon Goldwater, Thomas L. Griffiths, Mark Johnson (2009).  [http://homepages.inf.ed.ac.uk/sgwater/papers/cognition-hdp.pdf A Bayesian framework for word segmentation: Exploring the effects of context]. Cognition 112 (1), pp. 21--54.
 
;Feb 16 (Adam Teichert)
:Tahira Naseem, Harr Chen, Regina Barzilay, and Mark Johnson (2010). [http://www.aclweb.org/anthology/D/D10/D10-1120.pdf Using Universal Linguistic Knowledge to Guide Grammar Induction], EMNLP.
 
;Feb 9 (Jason Smith)
:Joao V. Graca, Kuzman Ganchev, and Ben Taskar (2007). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.4545&rep=rep1&type=pdf Expectation Maximization and Posterior Constraints]. In Advances in Neural Information Processing Systems, Vol. 20.
 
::''A longer treatment is Ganchev et al. (2010), [http://jmlr.csail.mit.edu/papers/volume11/ganchev10a/ganchev10a.pdf Posterior Regularization for Structured Latent Variable Models], JMLR.''
::''An application to unsupervised dependency parsing is Gillenwater et al. (2011), [http://jmlr.csail.mit.edu/papers/volume11/ganchev10a/ganchev10a.pdf Posterior Sparsity in Unsupervised Dependency Parsing], JMLR.''
 
== Fall 2011 ==
 
=== Knowledge representation and reasoning ===
 
;Dec 1 (Meher Vijay Yeleti)
:D. Koller, A. Levy, and A. Pfeffer (1997). [http://ai.stanford.edu/~koller/Papers/Koller+al:AAAI97b.pdf P-Classic: A Tractable Probabilistic Description Logic]. AAAI.
 
;Nov 17 (Ves Stoyanov)
:Franz Baader and Werner Nutt (2002). [http://www.inf.unibz.it/~franconi/dl/course/dlhb/dlhb-02.pdf Basic Description Logics]. In the Description Logic Handbook.
 
;Nov 10 (Nick Andrews)
:Nir Friedman et al. (1999). [http://www.eecs.harvard.edu/~avi/Papers/lprm.ps Learning Probabilistic Relational Models]. IJCAI.
 
;Nov 3 (Matt Gormley)
:Hector J. Levesque (1986). [http://www.annualreviews.org/doi/pdf/10.1146/annurev.cs.01.060186.001351 Knowledge Representation and Reasoning]. Annual Review of Computer Science, Vol. 1: 255-287.
 
=== Music modeling ===
 
;Oct 27 (Adam Teichert)
:Jean-François Paiement, Yves Grandvalet & Samy Bengio (2009). [http://dx.doi.org/10.1080/09540090902733806 Predictive models for music]. Connection Science 21(2-3):253-272.
 
;Oct 20 (Nathaniel Filardo)
:David Temperley (2010).  [http://theory.esm.rochester.edu/temperley/papers/temperley-mp10.pdf Modeling Common-Practice Rhythm].  Music Perception 27(5):355-376.
 
;Oct 13 (Michael Paul)
:Gerhard Nierhaus (2008). "Genetic Algorithms in Algorithmic Composition".  [http://www.springerlink.com.proxy3.library.jhu.edu/content/978-3-211-75539-6 Algorithmic Composition: Paradigms of Automated Music Generation], Chapter 7.4, pp. 157-186.
 
;Oct 6 (Frank Ferraro)
:Fred Lerdahl and Ray Jackendoff (1983). "[http://www.jstor.org/stable/pdfplus/40285257.pdf An Overview of Hierarchical Structure in Music]." ''Music Perception: An Interdisciplinary Journal''. Vol. 1, No. 2, Hierarchical Structure in Music (Winter 1983/1984), pp. 229-252.
 
;Resources
*[http://www.musictheory.net/lessons http://www.musictheory.net/lessons] provides a sequence of interactive music theory lessons.
*A virtual keyboard: [http://www.bgfl.org/bgfl/custom/resources_ftp/client_ftp/ks2/music/piano/ http://www.bgfl.org/bgfl/custom/resources_ftp/client_ftp/ks2/music/piano/].
 
=== ML in information retrieval ===
 
;Sep 29 (Olivia Buzek)
:Shuang-Hong Yang, Bo Long, Alexander J. Smola, Hongyuan Zha, and Zhaohui Zheng (2011).  [http://dl.acm.org/citation.cfm?id=2009959 Collaborative competitive filtering: learning recommender using context of user choice].  SIGIR.
 
;Sep 22 (Tim Vieira)
:Brian McFee and Gert Lanckriet (2010).  [http://cosmal.ucsd.edu/~gert/papers/mlr.pdf Metric Learning to Rank].  ICML.
 
;Sep 15 (Adam Teichert)
:P. Carpena, P. Bernaola-Galvan, M. Hackenberg, A.V. Coronado, and J. L. Oliver (2009).  [http://pre.aps.org/abstract/PRE/v79/i3/e035102 Level statistics of words: Finding keywords in literary texts and symbolic sequences].  Physical Review.
:Rada Mihalcea, Courtney Corley, and Carlo Strapparava (2006).  [https://www.aaai.org/Papers/AAAI/2006/AAAI06-123.pdf Corpus-based and Knowledge-based Measures of Text Semantic Similarity].  AAAI.
 
;Sep 8 (Travis Wolfe)
:Dafna Shahaf, Carlos Guestrin (2010).  [http://www.cs.cmu.edu/%7Edshahaf/kdd2010-shahaf-guestrin.pdf Connecting the dots between news articles].  Proc. of KDD.
 
== Summer 2011 ==
 
=== Summer conference papers ===
 
;Aug 16 (Matt Gormley)
:Taylor Berg-Kirkpatrick, Dan Klein (2011).  [http://aclweb.org/anthology-new/D/D11/D11-1029.pdf Simple Effective Decipherment via Combinatorial Optimization].  Proc. of EMNLP.
 
;Jul 19 (Matt Gormley)
:Alexander M. Rush and Michael Collins (2011).  [http://www.aclweb.org/anthology/P/P11/P11-1008.pdf Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation].  Proc. of ACL. [http://people.csail.mit.edu/srush/trans_relax.pdf Slides].
 
;Jul 12 (Wes Filardo)
:Daniel Gildea (2010).  [http://www.cs.rochester.edu/%7Egildea/pubs/gildea-naacl10.pdf Optimal Parsing Strategies for Linear Context-Free Rewriting Systems]. Proc. of NAACL.  [http://www.cs.rochester.edu/%7Egildea/pubs/gildea-naacl10-slides.pdf Slides].
 
;Jun 14 (Xiaoxu Kang)
:Limin Yao, Sebastian Riedel, and Andrew McCallum (2010).  [http://www.aclweb.org/anthology-new/D/D10/D10-1099.pdf Collective Cross-Document Relation Extraction Without Labelled Data].  Proc. of EMNLP.
 
;Jun 7 (Nicholas Andrews)
:Harr Chen, Edward Benson, Tahira Naseem, and Regina Barzilay (2011).  [http://people.csail.mit.edu/eob/papers/acl2011-relation-discovery.pdf In-Domain Relation Discovery with Meta-Constraints via Posterior Regularization].  Proc. of ACL.
 
== Spring 2011 ==
 
=== Combinatorial optimization ===
 
;May 5 (Wes Filardo)
:Daniel J. Lehmann (1977). [http://dx.doi.org/10.1016/0304-3975(77)90056-1 Algebraic structures for transitive closure].  ''Theoretical Computer Science'' 4(1):59-76.
::''Also consider Tarjan ([http://doi.acm.org/10.1145/322261.322272 1981a], [http://doi.acm.org/10.1145/322261.322273 1981b]).''
 
;Apr 28 (Jason Smith)
:R. McDonald, F. Pereira, K. Ribarov, and J. Hajic (2005). [http://www.seas.upenn.edu/~strctlrn/bib/PDF/nonprojectiveHLT-EMNLP2005.pdf Non-projective dependency parsing using spanning tree algorithms]. In Proc. HLT/EMNLP, pages 523–530
 
;Apr 21 (Byung Gyu Ahn)
:David Sontag, Amir Globerson, Tommi Jaakkola (2010). [http://people.csail.mit.edu/dsontag/papers/SonGloJaa_optbook.pdf Introduction to Dual Decomposition for Inference]. To appear in Optimization for Machine Learning, editors S. Sra, S. Nowozin, and S. J. Wright: MIT Press, 2010.
 
;Apr 14 (Adam Teichert)
:Jack Edmonds (1965). [http://www.math.ca/cjm/v17/cjm1965v17.0449-0467.pdf Paths, Trees, and Flowers]. Canadian Journal of Mathematics 17: 449--467.
 
=== Game-theoretic approaches to discourse pragmatics and to language evolution ===
 
;Apr 7 (Michael Paul)
:Paul Vogt (2005). [http://www.cs.utexas.edu/~kuipers/readings/Vogt-aij-05.pdf The emergence of compositional structures in perceptually grounded language games]. Artificial Intelligence 167(1-2): 206-242.
 
;Mar 31 (Rachael Richardson)
:David Golland, Percy Liang, Dan Klein (2010). [http://www.eecs.berkeley.edu/~dsg/papers/pragmatics-emnlp2010.pdf A Game-Theoretic Approach to Generating Spatial Descriptions]. EMNLP 2010.
 
;Mar 17 (Xuchen Yao)
:Gerhard Jäger (2008). [http://ibe.eller.arizona.edu/docs/2008/blume/jaeger-semantics.pdf Game theory in semantics and pragmatics].  Unpublished manuscript.
::''Note: This looks quite different from the [http://www2.sfs.uni-tuebingen.de/jaeger/publications/hskArticleGJ-2011.pdf 2011 manuscript] that has the same title and author.''
 
;March 10 (Luke Orland)
:Gerhard Jäger (2008). [http://www2.sfs.uni-tuebingen.de/jaeger/publications/compass.pdf Applications of Game Theory in Linguistics]
 
=== Variational inference ===
 
;March 3 (Nicholas Andrews)
:Percy Liang, Slav Petrov, Michael I. Jordan, Dan Klein (2007). [http://www.cs.berkeley.edu/~pliang/papers/hdppcfg-emnlp2007.pdf The infinite PCFG using hierarchical Dirichlet processes]. EMNLP.
 
;Feb 24 (Nathaniel Filardo)
:Matthew Beal (2003).  [http://www.cse.buffalo.edu/faculty/mbeal/thesis/beal03_3.pdf Variational Bayesian Hidden Markov Models].  Appears as Chapter 3 of <i>Variational Algorithms for Approximate Bayesian Inference</i>, Ph.D. Thesis, Gatsby Computational Neuroscience Unit, University College London.
 
:David MacKay (1997). [http://www.inference.phy.cam.ac.uk/mackay/ensemblePaper.pdf Ensemble Learning for Hidden Markov Models].  Unpublished technical report, Cavendish Laboratory, University of Cambridge.
:Slides from Mark Johnson (2007). [http://www.cog.brown.edu/~mj/papers/emnlp07-slides.pdf Why doesn't EM find good HMM POS-taggers?].  EMNLP.
 
;Feb 17 (Adam Teichert)
:David M. Blei, Andrew Y. Ng, and Michael I. Jordan (2003). [http://jmlr.csail.mit.edu/papers/volume3/blei03a/blei03a.pdf Latent Dirichlet Allocation]. Journal of Machine Learning Research.
 
;Feb 10 (Matt Gormley)
:Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul (1999). [http://www.cs.berkeley.edu/~jordan/papers/variational-intro.pdf An introduction to variational methods for graphical models]. Machine Learning.
::''To get an intuition first, start with Jason's [http://cs.jhu.edu/~jason/tutorials/variational high-level explanation] of variational inference.  For another reference, try the ACL 2007 [http://www.cs.berkeley.edu/~pliang/papers/tutorial-acl2007.pdf tutorial slides] by Percy Liang and Dan Klein.''
 
== Fall 2010 ==


=== Unsupervised discriminative learning ===


;Dec 9 (Adam Teichert)
:Continue with last week's reading: chapter 3.


{| style="width:800px" border="1"
;Dec 2 (Wes Filardo)
:Continue with last week's reading: finish chapter 2.


;Nov 18 (Jason Smith)
:Csaba Szepesvári, [http://www.morganclaypool.com/doi/pdf/10.2200/S00268ED1V01Y201005AIM009 Algorithms for Reinforcement Learning]. This week we'll read the preface, chapter 1, and the first section of chapter 2. If you're trying to access this outside of JHU, try [http://proxy.library.jhu.edu/login?url=http://www.morganclaypool.com/doi/pdf/10.2200/S00268ED1V01Y201005AIM009 this link].


;Nov 11 (Ves Stoyanov)
:Yves Grandvalet and Yoshua Bengio, [http://www.iro.umontreal.ca/~lisa/pointeurs/entropy_regularization_2006.pdf Entropy Regularization], in: Semi-Supervised Learning, pages 151--168, MIT Press, 2006


;Nov 4 (Michael Paul)
:Baoxun Wang, Xiaolong Wang, Chengjie Sun, Bingquan Liu, Lin Sun (2010). [http://www.aclweb.org/anthology/P/P10/P10-1125.pdf Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities].


;Oct 28 (Adam Teichert)
:Noah Smith and Jason Eisner (2005). [http://www.cs.jhu.edu/~jason/papers/index.html#gia05 Guiding Unsupervised Grammar Induction Using Contrastive Estimation].


=== Semantic parsing ===


;Oct 21 (Svitlana Volkova)
:Mihai Surdeanu, Richard Johansson, Adam Meyers, Lluís Màrquez, and Joakim Nivre (2008). [http://www.surdeanu.name/mihai/papers/conll08.pdf The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies]. [http://www.surdeanu.name/mihai/papers/conll08_slides.pdf Slides].


;Oct 14 (Xuchen Yao)
: Wei Lu, Hwee Tou Ng, Wee Sun Lee, and Luke S. Zettlemoyer (2008). [http://www.cs.washington.edu/homes/lsz/papers/lnlz-emnlp08.pdf A Generative Model for Parsing Natural Language to Meaning]. EMNLP. [http://wing.comp.nus.edu.sg/~luwei/publications/emnlp08.ppt Slides].


;Oct 7 (Matt Gormley)
:Dipanjan Das, Nathan Schneider, Desai Chen and Noah A. Smith (2010). [http://www.cs.cmu.edu/~dipanjan/Papers_files/framenet_submitted.pdf Probabilistic Frame-Semantic Parsing]. NAACL.


;Sep 30 (Nicholas Andrews)
:Luke S. Zettlemoyer and Michael Collins (2009). [http://www.cs.washington.edu/homes/lsz/papers/zc-acl09.pdf Learning Context-Dependent Mappings from Sentences to Logical Form]. ACL.


=== Graph-based methods and random walks ===


;Sep 23 (Adam Teichert)
:Jie Cai and Michael Strube (2010). [http://aclweb.org/anthology-new/C/C10/C10-1017.pdf End-to-End Coreference Resolution via Hypergraph Partitioning]. COLING.


;Sep 16 (Delip Rao)
:Goldenberg, A., Zheng, A. X., Fienberg, S. E., and Airoldi, E. M. (2010). [http://www.nowpublishers.com/getpdf.aspx?doi=2200000005&product=MAL A Survey of Statistical Network Models]. Foundations and Trends in Machine Learning 2, 2 (Feb.), 129-233.


;Sep 9 (Svitlana Volkova)
:Einat Minkov and William W. Cohen (2008).  [http://www.cs.cmu.edu/~einat/emnlp-08.pdf Learning Graph Walk Based Similarity Measures for Parsed Text].  EMNLP.


== Summer 2010 ==


=== Summer conference papers ===


;Aug 12 (Jason Smith)
:Alexander Clark (2010). [http://www.aclweb.org/anthology/W10-2904 Efficient, Correct, Unsupervised Learning for Context-Sensitive Languages]. CoNLL.


;Aug 5 (Veselin Stoyanov)
:Hoifung Poon and Pedro Domingos (2010). [http://aclweb.org/anthology-new/P/P10/P10-1031.pdf Unsupervised Ontology Induction from Text]. ACL.


;Jul 20
:General discussion of ACL 2010 papers.


;Jul 15 (Nicholas Andrews)
:Shay B. Cohen, David M. Blei and Noah A. Smith (2010). [http://www.cs.cmu.edu/~scohen/naacl10variadaptor.pdf Variational Inference for Adaptor Grammars].  NAACL.


;Jul 6 (Veselin Stoyanov)
:D. Chiang, J. Graehl, K. Knight, A. Pauls, and S. Ravi (2010). [http://www.isi.edu/natural-language/mt/naacl2010_bayes-fst.pdf Bayesian Inference for Finite-State Transducers]. NAACL.


;Jun 29 (Matt Gormley)
:Percy Liang, Michael I. Jordan, and Dan Klein (2010). [http://aclweb.org/anthology-new/N/N10/N10-1082.pdf Type-Based MCMC].  NAACL. [http://www.cs.berkeley.edu/~pliang/papers/type-naacl2010-talk.pdf Slides].


;Jun 22 (Spence Green)
: David Burkett, John Blitzer, and Dan Klein (2010). [http://www.eecs.berkeley.edu/~dburkett/papers/burkett10-parse_and_align.pdf Joint Parsing and Alignment with Weakly Synchronized Grammars]. NAACL.  [http://www.eecs.berkeley.edu/~dburkett/slides/burkett10-parse_and_align-slides.pdf Slides].


::''Relevant background:''
::* David A. Smith and Jason Eisner (2009).  [http://cs.jhu.edu/~jason/papers/index.html#emnlp09-qg Parser adaptation and projection with quasi-synchronous grammar features]. EMNLP.
::* David A. Smith and Jason Eisner (2008).  [http://cs.jhu.edu/~jason/papers/index.html#emnlp08-bp Dependency parsing by belief propagation]. EMNLP.
::* David Burkett and Dan Klein (2008). [http://www.eecs.berkeley.edu/~dburkett/papers/burkett08-joint_parsing.pdf Two Languages are Better than one (for Syntactic Parsing)]. EMNLP.


;Jun 17 (Ves Stoyanov)
:Aria Haghighi and Dan Klein (2010). [http://www.cs.berkeley.edu/~aria42/pubs/naacl2010-coref2.pdf Coreference Resolution in a Modular, Entity-Centered Model].  NAACL. 


;Jun 10
:General discussion of NAACL 2010 papers.


== Spring 2010 ==


=== Visual scene parsing ===


|}
;May 6 (Rizwan Chaudhry)
:S. Fidler, M. Boben, A. Leonardis (2009).  [http://vicos.fri.uni-lj.si/data/alesl/chapterLeonardis.pdf Learning Hierarchical Compositional Representations of Object Structure].  In Sven J. Dickinson, Aleš Leonardis, and Bernt Schiele (eds.), ''[http://books.google.com/books?id=FnWaqm_AzTQC Object Categorization: Computer and Human Vision Perspectives]''.


::''See also the talk that Geoff Hinton gave here last week, [http://www.clsp.jhu.edu/news-events/abstract.php?sid=20100427 Deep learning with multiplicative interactions].''


;April 22 (Zach Pezzementi) and April 29 (Balakrishnan V)
:Song-Chun Zhu and David Mumford (2006).  [http://www.stat.ucla.edu/~sczhu/papers/Grammar_quest.pdf A Stochastic Grammar of Images].  Foundations and Trends in Computer Graphics and Vision, 2(4):259-362.  [http://www.stat.ucla.edu/~sczhu/JHU_grammar_slides.ppt Slides].
::''[http://www.stat.ucla.edu/~sczhu/papers/Reprint_Grammar.pdf Official final version] is good for screen reading but wastes paper.'' 


;April 15 (Nick Andrews)
:Sven Dickinson (2009). [http://www.cs.toronto.edu/~sven/Papers/cat2009.pdf The Evolution of Object Categorization and the Challenge of Image Abstraction].  In Sven J. Dickinson, Aleš Leonardis, and Bernt Schiele (eds.), ''[http://books.google.com/books?id=FnWaqm_AzTQC Object Categorization: Computer and Human Vision Perspectives]''.


{| style="width:800px" border="1"
=== Generalized A* and related coarse-to-fine ideas ===


;April 8 (Matt Gormley)
:André F. T. Martins, Noah A. Smith, and Eric P. Xing (2009). [http://www.cs.cmu.edu/~afm/Home_files/acl2009.pdf Concise Integer Linear Programming Formulations for Dependency Parsing]. ACL-IJCNLP.


;April 1 (Adam Gerber)
:Aria Haghighi, John DeNero, and Dan Klein (2007). [http://www.eecs.berkeley.edu/~aria42/pubs/factor-astar-naacl07.pdf Approximate Factoring for A* Search].  HLT-NAACL 2007. [http://www.eecs.berkeley.edu/~aria42/presentations/naacl07-factor-astar.ppt Slides].


;March 25 (Zhifei Li)
:Mark Hopkins and Greg Langmead (2009).  [http://www.aclweb.org/anthology/D/D09/D09-1007.pdf Cube pruning as heuristic search]. EMNLP 2009.


;March 11 (Jason Smith)
:Adam Pauls and Dan Klein (2009).  [http://www.cs.berkeley.edu/~adpauls/PAPERS/acl2009.pdf K-Best A* Parsing].  ACL.  [http://www.cs.berkeley.edu/~adpauls/PAPERS/acl2009-final-slides.pdf Slides].


;March 4 (Nathaniel Filardo)
:Pedro Felzenszwalb and David McAllester (2007).  [http://nagoya.uchicago.edu/~dmcallester/astar.pdf The Generalized A* Architecture].  ''Journal of Artificial Intelligence Research''.  [http://people.cs.uchicago.edu/~pff/talks/astar-slides.pdf Slides].


=== Weakly supervised learning of semantics ===


: ''There's also a nice list of papers at the UT reading group on [http://www.cs.utexas.edu/users/ml/clamp/ Connecting Language Acquisition with Machine Perception].''


; Feb 25 (Nick Andrews)
: Luke Zettlemoyer and Michael Collins (2005). [http://people.csail.mit.edu/lsz/papers/zc-uai05.pdf Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars]. In Proceedings of the Twenty First Conference on Uncertainty in Artificial Intelligence (UAI-05).
<!-- <strike>Benjamin Snyder and Regina Barzilay (2007)</strike>. [http://people.csail.mit.edu/bsnyder/papers/db_text-snyder.pdf Database-Text Alignment via Structured Multilabel Classification]. IJCAI. [http://people.csail.mit.edu/bsnyder/presentations/ijcai07.pdf Slides]. -->


; <strike>Feb 11</strike> <font color="red">Feb 18</font> (Ves Stoyanov)
:S.R.K. Branavan, Harr Chen, Luke S. Zettlemoyer, Regina Barzilay (2009). [http://www.cs.washington.edu/homes/lsz/papers/bczb-acl2009.pdf Reinforcement Learning for Mapping Instructions to Actions].  ACL-IJCNLP.


; Feb 4 (Rachael Richardson)
:Percy Liang, Michael I. Jordan, and Dan Klein (2009).  [http://www.eecs.berkeley.edu/~pliang/papers/semantics-acl2009.pdf Learning Semantic Correspondences with Less Supervision].  ACL-IJCNLP.


== Fall 2009 ==


<!-- === Best papers from neighboring fields === -->
=== Bayesian methods ===


; Jan 21 (Zhifei Li)
: Percy Liang, Slav Petrov, Michael I. Jordan, Dan Klein (2007). [http://www.aclweb.org/anthology-new/D/D07/D07-1072.pdf The infinite PCFG using hierarchical Dirichlet processes].  EMNLP 2007.


; Jan 14 (Jason Smith)
: Matthew J. Beal, Zoubin Ghahramani, and Carl Edward Rasmussen (2002).  [http://books.nips.cc/papers/files/nips14/AA01.pdf The Infinite Hidden Markov Model.] NIPS.
::''Also discussed in section 7 of last week's paper.''


; Jan 7 (Jason Eisner)
: Long lecture on the Dirichlet process (infinite) mixture model. 
:: Reading: Yee Whye Teh, Michael Jordan, Matthew Beal and David Blei (2005), [http://www.cse.buffalo.edu/faculty/mbeal/papers/hdp.pdf Hierarchical Dirichlet Processes].
:: There's also a stack of relevant slides from Jordan's 2005 [http://www.cs.berkeley.edu/%7Ejordan/nips-tutorial05.ps NIPS tutorial].


; Dec 3 (Jason Smith)
: Sharon Goldwater and Thomas L. Griffiths (2007), [http://acl.ldc.upenn.edu/P/P07/P07-1094.pdf A Fully Bayesian Approach to Unsupervised Part-of-Speech Tagging].  ACL.
::''This paper uses a Gibbs sampler.  See also the following papers, which compare Gibbs sampling with Variational Bayes and other methods for the same problem:''
::*Mark Johnson (2007), [http://acl.ldc.upenn.edu/D/D07/D07-1031.pdf Why doesn’t EM find good HMM POS-taggers?].  EMNLP.
::*Jianfeng Gao and Mark Johnson (2008), [http://aclweb.org/anthology-new/D/D08/D08-1036.pdf A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers]. EMNLP.


; Nov 26 (Mechanical Turkey)
: Mary McGlohon (2007), [http://www.cs.cmu.edu/~mmcgloho/pubs/sigbovik2.pdf Fried Chicken Bucket Processes].  SIGBOVIK.


; Nov 19 (Jason Eisner)
: Lecture on Gibbs sampling and variational Bayes for LDA and its finite-state generalizations.


; Nov 12 (Jason Eisner)
: Yee Whye Teh (2009), [http://videolectures.net/mlss09uk_teh_nbm/ Nonparametric Bayesian Models].  Video tutorial at Machine Learning Summer School.


; Nov 5 (Zhifei Li)
: David M. Blei, Andrew Y. Ng, & Michael I. Jordan (2003). [http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf Latent Dirichlet Allocation].  Journal of Machine Learning Research 3 (2003) 993-1022.


=== Inference methods ===


;Oct 29 (Markus Dreyer)
:Koller & Friedman, Chapter 11, Optimization as Inference


;Oct 22 (Puyang Xu)
:Koller & Friedman, Chapter 12: Particle-Based Methods


;Oct 15 (Ariya Rastrow)
:Koller & Friedman, Chapters 3 & 4


;Oct 1 (Anoop Deoras), Oct 8 (Carolina Parada)
:MacKay (2003), [http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/356.384.pdf Monte Carlo Methods] and [http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/387.412.pdf Efficient Monte Carlo Methods].  Chapters 29-30 of [http://www.inference.phy.cam.ac.uk/mackay/itila/ Information Theory, Inference, and Learning Algorithms].


=== Multilingual/Cross-lingual learning ===


;Sep 24 (Omar F. Zaidan)
:David Burkett and Dan Klein (2008). [http://www.aclweb.org/anthology/D/D08/D08-1092.pdf Two Languages are Better than One (for Syntactic Parsing)]. EMNLP 2008.


;Sep 17 (Rachael Richardson)
:Alexander Fraser, Renjing Wang, and Hinrich Schütze (2009).  [http://www.aclweb.org/anthology/E/E09/E09-1033.pdf Rich Bitext Projection Features for Parse Reranking]. EACL 2009.


;Sep 10 (Delip Rao)
:Benjamin Snyder, Tahira Naseem, Jacob Eisenstein, and Regina Barzilay (2009).  [http://www.aclweb.org/anthology/N/N09/N09-1010.pdf Adding More Languages Improves Unsupervised Multilingual Part-of-Speech Tagging: a Bayesian Non-Parametric Approach].  NAACL 2009.


:Summary: There are several approaches to learning syntax in an unsupervised fashion, but this paper belongs to the growing line of work that exploits multiple languages to reduce ambiguity in the learning task. The main take-home message is that it is possible to consistently reduce the gap between supervised and unsupervised learning by progressively adding more languages to the mix. This is akin to the multi-view learning results in the machine learning literature. Earlier work by the same authors (EMNLP'08) showed that carefully selecting pairs of languages in multilingual learning yields better accuracies. The current paper builds on that result and shows that it is not really necessary to hand-pick the bilingual pairs; robust performance is obtained by blindly adding more languages.
:Well, not so blindly. Adding more languages to the setup means estimating more parameters in the model, and without careful implementation such a model can become intractable. Section 3 explains the generative setup and the inference procedure in detail. Starting with a Goldwater-style monolingual HMM tagging setup for each language, the HMMs are stitched together using alignment links and latent variables called "superlingual tags", leading to a product-of-experts model. The superlingual tags can be thought of as tags that generate similar kinds of syntactic entities in each of the languages. As with any non-trivial nonparametric Bayesian setup, the inference procedure involves computing integrals that have no closed-form solution. Monte Carlo sampling is a standard approach to such problems, and Gibbs sampling is one such method. The details of the sampling process are in sections 3.5-3.7. This part is a bit technical and will be discussed tomorrow and/or in the sessions on non-parametric Bayesian methods. One could instead use other methods, such as variational methods or expectation propagation.


|-
== Summer 2009 ==
|Aug. 3
|Yi Su
|M. Galley, K. McKeown


[http://acl.ldc.upenn.edu/N/N07/N07-1023.pdf  Lexicalized Markov Grammars for Sentence Compression.]
=== Summer conference papers ===


NAACL-HLT 2007
; July 23 (Zhifei Li)
:Joris Mooij and Bert Kappen, (2008). [http://www.kyb.mpg.de/publications/attachments/NIPS2008-Mooji_5407%5B0%5D.pdf Bounds on marginal probability distributions]. NIPS, 2008.


|-
; July 16 (Markus Dreyer)
|Aug. 11
:Fabien Cromierès, Sadao Kurohashi (2009). [http://www.aclweb.org/anthology/E/E09/E09-1020.pdf An Alignment Algorithm Using Belief Propagation and a Structure-Based Distortion Model]. EACL 2009.
|Nikesh Garera
|L. Shen, G. Satta, A. Joshi.


[http://acl.ldc.upenn.edu/P/P07/P07-1096.pdf   Guided learning for bidirectional sequence classification]
;June 25 (Markus Dreyer)
:Hoifung Poon, Colin Cherry, Kristina Toutanova (2009). [http://aclweb.org/anthology-new/N/N09/N09-1024.pdf Unsupervised Morphological Segmentation with Log-Linear Models]. NAACL 2009.


ACL 2007
;June 19 (Zhifei Li)
:David Chiang, Wei Wang and Kevin Knight, (2009).  [http://www.isi.edu/~chiang/papers/11001.pdf 11,001 new features for statistical machine translation].  NAACL 2009.


|-
== Spring 2009 ==
|Aug. 18
|Markus Dreyer
|D. Talbot, M. Osborne


[http://acl.ldc.upenn.edu/P/P07/P07-1065.pdf  Randomised Language Modelling for Statistical Machine Translation]
=== Information extraction (relevant to TAC) ===


ACL 2007
;Apr 30 (Chuan Liu)
:Jun Wang (2009).  [http://ict.ewi.tudelft.nl/index.php?option=com_pub&task=view&id=2563 Mean-Variance Analysis: A New Document Ranking Theory in Information Retrieval].  European Conference on Information Retrieval.


| They use a space-efficient randomized data structure (Bloom Filter) to store very large n-gram models.
;Apr 23 (Wes Filardo)
:Jun Zhu, Zaiqing Nie, Xiaojing Liu, Bo Zhang, Ji-Rong Wen (2009).  [http://www.www2009.org/proceedings/pdf/p101.pdf StatSnowball: a Statistical Approach to Extracting Entity Relationships].  WWW 2009.


There is a companion paper that people might want to have a quick look at as well, for comparison:
;Apr 16 (Carolina Parada)
:Julien Ah-Pine, Guillaume Jacquet (2009). [http://newdesign.aclweb.org/anthology-new/E/E09/E09-1007.pdf Clique-Based Clustering for improving Named Entity Recognition systems]. EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics. Athens, Greece, March 30 - April 3, 2009


D. Talbot, M. Osborne
;Apr 9 (Jason Smith)
:Marius Pasca (2009). [http://aclweb.org/anthology-new/E/E09/E09-1073.pdf Outclassing Wikipedia in Open-Domain Information Extraction: Weakly-Supervised Acquisition of Attributes over Conceptual Hierarchies]. EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics. Athens, Greece, March 30 - April 3, 2009


[http://acl.ldc.upenn.edu/D/D07/D07-1049.pdf Smoothed Bloom Filter Language Models: Tera-Scale LMs on the Cheap]
=== Domain adaptation across text genres ===


ACL 2007
;Apr 2 (Arnab Ghoshal)
:Corinna Cortes, Mehryar Mohri, Michael Riley, Afshin Rostamizadeh. [http://www.cs.nyu.edu/~mohri/postscript/bias.pdf Sample Selection Bias Correction Theory]. In Proceedings of The 19th International Conference on Algorithmic Learning Theory (ALT 2008).


|-
;Mar 26 (Ariya Rastrow)
|Aug. 30
:Yishay Mansour, Mehryar Mohri, Afshin Rostamizadeh (2008). [http://www.cs.nyu.edu/~mohri/postscript/adap.pdf Domain Adaptation with Multiple Sources]. In Proceedings of Advances in Neural Information Processing Systems (NIPS).
|Delip Rao
|Gideon S. Mann


[http://imls.engr.oregonstate.edu/www/htdocs/proceedings/icml2007/papers/441.pdf   Simple, Robust, Scalable Semi-supervised Learning via Expectation Regularization]
:'' Optional Reading '' John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jenn Wortman. [http://books.nips.cc/papers/files/nips20/NIPS2007_0628.pdf Learning Bounds for Domain Adaptation]. Neural Information Processing Systems - NIPS 2007


Proceedings of the 24th International Conference on Machine Learning 2007
;Mar 12 (Delip Rao)
:Schweikert G, Widmer C, Scholkopf B, Ratsch G (2008) [http://www.kyb.mpg.de/publication.html?publ=5401 An empirical analysis of domain adaptation algorithms for genomic sequence analysis]. In Proceedings of Advances in Neural Information Processing Systems (NIPS)


:''Optional Reading:'' Marx Z, Rosenstein MT, Dietterich TG, Kaelbling LP (2008) [http://web.engr.oregonstate.edu/~tgd/publications/MarxRosensteinKaelblingDietterich-transfer-book.pdf Two algorithms for transfer learning]. In: Inductive Transfer: 10 years later


;Mar 5 (Omar F. Zaidan)
:Su-In Lee, Vassil Chatalbashev, David Vickrey, and Daphne Koller (2007).  [http://ai.stanford.edu/~silee/papers/suinlee_icml2007.pdf Learning a Meta-Level Prior for Feature Relevance from Multiple Related Tasks]. ICML 2007.


|}
=== Recent good papers ===


== Spring 2007 ==
;Feb 26 (Zhifei Li)
:John DeNero, Alex Bouchard, and Dan Klein (2008). [http://www.eecs.berkeley.edu/%7Edenero/research/papers/emnlp08_denero_sampling_alignment.pdf Sampling Alignment Structure under a Bayesian Translation Model]. EMNLP 2008.


Topics:
;Feb 19 (Jason Eisner)
* Morphology (unsupervised learning)
:Impromptu lecture on Dirichlet distributions, Dirichlet processes, etc.
* Recent IR/QA papers (with an NLP or multilingual focus)
* Integrating search and learning


;Feb 12 (Markus Dreyer)
:Tom Minka (2005).  [ftp://ftp.research.microsoft.com/pub/tr/TR-2005-173.pdf Divergence measures and message passing]. Microsoft Research Technical Report.  Slides: [http://research.microsoft.com/en-us/um/people/minka/papers/message-passing/minka-message-passing-slides.pdf pdf], [http://research.microsoft.com/en-us/um/people/minka/papers/message-passing/minka-message-passing.ppt ppt].


{| style="width:800px" border="1"
;Feb 5 (Delip Rao)
!  width="10%"|Date/Time
:David J. Hand (2006). [http://arxiv.org/pdf/math.ST/0606441 Classifier Technology and The Illusion of Progress]. Statistical Science.
!  width="10%"|Presenter
!  width="40%"|Paper(s)  
! Supporting Papers/Notes
|-
|Apr. 19
|John Blatz
|A. Prieditis


[http://www.cs.jhu.edu/~jblatz/nlp-reading-group/prieditis93.pdf  Machine discovery of Effective Admissible Heuristics ]
== Fall 2008 ==


Machine Learning Journal, 1993
=== Programming languages for AI ===


|-
;Dec 13-14
|Apr. 12
:NIPS workshop on probabilistic programming (see [http://probabilistic-programming.org probabilistic-programming.org]), which mentioned a number of other languages and libraries.  
|Markus Dreyer
|A. Haghighi, J. DeNero and D. Klein


[http://www.eecs.berkeley.edu/~aria42/pubs/factor-astar-naacl07.pdf  Approximate Factoring for A* Search]
;Dec 4 (Omar F. Zaidan)
:Jeff Bilmes (~2002).  [http://ssli.ee.washington.edu/~bilmes/gmtk/ The Graphical Models Toolkit (GMTK)].
::The above link includes a draft of the documentation and a tutorial, as well as the binaries.


NAACL-HLT 2007
;Nov 20 (Wren Thornton)
:Avi Pfeffer (2006).  [http://www.eecs.harvard.edu/~avi/IBAL/tutorial/tutorial.html IBAL Tutorial]. 
::Installed in <code>masters*:~wren/local/bin</code> (linux only, so not masters01 or masters02) and <code>clsp:~wren/local/bin</code>.  Add this directory to your <code>PATH</code>.
::See also [http://www.eecs.harvard.edu/~avi/IBAL/index.html other materials], including this paper: Avi Pfeffer (2007).  [http://www.eecs.harvard.edu/~avi/Papers/ibal-ch.pdf The design and implementation of IBAL: A general-purpose probabilistic language].  In Lise Getoor and Ben Taskar (eds.), [http://www.cs.umd.edu/srl-book/ Introduction to Statistical Relational Learning]. 


|-
;Nov 13 (Nathaniel Filardo)
|Mar. 29 & Apr. 5
:Marc Sumner and Pedro Domingos (2007).  [http://alchemy.cs.washington.edu/tutorial/tutorial.html The Alchemy Tutorial].  [http://www.cs.washington.edu/homes/pedrod/psrai.ppt Slides].
|Zhifei Li
::System is installed in <code>masters*:~nwf/public/alchemy</code>. There is a <code>tutorial</code> subdirectory. You should be able to follow along in the tutorial by running commands like
|H. Daume III, J. Langford, and D. Marcu
~nwf/public/alchemy/bin/infer \
    -i ~nwf/public/alchemy/tutorial/basics/uniform.mln \
    -e ~nwf/public/alchemy/tutorial/empty.db \
    -r uniform.results \
    -q Heads


[http://pub.hal3.name/daume06searn.pdf    Search-based structured prediction.]
=== Miscellaneous ===


Machine Learning Journal, forthcoming
;Oct 30, Nov 6
:Discussion of the EMNLP 2008 papers.


|-
;Oct 23 (Damianos Karakos)
|Mar. 8
:I. Csiszar and G. Tusnady (1984).  Information geometry and alternating minimization procedures. Statistics and Decisions, Suppl. Issue 1, pp. 205-237. 
|David Smith
::The paper is not online, but there are online [http://www.clsp.jhu.edu/~sanjeev/520.674/notes/MLFromIncompleteData.pdf course notes from Sanjeev Khudanpur].
|H. Daume III & D. Marcu


[http://pub.hal3.name/daume05laso.pdf    Learning as search optimization: approximate large margin methods for structured prediction.]
=== Probabilistic relational models ===


ICML 2005
;Oct 16 (Nathaniel Filardo)
:Pedro Domingos et al. (2008).  [http://www.cs.washington.edu/homes/pedrod/papers/pilp.pdf Markov Logic].  In L. De Raedt, P. Frasconi, K. Kersting and S. Muggleton (eds.), Probabilistic Inductive Logic Programming (pp. 92-117). New York: Springer.


|-
;Oct 1 (Balakrishnan Varadarajan?)
|Mar. 1
:Nir Friedman, Lise Getoor, Daphne Koller, and Avi Pfeffer (1999).  [http://ai.stanford.edu/people/nir/Papers/FGKP1.pdf Learning Probabilistic Relational Models].  In IJCAI.
|Wei Chen
::''A longer book chapter version is linked from [http://dags.stanford.edu/PRMs/ here], but the link is dead.''
|M. Kaisser, S. Scheible, and B. Webber


[http://trec.nist.gov/pubs/trec15/papers/udeinburgh.qa.final.pdf    Experiments at the University of Edinburgh for the TREC 2006 QA track.]
;Sep 25 (Zhifei Li)
:David Smith and Jason Eisner (2008). [http://cs.jhu.edu/~jason/papers/#emnlp08-bp Dependency Parsing by Belief Propagation]. In EMNLP.


TREC-15
=== Creative uses of classifiers in NLP ===


|They do some fairly deep interpretation of sentences, extracting their predicate-argument structure.
;Sep 18 (Markus Dreyer)
:D. Rosenberg, D. Klein and B. Taskar (2007). [http://www.seas.upenn.edu/~taskar/pubs/mop-memm.pdf Mixture-of-Parents Maximum Entropy Markov Models].  Uncertainty in Artificial Intelligence (UAI), Vancouver, BC, July.


|-
;Sep 11 (Nikesh Garera)
|Feb. 22
:Yoav Goldberg and Michael Elhadad (2007). [http://acl.ldc.upenn.edu/P/P07/P07-1029.pdf SVM Model Tampering and Anchored Learning: A Case Study in Hebrew NP Chunking.] In ACL 2007.
|Eric Harley
|K. Kan Lo & W. Lam


[http://trec.nist.gov/pubs/trec15/papers/cuhk.qa.final.pdf     Using Semantic Relations with World Knowledge for Question Answering]
:Libin Shen; Aravind K. Joshi (2003) [http://acl.ldc.upenn.edu/W/W03/W03-0402.pdf An SVM-based voting algorithm with application to parse reranking.] In HLT-NAACL 2003.


TREC-15
== Summer 2008 ==


|-
=== Good current papers ===
|Feb. 15
|Nikhil Bojja
|C. Monson et al.


[http://acl.ldc.upenn.edu/acl2004/studentws/pdf/monson.pdf      Unsupervised Induction of Natural Language Morphology Inflection Classes]
;August 19 (Zhifei Li)
:Ahmad Emami and Frederick Jelinek (2005). [http://www.springerlink.com/content/q43m746n8p2173w7/fulltext.pdf A neural syntactic language model]. Machine Learning, volume 60, numbers 1-3, September 2005.


ACL Student Workshop '04
;August 5 (Zhifei Li)
:Libin Shen, Jinxi Xu and Ralph Weischedel (2008). [http://www.aclweb.org/anthology-new/P/P08/P08-1066.pdf  A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model]. In ACL 2008.


|-
;July 29 (David Smith)
|Feb. 8
:Ronan Collobert and Jason Weston (2008). [http://icml2008.cs.helsinki.fi/papers/391.pdf A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning]. ICML 2008: Helsinki, Finland.
|Delip Rao
|P. Schone and D. Jurafsky 


[http://acl.ldc.upenn.edu/W/W00/W00-0712.pdf     Knowledge-free induction of morphology using latent semantic analysis ]
;July 22 (Nikesh Garera)
:Zornitsa Kozareva, Ellen Riloff and Eduard Hovy (2008). [http://aclweb.org/anthology-new/P/P08/P08-1119.pdf Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs]. Proc. of ACL-08: HLT, Columbus, OH.


CoNLL 2000
;July 15 (Markus Dreyer)
|However, there was an extension of this work reported in NAACL-2001 that looks at circumfixes and prefix/affix combinations. [http://www.stanford.edu/people/jurafsky/NAACL2001_Morphology_final.pdf]
:Sittichai Jiampojamarn, Colin Cherry, and Grzegorz Kondrak (2008). [http://www.aclweb.org/anthology-new/P/P08/P08-1103.pdf Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion].  Proc. of ACL-08: HLT, Columbus, OH.


;July 8 (Delip Rao)
|-
:Liang Sun, Shuiwang Ji, and Jieping Ye (2008). [http://icml2008.cs.helsinki.fi/papers/270.pdf A Least Squares formulation for Canonical Correlation Analysis]. Proc. of ICML-08, Helsinki
|Feb. 1
|Nikesh Garera
|D. Yarowsky and R. Wicentowski 


[http://www.cs.swarthmore.edu/~richardw/pubs/acl2000.ps      Minimally supervised morphological analysis by multimodal alignment ]
::Hotelling, in 1936, proposed a method to characterize the relationship between two sets of variables, which became widely known as "Canonical Correlation Analysis" (CCA). This involves solving a generalized eigenvalue problem of the form Ax = \lambda Bx, which in the CCA case can be further reduced to a symmetric eigenvalue problem (via Cholesky decomposition). The statistics literature has a general interest in connecting different statistical models to the least squares problem, not only to exploit the simpler solution methods available for such problems but also to relate the models to other methods. The least squares formulation also allows the different models to be extended using the regularization framework. The least squares formulation for the CCA model ties together an older result showing the equivalence of CCA and Fisher LDA with a recent least squares formulation of multi-class LDA.


ACL 2000
::CCA has traditionally been applied in the social sciences and more recently in IR. There is literature applying CCA to problems in cross-lingual IR, image retrieval, and lexicon learning. Interestingly, the ACL'08 paper by Haghighi et al. on learning bilingual lexicons using CCA is not the first paper to do that. There is at least [http://www.springerlink.com/content/2bwtnalmq3m9y5kr/ one paper] as early as 2004 by Cancedda & friends from XRCE that does something similar and is not cited in the ACL paper.
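::For reference, the generalized eigenvalue problem Ax = \lambda Bx mentioned in the notes above can be written out as follows. This is the standard textbook form of CCA, not notation taken from the paper: the symbols C_{xx}, C_{yy}, C_{xy}, w_x, w_y and \rho are assumptions introduced here, and the Cholesky step assumes B is positive definite.

```latex
% Assumed notation: C_{xx}, C_{yy} are the covariance matrices of the two views,
% C_{xy} = C_{yx}^\top is their cross-covariance, and \rho is the canonical correlation.
\[
\underbrace{\begin{pmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{pmatrix}}_{A}
\begin{pmatrix} w_x \\ w_y \end{pmatrix}
= \rho\,
\underbrace{\begin{pmatrix} C_{xx} & 0 \\ 0 & C_{yy} \end{pmatrix}}_{B}
\begin{pmatrix} w_x \\ w_y \end{pmatrix}
\]
% With the Cholesky factorization B = L L^\top and the substitution z = L^\top x,
% the problem A x = \lambda B x becomes a symmetric eigenvalue problem:
\[
\left( L^{-1} A L^{-\top} \right) z = \lambda z .
\]
```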


|For more details refer to  [http://www.cs.swarthmore.edu/~richardw/pubs/thesis.pdf Chapter 4]  of Wicentowski's thesis.
;June 12 (Zhifei Li)
:Hao Zhang, Chris Quirk, Robert C. Moore and Daniel Gildea (2008). [http://www.cs.rochester.edu/~zhanghao/docs/zhang-gildea-acl08.pdf Bayesian Learning of Non-compositional Phrases with Synchronous Parsing]. Proc. of ACL-08: HLT, Columbus, OH.


|}
;June 5 (Markus Dreyer)
:Kuzman Ganchev, João Graça and Ben Taskar (2008). [http://www.seas.upenn.edu/~taskar/pubs/acl08.pdf Better Alignments = Better Translations?] Proc. of ACL-08: HLT, Columbus, OH.


== Fall 2006 ==
;May 29 (Nikesh Garera)
:Aria Haghighi, Percy Liang, Taylor Berg-Kirkpatrick and Dan Klein (2008). [http://www.eecs.berkeley.edu/~aria42/pubs/acl2008-unsup-bilexicon.pdf Learning Bilingual Lexicons from Monolingual Corpora].  Proc. of ACL-08: HLT, Columbus, OH.


Topics:
== Spring 2008 ==
* Machine learning: Margin methods and structured classification
* Linguistics: Syntactic formalisms
* Syntax-based MT


=== Dynamic programming speedups ===


{| style="width:800px" border="1"
;May 15 (David Smith)
!  width="10%"|Date/Time
:Geoffrey Zweig and Mukund Padmanabhan (2000). [http://citeseer.ist.psu.edu/zweig00exact.html Exact Alpha-Beta Computation in Logarithmic Space with Application to MAP Word Graph Construction]. Proc. of ICSLP, Beijing.
!  width="10%"|Presenter
!  width="40%"|Paper(s)  
! Supporting Papers/Notes
|-
|Dec. 13
|Delip Rao
|J. Carbonell et al.


[http://www.mt-archive.info/AMTA-2006-Carbonell.pdf   Context-based machine translation]
::This is a specialization to HMMs of the DBN version given earlier by [http://citeseer.ist.psu.edu/30635.html Binder, Murphy & Russell (1997)].  See also section 3.7.1 of [http://www.cs.ubc.ca/~murphyk/Thesis/thesis.pdf Kevin Murphy's thesis].


AMTA 2006
::''Related work:'' This ''kind'' of trick was really pioneered by D. S. Hirschberg (1975), who cut the space requirements of longest common subsequence from quadratic all the way down to linear.  Hirschberg's version can be [http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Hirsch/ nicely adapted to edit distance]. Now, edit distance (and more generally, multiple sequence alignment) is really just a special case of shortest path in a graph.  Hirschberg (1975), above, was generalized by Korf (1999)'s "Divide and Conquer Bidirectional Search," which [http://www.cse.wustl.edu/~zhang/teaching/cs511/fall04/dcfa.pdf Korf & Zhang (2000)] (who discuss all these algorithms) further improved to "Divide and Conquer Frontier Search."  [http://citeseer.ist.psu.edu/482531.html Edelkamp & Meyer (2001)] give log-space methods for improving A* search for the shortest path in a graph.  (Note that A* search often fits in memory for our DP problems; reducing its memory requirements becomes paramount when we are searching trees that branch without rejoining, e.g., chess.)  [http://citeseer.ist.psu.edu/kaindl97bidirectional.html Bidirectional search], which is distantly related to A*, is also pretty well studied, including recent work at JHU's AMS Dept.
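::To make the Hirschberg trick concrete, here is a minimal sketch of the divide-and-conquer recovery of a longest common subsequence. It is written from the general idea and is not taken from any of the papers above; the function names <code>lcs_lengths</code> and <code>hirschberg_lcs</code> are made up for illustration. The point is that only one row of the DP table is kept at a time, and the midpoint split is found by combining a forward pass with a backward pass.

```python
def lcs_lengths(a, b):
    """Return the last row of the LCS-length table: entry j is LCS(a, b[:j]).

    Only two rows are alive at any time, so the space used is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev


def hirschberg_lcs(a, b):
    """Recover one longest common subsequence of strings a and b."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    # Forward pass: LCS lengths of the first half of a against prefixes of b.
    left = lcs_lengths(a[:mid], b)
    # Backward pass: LCS lengths of the second half of a against suffixes of b.
    right = lcs_lengths(a[mid:][::-1], b[::-1])
    # Pick the split point k of b that maximizes the combined length.
    k = max(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    # Solve the two independent subproblems recursively and concatenate.
    return hirschberg_lcs(a[:mid], b[:k]) + hirschberg_lcs(a[mid:], b[k:])


example = hirschberg_lcs("AGGTAB", "GXTXAYB")
```

::The same forward/backward midpoint idea is what the log-space alpha-beta computation above applies to HMMs, with a different recursion depth and time trade-off.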


|-
;May 1 (John Blatz)
|Dec. 6
:Pedro Felzenszwalb and David McAllester (2006). [http://nagoya.uchicago.edu/~dmcallester/astar.pdf The Generalized A* Architecture]. To appear in the ''Journal of Artificial Intelligence Research''.
|Jason Smith
|M. Galley et al.


[http://www.cs.columbia.edu/nlp/papers/2006/galley_al_06.pdf    Scalable Inference and Training of Context-Rich Syntactic Translation Models]
;Apr. 24 (Zhifei Li)
: Liang Huang (2008). [http://www.cis.upenn.edu/~lhuang3/forest-rerank.pdf Forest Reranking: Discriminative Parsing with Non-Local Features]. To appear in [http://www.ling.ohio-state.edu/acl08/ Proceedings of ACL 2008], Columbus, OH.


ACL 2006
;Apr. 17 (Arnab Ghoshal)
|It may also be helpful to look at:
: Liang Huang and David Chiang (2005). [http://www.cis.upenn.edu/~lhuang3/huang-iwpt-correct.pdf Better k-best parsing].  Proceedings International Workshop on Parsing Technologies.


M. Galley et al.
=== Grammatical inference ===
[http://www.isi.edu/natural-language/projects/rewrite/whatsin.pdf What's in a translation rule?]
HLT/NAACL 2004


;Apr. 10 (Wren Thornton)
: Carl de Marcken (1996), [http://www.aclweb.org/anthology/P96-1044 Linguistic structure as composition and perturbation].  ACL.
: Also see [http://xxx.lanl.gov/abs/cs.CL/9611002 thesis version].


|-
;Apr. 3 (Nathaniel Filardo)
|Nov. 29
: A. Clark (2006). [http://www.cs.rhul.ac.uk/home/alexc/papers/omphalos.pdf Learning Deterministic Context Free Grammars: The Omphalos Competition].
|Balakrishnan V
|D. Marcu et al.


[http://www.isi.edu/~marcu/papers/spmt-emnlp06.pdf    SPMT: Statistical Machine Translation with Syntactified Target Language Phrases ]
;Mar. 27 (Nikesh Garera)
: Stolcke, A. and Omohundro, S. (1993).  [http://citeseer.ist.psu.edu/stolcke93hidden.html Hidden Markov model induction by Bayesian model merging.]  Advances in Neural Information Processing Systems (Morgan Kaufmann, San Mateo, CA), 5, 11-18.


EMNLP 2006
=== Inference in graphical models ===


|-
;Mar. 20 (Delip Rao)
|Nov. 15
: Jonathan Yedidia, William Freeman, and Yair Weiss (2001).  [http://www.stat.ucla.edu/~sczhu/Workshops/sctv01/TR2001-16.pdf Bethe free energy, Kikuchi approximations and belief propagation algorithms.] MERL TR-2001-16.
|Eric Harley
|D. Chiang


[http://www.isi.edu/~chiang/papers/synchtut.pdf     An introduction to synchronous grammars]
;Mar. 6&13 (Markus Dreyer)
: M. J. Wainwright, T. Jaakkola and A. S. Willsky (2005).  [http://www.eecs.berkeley.edu/~wainwrig/Papers/WaiJaaWil05_Upper.pdf A new class of upper bounds on the log partition function]. IEEE Trans. on Information Theory, 51, 2313--2335.


ACL 2006 Tutorial
;Feb. 28 (David Smith)
|Slides from the talk are also available. [http://www.isi.edu/~chiang/papers/synchtut-slides.pdf]
: David MacKay (2003). [http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/413.435.pdf Variational methods.] Chapter 33 of [http://www.inference.phy.cam.ac.uk/mackay/itila/book.html Information Theory, Inference, and Learning Algorithms.]  


|-
;Feb. 21 (David Smith)
|Nov. 8
: Michael I. Jordan et al. (1999).  [http://www.cs.berkeley.edu/~jordan/papers/variational-intro.pdf An Introduction to Variational Methods for Graphical Models] Machine Learning, 37, 183–233.
|Elliott Drabek
|K.Shklovsky


[http://nlp.cs.jhu.edu/~edrabek/grammatical-sketch/tzeltal.pdf    A Grammatical Sketch of Petalcingo Tzeltal]
;Feb. 7&14 (Delip Rao)
: M. I. Jordan and Y. Weiss (2002). [http://www.cs.berkeley.edu/~jordan/papers/jordan-weiss.ps Probabilistic Inference in Graphical Models], The Handbook of Brain Theory and Neural Networks (MIT Press).


Undergraduate Thesis, Reed College, 2005
==  Fall 2007 ==


|It is 77 pages long, but not dense, and I will be skipping the following sections:
=== Semisupervised learning ===


Pages
;Dec. 12 (Delip Rao)
: M. Belkin, P. Niyogi, [http://citeseer.ist.psu.edu/632472.html  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation], TechReport, UChicago, TR-2002-01
: Mikhail Belkin, Partha Niyogi, Vikas Sindhwani, [http://people.cs.uchicago.edu/~vikass/aistats.pdf On Manifold Regularization], AISTATS 2005


01-14 Phonetics and phonology
;Nov. 17 (David Smith)
: X. Zhu, [http://pages.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf  Semi-Supervised Learning Literature Survey]


18-18 Polyvalence
=== Recent parsing papers ===


21-21 Inherent possession and ...
;Nov. 3 (Christo Kirov)
: I. Titov, J. Henderson, [http://www.aclweb.org/anthology-new/P/P07/P07-1080.pdf  Constituent Parsing with Incremental Sigmoid Belief Networks], ACL 2007
46-55 Tense and aspect and other sections


|-
;Oct. 26 (Christo Kirov)
|Nov. 1
: Seginer, Yoav, [http://acl.ldc.upenn.edu/P/P07/P07-1049.pdf  Fast Unsupervised Incremental Parsing (syntax induction)], ACL 2007
|Yi Su
|M. Steedman


Gapping as Constituent Coordination
;Oct. 17 (Markus Dreyer)
: Nakagawa, Tetsuji, [http://www.aclweb.org/anthology/D/D07/D07-1100  Multilingual Dependency Parsing Using Global Features], EMNLP-CoNLL 2007


Linguistics and Philosophy, Vol. 13, 1990, pp.207-264.
=== Text compression ===


|See Yi for photocopies.
;Oct. 10 (Nathaniel W Filardo)
: Mahoney, Matthew, [http://www.cs.fit.edu/~mmahoney/compression/cs200516.pdf  Adaptive Weighting of Context Models for Lossless Data Compression], Florida Institute of Technology, CS Department, Technical report CS-2005-16, EMNLP-CoNLL 2007


|-
''Some other possible papers that we didn't read (not vetted):''
|Oct. 25
|Markus Dreyer
|S. Riezler et al.


[http://acl.ldc.upenn.edu/P/P02/P02-1035.pdf     Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques]
* Approaches that consider recursive text structure
** Charikar et al. (2005), [http://ieeexplore.ieee.org/Xplore/login.jsp?url=/iel5/18/31406/01459058.pdf The smallest grammar problem]
** de Marcken (1996), [http://www.aclweb.org/anthology/P96-1044 Linguistic structure as composition and perturbation] ([http://xxx.lanl.gov/abs/cs.CL/9611002 thesis version]) - read later on 4/10/08
** Katajainen et al. (1986), [http://www3.interscience.wiley.com/cgi-bin/abstract/113447140/ABSTRACT?CRETRY=1&SRETRY=0 Syntax-Directed Compression of Program Files]


ACL 2002
* Approaches that learn hidden state
** Cormack & Horspool (1987), [http://www.cs.uvic.ca/~nigelh/Publications/DMC.pdf Data Compression Using Dynamic Markov Modelling]
** Hu et al. (year?), [http://www.asel.udel.edu/icslp/cdrom/vol1/996/a996.pdf Language Modeling with Stochastic Automata]


* Approaches that allow searches inside the compressed text
** Antonio Farina Martinez (2005), [http://snipurl.com/1s2yc New Compression Codes for Text Databases] (dissertation)
** Culpepper & Moffat (2006), Phrase-Based Pattern Matching in Compressed Text
** Shibata et al. (2000), [http://citeseer.ist.psu.edu/265309.html A Boyer-Moore type algorithm for compressed pattern matching]
** Shibata et al. (1999), [http://citeseer.ist.psu.edu/199938.htm Byte Pair Encoding: A Text Compression Scheme That Accelerates Pattern Matching]
** Udi Manber (1997), [http://portal.acm.org/citation.cfm?id=248639 A text compression scheme that allows fast searching directly in the compressed file]


=== Domain adaptation ===


|-
;Oct. 3 (David Smith)
|Oct. 18
: Shai Ben-David, John Blitzer, Koby Crammer, Fernando Pereira, [http://www.cis.upenn.edu/~blitzer/papers/nips06.pdf Analysis of Representations for Domain Adaptation]
|Erin Fitzgerald
|J. Bresnan & R.M. Kaplan


[http://www.cs.jhu.edu/~jblatz/nlp-reading-group/bresnan-kaplan-1982.pdf       Lexical-Functional Grammar: A Formal System for Grammatical Representation ]
;Sep. 26 (Omar F Zaidan)
: J. Blitzer, R. McDonald, F. Pereira, [http://www.cis.upenn.edu/~blitzer/papers/emnlp06.pdf Domain Adaptation with Structural Correspondence Learning], EMNLP 2006


The Mental Representation of Grammatical Relations, MIT Press, 1982
==  Summer 2007 ==
| BTW, the edited collection that this appears in is generally interesting. Bresnan defends and develops lexicalized grammars in general; the idea of separate surface and semantic roles; and Bresnan & Kaplan's LFG in particular. You should know that she originated (in 1978) the extremely influential idea of lexicalized syntax -- the idea that a grammar is simply a collection of lexical entries to be assembled in standard language-independent ways, but that there are also "lexical redundancy rules" that relate, e.g., active and passive entries for the same verb. Some chapters address morphological and cognitive issues pertaining to lexicalization, including an essay by Pinker on lexicalist learning.


Slides from Erin's presentation can be found [http://www.clsp.jhu.edu/~erin/presentations/LFG.ppt here].
=== Good current papers ===
|-
|Oct. 11
|John Blatz
|L.Xu, D. Wilkinson, F. Southey, & D. Schuurmans 


[http://www.cs.jhu.edu/~jblatz/nlp-reading-group/xu_et_al_ICML_2006.pdf     Discriminative Unsupervised Learning of Structured Predictors ]
;Aug. 30 (Delip Rao)
: Gideon S. Mann, [http://imls.engr.oregonstate.edu/www/htdocs/proceedings/icml2007/papers/441.pdf   Simple, Robust, Scalable Semi-supervised Learning via Expectation Regularization], Proceedings of the 24th International Conference on Machine Learning 2007


ICML 2006
;Aug. 18 (Markus Dreyer)
: D. Talbot, M. Osborne, [http://acl.ldc.upenn.edu/P/P07/P07-1065.pdf  Randomised Language Modelling for Statistical Machine Translation], ACL 2007
: They use a space-efficient randomized data structure (Bloom Filter) to store very large n-gram models.  There is a companion paper that people might want to have a quick look at as well, for comparison:
: D. Talbot, M. Osborne, [http://acl.ldc.upenn.edu/D/D07/D07-1049.pdf Smoothed Bloom Filter Language Models: Tera-Scale LMs on the Cheap], EMNLP-CoNLL 2007
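: As background for the two papers above, here is a minimal sketch of a plain Bloom filter used as an n-gram membership test. It is a generic textbook version, not the authors' data structure: it stores only set membership, not the counts or probabilities a language model needs, and the class name, the sizes, and the salted SHA-256 hashing scheme are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Set every bit position associated with the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # The item may be present only if all of its bits are set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


ngrams = BloomFilter(num_bits=10_000, num_hashes=4)
for ngram in ["the cat sat", "cat sat on", "sat on the"]:
    ngrams.add(ngram)
```

: A stored n-gram is always reported as present. An unseen n-gram is usually reported as absent, but can be a false positive with a probability that depends on the number of bits, the number of hash functions, and the number of stored items.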


|-
;Aug. 11 (Nikesh Garera)
|Oct. 4
: L. Shen, G. Satta, A. Joshi, [http://acl.ldc.upenn.edu/P/P07/P07-1096.pdf   Guided learning for bidirectional sequence classification], ACL 2007
|Nikesh Garera
|A. Culotta & J. Sorensen    


[http://acl.ldc.upenn.edu/acl2004/main/pdf/244_pdf_2-col.pdf       Dependency Tree Kernels for Relation Extraction ]
;Aug. 3 (Yi Su)
: M. Galley, K. McKeown, [http://acl.ldc.upenn.edu/N/N07/N07-1023.pdf Lexicalized Markov Grammars for Sentence Compression], NAACL-HLT 2007


ACL 2004
;Jul. 18 (David Smith)
-----
: P. Liang, S. Petrov, M. Jordan, D. Klein, [http://acl.ldc.upenn.edu/D/D07/D07-1072.pdf The Infinite PCFG Using Hierarchical Dirichlet Processes], EMNLP-CoNLL 2007
D. Zelenko, C. Aone, & A. Richardella
[http://www.jmlr.org/papers/volume3/zelenko03a/zelenko03a.pdf Kernel Methods for Relation Extraction]
JMLR, Volume 3, 2003


|-
;Jul. 6 (Christopher White)
|Sept. 27
: A. Braunstein, M. Mezard, R. Zecchina, [http://users.ictp.it/~zecchina/rsa.pdf Survey propagation: an algorithm for satisfiability], Random Structures and Algorithms, 2005.
|David Smith
:: We sent some questions to Zecchina.
|C. Cortes, P. Haffner, & M. Mohri   
: Lukas Kroc, Ashish Sabharwal and Bart Selman. [http://www.cs.cornell.edu/~sabhar/publications/surveyPropUAI07b.pdf Survey propagation revisited: An empirical study]. 23rd UAI, 2007.


[http://www.cs.nyu.edu/~mohri/postscript/kernel.ps     Rational Kernels ]
;Jun. 21 (Christopher White)
: K. Murphy, Y. Weiss, M. Jordan, [http://citeseer.ist.psu.edu/murphy99loopy.html Loopy belief propagation for approximate inference: An empirical study], 15th UAI, pages 467-475, 1999
:: ... discussing (loopy) belief propagation as background for survey propagation, a topic which has been getting more attention lately for its ability to "solve very large hard combinatorial problems, such as determining the satisfiability of Boolean formulas."  [http://research.microsoft.com/%7Ecmbishop/PRML/Bishop-PRML-sample.pdf Chapter 8 of Chris Bishop's textbook] is supposed to be a good treatment of graphical models overall.  He covers BP in section 8.4.4 after first presenting factor graphs in 8.4.3.  David MacKay's treatment of BP, also in terms of factor graphs, is in [http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/334.340.pdf chapter 26] of his book [http://www.inference.phy.cam.ac.uk/mackay/itprnn/book.html]. It's worth reading this chapter in full, perhaps first reading [http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/240.247.pdf chapter 16].  ... the update equations are given as (26.11) and (26.12) ... [substantial further discussion by Jason was here] Some people may prefer Bishop's style, others MacKay's.


NIPS 2003
;Jun. 14 (David Smith)
|Papers extending rational kernels, including results on positive semidefinite cases, are at:[http://www.cs.nyu.edu/~mohri/rational.html]
: X. Zhu, Z. Ghahramani, J. Lafferty, [http://acl.ldc.upenn.edu/N/N07/N07-1026.pdf Semi-supervised learning using Gaussian fields and harmonic functions], ICML 2003


For the record (not to be read), there is an interesting parallel line of research on Fisher kernels over strings, e.g., this paper by Saunders, Shawe-Taylor and Vinokourov: [http://citeseer.ist.psu.edu/524921.html]
;Jun. 6 (Nikesh Garera)
: A. Alexandrescu, K. Kirchhoff, [http://acl.ldc.upenn.edu/N/N07/N07-1026.pdf Data-Driven Graph Construction for Semi-Supervised Graph-Based Learning in NLP], HLT/NAACL 2007


|-
;Jun. 2 (Erin Fitzgerald)
|Sept. 20
: J. Jiang, C. Zhai, [http://acl.ldc.upenn.edu/N/N07/N07-1015.pdf A Systematic Exploration of the Feature Space for Relation Extraction], HLT/NAACL 2007
|Elliot Drabek
|K.Q. Weinberger, F. Sha, & L.K. Saul   


[http://www.cs.berkeley.edu/~feisha/pubs/learning_kernel04.pdf     Learning a kernel matrix for nonlinear dimensionality reduction ]
;May 17 (Markus Dreyer)
: M. Galley, K. McKeown, [http://acl.ldc.upenn.edu/N/N07/N07-1023.pdf Lexicalized Markov Grammars for Sentence Compression], HLT/NAACL 2007


ICML 2004
;May 10 (David Smith )
: M. Johnson, T. Griffiths, and S. Goldwater, [http://acl.ldc.upenn.edu/N/N07/N07-1018.pdf Bayesian Inference for PCFGs via Markov Chain Monte Carlo], HLT/NAACL 2007


|S.T. Roweis & L.K. Saul   
==  Spring 2007 ==


[http://www.sciencemag.org/cgi/reprint/290/5500/2323.pdf     Nonlinear Dimensionality Reduction by Locally Linear Embedding ]
=== Integrating search and learning ===
;Apr. 19 (John Blatz)
: A. Prieditis, [http://www.cs.jhu.edu/~jblatz/nlp-reading-group/prieditis93.pdf Machine discovery of Effective Admissible Heuristics ], Machine Learning Journal, 1993


Science, 22 December 2000
;Apr. 12 (Markus Dreyer)
-----
: A. Haghighi, J. DeNero and D. Klein, [http://www.eecs.berkeley.edu/~aria42/pubs/factor-astar-naacl07.pdf   Approximate Factoring for A* Search], NAACL-HLT 2007
J.B. Tenenbaum, V. De Silva, & J.C. Langford 
[http://web.mit.edu/cocosci/Papers/sci_reprint.pdf A global geometric framework for nonlinear dimensionality reduction ]
Science, 22 December 2000


|-
;Mar. 29 & Apr. 5 (Zhifei Li)
|Sept. 13
: H. Daume III, J. Langford, and D. Marcu, [http://pub.hal3.name/daume06searn.pdf    Search-based structured prediction], Machine Learning Journal, forthcoming
|Roy Tromble
|L. Xu, J. Neufeld, B. Larson, & D. Schuurmans     


[http://books.nips.cc/papers/files/nips17/NIPS2004_0834.pdf     Maximum Margin Clustering ]
;Mar. 8 (David Smith)
: H. Daume III & D. Marcu, [http://pub.hal3.name/daume05laso.pdf   Learning as search optimization: approximate large margin methods for structured prediction], ICML 2005


NIPS 2004
=== Recent IR/QA papers (with an NLP or multilingual focus) ===


|}
;Mar. 1 (Wei Chen)
: M. Kaisser, S. Scheible, and B. Webber, [http://trec.nist.gov/pubs/trec15/papers/udeinburgh.qa.final.pdf    Experiments at the University of Edinburgh for the TREC 2006 QA track], TREC-15
:: They do some fairly deep interpretation of sentences, extracting their predicate-argument structure.


== Summer 2006 ==
;Feb. 22 (Eric Harley)
: K. Kan Lo & W. Lam, [http://trec.nist.gov/pubs/trec15/papers/cuhk.qa.final.pdf    Using Semantic Relations with World Knowledge for Question Answering], TREC-15


== Spring 2006 ==
=== Unsupervised learning of morphology ===


== Fall 2005 ==
;Feb. 15 (Nikhil Bojja)
: C. Monson et al., [http://acl.ldc.upenn.edu/acl2004/studentws/pdf/monson.pdf Unsupervised Induction of Natural Language Morphology Inflection Classes], ACL Student Workshop '04


{| style="width:800px" border="1"
;Feb. 8 (Delip Rao)
!  width="10%"|Date/Time
: P. Schone and D. Jurafsky,  [http://acl.ldc.upenn.edu/W/W00/W00-0712.pdf      Knowledge-free induction of morphology using latent semantic analysis ], CoNLL 2000
!  width="10%"|Presenter
:: However, there was an extension of this work reported in NAACL-2001 that looks at circumfixes and prefix/suffix combinations. [http://www.stanford.edu/people/jurafsky/NAACL2001_Morphology_final.pdf]
!  width="40%"|Paper(s)
!  Supporting Papers/Notes


|-
;Feb. 1 (Nikesh Garera)
|Sept. 14
: D. Yarowsky and R. Wicentowski, [http://www.cs.swarthmore.edu/~richardw/pubs/acl2000.ps Minimally supervised morphological analysis by multimodal alignment], ACL 2000
|Nikesh Garera
:: For more details refer to  [http://www.cs.swarthmore.edu/~richardw/pubs/thesis.pdf Chapter 4]  of Wicentowski's thesis.
|M. Jordan   


Statistical Learning Theory Chapter 8 (Exponential family and Generalized linear models)
==  Fall 2006 ==


|-
=== Syntax-based MT ===
|Sept. 21
|Arnab Ghoshal
|M. Jordan   


Statistical Learning Theory, Chapters 2 & 3
;Dec. 13 (Delip Rao)
:J. Carbonell et al., [http://www.mt-archive.info/AMTA-2006-Carbonell.pdf  Context-based machine translation], AMTA 2006


|-
;Dec. 6 (Jason Smith)
|Oct. 20
:M. Galley et al., [http://www.cs.columbia.edu/nlp/papers/2006/galley_al_06.pdf Scalable Inference and Training of Context-Rich Syntactic Translation Models], ACL 2006
|Roy Tromble
:It may also be helpful to look at:
|Sheila M. Reynolds, Jeff A. Bilmes   
:M. Galley et al., [http://www.isi.edu/natural-language/projects/rewrite/whatsin.pdf What's in a translation rule?], HLT/NAACL 2004


[http://ssli.ee.washington.edu/people/bilmes/mypapers/sheila-hlt05.pdf Part-of-Speech Tagging using Virtual Evidence and Negative Training.]
;Nov. 29 (Balakrishnan V)
:D. Marcu et al., [http://www.isi.edu/~marcu/papers/spmt-emnlp06.pdf SPMT: Statistical Machine Translation with Syntactified Target Language Phrases], EMNLP 2006


Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, 2005, pp. 459-466.
;Nov. 15 (Eric Harley)
:D. Chiang, [http://www.isi.edu/~chiang/papers/synchtut.pdf An introduction to synchronous grammars], ACL 2006 Tutorial
: Slides from the talk are also available. [http://www.isi.edu/~chiang/papers/synchtut-slides.pdf]


|-
=== Linguistics: Syntactic formalisms ===
|Oct. 27
|Markus Dreyer
|D. Roth and W. Yih 


[http://l2r.cs.uiuc.edu/~danr/Papers/RothYi05.pdf Integer Linear Programming Inference for Conditional Random Fields.]
;Nov. 8 (Elliott Drabek)
: K. Shklovsky, [http://nlp.cs.jhu.edu/~edrabek/grammatical-sketch/tzeltal.pdf A Grammatical Sketch of Petalcingo Tzeltal], Undergraduate Thesis, Reed College, 2005
:It is 77 pages long but not dense, and I will be skipping the following sections (page ranges first):
:* 01-14 Phonetics and phonology
:* 18-18 Polyvalence
:* 21-21 Inherent possession and ...
:* 46-55 Tense and aspect and other sections


ICML '2005
;Nov. 1 (Yi Su)
: M. Steedman, Gapping as Constituent Coordination, Linguistics and Philosophy, Vol. 13, 1990, pp. 207-264.
: See Yi for photocopies.


;Oct. 25 (Markus Dreyer)
: S. Riezler et al., [http://acl.ldc.upenn.edu/P/P02/P02-1035.pdf Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques], ACL 2002


|-
;Oct. 18 (Erin Fitzgerald)
|Nov. 4
: J. Bresnan & R.M. Kaplan,  [http://www.cs.jhu.edu/~jblatz/nlp-reading-group/bresnan-kaplan-1982.pdf      Lexical-Functional Grammar: A Formal System for Grammatical Representation ], The Mental Representation of Grammatical Relations, MIT Press, 1982
|Jason Riesa
::The edited collection that this appears in is generally interesting. Bresnan defends and develops lexicalized grammars in general; the idea of separate surface and semantic roles; and Bresnan & Kaplan's LFG in particular. You should know that she originated (in 1978) the extremely influential idea of lexicalized syntax -- the idea that a grammar is simply a collection of lexical entries to be assembled in standard language-independent ways, but that there are also "lexical redundancy rules" that relate, e.g., active and passive entries for the same verb. Some chapters address morphological and cognitive issues pertaining to lexicalization, including an essay by Pinker on lexicalist learning. Slides from Erin's presentation can be found [http://www.clsp.jhu.edu/~erin/presentations/LFG.ppt here].
|Luke S. Zettlemoyer, Michael Collins.  


[http://people.csail.mit.edu/lsz/papers/uai05.pdf  Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial]
=== Machine learning: Margin methods and structured classification ===


Proceedings of UAI 2005
;Oct. 11 (John Blatz)
: L. Xu, D. Wilkinson, F. Southey, & D. Schuurmans, [http://www.cs.jhu.edu/~jblatz/nlp-reading-group/xu_et_al_ICML_2006.pdf Discriminative Unsupervised Learning of Structured Predictors], ICML 2006


|-
;Oct. 4 (Nikesh Garera)
|Nov. 16
: A. Culotta & J. Sorensen, [http://acl.ldc.upenn.edu/acl2004/main/pdf/244_pdf_2-col.pdf      Dependency Tree Kernels for Relation Extraction ], ACL 2004
|Safiullah Shareef
|Hassan Sawaf, Jörg Zaplo, Hermann Ney


[http://www.elsnet.org/arabic2001/sawaf.pdf Statistical Classification Methods for Arabic News Articles]
:D. Zelenko, C. Aone, & A. Richardella, [http://www.jmlr.org/papers/volume3/zelenko03a/zelenko03a.pdf Kernel Methods for Relation Extraction], JMLR, Volume 3, 2003


|-
;Sep. 27 (David Smith)
|Nov. 23
: C. Cortes, P. Haffner, & M. Mohri, [http://www.cs.nyu.edu/~mohri/postscript/kernel.ps      Rational Kernels ], NIPS 2003
|Roy Tromble
::Papers extending rational kernels, including results on positive semidefinite cases, are at: [http://www.cs.nyu.edu/~mohri/rational.html]. For the record (not required reading), there is an interesting parallel line of research on Fisher kernels over strings, e.g. this paper by Saunders, Shawe-Taylor and Vinokourov: [http://citeseer.ist.psu.edu/524921.html]
|Sutton, Charles and McCallum, Andrew


[http://www.aclweb.org/anthology/H/H05/H05-1094  Composition of Conditional Random Fields for Transfer Learning]
;Sep. 20 (Elliott Drabek)
: K.Q. Weinberger, F. Sha, & L.K. Saul, [http://www.cs.berkeley.edu/~feisha/pubs/learning_kernel04.pdf      Learning a kernel matrix for nonlinear dimensionality reduction ], ICML 2004


Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing 2005
: S.T. Roweis & L.K. Saul, [http://www.sciencemag.org/cgi/reprint/290/5500/2323.pdf      Nonlinear Dimensionality Reduction by Locally Linear Embedding ], Science, 22 December 2000


|}
:J.B. Tenenbaum, V. De Silva, & J.C. Langford, [http://web.mit.edu/cocosci/Papers/sci_reprint.pdf  A global geometric framework for nonlinear dimensionality reduction ], Science, 22 December 2000


==  Summer 2005 ==
;Sep. 13 (Roy Tromble)
: L. Xu, J. Neufeld, B. Larson, & D. Schuurmans, [http://books.nips.cc/papers/files/nips17/NIPS2004_0834.pdf      Maximum Margin Clustering ], NIPS 2004


==  Summer 2006 ==

=== Recent HLT-NAACL papers ===


;Aug. 4 (David Smith)
:Sharon Goldwater, Thomas L. Griffiths, Mark Johnson, [http://acl.ldc.upenn.edu/P/P06/P06-1085.pdf  Contextual Dependencies in Unsupervised Word Segmentation], ACL 2006
:Anyone looking for a more straight-up language modeling discussion can compare:
:* Yee Whye Teh, [http://portal.acm.org/ft_gateway.cfm?id=1220299&type=pdf&coll=GUIDE&dl=GUIDE&CFID=15174251&CFTOKEN=31671821 A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes], ACL 2006
:More resources:
:*[http://www.mlpedia.org/index.php?title=Dirichlet_process  Machine Learning MLPedia page on Dirichlet Processes]
:*[http://www.cs.berkeley.edu/~jordan/nips-tutorial05.ps Michael Jordan's NIPS 2005 tutorial: Nonparametric Bayesian Methods: Dirichlet Processes, Chinese Restaurant Processes and All That]
:*Y. Teh, M. Jordan, M. Beal, and D. Blei, [http://www.cs.princeton.edu/~blei/papers/TehJordanBealBlei2004.pdf  Hierarchical Dirichlet processes], Journal of the American Statistical Association, 2006


== Spring 2005 ==
;Jul. 20 (Roy Tromble)
: Mehryar Mohri, Brian Roark, [http://www.cslu.ogi.edu/people/roark/spcfg.pdf Probabilistic Context-Free Grammar Induction Based on Structural Zeros], HLT-NAACL, 2006


;Jul. 6 (Keith Hall)
: Charles Sutton, Michael Sindelar, Andrew McCallum, [http://www.cs.umass.edu/~casutton/publications/bags-hlt2006.pdf Reducing Weight Undertraining in Structured Discriminative Learning], HLT-NAACL, 2006

;Jun. 31 (Markus Dreyer)
: Joakim Nivre, Johan Hall et al., [http://www.cnts.ua.ac.be/conll/pdf/22124.pdf Labeled Pseudo-Projective Dependency Parsing with Support Vector Machines], CoNLL 2006
:J. Nivre, J. Nilsson, [http://www.vxu.se/msi/users/nivre/papers/acl05.pdf Pseudo-Projective Dependency Parsing], ACL 2005


==  Fall 2004 ==
;Jun. 24 (David Smith)
: Percy Liang, Ben Taskar, Dan Klein, [http://www.eecs.berkeley.edu/~pliang/papers/alignment-naacl2006.pdf Alignment by Agreement], HLT-NAACL 2006


==  Spring 2006 ==

=== Algorithms for NLP (mostly) ===


;May 18 (Markus Dreyer)
: Jonathan May, Kevin Knight, [http://www.isi.edu/~jonmay/pubs/naacl06.pdf A Better N-Best List: Practical Determinization of Weighted Finite Tree Automata], Proc. NAACL-HLT, 2006


;May 11 (John Blatz)
: M. Gengler, [http://www.cs.jhu.edu/~jblatz/gengler.pdf An introduction to parallel dynamic programming], Lecture Notes in Computer Science, 1996


==  Summer 2004 ==
;May 4 (David Smith)
: C. E. R. Alves, E. N. Cáceres, F. Dehne, [http://citeseer.ist.psu.edu/724170.html Parallel dynamic programming for solving the string editing problem on a CGM/BSP], SPAA 2002


;Apr. 20 (Balakrishnan V)
: Richard M. Karp, Michael O. Rabin, [http://www.research.ibm.com/journal/rd/312/ibmrd3102P.pdf Efficient Randomized Pattern-Matching Algorithms], IBM Journal of Research and Development, 1987

;Mar. 31, Apr. 6 (Eric Harley)
:Ben Taskar, Simon Lacoste-Julien, Dan Klein, [http://acl.ldc.upenn.edu/H/H05/H05-1010.pdf A Discriminative Matching Approach to Word Alignment], HLT-EMNLP 2005
:A related paper is
:Ryan McDonald, Fernando Pereira, Kiril Ribarov, Jan Hajic, [http://acl.ldc.upenn.edu/H/H05/H05-1066.pdf Non-projective Dependency Parsing using Spanning Tree Algorithms], HLT-EMNLP 2005


;Mar. 17 (Elliott Franco Drabek)
:Necip Fazil Ayan, Bonnie J. Dorr, Christof Monz, [http://www.cs.umd.edu/~nfa/Publications/ayan-emnlp05-alp.pdf Alignment Link Projection Using Transformation-Based Learning], HLT-EMNLP 2005


==  Spring 2004 ==
;Mar. 10 (Roy Tromble)
:Terry Koo, Michael Collins, [http://www.aclweb.org/anthology/H/H05/H05-1064 Hidden-Variable Models for Discriminative Reranking], HLT-EMNLP 2005


;Mar. 3 (Jason Riesa)
:Hal Daume III, Daniel Marcu, [http://www.isi.edu/~hdaume/docs/daume06megam.pdf Domain Adaptation for Statistical Classifiers], Journal of Artificial Intelligence Research, 2006
:J. Gorman, J. Curran, [http://acl.ldc.upenn.edu/W/W05/W05-1011.pdf  Approximate Searching for Distributional Similarity], Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition, 2005

;Feb. 23 (Omar F. Zaidan)
:Ravichandran, Pantel, Hovy, [http://arxiv.org/abs/cmp-lg/9606019 Randomized Algorithms and NLP: Using Locality Sensitive Hash Function for High Speed Noun Clustering], ACL 2005


=== Consensus decoding ===


==  Fall 2003 ==
;Feb. 16 (Noah A Smith)
: Khalil Sima'an, [http://arxiv.org/abs/cmp-lg/9606019 Computational Complexity of Probabilistic Disambiguation by means of Tree-Grammars], COLING 1996
:Francisco Casacuberta, Colin de la Higuera, [http://citeseer.ifi.unizh.ch/casacuberta00computational.html Computational complexity of problems on probabilistic grammars and transducers], LNAI 1981
: For a longer and more HMM/compbio view and extended results, see
:Rune B. Lyngsoe, Christian N. S. Pedersen, The Consensus String Problem and the Complexity of Comparing Hidden Markov Models, Journal of Computer and System Sciences 65:545-69, 2002


=== Extracting idioms ===

;Feb. 9 (John Blatz)
: Dominic Widdows, Beate Dorow, [http://acl.ldc.upenn.edu/W/W05/W05-1006.pdf Automatic Extraction of Idioms using Graph Analysis and Asymmetric Lexicosyntactic Patterns], Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition, 2005
:Afsaneh Fazly, Suzanne Stevenson, [http://www.cs.toronto.edu/~suzanne/papers/paclic-ref.pdf Automatic Acquisition of Knowledge about Multiword Predicates], Proceedings of the 19th Pacific Asia Conference on Language, Information, and Computation (PACLIC 2005).


==  Fall 2005 ==


=== Good recent papers ===


;Nov. 23 (Roy Tromble)
: Sutton, Charles and McCallum, Andrew, [http://www.aclweb.org/anthology/H/H05/H05-1094  Composition of Conditional Random Fields for Transfer Learning], HLT-EMNLP 2005


;Nov. 16 (Safiullah Shareef)
: Hassan Sawaf, Jörg Zaplo, Hermann Ney, [http://www.elsnet.org/arabic2001/sawaf.pdf Statistical Classification Methods for Arabic News Articles]


== Spring 2003 ==
;Nov. 4 (Jason Riesa)
: Luke S. Zettlemoyer, Michael Collins, [http://people.csail.mit.edu/lsz/papers/uai05.pdf  Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars], UAI 2005
 
;Oct. 27 (Markus Dreyer)
: D. Roth and W. Yih, [http://l2r.cs.uiuc.edu/~danr/Papers/RothYi05.pdf Integer Linear Programming Inference for Conditional Random Fields], ICML 2005


{| style="width:800px" border="1"
;Oct. 20 (Roy Tromble)
!  width="10%"|Date/Time
: Sheila M. Reynolds, Jeff A. Bilmes, [http://ssli.ee.washington.edu/people/bilmes/mypapers/sheila-hlt05.pdf Part-of-Speech Tagging using Virtual Evidence and Negative Training], HLT-EMNLP 2005
!  width="10%"|Presenter
!  width="40%"|Paper(s)  
!  Supporting Papers/Notes


|-
=== Statistical learning theory ===
|Feb. 13
|David Smith
|K. Church   


[http://www.research.att.com/~kwc/published_2000_Coling.pdf Empirical Estimates of Adaptation: The chance of Two Noriega's is closer to p/2 than p^2]
;Sep. 21 (Arnab Ghoshal)
: M. Jordan, Statistical Learning Theory, Chapters 2-3


Coling 2000, pp. 173-179
;Sep. 14 (Nikesh Garera)
: M. Jordan, Statistical Learning Theory, Chapter 8 (Exponential family and Generalized linear models)


==  Summer 2005 ==


|-
=== Gibbs sampling ===
|Feb. 19
|Elliott Drabek
|A. Lopez, M. Nossal, R. Hwa, P. Resnik


[http://www.cs.umd.edu/users/alopez/pub/lrec02-lnhr.pdf Word-level Alignment for Multilingual Resource Acquisition]
;Sep. 1 (John, Markus, & Nikesh)
:B. Walsh, [http://nitro.biosci.arizona.edu/courses/EEB581-2004/handouts/Gibbs.pdf Markov Chain Monte Carlo and Gibbs Sampling], Lecture Notes for EEB 581, version 26 April 2004


Proceedings of the 2002 LREC Workshop on Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Language Data
;Aug. 26 (Roy Tromble)
:Jenny Rose Finkel, Trond Grenager, Christopher Manning, [http://www.aclweb.org/anthology/W/W05/W05-0511  Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling], ACL 2005


=== AI ===


|-
;Aug. 19 (John Blatz)
|Feb. 26
:Niyogi, Sourabh, [http://www.aclweb.org/anthology/W/W05/W05-0511 Steps Toward Deep Lexical Acquisition], ACL 2005
|Elliott Drabek
|Steven Abney  


[http://www.vinartus.net/spa/02a.pdf Bootstrapping]
=== Unsupervised or semi-supervised EM ===


ACL'02
;Aug. 5 (Adam)
:Duh, Kevin  and  Kirchhoff, Katrin, [http://www.aclweb.org/anthology/W/W05/W05-0708 Tagging of Dialectal Arabic: A Minimally Supervised Approach], ACL 2005


|-
;Jul. 28 (Zak)
|Mar.6
: Takuya Matsuzaki, Yusuke Miyao, Jun'ichi Tsujii, [http://www.aclweb.org/anthology/P/P05/P05-1010  Probabilistic CFG with Latent Annotations], ACL 2005
|Paola Virga
| Carl M. Kadie, Christopher Meek, David Heckerman


[http://research.microsoft.com/~carlk/papers/cfw.htm A Collaborative Filtering System Using Posteriors Over Weights of Evidence]
;Jul. 21 (Keith)
:Sharon Goldwater and Mark Johnson, [http://www.aclweb.org/anthology/W/W05/W05-0615 Representational Bias in Unsupervised Learning of Syllable Structure], ACL 2005


Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 2002.
;Jul. 21 (Damianos)
:Ando, Rie  and  Zhang, Tong, [http://www.aclweb.org/anthology/P/P05/P05-1001 A High-Performance Semi-Supervised Learning Method for Text Chunking], ACL 2005


=== Learning optimality-theoretic grammars ===


|-
;Jul. 14 (John Blatz)
|Mar.20
: Ying Lin, [http://www.aclweb.org/anthology/P/P05/P05-1043 Learning Stochastic OT Grammars: A Bayesian Approach using Data Augmentation and Gibbs Sampling], ACL 2005
|Roy Tromble
 
|Nikita Schmid, Ahmed Patel
;Jul. 14 (Roy Tromble)
: Sharon Goldwater and Mark Johnson, [http://www.cog.brown.edu:16080/~sgwater/papers/OTvar03.pdf Learning OT Constraint Rankings Using a Maximum Entropy Model], Proceedings of the Workshop on Variation within Optimality Theory, 2003


[http://arXiv.org/abs/cs/0201008 Using Tree Automata and Regular Expressions to Manipulate Hierarchically Structured Data]
==  Spring 2005 ==


|-
;May 7 (Markus Dreyer)
|Apr.10
: M. Diligenti, F.M. Coetzee, S. Lawrence, C.L. Giles, M. Gori, [http://citeseer.ist.psu.edu/diligenti00focused.html  Focused Crawling Using Context Graphs], 26th International Conference on Very Large Databases, VLDB 2000
|
:Adam Kilgarriff, Gregory Grefenstette, [http://mitpress.mit.edu/catalog/item/default.asp?tid=10839&ttype=6 Introduction to the Special Issue on the Web as Corpus], Computational Linguistics, 2003
|V. N. Vapnik


[http://www.cscs.umich.edu/~crshalizi/reviews/vapnik-nature/ The Nature of Statistical Learning Theory], Intro and Chapters 1, 2A
;Apr. 28 (Damianos Karakos)
: Alessandro Moschitti and Roberto Basili, [http://ai-nlp.info.uniroma2.it/moschitti/publications.htm  Complex Linguistic Features for Text Classification: A comprehensive study], Proceedings of the 26th European Conference on Information Retrieval Research (ECIR 2004)


|-
;Apr. 21 (Omar F. Zaidan)
|Apr.17
:Tin Kam Ho, Jonathan J. Hull, Sargur N. Srihari, [http://www.crc.ricoh.com/~hull/pubs/ho_pami94.pdf  Decision Combination in Multiple Classifier Systems], IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 1, Jan. 1994
|Roy Tromble
:Dan Klein, Kristina Toutanova, H. Tolga Ilhan, Sepandar D. Kamvar and Christopher D. Manning, [http://www-nlp.stanford.edu/~manning/papers/wsd-workshop-camera-mathtime.ps Combining Heterogeneous Classifiers for Word-Sense Disambiguation], ACL 2002
|V. N. Vapnik


[http://www.cscs.umich.edu/~crshalizi/reviews/vapnik-nature/ The Nature of Statistical Learning Theory], Chapters 2B - 4A
;Apr. 16 (Brock Pytlik)
: V. Lavrenko, S.L. Feng, R. Manmatha, [http://ciir.cs.umass.edu/pubfiles/mm-325.pdf  Statistical models for automatic video annotation and retrieval], ICASSP 2004
: S.L. Feng, R. Manmatha, V. Lavrenko, [http://ciir.cs.umass.edu/pubfiles/mm-333.pdf Multiple Bernoulli Relevance Models for Image and Video Annotation]
:: The first is a short paper about the relevance model.  The second is a follow-up paper that details a subsequent model based on the CRM.


|-
;Apr. 9 (Noah A Smith)
|Apr. 24
:G. Elidan, N. Friedman, [http://www.cs.huji.ac.il/~nirf/Abstracts/ElF2.html The Information Bottleneck EM Algorithm], UAI 2003
|Paola
:G. Elidan, N. Friedman, [http://jmlr.csail.mit.edu/papers/v6/elidan05a.html Learning Hidden Variable Networks], JMLR 2005
|V. N. Vapnik


[http://www.cscs.umich.edu/~crshalizi/reviews/vapnik-nature/ The Nature of Statistical Learning Theory], Chapters 4B - 5A
;Feb. 25, Mar. 4, Mar. 11, Apr. 2 (David Smith)
: M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, [http://www.cs.berkeley.edu/~jordan/papers/variational-intro.ps.gz Learning in Graphical Models], MIT Press, 1999


|-
==  Fall 2004 ==
|May 1
|Noah
|V. N. Vapnik


[http://www.cscs.umich.edu/~crshalizi/reviews/vapnik-nature/ The Nature of Statistical Learning Theory], Chapters 5B - 6A
;Nov. 27 (Jia Cui)
: David M. Blei, Andrew Y. Ng, Michael I. Jordan, [http://citeseer.ist.psu.edu/blei03latent.html Latent Dirichlet Allocation], JMLR 2003
: Other papers on LDA: [http://www.cs.toronto.edu/~ywteh/research/npbayes/report.pdf], [http://citeseer.ist.psu.edu/541352.html]


|-
;Nov. 20 (David Smith)
|May 8
: Olle Häggström and Karin Nelander, [http://nlp.cs.jhu.edu/~dasmith/mrfcftp.pdf  On Exact Simulation of Markov Random Fields Using Coupling from the Past], Foundation of the Scandinavian Journal of Statistics, 1999
|Noah
:James Fill and Mark Huber, [http://www.mts.jhu.edu/~fill/papers/recycler.pdf  The Randomness Recycler: A New Technique for Perfect Sampling], FOCS 2000
|V. N. Vapnik


[http://www.cscs.umich.edu/~crshalizi/reviews/vapnik-nature/ The Nature of Statistical Learning Theory], Chapters 6B - 7A
;Nov. 13 (Charles Schafer)
: Endika Bengoetxea, [http://nlp.cs.jhu.edu/~cschafer/david/Ch2.pdf Inexact Graph Matching Using Estimation of Distribution Algorithms, Chapter 2: The graph matching problem], Ph.D. dissertation, 2002
:: This chapter is general to the field although pretty sweeping and unspecific as a result. It probably makes a good introduction, since it gives an idea of the scope and diversity of the problem and proposed techniques ...
:Yakov Keselman, Ali Shokoufandeh, M. Fatih Demirci, Sven Dickinson, [http://nlp.cs.jhu.edu/~cschafer/david/many-to-many-graph.pdf Many-to-Many Graph Matching via Metric Embedding], Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE
:: This is a state-of-the-art paper that is quite dense but quite interesting. It solves a very general formulation of inexact graph matching by first embedding graphs into a normed space ...


|-
;Nov. 5 (Michelle Vanni)
|May 15
: Robert S. Swier and Suzanne Stevenson, [http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Swier.pdf Unsupervised Semantic Role Labelling], EMNLP 2004
|Chal
:Nianwen Xue and Martha Palmer, [http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Xue.pdf Calibrating Features for Semantic Role Labelling], EMNLP 2004
|V. N. Vapnik


[http://www.cscs.umich.edu/~crshalizi/reviews/vapnik-nature/ The Nature of Statistical Learning Theory], Chapters 7B -
;Oct. 29 (Eric Goldlust)
: Stephen Clark and James Curran, [http://web.comlab.ox.ac.uk/oucl/work/stephen.clark/papers/acl04.pdf Parsing the WSJ using CCG and Log-Linear Models], ACL 2004


|}
;Oct. 22 (Michelle Vanni)
: Dekang Lin and Franz Och, [http://acl.ldc.upenn.edu/acl2004/main/pdf/215_pdf_2-col.pdf Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence], ACL 2004
:Babych and Hartley, [http://acl.ldc.upenn.edu/acl2004/main/pdf/349_pdf_2-col.pdf  Extending the BLEU MT Evaluation Method with Frequency Weightings], ACL 2004


==  Fall 2002 ==
;Oct. 15 (John Blatz)
: Daichi Mochihashi, Genichiro Kikui, Kenji Kita, [http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Mochihashi.pdf Learning Nonstructural Distance Metric by Minimum Cluster Distortions], EMNLP 2004


{| style="width:800px" border="1"
;Oct. 2 (Nguyen Bach)
!  width="10%"|Date/Time
:Background knowledge on SVM and Graphical Models:
!  width="10%"|Presenter
:* [http://www.cse.msu.edu/~lawhiu/intro_SVM.ppt Intro SVM]
!  width="40%"|Paper(s)
:* [http://www.ai.mit.edu/~murphyk/Bayes/bnintro.html Intro Graphical Models]
!  Supporting Papers/Notes


|-
;Sep. 24, Oct. 7 (Roy Tromble)
|Sep. 10
: B. Taskar, C. Guestrin and D. Koller, [http://robotics.stanford.edu/~btaskar/pubs/mmmn.ps Max-Margin Markov Networks], Neural Information Processing Systems Conference (NIPS03), 2003
|Noah A. Smith
: B. Taskar, D. Klein, M. Collins, D. Koller and C. Manning, [http://robotics.stanford.edu/~btaskar/pubs/mmcfg.ps Max-Margin Parsing], EMNLP 2004
|Collins, Duffy.


[http://www.research.att.com/~mcollins/papers/finalacl2002.ps New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron.]
;Sep. 9 (John Blatz)
:Pascale Fung and Percy Cheung, [http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Fung.pdf Mining Very-Non-Parallel Corpora: Parallel Sentence and Lexicon Extraction via Bootstrapping and EM], EMNLP 2004
ACL '2002
:Dragos Stefan Munteanu, Alexander Fraser and Daniel Marcu, [http://acl.ldc.upenn.edu/hlt-naacl2004/main/pdf/93_Paper.pdf Improved Machine Translation Performance via Parallel Sentence Extraction from Comparable Corpora], HLT-NAACL 2004


|-
;Sep. 2 (Gideon Mann)
|Sep. 19
:Xin Li, Paul Morie, and Dan Roth, [http://acl.ldc.upenn.edu/hlt-naacl2004/main/pdf/139_Paper.pdf Robust Reading: Identification and Tracing of Ambiguous Names], HLT-NAACL 2004
|Paola Virga
:Cheng Niu, Wei Li, Rohini K. Srihari, [http://acl.ldc.upenn.edu/acl2004/main/pdf/372_pdf_2-col.pdf Weakly Supervised Learning for Cross-Document Person-Name Disambiguation Supported by Information Extraction], ACL 2004
|Yamada, Knight


[http://acl.ldc.upenn.edu/P/P02/P02-1039.pdf A decoder for Syntax-based Statistical MT]
;Aug. 27 (David Smith)
:I. Dan Melamed, [http://acl.ldc.upenn.edu/acl2004/main/pdf/113_pdf_2-col.pdf Statistical Machine Translation by Parsing], ACL 2004
ACL '2002
:Daniel Gildea, [http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Gildea.pdf Dependencies vs. Constituents for Tree-Based Alignment], EMNLP 2004


|-
;Aug. 20 (Damianos Karakos, Charles Schafer)
|Sep. 26
:P. Pantel and D. Lin, [http://www.cs.ualberta.ca/~lindek/papers/kdd02.pdf Discovering word senses from text], Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, 2002
|Paul Ruhlen
:Diana McCarthy, Rob Koeling, Julie Weeds, John Carroll, [ftp://ftp.informatics.susx.ac.uk/pub/users/dianam/senseranks.pdf Finding Predominant Word Senses in Untagged Text], 2004
|Hwa, Resnik, Weinberg, Kolak


[http://acl.ldc.upenn.edu/P/P02/P02-1050.pdf Evaluating Translational Correspondence using Annotation Projection]
==  Spring 2004 ==
ACL '2002


|-
=== Information extraction ===
|Oct. 2
;May 15 (Roy Tromble)
|Gideon Mann
: Fuchun Peng, Andrew McCallum, [http://www.cs.umass.edu/~mccallum/papers/hlt2004.pdf Accurate Information Extraction from Research Papers using Conditional Random Fields], 2004
|Gildea, Jurafsky


[http://www.colorado.edu/ling/jurafsky/cl01.ps Automatic Labeling of Semantics Roles]
;May 1 (Izhak Shafran)
:Eric J. Friedman, [http://citeseer.ist.psu.edu/377160.html Strong Monotonicity in Surplus Sharing], 1999
ACL '2001
: Tom Dietterich has a web page on probabilistic relational models: [http://web.engr.oregonstate.edu/~tgd/classes/539/]


|-
;Apr. 24 (David Smith)
|Oct. 8
: McCallum and Jensen, [http://www.cs.umass.edu/~mccallum/papers/iedatamining-ijcaiws03.pdf Extraction and Data Mining using Conditional-Probability Relational Models], IJCAI'03 Workshop on Learning Statistical Models from Relational Data, 2003
|Elliott Franco Drabek
::The paper is a survey of recent trends in IE and data mining (biased of course towards the authors' work) and a proposal to unify them with conditional random fields.
|Ravichandran, Hovy


[http://www.isi.edu/~ravichan/papers/P0351.pdf Learning Surface Text Patterns for a Question Answering System.]
=== Combinatorial optimization ===
;Apr. 17 (Elliott Franco Drabek)
ACL '2001
: Rina Dechter, [http://www.ics.uci.edu/~dechter/publications/r62.html Mini-Buckets: A General Scheme for Generating Approximations in Automated Reasoning], 2001


|A similar paper
;Apr. 10 (Noah Ashton Smith)
: Denys Duchier, [http://www.ps.uni-sb.de/Papers/abstracts/duchier-mol6.html Axiomatizing Dependency Parsing Using Set Constraints], Sixth Meeting on Mathematics of Language, 2000


Lin, Pantel
;Apr. 3 (Roy Tromble)
: Roman Bartak, [http://kti.ms.mff.cuni.cz/~bartak/downloads/WDS99.pdf Constraint Programming: In Pursuit of the Holy Grail], 1999


[http://www.cs.ualberta.ca/~ppantel/Download/Papers/kdd01-1.pdf Discovery of Inference Rules for Question Answering]
=== Learning how to search ===


|-
;Mar. 25 (Eric Goldlust)
|Oct. 17
: Boyan and Moore, [http://citeseer.ist.psu.edu/418699.html Learning Evaluation Functions to Improve Optimization by Local Search], Journal of Machine Learning Research, 2000
|David Smith
|Cotton, Bird


[http://arxiv.org/abs/cs/0204007 An Integrated Framework for Treebanks and Multilayer Annotations]
=== Discourse, summarization, paraphrase ===
LREC '2002


|-
;Mar. 18 (Markus Dreyer)
|Oct. 24
:Eugene Charniak, Niyu Ge, John Hale, [http://citeseer.ist.psu.edu/ge98statistical.html A Statistical Approach to Anaphora Resolution], Proceedings of the Sixth Workshop on Very Large Corpora, 1998
|Roy Tromble
|Han, Benjamin


[http://www.cs.cmu.edu/~benhdj/papers/bhan_naccl_2001.pdf Building a Bilingual Dictionary with Scarce Resources: A Genetic Algorithm Approach.]
;Mar. 5 (Charles Schafer)
: Daniel Marcu, Theory and Practice of Discourse Parsing and Summarization, Chapters 2 & 3, The MIT Press, 2000


|-
;Feb. 19 (David Smith)
|Nov. 1
: Barzilay and Lee, [http://people.csail.mit.edu/regina/my_papers/statpar.ps Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment], HLT 2003
|Chalaporn Hathaidharm
|J.Gao, J.Goodman, M.Li, K.Lee


[http://www.microsoft.com/china/research/dload_files/g-nlps/NLPSP/talip01-4th.pdf Toward A Unified Approach To Statistical Language Modeling For Chinese]
=== Optimality theory ===
ACM Transactions on Asian Language Information Processing, Vol. 1, No. 1, pp 3-33. 2002.


|-
;Feb. 12 (Brock Pytlik)
|Nov. 7
: Bob Frank, Giorgio Satta, [http://www.cogsci.jhu.edu/faculty/frank/papers/ot-revised.pdf Optimality theory and the Generative Complexity of Constraint Violability], MIT Press
|Neda Khalili
|Yamamoto, Church


[http://acl.ldc.upenn.edu/J/J01/J01-1001.pdf Using Suffix Arrays to Compute Term Frequency and Document Frequency for All Substrings in a Corpus]
;Feb. 5 (Brock Pytlik)
:Jessica A. Barlow and Judith A. Gierut, [http://www.cs.jhu.edu/~cschafer/15241_1.pdf Optimality theory in phonological acquisition], Journal of Speech, Language and Hearing 42, 1999
Computational Linguistics '2001
:Paul Boersma, Joost Dekkers and Jeroen van de WeijerIntroduction.  In Optimality Theory: Phonology, Syntax and Acquisition, Oxford University Press 2000


|A relative paper:
== Fall 2003 ==

;Dec. 12 (Paola Virga)
: Kamal Nigam and Rayid Ghani, [http://www.kamalnigam.com/papers/cotrain-CIKM00.pdf Analyzing the Effectiveness and Applicability of Co-training], Ninth International Conference on Information and Knowledge Management 2000

;Nov. 20 (Noah A. Smith)
: Rebecca Hwa, Miles Osborne, Anoop Sarkar, Mark Steedman, [http://www.cogsci.ed.ac.uk/~osborne/icmlworkshop03.ps.gz Corrected Co-training for Statistical Parsers], ICML 2003

;Nov. 13 (Markus Dreyer)
: Goldman and Zhou, [http://citeseer.nj.nec.com/goldman00enhancing.html Enhancing Supervised Learning with Unlabeled Data], ICML 2000
: An additional paper with some experiments:
: Clark, Curran and Osborne, [http://www.cogsci.ed.ac.uk/~osborne/conll03-cco.pdf Bootstrapping POS taggers using Unlabelled Data], CoNLL 2003

;Nov. 6 (Brock Pytlik)
: Stuart M. Shieber, [http://www.eecs.harvard.edu/~shieber/Courses/Esslli2003/esslli-slides.pdf Transducers as a Substrate for Natural Language Processing]

;Oct. 31 (Roy Tromble)
: Dekai Wu, [http://acl.ldc.upenn.edu/C/C90/C90-3045.pdf An algorithm for simultaneously bracketing parallel texts by aligning words], ACL 1995

;Oct. 24 (Markus Dreyer)
: Stuart M. Shieber, Yves Schabes, [http://acl.ldc.upenn.edu/C/C90/C90-3045.pdf Synchronous Tree-Adjoining Grammars], Coling 1990
: An additional closely related paper: Stuart M. Shieber, Yves Schabes, [http://acl.ldc.upenn.edu/W/W90/W90-0102.pdf Generation and Synchronous Tree-Adjoining Grammars], Fifth International Workshop on Natural Language Generation

;Oct. 10 (David Smith)
: Bernard Comrie, Language Universals and Linguistic Typology: Syntax and Morphology, Chapters 6-7, Blackwell (1989)

;Oct. 3 (Michelle Vanni)
: Bernard Comrie, Language Universals and Linguistic Typology: Syntax and Morphology, Chapters 4-6, Blackwell (1989)

;Sep. 18 (David Smith)
: Bernard Comrie, Language Universals and Linguistic Typology: Syntax and Morphology, Chapters 2-3, Blackwell (1989)

;Sep. 11 (Elliott Franco Drabek)
: Bernard Comrie, Language Universals and Linguistic Typology: Syntax and Morphology, Chapter 1, Blackwell (1989)

== Spring 2003 ==

;May 15 (Chal Haithaidharm)
: V. N. Vapnik, [http://www.cscs.umich.edu/~crshalizi/reviews/vapnik-nature/ The Nature of Statistical Learning Theory], Chapters 7B, 8, 9

|}
;May 8 (Noah Smith)
:V. N. Vapnik, [http://www.cscs.umich.edu/~crshalizi/reviews/vapnik-nature/ The Nature of Statistical Learning Theory], Chapters 6B - 7A


;May 1 (Noah Smith)
: V. N. Vapnik, [http://www.cscs.umich.edu/~crshalizi/reviews/vapnik-nature/ The Nature of Statistical Learning Theory], Chapters 5B - 6A

;Apr. 24 (Paola Virga)
: V. N. Vapnik, [http://www.cscs.umich.edu/~crshalizi/reviews/vapnik-nature/ The Nature of Statistical Learning Theory], Chapters 4B - 5A

;Apr. 17 (Roy Tromble)
: V. N. Vapnik, [http://www.cscs.umich.edu/~crshalizi/reviews/vapnik-nature/ The Nature of Statistical Learning Theory], Chapters 2B - 4A

;Apr. 10
: V. N. Vapnik, [http://www.cscs.umich.edu/~crshalizi/reviews/vapnik-nature/ The Nature of Statistical Learning Theory], Intro and Chapters 1, 2A

;Mar. 20 (Roy Tromble)
: Nikita Schmid, Ahmed Patel, [http://arXiv.org/abs/cs/0201008 Using Tree Automata and Regular Expressions to Manipulate Hierarchically Structured Data]

;Mar. 6 (Paola Virga)
: Carl M. Kadie, Christopher Meek, David Heckerman, [http://research.microsoft.com/~carlk/papers/cfw.htm A Collaborative Filtering System Using Posteriors Over Weights of Evidence], Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 2002.

;Feb. 26 (Elliott Drabek)
: Steven Abney, [http://www.vinartus.net/spa/02a.pdf Bootstrapping], ACL'02


;Feb. 19 (Elliott Drabek)
: A. Lopez, M. Nossal, R. Hwa, P. Resnik, [http://www.umiacs.umd.edu/~hwa/lnhr02.ps Word-level Alignment for Multilingual Resource Acquisition], Proceedings of the 2002 LREC Workshop on Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Language Data

;Feb. 13 (David Smith)
: K. Church, [http://www.research.att.com/~kwc/published_2000_Coling.pdf Empirical Estimates of Adaptation: The chance of Two Noriega's is closer to p/2 than p<sup>2</sup>], COLING 2000, pp. 173-179

==  Fall 2002 ==

;Jul. 31 (Paola Virga)
: Kenji Yamada, Kevin Knight, [http://acl.ldc.upenn.edu/P/P02/P02-1039.pdf A decoder for Syntax-based Statistical MT], ACL 2002

;Jul. 24 (Michelle Vanni)
: Paola Merlo, [http://perun.si.umich.edu/clair/ACL02/ A Multilingual Paradigm for Automatic Verb Classification], ACL 2002

;Dec. 5 (Silviu Cucerzan)
: Darren Pearce, [http://www.cogs.susx.ac.uk/users/darrenp/academic/dphil/publications/data/Conferences/lrec2002/paper.pdf A Comparative Evaluation of Collocation Extraction Techniques], LREC 2002
: D. Lin, [http://acl.ldc.upenn.edu/P/P99/P99-1041.pdf Automatic identification of non-compositional phrases], ACL 1999

;Nov. 21 (Silviu Cucerzan)
: Ueda, Nakano, Ghahramani, Hinton, [http://www.cs.toronto.edu/~hinton/absps/ueda.html SMEM Algorithm for Mixture Models], Neural Information Processing Systems 1998

;Nov. 14 (Michelle Vanni)
: Marti Hearst, [http://www.sims.berkeley.edu/~hearst/papers/acl99/acl99-tdm.html Untangling Text Data Mining], ACL 1999

;Nov. 7 (Neda Khalili)
: Yamamoto, Church, [http://acl.ldc.upenn.edu/J/J01/J01-1001.pdf Using Suffix Arrays to Compute Term Frequency and Document Frequency for All Substrings in a Corpus], Computational Linguistics 2001
: A related paper: Kageura, [http://research.nii.ac.jp/~kyo/papers/qualico.ps Bigram Statistics Revisited A Comparative Examination of Some Statistical Measures in Morphological Analysis of Japanese Kanji Sequences]

;Nov. 1 (Chalaporn Hathaidharm)
: J. Gao, J. Goodman, M. Li, K. Lee, [http://www.microsoft.com/china/research/dload_files/g-nlps/NLPSP/talip01-4th.pdf Toward A Unified Approach To Statistical Language Modeling For Chinese], ACM Transactions on Asian Language Information Processing, Vol. 1, No. 1, pp 3-33. 2002.

;Oct. 24 (Roy Tromble)
: Han, Benjamin, [http://www.cs.cmu.edu/~benhdj/papers/bhan_naccl_2001.pdf Building a Bilingual Dictionary with Scarce Resources: A Genetic Algorithm Approach]

;Oct. 17 (David Smith)
: Cotton, Bird, [http://arxiv.org/abs/cs/0204007 An Integrated Framework for Treebanks and Multilayer Annotations], LREC 2002

;Oct. 8 (Elliott Franco Drabek)
: Ravichandran, Hovy, [http://www.isi.edu/~ravichan/papers/P0351.pdf Learning Surface Text Patterns for a Question Answering System], ACL 2001
: A similar paper: Lin, Pantel, [http://www.cs.ualberta.ca/~ppantel/Download/Papers/kdd01-1.pdf Discovery of Inference Rules for Question Answering], KDD 2001

;Oct. 2 (Gideon Mann)
: Gildea, Jurafsky, [http://www.colorado.edu/ling/jurafsky/cl01.ps Automatic Labeling of Semantic Roles], ACL 2001

;Sep. 26 (Paul Ruhlen)
: Hwa, Resnik, Weinberg, Kolak, [http://acl.ldc.upenn.edu/P/P02/P02-1050.pdf Evaluating Translational Correspondence using Annotation Projection], ACL 2002

;Sep. 19 (Paola Virga)
: Yamada, Knight, [http://acl.ldc.upenn.edu/P/P02/P02-1039.pdf A decoder for Syntax-based Statistical MT], ACL 2002

;Sep. 10 (Noah A. Smith)
: Collins, Duffy, [http://www.research.att.com/~mcollins/papers/finalacl2002.ps New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron], ACL 2002

== Spring 2002 ==

;Apr. 25 (Paul Ruhlen)
: H. Al-Adhaileh, Kong, Melamed, [http://www.cs.nyu.edu/~melamed/ftp/papers/redecs01.pdf Malay-English Bitext Mapping and Alignment Using SIMR/GSA Algorithms], Malaysian National Conference on Research and Development on Linguistics 2001

;Apr. 18 (Paul Ruhlen)
: N. A. Rao, K. Rose, [http://scl.ece.ucsb.edu/html/papers_B.htm Deterministically annealed design of hidden Markov model speech recognizers], IEEE Trans. on Speech and Audio Processing, vol. 9, (no. 2), Feb. 2001


;Apr. 11 (Paola Virga)
: Neal, Hinton, [http://www.gatsby.ucl.ac.uk/Hinton/chronological.html A view of the EM algorithm that justifies incremental, sparse, and other variants], Learning in Graphical Models, 1999
: And [http://ipsapp008.lwwonline.com/content/getfile/4984/53/3/fulltext.pdf this article] builds on the above. It tests an incremental version of EM (carefully choosing how incremental it will be), as well as a "lazy EM" version that visits "significant" cases more often.

;Mar. 28 (Swapna Somasundaran)
: Crestan, El-Beze, [http://www.hcrc.ed.ac.uk/~sempro/papers/5.pdf Improving supervised WSD by including rough semantic features in a Multilevel view of the Context], SEMPRO Workshop, Edinburgh, 2001.

;Mar. 14 (Noah A. Smith)
: Ratnaparkhi, [ftp://ftp.cis.upenn.edu/pub/ircs/tr/97-08.ps.Z A Simple Introduction to Maximum Entropy Models for NLP], Institute for Research in Cognitive Science, Univ. of Penn.

;Feb. 28 (Silviu Cucerzan)
: Marcu, [http://www.isi.edu/natural-language/projects/rewrite/transmem1.pdf Towards a Unified Approach to Memory- and Statistical-Based Machine Translation], Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, 2001

;Feb. 21 (Jia Cui)
: Barzilay, McKeown, [http://citeseer.nj.nec.com/452341.html Extracting Paraphrases from a Parallel Corpus], Computer Science Department, Columbia Univ.

;Feb. 14 (Charles Schafer)
: Yaser, Germann, [http://nlp.cs.jhu.edu/~cschafer/trans.ps Translating with Scarce Resources], American Association for Artificial Intelligence 2000

;Feb. 7 (Paola Virga)
: Knight, Graehl, [http://citeseer.nj.nec.com/knight97machine.html Machine Transliteration], ACL-EACL 1997

==  Fall 2001 ==

;Dec. 14 (Jia Cui)
: Jerome Bellegarda, [http://ieeexplore.ieee.org/lpdocs/epic03/EarlierIssue.HTM?punumber=5&isyr=2000 Exploiting latent semantic information in statistical language models], Proceedings of the IEEE, 88:8, Aug. 2000

;Nov. 29 (Silviu Cucerzan)
: Mike Collins, Yoram Singer, [http://www.research.att.com/~mcollins/papers/emnlp99.ps Unsupervised Models for Named Entity Classification], EMNLP/VLC'99

;Nov. 20 (Radu Florian)
: Blum, Mitchell, [http://nlp.cs.jhu.edu/~rflorian/cotraining.ps Combining Labeled and Unlabeled Data with Co-Training], COLT 1998

;Nov. 16 (Richard Wicentowski)
: Eisner, Satta, [http://cs.jhu.edu/~jason/papers/#acl99 Efficient parsing for bilexical context-free grammars and head automaton grammars], ACL 1999
:: Plagiarism detection systems might be relevant to bitext alignment.  A message to the Corpora list yesterday announced the following review paper: [http://www.dcs.shef.ac.uk/~cloughie/papers/Plagiarism.pdf]

;Nov. 2 (Paul Ruhlen)
: Manning, Schuetze, Foundations of Statistical Natural Language Processing, Section 14 (clustering), pp. 495-527, MIT Press

;Oct. 26 (Gideon Mann)
: Tishby, Pereira, Bialek, [http://www.arxiv.org/find/physics/1/au:+Pereira_F/0/1/0/all/0/1 The information bottleneck method]
:: The paper describes a clustering method which is a generalization of their earlier work on "Distributional Clustering of English Words" (Pereira, Tishby and Lee '93).

Latest revision as of 21:09, 8 November 2024

The Natural Language Processing reading group attempts to keep abreast of interesting research ideas and results that may be useful to us. We typically read and discuss one paper per week. All our past papers are listed below.

The reading group is listed every semester as a 1-credit course, 601.865 ("Selected Topics in NLP"). The instructor is Jason Eisner; contact him to get on the mailing list. At the first course meeting, we brainstorm a bunch of topics for the semester, and vote on which ones to pursue. We then spend about 4 weeks per topic. Although some topics are within NLP, many of them explore potentially relevant work from related fields such as machine learning and linguistics.

During the summer we usually catch up on the latest NLP conference papers.

Instructions on how to present in reading group.
Jason's advice on how to read a paper.
Other weekly reading groups led by CLSP faculty are listed on the CLSP wiki's Main Page.


Fall 2024

Wednesdays 12pm, Hackerman 306.

Adversarial exploitation of LLMs

Dec 4 (Henry Li)
Nicholas Carlini et al. (2024). Stealing Part of a Production Language Model. ICML.
Nov 20 (Sophia Hager)
Nicholas Carlini et al. (2023). Universal and Transferable Adversarial Attacks on Aligned Language Models.
Nov 13 (TJ Bai)
Nicholas Carlini et al. (2023). Are aligned neural networks adversarially aligned? NeurIPS.

Fine-tuning methods

Nov 6 (Brian Lu)
Lucas Lehnert, Sainbayar Sukhbaatar, DiJia Su, Qinqing Zheng, Paul Mcvay, Michael Rabbat, Yuandong Tian (2024). Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping.
DiJia Su, Sainbayar Sukhbaatar, Michael Rabbat, Yuandong Tian, Qinqing Zheng (2024). Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces.
Oct 30 (Jiahui Li)
Alisa Liu et al. (2024). Tuning Language Models by Proxy. COLM.


Oct 23 (Leo Du)
Tobias Schnabel, Jennifer Neville (2024). Symbolic Prompt Program Search: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization.

LLMs for scientific discovery

Oct 16 (Pristina W)
Chenglei Si et al. (2024). Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers. CL.
Oct 9 (Yu Lu Liu)
Qingyun Wang, Doug Downey, Heng Ji, Tom Hope (2024). SCIMON: Scientific Inspiration Machines Optimized for Novelty. ACL.
Oct 2 (Cole Molloy)
Nils Dycke, Matej Zečević, Ilia Kuznetsov, Beatrix Suess, Kristian Kersting, Iryna Gurevych (2024). Diagnostic Reasoning in Natural Language: Computational Model and Application.

Training agentic workflows

Sep 25 (Nikhil Sharma)
Yucheng Jiang, Yijia Shao, Dekun Ma, Sina J. Semnani, Monica S. Lam (2024). Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations.
Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, Yu Su (2024). HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models.
Sep 18 (Tom Wang)
Yongchao Chen, Jacob Arkin, Yilun Hao, Yang Zhang, Nicholas Roy, Chuchu Fan (2024). PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Heuristic-based Sampling.
Yuchi Liu, Jaskirat Singh, Gaowen Liu, Ali Payani, Liang Zheng (2024). Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization.
Sep 11 (Shepard Xia)
Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao (2023). Reflexion: Language Agents with Verbal Reinforcement Learning.
Sep 4
Group search-skim-nominate session.

Spring 2024

Probing and editing LLMs (mechanistic interpretability)

Apr 24 (Leo Du)
Survey and tutorial on positional embedding in Transformers.
Apr 17 (Zike Hu)
Peter Hase et al. (2023). Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models. ICLR.

Explicit reasoning within LLMs using neuro-symbolic methods

Apr 10 (Brian Lu)
Gabriel Poesia et al. (2023). Certified Deductive Reasoning with Language Models. arXiv.
Apr 3 (TJ Bai)
Ben Prystawski et al. (2023). Why think step by step? Reasoning emerges from the locality of experience. NeurIPS.
Mar 27 (Pristina Wang)
Lionel Wong & Gabriel Grand (2023). From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought. arXiv.

ML for managing LLM calls

Calibration, imputation, prompt learning, active learning, reinforcement learning, etc.

Mar 13 (Yixuan Wang)
Yecheng Jason Ma et al. (2023). Eureka: Human-Level Reward Design via Coding Large Language Models. arXiv.
Mar 6 (Jiahui Li)
Xinyuan Wang et al. (2023). PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization. arXiv.
Feb 28 (Tom Wang)
Fang, Meng, Yuan Li, and Trevor Cohn (2017). Learning how to active learn: A deep reinforcement learning approach. arXiv.

LLM decoding schemes

Feb 21 (Henry Li)
Schick et al. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. NeurIPS.
Feb 14 (Cole Molloy)
Saibo Geng, Berkay Döner, Chris Wendler, Martin Josifoski, Robert West (2024). Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access. arXiv.
(Optional) Saibo Geng, Martin Josifoski, Maxime Peyrard, Robert West (2023). Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning. EMNLP.
Feb 7 (Shepard Xia)
Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg (2023). Inference-Time Intervention: Eliciting Truthful Answers from a Language Model. NeurIPS.

Fall 2023

Training language models on small corpora

Dec 6 (Ashi Garg)
David Samuel, Andrey Kutuzov, Lilja Øvrelid, Erik Velldal (2023). Trained on 100 million words and still in shape: BERT meets British National Corpus. EACL.
Inar Timiryasov, Jean-Loup Tastet (2023). Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty. CoNLL-CMCL Shared Task.
Nov 29 (Nikhil Sharma)
Lucas Georges Gabriel Charpentier, David Samuel (2023). Not all layers are equally as important: Every Layer Counts BERT. CoNLL-CMCL Shared Task.
Chengxu Zhuang, Evelina Fedorenko, Jacob Andreas (2023). Visual Grounding Helps Learn Word Meanings in Low-Data Regimes. arXiv.
Nov 15 (Cole Molloy)
Alex Warstadt, Leshem Choshen, Aaron Mueller, Adina Williams, Ethan Wilcox, Chengxu Zhuang (2023). Call for Papers - The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus.
BabyLM Challenge website
Venkata S Govindarajan, Juan Diego Rodriguez, Kaj Bostrom, Kyle Mahowald (2023). Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways. CoNLL-CMCL Shared Task.

Psycholinguistics on Transformers

We also considered reading [1], [2], [3].
Nov 8 (Henry Li)
Haoyu Zhao, Abhishek Panigrahi, Rong Ge, Sanjeev Arora (2023). Do Transformers Parse while Predicting the Masked Word? EMNLP.
Nov 1 (Cole Molloy)
Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, David Bau (2023). Mass-Editing Memory in a Transformer. ICLR.
Background: Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov (2023). Locating and Editing Factual Associations in GPT. NeurIPS.
Oct 25 (Yaohan Guan)
Evan Hernandez, Belinda Z. Li, Jacob Andreas (2023). Inspecting and Editing Knowledge Representations in Language Models. arXiv.

Machine learning for combinatorial optimization / AutoML

Oct 18
Discussion of AutoML topics
Oct 11 (Matthew Francis-Landau)
Yoshua Bengio, Andrea Lodi, and Antoine Prouvost (2021). Machine learning for combinatorial optimization: A methodological tour d’horizon. European Journal of Operational Research, 290(2):405-421.
Oct 4 (Jason Eisner)
Andrea Lodi and Giulia Zarpellon (2017). On learning and branching: a survey. TOP 25:207-236.

Connecting language models to world models

Some possible papers include [4], [5], [6], [7].
We also considered the related topic of connecting language models to reasoning: survey, repo of papers, tutorial.
Sep 27 (Sophia Hager)
Belinda Z. Li, Maxwell Nye, and Jacob Andreas (2023). Language Modeling with Latent Situations. ACL.
Sep 20 (Yaohan Guan)
Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Michal Podstawski, Hubert Niewiadomski, Piotr Nyczyk, and Torsten Hoefler (2023). Graph of Thoughts: Solving Elaborate Problems with Large Language Models. AAAI.
Sep 13 (Brian Lu)
Ramsés J. Sánchez, Lukas Conrads, Pascal Welke, Kostadin Cvejoski, and César Ojeda (2023). Hidden Schema Networks. ACL.
Sep 6 (Jason Eisner)
Informal review of inference and optimization in graphical models.

Spring 2023

Apr 26 (Matthew Francis-Landau + Nikhil Sharma)
Business models for large LMs - is there a report from Forrester Research, Gartner Group, Deloitte, IDC, Frost & Sullivan, ... ?
Apr 19 (Tim Vieira)
Schulman et al. (2015). Trust Region Policy Optimization. ICML.
See also Schulman et al. (2017), Proximal Policy Optimization Algorithms.

Neurosymbolic Methods

Apr 12 (Matthew Francis-Landau)
Alex Gu, Tamara Mitrovska, Daniela Velez, Jacob Andreas, Armando Solar-Lezama (2022). ObSynth: An Interactive Synthesis System for Generating Object Models from Natural Language Specifications. arXiv.
Apr 5 (Brian Lu)
Hao Tang, Kevin Ellis (2022). From Perception to Programs: Regularize, Overparameterize, and Amortize. MAPS.

Making Transformers more efficient / Long-form generation

Mar 29 (Henry Li Xinyuan)
Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller (2020). Rethinking Attention with Performers.
See also Tay et al. (2020), Long Range Arena: A Benchmark for Efficient Transformers, and Qin et al. (2023), The NLP Task Effectiveness of Long-Range Transformers.
Mar 15 (Sophia Hager)
Tri Dao, Daniel Y. Fu, Khaled K. Saab, Armin W. Thomas, Atri Rudra, and Christopher Ré (2023). Hungry Hungry Hippos: Towards Language Modeling with State Space Models.
Mar 8 (Yunmo Chen)
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré (2022). FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. arXiv.

Fancy Generative Models (normalizing/reversible flows, GANs, score-based/diffusion models, iterative editing, VAE, DPP, ...)

Mar 1 (Nikhil Sharma)
Ben Poole, Ajay Jain, Jonathan T. Barron, Ben Mildenhall (2023). DreamFusion: Text-to-3D using 2D Diffusion. ICLR.
Feb 22 (Yaohan Guan)
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, Yuan Cao (2023). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR.
Antonia Creswell, Murray Shanahan, Irina Higgins (2023). Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning. ICLR.
Feb 8, Feb 15 (Leo Du)
Albert Gu, Karan Goel, Christopher Re (2022). Efficiently Modeling Long Sequences with Structured State Spaces. ICLR.
This one is a repeat. See also Sasha Rush and Sidd Karamcheti (2022), The Annotated S4 (blog post).

Current Trends

Feb 1
Sameer Singh, AI Trends 2023: Natural Language Proc – ChatGPT, GPT-4 and Cutting Edge Research (podcast with Sam Charrington).
Links to papers there and at [8].

Fall 2022

Dec 7
Qi Liu, Dani Yogatama, Phil Blunsom (2022). Relational Memory-Augmented Language Models. TACL.
Dani Yogatama, Cyprien de Masson d’Autume, Lingpeng Kong (2021). Adaptive Semiparametric Language Models. TACL.
Nov 30 (Matthew Francis-Landau)
Clark Barrett, Roberto Sebastiani, Sanjit A. Seshia and Cesare Tinelli (2008). Satisfiability Modulo Theories. Chapter 12 from Biere et al. (eds.), Handbook of Satisfiability.
Nov 16 (Guest: Jennifer White)
Jennifer White and Ryan Cotterell (2022). Equivariant Transduction through Invariant Alignment. COLING.
Nov 9
Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Ré (2020). HiPPO: Recurrent Memory with Optimal Polynomial Projections. arXiv.
Oct 26 (Leo Du)
Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud, Chris J. Maddison (2021). Oops I Took A Gradient: Scalable Sampling for Discrete Distributions. ICML.
Oct 19 (Sophia Sklaviadis)
Ongoing work on RNNG and VAE.
Oct 12 (Brian Lu)
Pengcheng Yin, Chunting Zhou, Junxian He, and Graham Neubig (2018). StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing. ACL.
Oct 5 (Suzanna Sia)
Review of VAE variants.
Sep 28
Timo Schick et al. (2022). PEER: A collaborative language model. arXiv.
Sep 21 (Lisa Li)
Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, and Tatsunori B. Hashimoto (2022). Diffusion-LM Improves Controllable Text Generation. arXiv.
Sep 14
VAE discussion.
Sep 7
Calvin Luo (2022). Understanding Diffusion Models: A Unified Perspective. arXiv.

Summer 2022

Aug 24 (Leo Du)
Guy Emerson (2020). Autoencoding Pixies: Amortised Variational Inference with Graph Convolutions for Functional Distributional Semantics. ACL.
See also Guy Emerson's thesis.
Jul 27, Aug 3 (Brian Lu)
Abulhair Saparov and Tom Mitchell (2022). Towards General Natural Language Understanding with Probabilistic Worldbuilding. TACL.
Jun 15 (Leo Du)
Goldblum, Geiping, et al. (2020). Truth or backpropaganda? An empirical investigation of deep learning theory. ICLR.

Spring 2022

May 4 (Felix Yu)
Xiujun Li et al. (2020). Oscar: Object-semantics aligned pre-training for vision-language tasks. ECCV.
Luowei Zhou et al. (2020). Unified vision-language pre-training for image captioning and VQA. AAAI.
Hu, Xiaowei, et al. (2020). VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning. arXiv.
Apr 20 (Tim Vieira)
Esparza, Kiefer, and Luttenberger (2007). An Extension of Newton’s Method to ω-Continuous Semirings. In Proceedings of the International Conference on Developments in Language Theory.
Apr 13 (Leo Du)
Andrew M. Saxe; James L. McClelland; Surya Ganguli (2014). Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. ICLR.
Andrew M. Saxe; James L. McClelland; Surya Ganguli (2019). A mathematical theory of semantic development in deep neural networks. PNAS.
Anthropic (2022). In-context Learning and Induction Heads.
Apr 6 (Suzanna Sia)
Anthropic (2021). A Mathematical Framework for Transformer Circuits.
Mar 30 (Jason Eisner)
Liang Huang, Suphan Fayong, & Yang Guo (2012). Structured Perceptron with Inexact Search. NAACL. slides
Mar 16 (Brian Lu)
Max Welling, Yee Whye Teh (2011). Bayesian Learning via Stochastic Gradient Langevin Dynamics. ICML.
Mar 9 (Cihan Xiao)
Mathias Niepert, Pasquale Minervini, and Luca Franceschi (2021). Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions. NeurIPS.
Relevant background: Papandreou and Yuille (2011). Perturb-and-MAP Random Fields: Using Discrete Optimization to Learn and Sample from Energy Models.
Mar 2 (Steven Tan)
Yang Song (2021). Generative Modeling by Estimating Gradients of the Data Distribution. Blog post.
Feb 23 (Sabrina Mielke)
Albert Gu, Karan Goel, Christopher Re (2022). Efficiently Modeling Long Sequences with Structured State Spaces. ICLR.
Feb 16 (Ryan Cotterell)
David Chiang, Jacob Andreas, Daniel Bauer, Karl Moritz Hermann, Bevan Jones, Kevin Knight (2013). Parsing Graphs with Hyperedge Replacement Grammars. ACL.
Feb 9 (Matthew Francis-Landau)
Elaine Angelino, Nicholas Larus-Stone, Daniel Alabi, Margo Seltzer, Cynthia Rudin (2018). Learning Certifiably Optimal Rule Lists for Categorical Data. JMLR.
Feb 2 (Sophia Sklaviadis)
Yizhou Zhao, Liang Qiu, Wensi Ai, Feng Shi, Song-Chun Zhu (2020). Vertical-Horizontal Structured Attention for Generating Music with Chords. arXiv.

Fall 2021

Wednesdays 12pm, in Hackerman 306.

Dec 1 (Felix Yu)
Chandan Singh, W. James Murdoch, Bin Yu (2019). Hierarchical Interpretations for Neural Network Predictions. ICLR.
Xisen Jin, Zhongyu Wei, Junyi Du, Xiangyang Xue, Xiang Ren (2020). Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models. ICLR.
Nov 17 (Suzanna Sia)
Henri Prade and Gilles Richard (2021). Analogical Proportions: Why They Are Useful in AI. IJCAI. Suz summary blogpost
Nov 10 (Leo Du)
Jingjing Xu, Hao Zhou, Chun Gan, Zaixiang Zheng, Lei Li (2021). Vocabulary Learning via Optimal Transport for Neural Machine Translation. ACL.
Nov 3 (Matthew Francis-Landau)
Song Han, Huizi Mao, William J. Dally (2016). Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. ICLR.
Oct 27 (Sophia Sklaviadis)
Jiong Cai, Yong Jiang, Kewei Tu (2017). CRF Autoencoder for Unsupervised Dependency Parsing. ACL.
Waleed Ammar, Chris Dyer, Noah A. Smith (2014). Conditional Random Field Autoencoders for Unsupervised Structured Prediction. NeurIPS.
Yu Zhang, Zhenghua Li, Min Zhang (2020). Efficient Second-Order TreeCRF for Neural Dependency Parsing. ACL.
Oct 20 (Chenghao Yang)
Weizhe Yuan, Graham Neubig, Pengfei Liu (2021). BARTScore: Evaluating Generated Text as Text Generation. NeurIPS.
Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi (2020). BERTScore: Evaluating Text Generation with BERT. ICLR.
Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M. Meyer, Steffen Eger (2019). MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance. EMNLP.
Oct 13 (Brian Lu)
Patrice Y. Simard, Saleema Amershi, David M. Chickering, Alicia Edelman Pelton, Soroush Ghorashi, Christopher Meek, Gonzalo Ramos, Jina Suh, Johan Verwey, Mo Wang, John Wernsing (2017). Machine Teaching: A New Paradigm for Building Machine Learning Systems. Arxiv.
Gonzalo Ramos, Christopher Meek, Patrice Simard, Jina Suh, Soroush Ghorashi (2020). Interactive machine teaching: a human-centered approach to building machine-learned models. Human-Computer Interaction.
Oct 6 (Steven Tan)
Jiatao Gu, Changhan Wang, Jake Zhao (2019). Levenshtein Transformer. NeurIPS.
Sep 29 (Jason Eisner)
Gabriel Peyré (2019). Optimal transport for machine learning (talk video). The Alan Turing Institute.
Sep 22 (Devanshu Singh)
Mehrad Moradshahi, Hamid Palangi, Monica S. Lam, Paul Smolensky, Jianfeng Gao (2019). HUBERT Untangles BERT to Improve Transfer across NLP Tasks. arXiv.
Yichen Jiang, Asli Celikyilmaz, Paul Smolensky, Paul Soulos, Sudha Rao, Hamid Palangi, Roland Fernandez, Caitlin Smith, Mohit Bansal, Jianfeng Gao (2021). Enriching Transformers with Structured Tensor-Product Representations for Abstractive Summarization. NAACL.
Denis Kleyko, Mike Davies, E. Paxon Frady, Pentti Kanerva, Spencer J. Kent, Bruno A. Olshausen, Evgeny Osipov, Jan M. Rabaey, Dmitri A. Rachkovskij, Abbas Rahimi, Friedrich T. Sommer (2021). Vector Symbolic Architectures as a Computing Framework for Nanoscale Hardware. Arxiv.
Paul Soulos, Tom McCoy, Tal Linzen, Paul Smolensky (2019). Discovering the Compositional Structure of Vector Representations with Role Learning Networks. BlackboxNLP.
Hamid Palangi, Paul Smolensky, Xiaodong He, Li Deng (2018). Question-Answering with Grammatically-Interpretable Representations. AAAI.
Sep 15 (Hongyuan Mei)
Vardan Papyan, X.Y. Han, David L. Donoho (2020). Prevalence of Neural Collapse during the terminal phase of deep learning training. PNAS.
Sep 8 (Brian Lu)
Jonathan Lorraine, Paul Vicol, David Duvenaud (2020). Optimizing Millions of Hyperparameters by Implicit Differentiation. AISTATS. (mlr link/implementation)

Summer 2021

Aug 25 (Sabrina Mielke)
Max B. Paulus, Chris J. Maddison, Andreas Krause (2021). Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator. ICLR. (forum/slides)
Aug 18 (Matthew Francis-Landau)
Anselm Paulus, Michal Rolínek, Vít Musil, Brandon Amos, Georg Martius (2021). CombOptNet: Fit the Right NP-Hard Problem by Learning Integer Programming Constraints. ICML.
Aug 4 (Tim Vieira)
Jialin Song, Yuxin Chen, Yisong Yue (2019). A General Framework for Multi-fidelity Bayesian Optimization with Gaussian Processes. AISTATS.
July 28 (Chu-Cheng Lin)
Belinda Z. Li, Maxwell Nye, Jacob Andreas (2021). Implicit Representations of Meaning in Neural Language Models. ACL.
July 21 (Leo Du)
Discussion of graph signal processing (graph fourier transform, graph convolutions etc.).
July 14 (Chenghao Yang)
Joshua Robinson, Ching-Yao Chuang, Suvrit Sra, Stefanie Jegelka (2021). Contrastive Learning with Hard Negative Samples. ICLR.
Recommended readings: Lilian Weng's blogposts about Contrastive Representation Learning, SimCLR (ICML'20), A Theoretical Analysis of Contrastive Unsupervised Representation Learning (ICML'19), and PU-learning (KDD'08).
July 7 (Matthew Francis-Landau)
Qiaochu Chen, Aaron Lamoreaux, Xinyu Wang, Greg Durrett, Osbert Bastani, Isil Dillig (2021). Web question answering with neurosymbolic program synthesis. PLDI.
Jun 23 (Sabrina Mielke)
Explanation of BPE (Gage, 1994; Sennrich et al., 2016) and Unigram LM (Kudo, 2018) subword tokenizers, leading into modeling of the unigram (word) distribution as a two-stage process a la Goldwater et al. (2006), neuralized by Nikkarinen et al. (2021).
Jun 16 (Suzanna Sia)
Kawin Ethayarajh, Dan Jurafsky (2021). Attention flows are Shapley value explanations. ACL.
Jun 2 (Brian Lu)
Muhammad Khalifa, Hady Elsahar, Marc Dymetman (2021). A distributional approach to controlled text generation. ICLR.
May 26 (Ryan Cotterell)
Discussion of group-equivariant architectures.
May 19 (Hongyuan Mei)
Yuval Atzmon, Felix Kreuk, Uri Shalit, Gal Chechik (2020). A causal view of compositional zero-shot recognition. NeurIPS.

Spring 2021

May 5 (Tim Vieira)
Viktor Leis, Bernhard Radke, Andrey Gubichev, Alfons Kemper, Thomas Neumann (2017). Cardinality Estimation Done Right: Index-Based Join Sampling. Conference on Innovative Data Systems Research.
Apr 28 (Matthew Francis-Landau)
Ameesh Shah, Eric Zhan, Jennifer J. Sun, Abhinav Verma, Yisong Yue, Swarat Chaudhuri (2020). Learning differentiable programs with admissible neural heuristics. NeurIPS.
Apr 21 (Nathaniel Weir)
Kevin Ellis, Catherine Wong, Maxwell Nye, Mathias Sable-Meyer, Luc Cary, Lucas Morales, Luke Hewitt, Armando Solar-Lezama, Joshua B. Tenenbaum (2020). DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning. arXiv. supplement
Apr 14 (Xiang Lisa Li)
Yang Song, Stefano Ermon (2019). Generative Modeling by Estimating Gradients of the Data Distribution. NeurIPS.
Apr 7 (Chenghao Yang)
Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, and Lingpeng Kong (2021). Random feature attention. ICLR.
Mar 31 (Suzanna Sia)
Siddhant M. Jayakumar et al. (2020). Multiplicative interactions and where to find them. ICLR.
Mar 24 (Ryan Cotterell)
K. Vijay-Shanker and David J. Weir (1989). Recognition of combinatory categorial grammars and linear indexed grammars. ACL.
Mar 17 (Ryan Cotterell)
Dasgupta, Papadimitriou, & Vazirani (2006). Quantum algorithms. Chapter 10 of Algorithms. McGraw Hill.
See also blog posts by Tai-Danae Bradley.
Mar 10 (Sabrina Mielke)
MCMC pt. 3 (RJMCMC).
Philippe Gagnon & Arnaud Doucet (2020). Non-reversible jump algorithms for Bayesian nested model selection. J. of Computational and Graphical Statistics. slides
Mar 3 (Sabrina Mielke)
MCMC pt. 2 (Irreversible chains, continuous transforms and Jacobians).
Feb 24 (Sabrina Mielke)
MCMC pt. 1 (Markov chains, balance condition).
Span Spanbauer, Cameron Freer, Vikash Mansinghka (2020). Deep involutive generative models for neural MCMC. arXiv.
Marco Cusumano-Towner, Alexander K. Lew, Vikash K. Mansinghka (2020). Automating involutive MCMC using probabilistic and differentiable programming. arXiv.
Kirill Neklyudov, Max Welling, Evgenii Egorov, Dmitry Vetrov (2020). Involutive MCMC: A unifying framework. arXiv.
Feb 17 (Devanshu Singh)
Bellanger and McCallum (2016). Structured prediction energy networks. ICML.
Feb 10 (Brian Lu)
Sachan and Xing (2017). Learning to solve geometry problems from natural language demonstrations in textbooks. *SEM.
Feb 3 (Chu-Cheng Lin & Matthew Francis-Landau)
Wang et al. (2019). SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. ICML.
Jan 27 (Tim Vieira)
Anna Harutyunyan et al. (2019). Hindsight credit assignment. NeurIPS.
Jan 20 (Ryan Cotterell)
Andrew Drozdov, Pat Verga, Mohit Yadav, Mohit Iyyer, Andrew McCallum (2019). Unsupervised latent tree induction with deep inside-outside recursive autoencoders. NAACL.
Jan 13 (Suzanna Sia)
Simon Du, Wei Hu (2019). Ultra-Wide deep nets and the neural tangent kernel (NTK). Blog post summarizing multiple papers.
Jan 6 (Jason Eisner)
Nando de Freitas, Pedro Højen-Sørensen, Michael I. Jordan, Stuart Russell (2001). Variational MCMC. UAI.
Ardavan Saeedi, Tejas D. Kulkarni, Vikash K. Mansinghka, Samuel J. Gershman (2017). Variational particle approximations. JMLR.
Christian A. Naesseth, Scott W. Linderman, Rajesh Ranganath, David M. Blei (2018). Variational sequential Monte Carlo. AISTATS.

Fall 2020

Understanding Neural Magic

Dec 10 (Xiao Liu)
Elena Voita, Ivan Titov (2020). Information-Theoretic Probing with Minimum Description Length. ACL.
Dec 3 (Aaron Mueller)
Jesse Vig*, Sebastian Gehrmann*, Yonatan Belinkov*, Sharon Qian, Daniel Nevo, Simas Sakenis, Jason Huang, Yaron Singer, Stuart Shieber (2020). Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias. arXiv.

Multimodal Fusion

Nov 12 (Suzanna Sia)
Weiyao Wang, Du Tran, and Matt Feiszli (2020). What Makes Training Multi-modal Classification Networks Hard? CVPR.

Music Modeling

Oct 29 (Amrit Nidhi)
Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, and Douglas Eck (2018). Music Transformer: Generating Music with Long Term Structure. arXiv. music samples, interactive demo

Neuro-Symbolic Hybrids

Oct 15 (Ankur Kejriwal)
Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, and Jiajun Wu (2019). The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, And Sentences From Natural Supervision. ICLR.
Oct 8 (Tim Vieira)
Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, Josh Tenenbaum (2018). Learning to Infer Graphics Programs from Hand-Drawn Images. NeurIPS.

Neural Nearest Neighbor Methods

Organizer: Brian Lu

Oct 1 (Anton Belyy)
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel and Douwe Kiela (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv.
Sep 24 (Matthew Francis-Landau)
Yu. A. Malkov and D. A. Yashunin (2018). Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World (HNSW) graphs. arXiv.
Sep 17 (Brian Lu)
Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer and Mike Lewis (2020). Generalization through Memorization: Nearest Neighbor Language Models. ICLR.

Parsing as Tagging

Sep 10 (Jason Eisner)
Nikita Kitaev and Dan Klein (2020). Tetra-Tagging: Word-Synchronous Parsing with Linear-Time Inference. ACL.

Summer 2020

Special Session on GPT-3

Jun 10 (Aaron Mueller, João Sedoc, Patrick Xia)
OpenAI (2020). Language Models are Few-Shot Learners. ArXiv.

Spring 2020

More Efficient Transformers

Organizer: Patrick Xia

May 13 (Rachel Wicks)
Beltagy, Peters, Cohan (2020). Longformer: The Long-Document Transformer. Arxiv
Apr 28 (Arya McCarthy) (practice talk)
McCarthy, Li, Gu, Dong (2020). Addressing Posterior Collapse with Mutual Information for Improved Variational Neural Machine Translation. ACL
Apr 22 (Patrick Xia)
Kitaev, Kaiser, Levskaya (2020). Reformer: The Efficient Transformer. ICLR (Slides from this session)

Controlled Text Generation

Organizer: Mitchell Gordon

Apr 15 (Zili Huang)
Holtzman, Buys, Du, Forbes, Choi (2020). The Curious Case of Neural Text Degeneration. ICLR
Welleck, Kulikov, Roller, Dinan, Cho, Weston (2020). Neural Text Generation With Unlikelihood Training. ICLR
Apr 8 (Nathaniel Weir)
Shu, Nakayama, Cho (2019). Generating Diverse Translations with Sentence Codes. ACL
Apr 1 (Mitchell Gordon)
Dathathri, Madotto, Lan, Hung, Frank, Molino, Yosinski, Liu (2020). Plug and Play Language Models: A Simple Approach to Controlled Text Generation. ICLR

Human-In-The-Loop / Active Learning

Organizer: Anton Belyy

Mar 25 (Joshua Miller)
Hu, Lipton, Anandkumar, Ramanan (2019). Active Learning with Partial Feedback. ICLR
Mar 11 (Anton Belyy)
Yuan, Zhang, Van Durme, Findlater, Boyd-Graber (2019). Interactive Refinement of Cross-Lingual Word Embeddings. Arxiv
Mar 4 (Anton Belyy)
Ribeiro, Singh, Guestrin (2018). Semantically Equivalent Adversarial Rules for Debugging NLP Models. ACL

Hyperbolic Deep Learning

Organizer: Desh Raj

Feb 26 (Arya McCarthy)
Xu, Durrett (2018). Spherical Latent Spaces for Stable Variational Autoencoders. EMNLP
Feb 12 (Suzanna Sia)
Meng, Huang, Wang, Zhang, Zhuang, Kaplan, Han (2019). Spherical Text Embeddings. NeurIPS
Feb 5 (Desh Raj)
Nickel, Kiela (2017). Poincaré Embeddings for Learning Hierarchical Representations. NeurIPS

Fall 2019

Current Happenings

Organizer: Various

Dec 4 (Various)
ACL Paper Reading and Feedback.
Nov 20 (João Sedoc)
Raffel et al. (2019). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv.
Bowman (2019). Google T5 Explores the Limits of Transfer Learning. Synced Review (Blog Post).
Nov 13 (Various)
EMNLP Favorites presented by those who were able to attend the conference in-person.

Model Fairness and Interpretability

Organizer: Keith Harrigian

Nov 6 (Rachel Wicks)
Gonen & Goldberg (2019). Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. NAACL.
Oct 30 (Alexandra DeLucia)
Ribeiro, Singh, & Guestrin (2016). "Why Should I Trust You?" Explaining the Predictions of Any Classifier. KDD.
Lundberg & Lee (2017). A Unified Approach to Interpreting Model Predictions. NeurIPS.
Oct 23 (Keith Harrigian)
Lipton (2016). The Mythos of Model Interpretability. ACM.
Serrano & Smith (2019). Is Attention Interpretable? ACL.
Jain & Wallace (2019). Attention is not Explanation. NAACL.
Wiegreffe & Pinter (2019). Attention is not not Explanation. EMNLP.
Oct 16 (Suzanna Sia)
Shen et al. (2019). Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks. ICLR.
Dyer, Melis, & Blunsom (2019). A Critical Analysis of Biased Parsers in Unsupervised Parsing. arXiv.

Spectral Learning

Organizer: Sabrina J. Mielke

Sep 25 (Desh Raj)
Guillaume Rabusseau, Tianyu Li, Doina Precup (2019). Weighted Automata and Recurrent Neural Networks through Spectral Learning. AISTATS.
Comments: https://docs.google.com/document/d/1CgE65xFCDty3gZepKxBjOAV8Dd7dS10I2okw2jDb-EE/edit?usp=sharing
Sep 18 (Sabrina J. Mielke)
Borja Balle, Ariadna Quattoni, and Xavier Carreras (2014). Spectral Learning Techniques for Weighted Automata, Transducers, and Grammars (tutorial slides), sections 1-2. EMNLP.
The main corresponding paper:
Balle, Carreras, Luque, Quattoni (2014). Spectral learning of weighted automata: A forward-backward perspective. That part of the tutorial recaps FSAs, introduces Hankel matrices, motivates their correspondence to FSAs, and gives a general estimation recipe.
If time permits, we can move on to a recent extension of this work that makes the approach work well in practice:
Quattoni and Carreras (2019), Interpolated Spectral N-Gram Language Models. ACL.
Comments: https://docs.google.com/document/d/1-MSFlhhNyLfkK-I01VQI05f4F5gJyruHDNU3KBRXwqU

Spring 2019

This bit of the wiki got lost in a disk crash, but we should reconstruct the paper list from the emails at the time.

Causal NLP

Organizers: Suzanna Sia and Zach Wood-Doughty

Dataset Shift for NLP / Changepoint Detection

Organizer: Desh Raj

Grounded Language

Organizer: Mitchell Gordon

What's Learned During Representation Learning?

Organizer: Shijie Wu

Multitask/Transfer Learning for NLP

Organizers: Fei Wu and Oliver Adams

Fall 2018

Random Interesting Papers

Dec 5 (David Mueller)
Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum (2018). Linguistically-Informed Self-Attention for Semantic Role Labeling. EMNLP.
Nov 28 (Hao Zhu and Chu-Cheng Lin)
Hao Peng, Roy Schwartz, Sam Thomson, and Noah A. Smith (2018). Rational Recurrences. EMNLP.
Nov 14 (Arya McCarthy)
Olivia Winn and Smaranda Muresan (2018). ‘Lighter’ Can Still Be Dark: Modeling Comparative Color Descriptions. ACL 2018
Noga Zaslavsky, Charles Kemp, Terry Regier, Naftali Tishby (2018). Efficient compression in color naming and its evolution. PNAS 2018
Nov 7
EMNLP debrief

ML Scholarship

Organizer: Patrick Xia

Oct 31 (Xuan Zhang)
Yoav Goldberg (2017). An Adversarial Review of "Adversarial Generation of Natural Language". Medium blog post. Yann LeCun's response (on Facebook) and Yoav's response to that (on Medium). Original papers referenced: Adversarial Generation of Natural Language (Rajeswar et al., 2017, arXiv) and Toward Controlled Generation of Text (Hu et al., 2017, ICML).
Joshua Goodman (2002). Extended Comment on Language Trees and Zipping (https://arxiv.org/pdf/cond-mat/0202383.pdf). Extended version of Comment submitted to Physical Review Letters. On J. Goodman's comment to Language Trees and Zipping (the response).
Oct 24 (Patrick Xia)
Zachary Lipton and Jacob Steinhardt (2018). Troubling Trends in Machine Learning Scholarship. ICML Debates 2018.
D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young (2015). Hidden Technical Debt in Machine Learning Systems. NeurIPS 2015

Deep Generative Modeling

Organizer: Sabrina J. Mielke

Oct 17 (Kelly Marchisio)
"Generating Sentences from a Continuous Space," https://arxiv.org/abs/1511.06349, Bowman et al. (2016).
Oct 10 (Suzanna Sia)
"Document Neural Autoregressive Distribution Estimation," http://www.jmlr.org/papers/volume18/16-017/16-017.pdf, Lauly, S., Zheng, Y., Allauzen, A., & Larochelle, H. (2017).
Comments: https://docs.google.com/document/d/12F8uLt5vEm-Ctou1XrtVctLWSmTePHH1BuE6uztFoNc/edit?usp=sharing
Sep 20 (Sabrina J. Mielke)
"Neural Autoregressive Distribution Estimation," http://www.jmlr.org/papers/volume17/16-272/16-272.pdf, Uria, B., Côté, M. A., Gregor, K., Murray, I., & Larochelle, H. (2016).

Test of Time Award Papers

Organizer: Arya McCarthy

Sep 19 (Desh Raj)
Michael Collins (2002). Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms.
Sep 12 (David Mueller)
Regina Barzilay and Mirella Lapata (2005). Modeling Local Coherence: An Entity-Based Approach. ACL 2005.
Sep 5 (Arya McCarthy)
Dan Roth and Wen-tau Yih (2004). A Linear Programming Formulation for Global Inference in Natural Language Tasks. CoNLL 2004.

Summer 2018

Aug 23 (Chenxi Liu)
Adam Santoro, Felix Hill, David Barrett, Ari Morcos, and Timothy Lillicrap (2018). Measuring abstract reasoning in neural networks. ICML 2018.
Aug 16 (Sebastian Mielke)
André F. T. Martins and Ramón F. Astudillo (2016). From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification. ICML 2016.
Vlad Niculae, André F. T. Martins, Mathieu Blondel, and Claire Cardie (2018). SparseMAP: Differentiable Sparse Structured Inference. ICML 2018.
Aug 9 (Jacob Buckman)
Andrew Trask, Felix Hill, Scott Reed, Jack Rae, Chris Dyer, and Phil Blunsom (2018). Neural Arithmetic Logic Units. arXiv.
Aug 2 (Garrett Nicolai)
Daniel Deutsch, John Hewitt and Dan Roth (2018). A Distributional and Orthographic Aggregation Model for English Derivational Morphology. ACL. slides
Jul 26
ACL debriefing session.
Jul 19 (Chu-Cheng Lin)
Chu-Cheng Lin and Jason Eisner (2018). Neural Particle Smoothing for Sampling from Conditional Sequence Models. NAACL. poster
Jul 12 (Xuan Zhang)
Yoshua Bengio, Jerome Louradour, Ronan Collobert, and Jason Weston (2009). Curriculum Learning. ICML. slides
Jul 5 (Pamela Shapiro)
Mohit Iyyer, John Wieting, Kevin Gimpel, and Luke Zettlemoyer (2018). Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. NAACL.
Jun 29 (Xuan Zhang)
Alane Suhr, Srinivasan Iyer, and Yoav Artzi (2018). Learning to Map Context-Dependent Sentences to Executable Formal Queries. NAACL (outstanding paper award). slides
Jun 21 (Arya McCarthy)
Matthew E. Peters et al. (2018). Deep Contextualized Word Representations. NAACL (outstanding paper award).
This is the ELMo paper.
Jun 14 (Sebastian Mielke)
Chaitanya Malaviya, Matthew R. Gormley, and Graham Neubig (2018). Neural Factor Graph Models for Cross-lingual Morphological Tagging. ACL.
Bonus paper: Austin Matthews, Graham Neubig, and Chris Dyer (2018). Using Morphological Knowledge in Open-Vocabulary Neural Language Models (http://aclweb.org/anthology/N18-1130). NAACL.
Jun 8
NAACL debriefing session.

Spring 2018

Optimal Transport

Organizer: Matthew Francis-Landau

May 3 (Patrick Xia)
Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, Bernhard Schoelkopf (2018). Wasserstein Auto-Encoders. ICLR.
Apr 26 (Chu-Cheng Lin)
Meng Zhang, Yang Liu, Huanbo Luan, and Maosong Sun (2017). Earth Mover’s Distance Minimization for Unsupervised Bilingual Lexicon Induction. EMNLP.
Apr 19 (Matthew Francis-Landau)
Gabriel Peyré and Marco Cuturi (2018). Computational Optimal Transport, sections 2-2.3, 6-6.2, 4.2 and 9.1. resources slides

Inference Networks / Stochastic Inversion

Organizer: Sebastian Mielke

Apr 12 (Annabelle Carrell)
Lifu Tu and Kevin Gimpel (2018). Learning Approximate Inference Networks for Structured Prediction. ICLR.
Apr 5 (Shijie Wu)
Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu (2017). Neural Discrete Representation Learning. NIPS.
Mar 28 (Sebastian Mielke)
Hanjun Dai, Yingtao Tian, Bo Dai, Steven Skiena, and Le Song (2018). Syntax-Directed Variational Autoencoder for Structured Data. ICLR.

Cooperative Dialog and Emergence of Language

Organizers: Patrick Xia and Tom McCoy

Mar 15 (Annabelle Carrell)
He He, Anusha Balakrishnan, Mihail Eric, and Percy Liang (2017). Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings. ACL.
Mar 8 (Tom McCoy)
Florencia Reali, Nick Chater, and Morten H. Christiansen (2018). Simpler grammar, larger vocabulary: How population size affects language. Proceedings of the Royal Society B.
Simon Kirby, Hannah Cornish, and Kenny Smith (2008). Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language. PNAS.
Mar 1 (Patrick Xia)
Angeliki Lazaridou, Alexander Peysakhovich, and Marco Baroni (2017). Multi-Agent Cooperation and the Emergence of (Natural) Language. ICLR.
Satwik Kottur, José M.F. Moura, Stefan Lee, Dhruv Batra (2017). Natural Language Does Not Emerge ‘Naturally’ in Multi-Agent Dialog. EMNLP.

Computational Historical Linguistics

Organizer: Arya McCarthy

Feb 22 (Tom McCoy)
William L. Hamilton, Jure Leskovec, and Dan Jurafsky (2016). Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change. ACL.
Feb 15 (Arya McCarthy)
David Hall and Dan Klein (2010). Finding Cognate Groups using Phylogenies. ACL.
Feb 8 (Arya McCarthy)
Lyle Campbell (2013). Historical Linguistics: An Introduction, chapter 5. (See also chapter 1.)

Fall 2017

Inducing "Syntax" for Semantics

Organizer: Adam Poliak

Dec 14
NIPS debriefing session
Nov 30 (Chu-Cheng Lin)
Franklin Chang, Gary S. Dell, and Kathryn Bock (2006). Becoming syntactic. Psychological Review. followup
Nov 16 (Adam Poliak)
Gormley, Mitchell, Van Durme, Dredze (2014). Low-resource semantic role labeling. ACL.
Williams, Drozdov, Bowman (2018). Learning to parse from a semantic objective: It works. Is it syntax? TACL.
Other suggested papers
Swabha Swayamdipta, Sam Thomson, Chris Dyer, and Noah A. Smith (2017). Frame-semantic parsing with softmax-margin segmental RNNs and a syntactic scaffold. arXiv.
Luheng He, Kenton Lee, Mike Lewis, and Luke Zettlemoyer (2017). Deep Semantic Role Labeling: What Works and What's Next. ACL.

Evaluation Metrics

Organizer: Pamela Shapiro

Nov 9 (Becky Marvin)
Philipp Koehn (2004). Statistical Significance Tests for Machine Translation Evaluation. EMNLP.
Ying Zhang, Stephan Vogel, and Alex Waibel (2004). Interpreting BLEU/NIST Scores: How Much Improvement Do We Need to Have a Better System? LREC.
Nov 2 (Pamela Shapiro)
Chris Callison-Burch, Miles Osborne, and Philipp Koehn (2006). Re-evaluating the Role of BLEU in Machine Translation Research. EACL.
Yvette Graham, Timothy Baldwin, Alistair Moffat, and Justin Zobel (2014). Is Machine Translation Getting Better over Time? EACL.
Oct 19 (Harrison Huh)
Neha Nayak, Gabor Angeli, and Christopher D. Manning (2016). Evaluating Word Embeddings Using a Representative Suite of Practical Tasks. ACL.
Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, and Chris Dyer (2016). Problems With Evaluation of Word Embeddings Using Word Similarity Tasks. ACL.

Derivational Morphology

Organizer: Arya McCarthy

Oct 26 (Garrett Nicolai)
Angeliki Lazaridou, Marco Marelli, Roberto Zamparelli, and Marco Baroni (2013). Compositional-ly (sic) Derived Representations of Morphologically Complex Words in Distributional Semantics. ACL.
Max Kisselew, Sebastian Pado, Alexis Palmer, and Jan Snajder (2015). Obtaining a Better Understanding of Distributional Models of German Derivational Morphology. Proceedings of the 11th International Conference on Computational Semantics.
Oct 12 (Shijie Wu)
Noam Chomsky (1968). Remarks on Nominalization. Linguistics Club, Indiana University.
Oct 5 (Arya McCarthy)
Ryan Cotterell, Ekaterina Vylomova, Huda Khayrallah, Christo Kirov, and David Yarowsky (2017). Paradigm Completion for Derivational Morphology. EMNLP.
Ekaterina Vylomova, Ryan Cotterell, and Timothy Baldwin (2016). Context-Aware Prediction of Derivational Word-forms. ACL.

Meaning Representation Formalisms

Organizer: Sebastian Mielke

Paper ideas and suggestions: Google doc

Sep 28 (Seth Ebner)
Baldridge and Kruijff (2002). Coupling CCG and Hybrid Logic Dependency Semantics. ACL.
Sep 21 (Brian Leonard)
Emily Bender, Dan Flickinger, Stephan Oepen, Woodley Packard, and Ann Copestake (2015). Layers of Interpretation: On Grammar and Compositionality. 11th International Conference on Computational Semantics.
Sep 14 (Sebastian Mielke)
Angelina Ivanova, Stephan Oepen, Lilja Øvrelid, and Dan Flickinger (2012). Who Did What to Whom? A Contrastive Study of Syntacto-Semantic Dependencies. 6th Linguistic Annotation Workshop.
Omri Abend and Ari Rappoport (2017). The State of the Art in Semantic Representation. ACL.

Fall 2017

Thursdays 12-1:15pm, Hackerman 306.

Inducing "Syntax" for Semantics

Organizer: Adam Poliak

Suggestions

1. Swayamdipta, Thomson, Dyer, Smith (2017) Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold. Arxiv (Chu-Cheng)

2. He, Lee, Lewis, Zettlemoyer (2017) Deep Semantic Role Labeling: What Works and What's Next. ACL

Dec 7
Nov 30 (Chu-Cheng)

Swayamdipta, Thomson, Dyer, Smith (2017) Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold. Arxiv

Nov 16 (Adam Poliak)
Gormley, Mitchell, Van Durme, Dredze (2014) Low-Resource Semantic Role Labeling. ACL
Williams, Drozdov, Bowman (2017) Learning to parse from a semantic objective: It works. Is it syntax?. TACL Submission

Summer 2017

August 25 (Jason/group)
Jacob Andreas, Anca Dragan, Dan Klein (2017). Translating Neuralese. ACL.
August 18 (Keisuke Sakaguchi)
Keisuke Sakaguchi, Matt Post, Benjamin Van Durme (2017). Error-repair Dependency Parsing for Ungrammatical Texts. ACL. slides.
Second segment (Various)
Ilya Sutskever (2013). Training Recurrent Neural Networks. University of Toronto.
First segment (Various)
Percy Liang (2011). Learning Dependency-Based Compositional Semantics. UC Berkeley.

(other dissertations considered for discussion)

Spring 2017

Point Processes

Organizer: Ryan Cotterell

May 4 (Keisuke Sakaguchi)
Alex Kulesza and Ben Taskar (2010). Structured Determinantal Point Processes. NIPS.
Apr 27 (Hongyuan Mei)
Hongyuan Mei and Jason Eisner (2016). The Neural Hawkes Process: A Neurally Self-Modulating Multivariate Point Process. arXiv.
Apr 20 (Ryan Cotterell)
Ben Taskar. Determinantal Point Processes (tutorial).

Transfer Learning

Organizer: Becky Marvin

Apr 13 (Chu-Cheng Lin)
Mikhail Kozhevnikov and Ivan Titov (2013). Cross-lingual Transfer of Semantic Role Labeling Models. ACL.
Apr 6 (Xiaochen Li)
Oscar Täckström, Ryan McDonald, and Jakob Uszkoreit (2012). Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure. NAACL.
Mar 30 (Becky Marvin)
Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight (2016). Transfer Learning for Low-Resource Neural Machine Translation. EMNLP.

Dialog

Applications of the two preceding topics (Deep Reinforcement Learning and Generative Adversarial Nets, listed below). Organizer: Patrick Xia.

Mar 16 (Matthew Francis-Landau)
Tsung-Hsien Wen et al. (2016). A Network-based End-to-End Trainable Task-oriented Dialogue System.
Mar 9 (Patrick Xia)
Jiwei Li et al. (2017). Adversarial Learning for Neural Dialogue Generation.

Deep Reinforcement Learning

Organizers: Hongyuan Mei, Tim Vieira

Mar 2 (Shijie Wu)
Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, and Pieter Abbeel (2016). Value Iteration Networks. NIPS.
David Silver's lecture on value iteration from his course might be helpful.
Feb 23 (Hongyuan Mei and Tim Vieira)
David Silver (2016). Tutorial: Deep Reinforcement Learning. ICML. Slides, video.

Generative Adversarial Nets

Organizers: Ryan Cotterell, Dingquan Wang

Feb 9 and Feb 16 (Ryan Cotterell, Dingquan Wang)
Ian Goodfellow (2016). NIPS 2016 Tutorial: Generative Adversarial Networks. slides
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio (2014). Generative Adversarial Nets. arXiv.
Martin Arjovsky, Soumith Chintala, Léon Bottou (2017). Wasserstein GAN. arXiv. slides slides_pdf
Martin Arjovsky, Léon Bottou (2017). Towards Principled Methods for Training Generative Adversarial Networks. ICLR.

Fall 2016

Interpretation and visualization of deep networks

Organizer: Nanyun (Violet) Peng

Dec 8 (Zach Wood-Doughty)
Li, Yixuan, et al. (2015) Convergent Learning: Do different neural networks learn the same representations? Slides: http://s.yosinski.com/yosinski_160503_iclr_convergent.pdf
Bonus paper: Kádár, Ákos, Grzegorz Chrupała, and Afra Alishahi. Representation of linguistic form and function in recurrent neural networks.
Dec 1 (Pamela Shapiro)
Andrej Karpathy, Justin Johnson and Li Fei-Fei (2016) Visualizing and understanding recurrent networks. ICLR.
Nov 17 (Nanyun (Violet) Peng)
Tao Lei, Regina Barzilay and Tommi Jaakkola (2016) Rationalizing Neural Predictions. EMNLP.

Neural MT and generation

Organizer: Chu-Cheng Lin

Nov 10
EMNLP debriefing session.
Nov 3 (Becky Marvin)
Akiko Eriguchi, Kazuma Hashimoto, and Yoshimasa Tsuruoka (2016). Tree-to-Sequence Attentional Neural Machine Translation. ACL.
Oct 27 (Chu-Cheng Lin)
Ilya Sutskever, Oriol Vinyals, and Quoc V. Le (2014). Sequence to Sequence Learning with Neural Networks. NIPS.
Dzmitry Bahdanau, KyungHyun Cho, and Yoshua Bengio (2015). Neural Machine Translation by Jointly Learning to Align and Translate. ICLR.

Deep learning in structured prediction

Organizer: Tim Vieira

Sep 29 (Patrick Xia and Matthew Francis-Landau)
Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros and Noah A. Smith (2016) Recurrent Neural Network Grammars. NAACL.
Sep 22 (Chu-Cheng Lin and Hongyuan Mei)
David Belanger and Andrew McCallum (2016). Structured Prediction Energy Networks. ICML.
Sep 15 (Matthew Francis-Landau)
Jacob Andreas, Marcus Rohrbach, Trevor Darrell and Dan Klein (2016). Learning to Compose Neural Networks for Question Answering. NAACL.

Hypergraph algorithms

Organizer: Travis Wolfe

Oct 13 (Becky Marvin)
Alexander M. Rush, Yin-Wen Chang, and Michael Collins (2013). Optimal Beam Search for Machine Translation. EMNLP.
Oct 6 (Tim Vieira and Zach Wood-Doughty)
Zhifei Li and Jason Eisner (2009). First- and Second-Order Expectation Semirings with Applications to Minimum-Risk Training on Translation Forests. EMNLP.
Sep 8 (Travis Wolfe)
Liang Huang (2008). Advanced Dynamic Programming in Semiring and Hypergraph Frameworks. COLING tutorial notes.

Spring 2016

Interpretable ML

Apr 28
Anoop Korattikara, Vivek Rathod, Kevin Murphy, Max Welling (2015). Bayesian Dark Knowledge. Submitted to NIPS.
Apr 21
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin (2016). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. CHI Workshop on Human-Centred Machine Learning (HCML).
Optional background reading: Bayesian Learning via Stochastic Gradient Langevin Dynamics (ICML 2011).
Apr 14
Letham, Rudin, McCormick, and Madigan (2012). Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model.

Open-Domain Information Extraction

Nanyun will organize this unit.

Apr 7 (Nanyun Peng)
Jayant Krishnamurthy and Tom M Mitchell (2015). Learning a Compositional Semantics for Freebase with an Open Predicate Vocabulary. TACL.
Mar 31 (Dingquan Wang)
Sebastian Riedel, Limin Yao, Benjamin M. Marlin and Andrew McCallum (2013). Relation Extraction with Matrix Factorization and Universal Schemas. NAACL.
Mar 24 (Nanyun Peng)
T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, J. Welling (2015). Never-Ending Language Learning. AAAI.

Reinforcement learning

Keisuke and Tim will organize this unit.

Mar 10 (Keisuke Sakaguchi, Tim Vieira)
Sergey Levine and Vladlen Koltun (2013). Guided Policy Search. ICML.
Mar 3 (Nick Andrews)
David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller (2014). Deterministic Policy Gradient Algorithms. ICML.
Feb 11, 18, 25 (Tim Vieira, Travis Wolfe)
Léon Bottou, Jonas Peters, Joaquin Quiñonero-Candela, Denis X. Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, and Ed Snelson (2013). Counterfactual Reasoning and Learning Systems. arXiv. slides
Feb 4 (Keisuke Sakaguchi)
Merwan Barlier, Julien Perolat, Romain Laroche, and Olivier Pietquin (2015). Human-Machine Dialogue as a Stochastic Game. SIGDIAL. slides
Optional background: Verena Rieser and Oliver Lemon (2011). Reinforcement Learning. In Reinforcement Learning for Adaptive Dialogue Systems, chapter 3.

Fall 2015

Tensor Decomp

December 17
NIPS debriefing session.
December 3 (Pushpendre Rastogi)
Schein et al. (2015). Bayesian Poisson Tensor Factorization for Inferring Multilateral Relations from Sparse Dyadic Event Counts. International Conference on Knowledge Discovery and Data Mining (KDD). blog post
November 19 (Satya Prateek)
Singh et al. (2015). Towards Combined Matrix and Tensor Factorization for Universal Schema Relation Extraction. NAACL.
November 12 (Pushpendre Rastogi)
Tao Lei et al. (2014). Low Rank Tensors For Scoring Dependency Structures. ACL (Best Paper).

Abstract Meaning Representation (AMR)

Nov 5 (Darcey Riley)
Frank Drewes, Hans-Jörg Kreowski, and Annegret Habel (1997). Hyperedge Replacement Graph Grammars. In Handbook of Graph Grammars and Computing by Graph Transformation, pp. 95-162.
Oct 29 (Darcey Riley)
Jones, Bevan, Jacob Andreas, Daniel Bauer, Karl Moritz Hermann, and Kevin Knight (2012). Semantics-Based Machine Translation with Hyperedge Replacement Grammars. Proc. COLING.
Oct 22 (Darcey Riley)
Banarescu et al. (2013). Abstract Meaning Representation for Sembanking. Proc. Linguistic Annotation Workshop.

Deep + Probabilistic / Deep + Attention

Oct 15 (Kevin Duh)
Deep learning tutorial talk.
October 8 (Chu-Cheng Lin)
Andriy Mnih and Karol Gregor (2014). Neural Variational Inference and Learning in Belief Networks. ICML.

Adaptive Inference

October 1 (Tim Vieira)
S. M. Ali Eslami, Daniel Tarlow, Pushmeet Kohli, and John Winn (2014). Just-In-Time Learning for Fast and Flexible Inference. NIPS. supplementary material (http://arkitus.com/files/nips-14-eslami-just-in-time-supplementary.zip), poster
September 24
EMNLP debriefing session.
September 17 (Pushpendre Rastogi)
David Weiss and Ben Taskar (2013). Learning Adaptive Value of Information for Structured Prediction. NIPS.
September 10 (Travis Wolfe)
Jacob Steinhardt and Percy Liang (2015). Reified Context Models.
September 3 (Tim Vieira and Adam Teichert)
Shi, Tianlin, Jacob Steinhardt, and Percy Liang (2015). Learning Where to Sample in Structured Prediction. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics.

Spring 2015

Thursdays 12-1:15pm, Hackerman 306.

Extreme Learning Machine & Computational Learning Theory (w/ practical applications)

Apr 23 (Mozhi Zhang)
Maclaurin et al. (2015) Gradient-based Hyperparameter Optimization through Reversible Learning. arXiv.
Apr 16 (Tim Vieira)
Ross et al. (2011) A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. AISTATS.
Apr 9 (Dingquan Wang)
Long et al. (2010) Restricted Boltzmann Machines are Hard to Approximately Evaluate or Simulate. ICML.
Apr 2 (Tongfei Chen)
Blum et al. (1999) Beating the Hold-Out: Bounds for K-fold and Progressive Cross-Validation. COLT.
Mar 26 (Satya Prateek)
Yosinski et al. (2014) How transferable are features in deep neural networks? arXiv.
Mar 12 (Travis Wolfe)
Huang et al. (2006) Extreme Learning Machine: Theory and Applications. Neurocomputing.

Transition-based parsing

Feb 19 (Mo Yu)
Huang et al. (2012) Structured Perceptron with Inexact Search. NAACL.
Feb 12 (Keisuke Sakaguchi)
Sartorio et al. (2013) A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy. ACL.
Feb 5 (Travis Wolfe)
Yamada and Matsumoto (2003) Statistical Dependency Analysis With Support Vector Machines. IWPT.

Fall 2014

Thursdays 12-1:15pm, Hackerman 306.

Scientific practice

Dec 4 (Michael Paul) - Open science and publishing models
Jason Priem (2013). Beyond the paper. Nature.
Timothy Gowers and Michael Nielsen (2009). Massively collaborative mathematics. Nature.
Yann LeCun (2011?). A new publishing model in computer science. Blog post.
Donald Geman (2007). Ten reasons why conference papers should be abolished. Manuscript.
Eric Price (2014). The NIPS experiment. Blog post.
Bert Huang (2014). On the NIPS experiment and review process. Blog post.
Nov 20 (Dingquan Wang)
Eisenstein (2013) What to do about bad language on the internet. NAACL.
Nov 6 (Matt Gormley)
Clark et al. (2011) Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability. ACL.
Søgaard et al. (2014) What's a p-value in NLP? CoNLL.

Probabilistic semantics

Oct 30 (Elan Hourticolon-Retzler)
Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, Luke Zettlemoyer (2013). Scaling Semantic Parsers with On-the-fly Ontology Matching. EMNLP.
Oct 23 (Violet Nanyun Peng)
Jonathan Berant and Percy Liang (2014). Semantic Parsing via Paraphrasing. ACL.
Oct 16 (Darcey Riley)
Noah D. Goodman and Daniel Lassiter. Probabilistic Semantics and Pragmatics: Uncertainty in Language and Thought. Chapter for Handbook of Semantics.

Probabilistic programming

Oct 9 (Adam Teichert)
Examples section of Noah Goodman and Andreas Stuhlmüller (2014). The Design and Implementation of Probabilistic Programming Languages. Electronic book at http://dippl.org.
Oct 2 (Travis Wolfe)
Chapters 4,5,6 of Noah Goodman and Andreas Stuhlmüller (2014). The Design and Implementation of Probabilistic Programming Languages. Electronic book at http://dippl.org.
Sep 25 (Pushpendre Rastogi)
Chapters 2,3 of Noah Goodman and Andreas Stuhlmüller (2014). The Design and Implementation of Probabilistic Programming Languages. Electronic book at http://dippl.org.

Beyond MCMC

Sep 18 (Chandler May)
Aaron Li, Amr Ahmed, Sujith Ravi, and Alexander J Smola (2014). Reducing the Sampling Complexity of Topic Models. KDD.
Background: alias sampling
Sep 11 (Frank Ferraro)
Luke Bornn, Yutian Chen, Nando de Freitas, Mareija Eskelin, Jing Fang, and Max Welling (2013). Herded Gibbs Sampling. ICLR.
Sep 4 (Nicholas Andrews)
Anoop Korattikara, Yutian Chen, and Max Welling (2014). Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget. ICML.

Summer 2014

Aug 14 (Adam Teichert)
Joseph Gonzalez, Yucheng Low, Arthur Gretton, and Carlos Guestrin (2011). Parallel Gibbs sampling: From colored fields to thin junction trees. AISTATS.
July 24 (Tim Vieira)
Alexandre Bouchard-Côté, Slav Petrov, and Dan Klein (2009). Randomized Pruning: Efficiently Calculating Expectations in Large Dynamic Programs. NIPS.
July 24 (Matt Gormley)
Michael U. Gutmann and Aapo Hyvärinen (2010). A new estimation principle for unnormalized statistical models. AISTATS.
July 17 (Juneki Hong)
TBA
May 15 (Tim Vieira)
TBA
May 8 (Travis Wolfe)
Percy Liang, Hal Daume, and Dan Klein (2008). Structure Compilation: Trading Structure for Features. ICML.

Spring 2014

Recent papers

May 1 (Michael Paul)
Thang Nguyen, Yuening Hu, and Jordan Boyd-Graber (2014). Anchors Regularized: Adding Robustness and Extensibility to Scalable Topic-Modeling Algorithms. ACL.
Apr 24 (Adam Teichert)
Dani Yogatama and Noah A. Smith (2014). Linguistic Structured Sparsity in Text Categorization. ACL.

Semantic parsing

Apr 17 (Juneki Hong)
Dipanjan Das, Andre F. T. Martins, and Noah Smith (2012). An Exact Dual Decomposition Algorithm for Shallow Semantic Parsing with Constraints. *SEM. slides
Apr 10 (Keisuke Sakaguchi & Yiran Zhang)
Yoav Artzi and Luke Zettlemoyer (2013). Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions. TACL.
Apr 3 (Xuchen Yao)
Percy Liang, Michael I. Jordan, and Dan Klein (2011). Learning dependency-based compositional semantics. ACL.

Clever MT algorithms

Mar 27 (Matt Gormley)
Michel Galley, Chris Quirk, Colin Cherry, and Kristina Toutanova (2013). Regularized Minimum Error Rate Training. EMNLP.
Mar 13 (Dan Deutsch)
Adam Pauls and Dan Klein (2009). K-Best A* Parsing. ACL.
Mar 6 (Nanyun Peng)
Andrei Simion, Michael Collins, and Clifford Stein (2013). A Convex Alternative to IBM Model 2. EMNLP.

Online inference

Feb 27 (Nicholas Andrews)
Michael Bryant and Erik B. Sudderth (2012). Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes. NIPS. Nick's slides
Feb 13 (Frank Ferraro), Feb 20 (Ryan Cotterell)
Matthew D. Hoffman, David M. Blei, Chong Wang, and John Paisley (2013). Stochastic variational inference. JMLR.
Feb 6 (Ryan Cotterell)
Percy Liang and Dan Klein (2009). Online EM for unsupervised models. NAACL. slides

Fall 2013

Recent Papers

Dec 12
NIPS debriefing.
Dec 5 (Xuchen Yao)
Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang (2013). Semantic parsing on Freebase from question-answer pairs. EMNLP. supplement
See also: Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, and Luke Zettlemoyer (2013). Scaling Semantic Parsers with On-the-Fly Ontology Matching. EMNLP.
Nov 21 (Adam Teichert)
Alexander M. Rush, Yin-Wen Chang, and Michael Collins (2013). Optimal Beam Search for Machine Translation. EMNLP.
Nov 14 (Frank Ferraro)
Yi Yang and Jacob Eisenstein (2013). A Log-Linear Model for Unsupervised Text Normalization. EMNLP.

ML for Annotation / Active Learning

Nov 7 (Michael Paul)
Dan Garrette and Jason Baldridge (2013). Learning a Part-of-Speech Tagger from Two Hours of Annotation. NAACL.
Oct 31 (Ryan Cotterell)
Burr Settles (2012). Active Learning, chapters 3-5. Synthesis Lectures on Artificial Intelligence and Machine Learning.
Oct 24
EMNLP debriefing.
Oct 17 (Tim Vieira)
Burr Settles (2012). Active Learning, chapters 1-3. Synthesis Lectures on Artificial Intelligence and Machine Learning.

Informal Domains

Oct 10 (Naomi Saphra)
Jacob Eisenstein (2012). Phonological Factors in Social Media Writing. Proceedings of NAACL Workshop on Language Analysis in Social Media.
Oct 3 (Juneki Hong)
Alan Ritter, Sam Clark, Mausam, and Oren Etzioni (2011). Named Entity Recognition in Tweets: An Experimental Study. EMNLP. slides

Deep Learning for NLP

Sep 26 (Nick Andrews)
Richard Socher and Christopher Manning (2013). Deep Learning for NLP (without Magic). Tutorial at NAACL, continued.
Sep 19 (Matt Gormley)
Richard Socher and Christopher Manning (2013). Deep Learning for NLP (without Magic). Tutorial at NAACL.
Sep 12 (Travis Wolfe)
Ronan Collobert and Jason Weston (2008). A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. ICML.

Summer 2013

Aug 15
ACL debriefing.
Jun 20
NAACL debriefing.
Jun 13 (Nicholas Andrews)
Marta Recasens, Marie-Catherine de Marneffe, and Christopher Potts (2013). The Life and Death of Discourse Entities: Identifying Singleton Mentions. NAACL (short paper).

Spring 2013

Thursdays 12-1:15pm in Hackerman 306.

Recent NLP papers

May 2 (Gaurav Kumar)
Oscar Täckström, Dipanjan Das, Slav Petrov, Ryan McDonald, and Joakim Nivre (2013). Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging. TACL.
Apr 25 (Nicholas Andrews)
J Gillenwater, A Kulesza, and B Taskar (2012). Discovering Diverse and Salient Threads in Document Collections. EMNLP.
Apr 18 (Michael Paul)
R Socher, M Ganjoo, H Sridhar, O Bastani, CD Manning, and AY Ng (2013). Zero-Shot Learning Through Cross-Modal Transfer. arXiv, March.

Inference for NLP

Apr 11 (Matt Gormley)
J. Domke (2011). Dual decomposition for marginal inference. AAAI.
Apr 4 (Tim Vieira)
J. Paisley, D. Blei, and M. Jordan (2012). Variational Bayesian Inference with Stochastic Search. ICML.
Mar 28 (Adam Teichert)
D. Weiss, B. Sapp, and B. Taskar (2012). Structured Prediction Cascades. arXiv, August.

Semantics in NLP

Mar 14 (Violet Nanyun Peng)
Dipanjan Das and Noah A. Smith (2011). Semi-Supervised Frame-Semantic Parsing for Unknown Predicates. ACL.
Mar 7 (Frank Ferraro)
David Chen (2012). Fast Online Lexicon Learning for Grounded Language Acquisition. ACL.
Feb 28 (Darcey Riley)
Cynthia Matuszek, Nicholas FitzGerald, Luke Zettlemoyer, Liefeng Bo, and Dieter Fox (2012). A Joint Model of Language and Perception for Grounded Attribute Learning. ICML.

Alignment

Feb 21 (Henry Pao)
Chris Dyer, Jonathan Clark, Alon Lavie, and Noah A. Smith (2011). Unsupervised Word Alignment with Arbitrary Features. ACL.
Feb 14 (Travis Wolfe)
Adam Pauls, Dan Klein, David Chiang, and Kevin Knight (2010). Unsupervised Syntactic Alignment with Inversion Transduction Grammars. NAACL.
Feb 7 (Xuchen Yao)
Mohit Bansal, Chris Quirk, and Robert C. Moore (2011). Gappy Phrasal Alignment By Agreement. ACL.

Fall 2012

Good recent ML papers

Jan 24 (Nick Andrews)
Tony Jebara and Anna Choromanska (2012). Majorization for CRFs and Latent Likelihoods. NIPS.
Jan 17 (Adam Teichert)
Po-Ling Loh and Martin Wainwright (2012). Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses. NIPS.
Jan 10 (Tim Vieira)
Thomas Furmston and David Barber (2012). A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes. NIPS.
Jan 3 (Jason Eisner)
Robert Gens and Pedro Domingos (2012). Discriminative Learning of Sum-Product Networks. NIPS. Slides.

Good recent NLP papers

Dec 13 (Nathaniel Filardo)
Sebastian Riedel, David Smith, and Andrew McCallum (2012). Parse, Price and Cut: Delayed Column and Row Generation for Graph Based Parsers. ACL. background
Dec 6 (Gaurav Kumar)
Liang Huang, Suphan Fayong, and Yang Guo (2012). Structured Perceptron with Inexact Search. NAACL.
Nov 29 (Frank Ferraro)
Jason Naradowsky, Sebastian Riedel, and David Smith (2012). Improving NLP through Marginalization of Hidden Syntactic Structure. EMNLP.
Nov 15 (Henry Pao)
Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng (2012). Semantic compositionality through recursive matrix-vector spaces. ACL.

Human sentence processing

Nov 8 (Olivia Buzek)
Steven T. Piantadosi, Harry Tily, and Edward Gibson (2011). The communicative function of ambiguity in language. Cognition.
Nov 1 (Aric Velbel)
Roger Levy and T. Florian Jaeger (2007). Speakers optimize information density through syntactic reduction. Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems.
Oct 25 (Keith Levin)
Bock, K., & Levelt, W. J. M. (1994). Language production: Grammatical encoding. In M.A. Gernsbacher (Ed.), Handbook of Psycholinguistics (pp. 945-984). London: Academic Press.

Streaming/online algorithms in NLP

Oct 18 (Travis Wolfe)
Martins, Gimpel, Smith, Xing, Figueiredo, and Aguiar (2010). Aggressive Online Learning of Structured Classifiers. Tech report.
(also seen online as "Learning Structured Classifiers with Dual Coordinate Ascent")
Oct 11 (Violet (Nanyun) Peng)
Graham Cormode (2011). Sketch Techniques for Approximate Query Processing. Foundations and Trends in Databases.
Oct 4 (Matt Gormley)
Benjamin Van Durme (2012). Streaming Analysis of Discourse Participants. EMNLP.

Events/Narratives in text

Sep 27 (Xuchen Yao)
Quang Do, Wei Lu, Dan Roth (2012). Joint Inference for Event Timeline Construction. EMNLP.
Sep 20 (Adam Teichert)
Roi Reichart and Regina Barzilay (2012). Multi Event Extraction Guided by Global Constraints. NAACL.
Sep 13 (Michael Paul)
Nathanael Chambers and Dan Jurafsky (2009). Unsupervised Learning of Narrative Schemas and their Participants. ACL.

Summer 2012

Summer conference papers

Aug 30 (Darcey Riley)
Sindhu Raghavan, Raymond Mooney, and Hyeonseo Ku (2012). Learning to "Read Between the Lines" using Bayesian Logic Programs. ACL.
Aug 23 (Wes Filardo)
Zhiheng Huang et al. (2012). Iterative Viterbi A* Algorithm for K-Best Sequential Decoding. ACL.
Aug 16 (Travis Wolfe)
Alex Kulesza and Ben Taskar (2011). Learning Determinantal Point Processes. UAI.
Aug 10 (Nick Andrews)
David Hall and Dan Klein (2012). Training Factored PCFGs with Expectation Propagation. EMNLP.
Aug 3 (Michael Paul)
Quang Do, Wei Lu, and Dan Roth (2012). Joint Inference for Event Timeline Construction. EMNLP.
Jul 5 (Tim Vieira)
David Burkett and Dan Klein (2012). Fast Inference in Phrase Extraction Models with Belief Propagation. NAACL. Slides.
Jun 29 (Adam Teichert)
Oscar Täckström, Ryan McDonald, and Jakob Uszkoreit (2012). Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure. NAACL.

Spring 2012

Spectral learning

May 3 (Xuchen Yao)
Paramveer Dhillon, Dean Foster, and Lyle Ungar (2011). Multi-View Learning of Word Embeddings via CCA. NIPS 24, Granada, Spain, Dec. 2011.
Apr 26 (Matt Gormley)
Franco M. Luque, Ariadna Quattoni, Borja Balle, and Xavier Carreras (2012). Spectral Learning for Non-Deterministic Dependency Parsing. EACL 2012. Best paper award.
Apr 19 (Michael Paul)
Daniel Hsu, Sham M. Kakade, and Tong Zhang (2009). A Spectral Algorithm for Learning Hidden Markov Models. Twenty-Second Annual Conference on Learning Theory (COLT).

Reinforcement learning

Apr 12 (Travis Wolfe)
Wilson, Fern, Ray, and Tadepalli (2007). Multi-Task Reinforcement Learning: A Hierarchical Bayesian Approach. ICML.
Apr 5 (Nathaniel Filardo)
Wingate, David et al. (2011). Bayesian Policy Search with Policy Priors. International Joint Conference on Artificial Intelligence (IJCAI).
Mar 29 (Jay Feldman)
Gergely Neu and Csaba Szepesvári (2009). Training parsers by inverse reinforcement learning, Machine Learning Volume 77, Issue 2. Published online by Springer Netherlands.

Non-convex optimization

Mar 15 (Frank Ferraro)

Main reading: Robert Michael Lewis, Virginia Torczon, and Michael W. Trosset (2000). Direct search methods: then and now. Journal of Computational and Applied Mathematics, Volume 124, Issues 1-2, December, pp. 191-207.

Optional/supplemental reading: Tamara G. Kolda, Robert Michael Lewis, and Virginia Torczon (2003). Optimization by direct search: new perspectives on some classical and modern methods. SIAM Review, Vol. 45, Issue 3, pages 385-482.

Mar 8 (Tim Vieira)

Eric Brochu, Vlad M. Cora and Nando de Freitas (2009). A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. pages 1-23.

Mar 1 (Nicholas Andrews)
Main reading (Part 1): M. Ebden (2008). Gaussian Processes for Regression: A Quick Introduction. TR.
Extra reading (Chapter 2): Carl Edward Rasmussen and Christopher K. I. Williams (2006). Gaussian Processes for Machine Learning. MIT Press.
Extra extra reading (Chapter 45): David J.C. MacKay (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press.

Unsupervised/semisupervised learning of linguistic structure

Feb 23 (Olivia Buzek)
Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson (2009). A Bayesian framework for word segmentation: Exploring the effects of context. Cognition 112 (1), pp. 21-54.
Feb 16 (Adam Teichert)
Tahira Naseem, Harr Chen, Regina Barzilay, and Mark Johnson (2010). Using Universal Linguistic Knowledge to Guide Grammar Induction. EMNLP.
Feb 9 (Jason Smith)
Joao V. Graca, Kuzman Ganchev, and Ben Taskar (2007). Expectation Maximization and Posterior Constraints. In Advances in Neural Information Processing Systems, Vol. 20.
A longer treatment is Ganchev et al. (2010), Posterior Regularization for Structured Latent Variable Models, JMLR.
An application to unsupervised dependency parsing is Gillenwater et al. (2011), Posterior Sparsity in Unsupervised Dependency Parsing, JMLR.

Fall 2011

Knowledge representation and reasoning

Dec 1 (Meher Vijay Yeleti)
D. Koller, A. Levy, and A. Pfeffer (1997). P-Classic: A Tractable Probabilistic Description Logic. AAAI.
Nov 17 (Ves Stoyanov)
Franz Baader and Werner Nutt (2002). Basic Description Logics. In the Description Logic Handbook.
Nov 10 (Nick Andrews)
Nir Friedman et al. (1999). Learning Probabilistic Relational Models. IJCAI.
Nov 3 (Matt Gormley)
Hector J. Levesque (1986). Knowledge Representation and Reasoning. Annual Review of Computer Science, Vol. 1: 255-287.

Music modeling

Oct 27 (Adam Teichert)
Jean-François Paiement, Yves Grandvalet & Samy Bengio (2009). Predictive models for music. Connection Science 21(2-3):253-272.
Oct 20 (Nathaniel Filardo)
David Temperley (2010). Modeling Common-Practice Rhythm. Music Perception 27(5):355-376.
Oct 13 (Michael Paul)
Gerhard Nierhaus (2008). "Genetic Algorithms in Algorithmic Composition". Algorithmic Composition: Paradigms of Automated Music Generation, Chapter 7.4, pp. 157-186.
Oct 6 (Frank Ferraro)
Fred Lerdahl and Ray Jackendoff (1983). "An Overview of Hierarchical Structure in Music." Music Perception: An Interdisciplinary Journal. Vol. 1, No. 2, Hierarchical Structure in Music (Winter 1983/1984), pp. 229-252.
Resources

ML in information retrieval

Sep 29 (Olivia Buzek)
Shuang-Hong Yang, Bo Long, Alexander J. Smola, Hongyuan Zha, and Zhaohui Zheng (2011). Collaborative competitive filtering: learning recommender using context of user choice. SIGIR.
Sep 22 (Tim Vieira)
Brian McFee and Gert Lanckriet (2010). Metric Learning to Rank. ICML.
Sep 15 (Adam Teichert)
P. Carpena, P. Bernaola-Galvan, M. Hackenberg, A.V. Coronado, and J. L. Oliver (2009). Level statistics of words: Finding keywords in literary texts and symbolic sequences. Physical Review E.
Rada Mihalcea, Courtney Corley, and Carlo Strapparava (2006). Corpus-based and Knowledge-based Measures of Text Semantic Similarity. AAAI.
Sep 8 (Travis Wolfe)
Dafna Shahaf, Carlos Guestrin (2010). Connecting the dots between news articles. Proc. of KDD.

Summer 2011

Summer conference papers

Aug 16 (Matt Gormley)
Taylor Berg-Kirkpatrick, Dan Klein (2011). Simple Effective Decipherment via Combinatorial Optimization. Proc. of EMNLP.
Jul 19 (Matt Gormley)
Alexander M. Rush and Michael Collins (2011). Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation. Proc. of ACL. Slides.
Jul 12 (Wes Filardo)
Daniel Gildea (2010). Optimal Parsing Strategies for Linear Context-Free Rewriting Systems. Proc. of NAACL. Slides.
Jun 14 (Xiaoxu Kang)
Limin Yao, Sebastian Riedel, and Andrew McCallum (2010). Collective Cross-Document Relation Extraction Without Labelled Data. Proc. of EMNLP.
Jun 7 (Nicholas Andrews)
Harr Chen, Edward Benson, Tahira Naseem, and Regina Barzilay (2011). In-Domain Relation Discovery with Meta-Constraints via Posterior Regularization. Proc. of ACL.

Spring 2011

Combinatorial optimization

May 5 (Wes Filardo)
Daniel J. Lehmann (1977). Algebraic structures for transitive closure. Theoretical Computer Science 4(1):59-76.
Also consider Tarjan (1981a, 1981b).
Apr 28 (Jason Smith)
R. McDonald, F. Pereira, K. Ribarov, and J. Hajic (2005). Non-projective dependency parsing using spanning tree algorithms. In Proc. HLT/EMNLP, pages 523–530.
Apr 21 (Byung Gyu Ahn)
David Sontag, Amir Globerson, Tommi Jaakkola (2010). Introduction to Dual Decomposition for Inference. To appear in Optimization for Machine Learning, editors S. Sra, S. Nowozin, and S. J. Wright: MIT Press, 2010.
Apr 14 (Adam Teichert)
Jack Edmonds (1965). Paths, Trees, and Flowers. Canadian Journal of Mathematics 17: 449-467.

Game-theoretic approaches to discourse pragmatics and to language evolution

Apr 7 (Michael Paul)
Paul Vogt (2005). The emergence of compositional structures in perceptually grounded language games. Artificial Intelligence 167(1-2): 206-242.
Mar 31 (Rachael Richardson)
David Golland, Percy Liang, Dan Klein (2010). A Game-Theoretic Approach to Generating Spatial Descriptions. EMNLP 2010.
Mar 17 (Xuchen Yao)
Gerhard Jäger (2008). Game theory in semantics and pragmatics. Unpublished manuscript.
Note: This looks quite different from the 2011 manuscript that has the same title and author.
March 10 (Luke Orland)
Gerhard Jäger (2008). Applications of Game Theory in Linguistics.

Variational inference

March 3 (Nicholas Andrews)
Percy Liang, Slav Petrov, Michael I. Jordan, Dan Klein (2007). The infinite PCFG using hierarchical Dirichlet processes. EMNLP.
Feb 24 (Nathaniel Filardo)
Matthew Beal (2003). Variational Bayesian Hidden Markov Models. Appears as Chapter 3 of Variational Algorithms for Approximate Bayesian Inference, Ph.D. Thesis, Gatsby Computational Neuroscience Unit, University College London.
David MacKay (1997). Ensemble Learning for Hidden Markov Models. Unpublished technical report, Cavendish Laboratory, University of Cambridge.
Slides from Mark Johnson (2007). Why doesn't EM find good HMM POS-taggers? EMNLP.
Feb 17 (Adam Teichert)
David M. Blei, Andrew Y. Ng, and Michael I. Jordan (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research.
Feb 10 (Matt Gormley)
Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul (1999). An introduction to variational methods for graphical models. Machine Learning.
To get an intuition first, start with Jason's high-level explanation of variational inference. For another reference, try the ACL 2007 tutorial slides by Percy Liang and Dan Klein.

Fall 2010

Unsupervised discriminative learning

Dec 9 (Adam Teichert)
Continue with last week's reading: chapter 3.
Dec 2 (Wes Filardo)
Continue with last week's reading: finish chapter 2.
Nov 18 (Jason Smith)
Csaba Szepesvári, Algorithms for Reinforcement Learning. This week we'll read the preface, chapter 1, and the first section of chapter 2. If you're trying to access this outside of JHU, try this link.
Nov 11 (Ves Stoyanov)
Yves Grandvalet and Yoshua Bengio, Entropy Regularization, in: Semi-Supervised Learning, pages 151-168, MIT Press, 2006.
Nov 4 (Michael Paul)
Baoxun Wang, Xiaolong Wang, Chengjie Sun, Bingquan Liu, Lin Sun (2010). Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities.
Oct 28 (Adam Teichert)
Noah Smith and Jason Eisner (2005). Guiding Unsupervised Grammar Induction Using Contrastive Estimation.

Semantic parsing

Oct 21 (Svitlana Volkova)
Mihai Surdeanu, Richard Johansson, Adam Meyers, Lluís Màrquez, and Joakim Nivre (2008). The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies. Slides.
Oct 14 (Xuchen Yao)
Wei Lu, Hwee Tou Ng, Wee Sun Lee, and Luke S. Zettlemoyer (2008). A Generative Model for Parsing Natural Language to Meaning. EMNLP. Slides.
Oct 7 (Matt Gormley)
Dipanjan Das, Nathan Schneider, Desai Chen and Noah A. Smith (2010). Probabilistic Frame-Semantic Parsing. NAACL.
Sep 30 (Nicholas Andrews)
Luke S. Zettlemoyer and Michael Collins (2009). Learning Context-Dependent Mappings from Sentences to Logical Form. ACL.

Graph-based methods and random walks

Sep 23 (Adam Teichert)
Jie Cai and Michael Strube (2010). End-to-End Coreference Resolution via Hypergraph Partitioning. ACL.
Sep 16 (Delip Rao)
Goldenberg, A., Zheng, A. X., Fienberg, S. E., and Airoldi, E. M. (2010). A Survey of Statistical Network Models. Foundations and Trends in Machine Learning 2, 2 (Feb.), 129-233.
Sep 9 (Svitlana Volkova)
Einat Minkov and William W. Cohen (2008). Learning Graph Walk Based Similarity Measures for Parsed Text. EMNLP.

Summer 2010

Summer conference papers

Aug 12 (Jason Smith)
Alexander Clark (2010). Efficient, Correct, Unsupervised Learning for Context-Sensitive Languages. CoNLL.
Aug 5 (Veselin Stoyanov)
Hoifung Poon and Pedro Domingos (2010). Unsupervised Ontology Induction from Text. ACL.
Jul 20
General discussion of ACL 2010 papers.
Jul 15 (Nicholas Andrews)
Shay B. Cohen, David M. Blei and Noah A. Smith (2010). Variational Inference for Adaptor Grammars. NAACL.
Jul 6 (Veselin Stoyanov)
D. Chiang, J. Graehl, K. Knight, A. Pauls, and S. Ravi (2010). Bayesian Inference for Finite-State Transducers. NAACL.
Jun 29 (Matt Gormley)
Percy Liang, Michael I. Jordan, and Dan Klein (2010). Type-Based MCMC. NAACL. Slides.
Jun 22 (Spence Green)
David Burkett, John Blitzer, and Dan Klein (2010). Joint Parsing and Alignment with Weakly Synchronized Grammars. NAACL. Slides.
Relevant background:
Jun 17 (Ves Stoyanov)
Aria Haghighi and Dan Klein (2010). Coreference Resolution in a Modular, Entity-Centered Model. NAACL.
Jun 10
General discussion of NAACL 2010 papers.

Spring 2010

Visual scene parsing

May 6 (Rizwan Chaudhry)
S. Fidler, M. Boben, A. Leonardis (2009). Learning Hierarchical Compositional Representations of Object Structure. In Sven J. Dickinson, Aleš Leonardis, and Bernt Schiele (eds.), Object Categorization: Computer and Human Vision Perspectives.
See also the talk that Geoff Hinton gave here last week, Deep learning with multiplicative interactions.
April 22 (Zach Pezzementi) and April 29 (Balakrishnan V)
Song-Chun Zhu and David Mumford (2006). A Stochastic Grammar of Images. Foundations and Trends in Computer Graphics and Vision, 2(4):259-362. Slides.
Official final version is good for screen reading but wastes paper.
April 15 (Nick Andrews)
Sven Dickinson (2009). The Evolution of Object Categorization and the Challenge of Image Abstraction. In Sven J. Dickinson, Aleš Leonardis, and Bernt Schiele (eds.), Object Categorization: Computer and Human Vision Perspectives.

Generalized A* and related coarse-to-fine ideas

April 8 (Matt Gormley)
André F. T. Martins, Noah A. Smith, and Eric P. Xing (2009). Concise Integer Linear Programming Formulations for Dependency Parsing. ACL-IJCNLP.
April 1 (Adam Gerber)
Aria Haghighi, John DeNero, and Dan Klein (2007). Approximate Factoring for A* Search. HLT-NAACL 2007. Slides.
March 25 (Zhifei Li)
Mark Hopkins and Greg Langmead (2009). Cube pruning as heuristic search. EMNLP 2009.
March 11 (Jason Smith)
Adam Pauls and Dan Klein (2009). K-Best A* Parsing. ACL. Slides.
March 4 (Nathaniel Filardo)
Pedro Felzenszwalb and David McAllester (2007). The Generalized A* Architecture. Journal of Artificial Intelligence Research. Slides from [9].

Weakly supervised learning of semantics

There's also a nice list of papers at the UT reading group on Connecting Language Acquisition with Machine Perception.
Feb 25 (Nick Andrews)
Luke Zettlemoyer and Michael Collins (2005). Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. In Proceedings of the Twenty First Conference on Uncertainty in Artificial Intelligence (UAI-05).
Feb 11 Feb 18 (Ves Stoyanov)
S.R.K. Branavan, Harr Chen, Luke S. Zettlemoyer, Regina Barzilay (2009). Reinforcement Learning for Mapping Instructions to Actions. ACL-IJCNLP.
Feb 4 (Rachael Richardson)
Percy Liang, Michael I. Jordan, and Dan Klein (2009). Learning Semantic Correspondences with Less Supervision. ACL-IJCNLP.

Fall 2009

Bayesian methods

Jan 21 (Zhifei Li)
Percy Liang, Slav Petrov, Michael I. Jordan, Dan Klein (2007). The infinite PCFG using hierarchical Dirichlet processes. EMNLP 2007.
Jan 14 (Jason Smith)
Matthew J. Beal, Zoubin Ghahramani, and Carl Edward Rasmussen (2002). The Infinite Hidden Markov Model. NIPS.
Also discussed in section 7 of last week's paper.
Jan 7 (Jason Eisner)
Long lecture on the Dirichlet process (infinite) mixture model.
Reading: Yee Whye Teh, Michael Jordan, Matthew Beal and David Blei (2005), Hierarchical Dirichlet Processes.
There's also a stack of relevant slides from Jordan's 2005 NIPS tutorial.
Dec 3 (Jason Smith)
Sharon Goldwater and Thomas L. Griffiths (2007), A Fully Bayesian Approach to Unsupervised Part-of-Speech Tagging. ACL.
This paper uses a Gibbs sampler. See also the following papers, which compare Gibbs sampling with Variational Bayes and other methods for the same problem:
Nov 26 (Mechanical Turkey)
Mary McGlohon (2007), Fried Chicken Bucket Processes. SIGBOVIK.
Nov 19 (Jason Eisner)
Lecture on Gibbs sampling and variational Bayes for LDA and its finite-state generalizations.
Nov 12 (Jason Eisner)
Yee Whye Teh (2009), Nonparametric Bayesian Models. Video tutorial at Machine Learning Summer School.
Nov 5 (Zhifei Li)
David M. Blei, Andrew Y. Ng, & Michael I. Jordan (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research 3 (2003) 993-1022.

Inference methods

Oct 29 (Markus Dreyer)
Koller & Friedman, Chapter 11: Inference as Optimization
Oct 22 (Puyang Xu)
Koller & Friedman, Chapter 12: Particle-Based Methods
Oct 15 (Ariya Rastrow)
Koller & Friedman, Chapters 3 & 4
Oct 1 (Anoop Deoras), Oct 8 (Carolina Parada)
MacKay (2003), Monte Carlo Methods and Efficient Monte Carlo Methods. Chapters 29-30 of Information Theory, Inference, and Learning Algorithms.

Multilingual/Cross-lingual learning

Sep 24 (Omar F. Zaidan)
David Burkett and Dan Klein (2008). Two Languages are Better than One (for Syntactic Parsing). EMNLP 2008.
Sep 17 (Rachael Richardson)
Alexander Fraser, Renjing Wang, and Hinrich Schütze (2009). Rich Bitext Projection Features for Parse Reranking. EACL 2009.
Sep 10 (Delip Rao)
Benjamin Snyder, Tahira Naseem, Jacob Eisenstein, and Regina Barzilay (2009). Adding More Languages Improves Unsupervised Multilingual Part-of-Speech Tagging: a Bayesian Non-Parametric Approach. NAACL 2009.
Summary: There are several approaches to learning syntax in an unsupervised fashion, but this paper belongs to a growing line of work that exploits multiple languages to reduce ambiguity in the learning task. The most important take-home message is that it is possible to consistently reduce the gap between supervised and unsupervised learning by progressively adding more languages to the mix. This is akin to the multi-view learning results in the machine learning literature. An earlier work by the same authors (EMNLP'08) showed that carefully selecting pairs of languages in multilingual learning can yield better accuracies. The current paper builds on that result and shows that it is not really necessary to hand-pick the bilingual pairs; robust performance is guaranteed by blindly adding more languages.
Well, not so blindly. Adding more languages to the setup means estimating more parameters in the model. Without careful implementation, such a model can become intractable. Section 3 explains the generative setup and the inference procedure in detail. Starting with a Goldwater-style monolingual HMM tagging setup for each language, the HMMs are stitched together using alignment links and latent variables called "superlingual tags", leading to a product-of-experts model. The superlingual tags can be thought of as tags that generate similar kinds of syntactic entities in each of the languages. The inference procedure, as with any non-trivial nonparametric Bayesian setup, involves computing integrals that have no closed-form solution. Monte Carlo sampling is a standard approach to such problems, and Gibbs sampling is one such method. The details of the sampling process are in sections 3.5-3.7. This part is a bit technical and will be discussed tomorrow and/or in the sessions on nonparametric Bayesian methods. Other methods, such as variational methods and expectation propagation, could be used instead.
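For readers who have not seen Gibbs sampling before, here is a minimal sketch of the general idea. It is not the sampler from the paper: it assumes a toy target (a bivariate standard normal with correlation rho) whose conditionals are known in closed form, and the function names are made up for illustration. Each step resamples one variable from its conditional distribution given the current value of the other.

```python
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Toy example: x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    cond_sd = (1.0 - rho * rho) ** 0.5  # standard deviation of each conditional
    samples = []
    for t in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # resample x given the current y
        y = rng.gauss(rho * x, cond_sd)  # resample y given the new x
        if t >= burn_in:                 # discard the early, unmixed draws
            samples.append((x, y))
    return samples


def sample_correlation(samples):
    """Empirical Pearson correlation of a list of (x, y) pairs."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    cov = sum((x - mx) * (y - my) for x, y in samples) / n
    vx = sum((x - mx) ** 2 for x, _ in samples) / n
    vy = sum((y - my) ** 2 for _, y in samples) / n
    return cov / (vx * vy) ** 0.5
```

Running `sample_correlation(gibbs_bivariate_normal(0.8, 20000))` should give a value close to 0.8. In the paper's model the variables being resampled are tags rather than real numbers, but the alternating structure is the same.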

Summer 2009

Summer conference papers

July 23 (Zhifei Li)
Joris Mooij and Bert Kappen (2008). Bounds on marginal probability distributions. NIPS 2008.
July 16 (Markus Dreyer)
Fabien Cromierès, Sadao Kurohashi (2009). An Alignment Algorithm Using Belief Propagation and a Structure-Based Distortion Model. EACL 2009.
June 25 (Markus Dreyer)
Hoifung Poon, Colin Cherry, Kristina Toutanova (2009). Unsupervised Morphological Segmentation with Log-Linear Models. NAACL 2009.
June 19 (Zhifei Li)
David Chiang, Wei Wang and Kevin Knight (2009). 11,001 new features for statistical machine translation. NAACL 2009.

Spring 2009

Information extraction (relevant to TAC)

Apr 30 (Chuan Liu)
Jun Wang (2009). Mean-Variance Analysis: A New Document Ranking Theory in Information Retrieval. European Conference on Information Retrieval.
Apr 23 (Wes Filardo)
Jun Zhu, Zaiqing Nie, Xiaojing Liu, Bo Zhang, Ji-Rong Wen (2009). StatSnowball: a Statistical Approach to Extracting Entity Relationships. WWW 2009.
Apr 16 (Carolina Parada)
Julien Ah-Pine, Guillaume Jacquet (2009). Clique-Based Clustering for improving Named Entity Recognition systems. EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics. Athens, Greece, March 30 - April 3, 2009
Apr 9 (Jason Smith)
Marius Pasca (2009). Outclassing Wikipedia in Open-Domain Information Extraction: Weakly-Supervised Acquisition of Attributes over Conceptual Hierarchies. EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics. Athens, Greece, March 30 - April 3, 2009

Domain adaptation across text genres

Apr 2 (Arnab Ghoshal)
Corinna Cortes, Mehryar Mohri, Michael Riley, Afshin Rostamizadeh. Sample Selection Bias Correction Theory. In Proceedings of The 19th International Conference on Algorithmic Learning Theory (ALT 2008).
Mar 26 (Ariya Rastrow)
Yishay Mansour, Mehryar Mohri, Afshin Rostamizadeh (2008). Domain Adaptation with Multiple Sources. In Proceedings of Advances in Neural Information Processing Systems (NIPS).
Optional Reading: John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jenn Wortman. Learning Bounds for Domain Adaptation. Neural Information Processing Systems - NIPS 2007.
Mar 12 (Delip Rao)
Schweikert G, Widmer C, Scholkopf B, Ratsch G (2008) An empirical analysis of domain adaptation algorithms for genomic sequence analysis. In Proceedings of Advances in Neural Information Processing Systems (NIPS)
Optional Reading: Marx Z, Rosenstein MT, Dietterich TG, Kaelbling LP (2008) Two algorithms for transfer learning. In: Inductive Transfer: 10 years later
Mar 5 (Omar F. Zaidan)
Su-In Lee, Vassil Chatalbashev, David Vickrey, and Daphne Koller (2007). Learning a Meta-Level Prior for Feature Relevance from Multiple Related Tasks. ICML 2007.

Recent good papers

Feb 26 (Zhifei Li)
John DeNero, Alex Bouchard, and Dan Klein (2008). Sampling Alignment Structure under a Bayesian Translation Model. EMNLP 2008.
Feb 19 (Jason Eisner)
Impromptu lecture on Dirichlet distributions, Dirichlet processes, etc.
Feb 12 (Markus Dreyer)
Tom Minka (2005). Divergence measures and message passing. Microsoft Research Technical Report. Slides: pdf, ppt.
Feb 5 (Delip Rao)
David J. Hand (2006). Classifier Technology and The Illusion of Progress. Statistical Science.

Fall 2008

Programming languages for AI

Dec 13-14
NIPS workshop on probabilistic programming (see probabilistic-programming.org), which mentioned a number of other languages and libraries.
Dec 4 (Omar F. Zaidan)
Jeff Bilmes (~2002). The Graphical Models Toolkit (GMTK).
The above link includes a draft of the documentation and a tutorial, as well as the binaries.
Nov 20 (Wren Thornton)
Avi Pfeffer (2006). IBAL Tutorial.
Installed in masters*:~wren/local/bin (linux only, so not masters01 or masters02) and clsp:~wren/local/bin. Add this directory to your PATH.
See also other materials, including this paper: Avi Pfeffer (2007). The design and implementation of IBAL: A general-purpose probabilistic language. In Lise Getoor and Ben Taskar (eds.), Introduction to Statistical Relational Learning.
Nov 13 (Nathaniel Filardo)
Marc Sumner and Pedro Domingos (2007). The Alchemy Tutorial. Slides.
System is installed in masters*:~nwf/public/alchemy. There is a tutorial subdirectory. You should be able to follow along in the tutorial by running commands like
~nwf/public/alchemy/bin/infer \
   -i ~nwf/public/alchemy/tutorial/basics/uniform.mln \
   -e ~nwf/public/alchemy/tutorial/empty.db \
   -r uniform.results \
   -q Heads

Miscellaneous

Oct 30, Nov 6
Discussion of the EMNLP 2008 papers.
Oct 23 (Damianos Karakos)
I. Csiszar and G. Tusnady (1984). Information geometry and alternating minimization procedures. Statistics and Decisions, Suppl. Issue 1, pp. 205-237.
The paper is not online, but there are online course notes from Sanjeev Khudanpur.

Probabilistic relational models

Oct 16 (Nathaniel Filardo)
Pedro Domingos et al. (2008). Markov Logic. In L. De Raedt, P. Frasconi, K. Kersting and S. Muggleton (eds.), Probabilistic Inductive Logic Programming (pp. 92-117). New York: Springer.
Oct 1 (Balakrishnan Varadarajan?)
Nir Friedman, Lise Getoor, Daphne Koller, and Avi Pfeffer (1999). Learning Probabilistic Relational Models. In IJCAI.
A longer book chapter version is linked from here, but the link is dead.
Sep 25 (Zhifei Li)
David Smith and Jason Eisner (2008). Dependency Parsing by Belief Propagation. In EMNLP.

Creative uses of classifiers in NLP

Sep 18 (Markus Dreyer)
D. Rosenberg, D. Klein and B. Taskar (2007). Mixture-of-Parents Maximum Entropy Markov Models. Uncertainty in Artificial Intelligence (UAI), Vancouver, BC, July.
Sep 11 (Nikesh Garera)
Yoav Goldberg and Michael Elhadad (2007). SVM Model Tampering and Anchored Learning: A Case Study in Hebrew NP Chunking. In ACL 2007.
Libin Shen; Aravind K. Joshi (2003) An SVM-based voting algorithm with application to parse reranking. In HLT-NAACL 2003.

Summer 2008

Good current papers

August 19 (Zhifei Li)
Ahmad Emami and Frederick Jelinek (2005). A neural syntactic language model. Machine Learning, volume 60, numbers 1-3, September 2005.
August 5 (Zhifei Li)
Libin Shen, Jinxi Xu and Ralph Weischedel (2008). A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model. In ACL 2008.
July 29 (David Smith)
Ronan Collobert and Jason Weston (2008). A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. ICML 2008: Helsinki, Finland.
July 22 (Nikesh Garera)
Zornitsa Kozareva, Ellen Riloff and Eduard Hovy (2008). Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs. Proc. of ACL-08: HLT, Columbus, OH.
July 15 (Markus Dreyer)
Sittichai Jiampojamarn, Colin Cherry, and Grzegorz Kondrak (2008). Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. Proc. of ACL-08: HLT, Columbus, OH.
July 8 (Delip Rao)
Liang Sun, Shuiwang Ji, and Jieping Ye (2008). A Least Squares formulation for Canonical Correlation Analysis. Proc. of ICML-08, Helsinki
In 1936, Hotelling proposed a method to characterize the relationship between two sets of variables, which became widely known as "Canonical Correlation Analysis" (CCA). It involves solving a generalized eigenvalue problem of the form Ax = \lambda Bx, which in the CCA case can be further reduced to a symmetric eigenvalue problem (via Cholesky decomposition). The statistics literature has a general interest in connecting different statistical models to the least squares problem, not only to exploit the simpler solution methods available for such problems but also to relate the models to other methods. The least squares formulation also allows the models to be extended using the regularization framework. The least squares formulation of CCA ties together an older result showing the equivalence of CCA and Fisher LDA with a recent least squares formulation of multi-class LDA.
CCA has traditionally been applied in the social sciences and more recently in IR. There is literature applying CCA to problems in cross-lingual IR, image retrieval, and lexicon learning. Interestingly, the ACL'08 paper by Haghighi et al. on learning bilingual lexicons using CCA is not the first paper to do that. There is at least one paper, as early as 2004, by Cancedda & friends from XRCE that does something similar and is not cited in the ACL paper.
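To make the eigenvalue view of CCA concrete, here is a rough NumPy sketch of the classical route (not the least squares formulation from the paper). It assumes the usual choice A = Cxy Cyy^{-1} Cyx and B = Cxx, where the C's are sample covariance blocks, and uses a Cholesky factor of B to turn Ax = \lambda Bx into an ordinary symmetric eigenproblem. The function name and the small ridge term `reg` are illustrative assumptions.

```python
import numpy as np


def cca_first_pair(X, Y, reg=1e-8):
    """First canonical correlation and directions for data matrices X, Y (rows = samples)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Covariance blocks; the tiny ridge keeps the Cholesky factorization stable.
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # Generalized problem A w = lambda B w with A = Cxy Cyy^{-1} Cyx, B = Cxx.
    A = Cxy @ np.linalg.solve(Cyy, Cxy.T)
    L = np.linalg.cholesky(Cxx)                      # B = L L^T
    # Symmetric matrix M = L^{-1} A L^{-T}; its eigenvalues are those of the generalized problem.
    M = np.linalg.solve(L, np.linalg.solve(L, A).T)
    M = (M + M.T) / 2.0                              # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(M)             # eigenvalues in ascending order
    rho = float(np.sqrt(max(eigvals[-1], 0.0)))      # largest eigenvalue = rho^2
    w_x = np.linalg.solve(L.T, eigvecs[:, -1])       # map back: w = L^{-T} v
    w_y = np.linalg.solve(Cyy, Cxy.T @ w_x)          # matching direction for Y (up to scale)
    return rho, w_x, w_y
```

The returned `rho` should agree with the empirical correlation between `X @ w_x` and `Y @ w_y`, which is a handy sanity check.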
June 12 (Zhifei Li)
Hao Zhang, Chris Quirk, Robert C. Moore and Daniel Gildea (2008). Bayesian Learning of Non-compositional Phrases with Synchronous Parsing. Proc. of ACL-08: HLT, Columbus, OH.
June 5 (Markus Dreyer)
Kuzman Ganchev, João Graça and Ben Taskar (2008). Better Alignments = Better Translations? Proc. of ACL-08: HLT, Columbus, OH.
May 29 (Nikesh Garera)
Aria Haghighi, Percy Liang, Taylor Berg-Kirkpatrick and Dan Klein (2008). Learning Bilingual Lexicons from Monolingual Corpora. Proc. of ACL-08: HLT, Columbus, OH.

Spring 2008

Dynamic programming speedups

May 15 (David Smith)
Geoffrey Zweig and Mukund Padmanabhan (2000). Exact Alpha-Beta Computation in Logarithmic Space with Application to MAP Word Graph Construction. Proc. of ICSLP, Beijing.
This is a specialization to HMMs of the DBN version given earlier by Binder, Murphy & Russell (1997). See also section 3.7.1 of Kevin Murphy's thesis.
Related work: This kind of trick was really pioneered by D. S. Hirschberg (1975), who cut the space requirements of longest common subsequence from quadratic all the way down to linear. Hirschberg's version can be nicely adapted to edit distance. Now, edit distance (and more generally, multiple sequence alignment) is really just a special case of shortest path in a graph. Hirschberg (1975), above, was generalized by Korf (1999)'s "Divide and Conquer Bidirectional Search," which Korf & Zhang (2000) (who discuss all these algorithms) further improved to "Divide and Conquer Frontier Search." Edelkamp & Meyer (2001) give log-space methods for improving A* search for the shortest path in a graph. (Note that A* search often fits in memory for our DP problems; reducing its memory requirements becomes paramount when we are searching trees that branch without rejoining, e.g., chess.) Bidirectional search, which is distantly related to A*, is also pretty well studied, including recent work at JHU's AMS Dept.
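For reference, the Hirschberg trick mentioned above can be sketched as follows for longest common subsequence on strings. This is a textbook-style illustration written for this page, not code from any of the cited papers, and the function names are made up. It only ever keeps one row of the DP table, and it finds where to split the problem by combining a forward pass on the first half with a backward pass on the second half.

```python
def lcs_lengths(a, b):
    """Last row of the LCS table: entry j is the LCS length of a and b[:j].

    Keeps only two rows at a time, so space is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev


def hirschberg_lcs(a, b):
    """Return one longest common subsequence of strings a and b in linear space."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    # Forward scores for the first half, backward scores for the second half.
    left = lcs_lengths(a[:mid], b)
    right = lcs_lengths(a[mid:][::-1], b[::-1])
    # Best place to cut b so that the two halves can be solved independently.
    k = max(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    return hirschberg_lcs(a[:mid], b[:k]) + hirschberg_lcs(a[mid:], b[k:])
```

For example, `hirschberg_lcs("AGGTAB", "GXTXAYB")` returns a common subsequence of length 4. The same divide-and-conquer structure carries over to edit distance by changing the recurrence in `lcs_lengths`.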
May 1 (John Blatz)
Pedro Felzenszwalb and David McAllester (2006). The Generalized A* Architecture. To appear in the Journal of Artificial Intelligence Research.
Apr. 24 (Zhifei Li)
Liang Huang (2008). Forest Reranking: Discriminative Parsing with Non-Local Features. To appear in Proceedings of ACL 2008, Columbus, OH.
Apr. 17 (Arnab Ghoshal)
Liang Huang and David Chiang (2005). Better k-best parsing. Proceedings International Workshop on Parsing Technologies.

Grammatical inference

Apr. 10 (Wren Thornton)
Carl de Marcken (1996), Linguistic structure as composition and perturbation. ACL.
Also see thesis version.
Apr. 3 (Nathaniel Filardo)
A. Clark (2006). Learning Deterministic Context Free Grammars: The Omphalos Competition.
Mar. 27 (Nikesh Garera)
Stolcke, A. and Omohundro, S. (1993). Hidden Markov model induction by Bayesian model merging. Advances in Neural Information Processing Systems (Morgan Kaufmann, San Mateo, CA), 5, 11-18.

Inference in graphical models

Mar. 20 (Delip Rao)
Jonathan Yedidia, William Freeman, and Yair Weiss (2001). Bethe free energy, Kikuchi approximations and belief propagation algorithms. MERL TR-2001-16.
Mar. 6&13 (Markus Dreyer)
M. J. Wainwright, T. Jaakkola and A. S. Willsky (2005). A new class of upper bounds on the log partition function. IEEE Trans. on Information Theory, 51, 2313--2335.
Feb. 28 (David Smith)
David MacKay (2003). Variational methods. Chapter 33 of Information Theory, Inference, and Learning Algorithms.
Feb. 21 (David Smith)
Michael I. Jordan et al. (1999). An Introduction to Variational Methods for Graphical Models. Machine Learning, 37, 183–233.
Feb. 7&14 (Delip Rao)
M. I. Jordan and Y. Weiss (2002). Probabilistic Inference in Graphical Models, The Handbook of Brain Theory and Neural Networks (MIT Press).

Fall 2007

Semisupervised learning

Dec. 12 (Delip Rao)
M. Belkin, P. Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, TechReport, UChicago, TR-2002-01
Mikhail Belkin, Partha Niyogi, Vikas Sindhwani, On Manifold Regularization, AISTATS 2005
Nov. 17 (David Smith)
X. Zhu, Semi-Supervised Learning Literature Survey

Recent parsing papers

Nov. 3 (Christo Kirov)
I. Titov, J. Henderson, Constituent Parsing with Incremental Sigmoid Belief Networks, ACL 2007
Oct. 26 (Christo Kirov)
Seginer, Yoav, Fast Unsupervised Incremental Parsing (syntax induction), ACL 2007
Oct. 17 (Markus Dreyer)
Nakagawa, Tetsuji, Multilingual Dependency Parsing Using Global Features, EMNLP-CoNLL 2007

Text compression

Oct. 10 (Nathaniel W Filardo)
Mahoney, Matthew, Adaptive Weighting of Context Models for Lossless Data Compression, Florida Institute of Technology, CS Department, Technical report CS-2005-16

Some other possible papers that we didn't read (not vetted):

Domain adaptation

Oct. 3 (David Smith)
Shai Ben-David, John Blitzer, Koby Crammer, Fernando Pereira, Analysis of Representations for Domain Adaptation
Sep. 26 (Omar F Zaidan)
J. Blitzer, R. McDonald, F. Pereira, Domain Adaptation with Structural Correspondence Learning, EMNLP 2006

Summer 2007

Good current papers

Aug. 30 (Delip Rao)
Gideon S. Mann, Simple, Robust, Scalable Semi-supervised Learning via Expectation Regularization, Proceedings of the 24th International Conference on Machine Learning 2007
Aug. 18 (Markus Dreyer)
D. Talbot, M. Osborne, Randomised Language Modelling for Statistical Machine Translation, ACL 2007
They use a space-efficient randomized data structure (Bloom Filter) to store very large n-gram models. There is a companion paper that people might want to have a quick look at as well, for comparison:
D. Talbot, M. Osborne, Smoothed Bloom Filter Language Models: Tera-Scale LMs on the Cheap, ACL 2007
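As a quick reminder of how a Bloom filter works, here is a minimal membership-only sketch. It is not the data structure from either paper (which also has to deal with n-gram counts); the class name, the sizes, and the use of salted SHA-256 as the hash family are all assumptions made for illustration. The key property is that lookups can give false positives but never false negatives.

```python
import hashlib


class BloomFilter:
    """Set-membership filter over strings using a fixed-size bit array."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one strong hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True if every bit for this item is set; may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

A typical use would be `bf = BloomFilter(10000, 3)`, then `bf.add("the cat sat")` for each n-gram, and `"the cat sat" in bf` at query time. The space saving comes from never storing the n-grams themselves.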
Aug. 11 (Nikesh Garera)
L. Shen, G. Satta, A. Joshi, Guided learning for bidirectional sequence classification, ACL 2007
Aug. 3 (Yi Su)
M. Galley, K. McKeown, Lexicalized Markov Grammars for Sentence Compression, NAACL-HLT 2007
Jul. 18 (David Smith)
P. Liang, S. Petrov, M. Jordan, D. Klein, The Infinite PCFG Using Hierarchical Dirichlet Processes, EMNLP-CoNLL 2007
Jul. 6 (Christopher White)
A. Braunstein, M. Mezard, R. Zecchina, Survey propagation: an algorithm for satisfiability, Random Structures and Algorithms, 2005.
We sent some questions to Zecchina.
Lukas Kroc, Ashish Sabharwal and Bart Selman. Survey propagation revisited: An empirical study. 23rd UAI, 2007.
Jun. 21 (Christopher White)
K. Murphy, Y. Weiss, M. Jordan, Loopy belief propagation for approximate inference: An empirical study, 15th UAI, pages 467-475, 1999
... discussing (loopy) belief propagation as background for survey propagation, a topic which has been getting more attention lately for its ability to "solve very large hard combinatorial problems, such as determining the satisfiability of Boolean formulas." Chapter 8 of Chris Bishop's textbook is supposed to be a good treatment of graphical models overall. He covers BP in section 8.4.4 after first presenting factor graphs in 8.4.3. David MacKay's treatment of BP, also in terms of factor graphs, is in chapter 26 of his book [10]. It's worth reading this chapter in full, perhaps first reading chapter 16. ... the update equations are given as (26.11) and (26.12) ... [substantial further discussion by Jason was here] Some people may prefer Bishop's style, others MacKay's.
Jun. 14 (David Smith)
X. Zhu, Z. Ghahramani, J. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, ICML 2003
Jun. 6 (Nikesh Garera)
A. Alexandrescu, K. Kirchhoff, Data-Driven Graph Construction for Semi-Supervised Graph-Based Learning in NLP, HLT/NAACL 2007
Jun. 2 (Erin Fitzgerald)
J. Jiang, C. Zhai, A Systematic Exploration of the Feature Space for Relation Extraction, HLT/NAACL 2007
May 17 (Markus Dreyer)
M. Galley, K. McKeown, Lexicalized Markov Grammars for Sentence Compression, HLT/NAACL 2007
May 10 (David Smith)
M. Johnson, T. Griffiths, and S. Goldwater, Bayesian Inference for PCFGs via Markov Chain Monte Carlo, HLT/NAACL 2007

Spring 2007

Integrating search and learning

Apr. 19 (John Blatz)
A. Prieditis, Machine Discovery of Effective Admissible Heuristics, Machine Learning Journal, 1993
Apr. 12 (Markus Dreyer)
A. Haghighi, J. DeNero and D. Klein, Approximate Factoring for A* Search, NAACL-HLT 2007
Mar. 29 & Apr. 5 (Zhifei Li)
H. Daume III, J. Langford, and D. Marcu, Search-based structured prediction, Machine Learning Journal, forthcoming
Mar. 8 (David Smith)
H. Daume III & D. Marcu, Learning as search optimization: approximate large margin methods for structured prediction, ICML 2005

Recent IR/QA papers (with an NLP or multilingual focus)

Mar. 1 (Wei Chen)
M. Kaisser, S. Scheible, and B. Webber, Experiments at the University of Edinburgh for the TREC 2006 QA track, TREC-15
They do some fairly deep interpretation of sentences, extracting their predicate-argument structure.
Feb. 22 (Eric Harley)
K. Kan Lo & W. Lam, Using Semantic Relations with World Knowledge for Question Answering, TREC-15

Unsupervised learning of morphology

Feb. 15 (Nikhil Bojja)
C. Monson et al., Unsupervised Induction of Natural Language Morphology Inflection Classes, ACL Student Workshop '04
Feb. 8 (Delip Rao)
P. Schone and D. Jurafsky, Knowledge-free induction of morphology using latent semantic analysis, CoNLL 2000
However, there was an extension of this work reported in NAACL-2001 that looks at circumfixes and prefix/affix combinations. [11]
Feb. 1 (Nikesh Garera)
D. Yarowsky and R. Wicentowski, Minimally supervised morphological analysis by multimodal alignment, ACL 2000
For more details refer to Chapter 4 of Wicentowski's thesis.

Fall 2006

Syntax-based MT

Dec. 13 (Delip Rao)
J. Carbonell et al., Context-based machine translation, AMTA 2006
Dec. 6 (Jason Smith)
M. Galley et al., Scalable Inference and Training of Context-Rich Syntactic Translation Models, ACL 2006
It may also be helpful to look at:
M. Galley et al., What's in a translation rule?, HLT/NAACL 2004
Nov. 29 (Balakrishnan V)
D. Marcu et al., SPMT: Statistical Machine Translation with Syntactified Target Language Phrases, EMNLP 2006
Nov. 15 (Eric Harley)
D. Chiang, An introduction to synchronous grammars, ACL 2006 Tutorial
Slides from the talk are also available. [12]

Linguistics: Syntactic formalisms

Nov. 8 (Elliott Drabek)
K. Shklovsky, A Grammatical Sketch of Petalcingo Tzeltal, Undergraduate Thesis, Reed College, 2005
It is 77 pages long, but not dense, and I will be skipping the following sections: Pages
  • 01-14 Phonetics and phonology
  • 18-18 Polyvalence
  • 21-21 Inherent possession and ...
  • 46-55 Tense and aspect and other sections
Nov. 1 (Yi Su)
M. Steedman, Gapping as Constituent Coordination, Linguistics and Philosophy, Vol. 13, 1990, pp.207-264.
See Yi for photocopies.
Oct. 25 (Markus Dreyer)
S. Riezler et al., Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques, ACL 2002
Oct. 18 (Erin Fitzgerald)
J. Bresnan & R.M. Kaplan, Lexical-Functional Grammar: A Formal System for Grammatical Representation, The Mental Representation of Grammatical Relations, MIT Press, 1982
The edited collection that this appears in is generally interesting. Bresnan defends and develops lexicalized grammars in general; the idea of separate surface and semantic roles; and Bresnan & Kaplan's LFG in particular. You should know that she originated (in 1978) the extremely influential idea of lexicalized syntax -- the idea that a grammar is simply a collection of lexical entries to be assembled in standard language-independent ways, but that there are also "lexical redundancy rules" that relate, e.g., active and passive entries for the same verb. Some chapters address morphological and cognitive issues pertaining to lexicalization, including an essay by Pinker on lexicalist learning. Slides from Erin's presentation can be found here.

Machine learning: Margin methods and structured classification

Oct. 11 (John Blatz)
L. Xu, D. Wilkinson, F. Southey, & D. Schuurmans, Discriminative Unsupervised Learning of Structured Predictors, ICML 2006
Oct. 4 (Nikesh Garera)
A. Culotta & J. Sorensen, Dependency Tree Kernels for Relation Extraction, ACL 2004
D. Zelenko, C. Aone, & A. Richardella, Kernel Methods for Relation Extraction, JMLR, Volume 3, 2003
Sep. 27 (David Smith)
C. Cortes, P. Haffner, & M. Mohri, Rational Kernels, NIPS 2003
Papers extending rational kernels, including results on positive semidefinite cases, are at: [13]. For the record, and not to be read, there is an interesting parallel line of research on Fisher kernels over strings, e.g. this paper by Saunders, Shawe-Taylor and Vinokourov: [14]
Sep. 20 (Elliott Drabek)
K.Q. Weinberger, F. Sha, & L.K. Saul, Learning a kernel matrix for nonlinear dimensionality reduction, ICML 2004
S.T. Roweis & L.K. Saul, Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science, 22 December 2000
J.B. Tenenbaum, V. De Silva, & J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science, 22 December 2000
Sep. 13 (Roy Tromble)
L. Xu, J. Neufeld, B. Larson, & D. Schuurmans, Maximum Margin Clustering, NIPS 2004

Summer 2006

Recent HLT-NAACL papers

Aug. 4 (David Smith)
Sharon Goldwater, Thomas L. Griffiths, Mark Johnson, Contextual Dependencies in Unsupervised Word Segmentation, ACL 2006
Anyone looking for a more straight-up language modeling discussion can compare:
More resources:
Jul. 20 (Roy Tromble)
Mehryar Mohri, Brian Roark, Probabilistic Context-Free Grammar Induction Based on Structural Zeros, HLT-NAACL, 2006
Jul. 6 (Keith Hall)
Charles Sutton, Michael Sindelar, Andrew McCallum, Reducing Weight Undertraining in Structured Discriminative Learning, HLT-NAACL, 2006
Jun. 31 (Markus Dreyer)
Joakim Nivre, Johan Hall et al, Labeled Pseudo-Projective Dependency Parsing with Support Vector Machines, CoNLL 2006
J. Nivre, J. Nilsson, Pseudo-Projective Dependency Parsing, ACL 2005
Jun. 24 (David Smith)
Percy Liang, Ben Taskar, Dan Klein, Alignment by Agreement, HLT-NAACL 2006

Spring 2006

Algorithms for NLP (mostly)

May 18 (Markus Dreyer)
Jonathan May, Kevin Knight, A Better N-Best List: Practical Determinization of Weighted Finite Tree Automata, Proc. NAACL-HLT, 2006
May 11 (John Blatz)
M. Gengler, An introduction to parallel dynamic programming, Lecture Notes in Computer Science, 1996
May 4 (David Smith)
C. E. R. Alves, E. N. Cáceres, F. Dehne, Parallel dynamic programming for solving the string editing problem on a CGM/BSP, SPAA 2002
Apr. 20 (Balakrishnan V)
Richard M. Karp, Michael O. Rabin, Efficient randomized pattern-matching algorithms, IBM Journal of Research and Development, 1987
Mar. 31, Apr. 6 (Eric Harley)
Ben Taskar, Simon Lacoste-Julien, Dan Klein, A Discriminative Matching Approach to Word Alignment, ACL 2005
A related paper is
Ryan McDonald, Fernando Pereira, Kiril Ribarov, Jan Hajic, Non-projective Dependency Parsing using Spanning Tree Algorithms, HLT-EMNLP 2005
Mar. 17 (Elliott Franco Drabek)
Necip Fazil Ayan, Bonnie J. Dorr, Christof Monz, Alignment Link Projection Using Transformation-Based Learning, HLT-EMNLP 2005
Mar. 10 (Roy Tromble)
Terry Koo, Michael Collins, Hidden-Variable Models for Discriminative Reranking, HLT-EMNLP 2005
Mar. 3 (Jason Riesa)
Hal Daumé III, Daniel Marcu, Domain Adaptation for Statistical Classifiers, Journal of Artificial Intelligence Research, 2006
J. Gorman, J. Curran, Approximate Searching for Distributional Similarity, Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition, 2005
Feb. 23 (Omar F. Zaidan)
Ravichandran, Pantel, Hovy, Randomized Algorithms and NLP: Using Locality Sensitive Hash Functions for High Speed Noun Clustering, ACL 2005

Consensus decoding

Feb. 16 (Noah A Smith)
Khalil Sima'an, Computational Complexity of Probabilistic Disambiguation by means of Tree-Grammars, COLING 1996
Francisco Casacuberta, Colin de la Higuera, Computational complexity of problems on probabilistic grammars and transducers, LNAI 1981
For a longer and more HMM/compbio view and extended results, see
Rune B. Lyngsø, Christian N. S. Pedersen, The Consensus String Problem and the Complexity of Comparing Hidden Markov Models, Journal of Computer and System Sciences 65:545-69, 2002

Extracting idioms

Feb. 9 (John Blatz)
Dominic Widdows, Beate Dorow, Automatic Extraction of Idioms using Graph Analysis and Asymmetric Lexicosyntactic Patterns, Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition, 2005
Afsaneh Fazly, Suzanne Stevenson, Automatic Acquisition of Knowledge about Multiword Predicates, Proceedings of the 19th Pacific Asia Conference on Language, Information, and Computation (PACLIC 2005).

Fall 2005

Good recent papers

Nov. 23 (Roy Tromble)
Sutton, Charles and McCallum, Andrew, Composition of Conditional Random Fields for Transfer Learning, HLT-EMNLP 2005
Nov. 16 (Safiullah Shareef)
Hassan Sawaf, Jörg Zaplo, Hermann Ney, Statistical Classification Methods for Arabic News Articles
Nov. 4 (Jason Riesa)
Luke S. Zettlemoyer, Michael Collins, Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars, UAI 2005
Oct. 27 (Markus Dreyer)
D. Roth and W. Yih, Integer Linear Programming Inference for Conditional Random Fields, ICML 2005
Oct. 20 (Roy Tromble)
Sheila M. Reynolds, Jeff A. Bilmes, Part-of-Speech Tagging using Virtual Evidence and Negative Training, HLT-EMNLP 2005

Statistical learning theory

Sep. 21 (Arnab Ghoshal)
M. Jordan, Statistical Learning Theory, Chapters 2-3
Sep. 14 (Nikesh Garera)
M. Jordan, Statistical Learning Theory, Chapter 8 (Exponential family and Generalized linear models)

Summer 2005

Gibbs sampling

Sep. 1 (John, Markus, & Nikesh)
B. Walsh, Markov Chain Monte Carlo and Gibbs Sampling, Lecture Notes for EEB 581, version 26 April 2004
Aug. 26 (Roy Tromble)
Jenny Rose Finkel, Trond Grenager, Christopher Manning, Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling, ACL 2005

AI

Aug. 19 (John Blatz)
Niyogi, Sourabh, Steps Toward Deep Lexical Acquisition, ACL 2005

Unsupervised or semi-supervised EM

Aug. 5 (Adam)
Duh, Kevin and Kirchhoff, Katrin, Tagging of Dialectal Arabic: A Minimally Supervised Approach, ACL 2005
Jul. 28 (Zak)
Takuya Matsuzaki, Yusuke Miyao, Jun'ichi Tsujii, Probabilistic CFG with Latent Annotations, ACL 2005
Jul. 21 (Keith)
Sharon Goldwater and Mark Johnson, Representational Bias in Unsupervised Learning of Syllable Structure, ACL 2005
Jul. 21 (Damianos)
Ando, Rie and Zhang, Tong, A High-Performance Semi-Supervised Learning Method for Text Chunking, ACL 2005

Learning optimality-theoretic grammars

Jul. 14 (John Blatz)
Ying Lin, Learning Stochastic OT Grammars: A Bayesian Approach using Data Augmentation and Gibbs Sampling, ACL 2005
Jul. 14 (Roy Tromble)
Sharon Goldwater and Mark Johnson, Learning OT Constraint Rankings Using a Maximum Entropy Model, Proceedings of the Workshop on Variation within Optimality Theory, 2003

Spring 2005

May 7 (Markus Dreyer)
M. Diligenti, F.M. Coetzee, S. Lawrence, C.L. Giles, M. Gori, Focused Crawling Using Context Graphs, 26th International Conference on Very Large Databases, VLDB 2000
Adam Kilgarriff, Gregory Grefenstette, Introduction to the Special Issue on the Web as Corpus, Computational Linguistics, 2003
Apr. 28 (Damianos Karakos)
Alessandro Moschitti and Roberto Basili, Complex Linguistic Features for Text Classification: A comprehensive study, Proceedings of the 26th European Conference on Information Retrieval Research (ECIR 2004)
Apr. 21 (Omar F. Zaidan)
Tin Kam Ho, Jonathan J. Hull, Sargur N. Srihari, Decision Combination in Multiple Classifier Systems, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 1, Jan. 1994
Dan Klein, Kristina Toutanova, H. Tolga Ilhan, Sepandar D. Kamvar and Christopher D. Manning, Combining Heterogeneous Classifiers for Word-Sense Disambiguation, ACL 2002
Apr. 16 (Brock Pytlik)
V. Lavrenko, S. L. Feng, R. Manmatha, Statistical models for automatic video annotation and retrieval, ICASSP 2004
S. L. Feng, R. Manmatha, V. Lavrenko, Multiple Bernoulli Relevance Models for Image and Video Annotation
The first is a short paper about the relevance model. The second is a follow-up paper that details a subsequent model based on the CRM.
Apr. 9 (Noah A Smith)
G. Elidan, N. Friedman, The Information Bottleneck EM Algorithm, UAI 2003
G. Elidan, N. Friedman, Learning Hidden Variable Networks, JMLR 2005
Feb. 25, Mar. 4, Mar. 11, Apr. 2 (David Smith)
M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, Learning in Graphical Models, MIT Press, 1999

Fall 2004

Nov. 27 (Jia Cui)
David M. Blei, Andrew Y. Ng, Michael I. Jordan, Latent Dirichlet Allocation, JMLR 2003
Other papers on LDA: [www.cs.toronto.edu/~ywteh/research/npbayes/report.pdf], [15]
Nov. 20 (David Smith)
Olle Häggström and Karin Nelander, On Exact Simulation of Markov Random Fields Using Coupling from the Past, Scandinavian Journal of Statistics, 1999
James Fill and Mark Huber, The Randomness Recycler: A New Technique for Perfect Sampling, FOCS 2000
Nov. 13 (Charles Schafer)
Endika Bengoetxea, Inexact Graph Matching Using Estimation of Distribution Algorithms, Chapter 2: The graph matching problem, Ph.D. dissertation, 2002
This chapter is general to the field although pretty sweeping and unspecific as a result. It probably makes a good introduction, since it gives an idea of the scope and diversity of the problem and proposed techniques ...
Yakov Keselman, Ali Shokoufandeh, M. Fatih Demirci, Sven Dickinson, Many-to-Many Graph Matching via Metric Embedding, Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE
This is a state-of-the-art paper, quite dense but quite interesting. It solves a very general formulation of inexact graph matching by first embedding graphs into a normed space ...
Nov. 5 (Michelle Vanni)
Robert S. Swier and Suzanne Stevenson, Unsupervised Semantic Role Labelling, EMNLP 2004
Nianwen Xue and Martha Palmer, Calibrating Features for Semantic Role Labelling, EMNLP 2004
Oct. 29 (Eric Goldlust)
Stephen Clark and James Curran, Parsing the WSJ using CCG and Log-Linear Models, ACL 2004
Oct. 22 (Michelle Vanni)
Dekang Lin and Franz Och, Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence, ACL 2004
Babych and Hartley, Extending the BLEU MT Evaluation Method with Frequency Weightings, ACL 2004
Oct. 15 (John Blatz)
Daichi Mochihashi, Genichiro Kikui, Kenji Kita, Learning Nonstructural Distance Metric by Minimum Cluster Distortions, EMNLP 2004
Oct. 2 (Nguyen Bach)
Background knowledge on SVM and Graphical Models:
Sep. 24, Oct. 7 (Roy Tromble)
B. Taskar, C. Guestrin and D. Koller, Max-Margin Markov Networks, Neural Information Processing Systems Conference (NIPS03), 2003
B. Taskar, D. Klein, M. Collins, D. Koller and C. Manning, Max-Margin Parsing, EMNLP 2004
Sep. 9 (John Blatz)
Pascale Fung and Percy Cheung, Mining Very-Non-Parallel Corpora: Parallel Sentence and Lexicon Extraction via Bootstrapping and EM, ACL 2004
Dragos Stefan Munteanu, Alexander Fraser and Daniel Marcu, Improved Machine Translation Performance via Parallel Sentence Extraction from Comparable Corpora, ACL 2004
Sep. 2 (Gideon Mann)
Xin Li, Paul Morie, and Dan Roth, Robust Reading: Identification and Tracing of Ambiguous Names, ACL 2004
Cheng Niu, Wei Li, Rohini K. Srihari, Weakly Supervised Learning for Cross-Document Person-Name Disambiguation Supported by Information Extraction, ACL 2004
Aug. 27 (David Smith)
I. Dan Melamed, Statistical Machine Translation by Parsing, ACL 2004
Daniel Gildea, Dependencies vs. Constituents for Tree-Based Alignment, ACL 2004
Aug. 20 (Damianos Karakos, Charles Schafer)
P. Pantel and D. Lin, Discovering word senses from text, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, 2002
Diana McCarthy, Rob Koeling, Julie Weeds, John Carroll, Finding Predominant Word Senses in Untagged Text, 2004

Spring 2004

Information extraction

May 15 (Roy Tromble)
Fuchun Peng, Andrew McCallum, Accurate Information Extraction from Research Papers using Conditional Random Fields, 2004
May 1 (Izhak Shafran)
Eric J. Friedman, Strong Monotonicity in Surplus Sharing, 1999
Tom Dietterich has a web page on probabilistic relational models: [16]
Apr. 24 (David Smith)
McCallum and Jensen, Extraction and Data Mining using Conditional-Probability Relational Models, IJCAI'03 Workshop on Learning Statistical Models from Relational Data, 2003
The paper is a survey of recent trends in IE and data mining (biased of course towards the authors' work) and a proposal to unify them with conditional random fields.

Combinatorial optimization

Apr. 17 (Elliott Franco Drabek)
Rina Dechter, Mini-Buckets: A General Scheme for Generating Approximations in Automated Reasoning, 2001
Apr. 10 (Noah Ashton Smith)
Denys Duchier, Axiomatizing Dependency Parsing Using Set Constraints, Sixth Meeting on Mathematics of Language, 2000
Apr. 3 (Roy Tromble)
Roman Bartak, Constraint Programming: In Pursuit of the Holy Grail, 1999

Learning how to search

Mar. 25 (Eric Goldlust)
Boyan and Moore, Learning Evaluation Functions to Improve Optimization by Local Search, Journal of Machine Learning Research, 2000

Discourse, summarization, paraphrase

Mar. 18 (Markus Dreyer)
Eugene Charniak, Niyu Ge, John Hale, A Statistical Approach to Anaphora Resolution, Proceedings of the Sixth Workshop on Very Large Corpora, 1998
Mar. 5 (Charles Schafer)
Daniel Marcu, Theory and Practice of Discourse Parsing and Summarization, Chapters 2 & 3, The MIT Press, 2000
Feb. 19 (David Smith)
Barzilay and Lee, Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment, HLT 2003

Optimality theory

Feb. 12 (Brock Pytlik)
Bob Frank, Giorgio Satta, Optimality theory and the Generative Complexity of Constraint Violability, MIT Press
Feb. 5 (Brock Pytlik)
Jessica A. Barlow and Judith A. Gierut, Optimality theory in phonological acquisition, Journal of Speech, Language, and Hearing Research 42, 1999
Paul Boersma, Joost Dekkers and Jeroen van de Weijer, Introduction. In Optimality Theory: Phonology, Syntax and Acquisition, Oxford University Press, 2000

Fall 2003

Dec. 12 (Paola Virga)
Kamal Nigam and Rayid Ghani, Analyzing the Effectiveness and Applicability of Co-training, Ninth International Conference on Information and Knowledge Management 2000
Nov. 20 (Noah A. Smith)
Rebecca Hwa, Miles Osborne, Anoop Sarkar, Mark Steedman, Corrected Co-training for Statistical Parsers, ICML 2003
Nov. 13 (Markus Dreyer)
Goldman and Zhou, Enhancing Supervised Learning with Unlabeled Data, ICML 2000
An additional paper with some experiments:
Clark, Curran and Osborne, Bootstrapping POS taggers using Unlabelled Data, CoNLL 2003
Nov. 6 (Brock Pytlik)
Stuart M. Shieber, Transducers as a Substrate for Natural Language Processing
Oct. 31 (Roy Tromble)
Dekai Wu, An algorithm for simultaneously bracketing parallel texts by aligning words, ACL 1995
Oct. 24 (Markus Dreyer)
Stuart M. Shieber, Yves Schabes, Synchronous Tree-Adjoining Grammars, Coling 1990
An additional closely related paper: Stuart M. Shieber, Yves Schabes, Generation and Synchronous Tree-Adjoining Grammars, Fifth International Workshop on Natural Language Generation
Oct. 10 (David Smith)
Bernard Comrie, Language Universals and Linguistic Typology: Syntax and Morphology, Chapters 6-7, Blackwell (1989)
Oct. 3 (Michelle Vanni)
Bernard Comrie, Language Universals and Linguistic Typology: Syntax and Morphology, Chapters 4-6, Blackwell (1989)
Sep. 18 (David Smith)
Bernard Comrie, Language Universals and Linguistic Typology: Syntax and Morphology, Chapters 2-3, Blackwell (1989)
Sep. 11 (Elliott Franco Drabek)
Bernard Comrie, Language Universals and Linguistic Typology: Syntax and Morphology, Chapter 1, Blackwell (1989)

Spring 2003

May 15 (Chal Haithaidharm)
V. N. Vapnik, The Nature of Statistical Learning Theory, Chapters 7B, 8, 9
May 8 (Noah Smith)
V. N. Vapnik, The Nature of Statistical Learning Theory, Chapters 6B - 7A
May 1 (Noah Smith)
V. N. Vapnik, The Nature of Statistical Learning Theory, Chapters 5B - 6A
Apr. 24 (Paola Virga)
V. N. Vapnik, The Nature of Statistical Learning Theory, Chapters 4B - 5A
Apr. 17 (Roy Tromble)
V. N. Vapnik, The Nature of Statistical Learning Theory, Chapters 2B - 4A
Apr. 10
V. N. Vapnik, The Nature of Statistical Learning Theory, Intro and Chapters 1, 2A
Mar. 20 (Roy Tromble)
Nikita Schmid, Ahmed Patel, Using Tree Automata and Regular Expressions to Manipulate Hierarchically Structured Data
Mar. 6 (Paola Virga)
Carl M. Kadie, Christopher Meek, David Heckerman, A Collaborative Filtering System Using Posteriors Over Weights of Evidence, Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 2002.
Feb. 26 (Elliott Drabek)
Steven Abney, Bootstrapping, ACL'02
Feb. 19 (Elliott Drabek)
A. Lopez, M. Nossal, R. Hwa, P. Resnik, Word-level Alignment for Multilingual Resource Acquisition, Proceedings of the 2002 LREC Workshop on Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Language Data
Feb. 13 (David Smith)
K. Church, Empirical Estimates of Adaptation: The chance of Two Noriega's is closer to p/2 than p², COLING 2000, pp. 173-179

Fall 2002

Jul. 31 (Paola Virga)
Kenji Yamada, Kevin Knight, A decoder for Syntax-based Statistical MT, ACL 2002
Jul. 24 (Michelle Vanni)
Paola Merlo, A Multilingual Paradigm for Automatic Verb Classification, ACL 2002
Dec. 5 (Silviu Cucerzan)
Darren Pearce, A Comparative Evaluation of Collocation Extraction Techniques, LREC 2002
D. Lin, Automatic identification of non-compositional phrases, ACL 1999
Nov. 21 (Silviu Cucerzan)
Ueda, Nakano, Ghahramani, Hinton, SMEM Algorithm for Mixture Models, Neural Information Processing Systems 1998
Nov. 14 (Michelle Vanni)
Marti Hearst, Untangling Text Data Mining, ACL 1999
Nov. 7 (Neda Khalili)
Yamamoto, Church, Using Suffix Arrays to Compute Term Frequency and Document Frequency for All Substrings in a Corpus, Computational Linguistics 2001
A related paper: Kageura, Bigram Statistics Revisited: A Comparative Examination of Some Statistical Measures in Morphological Analysis of Japanese Kanji Sequences
Nov. 1 (Chalaporn Hathaidharm)
J. Gao, J. Goodman, M. Li, K. Lee, Toward A Unified Approach To Statistical Language Modeling For Chinese, ACM Transactions on Asian Language Information Processing, Vol. 1, No. 1, pp 3-33. 2002.
Oct. 24 (Roy Tromble)
Han, Benjamin, Building a Bilingual Dictionary with Scarce Resources: A Genetic Algorithm Approach
Oct. 17 (David Smith)
Cotton, Bird, An Integrated Framework for Treebanks and Multilayer Annotations, LREC 2002
Oct. 8 (Elliott Franco Drabek)
Ravichandran, Hovy, Learning Surface Text Patterns for a Question Answering System, ACL 2001
A similar paper: Lin, Pantel, Discovery of Inference Rules for Question Answering, KDD 2001
Oct. 2 (Gideon Mann)
Gildea, Jurafsky, Automatic Labeling of Semantic Roles, ACL 2001
Sep. 26 (Paul Ruhlen)
Hwa, Resnik, Weinberg, Kolak, Evaluating Translational Correspondence using Annotation Projection, ACL 2002
Sep. 19 (Paola Virga)
Yamada, Knight, A decoder for Syntax-based Statistical MT, ACL 2002
Sep. 10 (Noah A. Smith)
Collins, Duffy, New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron, ACL 2002

Spring 2002

Apr. 25 (Paul Ruhlen)
H. Al-Adhaileh, Kong, Melamed, Malay-English Bitext Mapping and Alignment Using SIMR/GSA Algorithms, Malaysian National Conference on Research and Development on Linguistics 2001
Apr. 18 (Paul Ruhlen)
N. A. Rao, K. Rose, Deterministically annealed design of hidden Markov model speech recognizers, IEEE Trans. on Speech and Audio Processing, vol. 9, (no. 2), Feb. 2001
Apr. 11 (Paola Virga)
Neal, Hinton, A view of the EM algorithm that justifies incremental, sparse, and other variants, Learning in Graphical Models, 1999
And this article builds on the above. It tests an incremental version of EM (carefully choosing how incremental it will be), as well as a "lazy EM" version that visits "significant" cases more often.
Mar. 28 (Swapna Somasundaran)
Crestan, El-Beze, Improving supervised WSD by including rough semantic features in a Multilevel view of the Context, SEMPRO Workshop, Edinburgh, 2001.
Mar. 14 (Noah A. Smith)
Ratnaparkhi, A Simple Introduction to Maximum Entropy Models for NLP, Institute for Research in Cognitive Science, Univ. of Penn.
Feb. 28 (Silviu Cucerzan)
Marcu, Towards a Unified Approach to Memory- and Statistical-Based Machine Translation, Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, 2001
Feb. 21 (Jia Cui)
Barzilay, McKeown, Extracting Paraphrases from a Parallel Corpus, Computer Science Department, Columbia Univ.
Feb. 14 (Charles Schafer)
Al-Onaizan, Germann, et al., Translating with Scarce Resources, American Association for Artificial Intelligence 2000
Feb. 7 (Paola Virga)
Knight, Graehl, Machine Transliteration, ACL-EACL 1997

Fall 2001

Dec. 14 (Jia Cui)
Jerome Bellegarda, Exploiting latent semantic information in statistical language models, Proceedings of the IEEE, 88:8, Aug. 2000
Nov. 29 (Silviu Cucerzan)
Mike Collins, Yoram Singer, Unsupervised Models for Named Entity Classification, EMNLP/VLC'99
Nov. 20 (Radu Florian)
Blum, Mitchell, Combining Labeled and Unlabeled Data with Co-Training, COLT 1998
Nov. 16 (Richard Wicentowski)
Eisner, Satta, Efficient parsing for bilexical context-free grammars and head automaton grammars, ACL 1999
Plagiarism detection systems might be relevant to bitext alignment. A message to the Corpora list yesterday announced the following review paper: [17]
Nov. 2 (Paul Ruhlen)
Manning, Schuetze, Foundations of Statistical Natural Language Processing, Section 14 (clustering), pp. 495-527, MIT Press
Oct. 26 (Gideon Mann)
Tishby, Pereira, Bialek, The information bottleneck method
The paper describes a clustering method which is a generalization of their earlier work on "Distributional Clustering of English Words" (Pereira, Tishby and Lee '93).