CS 260: Seminar in Text Mining

Fall 2017

Instructor: Vagelis Hristidis (aka Evangelos Christidis)

Seminar time: MWF 3:10-4:00 pm

Location: Watkins Hall | Room 1117

Main Topics:

Information extraction
Text classification
Product reviews analysis
Chatbots

Grading

Presentations: 80%
Participation: 20%

Presentations Schedule

https://docs.google.com/spreadsheets/d/1Hzpz94D9XxmDnBxfx4ePXNpgGLxb3lCPvhI_aGC9AxQ/edit?usp=sharing

Several papers are chapters from:

C. C. Aggarwal and C. X. Zhai. Mining Text Data. Kluwer Academic Publishers, 2012 (free to download from inside the UCR network)

Date	Presenter	Paper	Topic
9/29	Vagelis	Intro and presentation assignments; Intro on text mining (clustering, classification, information extraction), reviews analysis (extraction, sentiment), chatbots
10/2		cancelled
10/4		1. Chapter 2 (Information Extraction from Text, Jing Jiang, 11-35)	Information Extraction, Summarization
10/6
10/9		2. Hu, M., & Liu, B. (2004, August). Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 168-177). ACM	Reviews Analysis
10/11		3. Popescu, A. M., & Etzioni, O. (2007). Extracting product features and opinions from reviews. In Natural Language Processing and Text Mining. Springer
10/13
10/16		4. Chapter 6 (A Survey of Text Classification Algorithms, Charu C. Aggarwal and ChengXiang Zhai, 163-213, double)	Classification
10/18		5. cont'd
10/20
10/23		6. Tutorial on how to use WEKA for text classification (use resources from https://www.youtube.com/watch?v=IY29uC4uem8, https://weka.wikispaces.com/Text+categorization+with+WEKA, and so on)
10/25		7. Chapter 12 (Text Analytics in Social Media, Xia Hu and Huan Liu, 385-408)	Social media
10/27
10/30		8. Chapter 13 (A Survey of Opinion Mining and Sentiment Analysis, Bing Liu and Lei Zhang, 415-453, double)	Sentiment analysis
11/1		9. cont'd
11/3
11/6		10. Xie, S., Wang, G., Lin, S., & Yu, P. S. (2012, August). Review spam detection via temporal pattern discovery. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 823-831). ACM	Spam reviews
11/8 (11/10 is holiday)		11. Mukherjee, A., Kumar, A., Liu, B., Wang, J., Hsu, M., Castellanos, M., & Ghosh, R. (2013, August). Spotting opinion spammers using behavioral footprints. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 632-640). ACM
11/13		12. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013. (include short demo and material from https://code.google.com/p/word2vec/, double)	word2vec
11/15		13. cont'd (demo)
11/17
11/20 (Thanksgiving week)		14. Intro to LSTM Neural Networks (http://colah.github.io/posts/2015-08-Understanding-LSTMs/, https://deeplearning4j.org/lstm.html) + demo (https://www.tensorflow.org/tutorials/recurrent) (double)	Deep Learning, Application to email auto-reply
11/27		15. cont'd (demo)
11/29		16. Anjuli Kannan, Karol Kurach, Sujith Ravi, Tobias Kaufmann, Andrew Tomkins, Balint Miklos, Greg Corrado, Laszlo Lukacs, Marina Ganea, Peter Young, and Vivek Ramavajjala. Smart Reply: Automated Response Suggestion for Email. In Proc. of KDD, 2016, 955-964
12/1
12/4		17. Liu, Chia-Wei, Ryan Lowe, Iulian V. Serban, Michael Noseworthy, Laurent Charlin, and Joelle Pineau. "How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation." arXiv preprint arXiv:1603.08023 (2016).	Chatbots
12/6		18. Xu, A., Liu, Z., Guo, Y., Sinha, V., & Akkiraju, R. (2017, May). A New Chatbot for Customer Service on Social Media. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (pp. 3506-3510). ACM.
12/8

If we have time we can also cover:

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022 + demo using http://mallet.cs.umass.edu/topics.php or other (double)

Tutorial on what the Stanford POS Tagger is and how to use it (use material from http://nlp.stanford.edu/software/tagger.shtml)

Kiritchenko, S., Zhu, X., & Mohammad, S. M. (2014). Sentiment analysis of short informal texts. Journal of Artificial Intelligence Research, 723-762. http://www.jair.org/media/4272/live-4272-8102-jair.pdf (maybe double)

Ronen Feldman. 2013. Techniques and applications for sentiment analysis. Commun. ACM 56, 4 (April 2013), 82-89. DOI=10.1145/2436256.2436274 http://doi.acm.org/10.1145/2436256.2436274

Quoc V. Le, Tomas Mikolov. Distributed Representations of Sentences and Documents. arXiv, 2014 (also show short demo)

Ritter, Alan, Colin Cherry, and William B. Dolan. "Data-driven response generation in social media." In Proceedings of the conference on empirical methods in natural language processing, pp. 583-593. Association for Computational Linguistics, 2011.

Other interesting papers:

Chapter 3 (A Survey of Text Summarization Techniques, Ani Nenkova and Kathleen McKeown, 43-78, double)

Chapter 4 (A Survey of Text Clustering Algorithms, Charu C. Aggarwal and ChengXiang Zhai, 77-121, double)

Deerwester, S. C., Dumais, S. T., Landauer, T. K., Furnas, G. W., & Harshman, R. A. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391-407.

Chapter 11 (Text Mining in Multimedia, Zheng-Jun Zha, Meng Wang, Jialie Shen and Tat-Seng Chua, 361-379, shorter)

Bollen, J., Gonçalves, B., Ruan, G., & Mao, H. (2011). Happiness is assortative in online social networks. Artificial Life, 17(3), 237-251

Chapter 14 (Biomedical Text Mining: A Survey of Recent Progress, Matthew S. Simpson and Dina Demner-Fushman, 465-495)

Socher, R., Perelygin, A., Wu, J. Y., Chuang, J., Manning, C. D., Ng, A. Y., & Potts, C. (2013, October). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1631-1642).

Presentation tips:

There are many sources, but here is just one:

http://www.washington.edu/doit/presentation-tips-0