CS 260: Seminar in Text Mining

Fall 2017


Instructor: Vagelis Hristidis (aka Evangelos Christidis)

Seminar time: MWF 3:10-4:00 pm

Location: Watkins Hall | Room 1117


Main Topics:


Presentations Schedule



Several papers are chapters from:

C. C. Aggarwal and C. X. Zhai. Mining Text Data. Kluwer Academic Publishers, 2012 (to download it for free, you must be inside the UCR network)

Date | Presenter | Paper | Topic
9/29 | Vagelis | Intro and presentation assignments; Intro on text mining (clustering, classification, information extraction), reviews analysis (extraction, sentiment), chatbots |
10/2 | | cancelled |
10/4 | | 1. Chapter 2 (Information Extraction from Text, Jing Jiang, 11-35) | Information Extraction, Summarization
 | | 2. Hu, M., & Liu, B. (2004, August). Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 168-177). ACM | Reviews Analysis
10/11 | | 3. Extracting product features and opinions from reviews AM Popescu, O Etzioni - Natural language processing and text mining, 2007 - Springer |
10/16 | | 4. Chapter 6 (A Survey of Text Classification Algorithms, Charu C. Aggarwal and ChengXiang Zhai, 163-213, double) | Classification
 | | 5. cont'd |
10/23 | | 6. Tutorial on how to use WEKA for text classification (use resources from https://www.youtube.com/watch?v=IY29uC4uem8, https://weka.wikispaces.com/Text+categorization+with+WEKA, and so on) |
 | | 7. Chapter 12 (Text Analytics in Social Media, Xia Hu and Huan Liu, 385-408) | Social media
 | | 8. Chapter 13 (A Survey of Opinion Mining and Sentiment Analysis, Bing Liu and Lei Zhang, 415-453, double) | Sentiment analysis
11/1 | | 9. cont'd |
 | | 10. Xie, S., Wang, G., Lin, S., & Yu, P. S. (2012, August). Review spam detection via temporal pattern discovery. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 823-831). ACM | Spam reviews
11/8 (11/10 is holiday) | | 11. Mukherjee, A., Kumar, A., Liu, B., Wang, J., Hsu, M., Castellanos, M., & Ghosh, R. (2013, August). Spotting opinion spammers using behavioral footprints. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 632-640). ACM |
 | | 12. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013. (include short demo and material from https://code.google.com/p/word2vec/, double) |
11/15 | | 13. cont'd (demo) |
11/20 (Thanksgiving week) | | 14. Intro to LSTM Neural Networks (http://colah.github.io/posts/2015-08-Understanding-LSTMs/, https://deeplearning4j.org/lstm.html) + demo (https://www.tensorflow.org/tutorials/recurrent) (double) | Deep Learning, Application to email auto-reply
 | | 15. cont'd (demo) |
11/29 | | 16. Anjuli Kannan, Karol Kurach, Sujith Ravi, Tobias Kaufmann, Andrew Tomkins, Balint Miklos, Greg Corrado, Laszlo Lukacs, Marina Ganea, Peter Young, and Vivek Ramavajjala. Smart Reply: Automated Response Suggestion for Email. In Proc. of KDD, 2016, 955-964 |
12/4 | | 17. Liu, Chia-Wei, Ryan Lowe, Iulian V. Serban, Michael Noseworthy, Laurent Charlin, and Joelle Pineau. "How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation." arXiv preprint arXiv:1603.08023 (2016). |
12/6 | | 18. Xu, A., Liu, Z., Guo, Y., Sinha, V., & Akkiraju, R. (2017, May). A New Chatbot for Customer Service on Social Media. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (pp. 3506-3510). ACM. |

If we have time we can also cover:

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022 + demo using http://mallet.cs.umass.edu/topics.php or other (double)

Tutorial on what the Stanford POS Tagger is and how to use it (use material from http://nlp.stanford.edu/software/tagger.shtml)

Kiritchenko, S., Zhu, X., & Mohammad, S. M. (2014). Sentiment analysis of short informal texts. Journal of Artificial Intelligence Research, 723-762. http://www.jair.org/media/4272/live-4272-8102-jair.pdf (maybe double)

Ronen Feldman. 2013. Techniques and applications for sentiment analysis. Commun. ACM 56, 4 (April 2013), 82-89. DOI=10.1145/2436256.2436274 http://doi.acm.org/10.1145/2436256.2436274

Quoc V. Le, Tomas Mikolov. Distributed Representations of Sentences and Documents. arXiv, 2014 (also show short demo)

Ritter, Alan, Colin Cherry, and William B. Dolan. "Data-driven response generation in social media." In Proceedings of the conference on empirical methods in natural language processing, pp. 583-593. Association for Computational Linguistics, 2011. 

Other interesting papers:

Chapter 3 (A Survey of Text Summarization Techniques, Ani Nenkova and Kathleen McKeown, 43-78, double)

Chapter 4 (A Survey of Text Clustering Algorithms, Charu C. Aggarwal and ChengXiang Zhai, 77-121, double)

Deerwester, S. C., Dumais, S. T., Landauer, T. K., Furnas, G. W., & Harshman, R. A. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391-407.

Chapter 11 (Text Mining in Multimedia, Zheng-Jun Zha, Meng Wang, Jialie Shen and Tat-Seng Chua, 361- 379, shorter)

Bollen, J., Gonçalves, B., Ruan, G., & Mao, H. (2011). Happiness is assortative in online social networks. Artificial Life, 17(3), 237-251

Chapter 14 (Biomedical Text Mining: A Survey of Recent Progress, Matthew S. Simpson and Dina Demner-Fushman, 465-495)

Socher, R., Perelygin, A., Wu, J. Y., Chuang, J., Manning, C. D., Ng, A. Y., & Potts, C. (2013, October). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the conference on empirical methods in natural language processing (EMNLP) (pp. 1631-1642).

Presentation tips:

There are many sources, but here is just one: