Open Access Article

Know Your Doctor: Topic Modeling and Sentiment Analysis Based Approach To Review Doctor

K. Kavya, C. Sreejith

Section: Research Paper, Product Type: Journal Paper
Volume-06, Issue-06, Page no. 37-42, Jul-2018


Online published on Jul 31, 2018

Copyright K. Kavya, C. Sreejith. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



IEEE Style Citation: K. Kavya, C. Sreejith, Know Your Doctor: Topic Modeling and Sentiment Analysis Based Approach To Review Doctor, International Journal of Computer Sciences and Engineering, Vol.06, Issue.06, pp.37-42, 2018.

MLA Style Citation: K. Kavya, C. Sreejith "Know Your Doctor: Topic Modeling and Sentiment Analysis Based Approach To Review Doctor." International Journal of Computer Sciences and Engineering 06.06 (2018): 37-42.

APA Style Citation: K. Kavya, C. Sreejith, (2018). Know Your Doctor: Topic Modeling and Sentiment Analysis Based Approach To Review Doctor. International Journal of Computer Sciences and Engineering, 06(06), 37-42.



People increasingly search for doctors and businesses through review websites. They naturally favor doctors with the highest ratings and a large number of reviews supporting those ratings. The best-rated doctors may have hundreds or even thousands of reviews on their profiles, so comparing one highly rated option against every alternative becomes tedious. This paper addresses the problem by building a summarizer that analyzes doctor reviews using topic modeling with Latent Dirichlet Allocation (LDA) and Word2Vec-based sentiment analysis. LDA is a standard Natural Language Processing (NLP) technique for discovering topics in a large corpus. Word2Vec-based sentiment analysis is used to study the opinions, attitudes, and emotions that people express in a review. Word2Vec is a two-layer neural network that maps each word in a text corpus to a feature vector. The reviews, covering doctors across San Francisco, are taken from Yelp, an online rating website. The result is a snapshot for each doctor showing the most dominant topics and the overall sentiment of their reviews.
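
To make the topic-modeling step concrete, the following is a minimal sketch of LDA on a handful of tokenized reviews. It is not the authors' implementation: it fits the model with collapsed Gibbs sampling (one common inference method, chosen here only because it fits in a few lines of standard-library Python), and the function names `lda_gibbs` and `top_words`, the hyperparameters `alpha` and `beta`, and the toy reviews are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. Returns (theta, phi, vocab), where
    theta[d][k] is the share of topic k in document d and phi[k][v] is
    the probability of word vocab[v] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables updated as topic assignments change.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed estimates of the two distributions.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + V * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi, vocab


def top_words(phi_row, vocab, n=3):
    """Return the n most probable words of one topic."""
    ranked = sorted(range(len(vocab)), key=lambda v: phi_row[v], reverse=True)
    return [vocab[v] for v in ranked[:n]]


# Hypothetical, already-tokenized reviews (not taken from the Yelp data).
reviews = [
    "staff friendly office clean staff helpful".split(),
    "wait long appointment late wait time".split(),
    "friendly staff clean office".split(),
    "long wait late appointment".split(),
]

theta, phi, vocab = lda_gibbs(reviews, num_topics=2)
for k, row in enumerate(phi):
    print("topic", k, top_words(row, vocab))
```

Each row of `theta` gives one review's topic mix, so averaging the rows belonging to a single doctor would give a rough picture of that doctor's dominant topics. On a real corpus, an established library implementation with proper preprocessing (stop-word removal, lemmatization) would be the more practical choice.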

Keywords / Index Terms

LDA, NLP, Sentiment Analysis, Topic Modeling, Word2Vec

