Data Science and Text Analytics (DATA) Lab

The DATA Lab focuses on solving real-world problems by applying techniques from the broad area of data science and data analytics on both structured and unstructured data. The lab also conducts research on applying machine learning techniques to analyze textual and social media data.

Members

Dr. Praveen Madiraju, Director

Current Students

Kevin Chovanec: PhD Student (part-time)
John Fields: PhD Student (part-time)
Sajjad Islam: PhD Student
Lucy Le: PhD Student
Manoj Purohit: PhD Student
Jiawei Wu: PhD Candidate

Graduated PhD Students

Dr. Priyanka Annapureddy, 2022
Dr. Paromita Nitu, 2023

Research Projects

Healthcare Analytics

1) PEER SURE App: A Peer Supported Substance Use Research and Education Web Application with Generative AI

Opioid use disorder (OUD) through prescription opioid misuse, heroin and fentanyl use has increased dramatically in the past 20 years, with an estimated 2 to 5 million adults suffering from OUDs each year. OUDs are responsible for significant increases in morbidity and mortality. The consequences of opioid use disorder are significant and can be devastating for individuals, families, and communities. Opioid misuse and addiction can lead to a range of physical and mental health problems, including overdose, chronic pain, depression, anxiety, and other mental health disorders. Opioid use disorder can also have a profound impact on social and economic well-being, leading to unemployment, homelessness, criminal justice involvement, and other negative outcomes.

People who use drugs (PWUD) are often reluctant to seek care because of the underlying social and psychological causes of crisis and the stigma associated with drug use. PWUD relate more effectively to peer support specialists (PSS) with lived or living experience of using drugs, which alleviates the stigma and fosters a sense of trust and belonging. Hence, there is growing evidence of peer support as an effective harm reduction strategy. The goal of this project is to develop a peer support-based substance use research and education web application (PEER SURE App) enhanced with Generative AI. The PEER SURE App should be a simple, easy-to-use, secure, mobile- and desktop-friendly web application for both the participants and the peer support specialists. The goal of the App is to help participants with addiction recovery and to provide near real-time support and accountability.

The PEER SURE App will (i) enable pairing of PWUD with PSS, (ii) support PWUD in checking in regularly by answering short questions, (iii) create separate private channels/rooms between PWUD and their assigned PSS, (iv) enhance generation of support text using generative AI based on the context of the text exchange between the participant and the peer support specialist, with the AI-generated text thoroughly vetted by the PSS (a human in the loop) before it is posted to the participant, and (v) provide PSS with a simple and intuitive dashboard to visualize scores and behavior changes of their assigned participants over time. The peer support specialists can call or text their participants based on the responses to questions and changes in scores. The App will also allow the specialists to save feedback about their participants each week. Data in the App should be stored on a secure HIPAA-compliant server with appropriate IRB oversight in place.

2) Identifying PTSD Crisis Events for Veterans using Machine Learning

PTSD is a psychological disorder most often seen in individuals who have experienced trauma. It is more common in veterans who go through trauma in war zones. A recent study notes that nearly 19-42% of the veterans returning from recent war zones experienced mental illness, and of them nearly 31% from Iraq and 11% from Afghanistan were diagnosed with PTSD. Recognizing the early warning signs of PTSD often helps in preventing the return or worsening of PTSD symptoms.

In this work, we collaborate with DryHootch, a community-based, veteran-focused organization. We have recruited veterans to participate in a 12-week peer mentoring program. We use sociodemographic, baseline PTSD, and weekly EMA (Ecological Momentary Assessment) data to predict crisis events and eventually alert peer mentors of a possible crisis event. We also use associative patterns of PTSD to identify crisis rules. Association rule mining is employed to find these crisis patterns and to build a classification model that predicts the likelihood of crisis in veterans. Findings from these models can be integrated with the existing QRF m-health framework to generate text alerts to the mentors when the crisis patterns are observed in their mentees. Such an integrated crisis prediction and alerting system would help mentors plan interventions.
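As a rough illustration of how association rule mining can surface crisis patterns, the sketch below computes support and confidence for rules of the form "antecedent items imply crisis". It is not the lab's model: the item names (`poor_sleep`, `high_anger`, `isolation`), the thresholds, and the function names `mine_crisis_rules` and `matches_crisis_rule` are all hypothetical, and each weekly record is assumed to be reduced to a set of binary flags.

```python
from itertools import combinations


def mine_crisis_rules(records, target="crisis", min_support=0.3,
                      min_confidence=0.7, max_len=2):
    """Return rules (antecedent, support, confidence) that predict `target`.

    records: list of sets of item names (hypothetical weekly flags).
    support = fraction of all records containing antecedent and target.
    confidence = fraction of antecedent records that also contain target.
    """
    n = len(records)
    items = sorted({item for rec in records for item in rec} - {target})
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            a = set(antecedent)
            n_a = sum(1 for rec in records if a <= rec)
            if n_a == 0:
                continue  # antecedent never observed
            n_both = sum(1 for rec in records if a <= rec and target in rec)
            support = n_both / n
            confidence = n_both / n_a
            if support >= min_support and confidence >= min_confidence:
                rules.append((antecedent, support, confidence))
    # Strongest rules first: by confidence, then support, then name.
    rules.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return rules


def matches_crisis_rule(record, rules):
    """True if any mined antecedent is fully present in the new record."""
    return any(set(antecedent) <= record for antecedent, _, _ in rules)


# Synthetic, made-up records for illustration only.
example_records = [
    {"poor_sleep", "high_anger", "crisis"},
    {"poor_sleep", "high_anger", "crisis"},
    {"poor_sleep"},
    {"high_anger", "isolation", "crisis"},
    {"isolation"},
    {"poor_sleep", "high_anger", "crisis"},
]
example_rules = mine_crisis_rules(example_records)
```

In a setting like the one described, a match from `matches_crisis_rule` on a mentee's latest weekly record could be the trigger for a text alert to the mentor; how the alert is actually generated is outside this sketch.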

3) Using Hospital Records of Patients Presenting to Hospital to Predict Risk of Opioid Use Disorder (OUD), Fatal and Non-fatal Opioid Overdose, and ED Readmission

The goal of this study is to utilize hospital records to identify key factors such as health data, social determinants and others that may be used to predict a patient’s risk of developing Opioid Use Disorder (OUD), overdose, or readmission at various time intervals. By assessing a patient’s risk, providers (physicians, social workers, advanced practice providers) can implement specific prevention strategies aimed at reducing the risk for their patient.

4) Identifying PTSD and Opioid Misuse Crisis Situations among Veterans using Social Media Analysis

Among people suffering from post-traumatic stress disorder (PTSD), veterans form a significant population. The US Department of Veterans Affairs indicates that around 12% to 30% of veterans have PTSD resulting from combat trauma. A PTSD diagnosis significantly increases the propensity to engage in high-risk behaviors, including alcohol/substance abuse, impulsivity, and aggression.

We plan to analyze platforms like Reddit and Twitter that contain a rich body of text related to PTSD. The project is innovative in that we will use a grounded theory model from social sciences, where the collected social media data will be analyzed and coded into hierarchical emotional categories. The coded categories with the text corpus will then be used to train advanced natural language processing and machine learning models. Once the models are trained, they can then be used to predict emotional categories and potential crisis events for veterans based on a new social media post or a text message.

Social Media and Text Analytics

5) A Stock Prediction System using Social Media Analysis

Investors, traders, stock analysts, pension fund managers and hedge fund managers watch the movement of the stock prices, and their goal is to be able to buy the stocks at lower prices and sell at a higher price. The stock prices move based on numerous factors such as company earnings, specific company news such as launch of a new product, general market conditions and other factors. Predicting the stock price movement in general is a huge challenge.

People on social media talk about different companies and stock price movements. Popular social media sites such as Twitter.com and Stocktwits.com facilitate discussion of stocks. At a given point in time, people post whether they are bullish or bearish on a stock. In this project, we aim to build a model to predict the sentiment of a stock based on such discussions. The model will use predictive analytics and will also consider how successful a person was in the past, the influence of the person, and other such factors. The model also uses sentiment analysis tailored to financial media, as general sentiment analyzers may not work well for financial data.
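One simple way to combine these factors, shown here only as a sketch, is a weighted vote in which each bullish or bearish post counts in proportion to its author's track record and influence. The `Post` fields, the weighting formula, and the function name `stock_sentiment` are assumptions made for illustration, not the project's actual model.

```python
from dataclasses import dataclass


@dataclass
class Post:
    stance: str           # "bullish" or "bearish", as posted by the author
    past_accuracy: float  # hypothetical: share of the author's earlier calls that were right (0..1)
    influence: float      # hypothetical: normalized influence score (0..1)


def stock_sentiment(posts):
    """Return a score in [-1, 1]; positive means net bullish.

    Each post is weighted by past_accuracy * (1 + influence), an
    assumed formula chosen only to illustrate the idea.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for post in posts:
        weight = post.past_accuracy * (1.0 + post.influence)
        sign = 1.0 if post.stance == "bullish" else -1.0
        weighted_sum += weight * sign
        total_weight += weight
    # No usable posts: report a neutral score.
    return weighted_sum / total_weight if total_weight else 0.0
```

For example, a bullish post from an accurate, influential author outweighs a bearish post from a less accurate one, so the combined score comes out positive.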

6) Predicting Eat-out Preferences using Social Media Analysis

People post about their experiences at the places where they eat and whether they liked a restaurant or a dish. They may also like tweets from restaurants or from friends eating at a restaurant, or describe their general taste in food in their social media posts. The goal of the project is to explore the question: “Can we analyze the different social media posts and recommend to users the restaurants that they might like?”

7) Humor Detection using Text Analytics

Humor is an important aspect of human communication; incorporating humor in conversations improves social connectivity and the engagement of those using it. The volume of humorous text on social media is growing, and identifying and understanding humor in those texts can help with user mood prediction and sentiment analysis on social media. Humor can be in either verbal or non-verbal form. Because of the growing use of computers for communication and work, studies on computational humor have gained importance. Computational humor is concerned with the automatic recognition, understanding and generation of humor. Sometimes it is difficult even for human beings to understand humor, because it varies with the cultural context and different people interpret the same sentence differently. Thus, automatically recognizing humor in a text is a challenge. Today, chatbots are evolving as conversational interfaces in many applications. If a computer can identify humor when it converses with a human being and understand human intentions, it can improve human-machine interaction and the customer experience.

Data Science/Machine Learning

8) Identifying Building Accessibility using Image Classification

The Americans with Disabilities Act (ADA) is a civil rights law that was signed into law in 1990 by President George H.W. Bush. The law requires wheelchair access be made available for buildings built after 1992. Buildings under the law include retail stores, hotels, banks and most other public buildings. However, there are a large number of buildings built before 1992 that are not wheelchair accessible. In addition, the ADA does not require the ramp to be located at the front of the building. This is an inconvenience for people who use wheelchairs, as they may have to travel all the way from the front to the back of the building where the ramp may be located. Hence, in this project, we propose to build an artificial intelligence system that takes a building image as input and outputs whether the building has a ramp. The proposed system uses a deep learning technique, a convolutional neural network (CNN), to classify building images.

We did not find any automatic system or mobile application that takes a building image and classifies it for accessibility. We have begun the process of collecting a training dataset for our machine learning model. The dataset consists of building images from around the Milwaukee area. We also plan to collect a number of building images from Google Maps. It is important to have a diverse set of images, such as coffee shops, libraries, restaurants, university buildings, etc., to build a generic system. We will then evaluate the accuracy of our model.

Past Projects

1) Personalizing Places of Interest using Social Media Analysis

We propose that social media can be used to help an application tailor its results for a travel destination to the needs of an individual. People post the activities they enjoy and the places they frequent on social media websites such as Twitter, and this data can be mined from a user’s tweets to give our application a better idea of the results the user would most want to see. We propose a ranking algorithm that uses (i) the user preferences implicitly generated from the user’s past tweet history and (ii) tweet content relevant to a place of interest. We propose a prototype application that takes a city name as input and produces as output a ranked list of places of interest using the ranking algorithm.
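The two signals named above can be combined in many ways. The sketch below is one assumed approach, not the project's published algorithm: it builds a bag-of-words profile from the user's past tweets, builds another from tweets about each place, and ranks places by cosine similarity between the two. The function names and sample tweets are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts(texts):
    """Count word occurrences across a list of tweets."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_places(user_tweets, place_tweets):
    """Rank places by similarity of their tweets to the user's history.

    user_tweets: list of the user's past tweets (implicit preferences).
    place_tweets: dict mapping place name -> list of tweets about it.
    Returns a list of (place, score), best match first.
    """
    profile = term_counts(user_tweets)
    scored = [(place, cosine(profile, term_counts(tweets)))
              for place, tweets in place_tweets.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

A call such as `rank_places(["Loved the art museum today"], {"Art Museum": [...], "Stadium": [...]})` would place the museum first whenever its tweets share more vocabulary with the user's history. A real system would likely also remove stop words and weight terms, which this sketch omits.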

2) Mobile Interruption Management System

The number of mobile devices is increasing at an astronomical rate throughout the world. While this brings a lot of comfort to users, it also introduces new kinds of challenges. A user is susceptible to mobile call interruptions wherever they are, whether in the middle of a very important discussion or a critical task such as performing an emergency operation in a hospital. As a result, researchers have been studying ways to minimize the cost of mobile interruptions. In this work, we propose a mobile interruption management system in which callers are grouped and the time intervals of a day are classified to ascertain whether a call should be allowed to ring or should go to silent or vibrate. We also take into account the presence of nearby Bluetooth devices and the applications the mobile user is using to decide whether the user should be interrupted.

We proposed a model that calculates the predicted cost of an interruption based on the contexts the user is in and then compares this cost with a threshold value. If the cost of receiving the call is less than the threshold value, the device sound profile is set to ring; otherwise it is set to silent or vibrate. We have also evaluated our model against existing models and found that the system performs well.
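The threshold rule described above can be sketched as follows. Only the comparison itself (ring when the cost is below the threshold, silent or vibrate otherwise) comes from the description; the caller groups, time slots, numeric weights, default threshold, and function names are hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical cost contributions; a real system would learn or tune these.
CALLER_COST = {"family": 0.1, "work": 0.3, "unknown": 0.6}
TIME_COST = {"free": 0.0, "meeting": 0.5, "sleep": 0.7}


@dataclass
class CallContext:
    caller_group: str       # group the caller was assigned to
    time_slot: str          # classified interval of the day
    bluetooth_nearby: bool  # other Bluetooth devices detected nearby
    app_in_use: bool        # user is actively using an application


def interruption_cost(ctx):
    """Sum the assumed cost contributions of each context signal."""
    cost = CALLER_COST.get(ctx.caller_group, 0.6)
    cost += TIME_COST.get(ctx.time_slot, 0.3)
    if ctx.bluetooth_nearby:
        cost += 0.2  # assumed: nearby devices suggest other people are present
    if ctx.app_in_use:
        cost += 0.1  # assumed: the user is busy on the phone
    return cost


def sound_profile(ctx, threshold=0.7):
    """Ring if the predicted cost is below the threshold, else stay quiet."""
    if interruption_cost(ctx) < threshold:
        return "ring"
    return "silent_or_vibrate"
```

With these placeholder weights, a family call during free time rings, while an unknown caller during a meeting is sent to silent or vibrate.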