Senior Research Scientist

Data Science Research

 

  • Developing an interpretability system for large language models (LLMs) that power customer-facing Gen-AI applications and regulatory compliance models.

 

Data Science Modeler

Generative AI Team

 

  • Designed the data map for pulling DCC Root Cause flags and notes from the Snowflake environment, and performed exploratory data analysis (EDA) to characterize the distribution of root causes.
  • Replicated the UDAAP compliance model using the GPT-2 framework to detect agents' adherence to regulations during customer interactions.
  • Supported development of the No Contact model using the LLaMA framework.

 

Anti-Money Laundering (AML) Team

 

  • Developed a hybrid machine learning model for fraud detection and bias control, combining an NLP component (RoBERTa) with a non-NLP component (XGBoost).
  • Automated the SAS data preparation workflow and optimized the ETL process in Snowflake, improving pipeline efficiency.
  • Validated the RoBERTa text preprocessing pipeline and implemented bias control mechanisms, including removal of sensitive business domain information from texts to mitigate model bias.
  • Applied Principal Component Analysis (PCA) to transform imbalanced numeric data, used SMOTE to oversample the minority class and balance the training data, and used Information Value (IV) to weight features.