About

Data science professional with 7+ years of experience. I currently work in the financial sector, driving sales and marketing analytics. I have supervised a team building products focused on alternative datasets using the latest technologies. I have also helped teams pivot from legacy systems through AI and automation. Apart from this, I have experience in the US healthcare domain. I have a proven academic record and have demonstrated analytical capabilities on competitive platforms. I am from Muzaffarpur, Bihar, India.

Personal Details

  • Birthday: xx - xx - 1994
  • Website: sidharthiimc.github.io
  • Marital Status: Family Man
  • Current Location: Hyderabad, India
  • Age: 30
  • Degree: Master
  • Email: sidharth.kr@hotmail.com
  • Freelance/For Hire: Available

I have lived in Hyderabad, Bangalore, Delhi, Noida and Kolkata. Watching general studies lectures on YouTube is my favourite free-time activity. I like travelling a lot and have trekked up to 5000 m above sea level. I also like to play cricket and badminton. Tea is my favourite beverage.

Facts

I have been working in data science since mid-2016. I have not only learnt but also mentored several people, many of whom have reached even better positions than mine :). Key highlights of my professional experience:

Years of Experience

Projects delivered

Mentored Students and Colleagues

Followers and growing

Skills

Most of my skills come from my post-graduation in Business Analytics, corporate work experience and solving challenges on Kaggle. I still visit Kaggle to check out the latest developments in data science and machine learning.

Python 99%
R 75%
SQL 90%
Flask - Web App and API 65%
Office (Excel, Powerpoint) 80%
Data Science 90%
Machine Learning - Supervised 95%
Machine Learning - Unsupervised 95%
Deep Learning 70%
Image and Video Processing 65%
Natural Language Processing (NLP) 90%
Generative AI 70%

Resume

Summary

Sidharth Kumar

Data Science Professional with 7+ years of industry experience. Currently working in the financial sector with a focus on sales and marketing. Also has experience in market research (alternative data) and the US healthcare domain. Proven academic record and demonstrated analytical capabilities on competitive platforms.

  • Based in Hyderabad, India
  • (123) 456-7891
  • sidharth.kr@hotmail.com

Education

Post Graduate Diploma in Business Analytics (PGDBA)

2016 - 2018

IIM Calcutta  -  ISI Kolkata  -  IIT Kharagpur

Specialization: Data Science, Machine Learning, Business Analytics

Key Courses: Regression, Algorithms Design, Machine Learning, Information Retrieval, Healthcare and Product Analytics, Stochastic Processes, Statistical Data Structures, Inference, Computing for Data Sciences, Database Systems, Categorical Data Analysis, Data Mining, Project Management, Econometrics, FRM, Computational Finance

CGPA: 8.74

Bachelor of Technology (B.Tech)

2011 - 2015

National Institute of Technology (NIT), Patna

Specialization: Mechanical Engineering

CGPA: 8.80

Class XI - XII

2009 - 2011

Central Board of Secondary Education (C.B.S.E)

Score: 91%

Class X

2008 - 2009

Central Board of Secondary Education (C.B.S.E)

Score: 90.5%

Professional Experience

Data Scientist

Jan 2024 - Present

Franklin Templeton, Hyderabad, India

  • Development of behavioral personas for US financial advisors based on marketing signals
  • Tech stack exposure: Python, Snowflake, Databricks, Power BI

Senior Data Scientist

July 2019 - Dec 2023

GlobalData Plc, Hyderabad, India

  • Developed AI-based applications using GPT and fine-tuned LLMs
  • Developed and implemented several predictive analytics solutions
  • Worked on the NLP technology stack for alternative datasets such as Twitter, Jobs, Company Filings, News and Reddit
  • Contributed to business development by providing innovative solutions
  • Developed several UIs and dashboards for sales and marketing
  • Tech stack exposure: Python, Transformers, Google BigQuery, Data Studio, Plotly Dash and the Flask framework

Data Scientist

April 2018 - June 2019

UnitedHealth Group, Noida, India

  • Provided solutions for the US healthcare Medicare and Marketing businesses
  • Developed algorithms for NPS, Star ratings, CAHPS and retention programs
  • Analysed large datasets to find key insights
  • Tech stack exposure: Python, R, scikit-learn ML models, Big Data (Hive)

Internship - Business Analytics

October 2017 - April 2018

UnitedHealth Group, Noida, India

  • Provided solutions for the US healthcare Provider and Digital businesses
  • Developed algorithms for the digital revolution
  • Automated the data extraction, transformation and loading process
  • Tech stack exposure: Python, Big Data (Hive and PySpark)

Software Engineering Associate

January 2016 - May 2016

Accenture, Bangalore & Hyderabad, India

  • Worked for a US-based software client to make web applications accessibility-compliant
  • Prepared test cases & scripts, ran test scripts, and logged bugs



Download PDF version of my resume from here: Resume

Projects

I have worked on a variety of projects throughout my career, ranging from core data science to data analytics and even data engineering. The majority of my projects involve tabular data and Natural Language Processing. List of projects and roles at various organisations and institutes:

  • All
  • Franklin Templeton
  • GlobalData
  • UnitedHealth Group
  • Academics
  • Kaggle

OPTUM - Data Science Challenge

  • 1st runner-up worldwide (1st in India) in the data science challenge among employees
  • Built a PCP recommendation model that can reduce cost by choosing the right PCP for members
  • The final model was built using 10 variables with extensive advanced feature engineering
  • Achieved an AUC of 86.66 on the public leaderboard using a single model with tuned parameters

Financial Advisors Segmentation

  • Developed segments for US financial advisors based on their interaction with various marketing signals
  • Nine robust clusters were identified using PySpark KMeans
  • Cluster labels were generated through a deep-dive interpretation of each cluster, projecting various features and metrics
  • A Power BI dashboard was built for sales executives to use and to understand the various segments

Multi-Task Private LLM

  • Developed consumer-grade generative AI capabilities by fine-tuning an LLM for various operational tasks such as summarization, tagging and data extraction
  • Ensured private usage of LLMs for data privacy and safety

Trend Pulse

  • Built a daily newsletter and dashboard for real-time trend identification from social data, along with an alert mechanism
  • Built a trend explainer using GPT and automated timeline peak identification
  • Achieved full automation of the process using Mailchimp, Elasticsearch, Google Sheets, Data Studio and email APIs

Sense Hub

  • Developed a question-answering and report-writing bot using automated GPT agents
  • Built a semantic search engine for retrieval and used GPT for Q/A. Created a multi-layered chat and semantic engine using agent bots to write quality reports from our own dataset
  • Added the organisation's existing analysis frameworks to the AI search for use in Q/A responses

Patents Company Name Normalisation

  • Created an algorithm to map company name variants to an existing company naming system
  • Used text cleaning, patent features such as family and inventor details, and company features such as headquarters, revenue and several user-defined metrics to develop the matching algorithm
  • The algorithm was able to cover 95% of all patent applicants. The new variant database can be used with other data sources such as news, deals and social for faster information processing

Retention Driver Analysis

  • Worked on finding the key drivers influencing member retention during the annual enrolment period
  • Built logistic regression, random forest and gradient boosting models to identify the key driving factors
  • Used Shapley values and partial dependence plots to quantify the drivers' impact

Consumer Review Insights

  • Created automated cluster and topic generation to derive insights from drug review data
  • Used Hugging Face Transformers for vectorisation, followed by density-based hierarchical clustering
  • For topic generation, used spaCy with a zero-shot similarity metric

PowerPoint Automation

  • Automated 5000+ PPTs using the python-pptx library
  • This enabled just-in-time (JIT) PPT generation, which reduced cost and space

Stock Market Movement Prediction using Social Sentiments

  • Modelled the effect of social sentiment on stock prices to predict the direction of the next trading session
  • Extracted features from the top 25 daily news data using information-retrieval-based techniques
  • Achieved an accuracy of 59.52% on the test set using SVM and simulated a buy/sell trading strategy

PR Reports, POCs, Automations

  • Created a report on veganism based on social data, using text and keyword search on the corpus
  • Created a report and interactive dashboard on beers to showcase the capabilities of blog data
  • Automated several legacy one-pager reports using Python to save manual time and effort

CAHPS Project

  • Worked on improving NPS and Medicare Plan Star ratings using member survey data
  • Analysed contract and plan star ratings and the likelihood of reaching a 4-star rating using a logistic classifier
  • Predicted members who are likely to give low survey scores using a random forest model

Retrieval based Chat Bots

  • Developed an algorithm for a retrieval-based chat bot on the publicly available Ubuntu Dialogue Corpus
  • Applied methods including a TF-IDF cosine similarity model using n-grams and a Dual Encoder LSTM (RNN)
  • Achieved a recall of 77% using the cosine model and a recall of 88% using the LSTM (RNN)

Analytical Dashboards

  • Traffic monitoring of Verdict Media websites for the sales team using Google BigQuery and Data Studio
  • Dashboard using social indicators to analyse the effect of Covid-19 on companies and sectors
  • Competitive intelligence dashboard for company performance benchmarking using Plotly Dash
  • Created a UI for the Analytics Hub dashboard for analysis and metric generation from alternative data

Member Tenure Prediction

  • Worked on personalising prospect acquisition, member engagement and retention initiatives
  • Worked on a random forest survival model for predicting member churn and tenure with UHC
  • Performed Customer Lifetime Value analysis using the predicted tenure to inform marketing strategy

Disruptor Innovation Identification

  • Created an ML model to classify whether or not an article describes an innovation
  • Achieved an accuracy of 93% by fine-tuning BERT embeddings
  • Also built a second-stage model to summarise the article by finding the benefits and challenges of the innovation concerned

Robust Regression using Campbell's Method

  • Assigned proper weights to the observations of many public datasets using the iterative Campbell's method
  • Found the outliers using the weights and implemented weighted regression for prediction in R
  • Analysed the properties of Campbell’s method of estimation using Monte Carlo simulations

Internship - Provider Digital Analytics

  • Worked on the Provider side of the business to improve experience and reduce cost through digitisation
  • The project involved working on Big Data platforms such as Hive and Spark for data collection and analysis
  • Fully automated the process from data ingestion to data transformation and update
  • Received Bravos twice from the Product Owner and the Manager for the collaborative work done
  • Received a pre-placement offer (PPO) to serve as a full-time data scientist and was graded excellent in the internship evaluation

Influencer Recognition

  • Created a filter that identifies individuals from Twitter profile images, as a step in identifying social media influencers
  • The model was built using a Convolutional Neural Network (CNN), achieving an accuracy of 95% and reducing manual effort by more than 90%

Sentiment Model

  • Built an ML model for measuring the sentiment of tweets using internal and public data
  • Gathered 3 million training examples from various open sources such as universities and data platforms
  • Used the word2vec approach and a deep neural net to train the sentiment model

Sberbank Russian Housing Market

  • Predicted housing prices in the Russian market using extensive data cleaning and an ensemble learning approach

Booz Allen Hamilton’s Data Science Bowl

  • Predicted lung cancer from medical scans using XGBoost and a neural network, with the aim of saving lives and reducing cost

Quora Question Pairs

  • Modelled the semantic similarity of questions asked on Quora using LSTM and XGBoost along with feature engineering to find similar questions in the test set

Sea Lion Population Count by NOAA Fisheries

  • Used a CNN to count sea lions in multiple categories in drone images

Fraud Transaction Detection by IEEE

  • Analysed data and identified fraudulent transactions using predictive modelling

Zestimate for Zillow

  • Predicted US home prices by modelling variables to improve the Zestimate

Kaggle Kernels-only Competition: Instant Gratification

  • Worked on an artificial dataset to decode its pattern using advanced algorithms

Certifications

Data Science Professional Certificate

Issued by Coursera, Authorized by IBM

Investment Management with Python and Machine Learning Specialization

Issued by Coursera, Authorized by EDHEC Business School

Executive Data Science Specialization

Issued by Coursera, Authorized by Johns Hopkins University

Google Project Management Certificate

Issued by Coursera, Authorized by Google

Machine Learning for All

Issued by Coursera, Authorized by University of London

Neural Networks and Deep Learning

Issued by Coursera, Authorized by DeepLearning.AI

Awards, Recognition & Competitions won

Star Performer Award - 2021

Awarded in Business Fundamentals and Alternative Datasets at GlobalData

Employee Award for Q4 2021

GlobalData Employee of the Quarter Award

1st Runner-up

Optum Data Science Challenge

Top 1% data scientist

Highest rank achieved: 510 among 150k worldwide on Kaggle

Team Excellence Award

Recognition for outstanding work at Internship, UnitedHealth Group

Best performer

Recognition as a high-performing resource by Manager, Accenture

Rank 7 - Deep Learning

Skill Test by Analytics Vidhya

Silver: Ranked 56 / 3274 teams

Russian Housing Market Price Prediction. Organized by Sberbank Russia & Kaggle

Bronze: Ranked 86 / 385 teams

Estimated Sea Lion Population. Organized by NOAA Fisheries & Kaggle

Bronze: Ranked 328 / 6381 participants

Fraud transaction detection. Organized by IEEE & Kaggle

Silver: Ranked 88 / 1832 teams

Synchronous kernels-only competition Instant Gratification. Organized by Kaggle

Bronze: Ranked 240 / 3775 teams

US home sales price prediction (Zestimate). Organized by Zillow & Kaggle

Bronze: Ranked 190 / 1972 teams

Lung cancer prediction in Data Science Bowl. Organized by Booz Allen Hamilton & Kaggle

Contact

Location:

Hyderabad, India


Call:

123456789


Sidharth's News

For the latest happenings, follow me on Twitter.