For our first Lead Stories post, we spoke to Chia Hooi – an aspiring data scientist with an MBA major in Finance and a student of Lead’s data science boot camp in Malaysia.
Here, Chia Hooi tells us how he first got started with data science and how he came up with his Capstone project.
1. Tell us about yourself.
I used to work for an insurance company, spending some time in Bangkok and Jakarta before I left to pursue my Masters full-time. I graduated late last year, and soon after, started attending several coding courses.
Currently, I’m learning data science with LEAD.
2. What do you do for fun? Tell us something interesting about you!
I like watching movies. On average, I watch over 100 movies a year, mostly in cinemas! (I think I mentioned this during the ice-breaking session on the first day of the boot camp.)
3. How did you first get started with data science? Why did you decide to pursue it?
In my previous job, I used to work with lots of insurance claims data. Preparing statistics dashboards for claims and analyzing claims patterns were all done using Excel.
It was quite mundane and so I’ve always wanted to learn how to automate the whole process. Perhaps developing some algorithms or models to predict claims trends – but I wasn’t sure how.
Then later, when I was doing my Finance Masters, I found out that I needed to analyze huge sets of financial data to construct stock portfolios, calculate portfolio risk (standard deviation) and so on.
That’s when I realized Excel was no longer enough. At the same time, I learned about algorithmic trading from my professor and began to realize how powerful computer programs can be.
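As an illustration of the portfolio risk calculation mentioned above, here is a minimal Python sketch. It is not Chia Hooi's coursework: the function name `portfolio_std`, the two-asset weights and the covariance values are all made up for the example. It computes the portfolio standard deviation as the square root of the weighted sum of covariances.

```python
import math


def portfolio_std(weights, cov):
    """Portfolio standard deviation: sqrt(sum_i sum_j w_i * cov_ij * w_j)."""
    n = len(weights)
    variance = sum(
        weights[i] * cov[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )
    return math.sqrt(variance)


# Hypothetical two-asset portfolio (illustrative numbers only).
weights = [0.6, 0.4]
cov = [
    [0.04, 0.006],   # variance of asset A, covariance of A and B
    [0.006, 0.09],   # covariance of B and A, variance of asset B
]

risk = portfolio_std(weights, cov)
print(f"Portfolio standard deviation: {risk:.4f}")
```

With real data, the covariance matrix would be estimated from historical returns, which is where a spreadsheet starts to struggle as the number of assets grows.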
Attending Selangor Smart City Hackathon
It also helped that data science and machine learning became very popular over the last few years. I did some research and realized that it was what I needed to learn. It also seemed like a viable career path, so I decided to pursue it.
Hopefully one day, I can apply what I learned in data science and machine learning to the Finance industry.
4. You managed to come up with a Capstone project just days after the data science boot camp. We’re very impressed! Tell us a bit about that project, your findings and your aspirations behind it.
It was inspired by the Facebook and Cambridge Analytica incident and by how the Russians allegedly used social media to influence the 2016 US Election. So I thought it would be interesting to do some social media sentiment analysis, given our upcoming election at that time.
At first, I wanted to analyze the social media sentiment for major political figures and study the correlation with the election result to see if it is possible to use social media to predict the election result. But I wasn’t sure how to do this until I attended Day 3 & 4 of the Data Science Bootcamp. I quickly decided to do this as my capstone project.
Since we didn’t have much time for the capstone project during the Bootcamp, I thought I would just do a simple comparison of the sentiment of Facebook comments about Tun M and Najib and see what the results show.
The project was quite simple – using mostly web scraping, NLP and sentiment analysis techniques we learned in class. On top of that, I added a word cloud visualization, which I learned on my own, to make the results more visually appealing.
You can find the major findings below:
One of the challenges I faced in the project was to filter out meaningless Malay stopwords like “dah”, “yg”, “utk”, etc.
I had to build a custom Malay stopword list for filtering, as most comments on the Tun M and Najib Facebook pages are in Malay.
The results are, of course, not 100% accurate, but they do provide a good indication of the sentiments that Facebook users have towards Tun M and Najib. And true enough, the election results clearly reflect that sentiment.
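A minimal sketch of what such stopword filtering could look like in Python is shown below. It is an illustration, not the project's actual code: the set contains only the three stopwords quoted above (a real list would be much longer), and the function name `remove_stopwords` and the sample comment are made up for the example.

```python
import re

# Only the three examples mentioned in the interview; a real list would be longer.
MALAY_STOPWORDS = {"dah", "yg", "utk"}


def remove_stopwords(comment, stopwords=MALAY_STOPWORDS):
    """Lowercase a comment, split it into words, and drop the stopwords."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return [token for token in tokens if token not in stopwords]


# Hypothetical comment, written for this example only.
sample = "Dah lama tunggu utk perubahan yg baik"
filtered = remove_stopwords(sample)
print(filtered)
```

The remaining words can then be passed on to the sentiment analysis step or a word cloud without the filler words dominating the counts.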
5. How has LEAD helped launch your learning and professional career?
My data science learning path started when I first joined the Intro to Data Science class by Dr. Lau in January, followed by Edmund’s Python class in March, then finally the Data Science Bootcamp in May. It is safe to say I learned most of my data science skills from LEAD and I found them very useful and beneficial.
The support from the Slack group is great, as we can get help whenever we face problems or just ask for advice on our projects.
There are many online data science courses but only LEAD provides this kind of real-life support even after the bootcamp. There’s also a group of classmates/community to keep in touch with.
6. What are some of your favorite tools or software?
For data science-related projects, I use Jupyter Notebook on Anaconda. I think that’s arguably the best tool to work with.
I use Sublime as my default text editor, but I’m beginning to like VS Code.
7. Can you share with us your workflow or train of thought when starting the capstone project? E.g. the business questions and motivations?
My capstone project isn’t directly related to any business questions. But I do think that for any business-related project, the most important thing is to understand the problem at hand. What is it that you are trying to solve? What impact will your solution bring? And how do you measure success?
While it is nice to play with fancy algorithms, in my opinion data science and machine learning (or any technology, for that matter) should ultimately be used to solve real-world problems for the betterment of society.
At Startup Weekend with my project – “MyHobbi”
8. What is next for you in terms of building your portfolio and career?
In terms of career, I hope I can launch a career related to data science very soon – be it a permanent or freelancing role.
As for my portfolio, I will try to focus more on data visualization and on projects where I can apply machine learning to financial data, e.g. predicting stock prices, building algorithms to automate trading, etc.
Did you enjoy this edition of Lead Stories?
Leave a comment below and let us know your thoughts. In the next Lead Stories, we’ll interview another student from an E-Commerce startup.