Reconfigure your journey to Data Science

Aditya Raman
Published in bellatrixdata
7 min read · May 26, 2020

It was my greatest desire to become a Data Scientist. I have good programming skills in Python and have worked with most of its libraries. I worked on many freelance, open-source and personal projects, but they rarely required those libraries, and complicating simple logic and programs just to use them is not good programming. Although projects are the most effective way to learn, you cannot know in advance whether a given project will exercise the specific skills you want to build. That’s why I am writing this: to let you know what is required to gain those skills.

These are the reasons why I (I, because you are smart) was not able to learn efficiently. You might relate to a few of them:

  • Stack Overflow and GitHub rescued me at crucial times, so I never bothered to understand how things actually worked.
  • There was a lot of material on the web, and I was probably not smart enough to choose what suited me best.
  • I always ran into technical glitches and was not able to execute the next statements or pieces of code.
  • Video lectures from resource persons were quite expensive most of the time.
  • I was too busy gathering certificates and never focused on learning.
  • Documentation was too large to read through. (Although it is the best place to start and to keep coming back to.)

I found I was faking myself and my knowledge. Wait, what? Do you care about me? No, certainly not. I wish I had known your opinion a little earlier in my life.

By the time you understand this, you will find someone who thinks you are wrong.

How did I Start Learning?

I was very curious to learn, but quite sad and depressed because I couldn’t. My learning process never started with a specific book, blog, or teacher. The best thing I could do for myself then was to list all the important websites, books, lectures and blogs on paper. The list ran to around 3 pages of my notebook, and I failed again: I searched for every topic on the internet but never found the best content in one place.
By this time, I was able to talk about data science with any person and could easily explain the basics.

Every time you fail, you gain something, but it is time-consuming, and you cannot afford to lose time.

If you have faced similar problems while learning, I recommend making a list of these points and filling it in by analyzing yourself. Now let me briefly tell you how I started (maybe you will find it useful).

  • I listed out my strengths and weaknesses. These are a few important techniques that you should try to adopt.
    Multi-tasking: you should be quick enough to jump from mathematical logic to computer programming, and then from data back to logic.
    Elaborative rehearsal: try to learn by referring to the material instead of memorizing it.
    — Believe in reading books and writing down important information on paper or in a notebook instead of using a digital notebook.
    — Don’t try to learn everything in a day. You should understand, not just do. Even when everything is making sense, take a break.
    Debugging in the notebook: try debugging with pen and paper; it will give you another insight into the code.
    Pareto principle: this rule suggests that 20% of your activities will account for 80% of your results.
  • Learn as you go — You can always refer to Stack Overflow, GitHub or a book when you get stuck on a topic, so pause your reading there and start working on the project. But when you do need those resources, read them thoroughly.
  • Mathematics — I have liked maths since my early school days, but if you didn’t, you should still try to learn a few important topics: Linear Algebra, Calculus, Statistics and Probability, and Geometrical Figures. Check out this Medium post if you want to refresh your knowledge.
  • Python — You should at least know data types, data structures, functions, classes, file handling, connecting databases with Python and using web services with Python. If you want to learn or recap, you can check out https://py4e.com; I admire Dr Charles Severance.
  • Database — We need a database to store information and to retrieve and query it whenever required. I learned SQLite and MySQL; both are quite easy to use from Python.
  • Data Structures and Algorithms — This is quite important: you should know each of the common data structures and the frequently used algorithms. You can practice on LeetCode. You can also refer to my repo on GitHub, https://github.com/ramanaditya/data-structure-and-algorithms
  • Git and GitHub — Learn about Git and GitHub. I created a private repository to work in because I didn’t want the world to know how bad my code was. If you are shy about your mistakes, try using a private repository.
  • REST API — REST APIs are really useful when you are dealing with data that needs to be gathered from different sources. Try learning about JSON as well.
  • Learn less but every day — You don’t need to learn everything at once. Learn a little, but effectively, and most importantly don’t let a day pass without a commit. Keep your GitHub green.
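
To make the Python and Database points above a little more concrete, here is a minimal sketch of the four CRUD operations (create, read, update, delete) using the `sqlite3` module from Python's standard library. The table name `topics`, its columns and the sample rows are made up for illustration; they do not come from any real project.

```python
import sqlite3


def run_crud_demo():
    """Create, update, delete and read rows in a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing touches the disk
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE topics (id INTEGER PRIMARY KEY, name TEXT, hours INTEGER)"
    )

    # Create: '?' placeholders keep the values separate from the SQL text
    cur.executemany(
        "INSERT INTO topics (name, hours) VALUES (?, ?)",
        [("Python", 10), ("SQL", 4), ("Git", 2)],  # hypothetical sample rows
    )

    # Update: add one hour to the SQL topic
    cur.execute("UPDATE topics SET hours = hours + 1 WHERE name = ?", ("SQL",))

    # Delete: remove the Git topic
    cur.execute("DELETE FROM topics WHERE name = ?", ("Git",))
    conn.commit()

    # Read: fetch what is left, sorted by name
    rows = cur.execute("SELECT name, hours FROM topics ORDER BY name").fetchall()
    conn.close()
    return rows


print(run_crud_demo())  # [('Python', 10), ('SQL', 5)]
```

The same statements should carry over to MySQL with small changes, although the connection code and the placeholder style depend on the driver you pick.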

People generally say you should build more projects rather than work on data structures, but I believe that will only force you to look things up on Stack Overflow more often, and you still won’t be able to write simple queries and logic.
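
As one illustration of how knowing a data structure simplifies the logic, consider the well-known "two sum" exercise: find two numbers in a list that add up to a target. The sketch below is my own example rather than something taken from a specific project. It uses a dictionary so that each number is visited only once, instead of comparing every pair with nested loops.

```python
def two_sum(numbers, target):
    """Return indices (i, j) such that numbers[i] + numbers[j] == target, or None."""
    seen = {}  # maps a value to the index where it was first seen
    for j, value in enumerate(numbers):
        complement = target - value
        if complement in seen:  # dictionary lookup, no inner loop needed
            return seen[complement], j
        seen.setdefault(value, j)  # keep the first index for repeated values
    return None


print(two_sum([2, 7, 11, 15], 9))  # (0, 1)
```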

Don’t move ahead until you are confident about all of these topics. Be slow but effective, and remember the Pareto Principle.
This was indeed a good plan, but I had one question: how would I know when to move ahead and get into actual Data Science? The answer is really simple. Answer the questions below; if you can answer them all correctly, then YES, you can move ahead.

  1. Data Structures
    — Can you solve at least 15–20 questions on LeetCode related to dictionaries, lists, searching, sorting, linked lists, binary trees and graphs?
  2. Python and Databases
    — Can you define a class and inherit from another class?
    — Are you able to open a file, read the file and write to that file?
    — Can you build a simple project where you scrape a web page and store the information in a database?
    — Are you able to perform CRUD operations on the database?
  3. REST API
    — Are you able to read the response from the GitHub API?
    — Can you fetch my details using the GitHub API?
  4. Mathematics
    — Are you able to solve problems on matrices? Make a note of their properties as well.
    — Can you find the mean, median, mode, standard deviation and variance of a dataset?
    — Are you comfortable with Bayes’ theorem, and can you compute probabilities with it?
    — How comfortable are you with Bayesian networks and Markov networks?
    — Do you know the properties of geometrical figures like lines, triangles and circles, along with their equations?
  5. Git and GitHub
    — Are you able to make a PR and push changes to the repository from your local system?
    — Can you resolve merge conflicts?
    — How comfortable are you with the Git Flow Model?
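
For the Mathematics questions above, here is a small sketch of how you might check your hand calculations with Python's built-in `statistics` module, followed by Bayes' theorem written as a plain function. The sample values and the probabilities are invented for the example, and the function and parameter names are my own.

```python
import statistics


def summarize(values):
    """Return the basic descriptive statistics of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Population versions; use variance() and stdev() for a sample instead
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }


def bayes_posterior(prior, likelihood, false_positive_rate):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).

    prior               = P(A)
    likelihood          = P(B|A)
    false_positive_rate = P(B|not A)
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)  # P(B)
    return likelihood * prior / evidence


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample values
print(summarize(data))

# Made-up numbers: 1% prior, 90% true-positive rate, 5% false-positive rate
print(bayes_posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05))
```

With these made-up numbers the posterior comes out at roughly 0.15, a reminder that a positive result on a rare event is often less conclusive than it looks.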

Remember, you can always move ahead without this, but it is you who needs to learn. It’s you, and everything is about you. Even if you have already moved ahead, it’s never too late to get back on track.

What’s Next?

If you have come this far, you are already ahead of most people, and I am confident you will find what follows quite easy to understand and will soon be visualizing data.

Data Science is all about visualizing the data. Once you know your data, it will reveal everything.

I will be writing a series of blog posts on data science, covering the important concepts along with code snippets. Everything will be available under the MIT License, so you can reuse and redistribute it even without crediting me. The next post will be on data manipulation using pandas and NumPy.

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

NumPy is the fundamental package for scientific computing with Python. It contains among other things:
— a powerful N-dimensional array object
— sophisticated (broadcasting) functions
— tools for integrating C/C++ and Fortran code
— useful linear algebra, Fourier transform, and random number capabilities
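
To give a first feel for two items on this list, the sketch below uses broadcasting to subtract each column's mean from a small array, and `np.linalg.solve` to solve a pair of linear equations. It assumes NumPy is installed; the numbers and the helper function names are made up for illustration.

```python
import numpy as np


def center_columns(table):
    """Subtract each column's mean from every row using broadcasting."""
    column_means = table.mean(axis=0)  # shape (n_columns,)
    # (n_rows, n_columns) - (n_columns,): the means are stretched across the rows
    return table - column_means


def solve_linear_system(coefficients, constants):
    """Solve coefficients @ x = constants for x."""
    return np.linalg.solve(coefficients, constants)


# Hypothetical sample data
scores = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
print(center_columns(scores))

# The system 2x + y = 5 and x + 3y = 10, which has the solution x = 1, y = 3
print(solve_linear_system(np.array([[2.0, 1.0], [1.0, 3.0]]),
                          np.array([5.0, 10.0])))
```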

To remove the barriers to learning, we have set up an open-source community with no fees: everything is FREE. We open projects to everyone and you are guided by subject experts; we believe in showing the way rather than spoon-feeding. By joining the community you will build the following skills:

  • Teamwork and networking
  • Communication skills
  • Topic-wise expertise
  • Project reports instead of certificates
  • An enhanced developer profile, and much more

Join the community on Discord, Slack, Telegram, Twitter, Instagram and LinkedIn.

About the Author

Aditya Raman

I am Aditya Raman, a Software Engineer and Developer. I am also a Microsoft Student Partner, a CA at bitgrit Inc, Open Source Lead at GirlScript Bangalore, and part of the managing team at the uplift project.
I have mentored several projects under GSSoC ’20 and EOP-2 by GirlScript.

I am working on several live projects that make extensive use of Data Science.
To contribute to these projects, to have me collaborate on yours, or to connect with me, please follow the links.


Back End Developer | Software Engineer | DevOps Engineer | Data Science | Microsoft Learn Student Ambassador | Mentor MLH | Full Stack Developer