Mohammed Sameeruddin

#1 - Getting started with Data Analytics

Clear idea to get started

Published on Mar 21, 2021

Table of contents

  • What is Data?
  • Data Analytics - Meaning
  • Tools / Languages
  • Packages
  • End

What is Data?

Data is a collection of facts and figures. It can relate to an object or a person. The understanding obtained by processing data is called information. Thus, data and information are two different things.

  • Data → Facts and Figures
  • Information → Processed data which is understood better
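
To make the distinction concrete, here is a minimal Python sketch. The temperature readings are made-up values used only for illustration, and `summarize` is a hypothetical helper name. The raw list stands in for data, and the summary computed from it stands in for information.

```python
from statistics import mean

# Hypothetical raw data: daily temperatures in degrees Celsius (made-up values).
temperatures = [21.5, 23.0, 19.5, 24.0, 22.0]


def summarize(readings):
    """Turn raw readings (data) into a small summary (information)."""
    return {
        "count": len(readings),
        "lowest": min(readings),
        "highest": max(readings),
        "average": mean(readings),
    }


summary = summarize(temperatures)
print(summary)
```

The list on its own is just figures. The summary is what a reader can act on, which is the sense of "information" used here.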

Credits of Cover Image - Photo by Hunter Harritt on Unsplash

Data Analytics - Meaning

From the two definitions above, the term "Data Analysis" becomes clearer. Data Analysis combines Statistics, Mathematics, and Computer Science to process raw data into insightful or valuable information. You might have this question - Statistics and Mathematics are fine, but why Computer Science? Knowledge of programming helps in analyzing data in several ways. Some of them are -

  • Process Automation
  • Handling Large Datasets
  • Querying Databases
  • Creating Models
  • Data Visualization
  • Dashboard Development
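
As a small illustration of two items from the list above (querying databases and process automation), the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name, column names, sales figures, and the `total_sales_by_region` helper are all hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical sales records (made-up values): (region, amount).
rows = [("North", 120.0), ("South", 80.0), ("North", 60.0), ("South", 40.0)]


def total_sales_by_region(records):
    """Load records into an in-memory SQLite table and total them per region."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
        query = (
            "SELECT region, SUM(amount) FROM sales "
            "GROUP BY region ORDER BY region"
        )
        # Each result row is a (region, total) pair, so dict() maps region -> total.
        return dict(conn.execute(query).fetchall())
    finally:
        conn.close()


totals = total_sales_by_region(rows)
print(totals)
```

Wrapping the query in a function is what makes the step repeatable: the same code can be rerun on new records without any manual work.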

Tools / Languages

Of course, we cannot analyze a given dataset with just a piece of paper and a pencil. We need a platform that lets us do all three - Stats, Math, and Programming.

Tools involved in Data Analysis -

  • Python
  • R
  • Julia
  • MATLAB

Note - Many other languages and tools are available, but here I talk about Python. If you want to see the full list, refer to this article.

Packages

To get started in the field of data, learning Python is beneficial in many ways. Python has a wide variety of packages that have been developed over the years. From data collection to data modeling, Python has everything set for you.

List of Packages or libraries:

  • NumPy
  • Pandas
  • Statsmodels
  • Matplotlib
  • OpenCV
  • Scikit-Learn
  • PyTorch
  • TensorFlow
  • Plotly
  • PySpark, etc.

These packages are used extensively for data-related problems. There is no requirement to learn all of them, as long as one is curious enough to understand the problem and implement the method. But the deeper one goes, the more a solid working knowledge of these packages becomes a must.

End

Well, that's all for now. This article is part of the series Exploratory Data Analysis, where I share tips and tutorials to help you get started. We will learn step-by-step how to explore and analyze data. As a prerequisite, knowing the fundamentals of programming would be helpful. The link to this series can be found here.