How Do Big MNCs Manage and Manipulate Bulk Amounts of Data with High Speed and Efficiency?

Raj Kumar Vishwakarma
4 min read · Sep 17, 2020


Have you ever wondered how top MNCs like Google, Microsoft, Amazon, Facebook, and Apple manage and manipulate the thousands of terabytes of data they receive or generate every day with high speed and efficiency?

In this era of social media, the number of users is increasing drastically day by day, and the amount of data moving across the internet grows every second. So how do companies handle this bulk of data with high speed and efficiency? Let's dive deep into it and try to find out the problem, its causes, and its solution.

According to some experts, the amount of data in the world was projected to be 50 times larger in 2020 than in 2011. As data grows this drastically over time, the challenge of handling it comes in.

Problem: Big Data

Big Data is the term used to describe a collection of data that is huge in volume and still growing.

1. Problem of Volume:

Big data implies enormous volumes of data. Data used to be created by employees. Now that data is generated by machines, networks, and human interaction on systems like social media, the volume of data to be analyzed is massive.

Let's understand this with a simple example. Suppose a company named XYZ has a total storage capacity of 1000 TB and receives 1100 TB of data to store. In this case, it is not possible to store all of it, so the company has to buy extra storage every time it needs more. That can become very costly, and it also raises a problem of speed, which we discuss next.

2. Problem of Velocity:

Velocity is the measure of how fast the data is coming in. Continuing the previous example, using one large storage unit to hold everything may lead to I/O problems, which means that read and write operations may take a long time.
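To make the I/O concern concrete, here is a rough back-of-the-envelope sketch in Python. The 1100 TB figure comes from the XYZ example; the 200 MB/s throughput per disk, the count of 1000 disks, and the helper name `read_time_hours` are assumptions chosen only for illustration.

```python
TB = 10**12  # bytes in a terabyte (decimal)
MB = 10**6   # bytes in a megabyte (decimal)

def read_time_hours(total_bytes, disks, throughput_bytes_per_s):
    """Hours needed to scan total_bytes when the data is split evenly
    across `disks` drives that are all read in parallel."""
    per_disk = total_bytes / disks
    return per_disk / throughput_bytes_per_s / 3600

data = 1100 * TB   # data volume from the XYZ example
speed = 200 * MB   # assumed sequential throughput of one disk

single = read_time_hours(data, 1, speed)
spread = read_time_hours(data, 1000, speed)

print(f"one storage unit:       {single:,.0f} hours")
print(f"1000 disks in parallel: {spread:.2f} hours")
```

Under these assumptions, a full pass over the data takes on the order of 1,500 hours through a single device but only about an hour and a half when the same data is spread over 1000 disks, which is the basic motivation for distributed storage.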

3. Problem of Variety:

Variety refers to the many sources and types of data both structured and unstructured. We used to store data from sources like spreadsheets and databases. Now data comes in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc.

How do MNCs handle this problem called Big Data?

Let's look at what Google and Facebook do to handle such huge chunks of data:

Google

Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide. Google doesn't hold the biggest data centers, but it still handles a huge amount of data.
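As a quick sanity check, the per-day and per-year figures follow from the per-second rate. The short Python sketch below uses exactly 40,000 queries per second as its input, so its results land slightly under the rounded "over 3.5 billion" figure, which corresponds to a rate a little above 40,000.

```python
QUERIES_PER_SECOND = 40_000        # rate quoted in the text
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 seconds

per_day = QUERIES_PER_SECOND * SECONDS_PER_DAY
per_year = per_day * 365

print(f"{per_day:,} searches per day")    # about 3.46 billion
print(f"{per_year:,} searches per year")  # about 1.26 trillion
```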

Google currently processes over 20 petabytes of data per day through an average of 100,000 MapReduce jobs spread across its massive computing clusters.
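For readers who have not seen MapReduce before, the idea can be sketched in a few lines of Python. This is a single-machine toy that counts words, not Google's implementation; the function names and the sample input are made up for illustration. In a real cluster, the map and reduce steps run in parallel on many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)

def map_reduce(documents):
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

# Hypothetical input: three tiny "documents"
queries = ["big data", "big query", "data center"]
counts = map_reduce(queries)
print(counts)
```

Because each document is mapped independently and each key is reduced independently, the work can be split across as many machines as are available.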

User requests are processed by Google's application servers. The application server looks up results in GFS (Google File System) and records the search queries in a logs cluster for quality testing. Google uses Dremel, a query execution engine, to run near-real-time, ad hoc queries, which is an advantage that MapReduce does not offer. Google also launched BigQuery, which runs aggregation queries over billion-row tables in a matter of seconds. Google is really advanced in its implementation of big data technologies.
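To show what an aggregation query looks like, here is a small sketch that uses Python's built-in `sqlite3` module. It is not Dremel or BigQuery, and it runs on five rows rather than billions; the `search_log` table and its columns are hypothetical. The point is only the shape of the query: group the rows, then compute a summary per group.

```python
import sqlite3

# In-memory database standing in for a (much larger) query log table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_log (country TEXT, latency_ms INTEGER)")
rows = [("IN", 120), ("IN", 80), ("US", 100), ("US", 140), ("US", 90)]
conn.executemany("INSERT INTO search_log VALUES (?, ?)", rows)

# Aggregation: number of queries and average latency per country
result = conn.execute(
    "SELECT country, COUNT(*) AS queries, AVG(latency_ms) AS avg_latency "
    "FROM search_log GROUP BY country ORDER BY country"
).fetchall()

for country, queries, avg_latency in result:
    print(country, queries, avg_latency)

conn.close()
```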

Facebook

According to an official Facebook release, Facebook takes in 500+ terabytes of data per day. To process such large chunks of data, Facebook uses Hive for parallel map-reduce operations and Hadoop for its data storage. Facebook's Hadoop cluster is reportedly the largest in the world. Facebook also uses Cassandra, a fault-tolerant, distributed storage system designed to manage large amounts of structured data across many commodity servers, and Scuba to carry out real-time, ad hoc analysis on massive data sets. Hive also serves as the data warehouse for large data sets. Prism is used to create and manage multiple namespaces instead of the single namespace that Hadoop manages.
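One common way to spread data across many commodity servers while tolerating failures is consistent hashing with replication. The Python sketch below is a generic, simplified illustration of that idea and not Cassandra's actual partitioner or replication strategy; the `HashRing` class, the server names, and the replica count of 2 are assumptions for the example.

```python
import bisect
import hashlib

def _hash(value):
    """Map a string to a position on the ring (a large integer)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        # Each node sits at one position on the ring, sorted by position
        self.ring = sorted((_hash(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.ring]

    def nodes_for(self, key):
        """Return the nodes that store `key`: the first node clockwise
        from the key's position, plus the next ones as replicas."""
        start = bisect.bisect_right(self.positions, _hash(key))
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

# Hypothetical cluster of four commodity servers
ring = HashRing(["server-a", "server-b", "server-c", "server-d"], replicas=2)
for user in ["user:1", "user:2", "user:3"]:
    print(user, "->", ring.nodes_for(user))
```

Every key is stored on two different servers, so losing one server does not lose the data, and any machine can work out where a key lives without asking a central coordinator.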

Sources:

Mr. Vimal Daga
