June 21, 2018
Suryanarayana Murthy

Business Analytics And Data Science Webinar

We shall talk about Business Analytics here. We will spend a little time on each of these slides to make sure that you understand every bit of it, and we can pause the training if there is any confusion. Nowadays we hear a lot of buzzwords like data mining, machine learning, artificial intelligence, robotics, etc. I have seen many companies talking about robotic process automation too. So where do they come from? Statistical analysis and robotic process automation are among the facets that companies boast of on their websites, along with other stuff. Let's not worry about what those different concepts are at this point in time. Let's first concentrate on what exactly exists in the market (also, consider checking out this perfect parcel of information for a data science degree).

 

Skills Required for Data Science Domain

 

Let's understand what exactly exists in the market, what the different areas of analytics, data visualization and data are in each of those businesses, and why they are important.

Let me give a small example. I worked in a very big retail organization called Target. I believe Target is the third or fourth-largest retail organization in the USA, and they recently commenced business in Canada through a company called Zellers, which they acquired. I shall try to give you my first-hand perspective on what analytics is, and this covers a lot of the ground on this slide. As the third or fourth biggest retail organization, Target has a humongous amount of data.

Now take the example of a retail organization in the 35th or 36th position, or a healthcare company like Sanofi, which is a pharmaceutical company. Sanofi is probably standing at the 50th or maybe the 40th position, yet they too have a humongous amount of information. I was quite surprised when I met some of the business teams at Target and asked if we could do some analytics, data discovery and data management. They were quite surprised to learn that their own products and business processes were being spoken about somewhere in Canada or somewhere in Jerusalem. They did not even know that their data existed there. For example, there is a Target-owned private brand called Archer Farms. This brand is spoken of in India, where Target never did any business, so it was quite surprising for them to learn that their data exists all over the world.

Nowadays social media is so disruptive that almost every single human being is aware of it. It would not be surprising if, ten years from now, even people aged 90 or 95, who are on the verge of completing a century, are on social media. Given that data and social media are so rife and disruptive, it is quite important for us to look at the concept of data discovery and management, that is, at how important a role data plays in the entire gamut of the business process. We have not even reached the concept of analytics yet; we are just talking about data. So it is quite important for us to know how to acquire data, which is called data acquisition, and how to clean the data.

You might have read in many books, journals, articles or news pieces that the majority of what we data scientists do, about 75 to 80 percent of it, is data management and discovery. Earlier it was just data management: people used to hit the internal databases and do a lot of data management there. But now social media and data have become so disruptive. With Web 2.0, you have more data outside the organization than inside it. So data management becomes very difficult, and it is no longer 75%; it is probably touching 90%. This is the amount of work that a data management consultant has to do.

Now, you don't need to be an analytics consultant to be a part of the data science 360 degrees that I'm talking about. If you have the skills to write code and perform data discovery, data acquisition, data extraction and data management, I think you are the right fit for this particular kind of job (also consider checking out this career guide for data science jobs).


Interactive Feedback Dashboard

 

We are talking about dashboards here. Data is so disruptive and is present everywhere, so it is quite important to combine all the data that you have. The data is loosely coupled, which means that people sitting in supply chain teams may not understand what is happening on social media.

For example, if someone is talking badly about your product in tweets, the supply chain team does not know about it, and they simply do not care what is being said about the product. They only care whether the product is delivered as per the order. So it is quite important that we have all the teams connected, and this is the reason why we are talking about 360 degrees, so that there is no breakage, data leakage, etc.

Everyone has to be aligned with the business process of the organization, and every single team is part of the entire game that I'm talking about. So we need the kind of dashboard that can connect all the dots in an organization. It should be interactive so that someone can quickly view what is happening.

For example, take an Amazon godown, a warehouse where all the stock is stored before it travels to another city. People want to know what is happening to a shipment: whether the truck is stuck somewhere, moving, or has been in an accident. So the dashboard has to be very interactive and informative, so that they can quickly take an intelligent decision.

Real-Time Analytics

 

We are talking about real-time analytics now. Say someone tweeted or wrote a Facebook post stating that, just a couple of minutes ago, they were ordering a crib for their baby and got an error message saying that the website is down. This could happen to someone who is ordering a product somewhere in Namibia and wants it to be delivered from the US.

Now, between the people sitting in the US and the ones sitting in Namibia, there is a time difference. There may be no real-time analytics algorithm in place that quickly gives insights to the backend team, stating that there is some problem with the website and the customer is cribbing about it. This is the real-time analytics that we are talking about.
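To make the idea concrete, here is a minimal sketch of such an alert in Python. It assumes posts arrive one at a time with a UTC timestamp, and it flags the backend team once a few complaints land within a short sliding window. The keyword list, the five-minute window, the threshold of three and the `ComplaintMonitor` class are all illustrative assumptions, not something from the slides.

```python
from collections import deque
from datetime import datetime, timedelta

# Hypothetical keywords that suggest a customer hit a website error.
COMPLAINT_KEYWORDS = ("website is down", "error message", "cannot order")


class ComplaintMonitor:
    """Signals an alert when enough complaint posts arrive within a time window."""

    def __init__(self, window=timedelta(minutes=5), threshold=3):
        self.window = window
        self.threshold = threshold
        self.recent = deque()  # timestamps of recent complaint posts

    def observe(self, timestamp, text):
        """Record one post; return True if the backend team should be alerted."""
        lowered = text.lower()
        if not any(keyword in lowered for keyword in COMPLAINT_KEYWORDS):
            return False
        self.recent.append(timestamp)
        # Drop complaints that have fallen out of the sliding window.
        while self.recent and timestamp - self.recent[0] > self.window:
            self.recent.popleft()
        return len(self.recent) >= self.threshold


monitor = ComplaintMonitor()
start = datetime(2018, 6, 21, 12, 0)  # UTC, so US/Namibia time zones do not matter
posts = [
    (start, "Just ordered a crib for my baby!"),
    (start + timedelta(minutes=1), "Got an error message while ordering a crib"),
    (start + timedelta(minutes=2), "The website is down again"),
    (start + timedelta(minutes=3), "Cannot order anything, website is down"),
]
alerts = [monitor.observe(ts, text) for ts, text in posts]
print(alerts)
```

Keeping every timestamp in UTC is one simple way to sidestep the time-difference problem mentioned above; a production system would more likely sit on a streaming platform than on an in-memory queue.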

Stakeholder 360

 

Now we have the concept of stakeholder 360. Since everyone is connected, the data should flow freely among all the teams of an organization. We are not only talking about data 360, which we covered under data discovery and management as the first green box there, but also about supplier 360, vendor 360 and customer 360. This is because your customer could be anywhere in the world, and they can talk good or bad about your product.

The data is disruptive, so you need to be proactive, reactive and interactive on a real-time basis so that you can understand what is really happening, and the connectivity is quite important. As a data scientist, I believe it is a primary role for us to be involved in that kind of analytics, which is very important and needs to be consistent (here's the perfect parcel of information to learn data science).

Consistent Insights

 

We are talking about consistent insights across the organization here. I will give you an example: there was a product that was losing sales, and it was reported by the sales team as they monitored the product. At Target, we set up a Center of Excellence. The idea behind it is that they are the ones who monitor the real-time analytics, gather data across the organization and then try to connect the different dots.

Now, one of the teams in the business discovered and monitored the product that was losing sales over a period of time, while another team was not worried at all because they did not receive the signals from the first team. If the insights are not consistent across the organization, it's a problem. So how do we make the insights run consistently across the organization? You identify who your primary and secondary consumers are, and you make sure that you deliver the data and insights to them in the best possible manner.

Data-Driven Decisions

 

Everything is data-driven. Earlier, people were not taking decisions based on data; they were just going with their gut feeling. They knew that sales were on a declining trend, so they said, let's do something else, like start a new product, because the old product is not driving sales. It's no longer like that. We are now talking about data-driven, informed decision-making, so everything depends on data. And what makes you able to depend on the data? It is churning the data to the maximum extent, so that's very important.

How Much Data is being Generated?

 

You know, data never sleeps. As we all know, data is disruptive: it is extracted, derived and generated very actively. I have brought statistics from a company that continuously monitors this, called Domo. You can check Wikipedia to see what kind of business Domo does. A humongous amount of data is generated, and you can see some of the statistics right here. Take LinkedIn, for example: it is a professional network, and almost every day you see your network growing in size. I would say it is no longer just a professional network, because people are also talking about their personal stuff there, and the amount of information being generated almost every second is humongous.

There are newer social media websites like Sify or Snapchat where people exchange digital content, and that content is one of the drivers that tells you how well your business is doing. If someone shares your business or product photos with a small caption below, and many comments flow in, that itself is an important data point for analyzing whether your business is doing well or not. That is why digital content websites like Snapchat, Instagram and Tumblr matter. It's not just people like you and me; many businesses are also subscribing to them to talk about their products and discover what is being said about them.

Twitter is one such example, where many companies talk about their deals, coupons, etc. I see this as a great source of information from which data scientists can extract and derive a lot of insights. Name any website here and, almost every minute of the day, I would even say every second, a humongous amount of information is generated. There is still one lacuna that I believe we data scientists could improve on over time, and that is the interactivity between different digital sources. Something generated in Snapchat may not talk to LinkedIn. I believe Facebook and YouTube are integrated, or maybe Twitter and YouTube are; I don't know much about that.

But if you talk about Skype, Snapchat or Netflix, which typically streams video content, they may not be related much to The Weather Channel. So these digital sources may not be integrated among themselves. The day is not far off when all these digital devices will be connected with each other, and that would complete the concept of device 360. We have not even completed device 360, and we have already started talking about IoT and all that stuff.

A machine starts generating data, and another machine receives the data, interprets it, and then does some kind of analytics, which is mostly applied to healthcare. But I believe it may take some more time for this device 360 to turn into a reality. I know there is something called machine-to-machine analytics coming, and there are some companies in India working heavily on it. That will take some more time, and at the end of the day, that itself will probably be complete analytics, I would say.

Big Data

 

 

We shall talk about Big Data now. As I conduct a lot of webinars, there is one question that many people ask me: how do you define Big Data? Is it just based on the volume of data being generated? Let me explain with an example. I have a laptop with 100 GB or maybe 1 terabyte (TB) of space. If there is data that is more than 1 TB, then it is Big Data for me, in the sense that it is too big for my laptop to accommodate. But that's not what Big Data is all about.

Earlier I used to talk about only 4 V's; now we are talking about 8 V's. So it's not just the volume of data flowing freely into your organization, but also the value that you generate from the data. For example, there may be a great many tweets being posted, and none of them is valuable to your organization. None of them talks about your business processes or how important it is to transform them, so they may not be important for you at all. Some organizations have a benchmark stating that if the value of the data is less than 60% or 70%, it is of no value and may not be Big Data at all.
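One way to picture such a benchmark is as the share of incoming tweets that mention your business at all. The sketch below is only an illustration under that assumption: the term list, the 60% cut-off and the `value_share` helper are hypothetical, and a real organization would use a far richer definition of value than keyword matching.

```python
# Hypothetical business terms; a real system would use a richer relevance model.
BUSINESS_TERMS = ("archer farms", "target")
VALUE_BENCHMARK = 0.6  # assumed 60% benchmark, for illustration only


def value_share(tweets, terms=BUSINESS_TERMS):
    """Return the fraction of tweets that mention at least one business term."""
    if not tweets:
        return 0.0
    relevant = sum(1 for t in tweets if any(term in t.lower() for term in terms))
    return relevant / len(tweets)


# Made-up sample of incoming tweets.
tweets = [
    "Loving the new Archer Farms coffee",
    "Traffic is terrible today",
    "Target checkout was quick this morning",
    "Watching the game tonight",
    "What a sunny day",
]
share = value_share(tweets)
is_valuable = share >= VALUE_BENCHMARK
print(f"{share:.0%} of tweets are relevant; valuable: {is_valuable}")
```

Here only two of the five sample tweets mention the business, so this batch falls below the assumed benchmark however large its volume might be.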

The concept that I'm talking about today is visualization. If you can make some sense out of the data, it may or may not be Big Data; there is also the variety that you might be talking about. If you remember, I talked about digital content. Earlier we were talking about text being transferred between different devices, and now we are talking about transferring digital content. Probably a few years from now, there will be devices that transform data into some kind of signal that can be transmitted. So we are talking about signals as well as textual data, images, maps, blogs, binary large objects and videos that could be transformed and transmitted across devices. If there is a platform that can manage this variety of information and process it in real time (remember the one we discussed in the first slide), and make the business realize that this kind of data exists, I believe that is Big Data.

So the data need not be so big that your machine cannot sustain it. The reason I say this is that we are now talking about cloud computing. The data is no longer hosted on your laptop or on your local desktop. It is hosted on a cloud platform, and it is the headache of companies like Microsoft or Amazon to manage that humongous amount of data. They have lots of complex data warehouses, hardware, software, etc. to manage your data and the data of numerous businesses. It's a big cloud platform.

We shall also talk about velocity now, where we look at how quickly the data is generated in real time and what outlook it gives you for today. We are not only talking about today; we are also talking about yesterday, and there is an opportunity for you to be proactive about what could happen in the next moment. For example, suppose there is an AC machine that has been making some sound for the last week and is not giving sufficient output; by output I mean the throughput. For some reason, it has started giving you more warmth than it is supposed to, so you can proactively predict that a few weeks from now this machine is going to die. That is just a gut feeling of yours. But exactly how much time it will take to die is something that your big data analytics can probably bring in.
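The AC example can be turned into a small calculation. The sketch below fits a straight line to a week of outlet temperature readings and extrapolates to the day the trend would reach a failure temperature. The readings, the 26 °C failure point and the helper names are made up for illustration, and a linear trend is a simplifying assumption; real predictive maintenance would use more signals and a better model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until(threshold, xs, ys):
    """Extrapolate the fitted trend to the day the reading reaches the threshold."""
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None  # no warming trend, so no predicted failure
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - xs[-1])


days = [0, 1, 2, 3, 4, 5, 6]
outlet_temp_c = [16.0, 16.5, 17.0, 17.5, 18.0, 18.5, 19.0]  # made-up readings
FAILURE_TEMP_C = 26.0  # assumed point at which the unit no longer cools

remaining = days_until(FAILURE_TEMP_C, days, outlet_temp_c)
print(f"Estimated days until failure: {remaining:.1f}")
```

With these made-up readings the air warms by half a degree per day, so the estimate comes out at two weeks. That replaces the gut feeling of "a few weeks" with a number you can act on.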

We will talk about viscosity: does the data really stick with you? That is, does it give you the value that you intended it to give you? There may be some data that you ran analytics on, but it did not give you sufficient insights for action. I will share one example a couple of slides down the line. Viscosity is all about whether the data can support sufficiently actionable decision-making or not.

Then there is virality: can your data travel from a source to a destination, and how quickly can it go? So it's not just velocity, that is, how quickly the data is generated internally in your database, but how far this particular data has travelled in real time from a particular source to a particular destination.

Types of Analytics

 

 

We use two metrics here: (i) the business value that each of these buckets will give you, and (ii) the difficulty. There are many other metrics, but we will talk only about value and difficulty. You can see this 45° line, which cuts through the X axis and Y axis exactly at the center. You can see four different areas, and we will call this the D2 and P2 mechanism, which stands for Descriptive, Diagnostic, Predictive and Prescriptive. You can probably find the same thing in a different visualization on Wikipedia, Google or any other search engine, but Gartner was the first to come up with this idea. How do I differentiate between all these? By the question each one asks. The first question is what really happened, and that is what descriptive analytics is all about.

Descriptive Analytics

 

Since you are on the data science team, your business comes to you and says, look, I have a problem; my sales are decreasing. They ask you what happened, because they have data but don't know how to analyze it. So it is you who is going to drill down and see what really happened. You may ask them for some structured data, build a basic dashboard with some pivot charts and basic statistics, and come up with some answers as to what really happened. For example, you can come up with a high-level analysis stating that some stores in Texas have lost sales. That's basic descriptive analysis: you don't know what exactly those stores in Texas are going through. You don't know whether the problem lies with the customers or the sales representatives, or whether the store in Texas is hard to reach, perhaps situated on a hill where people can't easily walk up and make a purchase, or whether that store is not available online. There could be any reason, but all you did at a very high level was say that the store in Texas, probably around the Bay Area, is losing sales. That's all you were able to tell.
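A descriptive pass of this kind can be sketched in a few lines of Python. The example below pivots made-up sales records by state and quarter and reports the quarter-over-quarter change, which is roughly what a basic pivot chart would show. The figures, states and field layout are illustrative assumptions only.

```python
from collections import defaultdict

# Made-up structured sales records: (state, quarter, sales in USD).
records = [
    ("Texas", "Q1", 120_000), ("Texas", "Q2", 90_000),
    ("Ohio", "Q1", 80_000), ("Ohio", "Q2", 84_000),
    ("Florida", "Q1", 100_000), ("Florida", "Q2", 101_000),
]

# Pivot: state -> quarter -> total sales.
pivot = defaultdict(lambda: defaultdict(int))
for state, quarter, sales in records:
    pivot[state][quarter] += sales

# Quarter-over-quarter change per state, as a fraction of Q1 sales.
change = {
    state: (q["Q2"] - q["Q1"]) / q["Q1"]
    for state, q in pivot.items()
}

# Print the states from worst to best performer.
for state, pct in sorted(change.items(), key=lambda item: item[1]):
    print(f"{state}: {pct:+.1%}")

losing = [state for state, pct in change.items() if pct < 0]
```

The output tells you that Texas lost sales, and nothing more. That is exactly the limit of descriptive analytics described above.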

Diagnostic Analytics

 

The next question is: why did it happen? This is what diagnostic analytics is all about, because you are trying to do some diagnosis. You double-click now, because you know that there is a store in some area that is losing sales, and you ask why that particular thing happened. You do a kind of comparison here: there is another store in the same Texas Bay Area that is not losing sales, but this one is. So you probably do some competitive analysis as to why it happened in one store and not the other. There are some graphs that we will discuss further, which will help you understand this better.
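As a rough illustration of that comparison, the sketch below lines up a losing store against a healthy store in the same area and ranks the metrics by how far the losing store falls short. The metric names and numbers are invented for the example, and `relative_gaps` is a hypothetical helper, not a standard method.

```python
# Made-up metrics for two stores in the same area.
losing_store = {"foot_traffic": 800, "satisfaction": 3.1, "online_orders": 0, "staff": 20}
healthy_store = {"foot_traffic": 1000, "satisfaction": 4.4, "online_orders": 150, "staff": 21}


def relative_gaps(store, benchmark):
    """Return each metric's shortfall relative to the benchmark store, largest first."""
    gaps = {
        metric: (benchmark[metric] - store[metric]) / benchmark[metric]
        for metric in benchmark
        if benchmark[metric]  # skip metrics where the benchmark is zero
    }
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


ranked = relative_gaps(losing_store, healthy_store)
for metric, gap in ranked:
    print(f"{metric}: {gap:.0%} below the healthy store")
```

The largest gaps, here the missing online orders and the lower customer satisfaction, are only candidate explanations to investigate. A gap on its own does not prove that it caused the drop in sales.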

Predictive Analytics

 

The third one is predictive analytics: what will happen if I just leave things as they are? This assumes you have done enough groundwork in the first two analyses, which are historical data analyses. From them you come to a conclusion about what will happen in the future if you do not take care of what happened in the past. That's what predictive analytics is. Say you were able to identify that some customers were dissatisfied in a particular store because the store manager was not good and used racist language. I hate to even repeat such words, and in the US you simply cannot use them. Many videos of this kind are going viral on Facebook these days. Walmart is one example that came into the picture, because there were some stores where store managers used racist words and, sorry to say, remarks about religion as well.

It was an unfortunate incident, but it was one of the reasons why sales were on a downtrend: people stopped going to those stores. So it is important that we not only look at what happened in the past but also uncover what would happen in the future if sufficient measures are not taken. If you don't know that a particular store behaved this way, you cannot stop it, and if you don't stop it, it could impact future sales. So you need to have some kind of predictive analysis. I just took one example to connect this, but there could be many dimensions to it.

Prescriptive Analytics
