Making sense of statistics on Covid-19 cases
Since the Covid-19 pandemic began, many numbers, or to be precise, statistics, have been thrown about, regularly published and referred to. This is fine so long as we know how to interpret them and use them wisely. But are we really doing that?
In my consultancy work, I have dealt with lots of information and statistics.
Collecting passenger traffic data, analysing passenger profiles and developing behavioural models across different modes of transport are among the common tasks. Investigating freight volumes or cargo throughput for import and export is another regular major task.
Some of the recorded data and its subsequent analysis are vital in decision-making for many operating companies or businesses. Raw data or a database is of little use without some form of analysis.
As the American humorist Mark Twain wrote: “There are three kinds of lies: lies, damned lies and statistics.”
Covid-19 data
There are good and important reasons for data to be collected.
A key element in dealing with raw data is the method of analysis, which produces meaningful statistics. Those results are crucial in drawing conclusions and in the decision-making that follows.
In the case of Covid-19 data, we are presented with the number of Covid-19 positive cases on a daily basis. We have to assume that the testing methods are correct and take the positive cases at face value.
The main question that we should be asking then is this: What is the total number of people being tested daily? Is this number recorded each day before a portion is classified as positive?
For instance, if there are 3,000 positive cases, what is the total number that tested negative? Is it 27,000 people, which means 30,000 people were tested in total?
Or is it 3,000 cases against 50,000 people or 3,000 positives from a total of 100,000 people? It is important to know the number of people tested and the breakdown between positive and negative cases.
Table A: A daily data set presented in a declining percentage format.
In Table A, without the second column (the number of people tested), the positive figures (third column) recorded over three days give a misleading impression of seriousness and should not be presented on their own.
Even though the daily positive number is increasing, the rate of infection is actually declining (fourth column, by percentage). The figures could also be presented as a ratio, shown in the last column, which is also declining.
A total of 3,000 positive cases measured against 30,000 people tested gives a positivity rate of 10%, whereas against 50,000 people tested it is only 6%.
If 3,000 positive cases come from 100,000 people tested, the rate drops further to only 3%. Different implications could be drawn from such a statistical analysis.
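The arithmetic above can be checked with a short sketch. The function name `positivity_rate` is an illustrative assumption; the figures are the ones used in the examples above, not official data.

```python
def positivity_rate(positives: int, tested: int) -> float:
    """Return positive cases as a percentage of the people tested."""
    if tested <= 0:
        raise ValueError("tested must be greater than zero")
    return 100.0 * positives / tested


positives = 3_000
# The same positive count measured against three different testing sizes.
rates = {
    tested: positivity_rate(positives, tested)
    for tested in (30_000, 50_000, 100_000)
}

for tested, rate in rates.items():
    print(f"{positives:,} positives out of {tested:,} tested: {rate:.0f}%")
```

The same headline figure of 3,000 cases corresponds to very different rates, which is why the number of people tested needs to be published alongside it.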
Increasing or declining?
The daily figures, 3,000 to 5,000 in this sample, have to be measured against the total number of people tested on that day. Only then would they represent meaningful data.
In our daily published data, the number of people tested is not revealed. This is a major flaw in the presentation of the statistics.
Let us take a look at another hypothetical table, Table B, below.
Table B: A daily data set presented in whole numbers.
The illustration indicates that the number of people who tested positive increased rapidly from 3,000 to 10,000 within three days.
On their own, these numbers are alarming. But when measured against the total number of people tested, as a percentage, they are not so shocking. In fact, the rate is fairly constant at about 10% for three days in a row.
When the data on the total number of people being tested is revealed, a different story appears. Perhaps it is best for this sort of data transparency to be adopted, if such data is available.
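To make the point concrete, here is a minimal sketch of the calculation. The daily figures are hypothetical, chosen only to match the description of positives rising from 3,000 to 10,000 while the rate stays at about 10%; they are not official numbers.

```python
# Hypothetical daily figures (assumed for illustration, not official data):
# (day, people tested, positive cases)
daily = [
    ("Day 1", 30_000, 3_000),
    ("Day 2", 60_000, 6_000),
    ("Day 3", 100_000, 10_000),
]

rates = []
for day, tested, positive in daily:
    # Positive cases as a percentage of everyone tested that day.
    rate = 100.0 * positive / tested
    rates.append(rate)
    print(f"{day}: {positive:>6,} positive of {tested:>7,} tested = {rate:.1f}%")
```

In this assumed data set the positive count more than triples, yet the percentage does not move, because testing expanded at the same pace.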
Biased data
It has also been mentioned that our authorities do not undertake random checks and testing. Therefore, a true picture based on scientific analysis of a random sample cannot be obtained here.
Perhaps also, random sampling is too expensive for the country to bear.
Furthermore, it has been reported that those who tested positive were people suspected of exposure to certain clusters. They were therefore tested purely on the basis of contact or suspicion due to their association with those clusters.
In that sort of scenario, positive cases are expected to be high. In statistical analysis, this is referred to as biased data, or more specifically, selection bias.
In fact, the whole process of testing and the sharing of that data, captured in an identified zone, area or district, over a given sample is very biased indeed.
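A rough sketch shows how much the choice of who gets tested can move the result. Every figure below is hypothetical and exists only to illustrate the bias; none of it comes from published data.

```python
# All figures are hypothetical, chosen only to illustrate selection bias.
population = 1_000_000      # residents of an imagined district
cluster_contacts = 20_000   # residents linked to a known cluster
infected_contacts = 4_000   # infected people among those contacts
infected_others = 9_800     # infected people among everyone else

# Targeted testing: only cluster contacts are tested.
targeted_rate = 100.0 * infected_contacts / cluster_contacts

# Random sampling: every resident is equally likely to be tested, so the
# expected positivity equals the infection rate of the whole population.
random_rate = 100.0 * (infected_contacts + infected_others) / population

print(f"Targeted testing of contacts: {targeted_rate:.2f}% positive")
print(f"Expected under random sampling: {random_rate:.2f}% positive")
```

Under these assumptions, testing only cluster contacts reports a rate many times higher than the district as a whole, so the targeted figure says little about people outside the cluster.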
In this sort of scenario, perhaps it would have been better for the authorities to take immediate steps or action in that particular location, without getting other districts involved.
This is exactly a point of contention in the management and administration of the pandemic thus far.
The inability to contain a small area has led to a spread of the virus into a much larger area, a manifestation of the authorities’ level of competence in dealing with data and fast decision-making.
Another glaring issue resulting from a lack of understanding of the published data is how it is interpreted once it is published or announced each day.
Data from different areas or districts is lumped together into state-based data; thus we have Selangor, for example, reporting over 1,000 cases daily. Data obtained from a particular local area (and its sample population) is not representative of the whole country, especially when extrapolated to areas that are not experiencing any cluster at all.
There is a definite flaw in using such data to represent a district, state or geographical location that does not share the same basic characteristics as a cluster zone.
That data should only be used to measure a rate of infection for that particular location at that particular period of time.
It must not be used as a yardstick for other areas, even for areas that are adjacent to that particular site or location. Using it as the main sample for a whole district or state is definitely biased.
To put it simply, Klang district data, for instance, should not be used to represent Selangor. Similarly, Selangor’s (or Perlis’) set of data is not representative of Malaysia’s data.
Vaccines
Issues with vaccines have been circulating all over the world. The first one is the issue of vaccinated people who can still contract Covid-19 and pass it on to others.
The second issue concerns the number of people dying or suffering from the adverse effects of vaccines. There are many reports on this issue which must not be ignored by our health authorities.
In fact, in the US, there is a Senate proposal to re-examine and reinvestigate events surrounding Covid-19 and vaccination issues. Similarly, debates on these issues are ongoing in the UK, France, Germany, Denmark and a few other democratic countries.
It would therefore be useful for the authorities to catalogue this new data and share it with the public, similar to the data set on Covid-19. There have been many cases of people suffering adverse reactions and possibly dying due to them.
The authorities must come clean on this issue and provide the relevant data as and when it happens. This is one way to counter negative perceptions of vaccination. The number could be small but it remains useful nevertheless.
To quote Aaron Levenstein, a business professor from the US: “Statistics are like a bikini. What they reveal is interesting, but what they hide is vital.” - FMT
The views expressed are those of the writer and do not necessarily reflect those of MMKtT.
✍ Credit given to the original owner of this post : ☕ Malaysians Must Know the TRUTH