Quantitative Content Analysis (QCA) is a research method in which characteristics of textual, visual, or auditory material are systematically classified and recorded for analysis. Widely used in the field of communication, it is also useful in other fields. Central to content analysis is the coding process, which consists of following a series of instructions about what features to look for in a text and then making the designated annotation when that feature appears.
To carry out content analysis successfully, it is necessary to pay close attention to unitization (segmenting texts into units for analysis), sampling (selecting an appropriate collection of units for analysis), reliability (ensuring that different researchers apply the codes consistently), and validity (using a coding scheme that adequately represents the phenomena of interest).
Background
Content analysis stems from the work of theorist Alfred Lindesmith, who in 1931 devised a means of refuting hypotheses that became known as the “Constant Comparative Method of Qualitative Analysis.” Quantitative content analysis built on these qualitative research tools and applied more rigorous statistical and scientific techniques. Klaus Krippendorff formulated six questions, based on Lindesmith’s work, that should be considered in any content analysis:
What data is analyzed?
How are they defined?
What is the population from which they are drawn?
What is the context in which the data is analyzed?
What are the limits of the analysis?
What is the purpose of inferences?
In analyzing these data, it is assumed that the words and phrases mentioned most frequently reflect important concerns in the communication. Therefore, quantitative content analysis begins with word frequencies, space measures (column inches in the case of newspapers), time counts (for radio and television time), and keyword frequencies. However, content analysis goes far beyond simple word counts. For example, with “Keyword In Context” (KWIC) routines, words can be analyzed in their specific context to disambiguate them.
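As a minimal illustration of these starting points, the following Python sketch computes word frequencies and a simple Keyword-In-Context listing; the sample text and the keyword “bank” are invented for the example and are not drawn from any real corpus.

```python
# A sketch of word-frequency counting and a Keyword-In-Context (KWIC) listing,
# using only the Python standard library. The text and keyword are invented.
import re
from collections import Counter

text = ("The bank raised interest rates. Protesters gathered on the river bank. "
        "The bank denied the claim.")

tokens = re.findall(r"[a-z']+", text.lower())

# Word-frequency table: the usual starting point of quantitative content analysis.
frequencies = Counter(tokens)
print(frequencies.most_common(5))

def kwic(tokens, keyword, window=3):
    """Return each occurrence of `keyword` with `window` words of context per side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append(f"{left} [{keyword}] {right}")
    return hits

# The surrounding context helps disambiguate "bank" (financial vs. river bank).
for line in kwic(tokens, "bank"):
    print(line)
```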
These forms of quantitative analysis have been used to study social media, corporate communications, website visits, elections, etc. With the exponential increase in both the amount of data available and the capabilities of computers, quantitative research is used in an increasing number of fields.
Methodology
Quantitative analysis requires formal properties such as word frequencies, space measurements, time counts, hashtags, the number of people tagged in an image, the number of friends, or “liked” pages. The objects of analysis can range from traditional textual content (messages, bibliometrics, citation analysis/indexing, web pages, trending topics on Twitter) to any media object with specified formal properties or metadata (video, photographs, telephone conversations). As a result, at least three important distinctions emerge with respect to qualitative content analysis.
First, unlike qualitative analysis, quantitative (computer-based and automated) analysis is better suited to closed investigations and often yields automatically derived (emergent) categories rather than manually assigned ones, which also makes this type of analysis useful for deriving probabilistic predictions about the future.
Second, by focusing only on formal properties, quantitative content analysis is usually applied to manifest content (literal content) and not to its latent meaning (implicit content).
Finally, a distinction is made between prescriptive analysis (which has a set of narrowly defined, specific parameters) and open analysis (which can be applied to many aspects of the texts and content and in which the dominant messages are identified during the analysis). Furthermore, since the researcher often needs instruments to measure and count (for example, a computer), the reliability (repeated investigations will yield the same results) and validity (the instrument measures what it is supposed to measure) of the devices and techniques (for example, the software used) should always be the subject of reflection as part of the investigation.
Components of Quantitative Content Analysis
In his seminal work Content Analysis: An Introduction to Its Methodology, Krippendorff (2004) presents an outline of the components of content analysis and identifies its main principles. Performing quantitative content analysis involves design, unitization, sampling, recording and coding (including the data language), reduction, inference, and narration.
Design
As a starting point, the researcher should design the analysis based on existing theoretical frameworks and experiences relevant to the research question. This essential phase aims to plan each step of the process to produce a robust answer to the initial research question. This phase usually involves the development of hypotheses to which the results of the analysis can be tested or related. The design and the research question will then guide the other six components of the actual content analysis.
Unitization
Once the researcher has mapped out each step of the process, the data needed for analysis can be created (e.g., collected, captured, or produced). In general, units are the wholes that analysts distinguish and treat as independent elements. Unitization frames the object of analysis so that discrete elements can be processed.
Although text itself is not quantitative, its characters, lines, words, or pages can be counted, measured, compared, and visualized, and there is a wide variety of ways to define the corresponding units. Different types of units are used for sampling, for recording and coding, and for providing context, and defining them carefully increases the productivity, efficiency, and reliability of content analysis.
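One possible way to make unitization concrete is sketched below in Python: a short text is segmented into sentences (treated here as recording units) and words (treated as smaller coding units), and simple counts are produced. The sample text and the chosen unit definitions are illustrative assumptions.

```python
# A sketch of unitization: segmenting a text into countable units so that
# discrete elements can be processed. Text and unit definitions are invented.
import re

text = ("Palliative care was discussed. The patient asked about hospice. "
        "No explicit terms were used.")

# Sentences serve here as recording units, words as smaller coding units.
sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
words = re.findall(r"\w+", text)

print("characters:", len(text))
print("words:", len(words))
print("sentences:", len(sentences))
for i, sentence in enumerate(sentences, 1):
    print(f"unit {i}: {sentence}")
```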
Sampling
Sampling reduces the set of available analysis objects to a manageable and representative corpus. As such, it is as much about determining the sample size as about finding suitable sampling techniques. Different objects of analysis may require different sampling techniques (for example, text-based objects such as web pages or transcripts, or visual objects such as photographs or films). Statistical sampling theory offers so-called probability sampling techniques, which are designed to ensure that every sampling unit has a known chance of being included in the sample.
The techniques Krippendorff identifies as applicable to texts are random sampling, systematic sampling, stratified sampling, variable probability sampling, cluster sampling, snowball sampling, relevance sampling, census sampling, and convenience sampling. The last of these is at odds with some key features of statistical sampling theory, because it is motivated by the available corpus even when that corpus is known to be incomplete. This issue is often raised in relation to contemporary analyses of Big Data.
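Two of the probability sampling techniques listed above, systematic and stratified sampling, could be implemented along the following lines; the corpus, the strata, and the sample sizes are hypothetical.

```python
# A sketch of systematic and stratified sampling applied to a hypothetical
# corpus of documents. The corpus, strata, and sample sizes are invented.
import random

random.seed(42)  # make the sample reproducible for reliability checks

corpus = [{"id": i, "outlet": random.choice(["newspaper", "blog", "broadcast"])}
          for i in range(1, 201)]

# Systematic sampling: take every k-th unit after a random start.
k = 20
start = random.randrange(k)
systematic_sample = corpus[start::k]

# Stratified sampling: draw the same number of units from each stratum (outlet type).
per_stratum = 5
stratified_sample = []
for outlet in {"newspaper", "blog", "broadcast"}:
    stratum = [doc for doc in corpus if doc["outlet"] == outlet]
    stratified_sample.extend(random.sample(stratum, per_stratum))

print(len(systematic_sample), "systematically sampled documents")
print(len(stratified_sample), "stratified-sample documents")
```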
Recording/Coding
Recording and coding are procedures that aim to capture the object of study in such a way that patterns can be searched for in it repeatedly. The research should be recorded in a way that is durable and capable of withstanding recurrent examination. Recording or coding thus fixes the material in a stable form, so that other investigators can reliably execute the same process and arrive at the same results.
Four types of recording instructions are recommended: the qualifications coders should have; the training coders should receive to prepare for the recording task; the syntax and semantics of the data language, preferably including the cognitive procedures that coders must apply to record texts and images efficiently and reliably; and, finally, the nature and administration of the records to be produced.
In addition, it is important to facilitate the interpretation of the research results by ensuring access to their meanings. Data languages play an important role in this component of the analysis. Data languages are descriptive devices: for content analysts starting from textual material, images, verbal exchanges, broadcasts, and records of observed phenomena, a data language describes how all the categories, variables, annotations, formal transcripts, and computer-readable records are put together to form a system.
As descriptive devices, they treat different types of variables differently, such as binary variables, categorical variables, and ordinal, interval, and ratio metrics.
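One way to picture such a data language is as a small codebook that declares the measurement level of each variable. The sketch below is a hypothetical Python example; the variable names and categories are invented for illustration and echo the palliative-care example discussed later in this article.

```python
# A sketch of a "data language" expressed as a codebook: each variable declares
# its measurement level so that later analysis treats it correctly. The
# variable names and categories are hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class Variable:
    name: str
    level: str          # "binary", "categorical", "ordinal", "interval", or "ratio"
    values: tuple = ()  # permitted codes, where applicable

codebook = [
    Variable("explicit_term_used", "binary", (0, 1)),
    Variable("speaker_role", "categorical", ("clinician", "patient", "family")),
    Variable("tone", "ordinal", ("negative", "neutral", "positive")),
    Variable("event_duration_minutes", "ratio"),
]

# One coded record per communication event, following the codebook.
record = {"explicit_term_used": 1, "speaker_role": "clinician",
          "tone": "neutral", "event_duration_minutes": 23.5}

for var in codebook:
    print(f"{var.name} ({var.level}): {record[var.name]}")
```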
Reduction
To make the meaning of large quantitative content analyses cognitively accessible, reduction techniques are often needed. These are usually computational or automated techniques for summarizing the body of recorded text, together with justifications for those techniques in relation to what is known about the context of the texts. For example, statistical visualizations are often used as a simplifying device to create such a summary (for example, to show the correlation between two variables). Typically, the statistics are represented as relational tables. These tables can be viewed in many different ways, allowing different perspectives on the same set of data (in this sense, the step is reductive).
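As an illustration of such a reduction, the following sketch builds a simple relational (cross-tabulation) table from hypothetical coded records, counting how often each speaker role used an explicit term versus a euphemism.

```python
# A sketch of a reduction step: summarising coded records into a relational
# (cross-tabulation) table. The records are hypothetical codings of speaker
# role against use of an explicit term versus a euphemism.
from collections import Counter

records = [
    ("clinician", "euphemism"), ("clinician", "explicit"),
    ("patient", "euphemism"), ("family", "euphemism"),
    ("clinician", "explicit"), ("patient", "euphemism"),
]

# Count each (row, column) combination to fill the table cells.
cells = Counter(records)
rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})

print(f"{'':<12}" + "".join(f"{c:>12}" for c in cols))
for r in rows:
    print(f"{r:<12}" + "".join(f"{cells[(r, c)]:>12}" for c in cols))
```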
Inference
This component relies on analytical constructs or models of the chosen context to abductively infer contextual phenomena; that is, to draw conclusions about specific phenomena with only statistical or probable certainty. For example, by extrapolating the trajectory of a specific variable over a given period, the future evolution of that variable can be projected with a stated degree of statistical confidence. Other inferences may require relating the results of the analysis to other data sets; for example, a comparative analysis could be made between different sets of content. In such cases, content analysis becomes part of a larger research effort.
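A minimal sketch of this kind of inference is an ordinary least-squares trend fitted to yearly keyword counts and extrapolated one year ahead; the counts below are hypothetical, and a real analysis would also report the uncertainty of the projection.

```python
# A sketch of statistical inference from coded counts: fitting a linear trend
# to the yearly frequency of a keyword and extrapolating one step ahead.
# The counts are hypothetical, for illustration only.
years = [2016, 2017, 2018, 2019, 2020]
counts = [12, 15, 19, 22, 27]   # observed keyword frequency per year

# Ordinary least-squares slope and intercept, computed by hand.
n = len(years)
mean_x = sum(years) / n
mean_y = sum(counts) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
         / sum((x - mean_x) ** 2 for x in years))
intercept = mean_y - slope * mean_x

forecast_year = 2021
forecast = intercept + slope * forecast_year
print(f"Estimated frequency in {forecast_year}: {forecast:.1f}")
```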
Narration
As the final component of the quantitative content analysis process, narration involves answering the initial research question that guided the investigation. The narrative builds on the narrative traditions or discursive conventions established within the content analyst’s discipline, so that the results are understandable and accessible to others. This may involve arguing for the practical importance of the results, reflecting on the appropriateness of the methods used, or making recommendations for future research. Contemporary quantitative content analyses sometimes omit this component, for example when the results are published directly in a visual format such as a news graphic.
Purpose
Five main purposes are suggested for quantitative content analysis: to describe the background characteristics of the message content, to describe the form characteristics of the message content, to make inferences to the producers of the content, to make inferences to the audiences of the content, and, finally, to predict the effects of the content on audiences.
In this regard, fifteen different uses have been grouped into three categories: first, to make inferences about the background of the communications, second, to describe and make inferences about the characteristics of the communications, and third, to make inferences about the consequences of the communications.
Example of Quantitative Content Analysis
In our society, terms like die, dying, and death are still taboo; people speak instead of passing away, going to a better place, and so on. This avoidance of explicit terms can hinder effective communication between physicians and patients.
Researcher Z wants to know how often healthcare professionals, patients, or family members use explicit terms versus euphemisms. In the same way, she intends to determine:
In what circumstances are these explicit terms used?
How are the terms die, dying, and death used when discussing palliative care, and what alternative terms are used instead?
Researcher Z develops a sampling plan to maximize the diversity of the sample around demographic characteristics.
Two types of communication events were sampled with patients who had received a terminal diagnosis:
(A) One was discharge teaching for inpatients who were to be transferred to hospice at home.
(B) The other communication event was doctor-patient-family conferences in outpatient or inpatient settings.
The data was extracted from the recordings:
Data analysis began with a computer-assisted search for the terms “die”, “dying”, and “death” in the transcripts. The frequency of each of the three death-related terms in a transcript was calculated and compared to the total duration of the communicative event.
Subsequently, alternative terms or expressions used in place of die, dying, or death were identified.
Finally, frequencies of euphemisms versus direct terms were compared for speaker type, clinician demographics, and patient demographics within each act of communication and across the sample.
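A simplified sketch of the counting step in Researcher Z’s analysis might look as follows; the transcript excerpt, the euphemism list, and the event duration are hypothetical, and a real study would work from full transcripts and a validated term list.

```python
# A sketch of the counting step: explicit death-related terms versus euphemisms
# in a transcript, normalised by the duration of the communication event.
# The transcript excerpt, euphemism list, and duration are hypothetical.
import re

transcript = ("We need to talk about what happens when he passes away. "
              "If he dies at home, hospice will support you. "
              "Many families prefer to say he is going to a better place.")
duration_minutes = 30

explicit_terms = {"die", "dies", "dying", "death"}
euphemisms = {"passes away", "passed away", "going to a better place"}

words = re.findall(r"[a-z']+", transcript.lower())
explicit_count = sum(1 for w in words if w in explicit_terms)
euphemism_count = sum(transcript.lower().count(phrase) for phrase in euphemisms)

# Rates per 10 minutes make transcripts of different lengths comparable.
print("explicit per 10 min:", round(explicit_count / duration_minutes * 10, 2))
print("euphemisms per 10 min:", round(euphemism_count / duration_minutes * 10, 2))
```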
Tools Used in Quantitative Content Analysis
In quantitative research, it is common to use graphs, tables, charts, and other non-textual elements to help the reader understand the data. There are many different tools and software packages used to organize raw data and help show correlations between variables in the data. Statistics are usually represented in the form of relational tables. These tables can be displayed in many different ways (graphs and charts), allowing you to get different perspectives on the same set of data.
The following computer programs make it possible to organize and examine quantitative data using statistical methods. The software discussed below is commonly used to manage raw quantitative data and to analyze survey results, historical data, and content-analysis codings. It can also be used for forecasting or for estimating the probability of a particular event.
Stata
Stata is a widely used statistical software package for managing, analyzing, and graphing data. It runs on a variety of platforms, including Windows, Mac, and Unix. Stata can be used both through a graphical user interface (menus and dialogs) and through command-line syntax for more powerful and complex operations. With Stata you can generate graphics that can be exported to EPS or TIFF for publication, to PNG for the web, or to PDF for viewing. By using a script, graphs can also be produced automatically and reproducibly.
SPSS
SPSS is one of the most popular quantitative analysis programs, especially among social science researchers. With SPSS you can perform many data management and statistical analysis tasks. SPSS can take data from almost any type of file and use it to generate tabular reports, charts, distribution plots, and descriptive statistics, or to perform complex statistical analyses. SPSS is relatively easy to use and allows the incorporation of additional modules. For this reason, it is widely used by market researchers, health researchers, survey companies, and academic researchers.
SPSS offers a user interface that makes statistical analysis more intuitive for users at all levels. Simple menus and dialog-box selections make it possible to perform complex analyses without writing command syntax, while the syntax remains available for more specialized statistical procedures. IBM SPSS is available for Windows, Macintosh, and UNIX systems.
SAS
SAS Analytics provides an integrated environment for quantitative analysis, meaning that the software is built from different modules that can be added according to the user’s needs. The software supports predictive analytics, data mining, text mining, forecasting, and many different graphical visualizations. SAS helps organize raw quantitative data and offers a wide range of techniques and processes for data collection, classification, and analysis. The software is driven mainly by a command syntax with built-in procedures for the most common tasks; an additional module (SAS/ASSIST) can be installed for a task-oriented visual interface.
SAS is available for Windows and Linux, but they only offer licenses to students through universities that have already purchased SAS.
Tableau Desktop
Tableau Desktop is a statistical analysis tool with a strong emphasis on visual presentation. It allows you to interact with data through an easy-to-use drag-and-drop interface. You can connect to data in a few clicks and then visualize it by selecting and adjusting one of the preset interactive dashboards. The full version of Tableau Desktop allows you to work directly from a database; you can manage your own data connections and metadata without disturbing the original database. This makes the software easy to use, although it can be difficult or even impossible to customize analyses exactly the way you want.
The desktop tool is only available for Windows, but Tableau also offers an online environment with the basic features that works in all modern Internet browsers (and thus also on Windows, Mac, and Unix systems).
R
R is an open-source GNU project for statistical computing and graphics. It makes data manipulation, calculation, and graphical display straightforward through a relatively easy-to-learn command syntax. Because R uses its own open-source command syntax, it can be adapted to virtually any kind of quantitative analysis. However, the lack of a built-in graphical user interface makes R difficult for novice users.
Disadvantages of Quantitative Content Analysis
Validity and Reliability
Most criticism of quantitative content analysis focuses on the validity and reliability of the method. Counting specific keywords is itself highly reliable and consistent: once the keywords have been counted, any purely quantitative statements drawn from those counts are easy to trace and reproduce.
The problem arises when meaning is inferred from the quantitative results. Themes may be under-represented in a specific keyword selection, and sentiments tied to keywords may be overlooked. This compromises the validity of the method.
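Reliability is commonly checked by having two coders code the same units independently and comparing their codes. The sketch below computes raw percent agreement and Cohen’s kappa (a chance-corrected agreement measure) for hypothetical binary codings; the source does not prescribe a specific coefficient, so this is one illustrative choice.

```python
# A sketch of a reliability check: two coders independently code the same units
# (1 = explicit term, 0 = euphemism); we compute raw percent agreement and
# Cohen's kappa, which corrects for chance agreement. The codings are invented.
from collections import Counter

coder_a = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
coder_b = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]

n = len(coder_a)
observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

# Expected agreement by chance, based on each coder's marginal distribution.
dist_a, dist_b = Counter(coder_a), Counter(coder_b)
expected = sum((dist_a[c] / n) * (dist_b[c] / n) for c in set(coder_a) | set(coder_b))

kappa = (observed - expected) / (1 - expected)
print(f"percent agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")
```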
Image Content
When the content of images is investigated using quantitative content analysis, a different problem arises. In order to quantify images, you often have to look at the metadata rather than the images themselves to create meaning: where was the image taken, who took it, what tags does it have, and who liked it? Even when looking at the image itself, it has to be reduced to a set of calculable variables, such as hue, saturation, grain, etc. It is therefore useful to ask what the real object of study is and whether quantitative content analysis can get close enough to it.
Software
In QCA, when large amounts of data are used, the researcher relies more and more on the programs that handle the data. One problem here is a lack of understanding of such programs: it usually takes some investment to understand how to get a program to process the data in the way you intend.
Finally, when research is carried out with data collected from users of social networks, the problem of privacy arises. Many users are not fully aware of what they have signed up for when creating an account on Facebook, Twitter or Instagram. They may not be aware that the content they produce is to some extent available to researchers.
These are all issues that a researcher must take into account when choosing methods and when evaluating or interpreting the results. It is useful to combine QCA with other techniques to address the criticisms that would arise from using this method alone.