
Rethinking keyword clustering through a semantic and cognitive lens 

Keyword clustering remains an important task for search marketers and is key to driving organic traffic, improving rankings and enhancing the authority of the website on a given topic.

There are a handful of practical tools for clustering keywords, such as Keywordsinsights.io, SurferSEO, Inlinks and incumbents such as Semrush and Ahrefs. For clustering People Also Asked questions, the likes of AlsoAsked and AnswerThePublic come in handy. The former clusters PAA questions from the SERP, while the latter uses Google Autosuggest and has also added PAA to its collection.

These, and a few more that haven’t been mentioned, are the conventional tools used for keyword clustering. They have yielded ranking benefits and have been an inspiring source of content opportunities for most search marketers. But I have a problem with the approaches and logic these tools use. They almost pigeonhole human search behaviour into a lexical and linear route that ignores how searchers think, reason and act. They focus on clustering by word co-occurrence and by cosine similarity over the surface form of words, and so they often lack the semantic and sequential depth of human cognition and behaviour.
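To make the limitation concrete, here is a minimal Python sketch of a purely lexical cosine similarity over word counts. It is an illustration only and not the implementation of any tool named above; the function names and the sample queries are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase, whitespace-separated tokens (no stemming, no synonyms)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two queries."""
    va, vb = bag_of_words(a), bag_of_words(b)
    # Only words that appear in both queries contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two queries that share words but point to different outcomes.
same_words = cosine_similarity("student loan repayment", "student loan forgiveness")

# Two queries with a similar goal but no words in common.
same_meaning = cosine_similarity("student loan repayment", "pay back university debt")

print(round(same_words, 3), same_meaning)
```

The first pair scores high because two of three words overlap, while the second pair scores zero despite expressing much the same need. That gap is what a semantic and cognitive layer would have to close.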

Generally, marketers accept the logic of these tools without recognising that some of the reasoning behind them might be outdated, or too simplistic to mirror the modern-day searcher or provide sufficient context about them.

Looking at some of the comments on Reddit about current keyword clustering tools and approaches, I share much of the sentiment of a comment from someone affiliated with Inlinks, who said that verbal words or signals are a key part of clustering keywords and who also challenged the intent classification buckets used by existing tools.

I’ve long emphasised that classifying keyword intent as informational, navigational, transactional and commercial is overly simplistic and fails to capture the motivation of the modern-day searcher. Mike King, in his insightful piece on how AI Mode works, also shed some light on how this classification is out of date. The original creator of this classification, Andrei Broder, a distinguished scientist at Google, added “hedonic” intents to the list, and a further revelation from Mark Williams-Cook highlighted more actionable internal classifications currently used within Google, such as Short_fact, Booleans, consequence, reason, instruction, comparison and other.

Evaluating Keyword Clustering of People Also Asked Questions 

There are amazing keyword clustering tools related to People Also Asked Questions (PAA), like Answer The Public and AlsoAsked. 

I typed the seed query “student loan” and the results generated by AnswerThePublic are great: they clearly show the modifiers used to segment the questions, with interrogative words like why, which, will, who, what, can and so on.

There is also a section for prepositions such as without, near and with, along with a good number of other options.

Looking at the output from Answer The Public, the tool first generates questions from Google Autocomplete and clusters the 123 questions by interrogative words such as will, are, can, how, what, when, which, who and why. The next stage clusters Google PAA data and presents it in a tree format that is very similar to the AlsoAsked.com tool. The following feature focuses on prepositions, with 85 questions containing words such as without, can, for, with, is, near and to. The next stage focuses on comparisons that use vs, and, like, or and versus to generate 48 questions. The remaining features are not aligned to questions or clustering: they are made up of alphabetical keywords related to student loans, search volume and related terms, covering over 1595 search terms.
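The lexical grouping described above can be approximated in a few lines. The following is a minimal Python sketch and not AnswerThePublic's actual implementation: the word lists are taken from the modifiers mentioned above, while the function names and sample queries are hypothetical.

```python
from collections import defaultdict

# Modifier lists based on the words mentioned in the text above.
QUESTION_WORDS = {"will", "are", "can", "how", "what", "when", "which", "who", "why"}
PREPOSITIONS = {"without", "for", "with", "near", "to"}
COMPARISONS = {"vs", "versus", "and", "like", "or"}


def lexical_bucket(query):
    """Return (bucket, modifier) based only on surface words in the query."""
    tokens = query.lower().strip().rstrip("?").split()
    if not tokens:
        return ("other", "")
    # Questions are identified by their leading interrogative word.
    if tokens[0] in QUESTION_WORDS:
        return ("question", tokens[0])
    # Otherwise look for a comparison word, then a preposition, after the first token.
    for token in tokens[1:]:
        if token in COMPARISONS:
            return ("comparison", token)
    for token in tokens[1:]:
        if token in PREPOSITIONS:
            return ("preposition", token)
    return ("other", "")


def group_queries(queries):
    """Group queries into clusters keyed by (bucket, modifier)."""
    groups = defaultdict(list)
    for query in queries:
        groups[lexical_bucket(query)].append(query)
    return dict(groups)


sample = [
    "Why are student loans so expensive?",
    "student loan vs personal loan",
    "student loan without cosigner",
    "student loan",
]
for key, members in group_queries(sample).items():
    print(key, members)
```

Every decision here depends on which word appears where, which is exactly why this style of clustering gives breadth but says little about what the searcher is trying to achieve.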

Overall, my assessment of AnswerThePublic is that it is a very good tool for gaining insights into the questions people are asking, whether sourced via Google Autocomplete or PAA (People Also Asked Questions). The clustering from tools like AnswerThePublic is done on lexical and linguistic dimensions, and those offer good breadth, but there is an opportunity for a semantic and cognitive layer that is patterned on how people think, reason and make decisions. This would be a paradigm shift from clustering words on a surface, lexical basis to capturing deep intent, mental models and expected user outcomes.

In the book “Women, Fire, and Dangerous Things: What Categories Reveal About the Mind,” American cognitive linguist and philosopher George Lakoff went deep on embodied cognition and how it shapes the way we categorise things. In the book, he described how categorisation at the basic level involves cognitive dimensions such as form, function and motor activity or engagement. It dawned on me that this is a crucial element of how humans in general categorise or cluster things, which in turn shapes our thoughts and influences our interaction with the world. This was a lightbulb moment for me: I instantly felt it could help cluster keywords, and in this instance PAA or search-based questions, in a more cognitive and semantic way that goes deeper and uncovers more insights than classical clustering methods. In the weeks to come, I will be delving further into clustering PAA in this manner and building a simple app that can start bringing this process to life and make it more actionable. This will allow for more inference, explainability and predictability of clusters.

As a result, instead of solely clustering by how, why, versus, etc., we can group queries by cognitive dimensions like:

  • Form (e.g., “What is a UK student loan?”),
  • Function (e.g., “Why do student loans affect credit score?”), and
  • Engagement (e.g., “How to apply for student loan forgiveness?”)
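As a first approximation, the three dimensions above can be sketched in Python with simple cue phrases. This is a hedged illustration only: the cue lists, the `cognitive_dimension` function and the "unclassified" fallback are assumptions made for this example, not the method of the app mentioned above, and only the three sample questions come from the list.

```python
# Hypothetical cue phrases for each cognitive dimension (illustrative assumptions).
DIMENSION_CUES = {
    "form": ("what is", "what are", "what does"),
    "function": ("why ", "what happens", "does ", "do "),
    "engagement": ("how to", "how do i", "how can i", "where to", "where can i"),
}


def cognitive_dimension(question):
    """Map a question to form, function or engagement using its opening phrase."""
    text = question.lower().strip()
    for dimension, cues in DIMENSION_CUES.items():
        # str.startswith accepts a tuple and matches if any cue matches.
        if text.startswith(cues):
            return dimension
    return "unclassified"


examples = [
    "What is a uk student loan?",
    "Why do student loans affect credit score?",
    "How to apply for student loan forgiveness?",
]
for question in examples:
    print(cognitive_dimension(question), "|", question)
```

Cue phrases are still a lexical shortcut, so this only shows the target grouping. A genuinely semantic layer would need to infer the dimension from meaning rather than from the opening words.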