Here we are going to explore a case in which we have a website with a relatively high bounce rate, meaning that a high percentage of readers leave the site after visiting only a single page.
There are several reasons why our readers may be leaving so early in their journey. Here we’re going to look into the specific case where the reader leaves because she is not engaged by the context we are providing: our suggestions for what to read next are not related to the content which brought her in in the first place. Ideally, when a reader finishes reading an article, she goes on to visit and read some of the related articles too.
Let’s imagine for a second that we have a blog post about time management. A reader who came to our site to read that post may be interested in reading other posts about time management and personal development in general. If, on the other hand, our context offers posts about cooking, soccer or space, her engagement is going to be lower, and this is going to increase our bounce rate. The assumption is that engaging context contains articles which share common traits with the current one: articles on the same topic, or speaking about the same events, persons and ideas.
Here I’m going to share my thoughts about building a system which autonomously processes an article, picks its best keywords and cross-references it with similar articles. Using such a system, we can encourage the reader to explore our site further. We can measure how the system is doing by following the bounce rate: if it is dropping, we are providing engaging context.
We want to stop the reader from leaving after visiting only a single page, so we need to make sure that the context we provide is relevant and useful. In order to do just that, we need to find the best keywords in any given article and then use them to connect all articles into meaningful networks.
The best keywords are the main words in a given article. The system is going to evaluate every word in the article and give it a score: bad keywords get a negative score and good keywords get a positive score.
An author can analyse an article in her head and pick the keywords. The challenge we are going to tackle is how to remove the human element and make a machine pick the best keywords in any given article, distinguishing between the good and the bad ones.
Picking keywords is not an easy task. Many sites currently don’t do it at all and just show the next/previous article; on other sites, authors manually cross-reference the articles. In the first approach, the next/previous article can be about anything, so it is not going to engage a reader who came to read about one specific topic. In the second case, there are several reasons why human input is not going to produce engaging enough content.
All of those factors are going to harm the reader’s experience because she is not going to be presented with the best content we have for her.
Even if we assume that none of the above holds, picking keywords still takes time, and the system can save that time for the author. Based on all of this, I think that letting a system create the networks between the articles by picking the best keywords is better for both the author and the reader.
The whole process has to start with a human feeding the system a set of important keywords on which to focus. Once the system has those keywords, it is going to search for them and create networks between the articles based on them.
All articles about a specific topic will be linked together, and all articles about a specific person are going to be in the same network. Every article belongs to many networks, but not all networks have the same relativity index. In some networks the connection is stronger, because the keyword by which the connection is made is stronger for both articles. When one of the articles has a lower relativity index for the keyword, or both articles have low indexes, there is still a connection, but it is weaker. Also, the more keywords two articles share, the stronger the relation is: articles with 12 shared keywords are much more strongly related than articles with only 2 shared keywords.
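To make that idea concrete, here is a minimal sketch in Python of how such a connection strength could be scored. It assumes each article’s keywords are already stored as a map from keyword to score; the data shape and function name are hypothetical, not a prescribed design.

```python
# A minimal sketch of the "relativity" idea, assuming each article's
# keywords are stored as {keyword: score} (a hypothetical data shape).

def connection_strength(keywords_a: dict[str, float],
                        keywords_b: dict[str, float]) -> float:
    """Score the link between two articles from their shared keywords."""
    shared = keywords_a.keys() & keywords_b.keys()
    # A shared keyword contributes only as much as its weaker side:
    # if either article is barely about it, the link through it is weak.
    # Summing over all shared keywords means that many shared keywords
    # add up to a much stronger relation than just a couple.
    return sum(min(keywords_a[k], keywords_b[k]) for k in shared)
```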
Similar articles with stronger connections and many shared keywords are proposed on the reader’s first visit and in the most visible places. Articles with weaker connections may be shown on subsequent visits and in less visible places.
Knowing what we want to achieve and the benefits of achieving it, we now need to explore how such a system can be built.
There are many ways such a system can be built. Here I’m going to explore how we can achieve it with the help of a technique called TF*IDF (term frequency * inverse document frequency).
First we need to prepare a list of important keywords on which we want the system to focus. Those keywords have to be aligned with our strategy for how we want the content to be cross-referenced. Let’s imagine that our site provides sports-related content and we want to link the content by the different sports, teams and players. This means that we need to give the system a list of keywords which includes the names of those sports, teams and players. The keywords are going to change, and new ones are going to be added over time, so we need to build this part of the system so that those actions can be performed easily: a database in which to store the keywords, and a UI to manage them.
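As one possible shape for that keyword store, here is a sketch using SQLite from Python’s standard library. The table, column and function names are my own invention for illustration; any database would do.

```python
# A sketch of the keyword store; schema and names are hypothetical.
import sqlite3

conn = sqlite3.connect("engagement.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS important_keywords (
        keyword  TEXT PRIMARY KEY,  -- e.g. a sport, team or player name
        category TEXT               -- e.g. 'sport', 'team', 'player'
    )
""")

def add_keyword(keyword: str, category: str) -> None:
    """What the management UI would call when an editor adds a keyword."""
    conn.execute(
        "INSERT OR REPLACE INTO important_keywords VALUES (?, ?)",
        (keyword.lower(), category),
    )
    conn.commit()

add_keyword("football", "sport")
add_keyword("lionel messi", "player")
```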
Having the set of keywords in our system, we now need to build the next part, which is going to analyse all of the content and find those keywords within it. Here is where the TF*IDF technique comes into play.
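In short, TF*IDF weighs how often a word appears in one document (term frequency) against how common it is across all documents (inverse document frequency), so words that dominate one article but are rare elsewhere float to the top. A bare-bones sketch, leaving out tokenisation, stemming and stop-word handling, could look like this; note that with plain TF*IDF the “bad” words end up with scores near zero rather than negative, but the ranking idea is the same.

```python
# A bare-bones TF*IDF implementation, just to make the technique concrete.
import math
from collections import Counter

def tfidf_scores(articles: list[list[str]]) -> list[dict[str, float]]:
    """For each article (a list of words), score every word by TF*IDF."""
    n = len(articles)
    # Document frequency: in how many articles does each word appear?
    df = Counter(word for article in articles for word in set(article))
    scores = []
    for article in articles:
        tf = Counter(article)
        total = len(article)
        scores.append({
            # TF: how dominant the word is in this article.
            # IDF: how rare the word is across all articles.
            word: (count / total) * math.log(n / df[word])
            for word, count in tf.items()
        })
    return scores
```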
Having those three parts (our content, the important keywords and the analysis system), we now have to make them work together. We have to tell the analysis system which are our important keywords (show it how to get them from our database), and then loop all of our content through it, so it can evaluate all the words inside every piece of content and compare the best keywords it finds against our list of important keywords. We need to store the keywords found for every article in our database, together with their TF*IDF scores. Note that this operation needs to be performed every time a new piece of content is added or the set of important keywords changes.
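Wiring the parts together might look like the sketch below, reusing tfidf_scores from the previous snippet. Here load_important_keywords, load_articles and store_article_keywords are hypothetical stand-ins for your own storage layer.

```python
# Sketch of the indexing step: run every article through the analyser and
# keep only the hits from our important-keyword list, with their scores.
# The load_* and store_* helpers are stand-ins for your own database code.

def reindex_all() -> None:
    important = load_important_keywords()          # set of keywords
    articles = load_articles()                     # [(article_id, [words])]
    all_scores = tfidf_scores([words for _, words in articles])
    for (article_id, _), scores in zip(articles, all_scores):
        found = {kw: s for kw, s in scores.items() if kw in important}
        store_article_keywords(article_id, found)  # persist keyword + score

# Re-run whenever a new piece of content is published or the
# important-keyword list changes, as noted above.
```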
Now the actual showing of relevant context is going to be easy. Once a reader opens an article on our site, we get its keywords and use them to search for other articles having the same keywords, ordering the results by their relevance scores. Finding an article through many shared keywords with high scores means that this article is strongly linked, so we place it with higher priority (more visibly) on our page.
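The serving side could then be as simple as the following sketch, reusing connection_strength from earlier; load_all_article_keywords is again a hypothetical stand-in for the storage layer.

```python
# Serving related articles: given the article the reader just opened,
# rank every other article by the combined score of shared keywords.

def related_articles(article_id: str, limit: int = 5) -> list[str]:
    keywords = load_all_article_keywords()  # {article_id: {keyword: score}}
    current = keywords[article_id]
    ranked = sorted(
        (other for other in keywords if other != article_id),
        key=lambda other: connection_strength(current, keywords[other]),
        reverse=True,
    )
    return ranked[:limit]  # strongest links go to the most visible slots
```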
The snippets above are only illustrative sketches; I’m refraining from prescribing a specific implementation, because I want this article to be applicable in different programming contexts.
In this article we saw how, by adding a little bit of automation to our system, we can improve the overall user experience. There are a lot of directions in which such a system can be extended and improved, but I think this is a good place to start if you want to better engage your readers. I hope that my thoughts on the subject light a spark in you to explore further.