What a Website Ranking Algorithm Taught Me About Risk Detection in Healthcare

In Today’s Read:

  • 🚨Disclaimer

  • 😵The Problem

  • 💡The Inspiration: PageRank Algorithm

  • 💊AI-Powered Disease Trajectory & Prediction Model:

    • Function 1: Transition Function

    • Function 2: Sampling Function

    • Function 3: Iteration Function

  • 🌀Modeling Uncertainty: The Role of Randomness

  • 🪢Tying Everything Together

  • 🤔Lessons & Reflections

  • 🎯Limitations & Areas of Improvement


🚨Disclaimer


(I do not claim diagnostic accuracy or medical expertise through this model. This model is meant for demonstration purposes to simulate how a real patient trajectory model that meets regulatory and health standards could work.)



😵The Problem


Late diagnosis remains one of the greatest contributors to a cascade of health problems and increased mortality. Globally, an estimated 43% of adults living with diabetes are unaware they have it, and 49% of people living with hypertension are undiagnosed. Many health conditions are intricately linked, increasing the risk of multiple organ failure when left untreated. About 31% of people with undiagnosed diabetes also have undiagnosed hypertension.


Many deaths from chronic conditions are the result of late-stage complications rather than the disease itself. Without an early warning model, the ‘first symptom’ for these patients becomes a permanent disability or death. Inspired by this problem, I created an AI-informed system that models disease progression and risk prioritization under uncertainty.



💡The Inspiration: PageRank Algorithm


PageRank is an algorithm developed by Google’s co-founders (including Larry Page, after whom it is named) to measure the importance of a webpage and rank it in search results. The simplest way to rank pages is to count the links pointing to each one: the more links a page receives, the more important it is assumed to be and the higher it appears in the results. This simple reasoning, however, exposes a weakness that could be exploited: one can inflate the importance of a webpage by creating many other pages that link to it, making it appear higher in search results even when it’s not an ‘important’ page. To avoid this, PageRank treats a page as important only if it is linked to by other important pages. The more important pages link to a page, the higher that page scores in ranking and visibility.
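
For readers who prefer notation, the commonly cited form of the PageRank score is shown below. The symbols are mine, not from the original write-up: N is the total number of pages, In(p) is the set of pages that link to page p, L(q) is the number of outgoing links on page q, and d is the damping factor (usually 0.85, and discussed further down).

```latex
PR(p) = \frac{1 - d}{N} + d \sum_{q \in \mathrm{In}(p)} \frac{PR(q)}{L(q)}
```

Each page q passes its own score on in equal shares to the pages it links to, which is why a link from an important page counts for more than a link from an obscure one.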



💊AI-Powered Disease Trajectory & Prediction Model


Drawing inspiration from the PageRank algorithm, I created a Markov Model that simulates disease progression and ranks the likelihood of disease occurrence. The model aims to facilitate early diagnosis and thereby avert preventable deaths and irreversible health outcomes. It relies on three main functions.


Function 1: Transition Function


This function predicts what happens next, given a current state of health. It answers the question: if the model is on symptom A, what health outcome is the patient likely to have in the future? By analyzing the conditions a symptom is commonly associated with, it defines how the system moves from one health signal to another. To make connections between a symptom and a health condition, the function follows known medical relationships. This results in a model that maps a patient’s health status over time.


This Markov Chain illustrates associations and transitions between health states.

Disease progression pathway

Programmatically, the transition function looks like:

Transition function
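
As a rough illustration of the idea, here is a minimal Python sketch of a transition function. It is not the exact code from the project: the `HEALTH_NETWORK` dictionary, its state names, and the links between them are hypothetical placeholders, and the sketch assumes the current state has at least one outgoing link.

```python
# Hypothetical toy network: each state maps to the states it can lead to.
# The links are illustrative only, not medical guidance.
HEALTH_NETWORK = {
    "fatigue": ["high_blood_sugar", "high_blood_pressure"],
    "high_blood_sugar": ["diabetes"],
    "high_blood_pressure": ["hypertension"],
    "diabetes": ["hypertension", "kidney_disease"],
    "hypertension": ["kidney_disease"],
    "kidney_disease": ["fatigue"],
}


def transition_model(network, state):
    """Return the probability of each possible next state.

    Assumes `state` has at least one outgoing link; the probability is
    split evenly among the states it links to.
    """
    links = network[state]
    share = 1 / len(links)
    return {s: (share if s in links else 0.0) for s in network}


probs = transition_model(HEALTH_NETWORK, "diabetes")
for name, p in probs.items():
    print(f"{name}: {p:.2f}")
```

Splitting the probability evenly is the simplest choice; a version grounded in clinical data would weight each link by how often that progression is actually observed.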

Function 2: Sampling Function


This function simulates biodata readings as a sensor might record them. Working across the entire health network, it simulates many possible symptom progressions and observes which states show up most often. Each simulation represents how symptoms, conditions, and risks might unfold. The more simulations are run, the more patterns emerge, strengthening the ultimate health prediction. In my example, I ran the simulation 10,000 times (imagine this as the sensor taking 10,000 readings continuously), allowing the model to learn trends within the health network and reveal the most likely clinical pathways as the most frequent states. It’s similar to how clinicians build intuition over time by seeing many cases. This algorithm uses the transition function to determine the next health state in a disease trajectory.


To account for sensor noise or an unobserved event (e.g., a patient went for a run and experienced a spike in their heart rate unrelated to their primary condition), I introduced a “Sudden health change” variable. In PageRank terms, this is the damping factor, and it is typically set to 0.85. It simply says, “85% of the time, the model follows the progression of a disease modeled in the health network. 15% of the time, a recorded symptom could be due to unforeseen factors outside the health network.” By using this variable, we capture the effect of unpredictable and acute events that tend to occur in the human body, making the model more robust.


Programmatically, the sampling model is written as:

Sampling function
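
The sketch below shows one way such a sampler could look in Python. It is an approximation under stated assumptions, not the project's actual code: the network, the state names, the helper `next_state`, and the fixed `seed` are all hypothetical, and a state with no outgoing links is assumed to jump to any state at random.

```python
import random
from collections import Counter

# Hypothetical toy network (illustrative links only, not medical guidance).
# "clinical_review" has no outgoing links.
HEALTH_NETWORK = {
    "fatigue": ["high_blood_sugar", "high_blood_pressure"],
    "high_blood_sugar": ["diabetes"],
    "high_blood_pressure": ["hypertension"],
    "diabetes": ["hypertension", "clinical_review"],
    "hypertension": ["clinical_review"],
    "clinical_review": [],
}


def next_state(network, state, damping, rng):
    """Follow a known link with probability `damping`; otherwise
    (or when the state has no links) jump to any state uniformly."""
    links = network[state]
    if links and rng.random() < damping:
        return rng.choice(links)
    return rng.choice(list(network))


def sample_risk(network, n_samples=10_000, damping=0.85, seed=0):
    """Estimate each state's risk as its share of visits in a random walk."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    state = rng.choice(list(network))  # random starting state
    visits = Counter()
    for _ in range(n_samples):
        visits[state] += 1
        state = next_state(network, state, damping, rng)
    return {s: visits[s] / n_samples for s in network}


risk = sample_risk(HEALTH_NETWORK)
for name, share in sorted(risk.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.3f}")
```

Each visit stands in for one sensor reading, so the printed shares are simply how often the walk landed on each state.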

After running 10,000 samples, the model may predict an outcome like the one below.

Results from the sampling function

Function 3: Iteration Function


This function is an iterative scoring model that recomputes every risk score repeatedly until no score changes by more than a small, predefined threshold, e.g., 0.001. It answers the question: given the current health state, what is the patient’s health destiny? It transforms the model from a simple map of health state transitions into a forecast of ultimate health results, not just the immediate outcome. Central to the iteration function is the concept of stabilization/convergence. Initially, we assume that all conditions in the health network have equal risk probability, as we are unsure which state is most likely. In this scenario, if there are n signals, each health state receives a risk score of 1/n. In each subsequent iteration, each state’s risk is updated based on the states that point to it and on how much risk those states currently carry. A health signal (state) pointed at by many high-risk signals is ranked the highest in that iteration. Early on, the risk values of states change drastically between iterations. The model is considered to have reached a point of convergence when the risk scores stop changing significantly between one iteration and the next.


Programmatically, this is written as:

Iteration function
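
Here is a minimal Python sketch of that loop, written under my own assumptions rather than copied from the project. The toy network and state names are hypothetical, and a state with no outgoing links is treated as if it linked to every state, which is one common way to keep the scores summing to 1.

```python
# Hypothetical toy network (illustrative links only, not medical guidance).
HEALTH_NETWORK = {
    "fatigue": ["high_blood_sugar", "high_blood_pressure"],
    "high_blood_sugar": ["diabetes"],
    "high_blood_pressure": ["hypertension"],
    "diabetes": ["hypertension", "clinical_review"],
    "hypertension": ["clinical_review"],
    "clinical_review": [],
}


def iterate_risk(network, damping=0.85, tol=0.001):
    """Update risk scores until no score moves by `tol` or more."""
    states = list(network)
    n = len(states)
    risk = {s: 1 / n for s in states}  # start with equal risk everywhere
    while True:
        new_risk = {}
        for s in states:
            inflow = 0.0
            for q in states:
                # Assumption: a state with no links spreads its risk evenly.
                links = network[q] or states
                if s in links:
                    inflow += risk[q] / len(links)
            new_risk[s] = (1 - damping) / n + damping * inflow
        if max(abs(new_risk[s] - risk[s]) for s in states) < tol:
            return new_risk
        risk = new_risk


risk = iterate_risk(HEALTH_NETWORK)
for name, score in sorted(risk.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

A smaller `tol` gives more stable scores at the cost of more passes over the network.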

Since convergence in this context represents the point where a patient’s health readings stop fluctuating, it can symbolize chronicity, stabilization of a health condition, or a steady-state diagnosis.



🌀Modeling Uncertainty: The Role of Randomness


Throughout, you may have noticed some themes of randomness. Why is this important? Randomness reflects the messy realities in healthcare, e.g., a sensor might lose its connection for an hour. In my model, random events are given a 15% probability. Mathematically, this is the complement of the damping factor, 1 - d, where d is 0.85. In each of the functions, randomness serves a different purpose; however, together they strengthen the system’s tolerance for uncertainty. In the model, randomness represents:

  • Incomplete medical records.

  • Unseen variables from external factors not modeled in the algorithm, e.g., a new, unrelated diagnosis that can alter disease progression.

  • A boundary that marks the point at which the model’s job is done, and human intervention takes over. In technical terms, this is a sink, i.e., a state that has no outgoing links.


In the transition function, the sink is coded as:

A sink in the transition function
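
A hedged sketch of what the sink branch might look like in Python follows. The three-state network and the name `clinical_review` are hypothetical, and treating a sink as "any state is equally likely next" is my assumption about how the hand-off is modeled.

```python
# Hypothetical three-state network; "clinical_review" is the sink.
NETWORK = {
    "high_blood_pressure": ["hypertension"],
    "hypertension": ["clinical_review"],
    "clinical_review": [],
}


def transition_with_sink(network, state):
    """Return next-state probabilities, with a special case for sinks."""
    links = network[state]
    if not links:
        # Sink: no known onward path, so every state is equally likely.
        return {s: 1 / len(network) for s in network}
    return {s: (1 / len(links) if s in links else 0.0) for s in network}


sink_probs = transition_with_sink(NETWORK, "clinical_review")
print(sink_probs)
```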

In the iteration function, the sink is coded as:

A sink in the iteration function
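
In notation, one way to write an update step that includes sinks is shown below. This is a sketch with my own symbols, not a formula taken from the project: r_t(s) is the risk of state s at iteration t, N is the number of states, In(s) is the set of states that point to s, L(q) is the number of outgoing links from q, Sinks is the set of states with no outgoing links, and d is the damping factor.

```latex
r_{t+1}(s) = \frac{1 - d}{N}
  + d \left( \sum_{q \in \mathrm{In}(s)} \frac{r_t(q)}{L(q)}
  + \sum_{q \in \mathrm{Sinks}} \frac{r_t(q)}{N} \right)
```

The second sum is the sink term: each sink hands an equal 1/N share of its risk to every state, so the scores still add up to 1 after each iteration.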

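While the sampling function represents observed events, as recorded by a sensor, the iteration function represents predicted results, as forecast by the AI model. The closer the results from the two functions are, the more confident we can be that the model is internally consistent; agreement with real patient outcomes would still need to be tested separately.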


In one scenario, disease rankings as predicted by the iteration function are:

Iteration function graph

Side by side, projections from the two functions for the same scenario are close (as seen in the image below), which supports the model’s internal consistency.

Comparison: Predicted vs Observed Disease Predictions
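
To put a number on "close", one simple option is the largest per-state gap between the two sets of scores. The helper below is a hypothetical sketch, and the two dictionaries hold made-up values purely to show the call; they are not results from the model.

```python
def largest_gap(sampled, iterated):
    """Return the largest absolute difference in risk across all states."""
    return max(abs(sampled[s] - iterated[s]) for s in iterated)


# Made-up values for illustration only.
sampled_example = {"state_a": 0.52, "state_b": 0.48}
iterated_example = {"state_a": 0.50, "state_b": 0.50}

gap = largest_gap(sampled_example, iterated_example)
print(f"largest gap: {gap:.3f}")
```

With 10,000 samples some gap is expected from sampling noise alone, so a small non-zero value is normal.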

Based on the likelihood of a disease being high (more than 20%), moderate (between 10% and 20%), or low (less than 10%), the model provides a recommendation accordingly. For example, the model provides these recommendations from the probabilities represented in the graph above:

Clinical recommendations
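
The tiering rule described above can be sketched in a few lines of Python. The function name `risk_tier` is hypothetical, and placing exactly 10% and exactly 20% in the moderate tier is my reading of "between 10% and 20%". The wording of the recommendations themselves is not reproduced here.

```python
def risk_tier(probability):
    """Map a disease probability (0 to 1) to a risk tier."""
    if probability > 0.20:
        return "high"
    if probability >= 0.10:
        # Assumption: the 10% and 20% boundaries count as moderate.
        return "moderate"
    return "low"


for p in (0.25, 0.15, 0.05):
    print(f"{p:.0%} -> {risk_tier(p)}")
```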

🤔Lessons & Reflections


  • Proactive healthcare beats reactive healthcare: applying the principles and logic of a search engine algorithm to develop an AI model that predicts health outcomes demonstrates the potential of future healthcare systems to be largely proactive rather than reactive.

  • Sinks prevent the algorithm from overdiagnosing: Without special handling, risk would accumulate in terminal states and never leave. Redistributing a sink’s probability equally across all states prevents this. Essentially, it resets a patient’s journey once they hit a terminal point.

  • Larry Page and co. unlocked premium levels of aura farming with the PageRank algorithm.


🎯Limitations & Areas of Improvement


Like the PageRank algorithm, my disease trajectory model relies heavily on the Markov Model. Markov Models are defined by this rule: the probability of the next state depends only on the current state, not on the sequence of events that came before it. This introduces limitations in disease modeling.

  • Memorylessness: The Markov Model condenses the history of a disease into the present condition, ignoring past medical history. In reality, a symptom’s duration, order, frequency, and treatment histories matter. The Markov Model can capture none of these.

  • One state at a time: A patient can only be in one state at a time. In reality, people tend to have more than one symptom and diagnosis. Tracking multiple symptoms simultaneously would require a separate state for every combination, and the number of states grows too quickly to manage.
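
Both limitations follow from the Markov rule, which can be written out as below. The notation is mine: X_t is the patient's state at step t, s and s' are states, and n is the number of yes/no symptoms one might want to track together.

```latex
P(X_{t+1} = s' \mid X_t = s, X_{t-1}, \dots, X_0) = P(X_{t+1} = s' \mid X_t = s)

\text{number of combined states for } n \text{ yes/no symptoms} = 2^{n}
```

The first line says that everything before X_t drops out of the prediction. The second shows why combining symptoms gets out of hand: 10 symptoms already give 1,024 combined states.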


Outside of the Markov Model, my model also has these weaknesses:


  • Cold Starts: To begin calculating, my model requires a current state, which is chosen randomly by the algorithm. Realistically, if the first sensor reading, for example, is erroneous, the initial risk output will be dangerously wrong until enough samples are collected to bring the readings closer to accurate predictions.

  • Risk of False Negative Outcomes: A sensor could miss a key symptom, causing the model to converge on a ‘healthy’ prediction. If this happens, a patient might fail to get the help they need.


This is still an early exploration of an unfinished solution. However, it reinforced the importance of developing health-tech solutions that expose risks early and factor in environmental and biological uncertainties.


How would you improve this model?


Until next time, toodooloos!



 
 
 
