
PATH launches landmark AI study in Africa exploring LLMs' potential in health diagnoses
PATH has launched the largest study of its kind in Africa, recruiting 9,000 participants to test whether artificial intelligence can help primary care clinicians make better diagnoses and treatment decisions in resource-limited settings.
The Seattle-based global health nonprofit is deploying large language model (LLM) technology at clinics in Nairobi to analyze patient symptoms, health histories, provider notes and lab results, and then assist with diagnosis and treatment planning.
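To make that workflow concrete, here is a minimal, hypothetical sketch in Python of how such an assistant might package an encounter for an LLM. The structure, names (PatientEncounter, build_prompt, query_llm) and prompt wording are illustrative assumptions, not PATH's or Penda Health's actual implementation.

```python
# Hypothetical sketch of an LLM-based diagnostic assistant's input handling.
# All names and the prompt format are assumptions for illustration only.
from dataclasses import dataclass, field


@dataclass
class PatientEncounter:
    symptoms: list[str]
    history: list[str]
    provider_notes: str
    lab_results: dict[str, str] = field(default_factory=dict)


def build_prompt(enc: PatientEncounter) -> str:
    """Assemble the encounter into a structured prompt for an LLM."""
    labs = "\n".join(f"- {test}: {value}" for test, value in enc.lab_results.items())
    return (
        "You are assisting a primary care clinician. Suggest a ranked differential "
        "diagnosis and next steps, and flag anything needing urgent escalation.\n"
        f"Symptoms: {', '.join(enc.symptoms)}\n"
        f"History: {', '.join(enc.history)}\n"
        f"Provider notes: {enc.provider_notes}\n"
        f"Lab results:\n{labs}"
    )


def query_llm(prompt: str) -> str:
    """Placeholder for a call to whatever LLM backend is actually deployed."""
    return "(model output would appear here)"


if __name__ == "__main__":
    encounter = PatientEncounter(
        symptoms=["fever for 3 days", "cough"],
        history=["no known chronic conditions"],
        provider_notes="Child, age 4, seen at a Nairobi primary care clinic.",
        lab_results={"malaria RDT": "negative"},
    )
    print(query_llm(build_prompt(encounter)))
```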
Bilal Mateen, PATH's first chief AI officer, is leading the organization's wide-ranging AI initiatives that include using tools to accelerate vaccine development and deploying chatbots to discuss sensitive health topics like HPV vaccination with teenage girls. Mateen is proceeding with both excitement and caution as he navigates what he calls 'potentially very risky technology' in vulnerable populations.
The medical AI-assistant study, conducted in partnership with the Kenya Paediatric Research Consortium, the University of Birmingham, and Nairobi clinic operator Penda Health, aims to provide the kind of rigorous evidence that has been missing from digital healthcare initiatives in low- and middle-income countries.
Bilal Mateen, chief of AI at PATH. (PATH Photo)
'This trial marks an important milestone for our health sector. AI has the potential to bridge health care gaps, particularly in underserved regions,' said Dr. Deborah Mlongo Barasa, Kenya's Cabinet Secretary for Health, in a release announcing the study. 'We look forward to the insights it will generate to guide responsible and effective AI adoption.'
While organizations have pursued diagnostic assistance tools for years, most get stuck in pilot phases without proving their real-world value, Mateen said.
'Does this tool reduce the rate of treatment failures, people having to come back with unresolved symptoms, people being admitted to hospital as an emergency, people dying?' Mateen said. 'I don't know the answer yet.'
Results from the trial are expected by the end of the year.
PATH recently launched a second, smaller trial in Nigeria that features a toll-free hotline that provides responses to health inquiries using generative AI. The tool is called the Community Health Extension Worker Assistant (CHEWA) and is meant to serve healthcare workers who don't have access to the internet.
The study will run until providers log 3,000 patient encounters. The work is being done in partnership with Viamo, a Canadian social enterprise.
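For illustration only, here is a minimal sketch of the kind of voice pipeline a hotline like CHEWA might use, with the speech-to-text, generation and playback steps stubbed out. The function names and flow are assumptions, not the actual Viamo/PATH system.

```python
# Hypothetical hotline turn: transcribe the caller's question, generate an answer,
# and synthesize speech to play back over the phone line. Stubs only; the real
# services behind each step are not specified in the article.

def transcribe(audio_bytes: bytes) -> str:
    """Placeholder: speech-to-text on the caller's recorded question."""
    return "What is the recommended ORS dose for a two-year-old with diarrhoea?"


def answer_with_llm(question: str) -> str:
    """Placeholder: generate a short answer, ideally grounded in national guidelines."""
    return "(generated guidance for the health worker would appear here)"


def synthesize_speech(text: str) -> bytes:
    """Placeholder: text-to-speech so the answer can be played back on the call."""
    return text.encode("utf-8")


def handle_call(audio_bytes: bytes) -> bytes:
    """Handle one hotline turn; each completed turn would be logged as an encounter."""
    question = transcribe(audio_bytes)
    answer = answer_with_llm(question)
    return synthesize_speech(answer)


if __name__ == "__main__":
    print(handle_call(b"...caller audio..."))
```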
GeekWire recently spoke to Mateen about PATH's broader AI efforts. Here are some highlights.
Challenging misconceptions
Mateen calls out two misconceptions about AI and healthcare.
While diagnostic assistants could boost the efficiency and effectiveness of providers, that doesn't always translate into lower healthcare costs. Better performance can surface more healthcare needs, such as additional lab tests and treatments.
Though LLMs are typically trained on information from higher-income nations, AI tools don't necessarily need to be customized for local communities, depending on the use case. A patient with high blood pressure readings, for example, points to hypertension no matter where they live.
Faster, cheaper breakthroughs
PATH is testing whether Google's AI co-scientist can identify correlations in immune response and vaccine effectiveness that normally require multimillion-dollar trials to prove, potentially shortcutting research for new vaccines.
The nonprofit is also using AI to search scientific literature for 'unicorn biomarkers' — rare biological signals that could help fight deadly diseases including rotavirus, gastroenteritis and respiratory syncytial virus (RSV).
AI on touchy subjects
Mateen is interested in chatbots taking the lead in uncomfortable conversations about sensitive issues such as vaccination against human papillomavirus (HPV), which is sexually transmitted and can cause cervical cancer.
It can be awkward to discuss these serious issues with teenage girls, and in some countries these topics are strictly taboo, said Mateen. 'We've discovered it's much easier to get that 14-year-old to speak to an empathetic chatbot, than it is a teacher or some other authority figure in their lives.'
Supporting regulation creation
PATH is hoping to land a grant to support the establishment of healthcare-related AI regulations in low- and middle-income countries.
AI-based technology poses potentially heightened risks for these populations, Mateen said, given their limited access to healthcare, minimal regulatory oversight in this area, and lack of recourse if the AI goes awry.
PATH has spent decades helping these nations strengthen regulations for vaccines, drugs and diagnostic testing, he said. 'As much as we want to be the pioneers delivering the next thing, we also recognize a responsibility for us to make sure that there is a mechanism by which us and others are held to account.'